# An API explainer notebook

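This notebook explains how to work with a web API, using Wikipedia as the example. Suppose we want to download a Wikipedia page such as the Rembrandt page, https://en.wikipedia.org/wiki/Rembrandt. The cells below fetch the Baldur's Gate 3 page in the same way.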

In [20]:
# Fetch the raw HTML of the page
import urllib.request
url = 'https://en.wikipedia.org/wiki/Baldur%27s_Gate_3'
response = urllib.request.urlopen(url)
data = response.read()       # raw bytes of the response body
text = data.decode('utf-8')  # decode the bytes into a string
print(text)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-enabled vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-toc-available" lang="en" dir="ltr">
<head>
<meta charset="UTF-8">
<title>Baldur's Gate 3 - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-l

Now we use the API to get the page in a more convenient format. For the Rembrandt page, the request looks like this:

https://en.wikipedia.org/w/api.php?action=query&titles=Rembrandt&prop=revisions&rvprop=content&format=json

In [22]:
# Build the API query for the wiki source
baseurl = "https://en.wikipedia.org/w/api.php?"
action = "action=query"
title = "titles=Baldur%27s_Gate_3"
content = "prop=revisions&rvprop=content"
dataformat = "format=json"

query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
print(query)

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Baldur%27s_Gate_3&format=json


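#### What is the difference between the HTML page and the wiki source?
Compared with the HTML page, the wiki source is much simpler and easier to read, but it offers less functionality: it contains only the article's wikitext markup, without the scripts, styling, and navigation that the HTML page includes.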

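##### What are the various parameters you can set for a query of the Wikipedia API?
The names below are the variables used in the cell above; each one sets one or more parameters in the query string.
- **`action`**: Sets `action=query`, the operation the API should perform.
- **`title`**: Sets `titles`, the title of the page you want to query.
- **`content`**: Sets `prop=revisions&rvprop=content`, which retrieves the actual text content of a Wikipedia page as it existed at the time of the revision.
- **`dataformat`**: Sets `format`, the format of the returned data (here `json`).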
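
The same parameters can also be collected in a dictionary and encoded in one step with `urllib.parse.urlencode`, which takes care of percent-encoding the title. This is only a minimal sketch: the names `params` and `api_query` are illustrative and do not appear elsewhere in this notebook.

```python
import urllib.parse

baseurl = "https://en.wikipedia.org/w/api.php?"

# Each key is one query parameter of the API request.
# `params` and `api_query` are illustrative names, not part of the notebook.
params = {
    "action": "query",            # the operation to perform
    "prop": "revisions",          # ask for revision data of the page
    "rvprop": "content",          # include the revision's content
    "titles": "Baldur's Gate 3",  # raw, unencoded page title
    "format": "json",             # format of the response
}

# urlencode joins the pairs with '&' and percent-encodes the values.
api_query = baseurl + urllib.parse.urlencode(params)
print(api_query)
```

Keeping the title unencoded in the dictionary means the encoding happens in exactly one place.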


In [23]:
wikiresponse = urllib.request.urlopen(query)
wikidata = wikiresponse.read()
wikitext = wikidata.decode('utf-8')

In [24]:
import json
wikijson = json.loads(wikitext)
print(wikijson)



In [16]:
print(wikijson['query'])

{'pages': {'4254144': {'pageid': 4254144, 'ns': 0, 'title': 'Rembrandt', 'revisions': [{'contentformat': 'text/x-wiki', 'contentmodel': 'wikitext', '*': '{{Short description|Dutch painter and printmaker (1606–1669)}}\n{{About|the Dutch artist}}\n{{family name hatnote|[[Van Rijn]]|Rijn|lang=Dutch}}\n{{Good article}}\n{{Use dmy dates|date=May 2023}}\n{{Infobox artist\n| name          = Rembrandt\n| image         = Rembrandt van Rijn - Self-Portrait - Google Art Project.jpg\n| caption       = \'\'[[Self-Portrait with Beret and Turned-Up Collar]]\'\' (1659)\n| birth_name    = Rembrandt Harmenszoon van Rijn\n| birth_date    = {{Birth date|df=y|1606|7|15}}<ref name="BY">Or possibly 1607 as on 10 June 1634 he himself claimed to be 26 years old. See [http://www.codart.nl/news/82/ Is the Rembrandt Year being celebrated one year too soon? One year too late?] {{Webarchive|url=https://web.archive.org/web/20101121211856/http://codart.nl/news/82/ |date=21 November 2010 }} and {{in lang|nl}} J. de Jo
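
The page content sits several levels deep in the output above: `query`, then `pages`, then the page id, then the first entry of `revisions`, under the key `'*'`. A minimal sketch of pulling it out follows. It uses a small hand-made dictionary, `sample`, that imitates the structure of the output above rather than a live response, and the helper name `extract_wikitext` is hypothetical.

```python
def extract_wikitext(response_json):
    """Return {page title: wikitext} for every page in a query response."""
    result = {}
    pages = response_json["query"]["pages"]
    for page in pages.values():
        # Pages without a 'revisions' key (e.g. invalid titles) are skipped.
        revisions = page.get("revisions")
        if revisions:
            result[page["title"]] = revisions[0]["*"]
    return result

# Hand-made stand-in for a response, shaped like the output above.
sample = {
    "query": {"pages": {"4254144": {
        "pageid": 4254144, "ns": 0, "title": "Rembrandt",
        "revisions": [{
            "contentformat": "text/x-wiki",
            "contentmodel": "wikitext",
            "*": "{{Short description|Dutch painter and printmaker (1606–1669)}}",
        }],
    }}}
}

wikitext_by_title = extract_wikitext(sample)
print(wikitext_by_title["Rembrandt"])
```

Looping over `pages.values()` avoids hard-coding the page id, which differs from page to page.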

In [35]:
import urllib.parse

def wikicatugh(title):
    # Fetch a page's wiki source through the API and print the parsed JSON.
    baseurl = "https://en.wikipedia.org/w/api.php?"
    action = "action=query"
    # urlencode percent-encodes the title itself, so it expects a raw title
    title = urllib.parse.urlencode({'titles': title})
    content = "prop=revisions&rvprop=content"
    dataformat = "format=json"
    query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
    wikiresponse = urllib.request.urlopen(query)
    wikidata = wikiresponse.read()
    wikitext = wikidata.decode('utf-8')
    wikijson = json.loads(wikitext)
    print(wikijson)

In [36]:
wikicatugh(title="Baldur%27s_Gate_3")

{'batchcomplete': '', 'query': {'pages': {'-1': {'title': 'Baldur%27s_Gate_3', 'invalidreason': 'The requested page title contains invalid characters: "%27".', 'invalid': ''}}}}
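
The error above is consistent with double encoding: the title passed in is already percent-encoded, and `urlencode` then encodes its `%` sign again, so the API receives the literal text `%27` instead of an apostrophe. A minimal sketch of the difference, which only builds the strings and makes no request (the variable names are illustrative):

```python
import urllib.parse

already_encoded = "Baldur%27s_Gate_3"
# unquote reverses the percent-encoding and recovers the raw title.
raw_title = urllib.parse.unquote(already_encoded)

# Encoding an already-encoded title turns '%' into '%25'.
double = urllib.parse.urlencode({"titles": already_encoded})
# Encoding the raw title gives a single layer of encoding.
single = urllib.parse.urlencode({"titles": raw_title})

print(raw_title)  # Baldur's_Gate_3
print(double)     # titles=Baldur%2527s_Gate_3
print(single)     # titles=Baldur%27s_Gate_3
```

Under this reading, calling the function with the plain title, for example `wikicatugh(title="Baldur's Gate 3")`, should avoid the invalid-character error.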
