# Wikipedia API - 'prop'
---
**Goal:** explore the results returned by using different values of the 'prop' parameter and find the values most useful for this project.
  
**Notes:** 
- there are over thirty values for the 'prop' parameter in calls to the Wikipedia API
- documentation for the API parameters (including 'prop' values) is at https://www.mediawiki.org/wiki/API:Query
- findings from this notebook are summarized and used in [TADS_wikipedia_tdih_sup_01_wikipedia_api_notes_07feb21]()
- reminder that the page data needed for this project is:
    - page size
    - page views
    - page incoming links
    - coordinates (if any)
    - page image_url and image_file
    - page first paragraph (for short description) and first paragraphs (for long description)
    - page score (this was not part of the original project; I discovered it while exploring the API)
    - other data that was not part of the original project
        - wikibase_short_description
        - wikibase_item
        - wiki_description (similar but not identical to wikibase_short_description; need to decide which one to keep)
- Section 1 of this notebook shows the 'prop' values most useful for this project
- Section 2 of this notebook shows the results of trying out different 'prop' values (in the order they are listed on the Wikipedia API documentation page)

In [1]:
import requests
import config # file with headers data for privacy purposes

from bs4 import BeautifulSoup

In [2]:
URL = 'https://en.wikipedia.org/w/api.php' # API endpoint (no trailing slash)
HEADERS = config.HEADERS

PAGE_TITLE = 'Francis_I_of_France' # test requests for single title
PAGE_TITLES = 'Francis_I_of_France|Albert_Einstein|Nika_riots' # test requests for multiple titles

## Section 1
---
Building up the API request with 'prop' values useful for this project:
- cirrusbuilddoc
- cirruscompsuggestbuilddoc
- extracts
- pageimages
- pageprops
- pageterms
- pageviews
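
The `prop` values listed above are sent as a single string joined with `|`. A minimal sketch of assembling that string from a list, so that modules can be added or removed in one place; `PROPS` and `build_params` are hypothetical names, not part of the original notebook:

```python
# Hypothetical helper; the names are illustrative only.
PROPS = ['cirrusbuilddoc', 'cirruscompsuggestbuilddoc', 'extracts',
         'pageimages', 'pageprops', 'pageterms', 'pageviews']

def build_params(titles, props=PROPS):
    """Return a params dict for action=query; list values are joined with '|'."""
    if not isinstance(titles, str):
        titles = '|'.join(titles)
    return {'titles': titles,
            'action': 'query',
            'format': 'json',
            'prop': '|'.join(props),
            'piprop': 'original',  # ask pageimages for the original image
            'exintro': True}       # extracts: only the text before the first section

params = build_params(['Abel Tasman', 'Christopher Columbus'])
```

The resulting dict has the same shape as the `params` dicts written out by hand in the cells of this section.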

In [3]:
params = {'titles': "Queen Anne's War|Misha: A Mémoire of the Holocaust Years|Çağdaş Atan|%C173465something",
          'action': 'query',
          'format': 'json',
          'prop': 'cirrusbuilddoc|cirruscompsuggestbuilddoc|extracts|pageimages|pageprops|pageterms|pageviews',
          'piprop': 'original',
          'exintro': True # gives the first paragraphs of a page (before detail sections)
         }

In [4]:
r = requests.get(url = URL, headers = HEADERS, params = params)
r.status_code

200

In [5]:
j = r.json()
j

{'batchcomplete': '',
 'query': {'pages': {'-1': {'title': '%C173465something',
    'invalidreason': 'The requested page title contains invalid characters: "%C1".',
    'invalid': ''},
   '16030743': {'pageid': 16030743,
    'ns': 0,
    'title': 'Misha: A Mémoire of the Holocaust Years',
    'cirrusbuilddoc': {'version': 996300337,
     'wiki': 'enwiki',
     'namespace': 0,
     'namespace_text': '',
     'title': 'Misha: A Mémoire of the Holocaust Years',
     'timestamp': '2020-12-25T19:09:35Z',
     'create_timestamp': '2008-03-01T03:13:01Z',
     'redirect': [{'namespace': 0,
       'title': 'Misha: A Memoire of the Holocaust Years'},
      {'namespace': 0, 'title': 'Misha: A Memoir of the Holocaust Years'},
      {'namespace': 0, 'title': 'Misha: A Memoir of the Holocaust'},
      {'namespace': 0, 'title': 'Jane Daniel'},
      {'namespace': 0, 'title': 'Survivre Avec les Loups'},
      {'namespace': 0, 'title': 'Survivre avec les loups'}],
     'incoming_links': 71,
     'categ

In [6]:
print(j['query'].keys())
print(j['query']['pages'].keys())
print(j['query']['pages']['16030743'].keys())

dict_keys(['pages'])
dict_keys(['-1', '16030743', '119954', '5874901'])
dict_keys(['pageid', 'ns', 'title', 'cirrusbuilddoc', 'cirruscompsuggestbuilddoc', 'extract', 'pageprops', 'terms', 'pageviews'])


In [10]:
for key in j['query']['pages']:
    print(j['query']['pages'][key]['title'])

Misha: A Mémoire of the Holocaust Years
Queen Anne's War
Çağdaş Atan
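
The response above contains a `'-1'` entry flagged with an `'invalid'` key for the malformed title. A minimal sketch of separating usable pages from rejected ones before extracting data; `split_pages` is a hypothetical helper, and the check for a `'missing'` key (used by the API for titles that do not exist) is an assumption that was not exercised in this notebook:

```python
def split_pages(pages):
    """Split the 'pages' dict into (usable, rejected) by the API's flag keys."""
    usable, rejected = {}, {}
    for page_id, page in pages.items():
        # 'invalid' appears in the response above; 'missing' is assumed here
        if 'invalid' in page or 'missing' in page:
            rejected[page_id] = page
        else:
            usable[page_id] = page
    return usable, rejected

# Sample shaped like the response above (most fields omitted)
pages = {
    '-1': {'title': '%C173465something',
           'invalidreason': 'The requested page title contains invalid characters: "%C1".',
           'invalid': ''},
    '119954': {'pageid': 119954, 'ns': 0, 'title': "Queen Anne's War"},
}
usable, rejected = split_pages(pages)
print(sorted(usable))    # ['119954']
print(sorted(rejected))  # ['-1']
```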


In [7]:
t1 = "Abel Tasman|Christopher Columbus|March 1504 lunar eclipse|Native American (U.S.)|Queen Anne's War"


In [8]:
params = {'titles': t1,
          'action': 'query',
          'format': 'json',
          'prop': 'cirrusbuilddoc|cirruscompsuggestbuilddoc|extracts|pageimages|pageprops|pageterms|pageviews',
          'piprop': 'original',
          'exintro': True # gives the first paragraphs of a page (before detail sections)
         }
r1 = requests.get(url = URL, headers = HEADERS, params = params)
r1.status_code

200

In [9]:
j1 = r1.json()
j1

{'batchcomplete': '',
 'query': {'pages': {'1988': {'pageid': 1988,
    'ns': 0,
    'title': 'Abel Tasman',
    'cirrusbuilddoc': {'version': 1005461400,
     'wiki': 'enwiki',
     'namespace': 0,
     'namespace_text': '',
     'title': 'Abel Tasman',
     'timestamp': '2021-02-07T19:52:48Z',
     'create_timestamp': '2001-09-24T12:29:13Z',
     'redirect': [{'namespace': 0, 'title': 'Abel Janszoon Tasman'},
      {'namespace': 0, 'title': 'Abel Janzoon Tasman'},
      {'namespace': 0, 'title': 'Abel Jansz Tasman'},
      {'namespace': 0, 'title': 'Able tasman'},
      {'namespace': 0, 'title': 'Abel tasman'},
      {'namespace': 0, 'title': 'Abel Janszon Tasman'},
      {'namespace': 0, 'title': 'List of things named after Abel Tasman'}],
     'incoming_links': 711,
     'category': ['All articles with dead external links',
      'Articles with dead external links from February 2021',
      'Articles with permanently dead external links',
      'CS1 Spanish-language sources (es)',


In [13]:
d1 = j1['query']['pages']
print(d1.keys())
print(d1['2735616'].keys())
d1['2735616']

dict_keys(['1988', '5635', '25589194', '2735616', '119954'])
dict_keys(['pageid', 'ns', 'title', 'cirrusbuilddoc', 'cirruscompsuggestbuilddoc', 'extract', 'pageviews'])


{'pageid': 2735616,
 'ns': 0,
 'title': 'Native American (U.S.)',
 'cirrusbuilddoc': {'version': 235540234,
  'wiki': 'enwiki',
  'namespace': 0,
  'namespace_text': '',
  'title': 'Native American (U.S.)',
  'timestamp': '2008-09-01T07:04:23Z',
  'create_timestamp': '2005-09-23T03:17:14Z',
  'redirect': [],
  'incoming_links': 66,
  'category': [],
  'external_link': [],
  'outgoing_link': ['Native_Americans_in_the_United_States'],
  'template': [],
  'text': 'Redirect to: Native Americans in the United States',
  'source_text': '#redirect [[Native Americans in the United States]]',
  'text_bytes': 51,
  'content_model': 'wikitext',
  'coordinates': [],
  'language': 'en',
  'heading': [],
  'opening_text': None,
  'auxiliary_text': [],
  'display_title': None},
 'cirruscompsuggestbuilddoc': {'21217t': {'batch_id': 1613683780,
   'source_doc_id': '21217',
   'target_title': {'title': 'Native Americans in the United States',
    'namespace': 0},
   'suggest': {'input': ['Native America

In [15]:
d1['2735616']['cirruscompsuggestbuilddoc']

{'21217t': {'batch_id': 1613683780,
  'source_doc_id': '21217',
  'target_title': {'title': 'Native Americans in the United States',
   'namespace': 0},
  'suggest': {'input': ['Native Americans in the United States',
    'Native Americans of the United States',
    'Native American in the United States',
    'Native Americans in the United State',
    'Native Americans in the Unietd States',
    'Native americans in the united states'],
   'weight': 5299065},
  'suggest-stop': {'input': ['Native Americans in the United States',
    'Native Americans of the United States',
    'Native American in the United States',
    'Native Americans in the United State',
    'Native Americans in the Unietd States',
    'Native americans in the united states'],
   'weight': 5299065},
  'score_explanation': {'value': 5299065,
   'description': 'Convert to an integer score: 0.52990659307751 * 10000000',
   'details': [{'value': 0.5299065930775079,
     'description': 'Weighted sum of doc quality scor
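
For this redirect page the suggest document is keyed `'21217t'` (the id of the redirect target), not `'2735616t'`, so a lookup built from the page's own id would raise a `KeyError`. A minimal sketch that looks for the key ending in `'t'` instead; the helper names are hypothetical, and reading the `t` suffix as the title document (as opposed to the `r` suffix seen elsewhere in this notebook) is an assumption inferred from the outputs:

```python
def title_suggest_doc(page):
    """Return the suggest doc whose key ends in 't', or None if there is none."""
    docs = page.get('cirruscompsuggestbuilddoc')
    if not isinstance(docs, dict):
        return None
    for key, doc in docs.items():
        if key.endswith('t'):  # assumption: 't' marks the title document
            return doc
    return None

def page_score(page):
    """Return score_explanation['value'] for a page, or None if unavailable."""
    doc = title_suggest_doc(page)
    if doc is None:
        return None
    return doc.get('score_explanation', {}).get('value')

# Sample shaped like the redirect page above (most fields omitted)
redirect_page = {'pageid': 2735616,
                 'cirruscompsuggestbuilddoc': {
                     '21217t': {'source_doc_id': '21217',
                                'score_explanation': {'value': 5299065}}}}
print(page_score(redirect_page))  # 5299065
```

Note that the score returned this way belongs to the redirect target, which may or may not be what the project wants for redirect titles.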

In [16]:
d2 = j1['query']['pages']
print(d2.keys())
print(d2['1988'].keys())
d2['1988']

dict_keys(['1988', '5635', '25589194', '2735616', '119954'])
dict_keys(['pageid', 'ns', 'title', 'cirrusbuilddoc', 'cirruscompsuggestbuilddoc', 'extract', 'original', 'pageprops', 'terms', 'pageviews'])


{'pageid': 1988,
 'ns': 0,
 'title': 'Abel Tasman',
 'cirrusbuilddoc': {'version': 1005461400,
  'wiki': 'enwiki',
  'namespace': 0,
  'namespace_text': '',
  'title': 'Abel Tasman',
  'timestamp': '2021-02-07T19:52:48Z',
  'create_timestamp': '2001-09-24T12:29:13Z',
  'redirect': [{'namespace': 0, 'title': 'Abel Janszoon Tasman'},
   {'namespace': 0, 'title': 'Abel Janzoon Tasman'},
   {'namespace': 0, 'title': 'Abel Jansz Tasman'},
   {'namespace': 0, 'title': 'Able tasman'},
   {'namespace': 0, 'title': 'Abel tasman'},
   {'namespace': 0, 'title': 'Abel Janszon Tasman'},
   {'namespace': 0, 'title': 'List of things named after Abel Tasman'}],
  'incoming_links': 711,
  'category': ['All articles with dead external links',
   'Articles with dead external links from February 2021',
   'Articles with permanently dead external links',
   'CS1 Spanish-language sources (es)',
   'Articles with short description',
   'Short description matches Wikidata',
   'Use dmy dates from September 20

In [None]:
page_dict['cirruscompsuggestbuilddoc'][f'{page_id}t']['score_explanation']

In [13]:
j['query']['pages']['119954']['pageprops'].keys()

dict_keys(['page_image_free', 'wikibase-badge-Q17437798', 'wikibase-shortdesc', 'wikibase_item'])

In [3]:
# cirrusbuilddoc for text_bytes (aka page_size), incoming_links, coordinates (if any)
# cirruscompsuggestbuilddoc for score
# extracts (+ exintro) for first_paragraph(s) (returned as HTML, needs cleaning up)
# pageimages (with piprop='original') for the full-size image URL (no longer returns the image file)
# pageprops for page_image_free, wikibase_short_description, wikibase_item
# pageterms for description (similar but not identical to wikibase_short_description)
# pageviews for page views
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'cirrusbuilddoc|cirruscompsuggestbuilddoc|extracts|pageimages|pageprops|pageterms|pageviews',
          'piprop': 'original',
          'exintro': True # gives the first paragraphs of a page (before detail sections)
         }

r00 = requests.get(url = URL, headers = HEADERS, params = params)
r00.status_code

200

In [4]:
j00 = r00.json()
j00

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'cirrusbuilddoc': {'version': 1005165190,
     'wiki': 'enwiki',
     'namespace': 0,
     'namespace_text': '',
     'title': 'Francis I of France',
     'timestamp': '2021-02-06T08:25:34Z',
     'create_timestamp': '2002-04-21T20:28:56Z',
     'redirect': [{'namespace': 0, 'title': 'François I of France'},
      {'namespace': 0, 'title': 'Francis I of france'},
      {'namespace': 0, 'title': 'Francis I, King of France'},
      {'namespace': 0, 'title': 'Francois I of France'},
      {'namespace': 0, 'title': 'François I'},
      {'namespace': 0, 'title': 'Francois I'},
      {'namespace': 0, 'title': 'King François I'},
      {'namespace': 0, 'title': 'François Ier'},
      {'namespace': 0, 'title': 'King of France François I'},
      {'namespace': 0, 'title': 'François 1er'},
  

In [5]:
print(j00.keys())
print(j00['query'].keys())
print(j00['query']['pages'].keys())
print(j00['query']['pages']['50012'].keys())
print(j00['query']['pages']['50012']['title'])
print(j00['query']['pages']['50012']['cirrusbuilddoc'].keys())
print(j00['query']['pages']['50012']['cirruscompsuggestbuilddoc'].keys())
print(j00['query']['pages']['50012']['extract'])
print(j00['query']['pages']['50012']['original'].keys())
print(j00['query']['pages']['50012']['pageprops'].keys())
print(j00['query']['pages']['50012']['terms'].keys())
print(j00['query']['pages']['50012']['pageviews'].keys())

dict_keys(['normalized', 'pages'])
dict_keys(['50012'])
dict_keys(['pageid', 'ns', 'title', 'cirrusbuilddoc', 'cirruscompsuggestbuilddoc', 'extract', 'original', 'pageprops', 'terms', 'pageviews'])
Francis I of France
dict_keys(['version', 'wiki', 'namespace', 'namespace_text', 'title', 'timestamp', 'create_timestamp', 'redirect', 'incoming_links', 'category', 'external_link', 'outgoing_link', 'template', 'text', 'source_text', 'text_bytes', 'content_model', 'coordinates', 'wikibase_item', 'language', 'heading', 'opening_text', 'auxiliary_text', 'defaultsort', 'display_title'])
dict_keys(['50012t', '50012r'])
<p class="mw-empty-elt">
</p>

<p><b>Francis I</b> (French: <i lang="fr">François I<sup>er</sup></i>; Middle French: <i lang="frm">Francoys</i>; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son.
</p

In [6]:
# API json response is a nested dictionary
# page data follows the path response.json()['query']['pages'][page_id]
page_id = list(j00['query']['pages'].keys())[0]
page_id

'50012'

In [7]:
page_dict = j00['query']['pages'][page_id]
page_dict

{'pageid': 50012,
 'ns': 0,
 'title': 'Francis I of France',
 'cirrusbuilddoc': {'version': 1005165190,
  'wiki': 'enwiki',
  'namespace': 0,
  'namespace_text': '',
  'title': 'Francis I of France',
  'timestamp': '2021-02-06T08:25:34Z',
  'create_timestamp': '2002-04-21T20:28:56Z',
  'redirect': [{'namespace': 0, 'title': 'François I of France'},
   {'namespace': 0, 'title': 'Francis I of france'},
   {'namespace': 0, 'title': 'Francis I, King of France'},
   {'namespace': 0, 'title': 'Francois I of France'},
   {'namespace': 0, 'title': 'François I'},
   {'namespace': 0, 'title': 'Francois I'},
   {'namespace': 0, 'title': 'King François I'},
   {'namespace': 0, 'title': 'François Ier'},
   {'namespace': 0, 'title': 'King of France François I'},
   {'namespace': 0, 'title': 'François 1er'},
   {'namespace': 0, 'title': 'Francois 1er'},
   {'namespace': 0, 'title': 'King Francois I'},
   {'namespace': 0, 'title': 'Francis 1 of France'},
   {'namespace': 0, 'title': 'Francis i of fran

In [8]:
# page data I need
data_dict = {'page_size': page_dict['cirrusbuilddoc']['text_bytes'],
             'incoming_links': page_dict['cirrusbuilddoc']['incoming_links'],
             'coordinates': page_dict['cirrusbuilddoc']['coordinates'],
             'page_score': page_dict['cirruscompsuggestbuilddoc'][f'{page_id}t']['score_explanation']['value'],
             'first_paragraphs': BeautifulSoup(page_dict['extract'], 'html.parser').text.strip(), # name the parser explicitly
             'image_url': page_dict['original']['source'],
             'image_file': page_dict['pageprops']['page_image_free'],
             'wikibase_shortdesc': page_dict['pageprops']['wikibase-shortdesc'],
             'wikibase_item': page_dict['pageprops']['wikibase_item'],
             'wiki_desc': page_dict['terms']['description'],
             'page_views': page_dict['pageviews']}
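
Not every page returns every key: the redirect page examined earlier has no `'original'`, `'pageprops'` or `'terms'` entries, so the direct indexing above would raise a `KeyError` for it. A minimal sketch of a tolerant lookup; `dig` and `extract_optional_fields` are hypothetical helpers, not part of the original notebook:

```python
def dig(mapping, *keys, default=None):
    """Follow a chain of keys through nested dicts; return default if any is absent."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

def extract_optional_fields(page):
    """Collect the fields that some pages lack, using None (or []) when absent."""
    return {'coordinates': dig(page, 'cirrusbuilddoc', 'coordinates', default=[]),
            'image_url': dig(page, 'original', 'source'),
            'image_file': dig(page, 'pageprops', 'page_image_free'),
            'wikibase_shortdesc': dig(page, 'pageprops', 'wikibase-shortdesc'),
            'wikibase_item': dig(page, 'pageprops', 'wikibase_item'),
            'wiki_desc': dig(page, 'terms', 'description')}

# A page with only some of the optional keys (sample values are placeholders)
sample = {'pageid': 1, 'pageprops': {'wikibase_item': 'Q0'}}
print(extract_optional_fields(sample)['wikibase_item'])  # Q0
print(extract_optional_fields(sample)['image_url'])      # None
```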

In [9]:
for k, v in data_dict.items():
    print(k, ': ', v, '\n')

page_size :  52547 

incoming_links :  2187 

coordinates :  [] 

page_score :  4335947 

first_paragraphs :  Francis I (French: François Ier; Middle French: Francoys; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son.
A prodigious patron of the arts, he promoted the emergent French Renaissance by attracting many Italian artists to work for him, including Leonardo da Vinci, who brought the Mona Lisa with him, which Francis had acquired. Francis' reign saw important cultural changes with the growth of central power in France, the spread of humanism and Protestantism, and the beginning of French exploration of the New World. Jacques Cartier and others claimed lands in the Americas for France and paved the way for the expansion of the first French colonial empire.
For his role in the development and promotio

In [10]:
# test request for multiple page_titles
params = {'titles': PAGE_TITLES,
          'action': 'query',
          'format': 'json',
          'prop': 'cirrusbuilddoc|cirruscompsuggestbuilddoc|extracts|pageimages|pageprops|pageterms|pageviews',
          'piprop': 'original',
          'exintro': True # gives the first paragraphs of a page (before detail sections)
         }

r00_1 = requests.get(url = URL, headers = HEADERS, params = params)
r00_1.status_code

200

In [11]:
r00_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'},
   {'from': 'Albert_Einstein', 'to': 'Albert Einstein'},
   {'from': 'Nika_riots', 'to': 'Nika riots'}],
  'pages': {'736': {'pageid': 736,
    'ns': 0,
    'title': 'Albert Einstein',
    'cirrusbuilddoc': {'version': 1005821753,
     'wiki': 'enwiki',
     'namespace': 0,
     'namespace_text': '',
     'title': 'Albert Einstein',
     'timestamp': '2021-02-09T15:57:44Z',
     'create_timestamp': '2001-11-05T18:26:16Z',
     'redirect': [{'namespace': 0, 'title': 'Einstein'},
      {'namespace': 0, 'title': 'Albert Eienstein'},
      {'namespace': 0, 'title': 'Albert Einstien'},
      {'namespace': 0, 'title': 'Albert einstein'},
      {'namespace': 0, 'title': 'Einstien'},
      {'namespace': 0, 'title': 'Einsteinian'},
      {'namespace': 0, 'title': 'Einsetein'},
      {'namespace': 0, 'title': 'Albert Enstein'},
      {'namespace': 0, 'title': "Albert Einstein's"},
  

In [12]:
print(r00_1.json().keys())
print(r00_1.json()['query'].keys())
print(r00_1.json()['query']['pages'].keys())

dict_keys(['normalized', 'pages'])
dict_keys(['736', '50012', '251783'])
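
The page ids above come back in a different order from the requested titles, and the requested titles (with underscores) differ from the returned ones. A minimal sketch of mapping each requested title to its page id through the `'normalized'` list; the helper names are hypothetical:

```python
def normalized_titles(query):
    """Map each requested title to the normalized title reported by the API."""
    return {entry['from']: entry['to'] for entry in query.get('normalized', [])}

def page_id_for(requested_title, query):
    """Return the page id (as a string key) for a requested title, or None."""
    title = normalized_titles(query).get(requested_title, requested_title)
    for page_id, page in query.get('pages', {}).items():
        if page.get('title') == title:
            return page_id
    return None

# Sample shaped like the 'query' part of the response above (most fields omitted)
query = {'normalized': [{'from': 'Francis_I_of_France', 'to': 'Francis I of France'},
                        {'from': 'Albert_Einstein', 'to': 'Albert Einstein'}],
         'pages': {'736': {'pageid': 736, 'title': 'Albert Einstein'},
                   '50012': {'pageid': 50012, 'title': 'Francis I of France'}}}
print(page_id_for('Francis_I_of_France', query))  # 50012
```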


In [13]:
j00_1 = r00_1.json()
page_dicts = []
for page_id in j00_1['query']['pages'].keys():
    page_dict_1 = j00_1['query']['pages'][page_id]
    data_dict_1 = {'page_size': page_dict_1['cirrusbuilddoc']['text_bytes'],
                 'incoming_links': page_dict_1['cirrusbuilddoc']['incoming_links'],
                 'coordinates': page_dict_1['cirrusbuilddoc']['coordinates'],
                 'page_score': page_dict_1['cirruscompsuggestbuilddoc'][f'{page_id}t']['score_explanation']['value'],
                 'first_paragraphs': BeautifulSoup(page_dict_1['extract'], 'html.parser').text.strip(), # name the parser explicitly
                 'image_url': page_dict_1['original']['source'],
                 'image_file': page_dict_1['pageprops']['page_image_free'],
                 'wikibase_shortdesc': page_dict_1['pageprops']['wikibase-shortdesc'],
                 'wikibase_item': page_dict_1['pageprops']['wikibase_item'],
                 'wiki_desc': page_dict_1['terms']['description'],
                 'page_views': page_dict_1['pageviews']}
    page_dicts.append(data_dict_1)
print(len(page_dicts))

3


In [14]:
page_dicts

[{'page_size': 177860,
  'incoming_links': 10610,
  'coordinates': [],
  'page_score': 5269369,
  'first_paragraphs': 'Albert Einstein ( EYEN-styne; German: [ˈalbɛʁt ˈʔaɪnʃtaɪn] (listen); 14 March 1879\xa0– 18 April 1955) was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). His work is also known for its influence on the philosophy of science. He is best known to the general public for his mass–energy equivalence formula E = mc2, which has been dubbed "the world\'s most famous equation". He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory.\nThe son of a salesman who later operated an electrochemical factory, Einstein was born in the German Empire, but moved to Switzerland in 1895, forsaking his German citizenship the following year

## Section 2 
---
Trying out different 'prop' values in the order they are listed on the Wikipedia API documentation page, and noting for each one whether the results are useful for this project.

---
### 1. prop = categories
**Notes:** 
- results returned using this value are not needed for this project

In [15]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'categories'}

r01 = requests.get(url = URL, headers = HEADERS, params = params)
r01.status_code

200

In [16]:
r01.json()

{'continue': {'clcontinue': '50012|All_articles_with_failed_verification',
  'continue': '||'},
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'categories': [{'ns': 14, 'title': 'Category:1494 births'},
     {'ns': 14, 'title': 'Category:1510s in France'},
     {'ns': 14, 'title': 'Category:1520s in France'},
     {'ns': 14, 'title': 'Category:1530s in France'},
     {'ns': 14, 'title': 'Category:1540s in France'},
     {'ns': 14, 'title': 'Category:1547 deaths'},
     {'ns': 14, 'title': 'Category:15th-century peers of France'},
     {'ns': 14, 'title': 'Category:16th-century dukes of Brittany'},
     {'ns': 14, 'title': 'Category:16th-century kings of France'},
     {'ns': 14, 'title': 'Category:16th-century peers of France'}]}}}}
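
The `'continue'` key in the response above means the list of categories is incomplete: only the first ten were returned. A minimal sketch of the usual continuation loop, which merges the returned `'continue'` values into the next request until the key disappears. `query_all` is a hypothetical helper, and `fetch` stands for any function that takes a params dict and returns the decoded JSON, for example `lambda p: requests.get(URL, headers=HEADERS, params=p).json()`:

```python
def query_all(fetch, params):
    """Yield each response batch, following 'continue' until the API is done."""
    params = dict(params)  # copy so the caller's dict is not modified
    while True:
        data = fetch(params)
        yield data
        if 'continue' not in data:
            break
        # e.g. {'clcontinue': '50012|All_articles...', 'continue': '||'}
        params.update(data['continue'])

def collect_categories(fetch, params, page_id):
    """Gather category titles for one page across all batches."""
    titles = []
    for data in query_all(fetch, params):
        page = data.get('query', {}).get('pages', {}).get(page_id, {})
        titles.extend(c['title'] for c in page.get('categories', []))
    return titles
```

Since categories are not needed for this project, this matters mainly as a reminder that any list-valued result may arrive in several batches.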

---
### 2. prop = cirrusbuilddoc
**Notes:**
- this returns useful results:
    - incoming links
        - r02.json()['query']['pages']['50012']['cirrusbuilddoc']['incoming_links']
    - source text (needs regex cleanup to get the first paragraph):
        - r02.json()['query']['pages']['50012']['cirrusbuilddoc']['source_text']
        - (a better option is prop=extracts with exintro=True)
    - page size (same data as length when prop=info): 
        - r02.json()['query']['pages']['50012']['cirrusbuilddoc']['text_bytes']
    - coordinates (if any): 
        - r02.json()['query']['pages']['50012']['cirrusbuilddoc']['coordinates']
    

In [17]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'cirrusbuilddoc'}

r02 = requests.get(url = URL, headers = HEADERS, params = params)
r02.status_code

200

In [18]:
r02.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'cirrusbuilddoc': {'version': 1005165190,
     'wiki': 'enwiki',
     'namespace': 0,
     'namespace_text': '',
     'title': 'Francis I of France',
     'timestamp': '2021-02-06T08:25:34Z',
     'create_timestamp': '2002-04-21T20:28:56Z',
     'redirect': [{'namespace': 0, 'title': 'François I of France'},
      {'namespace': 0, 'title': 'Francis I of france'},
      {'namespace': 0, 'title': 'Francis I, King of France'},
      {'namespace': 0, 'title': 'Francois I of France'},
      {'namespace': 0, 'title': 'François I'},
      {'namespace': 0, 'title': 'Francois I'},
      {'namespace': 0, 'title': 'King François I'},
      {'namespace': 0, 'title': 'François Ier'},
      {'namespace': 0, 'title': 'King of France François I'},
      {'namespace': 0, 'title': 'François 1er'},
  

In [19]:
print(r02.json().keys())
print(r02.json()['query'].keys())
print(r02.json()['query']['pages'].keys())
print(r02.json()['query']['pages']['50012'].keys())
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc'].keys())

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['50012'])
dict_keys(['pageid', 'ns', 'title', 'cirrusbuilddoc'])
dict_keys(['version', 'wiki', 'namespace', 'namespace_text', 'title', 'timestamp', 'create_timestamp', 'redirect', 'incoming_links', 'category', 'external_link', 'outgoing_link', 'template', 'text', 'source_text', 'text_bytes', 'content_model', 'coordinates', 'wikibase_item', 'language', 'heading', 'opening_text', 'auxiliary_text', 'defaultsort', 'display_title'])


In [20]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['incoming_links'])

2187


In [21]:
print(len(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['external_link']))
# print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['external_link'])

60


In [22]:
print(len(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['outgoing_link']))
# print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['outgoing_link'])

618


In [23]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['heading'])

['Early life and accession', 'Reign', 'Patron of the arts', 'Man of letters', 'Construction', 'Military action', 'Relations with the New World and Asia', 'Americas', 'Far East Asia', 'Ottoman Empire', 'Bureaucratic reform and language policy', 'Religious policies', 'Death', 'Image and reputation', 'Marriage and issue', 'Francis I in films, stage and literature', 'Ancestors', 'Further reading']


In [24]:
print(len(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['text']))
print(len((r02.json()['query']['pages']['50012']['cirrusbuilddoc']['text']).split(' ')))
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['text'])

36038
5811
Francis I (French: François Ier; Middle French: Francoys; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son. A prodigious patron of the arts, he promoted the emergent French Renaissance by attracting many Italian artists to work for him, including Leonardo da Vinci, who brought the Mona Lisa with him, which Francis had acquired. Francis' reign saw important cultural changes with the growth of central power in France, the spread of humanism and Protestantism, and the beginning of French exploration of the New World. Jacques Cartier and others claimed lands in the Americas for France and paved the way for the expansion of the first French colonial empire. For his role in the development and promotion of a standardized French language, he became known as le Père et Restaurateur des Lettres (the 'F

In [25]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['source_text'])

{{short description|King of the House of Valois-Angoulême (1494–1547, r. 1515–47)}}
{{Use dmy dates|date=July 2020}}
{{Infobox royalty
| name         = Francis I
| image        = François Ier Louvre.jpg
| caption      = Portrait by [[Jean Clouet]], c. 1530
| alt          = Portrait of King Francis I in his {{age|format=ordinal|1494|1530}} year
| succession   = [[King of France]]
| moretext     = ([[Style of the French sovereign|more...]])
| reign        = 1 January 1515 – {{nowrap|31 March 1547}}
| coronation   = 25 January 1515
| cor-type     = france
| predecessor  = [[Louis XII of France|Louis XII]]
| successor    = [[Henry II of France|Henry II]]
| birth_date   = 12 September 1494
| birth_place  = [[Château de Cognac]], [[Cognac, France|Cognac]], France
| death_date   = {{Death date and age|df=yes|1547|3|31|1494|9|12}}
| death_place  = [[Château de Rambouillet]], France
| burial_date  = 23 May 1547
| burial_place = [[Basilica of St Denis]], France
| spouse       = {{marriage|[[Clau

In [26]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['text_bytes'])

52547


In [27]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['coordinates'])

[]


In [28]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['opening_text'])

Francis I (French: François Ier; Middle French: Francoys; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son. A prodigious patron of the arts, he promoted the emergent French Renaissance by attracting many Italian artists to work for him, including Leonardo da Vinci, who brought the Mona Lisa with him, which Francis had acquired. Francis' reign saw important cultural changes with the growth of central power in France, the spread of humanism and Protestantism, and the beginning of French exploration of the New World. Jacques Cartier and others claimed lands in the Americas for France and paved the way for the expansion of the first French colonial empire. For his role in the development and promotion of a standardized French language, he became known as le Père et Restaurateur des Lettres (the 'Father and R

In [29]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['auxiliary_text'])

['Francis I Portrait by Jean Clouet, c. 1530 King of France (more...) Reign 1 January 1515 – 31 March 1547 Coronation 25 January 1515 Predecessor Louis XII Successor Henry II Born 12 September 1494 Château de Cognac, Cognac, France Died 31 March 1547(1547-03-31) (aged\xa052) Château de Rambouillet, France Burial 23 May 1547 Basilica of St Denis, France Spouse Claude, Duchess of Brittany \u200b \u200b (m.\xa01514)\u200b Eleanor of Austria \u200b (m.\xa01530)\u200b Issue among others... Francis III, Duke of Brittany Henry II of France Madeleine, Queen of Scots Charles, Duke of Orléans Margaret, Duchess of Savoy House Valois-Angoulême Father Charles, Count of Angoulême Mother Louise of Savoy Religion Roman Catholicism Signature', 'Ancestors of Francis I of France', '16. Charles V of France 8. Louis I, Duke of Orléans 17. Joanna of Bourbon 4. John, Count of Angoulême 18. Gian Galeazzo Visconti 9. Valentina Visconti 19. Isabelle of Valois 2. Charles, Count of Angoulême 20. Alain VIII, Visco

In [30]:
print(r02.json()['query']['pages']['50012']['cirrusbuilddoc']['display_title'])

None


---
### 3. prop = cirruscompsuggestbuilddoc
**Notes:** 
- this returns useful results:
    - score:
        - a weighted sum of the document quality score and the page popularity, converted to an integer (so it rises with the popularity of a page)
        - found in r03.json()['query']['pages']['50012']['cirruscompsuggestbuilddoc']['50012t']['score_explanation']['value']

In [31]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'cirruscompsuggestbuilddoc'}

r03 = requests.get(url = URL, headers = HEADERS, params = params)
r03.status_code

200

In [32]:
r03.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'cirruscompsuggestbuilddoc': {'50012t': {'batch_id': 1612908135,
      'source_doc_id': '50012',
      'target_title': {'title': 'Francis I of France', 'namespace': 0},
      'suggest': {'input': ['Francis I of France',
        'Francis I of france',
        'Francois I of France',
        'Francis 1 of France',
        'Francis i of france'],
       'weight': 4335947},
      'suggest-stop': {'input': ['Francis I of France',
        'Francis I of france',
        'Francois I of France',
        'Francis 1 of France',
        'Francis i of france'],
       'weight': 4335947},
      'score_explanation': {'value': 4335947,
       'description': 'Convert to an integer score: 0.43359473441567 * 10000000',
       'details': [{'value': 0.43359473441567276,
         'description': 'Weighted

In [33]:
data03 = r03.json()  # parse the response once instead of on every line
page03 = data03['query']['pages']['50012']
docs03 = page03['cirruscompsuggestbuilddoc']
explanation03 = docs03['50012t']['score_explanation']

print(data03.keys())
print(data03['query'].keys())
print(data03['query']['pages'].keys())
print(page03.keys())
print(docs03.keys())
print(docs03['50012t'].keys())
print(docs03['50012r'].keys())
print(docs03['50012r']['suggest'].keys())
print(docs03['50012r']['suggest']['weight'])
print(explanation03.keys())
print(explanation03['value'])
print(explanation03['description'])
print(explanation03['details'])

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['50012'])
dict_keys(['pageid', 'ns', 'title', 'cirruscompsuggestbuilddoc'])
dict_keys(['50012t', '50012r'])
dict_keys(['batch_id', 'source_doc_id', 'target_title', 'suggest', 'suggest-stop', 'score_explanation'])
dict_keys(['batch_id', 'source_doc_id', 'target_title', 'suggest', 'suggest-stop', 'score_explanation'])
dict_keys(['input', 'weight'])
433594
dict_keys(['value', 'description', 'details'])
4335947
Convert to an integer score: 0.43359473441567 * 10000000
[{'value': 0.43359473441567276, 'description': 'Weighted sum of doc quality score and popularity', 'details': {'popularity_weighted': {'value': 0.13050526852769972, 'description': 'popularity*weight/total; popularity = 0.45676843984695, weight = 0.4, total = 1.4', 'details': {'popularity': {'value': 0.4567684398469489, 'description': 'log(1+(min(popularity,popularity_max)*max_docs), pop_logbase); popularity = 5.5469153185451E-6, popularity_max 

In [34]:
r03_score = r03.json()['query']['pages']['50012']['cirruscompsuggestbuilddoc']['50012t']['score_explanation']['details']
for score in r03_score:
    for k,v in score.items():
        if not isinstance(v, dict):
            print(k, ': ', v)
        else:
            print(k)
            for k_, v_ in v.items():
                print(f'\t{k_}: {v_}')

value :  0.43359473441567276
description :  Weighted sum of doc quality score and popularity
details
	popularity_weighted: {'value': 0.13050526852769972, 'description': 'popularity*weight/total; popularity = 0.45676843984695, weight = 0.4, total = 1.4', 'details': {'popularity': {'value': 0.4567684398469489, 'description': 'log(1+(min(popularity,popularity_max)*max_docs), pop_logbase); popularity = 5.5469153185451E-6, popularity_max = 0.0004, max_docs = 6246361, pop_logbase = 2499.5444', 'details': {'pop_logbase': {'value': 2499.5444, 'description': '1+popularity_max*max_docs; popularity_max = 0.0004, max_docs = 6246361'}}}}}
	page_quality: {'value': 0.30308946588797303, 'description': 'quality*weight/total; quality = 0.42432525224316, weight = 1, total = 1.4', 'details': {'quality': {'value': 0.42432525224316225, 'description': 'weighted sum of document metadata', 'details': {'incoming_links_weighted': {'value': 0.003113131031041026, 'description': 'incoming_links_normalized*weight/to

In [35]:
len(r03_score)

1
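
The nested loop in cell 34 prints only the first two levels of the `details` tree, so the deeper components stay buried in long dict strings. A minimal sketch of a recursive walk follows. The helper name `flatten_scores` and the dotted key names are hypothetical (not part of the API), the sample is abbreviated from the output above, and the sketch assumes every `details` below the top-level list is a dict of named nodes, as it is in this response.

```python
def flatten_scores(node, name, out=None):
    """Collect {dotted_name: value} for a node and every node under its 'details'."""
    if out is None:
        out = {}
    if 'value' in node:
        out[name] = node['value']
    # Assumption: below the top level, 'details' maps a component name to another node
    for child_name, child in node.get('details', {}).items():
        flatten_scores(child, f'{name}.{child_name}', out)
    return out


# Abbreviated sample with the same shape as r03_score (values copied from the output above)
sample_score = [{
    'value': 0.43359473441567276,
    'description': 'Weighted sum of doc quality score and popularity',
    'details': {
        'popularity_weighted': {
            'value': 0.13050526852769972,
            'details': {'popularity': {'value': 0.4567684398469489}},
        },
        'page_quality': {'value': 0.30308946588797303},
    },
}]

flat_scores = {}
for top_node in sample_score:  # the top level is a list (length 1 in this response)
    flatten_scores(top_node, 'score', flat_scores)

for name, value in flat_scores.items():
    print(f'{name}: {value}')
```

With the real response, `sample_score` would be replaced by `r03_score`.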

---
### 4. contributors
**Notes:** results returned by this query are not needed for this project

In [36]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'contributors',
          'pclimit': 500}

r04 = requests.get(url = URL, headers = HEADERS, params = params)
r04.status_code

200

In [37]:
r04.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'anoncontributors': 357,
    'contributors': [{'userid': 1623918, 'name': 'Jayron32'},
     {'userid': 13791031, 'name': 'Frietjes'},
     {'userid': 27015025, 'name': 'InternetArchiveBot'},
     {'userid': 7611264, 'name': 'AnomieBOT'},
     {'userid': 54809, 'name': 'Utcursch'},
     {'userid': 11292982, 'name': 'EmausBot'},
     {'userid': 1215485, 'name': 'Cydebot'},
     {'userid': 2790592, 'name': 'KylieTastic'},
     {'userid': 7903804, 'name': 'Citation bot'},
     {'userid': 7167267, 'name': 'Tide rolls'},
     {'userid': 1808194, 'name': 'TAnthony'},
     {'userid': 13286072, 'name': 'ClueBot NG'},
     {'userid': 1304678, 'name': 'Doug Weller'},
     {'userid': 313197, 'name': 'Rjensen'},
     {'userid': 1219, 'name': 'Deb'},
     {'userid': 2092487, 'name': 'JaGa'},
    

---
### 5. extracts
**Notes:** 
- useful results:
    - extract with param 'exintro' returns only the first paragraphs of a page (everything before the detailed sections begin). The result is easier to clean up than the text returned by prop cirrusbuilddoc
    - r05_1.json()['query']['pages']['50012']['extract']
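
Since the project needs the first paragraph (short description) and the first paragraphs (long description), here is a rough sketch of splitting the `extract` HTML into clean paragraphs. It uses only the standard library so it runs on its own; the class `ParagraphText`, the function `extract_paragraphs`, and the shortened sample string are hypothetical illustrations, not the approach used later in this notebook (which relies on BeautifulSoup).

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each <p> element, ignoring inline tags such as <b> or <i>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == 'p' and self._buffer is not None:
            text = ''.join(self._buffer).strip()
            if text:  # skips placeholders like <p class="mw-empty-elt">
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(extract_html):
    """Return the non-empty paragraphs of an 'extract' string as plain text."""
    parser = ParagraphText()
    parser.feed(extract_html)
    parser.close()
    return parser.paragraphs


# Shortened sample in the same shape as the 'extract' value returned with exintro
sample_extract = ('<p class="mw-empty-elt">\n</p>\n\n'
                  '<p><b>Francis I</b> was King of France from 1515 until his death in 1547.\n</p>'
                  '<p>A prodigious patron of the arts, he promoted the emergent French Renaissance.</p>')

paragraphs = extract_paragraphs(sample_extract)
short_description = paragraphs[0]
long_description = '\n'.join(paragraphs)
print(short_description)
```

Keeping the paragraphs as a list makes it easy to decide later how many of them go into the long description.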


In [38]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'extracts'}

r05 = requests.get(url = URL, headers = HEADERS, params = params)
r05.status_code

200

In [39]:
r05.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'extract': '<p class="mw-empty-elt">\n</p>\n\n<p><b>Francis I</b> (French: <i lang="fr">François I<sup>er</sup></i>; Middle French: <i lang="frm">Francoys</i>; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son.\n</p><p>A prodigious patron of the arts, he promoted the emergent French Renaissance by attracting many Italian artists to work for him, including Leonardo da Vinci, who brought the <i>Mona Lisa</i> with him, which Francis had acquired. Francis\' reign saw important cultural changes with the growth of central power in France, the spread of humanism and Protestantism, and the beginning of French 

In [40]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'extracts',
          'exintro': True}

r05_1 = requests.get(url = URL, headers = HEADERS, params = params)
r05_1.status_code

200

In [41]:
r05_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'extract': '<p class="mw-empty-elt">\n</p>\n\n<p><b>Francis I</b> (French: <i lang="fr">François I<sup>er</sup></i>; Middle French: <i lang="frm">Francoys</i>; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son.\n</p><p>A prodigious patron of the arts, he promoted the emergent French Renaissance by attracting many Italian artists to work for him, including Leonardo da Vinci, who brought the <i>Mona Lisa</i> with him, which Francis had acquired. Francis\' reign saw important cultural changes with the growth of central power in France, the spread of humanism and Protestantism, and the beginning of French 

In [42]:
r05_1.json().keys()



In [43]:
s05 = BeautifulSoup(r05_1.json()['query']['pages']['50012']['extract'])
s05.text

"\n\nFrancis I (French: François Ier; Middle French: Francoys; 12 September 1494 – 31 March 1547) was King of France from 1515 until his death in 1547. He was the son of Charles, Count of Angoulême, and Louise of Savoy. He succeeded his first cousin once removed Louis XII, who died without a son.\nA prodigious patron of the arts, he promoted the emergent French Renaissance by attracting many Italian artists to work for him, including Leonardo da Vinci, who brought the Mona Lisa with him, which Francis had acquired. Francis' reign saw important cultural changes with the growth of central power in France, the spread of humanism and Protestantism, and the beginning of French exploration of the New World. Jacques Cartier and others claimed lands in the Americas for France and paved the way for the expansion of the first French colonial empire.\nFor his role in the development and promotion of a standardized French language, he became known as le Père et Restaurateur des Lettres (the 'Fathe

---
### 6. info
**Notes:**
- useful info:
    - page size: 
        - r06.json()['query']['pages']['50012']['length']
        - same data as text_bytes when prop=cirrusbuilddoc => I'll only use cirrusbuilddoc since it has extra data
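
The lookups in this notebook hard-code the page id ('50012'), which will not work once several titles are requested at once. A possible sketch of reading `length` for every page in a response is below; the helper name `page_lengths` is hypothetical and the sample dict is trimmed from the `prop=info` output shown in this section.

```python
def page_lengths(response_json):
    """Map each page title to its 'length' value from a prop=info response."""
    pages = response_json.get('query', {}).get('pages', {})
    # Entries without a 'length' key map to None instead of raising KeyError
    return {page.get('title'): page.get('length') for page in pages.values()}


# Trimmed sample in the same shape as r06.json()
sample_info = {
    'batchcomplete': '',
    'query': {'pages': {'50012': {'pageid': 50012,
                                  'ns': 0,
                                  'title': 'Francis I of France',
                                  'length': 52547}}},
}

print(page_lengths(sample_info))
```

Iterating over `pages.values()` avoids needing to know the page ids in advance.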

In [44]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'info'}

r06 = requests.get(url = URL, headers = HEADERS, params = params)
r06.status_code

200

In [45]:
r06.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'contentmodel': 'wikitext',
    'pagelanguage': 'en',
    'pagelanguagehtmlcode': 'en',
    'pagelanguagedir': 'ltr',
    'touched': '2021-02-09T13:31:49Z',
    'lastrevid': 1005165190,
    'length': 52547}}}}

In [46]:
print(r06.json()['query']['pages']['50012'].keys())
print(r06.json()['query']['pages']['50012']['length'])

dict_keys(['pageid', 'ns', 'title', 'contentmodel', 'pagelanguage', 'pagelanguagehtmlcode', 'pagelanguagedir', 'touched', 'lastrevid', 'length'])
52547


---
### 7. pageimages
**Notes:**
- useful info:
    - image thumbnail
        - in r07.json()['query']['pages']['50012']['thumbnail']
        - getting the full-size image URL requires a regex on the thumbnail URL (or the extra param piprop=original, but that response does not include pageimage)
    - pageimage
        - in r07.json()['query']['pages']['50012']['pageimage']
        - (?) may be useful for looking up image metadata (e.g. license, author, etc.)
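
A small sketch of the regex route mentioned above. The pattern is inferred only from the thumbnail and original URLs that appear in this notebook, so treat it as an assumption: it may not hold for every file type. The function name `thumbnail_to_original` is hypothetical.

```python
import re


def thumbnail_to_original(thumb_url):
    """Drop the '/thumb' segment and the trailing '/<width>px-<name>' segment.

    URLs that do not match the thumbnail pattern are returned unchanged.
    """
    return re.sub(r'/thumb(/.+)/[^/]+$', r'\1', thumb_url, count=1)


# Thumbnail URL copied from the pageimages response in this section
thumb_url = ('https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/'
             'Fran%C3%A7ois_Ier_Louvre.jpg/39px-Fran%C3%A7ois_Ier_Louvre.jpg')

original_url = thumbnail_to_original(thumb_url)
print(original_url)
```

For this page the result is the same URL that `piprop=original` returns, while the default request still provides `pageimage`.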

In [47]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'pageimages'}

r07 = requests.get(url = URL, headers = HEADERS, params = params)
r07.status_code

200

In [48]:
r07.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'thumbnail': {'source': 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Fran%C3%A7ois_Ier_Louvre.jpg/39px-Fran%C3%A7ois_Ier_Louvre.jpg',
     'width': 39,
     'height': 50},
    'pageimage': 'François_Ier_Louvre.jpg'}}}}

In [49]:
print(r07.json()['query']['pages']['50012']['thumbnail'])
print(r07.json()['query']['pages']['50012']['pageimage'])

{'source': 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Fran%C3%A7ois_Ier_Louvre.jpg/39px-Fran%C3%A7ois_Ier_Louvre.jpg', 'width': 39, 'height': 50}
François_Ier_Louvre.jpg


In [50]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'pageimages',
          'piprop': 'original'}

r07_1 = requests.get(url = URL, headers = HEADERS, params = params)
r07_1.status_code

200

In [51]:
r07_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'original': {'source': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
     'width': 2048,
     'height': 2648}}}}}

---
### 8. pageprops
**Notes:**
- useful for:
    - page_image_free
    - wikibase_short_description (key 'wikibase-shortdesc'; similar but not identical to the description from pageterms, see item 9)
    - wikibase_item

In [52]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'pageprops'}

r08 = requests.get(url = URL, headers = HEADERS, params = params)
r08.status_code

200

In [53]:
r08.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'pageprops': {'defaultsort': 'Francis 01 of France',
     'page_image_free': 'François_Ier_Louvre.jpg',
     'wikibase-shortdesc': 'King of the House of Valois-Angoulême (1494–1547, r. 1515–47)',
     'wikibase_item': 'Q129857'}}}}}

---
### 9. pageterms
**Notes:**
- useful for description (similar but not identical to 'wikibase-shortdesc' from pageprops, see item 8); need to decide which one to use
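
Until that decision is made, one option is to keep both and pick one at read time. The sketch below prefers the pageprops value and falls back to the pageterms one; that order is an assumption for illustration, not a choice made in this notebook, and the helper name `pick_description` is hypothetical. The sample dicts are trimmed from the pageprops and pageterms responses in items 8 and 9.

```python
def pick_description(pageprops_page, pageterms_page):
    """Return the pageprops short description, else the first pageterms description."""
    shortdesc = pageprops_page.get('pageprops', {}).get('wikibase-shortdesc')
    if shortdesc:
        return shortdesc
    descriptions = pageterms_page.get('terms', {}).get('description', [])
    return descriptions[0] if descriptions else None


# Trimmed page entries in the same shape as the responses for page 50012
sample_pageprops = {'pageprops': {
    'wikibase-shortdesc': 'King of the House of Valois-Angoulême (1494–1547, r. 1515–47)',
    'wikibase_item': 'Q129857'}}
sample_pageterms = {'terms': {'label': ['Francis I of France'],
                              'description': ['King of France (1494-1547)']}}

print(pick_description(sample_pageprops, sample_pageterms))
print(pick_description({}, sample_pageterms))
```

Swapping the two lookups inside the function reverses the preference.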

In [54]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'pageterms'}

r09 = requests.get(url = URL, headers = HEADERS, params = params)
r09.status_code

200

In [55]:
r09.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'terms': {'alias': ['François I',
      're di Francia François I',
      'King of France Franz I',
      'King of France François I',
      'King of France Francesco I',
      'roi de France François I',
      'roi de France François Ier',
      'King of France Francisco I',
      'King of France Francis I',
      'König Franz I. Frankreich',
      'Duke of Lorraine François Stefan'],
     'label': ['Francis I of France'],
     'description': ['King of France (1494-1547)']}}}}}

---
### 10. pageviews
**Notes:**
- useful for page_views :-)
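
The response holds one count per day, so a single page_views figure needs some aggregation. A minimal sketch follows, assuming a total and a daily mean are what the project wants and that a day may come back as None when no count is available; both are assumptions. The helper name `summarize_pageviews` is hypothetical, and the sample uses the first three days from the output below plus one made-up empty entry.

```python
def summarize_pageviews(pageviews):
    """Return (total, daily mean) over the days that have a count."""
    counts = [views for views in pageviews.values() if views is not None]
    total = sum(counts)
    mean = total / len(counts) if counts else 0.0
    return total, mean


sample_views = {
    '2020-12-11': 1571,
    '2020-12-12': 1505,
    '2020-12-13': 1667,
    'hypothetical-missing-day': None,  # made-up entry to show how gaps are skipped
}

total_views, mean_views = summarize_pageviews(sample_views)
print(total_views, mean_views)
```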

In [56]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'pageviews'}

r10 = requests.get(url = URL, headers = HEADERS, params = params)
r10.status_code

200

In [57]:
r10.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'pageviews': {'2020-12-11': 1571,
     '2020-12-12': 1505,
     '2020-12-13': 1667,
     '2020-12-14': 1743,
     '2020-12-15': 1579,
     '2020-12-16': 1633,
     '2020-12-17': 1539,
     '2020-12-18': 2491,
     '2020-12-19': 1514,
     '2020-12-20': 1660,
     '2020-12-21': 1563,
     '2020-12-22': 1617,
     '2020-12-23': 1540,
     '2020-12-24': 1362,
     '2020-12-25': 1197,
     '2020-12-26': 1364,
     '2020-12-27': 1797,
     '2020-12-28': 1764,
     '2020-12-29': 1770,
     '2020-12-30': 1882,
     '2020-12-31': 1555,
     '2021-01-01': 1921,
     '2021-01-02': 1921,
     '2021-01-03': 2011,
     '2021-01-04': 1713,
     '2021-01-05': 1792,
     '2021-01-06': 1587,
     '2021-01-07': 1373,
     '2021-01-08': 1533,
     '2021-01-09': 1625,
     '2021-01-10': 1794,
     '202

In [58]:
print(len(r10.json()['query']['pages']['50012']['pageviews']))

60


In [59]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'pageviews',
          'pvipcontinue': True} # True is not a real continuation value; the response below contains no pageviews

r10_1 = requests.get(url = URL, headers = HEADERS, params = params)
r10_1.status_code

200

In [60]:
r10_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France'}}}}

---
### 11. transcludedin
**Notes:**
- no data useful for this project

In [61]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'transcludedin'} # parameter prefix: ti

r11 = requests.get(url = URL, headers = HEADERS, params = params)
r11.status_code

200

In [62]:
r11.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'transcludedin': [{'pageid': 50012,
      'ns': 0,
      'title': 'Francis I of France'}]}}}}

---
### 12. description
**Notes:**
- per wikipedia docs, this endpoint is for internal use and may be unstable
- data retrieved from here is already available from other endpoints

In [63]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'description'} 

r12 = requests.get(url = URL, headers = HEADERS, params = params)
r12.status_code

200

In [64]:
r12.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'description': 'King of the House of Valois-Angoulême (1494–1547, r. 1515–47)',
    'descriptionsource': 'local'}}}}

In [65]:
params = {'titles': PAGE_TITLE,
          'action': 'query',
          'format': 'json',
          'prop': 'description',
          'descprefersource': 'central'} # accepts 'local' or 'central'; 'central' had no effect here (descriptionsource is still 'local')

r12_1 = requests.get(url = URL, headers = HEADERS, params = params)
r12_1.status_code

200

In [66]:
r12_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'Francis_I_of_France',
    'to': 'Francis I of France'}],
  'pages': {'50012': {'pageid': 50012,
    'ns': 0,
    'title': 'Francis I of France',
    'description': 'King of the House of Valois-Angoulême (1494–1547, r. 1515–47)',
    'descriptionsource': 'local'}}}}