# Wikipedia API <small>get image data</small>
---
**Goal:** explore the Wikipedia API and figure out a good way of getting image data

**Notes:**
- MediaWiki docs specific to image info at https://www.mediawiki.org/wiki/API:Imageinfo
- this notebook has two main sections:
    - Section 1: summary of what I found out after exploring the API
    - Section 2: the messy exploration of the API


In [1]:
import requests
import time
import config
import numpy as np
from bs4 import BeautifulSoup

In [2]:
URL = 'https://en.wikipedia.org/w/api.php/'
HEADERS = config.HEADERS

In [3]:
# available urls (do they have different data?)
url_wikipedia = 'https://en.wikipedia.org/w/api.php/'
url_mediawiki = 'https://www.mediawiki.org/w/api.php/'
url_metawiki = 'https://meta.wikimedia.org/w/api.php'
url_commons = 'https://commons.wikimedia.org/w/api.php'

In [14]:
# using pictures in and not in public domain (different data available?)
francis = 'File:François_Ier_Louvre.jpg ' # public domain
albert = 'File:Albert_Einstein_Head.jpg' # public domain
albert_n = 'File:Albert Einstein Head.jpg' # normalized by Wikipedia
richard = 'File:Richard_Feynman_Nobel.jpg' # not public domain
nika = 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg' # cc by 2.0
covid = 'File:Novel_Coronavirus_SARS-CoV-2.jpg' # cc by 2.0
wrong = 'File:Wrong File Request Sample' # using this to test what happens to non-existing files

## Section 1
---
**Notes:** 
- useful results when:
    - 'prop' = 'imageinfo'
    - 'iiprop' = 'user|url|extmetadata'
- using url_wikipedia is enough to get the image data needed for this project
- images are processed in batches (default_size = 5) to reduce number of API calls
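
The batching mentioned in the last note can be sketched as follows. This is only a minimal sketch, not the project's actual code: `chunk_titles` and `build_params` are hypothetical helper names, the file titles are made up, and the batch size of 5 comes from the note above. It relies on the API accepting several titles in one request when they are joined with `|`, as in the example request in this section.

```python
def chunk_titles(titles, batch_size=5):
    """Yield successive lists of at most batch_size titles."""
    for start in range(0, len(titles), batch_size):
        yield titles[start:start + batch_size]


def build_params(batch):
    """Build the query parameters for one batch of file titles."""
    return {
        'titles': '|'.join(batch),
        'action': 'query',
        'format': 'json',
        'prop': 'imageinfo',
        'iiprop': 'user|url|extmetadata',
    }


# made-up file titles, used only to show the batching
all_titles = [f'File:Example_{i}.jpg' for i in range(12)]
param_batches = [build_params(batch) for batch in chunk_titles(all_titles)]
print(len(param_batches))  # 12 titles in batches of 5 -> 3 requests
```

Each element of `param_batches` could then be passed as `params` to `requests.get`, one call per batch.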

**Example request to and data retrieved from wikipedia API**

In [18]:
titles = f'{francis}|{albert_n}|{richard}|{nika}|{covid}|{wrong}'
params = {'titles': titles,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|extmetadata',
         }

r = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r.status_code

200

In [19]:
j = r.json()
j

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:François_Ier_Louvre.jpg ',
    'to': 'File:François Ier Louvre.jpg'},
   {'from': 'File:Richard_Feynman_Nobel.jpg',
    'to': 'File:Richard Feynman Nobel.jpg'},
   {'from': 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'},
   {'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:François Ier Louvre.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Oakenchips',
      'comment': 'User created page with UploadWizard',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?cur

In [20]:
print(j.keys())
print(j['query'].keys())
print(j['query']['pages'].keys())

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1', '-2', '-3', '-4', '63511704', '34664654'])


In [23]:
for val in j['query']['pages']:
    print(val)

-1
-2
-3
-4
63511704
34664654
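
The ids above mix negative placeholders with real page ids. Judging only from the responses in this notebook, a negative id comes with a 'missing' key (no page on the queried wiki), while 'imagerepository' and 'imageinfo' show whether the file itself was found. A rough sketch of sorting pages along those lines; `classify_page` is a hypothetical helper, and the sample pages are trimmed copies of entries shown elsewhere in this notebook.

```python
def classify_page(page):
    """Describe where a file was found, based on the keys of its page entry."""
    repo = page.get('imagerepository', '')
    if 'imageinfo' not in page or repo == '':
        return 'not found'
    if repo == 'shared':
        return 'hosted on shared repository'
    return 'hosted on queried wiki'


# trimmed copies of page entries from the responses in this notebook
sample_pages = [
    {'ns': 6, 'title': 'File:François Ier Louvre.jpg', 'missing': '', 'known': '',
     'imagerepository': 'shared', 'imageinfo': [{'user': 'Oakenchips'}]},
    {'pageid': 34664654, 'ns': 6, 'title': 'File:Richard Feynman Nobel.jpg',
     'imagerepository': 'local', 'imageinfo': [{'user': 'Materialscientist'}]},
    {'ns': 6, 'title': 'File:Richard Feynman Nobel.jpg', 'missing': '',
     'imagerepository': ''},
]

for page in sample_pages:
    print(page['title'], '->', classify_page(page))
```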


In [13]:
print_image_info(r)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1', '-2', '-3', '-4', '63511704', '34664654'])


dict_keys(['ns', 'title', 'missing', 'known', 'imagerepository', 'imageinfo'])
title:  File:François Ier Louvre.jpg
missing:  
known:  
image_repository:  shared
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'extmetadata'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'Credit', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2013-12-21 16:34:30', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'François Ier Louvre', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Artworks with Wikidata item|Artworks with Wikidata item missing author|Artworks with accession number fro

In [112]:
batch

[{'title': 'File:François Ier Louvre.jpg',
  'image_repository': 'shared',
  'user': 'Oakenchips',
  'image_url': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
  'image_date': '2013-12-21 16:34:30',
  'image_credit': '<span class="int-own-work" lang="en">Own work</span>',
  'image_description': nan,
  'image_license_name': 'Public domain',
  'image_usage_terms': 'Public domain',
  'image_attrib_required': 'false',
  'image_copyright': 'False',
  'image_restriction': '',
  'image_license': 'pd'},
 {'title': 'File:Albert Einstein Head.jpg',
  'image_repository': 'shared',
  'user': 'Triggerhippie4',
  'image_url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
  'image_date': '2014-11-25 19:59:28',
  'image_credit': 'This image  is available from the United States <a href="//commons.wikimedia.org/wiki/Library_of_Congress" title="Library of Congress">Library of Congress</a>\'s <a rel="nofollow" class="external text" hr

## Section 2

### Request multiple pictures

The values for 'prop' and 'iiprop' used here were found in the exploration sections further down (those sections were moved below this one to make the notebook easier to read).

In [5]:
titles = f'{francis}|{albert}|{richard}|{nika}|{covid}'
params = {'titles': titles,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata|mime|mediatype|extmetadata',
         }

r10_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r10_2 = requests.get(url = url_commons, headers = HEADERS, params = params)

print(r10_1.status_code)
print(r10_2.status_code)

200
200


In [12]:
def print_image_info(requests_response):
    j = requests_response.json()
    print(j.keys())
    print(j['query'].keys())
    print(j['query']['pages'].keys())
    print('\n===============\n')
    for id_ in j['query']['pages'].keys():
        print(j['query']['pages'][id_].keys())
        print('title: ', j['query']['pages'][id_]['title'])
        # 'missing' and 'known' are only present for some pages
        try:
            print('missing: ', j['query']['pages'][id_]['missing'])
        except KeyError:
            print('missing: ********')
        try:
            print('known: ', j['query']['pages'][id_]['known'])
        except KeyError:
            print('known: ********')

        print('image_repository: ', j['query']['pages'][id_]['imagerepository'])
        # pages for files that were not found have no 'imageinfo' key
        try:
            print(len(j['query']['pages'][id_]['imageinfo']))
            print(j['query']['pages'][id_]['imageinfo'][0].keys())
            print(j['query']['pages'][id_]['imageinfo'][0]['extmetadata'].keys())
            print()
            for k, v in j['query']['pages'][id_]['imageinfo'][0]['extmetadata'].items():
                print('\t', k, ': ', v)
        except KeyError:
            print('imageinfo: ********')
        print('\n===============\n')

In [108]:
def extract_single_image_data(image_data, url ='wikipedia'):
    image_dict_list = []
    for image_id in image_data.keys():
        image = image_data[image_id]
        
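        # pages without 'imageinfo' (e.g. files that were not found) have no image data to extract;
        # the url argument is kept so existing calls still work
        if 'imageinfo' not in image: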
            image_dict = {'title': image['title'],
                         'image_repository': image['imagerepository'],
                         'image_keys' : image.keys()}
            image_dict_list.append(image_dict)
            continue
            
        image_dict = {'title' : image['title'],
                      'image_repository' : image['imagerepository'],
                      'user' : image['imageinfo'][0]['user'],
                      'image_url' : image['imageinfo'][0]['url'],
                      'image_date' : image['imageinfo'][0]['extmetadata']['DateTime']['value'],
                      'image_credit' : image['imageinfo'][0]['extmetadata']['Credit']['value'],
                      'image_description' : image['imageinfo'][0]['extmetadata'].get('ImageDescription', np.nan),
                      'image_license_name' : image['imageinfo'][0]['extmetadata'].get('LicenseShortName', np.nan),
                      'image_usage_terms' : image['imageinfo'][0]['extmetadata'].get('UsageTerms', np.nan),
                      'image_attrib_required' : image['imageinfo'][0]['extmetadata'].get('AttributionRequired', np.nan),
                      'image_copyright' : image['imageinfo'][0]['extmetadata'].get('Copyrighted', np.nan),
                      'image_restriction' : image['imageinfo'][0]['extmetadata'].get('Restrictions', np.nan),
                      'image_license' : image['imageinfo'][0]['extmetadata'].get('License', np.nan)}
        
        fields_with_value = ['image_description','image_license_name', 'image_usage_terms', 
                             'image_attrib_required', 'image_copyright', 'image_restriction', 'image_license']
        
        for field in fields_with_value:
            if isinstance(image_dict[field], dict):
                image_dict[field] = image_dict[field]['value']

        
        if isinstance(image_dict['image_description'], str):
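            # name the parser explicitly so results do not depend on which parsers are installed
            image_dict['image_description'] = BeautifulSoup(image_dict['image_description'], 'html.parser').text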

        image_dict_list.append(image_dict)
        
    return image_dict_list
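
The function above only strips HTML from 'image_description', while 'image_credit' also comes back as HTML (see the `batch` output). A minimal sketch of a tag stripper that could be applied to any such field; it uses the standard library's `html.parser` instead of BeautifulSoup purely to keep the example self-contained, and `strip_html` is a hypothetical helper, not part of this notebook's code.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(value):
    """Return the text content of an HTML string; pass non-strings (e.g. nan) through."""
    if not isinstance(value, str):
        return value
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    return ''.join(parser.parts)


# credit value copied from the batch output in this notebook
credit = '<span class="int-own-work" lang="en">Own work</span>'
print(strip_html(credit))
```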
            


In [109]:
a = np.nan
if np.isnan(a):
    print(1)
else:
    print(0)

1


In [110]:
def extract_image_batch_data(response, url = 'wikipedia'):
    resp = response.json()
    
    # 'normalized' is absent when no title needed normalizing, so fall back to an empty list
    file_names = resp['query'].get('normalized', [])
    file_names_dict = {file['from']:file['to'] for file in file_names}
    
    image_batch_list = []
    
    for k, v in resp['query']['pages'].items():
        image = {k: v}
        image_dict = extract_single_image_data(image, url = url)
        image_batch_list.extend(image_dict)
        
    return file_names_dict, image_batch_list
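
The `file_names_dict` returned above maps each requested title to its normalized form. One way it might be used, sketched under the assumption that results should be looked up by the title originally requested; `index_by_requested_title` is a hypothetical helper and the sample data is a trimmed copy of values from this notebook.

```python
def index_by_requested_title(requested, file_names_dict, image_batch_list):
    """Map each requested title to its extracted image dict (None if not returned)."""
    by_title = {image['title']: image for image in image_batch_list}
    result = {}
    for title in requested:
        # titles that were already normalized do not appear in file_names_dict
        normalized = file_names_dict.get(title, title)
        result[title] = by_title.get(normalized)
    return result


requested = ['File:Albert_Einstein_Head.jpg', 'File:Wrong File Request Sample']
names = {'File:Albert_Einstein_Head.jpg': 'File:Albert Einstein Head.jpg'}
images = [{'title': 'File:Albert Einstein Head.jpg', 'user': 'Triggerhippie4'}]

indexed = index_by_requested_title(requested, names, images)
print(indexed)
```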
    

In [111]:
fn, batch = extract_image_batch_data(r10_1)

dict_keys(['-1'])
dict_keys(['-2'])
dict_keys(['-3'])
dict_keys(['63511704'])
dict_keys(['34664654'])


In [112]:
batch

[{'title': 'File:François Ier Louvre.jpg',
  'image_repository': 'shared',
  'user': 'Oakenchips',
  'image_url': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
  'image_date': '2013-12-21 16:34:30',
  'image_credit': '<span class="int-own-work" lang="en">Own work</span>',
  'image_description': nan,
  'image_license_name': 'Public domain',
  'image_usage_terms': 'Public domain',
  'image_attrib_required': 'false',
  'image_copyright': 'False',
  'image_restriction': '',
  'image_license': 'pd'},
 {'title': 'File:Albert Einstein Head.jpg',
  'image_repository': 'shared',
  'user': 'Triggerhippie4',
  'image_url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
  'image_date': '2014-11-25 19:59:28',
  'image_credit': 'This image  is available from the United States <a href="//commons.wikimedia.org/wiki/Library_of_Congress" title="Library of Congress">Library of Congress</a>\'s <a rel="nofollow" class="external text" hr

In [69]:
fn_2, batch_2 = extract_image_batch_data(r10_2, 'commons')

dict_keys(['-1'])
dict_keys(['925243'])
dict_keys(['30275305'])
dict_keys(['92612457'])
dict_keys(['66987701'])


In [70]:
batch_2

[{'title': 'File:Richard Feynman Nobel.jpg',
  'image_repository': '',
  'image_keys': dict_keys(['ns', 'title', 'missing', 'imagerepository'])},
 {'title': 'File:Albert Einstein Head.jpg',
  'image_repository': 'local',
  'user': 'Triggerhippie4',
  'image_url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
  'image_date': '2014-11-25 19:59:28',
  'image_credit': 'This image  is available from the United States <a href="//commons.wikimedia.org/wiki/Library_of_Congress" title="Library of Congress">Library of Congress</a>\'s <a rel="nofollow" class="external text" href="//www.loc.gov/rr/print/">Prints and Photographs division</a><br> under the digital ID <a rel="nofollow" class="external text" href="http://hdl.loc.gov/loc.pnp/cph.3b46036">cph.3b46036</a>.<br><small>This tag does not indicate the copyright status of the attached work. A normal copyright tag is still required. See <a href="//commons.wikimedia.org/wiki/Commons:Licensing" title="Commons:Lic

In [7]:
j10_1 = r10_1.json()
j10_1

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:François_Ier_Louvre.jpg ',
    'to': 'File:François Ier Louvre.jpg'},
   {'from': 'File:Albert_Einstein_Head.jpg',
    'to': 'File:Albert Einstein Head.jpg'},
   {'from': 'File:Richard_Feynman_Nobel.jpg',
    'to': 'File:Richard Feynman Nobel.jpg'},
   {'from': 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'},
   {'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:François Ier Louvre.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Oakenchips',
      'comment': 'User created page with UploadWizard',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Fran%C3%A7ois_Ier

In [8]:
print(j10_1.keys())
print(j10_1['query'].keys())
print(j10_1['query']['pages'].keys())

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1', '-2', '-3', '63511704', '34664654'])


In [9]:
print_image_info(r10_1)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1', '-2', '-3', '63511704', '34664654'])


dict_keys(['ns', 'title', 'missing', 'known', 'imagerepository', 'imageinfo'])
title:  File:François Ier Louvre.jpg
missing:  
known:  
image_repository:  shared
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'Credit', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2013-12-21 16:34:30', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'François Ier Louvre', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Artworks with Wikidata item|Artworks with Wikidata item missing author|Artwork

In [10]:
j10_2 = r10_2.json()
j10_2

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:François_Ier_Louvre.jpg ',
    'to': 'File:François Ier Louvre.jpg'},
   {'from': 'File:Albert_Einstein_Head.jpg',
    'to': 'File:Albert Einstein Head.jpg'},
   {'from': 'File:Richard_Feynman_Nobel.jpg',
    'to': 'File:Richard Feynman Nobel.jpg'},
   {'from': 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'},
   {'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Richard Feynman Nobel.jpg',
    'missing': '',
    'imagerepository': ''},
   '925243': {'pageid': 925243,
    'ns': 6,
    'title': 'File:Albert Einstein Head.jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Triggerhippie4',
      'comment': 'higher resolution, quality',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein

In [11]:
print(j10_2.keys())
print(j10_2['query'].keys())
print(j10_2['query']['pages'].keys())

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1', '925243', '30275305', '92612457', '66987701'])


In [14]:
print_image_info(r10_2)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1', '925243', '30275305', '92612457', '66987701'])


dict_keys(['ns', 'title', 'missing', 'imagerepository'])
title:  File:Richard Feynman Nobel.jpg
missing:  
known: ********
image_repository:  
imageinfo: ********


dict_keys(['pageid', 'ns', 'title', 'imagerepository', 'imageinfo'])
title:  File:Albert Einstein Head.jpg
missing: ********
known: ********
image_repository:  local
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'DateTimeOriginal', 'Credit', 'Artist', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2014-11-25 19:59:28', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Albert Einstein Head', 'so

### Compare results for same picture but using different API urls
**Notes:**
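- 'imagerepository' ('shared' or 'local') says where the file is hosted, not whether it is in the public domain: 'shared' means the file comes from a shared repository (Wikimedia Commons), 'local' means it is hosted on the wiki being queried. This is why the same picture is 'shared' via url_wikipedia but 'local' via url_commons, and why the CC BY 2.0 pictures below also come back as 'shared'
- if the image is hosted on Commons and has no page of its own on the queried wiki (both public domain pictures and the Nika riots picture below):
    - no difference between:
        - url_wikipedia
        - url_mediawiki
        - url_metawiki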
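
The cells below repeat the same comparison for each picture. A compact way to express it, as a sketch only: `compare_to_baseline` is a hypothetical helper, and the sample dictionaries are heavily trimmed stand-ins for the real responses.

```python
def compare_to_baseline(responses, baseline):
    """Report, for each endpoint, whether its JSON equals the baseline endpoint's JSON."""
    base = responses[baseline]
    return {name: data == base
            for name, data in responses.items() if name != baseline}


# heavily trimmed stand-ins for r.json() from each endpoint
responses = {
    'wikipedia': {'pages': {'-1': {'title': 'File:Albert Einstein Head.jpg',
                                   'imagerepository': 'shared'}}},
    'mediawiki': {'pages': {'-1': {'imagerepository': 'shared',
                                   'title': 'File:Albert Einstein Head.jpg'}}},
    'commons': {'pages': {'925243': {'title': 'File:Albert Einstein Head.jpg',
                                     'imagerepository': 'local'}}},
}

print(compare_to_baseline(responses, 'wikipedia'))
```

Comparing the dictionaries directly, rather than their `str()` forms, makes the result independent of key order.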

### 1. Pictures in public domain

#### Albert Einstein

In [9]:
params = {'titles': albert,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata',
         }

r00_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r00_2 = requests.get(url = url_mediawiki, headers = HEADERS, params = params)
r00_3 = requests.get(url = url_metawiki, headers = HEADERS, params = params)
r00_4 = requests.get(url = url_commons, headers = HEADERS, params = params)
print(str(r00_1.json()) == str(r00_2.json()))
print(str(r00_1.json()) == str(r00_3.json()))
print(str(r00_1.json()) == str(r00_4.json()))

True
True
False


In [10]:
r00_1.json()

{'continue': {'iistart': '2008-06-06T22:27:45Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Albert_Einstein_Head.jpg',
    'to': 'File:Albert Einstein Head.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Albert Einstein Head.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Triggerhippie4',
      'comment': 'higher resolution, quality',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Albert_Einstein_Head.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=925243',
      'metadata': [{'name': 'MEDIAWIKI_EXIF_VERSION', 'value': 1}]}]}}}}

In [11]:
r00_4.json()

{'continue': {'iistart': '2008-06-06T22:27:45Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Albert_Einstein_Head.jpg',
    'to': 'File:Albert Einstein Head.jpg'}],
  'pages': {'925243': {'pageid': 925243,
    'ns': 6,
    'title': 'File:Albert Einstein Head.jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Triggerhippie4',
      'comment': 'higher resolution, quality',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Albert_Einstein_Head.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=925243',
      'metadata': [{'name': 'MEDIAWIKI_EXIF_VERSION', 'value': 1}]}]}}}}
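
Both responses above carry a 'continue' key where other responses in this notebook have 'batchcomplete'. This appears to mean that more data is available for the query (presumably older file revisions, since 'iistart' is a timestamp), and that the values under 'continue' are meant to be merged into the parameters of a follow-up request. A small sketch under that assumption; `continuation_params` is a hypothetical helper.

```python
def continuation_params(params, resp):
    """Return parameters for the follow-up request, or None when nothing is left."""
    if 'continue' not in resp:
        return None
    follow_up = dict(params)  # copy, so the original params stay unchanged
    follow_up.update(resp['continue'])
    return follow_up


params = {'titles': 'File:Albert_Einstein_Head.jpg',
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata'}

# trimmed copies of the two kinds of response seen in this notebook
partial = {'continue': {'iistart': '2008-06-06T22:27:45Z', 'continue': '||'}, 'query': {}}
complete = {'batchcomplete': '', 'query': {}}

print(continuation_params(params, partial))
print(continuation_params(params, complete))
```

For this project only the latest revision of each image seems to be needed, so ignoring 'continue' is probably fine.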

#### Francis I of France

In [12]:
params = {'titles': francis,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata',
         }

r01_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r01_2 = requests.get(url = url_mediawiki, headers = HEADERS, params = params)
r01_3 = requests.get(url = url_metawiki, headers = HEADERS, params = params)
r01_4 = requests.get(url = url_commons, headers = HEADERS, params = params)
print(str(r01_1.json()) == str(r01_2.json()))
print(str(r01_1.json()) == str(r01_3.json()))
print(str(r01_1.json()) == str(r01_4.json()))

True
True
False


In [13]:
r01_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:François_Ier_Louvre.jpg ',
    'to': 'File:François Ier Louvre.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:François Ier Louvre.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Oakenchips',
      'comment': 'User created page with UploadWizard',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=30275305',
      'metadata': [{'name': 'Make', 'value': 'Canon'},
       {'name': 'Model', 'value': 'Canon EOS 5D Mark III'},
       {'name': 'Orientation', 'value': 1},
       {'name': 'XResolution', 'value': '9437184/131072'},
       {'name': 'YResolution', 'value': '9437184/131072'},
       {'name': 'ResolutionUnit', 'value': 2},
       {'name': 'Soft

In [14]:
r01_4.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:François_Ier_Louvre.jpg ',
    'to': 'File:François Ier Louvre.jpg'}],
  'pages': {'30275305': {'pageid': 30275305,
    'ns': 6,
    'title': 'File:François Ier Louvre.jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Oakenchips',
      'comment': 'User created page with UploadWizard',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/8/87/Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Fran%C3%A7ois_Ier_Louvre.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=30275305',
      'metadata': [{'name': 'Make', 'value': 'Canon'},
       {'name': 'Model', 'value': 'Canon EOS 5D Mark III'},
       {'name': 'Orientation', 'value': 1},
       {'name': 'XResolution', 'value': '9437184/131072'},
       {'name': 'YResolution', 'value': '9437184/131072'},
       {'name': 'ResolutionUnit', 'value': 2},
       {'name': 'Software', 

### 2. Pictures not in public domain (non-free or CC BY 2.0)

#### Richard Feynman

In [15]:
params = {'titles': richard,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata',
         }

r02_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r02_2 = requests.get(url = url_mediawiki, headers = HEADERS, params = params)
r02_3 = requests.get(url = url_metawiki, headers = HEADERS, params = params)
r02_4 = requests.get(url = url_commons, headers = HEADERS, params = params)
print(str(r02_1.json()) == str(r02_2.json()))
print(str(r02_1.json()) == str(r02_3.json()))
print(str(r02_1.json()) == str(r02_4.json()))
# url_mediawiki, url_metawiki, and url_commons all seem to return no data (identical responses)
print(str(r02_2.json()) == str(r02_3.json()) == str(r02_4.json()))

False
False
False
True


In [16]:
r02_1.json()

{'continue': {'iistart': '2012-02-09T11:32:05Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Richard_Feynman_Nobel.jpg',
    'to': 'File:Richard Feynman Nobel.jpg'}],
  'pages': {'34664654': {'pageid': 34664654,
    'ns': 6,
    'title': 'File:Richard Feynman Nobel.jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Materialscientist',
      'comment': 'resolution',
      'url': 'https://upload.wikimedia.org/wikipedia/en/4/42/Richard_Feynman_Nobel.jpg',
      'descriptionurl': 'https://en.wikipedia.org/wiki/File:Richard_Feynman_Nobel.jpg',
      'descriptionshorturl': 'https://en.wikipedia.org/w/index.php?curid=34664654',
      'metadata': [{'name': 'MEDIAWIKI_EXIF_VERSION', 'value': 1}]}]}}}}

In [17]:
r02_4.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:Richard_Feynman_Nobel.jpg',
    'to': 'File:Richard Feynman Nobel.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Richard Feynman Nobel.jpg',
    'missing': '',
    'imagerepository': ''}}}}

#### Nika riots

In [18]:
params = {'titles': nika,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata',
         }

r03_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r03_2 = requests.get(url = url_mediawiki, headers = HEADERS, params = params)
r03_3 = requests.get(url = url_metawiki, headers = HEADERS, params = params)
r03_4 = requests.get(url = url_commons, headers = HEADERS, params = params)
print(str(r03_1.json()) == str(r03_2.json()))
print(str(r03_1.json()) == str(r03_3.json()))
print(str(r03_1.json()) == str(r03_4.json()))
# check whether url_mediawiki, url_metawiki, and url_commons return identical responses
print(str(r03_2.json()) == str(r03_3.json()) == str(r03_4.json()))

True
True
False
False


In [19]:
r03_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Artix Kreiger 2',
      'comment': 'Transferred from Flickr via [[Commons:Flickr2Commons|Flickr2Commons]]',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/0/00/Turkey-03228_-_Hippodrome_of_Constantinople_%2811312626353%29.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=66987701',
      'metadata': [{'name': 'ImageWidth', 'value': 4000},
       {'name': 'ImageLength', 'value': 6000},
       {'name': 'B

In [20]:
r03_4.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'}],
  'pages': {'66987701': {'pageid': 66987701,
    'ns': 6,
    'title': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Artix Kreiger 2',
      'comment': 'Transferred from Flickr via [[Commons:Flickr2Commons|Flickr2Commons]]',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/0/00/Turkey-03228_-_Hippodrome_of_Constantinople_%2811312626353%29.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=66987701',
      'metadata': [{'name': 'ImageWidth', 'value': 4000},
       {'name': 'ImageLength', 'value': 6000},
       {'name': 'BitsPerS

#### Covid

In [23]:
params = {'titles': covid,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata',
         }

r04_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r04_2 = requests.get(url = url_mediawiki, headers = HEADERS, params = params)
r04_3 = requests.get(url = url_metawiki, headers = HEADERS, params = params)
r04_4 = requests.get(url = url_commons, headers = HEADERS, params = params)
print(str(r04_1.json()) == str(r04_2.json()))
print(str(r04_1.json()) == str(r04_3.json()))
print(str(r04_1.json()) == str(r04_4.json()))
# check whether url_mediawiki, url_metawiki, and url_commons return identical responses
print(str(r04_2.json()) == str(r04_3.json()) == str(r04_4.json()))

False
False
False
False


In [24]:
r04_1.json()

{'continue': {'iistart': '2020-03-18T18:55:55Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'63511704': {'pageid': 63511704,
    'ns': 6,
    'title': 'File:Novel Coronavirus SARS-CoV-2.jpg',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Ytoyoda',
      'comment': 'Sorry, misunderstood the split request',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/7/76/Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=92612457',
      'metadata': [{'name': 'ImageWidth', 'value': 3116},
       {'name': 'ImageLength', 'value': 3366},
       {'name': 'BitsPerSample', 'value': 8},
       {'name': 'Compression', 'value': 1},
       {'name': 'PhotometricInterpretation', 'value': 3},
       {'name': 'O

In [25]:
r04_2.json()

{'continue': {'iistart': '2020-03-18T18:55:55Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Novel Coronavirus SARS-CoV-2.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Ytoyoda',
      'comment': 'Sorry, misunderstood the split request',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/7/76/Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=92612457',
      'metadata': [{'name': 'ImageWidth', 'value': 3116},
       {'name': 'ImageLength', 'value': 3366},
       {'name': 'BitsPerSample', 'value': 8},
       {'name': 'Compression', 'value': 1},
       {'name': 'PhotometricInterpretation', 'value': 3},
       {'nam

In [26]:
r04_3.json()

{'continue': {'iistart': '2020-03-18T18:55:55Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Novel Coronavirus SARS-CoV-2.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Ytoyoda',
      'comment': 'Sorry, misunderstood the split request',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/7/76/Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=92612457',
      'metadata': [{'name': 'ImageWidth', 'value': 3116},
       {'name': 'ImageLength', 'value': 3366},
       {'name': 'BitsPerSample', 'value': 8},
       {'name': 'Compression', 'value': 1},
       {'name': 'PhotometricInterpretation', 'value': 3},
       {'nam

In [27]:
r04_4.json()

{'continue': {'iistart': '2020-03-18T18:55:55Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Novel_Coronavirus_SARS-CoV-2.jpg',
    'to': 'File:Novel Coronavirus SARS-CoV-2.jpg'}],
  'pages': {'92612457': {'pageid': 92612457,
    'ns': 6,
    'title': 'File:Novel Coronavirus SARS-CoV-2.jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Ytoyoda',
      'comment': 'Sorry, misunderstood the split request',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/7/76/Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Novel_Coronavirus_SARS-CoV-2.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=92612457',
      'metadata': [{'name': 'ImageWidth', 'value': 3116},
       {'name': 'ImageLength', 'value': 3366},
       {'name': 'BitsPerSample', 'value': 8},
       {'name': 'Compression', 'value': 1},
       {'name': 'PhotometricInterpretation', 'value': 3},
       {'name': 'Or

### Compare results for the same picture across different API URLs and 'prop' values

#### Albert Einstein

In [5]:
params = {'titles': albert,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata|mime|mediatype|extmetadata',
         }

r05_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r05_2 = requests.get(url = url_commons, headers = HEADERS, params = params)

In [6]:
r05_1.json()

{'continue': {'iistart': '2008-06-06T22:27:45Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Albert_Einstein_Head.jpg',
    'to': 'File:Albert Einstein Head.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Albert Einstein Head.jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Triggerhippie4',
      'comment': 'higher resolution, quality',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Albert_Einstein_Head.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=925243',
      'metadata': [{'name': 'MEDIAWIKI_EXIF_VERSION', 'value': 1}],
      'extmetadata': {'DateTime': {'value': '2014-11-25 19:59:28',
        'source': 'mediawiki-metadata',
        'hidden': ''},
       'ObjectName': {'value': 'Albert Einstein Head',
        'source': 'mediawiki-metadata',
        'hidden': '

In [30]:
r05_2.json()

{'continue': {'iistart': '2008-06-06T22:27:45Z', 'continue': '||'},
 'query': {'normalized': [{'from': 'File:Albert_Einstein_Head.jpg',
    'to': 'File:Albert Einstein Head.jpg'}],
  'pages': {'925243': {'pageid': 925243,
    'ns': 6,
    'title': 'File:Albert Einstein Head.jpg',
    'imagerepository': 'local',
    'imageinfo': [{'user': 'Triggerhippie4',
      'comment': 'higher resolution, quality',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Albert_Einstein_Head.jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=925243',
      'metadata': [{'name': 'MEDIAWIKI_EXIF_VERSION', 'value': 1}],
      'extmetadata': {'DateTime': {'value': '2014-11-25 19:59:28',
        'source': 'mediawiki-metadata',
        'hidden': ''},
       'ObjectName': {'value': 'Albert Einstein Head',
        'source': 'mediawiki-metadata',
        'hidden': ''},
       

In [59]:
def print_image_info(requests_response):
    """Print the nested keys and the extmetadata of every page in an imageinfo response."""
    j = requests_response.json()
    print(j.keys())
    print(j['query'].keys())
    pages = j['query']['pages']
    print(pages.keys())
    for page in pages.values():
        print(page.keys())
        print(page['imagerepository'])
        # raises KeyError when the queried wiki has no 'imageinfo' for the file
        info = page['imageinfo']
        print(len(info))
        print(info[0].keys())
        extmetadata = info[0]['extmetadata']
        print(extmetadata.keys())
        print()
        for k, v in extmetadata.items():
            print('\t', k, ': ', v)
        print()

In [60]:
print_image_info(r05_1)

dict_keys(['continue', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1'])
dict_keys(['ns', 'title', 'missing', 'known', 'imagerepository', 'imageinfo'])
shared
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'DateTimeOriginal', 'Credit', 'Artist', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2014-11-25 19:59:28', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Albert Einstein Head', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': '1947 portrait photographs of men|68-year-old human males|Albert Einstein by Oren Jack Turner (1947)|Black and white photographs of men looking at 

In [61]:
print_image_info(r05_2)

dict_keys(['continue', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['925243'])
dict_keys(['pageid', 'ns', 'title', 'imagerepository', 'imageinfo'])
local
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'DateTimeOriginal', 'Credit', 'Artist', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2014-11-25 19:59:28', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Albert Einstein Head', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': '1947 portrait photographs of men|68-year-old human males|Albert Einstein by Oren Jack Turner (1947)|Black and white photographs of men looking at viewer|
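
Comparing the two printouts by eye is error-prone. Below is a minimal sketch of doing the comparison in code, assuming both responses have already been parsed with `.json()` and each describes a single page. `first_imageinfo` and `diff_extmetadata` are hypothetical helper names, and the two dictionaries are cut-down stand-ins for `r05_1.json()` and `r05_2.json()`, not real output.

```python
def first_imageinfo(data):
    """Return the first imageinfo entry of the first page, or {} if absent."""
    pages = data.get('query', {}).get('pages', {})
    for page in pages.values():
        infos = page.get('imageinfo', [])
        return infos[0] if infos else {}
    return {}


def diff_extmetadata(data_a, data_b):
    """Return the sorted extmetadata keys whose presence or 'value' differs."""
    ext_a = first_imageinfo(data_a).get('extmetadata', {})
    ext_b = first_imageinfo(data_b).get('extmetadata', {})
    differing = []
    for key in sorted(set(ext_a) | set(ext_b)):
        if key not in ext_a or key not in ext_b:
            differing.append(key)
        elif ext_a[key].get('value') != ext_b[key].get('value'):
            differing.append(key)
    return differing


# Cut-down stand-ins for r05_1.json() and r05_2.json() (illustrative only)
via_wikipedia = {'query': {'pages': {'-1': {
    'imagerepository': 'shared',
    'imageinfo': [{'extmetadata': {
        'ObjectName': {'value': 'Albert Einstein Head'}}}]}}}}
via_commons = {'query': {'pages': {'925243': {
    'imagerepository': 'local',
    'imageinfo': [{'extmetadata': {
        'ObjectName': {'value': 'Albert Einstein Head'}}}]}}}}

# An empty list means the two endpoints agree on every extmetadata value
print(diff_extmetadata(via_wikipedia, via_commons))
```

The page id and `imagerepository` differ between the two endpoints, so the sketch deliberately compares only `extmetadata`.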

#### Francis I of France

In [58]:
params = {'titles': francis,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata|mime|mediatype|extmetadata',
         }

r06_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r06_2 = requests.get(url = url_commons, headers = HEADERS, params = params)

In [62]:
print_image_info(r06_1)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1'])
dict_keys(['ns', 'title', 'missing', 'known', 'imagerepository', 'imageinfo'])
shared
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'Credit', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2013-12-21 16:34:30', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'François Ier Louvre', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Artworks with Wikidata item|Artworks with Wikidata item missing author|Artworks with accession number from Wikidata|Artworks with known accession number|Author died more than 100 years ago pub

In [63]:
print_image_info(r06_2)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['30275305'])
dict_keys(['pageid', 'ns', 'title', 'imagerepository', 'imageinfo'])
local
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'Credit', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2013-12-21 16:34:30', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'François Ier Louvre', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Artworks with Wikidata item|Artworks with Wikidata item missing author|Artworks with accession number from Wikidata|Artworks with known accession number|Author died more than 100 years ago public d
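
The Einstein responses carry a top-level `'continue'` key, while the Francis I responses carry `'batchcomplete'`. As far as I understand the MediaWiki API, `'continue'` means more data can be fetched by repeating the request with those extra parameters merged in (here `iistart`, which suggests older revisions of the file), and `'batchcomplete'` means nothing is left for the current batch of pages. Below is a minimal sketch under that assumption. `next_params` is a hypothetical helper, and the dictionaries are trimmed stand-ins for the outputs above.

```python
def next_params(params, data):
    """Return the params for a follow-up request, or None if nothing is left."""
    cont = data.get('continue')
    if not cont:
        return None
    merged = dict(params)   # copy so the original request params stay untouched
    merged.update(cont)     # add the continuation values returned by the API
    return merged


params = {'titles': 'File:Albert_Einstein_Head.jpg',
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|url|extmetadata'}

# Trimmed stand-ins for the Einstein and Francis I responses (illustrative only)
einstein_data = {'continue': {'iistart': '2008-06-06T22:27:45Z', 'continue': '||'},
                 'query': {}}
francis_data = {'batchcomplete': '', 'query': {}}

print(next_params(params, einstein_data))
print(next_params(params, francis_data))
```

If only the most recent upload of each file is of interest, the follow-up request is likely unnecessary and the `'continue'` key can be ignored.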

#### Richard Feynman

In [64]:
params = {'titles': richard,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata|mime|mediatype|extmetadata',
         }

r07_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r07_2 = requests.get(url = url_commons, headers = HEADERS, params = params)

In [65]:
print_image_info(r07_1)

dict_keys(['continue', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['34664654'])
dict_keys(['pageid', 'ns', 'title', 'imagerepository', 'imageinfo'])
local
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'Credit', 'DateTimeOriginal', 'Artist', 'Restrictions'])

	 DateTime :  {'value': '2016-07-24 11:28:27', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Richard Feynman Nobel', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'All free in US media|All media requiring a US status confirmation|Files deleted on Wikimedia Commons|Files with no machine-readable license|Nobel laureates in Physics|PD-Sweden images with unknown US copyright status|Pre-1996 PD in h

In [66]:
print_image_info(r07_2)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1'])
dict_keys(['ns', 'title', 'missing', 'imagerepository'])



KeyError: 'imageinfo'

In [67]:
r07_2.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:Richard_Feynman_Nobel.jpg',
    'to': 'File:Richard Feynman Nobel.jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Richard Feynman Nobel.jpg',
    'missing': '',
    'imagerepository': ''}}}}
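
The `KeyError` above comes from a page entry that has no `'imageinfo'` key, which is what the Commons endpoint returned for this file. Below is a minimal sketch of a lookup that tolerates this, assuming the response has been parsed with `.json()`. `extract_imageinfo` is a hypothetical helper, and the sample dictionary copies the `r07_2.json()` output shown above.

```python
def extract_imageinfo(data):
    """Map each page title to its first imageinfo dict, or None if missing."""
    result = {}
    for page in data.get('query', {}).get('pages', {}).values():
        infos = page.get('imageinfo')
        result[page.get('title')] = infos[0] if infos else None
    return result


# Copy of the r07_2.json() output above
missing_on_commons = {
    'batchcomplete': '',
    'query': {'normalized': [{'from': 'File:Richard_Feynman_Nobel.jpg',
                              'to': 'File:Richard Feynman Nobel.jpg'}],
              'pages': {'-1': {'ns': 6,
                               'title': 'File:Richard Feynman Nobel.jpg',
                               'missing': '',
                               'imagerepository': ''}}}}

print(extract_imageinfo(missing_on_commons))
# {'File:Richard Feynman Nobel.jpg': None}
```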

#### Nika

In [68]:
params = {'titles': nika,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata|mime|mediatype|extmetadata',
         }

r08_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r08_2 = requests.get(url = url_commons, headers = HEADERS, params = params)

In [71]:
r08_1.json()

{'batchcomplete': '',
 'query': {'normalized': [{'from': 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'}],
  'pages': {'-1': {'ns': 6,
    'title': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg',
    'missing': '',
    'known': '',
    'imagerepository': 'shared',
    'imageinfo': [{'user': 'Artix Kreiger 2',
      'comment': 'Transferred from Flickr via [[Commons:Flickr2Commons|Flickr2Commons]]',
      'url': 'https://upload.wikimedia.org/wikipedia/commons/0/00/Turkey-03228_-_Hippodrome_of_Constantinople_%2811312626353%29.jpg',
      'descriptionurl': 'https://commons.wikimedia.org/wiki/File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg',
      'descriptionshorturl': 'https://commons.wikimedia.org/w/index.php?curid=66987701',
      'metadata': [{'name': 'ImageWidth', 'value': 4000},
       {'name': 'ImageLength', 'value': 6000},
       {'name': 'B
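
The `'normalized'` list records how the API rewrote the requested title (underscores become spaces). When several titles go into one request, that list is one way to match each requested title back to its page entry. Below is a minimal sketch, assuming the response has been parsed with `.json()`. `pages_by_requested_title` is a hypothetical helper, and the sample is a trimmed version of the `r08_1.json()` output above.

```python
def pages_by_requested_title(requested_titles, data):
    """Map each requested title to its page entry, or None if not found."""
    query = data.get('query', {})
    # 'from' is the title as requested, 'to' is the title used in 'pages'
    normalized = {n['from']: n['to'] for n in query.get('normalized', [])}
    by_title = {p['title']: p for p in query.get('pages', {}).values()}
    return {t: by_title.get(normalized.get(t, t)) for t in requested_titles}


requested = 'File:Turkey-03228_-_Hippodrome_of_Constantinople_(11312626353).jpg'

# Trimmed version of the r08_1.json() output above
sample = {'query': {
    'normalized': [{'from': requested,
                    'to': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg'}],
    'pages': {'-1': {'ns': 6,
                     'title': 'File:Turkey-03228 - Hippodrome of Constantinople (11312626353).jpg',
                     'imagerepository': 'shared'}}}}

matched = pages_by_requested_title([requested], sample)
print(matched[requested]['imagerepository'])  # shared
```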

In [69]:
print_image_info(r08_1)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['-1'])
dict_keys(['ns', 'title', 'missing', 'known', 'imagerepository', 'imageinfo'])
shared
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'DateTimeOriginal', 'Credit', 'Artist', 'Permission', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'LicenseUrl', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2018-03-02 21:13:59', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Turkey-03228 - Hippodrome of Constantinople (11312626353)', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Files uploaded by Artix Kreiger|Flickr images reviewed by FlickreviewR 2|Obe

In [70]:
print_image_info(r08_2)

dict_keys(['batchcomplete', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['66987701'])
dict_keys(['pageid', 'ns', 'title', 'imagerepository', 'imageinfo'])
local
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'DateTimeOriginal', 'Credit', 'Artist', 'Permission', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'LicenseUrl', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2018-03-02 21:13:59', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Turkey-03228 - Hippodrome of Constantinople (11312626353)', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Files uploaded by Artix Kreiger|Flickr images reviewed by FlickreviewR 2|Obelisk 
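
The `extmetadata` keys are not the same for every file: the Francis I output has no `Artist`, the Feynman output has no `LicenseShortName` or `License`, and this one adds `Permission` and `LicenseUrl`. Below is a minimal sketch of pulling a fixed set of fields with a fallback, assuming the `extmetadata` dictionary has already been extracted from the response. `FIELDS` and `license_summary` are hypothetical names, and the sample values are placeholders rather than real API output.

```python
# Fields of interest; the selection is an assumption for illustration
FIELDS = ('Artist', 'LicenseShortName', 'LicenseUrl', 'UsageTerms')


def license_summary(extmetadata, default=None):
    """Return {field: value} for FIELDS, using `default` for absent keys."""
    return {field: extmetadata.get(field, {}).get('value', default)
            for field in FIELDS}


# Placeholder extmetadata with only some of the fields present
sample_extmetadata = {
    'Artist': {'value': 'placeholder artist'},
    'Restrictions': {'value': ''},
}

print(license_summary(sample_extmetadata))
# {'Artist': 'placeholder artist', 'LicenseShortName': None, 'LicenseUrl': None, 'UsageTerms': None}
```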

#### Covid

In [72]:
params = {'titles': covid,
          'action': 'query',
          'format': 'json',
          'prop': 'imageinfo',
          'iiprop': 'user|comment|url|metadata|mime|mediatype|extmetadata',
         }

r09_1 = requests.get(url = url_wikipedia, headers = HEADERS, params = params)
r09_2 = requests.get(url = url_commons, headers = HEADERS, params = params)

In [73]:
print_image_info(r09_1)

dict_keys(['continue', 'query'])
dict_keys(['normalized', 'pages'])
dict_keys(['63511704'])
dict_keys(['pageid', 'ns', 'title', 'imagerepository', 'imageinfo'])
shared
1
dict_keys(['user', 'comment', 'url', 'descriptionurl', 'descriptionshorturl', 'metadata', 'extmetadata', 'mime', 'mediatype'])
dict_keys(['DateTime', 'ObjectName', 'CommonsMetadataExtension', 'Categories', 'Assessments', 'ImageDescription', 'DateTimeOriginal', 'Credit', 'Artist', 'LicenseShortName', 'UsageTerms', 'AttributionRequired', 'LicenseUrl', 'Copyrighted', 'Restrictions', 'License'])

	 DateTime :  {'value': '2020-04-10 22:18:29', 'source': 'mediawiki-metadata', 'hidden': ''}
	 ObjectName :  {'value': 'Novel Coronavirus SARS-CoV-2', 'source': 'mediawiki-metadata', 'hidden': ''}
	 CommonsMetadataExtension :  {'value': 1.2, 'source': 'extension', 'hidden': ''}
	 Categories :  {'value': 'Featured pictures on Wikipedia, Arabic|Featured pictures on Wikipedia, English|Files from NIAID Flickr stream|Files with Assessm
