**FIND INFORMATION ABOUT US PRESIDENTS USING WIKIDATA**

Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.
Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

**SETUP**

Install and load the necessary libraries:

In [338]:
# Install the Wikidata, helpers and SPARQLWrapper packages
!pip install Wikidata
!pip install helpers
!pip install sparqlwrapper



In [339]:
import sys
from SPARQLWrapper import SPARQLWrapper, JSON
import requests
import helpers
import pandas as pd
import matplotlib as mpl


**Define the SPARQL query**

SPARQL, short for “SPARQL Protocol and RDF Query Language”, enables users to query information from databases or any data source that can be mapped to RDF.

The SPARQL standard is designed and endorsed by the W3C and helps users and developers focus on what they would like to know instead of how a database is organized.

First we define the query. Its second line is a #TEMPLATE comment, which the Wikidata Query Service editor uses to present the query as a sentence with a selectable ?country. The names following the SELECT keyword are variables, which are indicated by a leading ?.

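What these variables mean is defined by the triple patterns that follow in the WHERE clause. The BIND line fixes ?country to the Wikidata entity Q30 (United States of America), and the first triple says that ?p stands for each value of P6 (head of government) on that entity, that is, each US president. The following triples bind ?w, ?birth_date, ?place_birth and ?picture_presidents through the properties P26 (spouse), P569 (date of birth), P19 (place of birth) and P18 (image), and ?country_citizenship through P27 (country of citizenship). ?starttime and ?endtime come from the qualifiers P580 (start time) and P582 (end time) on P39 (position held) statements. Note that these statements are attached to the fixed entity Q6279 (Joe Biden) rather than to ?p, so the same start and end dates are paired with every president, which is why the result below repeats the same dates for each name. ?position_held is selected but never bound, so it is always empty.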
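
As a small illustration of the SELECT/WHERE structure described above, the sketch below defines a much shorter query and extracts the variable names from its SELECT clause. The query string `minimal_query` and the regular expression are illustrative assumptions, not part of the notebook's own query.

```python
import re

# A deliberately tiny query: two variables, one triple pattern.
# wdt:P569 is the "date of birth" property used in the main query.
minimal_query = """
SELECT ?p ?birth_date WHERE {
  ?p wdt:P569 ?birth_date .
}
LIMIT 5
"""

# Everything before WHERE is the SELECT clause; variables start with "?".
select_clause = minimal_query.split("WHERE")[0]
variables = re.findall(r"\?\w+", select_clause)
print(variables)
```

Each name found this way must be bound by a triple pattern in the WHERE clause, otherwise its column comes back empty.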

In [343]:

query = """#Presidents and their spouses
#TEMPLATE={"template":"Presidents of ?country and their spouses","variables":{"?country":{"query":" SELECT ?id WHERE { ?id wdt:P31 wd:Q6256 . }"} } }
SELECT ?p  ?ppicture ?w  ?pLabel ?wpicture ?birth_date ?position_held ?place_birth ?picture_presidents ?country_citizenship ?starttime ?endtime WHERE {
  BIND(wd:Q30 AS ?country)

  ?country (p:P6/ps:P6) ?p.
  ?p wdt:P26 ?w.

  ?p wdt:P569 ?birth_date.
  ?p wdt:P19 ?place_birth.
  ?p wdt:P18 ?picture_presidents.
  ?p wdt:P27 ?country_of_citizenship.
  #-----------------------------------------
  wd:Q6279 p:P39 ?statement.
  ?statement pq:P580 ?starttime.
  ?statement pq:P582 ?endtime.
  #-----------------------------------------------
  OPTIONAL {
   ?p wdt:P18 ?ppicture.
   ?w wdt:P18 ?wpicture.
  }
  OPTIONAL {
    ?p wdt:P27 ?country_citizenship.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""

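In the query above, ?starttime and ?endtime are read from P39 statements on the fixed entity wd:Q6279, so they do not vary with ?p. One possible way to tie the dates to each president is to read the P39 (position held) statements of ?p itself and keep only those whose value is Q11696 (President of the United States). The following is a sketch under that assumption, not the notebook's original query; the name `query_per_president` is hypothetical, and the results shown later in this notebook were produced by the original query.

```python
query_per_president = """
SELECT ?p ?pLabel ?birth_date ?starttime ?endtime WHERE {
  # Position-held statements that belong to ?p, not to a fixed entity
  ?p p:P39 ?statement .
  ?statement ps:P39 wd:Q11696 .
  ?statement pq:P580 ?starttime .
  # The end time is optional so that a sitting president is not dropped
  OPTIONAL { ?statement pq:P582 ?endtime . }
  OPTIONAL { ?p wdt:P569 ?birth_date . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?starttime
"""
```

It can be sent to the endpoint in the same way as `query`.
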


**Get and process the data**

We send an HTTP request to the SPARQL endpoint, providing the query as a URL parameter. We also specify that we want the result encoded as JSON rather than the default XML. Thanks to the requests library, the code is practically self-explanatory.

In [344]:
# SPARQL endpoint URL; send the query with requests and decode the JSON response
url = 'https://query.wikidata.org/sparql'
data = requests.get(url, params={'query': query, 'format': 'json'}).json()
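
The one-line request above assumes the call succeeds. A slightly more defensive variant is sketched below: it sets a User-Agent header (the Wikimedia User-Agent policy asks clients to identify themselves), applies a timeout, and raises on HTTP errors before decoding the JSON. The function name `run_sparql`, the `http_get` parameter, and the User-Agent string are hypothetical choices for illustration.

```python
WDQS_URL = "https://query.wikidata.org/sparql"


def run_sparql(query, url=WDQS_URL, http_get=None, timeout=60):
    """Send a SPARQL query and return the decoded JSON result."""
    if http_get is None:
        # Imported lazily so the function can be exercised with a stub
        import requests
        http_get = requests.get
    headers = {
        # Placeholder value: replace with something that identifies you
        "User-Agent": "us-presidents-notebook/0.1 (example)",
        "Accept": "application/sparql-results+json",
    }
    response = http_get(
        url,
        params={"query": query, "format": "json"},
        headers=headers,
        timeout=timeout,
    )
    # Fail loudly on 4xx/5xx instead of trying to parse an error page
    response.raise_for_status()
    return response.json()
```

With this helper, the cell above could be written as `data = run_sparql(query)`.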

Now we iterate through the result, creating a list of dictionaries, each of which contains values for the query variables defined above. Then we create a Pandas DataFrame from this list, print its length and the first few rows.

In [345]:
#print(data)
presidents = []
for item in data['results']['bindings']:
    presidents.append({
        'starttime': item['starttime']['value'],
        'endtime': item['endtime']['value'],
        'date_birth': item['birth_date']['value'],
        'name_presidents': item['pLabel']['value'],
        'picture_presidents': item['picture_presidents']['value'],
        #'place_birth': item['place_birth']['value'],
        #'country_citizenship': item['country_citizenship']['value'],

        })
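
The loop above indexes each binding directly, which raises a KeyError when a variable is unbound in some row (for example one that only appears inside an OPTIONAL block). A minimal sketch of a more tolerant flattening step follows; the helper name `bindings_to_rows` and the `sample` payload are made up for illustration and follow the SPARQL JSON results layout used above.

```python
def bindings_to_rows(data, variables):
    """Flatten SPARQL JSON results into a list of plain dictionaries."""
    rows = []
    for item in data["results"]["bindings"]:
        # An unbound variable is simply absent from the binding, so use .get
        rows.append({var: item.get(var, {}).get("value") for var in variables})
    return rows


# Hand-written sample in the same shape as the endpoint's JSON response
sample = {
    "results": {
        "bindings": [
            {"pLabel": {"type": "literal", "value": "George Washington"},
             "birth_date": {"type": "literal", "value": "1732-02-22T00:00:00Z"}},
            {"pLabel": {"type": "literal", "value": "Joe Biden"}},
        ]
    }
}

rows = bindings_to_rows(sample, ["pLabel", "birth_date"])
print(rows)
```

Missing values come back as None, which pandas later shows as missing data instead of stopping the loop.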



In [346]:
# Build a DataFrame from the list of dictionaries
df = pd.DataFrame(presidents)
# Number of rows in the DataFrame
print(len(df))
# Display the DataFrame (pandas truncates long output)
df

1474


index,starttime,endtime,date_birth,name_presidents,picture_presidents
0,2009-01-03T00:00:00Z,2009-01-15T00:00:00Z,1732-02-22T00:00:00Z,George Washington,http://commons.wikimedia.org/wiki/Special:File...
1,1991-01-03T00:00:00Z,1993-01-03T00:00:00Z,1732-02-22T00:00:00Z,George Washington,http://commons.wikimedia.org/wiki/Special:File...
2,1979-01-03T00:00:00Z,1981-01-03T00:00:00Z,1732-02-22T00:00:00Z,George Washington,http://commons.wikimedia.org/wiki/Special:File...
3,1973-01-03T00:00:00Z,1975-01-03T00:00:00Z,1732-02-22T00:00:00Z,George Washington,http://commons.wikimedia.org/wiki/Special:File...
4,1989-01-03T00:00:00Z,1991-01-03T00:00:00Z,1732-02-22T00:00:00Z,George Washington,http://commons.wikimedia.org/wiki/Special:File...
...,...,...,...,...,...
1469,1997-01-03T00:00:00Z,1999-01-03T00:00:00Z,1942-11-20T00:00:00Z,Joe Biden,http://commons.wikimedia.org/wiki/Special:File...
1470,2007-01-03T00:00:00Z,2009-01-03T00:00:00Z,1942-11-20T00:00:00Z,Joe Biden,http://commons.wikimedia.org/wiki/Special:File...
1471,1975-01-03T00:00:00Z,1977-01-03T00:00:00Z,1942-11-20T00:00:00Z,Joe Biden,http://commons.wikimedia.org/wiki/Special:File...
1472,1995-01-03T00:00:00Z,1997-01-03T00:00:00Z,1942-11-20T00:00:00Z,Joe Biden,http://commons.wikimedia.org/wiki/Special:File...


In [347]:
# Convert the start and end time columns to datetime
df['starttime'] = pd.to_datetime(df['starttime'])
df['endtime'] = pd.to_datetime(df['endtime'])
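
To illustrate what the conversion enables, the sketch below applies it to two rows copied from the output above and derives the length of each period in days. The column name `term_days` and the `utc=True` argument are additions for this example.

```python
import pandas as pd

# Two rows copied from the output shown earlier
sample_df = pd.DataFrame({
    "starttime": ["2009-01-03T00:00:00Z", "1991-01-03T00:00:00Z"],
    "endtime": ["2009-01-15T00:00:00Z", "1993-01-03T00:00:00Z"],
})

# utc=True makes the resulting columns timezone-aware in UTC
for column in ("starttime", "endtime"):
    sample_df[column] = pd.to_datetime(sample_df[column], utc=True)

# Subtracting datetime columns gives timedeltas; .dt.days extracts whole days
sample_df["term_days"] = (sample_df["endtime"] - sample_df["starttime"]).dt.days
print(sample_df["term_days"].tolist())
```

This kind of arithmetic is not possible while the columns still hold plain strings.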

In [348]:
df.info()   

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1474 entries, 0 to 1473
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   starttime           1474 non-null   datetime64[ns, UTC]
 1   endtime             1474 non-null   datetime64[ns, UTC]
 2   date_birth          1474 non-null   object             
 3   name_presidents     1474 non-null   object             
 4   picture_presidents  1474 non-null   object             
dtypes: datetime64[ns, UTC](2), object(3)
memory usage: 57.7+ KB


In [349]:
df.tail(50)

index,starttime,endtime,date_birth,name_presidents,picture_presidents
1424,1995-01-03 00:00:00+00:00,1997-01-03 00:00:00+00:00,1924-06-12T00:00:00Z,George H. W. Bush,http://commons.wikimedia.org/wiki/Special:File...
1425,2001-01-03 00:00:00+00:00,2003-01-03 00:00:00+00:00,1924-06-12T00:00:00Z,George H. W. Bush,http://commons.wikimedia.org/wiki/Special:File...
1426,1970-11-04 00:00:00+00:00,1972-11-08 00:00:00+00:00,1924-06-12T00:00:00Z,George H. W. Bush,http://commons.wikimedia.org/wiki/Special:File...
1427,2009-01-20 00:00:00+00:00,2017-01-20 00:00:00+00:00,1924-06-12T00:00:00Z,George H. W. Bush,http://commons.wikimedia.org/wiki/Special:File...
1428,2003-01-03 00:00:00+00:00,2005-01-03 00:00:00+00:00,1924-06-12T00:00:00Z,George H. W. Bush,http://commons.wikimedia.org/wiki/Special:File...
1429,1999-01-03 00:00:00+00:00,2001-01-03 00:00:00+00:00,1924-06-12T00:00:00Z,George H. W. Bush,http://commons.wikimedia.org/wiki/Special:File...
1430,2009-01-03 00:00:00+00:00,2009-01-15 00:00:00+00:00,1924-10-01T00:00:00Z,Jimmy Carter,http://commons.wikimedia.org/wiki/Special:File...
1431,1991-01-03 00:00:00+00:00,1993-01-03 00:00:00+00:00,1924-10-01T00:00:00Z,Jimmy Carter,http://commons.wikimedia.org/wiki/Special:File...
1432,1979-01-03 00:00:00+00:00,1981-01-03 00:00:00+00:00,1924-10-01T00:00:00Z,Jimmy Carter,http://commons.wikimedia.org/wiki/Special:File...
1433,1973-01-03 00:00:00+00:00,1975-01-03 00:00:00+00:00,1924-10-01T00:00:00Z,Jimmy Carter,http://commons.wikimedia.org/wiki/Special:File...


In [352]:
df = df.sort_values(by='starttime', ascending=False)

In [353]:
# to_csv returns None when given a path, so do not assign the result back to df
df.to_csv('sort_us_president.csv')
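
Two details are easy to miss in the last two cells: `sort_values` returns a new DataFrame, and `to_csv` returns None when it is given a file path, so its result should not be assigned back to `df`. The sketch below shows the sort-and-save step on a small hand-made frame and reads the file back to check it; the temporary file path and the `index=False` choice are assumptions for the example.

```python
import os
import tempfile

import pandas as pd

sample_df = pd.DataFrame({
    "starttime": pd.to_datetime(
        ["1991-01-03T00:00:00Z", "2009-01-03T00:00:00Z"], utc=True),
    "name_presidents": ["George Washington", "George Washington"],
})

# Newest period first, as in the notebook
sample_df = sample_df.sort_values(by="starttime", ascending=False)

# Write to a temporary location; index=False omits the row index column
path = os.path.join(tempfile.mkdtemp(), "sort_us_president.csv")
result = sample_df.to_csv(path, index=False)

# Read the file back to confirm that the rows were written in sorted order
restored = pd.read_csv(path, parse_dates=["starttime"])
print(result, len(restored))
```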

**Summary**

This notebook showed how to retrieve information about US presidents from the Wikidata Query Service using Python.