# Inspect the properties on Wikidata

In this notebook we inspect the properties used on Wikidata and identify which ones are worth studying in a statistical analysis.
The key criterion is which properties are present on the largest number of instances. It also matters that the instances share most of these properties.
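A minimal sketch of these two criteria, assuming the query results have been reshaped into a dict that maps each instance to the set of its property IDs. This structure and the names `property_frequencies` and `share_with_all` are hypothetical illustrations, not code from this notebook. It relies on `Counter`, which the notebook already imports.

```python
from collections import Counter


def property_frequencies(instance_props):
    """Count, for each property, how many instances carry it.
    `instance_props` maps an instance URI or ID to the set of its property IDs."""
    return Counter(p for props in instance_props.values() for p in set(props))


def share_with_all(instance_props, selected):
    """Fraction of instances that carry every property in `selected`."""
    if not instance_props:
        return 0.0
    selected = set(selected)
    n = sum(1 for props in instance_props.values() if selected <= set(props))
    return n / len(instance_props)
```

`property_frequencies(...).most_common()` then lists the properties from the most to the least widespread, and `share_with_all` tells how many instances would keep a complete record for a chosen set of properties.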

To do this, we write a query that retrieves all the properties, but first we exclude the identifier properties (they are of no interest for the study, and excluding them reduces the number of results). We also exclude the properties that have too few instances relative to the number of instances returned by the query. Together, these exclusions drastically reduce the execution time of the request.
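The same two filters can be sketched in Python on already-aggregated counts. This is an illustration under assumptions, not the query used here: the `PropertyCount` record, `select_properties` and the default `min_share` of 10% are hypothetical, and identifier properties are recognized by the Wikibase datatype `http://wikiba.se/ontology#ExternalId` (written `wikibase:ExternalId` in SPARQL), which is one possible way of detecting them.

```python
from typing import NamedTuple

# Wikibase datatype of external-identifier properties (wikibase:ExternalId in SPARQL)
EXTERNAL_ID = "http://wikiba.se/ontology#ExternalId"


class PropertyCount(NamedTuple):
    """Hypothetical record: one property and the number of instances that use it."""
    prop: str      # property ID, e.g. "P569"
    datatype: str  # Wikibase datatype URI of the property
    uses: int      # number of distinct instances carrying the property


def select_properties(rows, n_instances, min_share=0.1):
    """Drop identifier properties and properties used by fewer than
    min_share * n_instances instances; return the rest, most used first."""
    threshold = min_share * n_instances
    kept = [r for r in rows if r.datatype != EXTERNAL_ID and r.uses >= threshold]
    return sorted(kept, key=lambda r: r.uses, reverse=True)
```

Expressing the threshold as a share of the total, rather than as an absolute count, keeps the same setting usable for queries that return very different numbers of instances.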

All the queries are first stored in the SQLite database to keep track of them (to create the database, see https://github.com/Semantic-Data-for-Humanities/Economists_Jurists/blob/development/Notebooks/Merge/Database_SQlite.ipynb).
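The actual layout of `data/sparql_queries.db` is defined in the notebook linked above. As a hypothetical minimal sketch, using the table and column names that appear in this notebook plus two invented ones (`endpoint` and `query` in the `query` table), the two tables could look like this. Setting `row_factory = sqlite3.Row` lets rows be read by column name instead of by position such as `rc[4]`.

```python
import sqlite3

# Hypothetical minimal schema. pk_query, pk_result, fk_query, result and timestmp
# appear in this notebook; `endpoint` and `query` are invented column names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS query (
    pk_query INTEGER PRIMARY KEY,
    endpoint TEXT,   -- hypothetical
    query    TEXT    -- hypothetical
);
CREATE TABLE IF NOT EXISTS result (
    pk_result INTEGER PRIMARY KEY,
    fk_query  INTEGER REFERENCES query(pk_query),
    result    TEXT,
    timestmp  TEXT
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database; IF NOT EXISTS leaves existing tables untouched."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be read as row["column_name"]
    conn.executescript(SCHEMA)
    return conn
```

`fetchone()` returns `None` when no row matches the requested key, so that case has to be checked before indexing the row.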

In [1]:
# Import useful libraries

from SPARQLWrapper import SPARQLWrapper, SPARQLWrapper2, JSON, TURTLE, XML, RDFXML
import pprint
import csv
# from bs4 import BeautifulSoup

from collections import Counter
from operator import itemgetter
import pandas as pd

import sqlite3 as sql
import time

from importlib import reload
from shutil import copyfile

In [2]:
import sparql_functions as spqf # Home-made functions written by Francesco Beretta:
# the module must be in the same folder as this notebook.

# Query economists and jurists

In [19]:
### Define the row of the database (primary key of the query) to use
pk_query = 55

# Connect to the database
original_db = 'data/sparql_queries.db'
conn = sql.connect(original_db)

c = conn.cursor()

### Run the query on the SQLite database to get the values of the row
c.execute('SELECT * FROM query WHERE pk_query = ?', [pk_query]) ### the parameters must be passed as a sequence (list or tuple), even when there is only one
#c.execute('SELECT * FROM query WHERE pk_query = 10')

rc = c.fetchone()

# Close the connection
conn.close()

# fetchone() returns None when no row matches, which would make rc[...] fail later
if rc is None:
    raise ValueError(f'No row with pk_query = {pk_query} in the query table')


In [20]:
print(rc[2] +  "\n-----\n" + rc[4] +  "\n-----\n" +   rc[7]+  "\n\n\n------------------\n" +  rc[5] + "\n\n\n------------------\n")


In [57]:
### Execute the SPARQL query with the function wrapped in the library _sparql_functions.py_
# The first argument is the SPARQL endpoint, the second the query
q = spqf.get_json_sparql_result(rc[4],rc[5])

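The body of `spqf.get_json_sparql_result` is not shown in this notebook, and long-running requests may have to be stopped by hand. The following is a standard-library sketch of what such a call can look like, not the library's actual code: `build_sparql_request` and `run_sparql` are hypothetical names, the 60-second default timeout and the User-Agent string are placeholders, and the endpoint URL is assumed to contain no query string already. It sends a GET request as described by the SPARQL 1.1 Protocol and asks for the JSON results format.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build a GET request carrying the query, asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia's User-Agent policy asks clients to identify themselves;
            # this value is a placeholder
            "User-Agent": "economists-jurists-notebook/0.1 (placeholder contact)",
        },
    )


def run_sparql(endpoint, query, timeout=60):
    """Send the request and parse the JSON answer into a dict.
    `timeout` (seconds) applies to each blocking socket operation."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)
```

Called as `run_sparql(rc[4], rc[5])`, it takes the same two arguments as the cell above; with a timeout, a stalled connection raises an exception instead of blocking indefinitely.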

In [8]:
### This function returns the current date and time formatted for use in a file name

# definition
def timestamp_formatted_for_file_name():
    is_now = time.strftime('%Y%m%d_%H%M%S')
    return is_now

# execution
timestamp_formatted_for_file_name()

'20210527_092844'
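To check the format on a fixed moment, here is a small variant. `timestamp_for_file_name` is a hypothetical helper built on `datetime.strftime` with the same format string as the function above; the only difference is that the moment can be passed in, which makes the output reproducible.

```python
from datetime import datetime


def timestamp_for_file_name(moment=None):
    """Format `moment` (default: now) as YYYYMMDD_HHMMSS."""
    if moment is None:
        moment = datetime.now()
    return moment.strftime("%Y%m%d_%H%M%S")


print(timestamp_for_file_name(datetime(2021, 5, 27, 9, 28, 44)))  # 20210527_092844
```

Because every field is zero-padded and ordered from year to second, sorting the database copies by file name also sorts them chronologically.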

In [9]:
# Define the file paths: the existing database and its new timestamped copy
original_db = 'data/sparql_queries.db'

timestamped_db_copy = 'data/sparql_queries_' + timestamp_formatted_for_file_name() + '.sqlite'

In [10]:
## Documentation:
# https://docs.python.org/3/library/shutil.html

copied_db = copyfile(original_db, timestamped_db_copy)
copied_db

'data/sparql_queries_20210527_092846.sqlite'
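`copyfile` copies the file byte by byte, which is fine as long as nothing else writes to the database at the same time. A possible alternative, not what this notebook uses, is the online-backup API of Python's `sqlite3` module (`Connection.backup`, available since Python 3.7), which produces a consistent copy even if another connection is using the source. `backup_db` is a hypothetical helper name.

```python
import sqlite3


def backup_db(src_path, dest_path):
    """Copy an SQLite database with the sqlite3 online-backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:  # commit on the destination once the backup is done
            src.backup(dest)
    finally:
        dest.close()
        src.close()
    return dest_path
```

It can be called with the same paths as above: `backup_db(original_db, timestamped_db_copy)`.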

In [41]:
### Store the answer of the SPARQL endpoint in the 'result' table

conn = sql.connect(original_db)
c = conn.cursor()
values = (pk_query, str(q), timestamp_formatted_for_file_name())

# https://www.techonthenet.com/sqlite/functions/now.php
c.execute("INSERT INTO result (fk_query, result, timestmp) VALUES (?,?,?)", values)
# Commit the insertion, then close the database
# !! conn.commit() must be uncommented for the row to be saved: closing without a commit discards the insertion !!
# conn.commit()
conn.close()
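The insertion above is only saved once `conn.commit()` runs. Here is a sketch of the same insertion that commits automatically by using the connection as a context manager, together with the reverse operation based on `ast.literal_eval`, as used further down in the notebook. `store_result` and `load_result` are hypothetical helpers; they assume a `result` table with the columns named in the INSERT above and an integer `pk_result` key. The dict is kept as `str(...)` so that previously stored rows stay readable with `ast.literal_eval`; `json.dumps`/`json.loads` would be an alternative for new data.

```python
import ast
import sqlite3
import time


def store_result(conn, fk_query, result):
    """Insert a SPARQL result (a dict) into the result table and commit it."""
    values = (fk_query, str(result), time.strftime("%Y%m%d_%H%M%S"))
    with conn:  # commits on success, rolls back if the INSERT raises
        cur = conn.execute(
            "INSERT INTO result (fk_query, result, timestmp) VALUES (?, ?, ?)", values
        )
    return cur.lastrowid  # primary key of the new row


def load_result(conn, pk_result):
    """Read one stored result back and turn its text into a dict again."""
    row = conn.execute(
        "SELECT result FROM result WHERE pk_result = ?", (pk_result,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no result with pk_result = {pk_result}")
    return ast.literal_eval(row[0])
```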

In [24]:
### Inspect the result after an insert

# Choose the row of the database to get
pk_result = 8

# Connect to the database
original_db = 'data/sparql_queries.db'
conn = sql.connect(original_db)

### Execute the query on the SQLite database to retrieve the values of the row
c = conn.cursor()
# Parameters must be a sequence: a bare string such as '12' would be read as two parameters
c.execute('SELECT * FROM result WHERE pk_result = ?', [pk_result])
result_q = c.fetchone()

# Close the connection
conn.close()
# result_q[3]

In [25]:
### Transform string to dict
## Doc.:
# https://stackoverflow.com/questions/988228/convert-a-string-representation-of-a-dictionary-to-a-dictionary
import ast
d = ast.literal_eval(result_q[3])
type(d)

dict
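The helper `spqf.sparql_result_to_list` is not shown here. Assuming `d` follows the standard SPARQL 1.1 Query Results JSON Format (a `head.vars` list and a `results.bindings` list of variable-to-term objects), a conversion to rows can be sketched as follows. `sparql_result_to_rows` is a hypothetical name, and it may differ from the library function, for instance in how it handles unbound variables.

```python
def sparql_result_to_rows(result):
    """Turn a SPARQL JSON result (as a dict) into a list of rows.
    Each row holds the value of every variable, in the order of head.vars;
    variables left unbound in a solution become None."""
    variables = result["head"]["vars"]
    return [
        [binding.get(var, {}).get("value") for var in variables]
        for binding in result["results"]["bindings"]
    ]
```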

In [26]:
##### Transform the result into a list with a function of the library sparql_functions #####

#### Result of the query on the economists
r_eco = [l for l in spqf.sparql_result_to_list(d)]
print(len(r_eco))
r_eco[:10]
#### Result of the DBpedia query
#r_dbp = [l for l in spqf.sparql_result_to_list(d)]
#print(len(r_dbp))
#r_dbp[:10]
#### Result of the DBpedia query (resource "Lawyer")
#r_dbp_l = [l for l in spqf.sparql_result_to_list(d)]
#print(len(r_dbp_l))
#r_dbp_l[:10]
#### Result of the Wikidata query
#r_wk = [l for l in spqf.sparql_result_to_list(d)]
#print(len(r_wk))
#r_wk[:10]


2387


[['http://dbpedia.org/resource/Luc-Normand_Tellier',
  'Luc-Normand Tellier',
  '-19014-65532-65528T11:19:00-12:41',
  '1821-03-26',
  'Luc-Normand Tellier (born October 10, 1944) is a Professor Emeritus in spatial economics of the University of Quebec at Montreal.'],
 ['http://dbpedia.org/resource/Ludwik_Maurycy_Landau',
  'Ludwik Maurycy Landau',
  '-8708-65531-65519T00:07:00+00:07',
  '1851-02-24',
  'Ludwik Maurycy Landau (31 May 1902 – 29 February 1944) was a Polish economist and statistician, a member of the Polish resistance movement in World War II, and a victim of the Holocaust.'],
 ['http://dbpedia.org/resource/Luigi_Pasinetti',
  'Luigi Pasinetti',
  '-19014-65532-65528T11:19:00-12:41',
  '1821-03-26',
  'Luigi L. Pasinetti (born September 12, 1930) is an Italian economist of the post-Keynesian school. Pasinetti is considered the heir of the "Cambridge Keynesians" and a student of Piero Sraffa and Richard Kahn. Along with them, as well as Joan Robinson, he was one of the pro