# Abstract

Final project for Electronic Publishing and Digital Storytelling in fulfillment of an LM in Digital Humanities and Digital Knowledge from the University of Bologna.

## Project Aims
Wikidata is one of the largest free and open knowledge databases in the world. 
Launched in 2012, it now contains over 97 million items, over six million of them people.

This project investigates how Wikidata describes art historians and how those descriptions differ across gender.
This project serves as a case study in how our descriptions of history create history.

### Phase 1: Overview
We first wanted to get a wide view of Wikidata's data on art historians.
To do this, we queried the number of art historians grouped by gender.

In [30]:
#insert Denise's initial query that breaks down those with art historian/sub groups into genders
from SPARQLWrapper import SPARQLWrapper, JSON
import ssl

# workaround for local certificate errors: this turns off HTTPS certificate verification
ssl._create_default_https_context = ssl._create_unverified_context

# get the endpoint API
wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# prepare the query: count art historians (including subclasses of the occupation) by gender
my_SPARQL_query = """
SELECT ?gender (count(distinct ?human) as ?number)
WHERE
{
  ?human wdt:P31 wd:Q5
  ; wdt:P21 ?gender
  ; wdt:P106/wdt:P279* wd:Q1792450 .
}
GROUP BY ?gender
LIMIT 10

"""

# set the endpoint 
sparql_wd = SPARQLWrapper(wikidata_endpoint)
# set the query
sparql_wd.setQuery(my_SPARQL_query)
# set the returned format
sparql_wd.setReturnFormat(JSON)
# get the results
results = sparql_wd.query().convert()

# manipulate the result
for result in results["results"]["bindings"]:
    print(result["gender"]["value"], result["number"]["value"])
print("💩")

http://www.wikidata.org/entity/Q6581097 11707
http://www.wikidata.org/entity/Q6581072 5828
http://www.wikidata.org/entity/Q48270 2
💩
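To make these raw counts easier to compare, the short sketch below turns them into shares of the total. The counts are copied from the output above; the English labels for the three gender items are hard-coded here rather than fetched from Wikidata, and the names `GENDER_LABELS` and `gender_shares` are our own, introduced only for illustration.

```python
# Counts copied from the query output above (entity URI -> number of art historians).
counts = {
    "http://www.wikidata.org/entity/Q6581097": 11707,
    "http://www.wikidata.org/entity/Q6581072": 5828,
    "http://www.wikidata.org/entity/Q48270": 2,
}

# English labels for the gender items, hard-coded for readability (not queried).
GENDER_LABELS = {
    "Q6581097": "male",
    "Q6581072": "female",
    "Q48270": "non-binary",
}


def gender_shares(counts):
    """Return each gender's share of the total, keyed by label where one is known."""
    total = sum(counts.values())
    shares = {}
    for uri, number in counts.items():
        qid = uri.rsplit("/", 1)[-1]  # keep only the Q-identifier at the end of the URI
        shares[GENDER_LABELS.get(qid, uri)] = number / total
    return shares


shares = gender_shares(counts)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Note that the query only counts people who have a gender statement (`wdt:P21`), so art historians without one are not part of the total.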


Then we wanted to look at the properties used to describe art historians across genders, so we ran a query to count the number of distinct properties used for each gender.

In [34]:
#insert Sarah's query getting property counts
from SPARQLWrapper import SPARQLWrapper, JSON
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

# get the endpoint API
wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# prepare the query
my_SPARQL_query = """
SELECT ?gender (count(distinct ?property) as ?number)
WHERE
{SERVICE wikibase:label {
     bd:serviceParam wikibase:language "en" .
   }

  ?human wdt:P31 wd:Q5
  ; wdt:P21 ?gender
  ; ?property ?object
  ; wdt:P106/wdt:P279* wd:Q1792450 .

}

GROUP BY ?gender
"""
# set the endpoint 
sparql_wd = SPARQLWrapper(wikidata_endpoint)
# set the query
sparql_wd.setQuery(my_SPARQL_query)
# set the returned format
sparql_wd.setReturnFormat(JSON)
# get the results
results = sparql_wd.query().convert()

# manipulate the result
for result in results["results"]["bindings"]:
    print(result["gender"]["value"], result["number"]["value"])
print("🦐")


http://www.wikidata.org/entity/Q6581097 2656
http://www.wikidata.org/entity/Q6581072 1797
http://www.wikidata.org/entity/Q48270 233
🦐


__NOTE: This cell copies everything from the first one and changes only the query. Is there a more efficient way to do this? Keeping each cell complete is useful because any query can be run on its own at any time, but if we know the reader will run all the preceding cells first, there may be a more efficient approach (e.g. import only once, reuse variables).__
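One possible answer, offered as a minimal sketch rather than a settled design: define the setup once in an early cell and let later cells pass in only the query string. The names `WIKIDATA_ENDPOINT`, `run_query`, and `binding_rows` are hypothetical helpers of our own; the `SPARQLWrapper` calls are the same ones used in the cells above.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def run_query(query, endpoint=WIKIDATA_ENDPOINT):
    """Send one SPARQL query to the endpoint and return the parsed JSON response."""
    # Imported inside the function so the parsing helper below works without the
    # library installed; Python caches the module, so it is only loaded once.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return client.query().convert()


def binding_rows(results, *variables):
    """Extract the named variables from each result binding as a tuple of strings."""
    return [
        tuple(binding[name]["value"] for name in variables)
        for binding in results["results"]["bindings"]
    ]


# Usage in a later cell (commented out here because it needs network access):
# for gender, number in binding_rows(run_query(my_SPARQL_query), "gender", "number"):
#     print(gender, number)
```

With this in place, each later cell would only need to define its query string and call the two helpers, while still being runnable in any order once the setup cell has been executed.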

We also wanted to look at basic trends over time.

In [None]:
# queries to see whether art historians, and women art historians in particular, from later periods are better or worse represented
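As a starting point for this cell, here is one way such a query might look. It is a sketch under assumptions, not the project's final query: it uses date of birth (`wdt:P569`) as the time marker and groups people into 100-year buckets, and the function name `birth_period_query` is hypothetical. The resulting string can be sent with the same `SPARQLWrapper` calls used in the cells above.

```python
def birth_period_query(occupation_qid="Q1792450"):
    """Build a SPARQL query counting people by gender and 100-year birth period.

    Assumption: date of birth (P569) stands in for "period". People without a
    birth date are left out, and anyone with several recorded birth dates in
    different periods is counted once per period.
    """
    return f"""
SELECT ?period ?gender (COUNT(DISTINCT ?human) AS ?number)
WHERE
{{
  ?human wdt:P31 wd:Q5
  ; wdt:P21 ?gender
  ; wdt:P106/wdt:P279* wd:{occupation_qid}
  ; wdt:P569 ?birth .
  # 1800 stands for births from 1800 to 1899, and so on
  BIND(FLOOR(YEAR(?birth) / 100) * 100 AS ?period)
}}
GROUP BY ?period ?gender
ORDER BY ?period ?gender
"""


my_SPARQL_query = birth_period_query()
print(my_SPARQL_query)
```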

Optional: trends over geographic space?

### Phase 2: Types of Properties
Then we wanted to break down those properties into types to see if certain properties/types of properties appear more often for some genders over others.

In [None]:
#query for types of properties. eg. external authority linking, personal relationships, professional relationships, geographic, publications, etc.

### Phase 2a: Professions and Occupations

In [None]:
# various queries about these: how many other jobs do people have, what areas are they in, and what are the most popular other jobs for each gender?

Then we analyzed each of these areas more deeply.

### Phase 2b: Personal Relationships
Are men or women more likely to have personal relationships listed? What kinds of relationships appear?

In [None]:
#various queries for these; eg. spouses/partners, children, parents
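A possible first query for this cell is sketched below. It counts, per gender, how many art historians have at least one statement for each of a handful of family and partner properties. The property selection (`RELATIONSHIP_PROPERTIES`) and the function name `relationship_count_query` are our own assumptions for illustration; the list can be extended or trimmed.

```python
# Wikidata properties treated as "personal relationships" in this sketch.
RELATIONSHIP_PROPERTIES = {
    "P26": "spouse",
    "P451": "unmarried partner",
    "P40": "child",
    "P22": "father",
    "P25": "mother",
}


def relationship_count_query(properties=RELATIONSHIP_PROPERTIES, occupation_qid="Q1792450"):
    """Build a SPARQL query counting people per gender and relationship property."""
    # Turn the property IDs into a VALUES list such as "wdt:P26 wdt:P451 ..."
    values = " ".join(f"wdt:{pid}" for pid in properties)
    return f"""
SELECT ?gender ?property (COUNT(DISTINCT ?human) AS ?number)
WHERE
{{
  VALUES ?property {{ {values} }}
  ?human wdt:P31 wd:Q5
  ; wdt:P21 ?gender
  ; wdt:P106/wdt:P279* wd:{occupation_qid}
  ; ?property ?relative .
}}
GROUP BY ?gender ?property
ORDER BY ?property ?gender
"""


my_SPARQL_query = relationship_count_query()
print(my_SPARQL_query)
```

Because there are roughly twice as many men as women in the Phase 1 counts, these numbers are more meaningful when divided by the total for each gender than when compared directly.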

### Phase 2c: Professional Relationships
Are women more likely to engage professionally with other women? With men? With students? What about institutional relationships and awards?