# Collage, the shapes of arte
This notebook was created by [Francesca Borriello](https://github.com/Fran-cesca), [Lorenza Pierucci](https://github.com/LorenzaPierucci) and [Laura Travaglini](https://github.com/lauratravaglini) as part of their final project for the [Digital Pubishing and Electronic Storytelling](https://www.unibo.it/it/didattica/insegnamenti/insegnamento/2021/443749) course at the University of Bologna (academic year 221/2022).

# About the project
Starting from the datasets made publicly available by the NY Museum of Modern Art (**MoMA**) and by the **Tate galleries**, *Collage, the shapes of art* analyses artworks acquisitions throughout the years, with the aim of understanding which criteria brought together them Museums' collection in a historical and social perspective. 
Art and history of art are no sealed compartments: they are heavily inter-dependent with social, political, economic factors, which in turn influence our very perception of what art is. 
Cultural institutions – museums in particular – play a fundamental role in this intertwined dynamics: through their selection, they have the potential to shape the public understanding of arts and its modifications throughout time.  
In some way, what makes into museums makes into history of art and viceversa. 
From these considerations stems our analysis: how do external (social, political, economic) factors influence the perception of art and its history? 
A way to investigate it is by looking at the greatest and most representative museums around the world, and at their acquisition policies and campaigns in particular. 

## Our key questions: 
1.<br>
2.<br>
3.<br>



# 1. Creating dataframes.
After importing all the necessary libraries, we can read our Museums' online CSV files containing information about artworks and artists as `Pandas Dataframes` in order to better manipulate and analyse them.

## Import

In [1]:
import pandas as pd
import csv
import re
from collections import defaultdict
from rdflib import Namespace , Literal , URIRef
from rdflib.namespace import RDF , RDFS
import ssl
from json import JSONDecodeError
from qwikidata.sparql import return_sparql_query_results 

For both Museums, we gather data directly from the remote files available on their Github pages ([MoMA](https://github.com/MuseumofModernArt/collection), [Tate](https://github.com/tategallery/collection)). 
In particular, we work on two separate datasets: one carries information about **artworks** (their title, date, acquisition year atc.), the other provides data on the **artists** (their name, nationality, gender etc.).
We select the columns we are interested in and merge the two files into one dataframe.

# MoMA

In [31]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')
MoMA_artworks = spreadsheet[['Title', 'Artist', 'ConstituentID', 'Nationality', 'BeginDate', 'EndDate','Date', 'Department', 'DateAcquired']]
MoMA_artists = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artists.csv')
MoMA_artists['ConstituentID'] = MoMA_artists['ConstituentID'].astype(str)
MoMA = pd.merge(MoMA_artworks, MoMA_artists[['ConstituentID', 'Wiki QID', 'Gender']],on = 'ConstituentID', how = 'left')
MoMA = MoMA.rename(columns = {'ConstituentID':'Id', 'BeginDate':'BirthDate', 'EndDate':'DeathDate'})

# Tate

In [32]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artwork_data.csv')
Tate_artworks = spreadsheet[['artist', 'artistId', 'title', 'year', 'acquisitionYear']]
Tate_artworks = Tate_artworks.rename(columns = {'artistId':'Id'})
Tate_artworks['Id'] = Tate_artworks['Id'].astype(str)
Tate_artists = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artist_data.csv')
Tate_artists = Tate_artists[['id', 'name', 'gender', 'placeOfBirth']]
Tate_artists = Tate_artists.rename(columns = {'id':'Id'})
Tate_artists["Id"] = Tate_artists["Id"].astype(str)
Tate = pd.merge(Tate_artworks,Tate_artists[['Id', 'gender']], on = 'Id', how = 'left')
Tate = Tate.rename(columns = {'artist':'Artist', 'title':'Title','year':'Date', 'acquisitionYear':'DateAcquired', 'gender':'Gender'})

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


In [50]:
display(Tate_artists)

Unnamed: 0,Id,name,gender,placeOfBirth
0,10093,Magdalena Abakanowicz,Female,Poland
1,0,Edwin Austin Abbey,Male,United States
2,2756,Berenice Abbott,Female,United States
3,1,Lemuel Francis Abbott,Male,United Kingdom
4,622,Ivor Abrahams,Male,United Kingdom
...,...,...,...,...
3527,12542,Gilberto Zorio,Male,Italia
3528,2186,Larry Zox,Male,United States
3529,621,Francesco Zuccarelli,Male,Italia
3530,2187,Ignacio Zuloaga,Male,Spain


# 2. Cleaning data

Identifying and fixing incoherent, corrupt or defective data is an essential process for ensuring a satisfactory threshold of reliability to any further analysis. Let us delve into it.

## MoMA

### Missing values
First of all, let us deal with missing values, substituting them with zeros to better handle them.

In [33]:
MoMA.fillna(value='0', inplace=True)

### Dates
Artworks acquisition dates are in the form `YYYY-MM-DD`.<br>
For the sake of our analysis, we extract the year and transform it into an integer.

In [34]:
def cleanAcquisitionDatesMoMA(date):
    if '-' in date:
        date = date.split('-')[0]
    return date

In [35]:
MoMA["DateAcquired"] = MoMA["DateAcquired"].apply(cleanAcquisitionDatesMoMA)

In MoMA database, artworks' creation dates are mostly already represented by just one year.<br>
Nevertheless, there are some exceptions: years separated by a slash or a dash and strings of any kind, such as '(1950).  (Prints executed 1948', '(1883, published 1897)' or '(1911, dated 1912, published c. 1917)'.

We wrote a code to extract the year through a regex only matching the first sequence of four digits for each value.

In [36]:
def cleanDatesMoMA(date):
    if '-' in date:
        splitted = date.split('-')
        date = ' '.join(splitted) 
    if '/' in date:
        splitted = date.split('/')
        date = ' '.join(splitted) 
    if ',' in date:
        splitted = date.split(',')
        date = ' '.join(splitted) 
    if '.' in date:
        splitted = date.split('.')
        date = ' '.join(splitted) 
        
    x = re.search("\d{4}", date)
    if x:
        date = x.group()
    else:
        date = '0'
    return date

In [37]:
MoMA["Date"] = MoMA["Date"].astype(str)
MoMA["Date"] = MoMA["Date"].apply(cleanDatesMoMA)

## Tate

### Missing values
Again, let us replace missing values and strings indicating lack of information with zeros, thus obtaining a dataframe filled with coherent data.

In [38]:
Tate.fillna(value='0', inplace=True)
Tate['Date'].replace(to_replace=['no date','c'], value='0', inplace= True)

In [39]:
Tate_artists.fillna(value='0', inplace=True)

### Dates
Artworks acquisition dates and artworks creation dates are represented as floats (e.g. 1997.0). We convert them to integers.

In [40]:
def cleanDatesTate(date):
    if '.' in date:
        date = date.split('.')[0] 
    return date

In [41]:
Tate["Date"] = Tate["Date"].astype(str)
Tate["Date"] = Tate["Date"].apply(cleanDatesTate)
Tate["DateAcquired"] = Tate["DateAcquired"].astype(str)
Tate["DateAcquired"] = Tate["DateAcquired"].apply(cleanDatesTate)

### Artists' names
In both Tate dataframes artists' names are in the form `Surname, Name`. For clarity purposes, we decided to normalise them as `Name Surname`, and wrote a code to do so. 

In [42]:
def cleanArtistsNames(name):
    if ',' in name:
        name= name.split(',')
        name[0], name[1] = name[1], name[0]
        name = ' '.join(name)
    return name.strip()

In [43]:
Tate["Artist"] = Tate["Artist"].apply(cleanArtistsNames)

In [44]:
Tate_artists["name"] = Tate_artists["name"].apply(cleanArtistsNames)

### Nationalities
In Tate's dataframe about artists, nationalities are often in the form `city, country`(e.g. 'Philadelphia, United States') and sometimes just indicate a city (e.g. 'Wimbledon'). <br>
Finally, all countries' names are in their original form (e.g. 'Nihon' for 'Japan'). <br>
We normalised it indicating, for each artist, its country of origin, in English. <br>
We looked for all diverging values and replaced them one by one through a script.

In [45]:
def cleanNationalitiesTate(naz): 
    if ',' in naz: 
        naz = naz.split(',')[1] 
    if naz == 'Blackheath': 
        naz= naz.replace('Blackheath', 'United Kingdom') 
    if naz == 'London': 
        naz= naz.replace('London', 'United Kingdom') 
    if naz == 'Kensington': 
        naz= naz.replace('Kensington', 'United Kingdom') 
    if naz == 'Chung-hua Min-kuo': 
        naz= naz.replace('Chung-hua Min-kuo', 'Taiwan') 
    if naz == 'Solothurn': 
        naz= naz.replace('Solothurn', 'Schweiz') 
    if naz == 'Melmerby': 
        naz= naz.replace('Melmerby', 'United Kingdom') 
    if naz == 'Montserrat': 
        naz= naz.replace('Montserrat', 'España') 
    if naz == 'Canterbury': 
        naz= naz.replace('Canterbury', 'United Kingdom') 
    if naz == 'Staten Island': 
        naz= naz.replace('Staten Island', 'United States') 
    if naz == 'Epsom': 
        naz= naz.replace('Epsom', 'United Kingdom') 
    if naz == 'Plymouth': 
        naz= naz.replace('Plymouth', 'United Kingdom') 
    if naz == 'Wimbledon': 
        naz= naz.replace('Wimbledon', 'United Kingdom') 
    if naz == 'Edinburgh': 
        naz= naz.replace('Edinburgh', 'United Kingdom') 
    if naz == 'Beckington': 
        naz= naz.replace('Beckington', 'United Kingdom') 
    if naz == 'Hertfordshire': 
        naz= naz.replace('Hertfordshire', 'United Kingdom') 
    if naz == 'Isle of Man': 
        naz= naz.replace('Isle of Man', 'United Kingdom') 
    if naz == 'Bristol': 
        naz= naz.replace('Bristol', 'United Kingdom') 
    if naz == 'Liverpool': 
        naz= naz.replace('Liverpool', 'United Kingdom') 
    if naz == 'Braintree': 
        naz= naz.replace('Braintree', 'United Kingdom') 
    if naz == 'Stoke on Trent': 
        naz= naz.replace('Stoke on Trent', 'United Kingdom') 
    if naz == 'Rochdale': 
        naz= naz.replace('Rochdale', 'United Kingdom') 
    if 'D.C.' in naz: 
        naz= naz.replace('D.C.', 'Colombia') 
    if 'Otok' in naz: 
        naz= naz.replace('Otok', 'Hrvatska') 
    if 'Département de la' in naz: 
        naz= naz.replace('Département de la', 'France') 
    if naz == 'Niederschlesien': 
        naz= naz.replace('Niederschlesien', 'Polska') 
    if naz == 'Perth': 
        naz= naz.replace('Perth', 'Australia') 
    if naz == 'Bermondsey': 
        naz= naz.replace('Bermondsey', 'United Kingdom') 
    if naz == 'Egremont': 
        naz= naz.replace('Egremont', 'United Kingdom') 
    if naz == 'Charlotte Amalie': 
        naz= naz.replace('Charlotte Amalie', 'United States') 
    if naz == 'Charlieu': 
        naz= naz.replace('Charlieu', 'France') 
    if naz == 'Stockholm': 
        naz= naz.replace('Stockholm', 'Sverige') 
    if naz == 'Auteuil': 
        naz= naz.replace('Auteuil', 'France') 
 
    if 'Polska' in naz: 
        naz = naz.replace('Polska', 'Poland') 
    if "Yisra'el" in naz: 
        naz = naz.replace("Yisra'el", 'Israel') 
    if 'Deutschland' in naz: 
        naz = naz.replace('Deutschland', 'Germany') 
    if 'Schweiz' in naz: 
        naz = naz.replace('Schweiz', 'Switzerland') 
    if 'Suomi' in naz: 
        naz = naz.replace('Suomi', 'Finland') 
    if 'Zhonghua' in naz: 
        naz = naz.replace('Zhonghua', 'China') 
    if 'Türkiye' in naz: 
        naz = naz.replace('Türkiye', 'Turkey') 
    if 'Al-‘Iraq' in naz: 
        naz = naz.replace('Al-‘Iraq', 'Iraq') 
    if 'België' in naz: 
        naz = naz.replace('België', 'Belgium') 
    if 'Rossiya' in naz: 
        naz = naz.replace('Rossiya', 'Russia') 
    if 'Nihon' in naz: 
        naz = naz.replace('Nihon', 'Japan') 
    if 'Éire' in naz: 
        naz = naz.replace('Éire', 'Ireland') 
    if 'Österreich' in naz: 
        naz = naz.replace('Österreich', 'Austria') 
    if 'Saint Hélier' in naz: 
        naz = naz.replace('Saint Hélier', 'United Kingdom') 
    if 'Ceská Republik' in naz: 
        naz = naz.replace('Ceská Republik', 'Czech Republic') 
    if 'Ukrayina' in naz: 
        naz = naz.replace('Ukrayina', 'Ukraine') 
    if 'Ellás' in naz: 
        naz = naz.replace('Ellás', 'Greece') 
    if 'Latvija ' in naz: 
        naz = naz.replace('Latvija ', 'Latvia') 
    if 'Douglas' in naz: 
        naz = naz.replace('Douglas', 'United Kingdom') 
    if 'România' in naz: 
        naz = naz.replace('România', 'Romania') 
    if 'Sverige' in naz: 
        naz = naz.replace('Sverige', 'Sweden') 
    if 'Bharat' in naz: 
        naz = naz.replace('Bharat', 'India')     
    if 'España' in naz: 
        naz = naz.replace('España', 'Spain')   
    if 'Magyarország' in naz: 
        naz = naz.replace('Magyarország', 'Hungery')  
    if 'Slovenská Republika' in naz: 
        naz = naz.replace('Slovenská Republika', 'Slovenia')  
        
    return naz.strip()

In [46]:
Tate_artists["placeOfBirth"] = Tate_artists["placeOfBirth"].apply(cleanNationalitiesTate)

# Exploration
We can now start exploring our Museums, to get to know them better through available data.

## How many artworks?

In [39]:
museums=[MoMA, Tate]
names = ['MoMA','Tate']
for museum in museums:
    selected_rows = museum[~museum['Title'].isnull()]
    name = names.pop(0)
    print("Total artworks at", name, ":", len(selected_rows.index))

Total artworks at MoMA : 140848
Total artworks at Tate : 69201


## When do artworks date back?

In [42]:
museums=[MoMA, Tate]
names = ['MoMA','Tate']
for museum in museums:
    museum["Date"] = museum["Date"].astype(int)
    museum.sort_values(by=['Date'], inplace=True)
    museumWithoutZeros = museum[museum['Date'] != 0]
    firstDate = museumWithoutZeros['Date'].iat[0]
    lastDate = museumWithoutZeros['Date'].iat[-1]
    name = names.pop(0)
    print("Most ancient artwork at", name, "dates back to",firstDate )
    print("Most recent artwork at", name, "dates back to",lastDate )    

Most ancient artwork at MoMA dates back to 1768
Most recent artwork at MoMA dates back to 2022
Most ancient artwork at Tate dates back to 1545
Most recent artwork at Tate dates back to 2012


## Artists

For examining artist-related issues, we rely on the two CSV files from the Museums containing information about them, which we already transformed into dataframes (Tate_artists and MoMA_artists). <br> 
In doing so, we avoid duplicates (the same artists may have more than one artwork in the same museum).

### How many artists?

In [45]:
print('Total number of artists at MoMA', len(MoMA_artists))

Total number of artists at MoMA 15243


In [46]:
print('Total number of artists at Tate', len(Tate_artists))

Total number of artists at Tate 3532


### Artists' gender: which is the most represented?

## Tate

### Wikidata integration

Since Tate dataset lacks of some gender information, we took advantage of Wikidata in order to integrate it through some remote SPARQL queries.

1. We create a subset of the dataframe containing all and only the the rows in which the gender information is missing.

In [62]:
Tate_to_integrate = Tate_artists[Tate_artists['gender']== '0']

2. We exclude some entities to which a gender cannot be attributed (e.g., collective or anonymous artists).

In [67]:
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Anonymous']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'born 1945; Mel Ramsden Art & Language (Michael Baldwin  born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'born 1939; David Bainbridge Art & Language (Terry Atkinson  born 1941; Michael Baldwin  born 1945; Harold Hurrell  born']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'born 1939; Michael Baldwin Art & Language (Terry Atkinson  born 1945)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != '1939-1993; Mel Ramsden Art & Language (Ian Burn  born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Atlas Group']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Black Audio Film Collective (John Akomfrah; Reece Auguis; Edward George; Lina Gopaul; Avril Johnson; David Lawson; Trevo']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Fionnuala and Leslie Boyd and Evans']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British (?) School']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British (?) School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British School 16th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British School 17th or 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'British School 20th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Chinese School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'French School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'French School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'International Local (Sarah Charlesworth; Joseph Kosuth; Anthony McCall)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Italian or German (?) School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Langlands and Bell, Ben and Nikki']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Italian or German (?) School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Ben and Nikki Langlands and Bell']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Lucy and Eegyudluk']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'France) M/M (Paris']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'T R Uthco (Doug Hall born 1944, Diane Andrews Hall born 1945, Jody Procter 1944-1998)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Art & Language (Ian Burn, 1939-1993; Mel Ramsden, born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Unknown']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Marc Voge) Young-Hae Chang Heavy Industries (Young-Hae Chang']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'K.O.S.']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Diane Andrews Hall born 1945 T R Uthco (Doug Hall born 1944  Jody Procter 1944-1998)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['name'] != 'Mel Ramsden Art & Language (Ian Burn  born 1944)']

3. We proceed to search the artists' entities on Wikidata: this will then allow us to look up for their gender.<br>
The SPARQL query you can read below searches for human individuals that work as photograpers, artists, graphic artists, painters, video artists, sculptor and visual artist. The `{}` placeholder will be replaced by the artist's name from the dataframe via python `format()` method.<br>
We directly apply the query to our dataframe through a function taking advantage of `qwikidata` is a Python package allowing to interact with Wikidata.<br>
Finally, we insert the result in a new colum created on the fly, named `Artist Entity`.

In [None]:
artists_genders_from_ids = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?artist
WHERE {{
    ?artist wdt:P31 wd:Q5 . 
    ?artist wdt:P106 ?occupation
                  FILTER (?occupation IN (wd:Q33231) ) 
    ?artist rdfs:label ?o
    FILTER regex(?o, \"^{}$\" )
            FILTER (langMatches(lang(?o), "EN")).
}}

"""

In [None]:
def find_artists_genders_from_ids(name):
    query = artists_genders_from_ids.format(name.strip())
    res = return_sparql_query_results(query_string=query)
   
    try:
        wdt_uri = res['results']['bindings'][0]['artist']['value']
    except (IndexError, KeyError):
        return ""
    return wdt_uri.split("/")[-1]

In [None]:
Tate_to_integrate["Artist Entity"] = Tate_to_integrate["name"].apply(find_artists_genders_from_ids)

Since Wikidata SPARQL endpoint does not support heavy queries, we search for one occupation at a time and create a CSV file for all the artists for which the corresponding wikidata entity was found (e.g., all photographers).<br>
We then continue the research on the rest of the dataframe, from which we remove the rows corresponding to the artists for which we already found the wikidata entity.<br>
We work on copies to avoid compromising the original dataframe.

In [None]:
copy = Tate_to_integrate.copy(deep=True) 
photographers =  copy[copy['Artist Entity']!= ''] 
photographers.to_csv('Photographers.csv')

In [None]:
to_integrate =  copy[copy['Artist Entity']== ''] 
to_integrate["Artist Entity"] = to_integrate["name"].apply(find_artists_genders_from_ids)

4. Finally, when we have obtained one CSV file for each profession, we can move to integrating all of them into one.

In [26]:
Tate_artists_integrated = pd.concat(map(pd.read_csv, ['Handmade.csv', 'Artists.csv', 'Photographers.csv', 'Videoartists.csv', 'graphicartists.csv', 'Painters.csv']), ignore_index=True)
#Tate_artists_integrated.to_csv('ArtistIntegrated.csv')
display(Tate_artists_integrated )

Unnamed: 0.1,Column1,id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url,Artist Entity,Unnamed: 0
0,142.0,18070,John Bacon,,1740–1799,17400.0,17990.0,,,http://www.tate.org.uk/art/artists/john-bacon-...,,
1,145.0,12918,Ruth Baehnisch,,1910–1997,19100.0,19970.0,,,http://www.tate.org.uk/art/artists/ruth-baehni...,Q94774202,
2,242.0,2618,A. Belloguet,,19th century,18000.0,18990.0,,,http://www.tate.org.uk/art/artists/a-belloguet...,,
3,244.0,18264,Nikolaj Bendix Skyum Larsen,,born 1971,19710.0,,,,http://www.tate.org.uk/art/artists/nikolaj-ben...,Q38054711,
4,255.0,18242,Caroline Bergvall,,born 1962,19620.0,,,,http://www.tate.org.uk/art/artists/caroline-be...,Q5044980,
...,...,...,...,...,...,...,...,...,...,...,...,...
83,,5645,Nikolai Kogout,,1891–1959,1891.0,1959.0,,,http://www.tate.org.uk/art/artists/nikolai-kog...,Q15068972,1762.0
84,,17127,Vladimir Kozlinsky,,1891 – 1967,1891.0,1967.0,,,http://www.tate.org.uk/art/artists/vladimir-ko...,Q24351555,1785.0
85,,17008,James Orrock,,1829–1913,1829.0,1913.0,,,http://www.tate.org.uk/art/artists/james-orroc...,Q6140661,2413.0
86,,13098,Monika Sosnowska,,born 1972,1972.0,,,,http://www.tate.org.uk/art/artists/monika-sosn...,Q520491,2995.0


In [12]:
artists_genders = """ 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX wd: <http://www.wikidata.org/entity/> 
SELECT DISTINCT (SAMPLE(?genderLabel) AS ?genderL)
WHERE {{ 
     wd:{}  wdt:P21 ?gender . 
     ?gender rdfs:label ?genderLabel
    FILTER (langMatches(lang(?genderLabel), "EN"))
}} 
"""

In [None]:
def find_artists_genders(wikiId): 
    query = artists_genders.format(wikiId.strip()) 
    res = return_sparql_query_results(query_string=query) 
    print(query) 
    try: 
        gender= res['results']['bindings'][0]['genderL']['value'] 
    except (IndexError, KeyError, JSONDecodeError, ChunkedEncodingError): 
        return "" 
    return gender

In [None]:
ArtistIntegrated["gender"] = ArtistIntegrated["Artist Entity"].apply(find_artists_genders)

In [61]:
Artistswithgender = pd.read_csv('ArtistIntegratedFinal.csv')
Artistswithgender.rename(columns = {'id':'Id'}, inplace = True)
Artistswithgender["Id"] = Artistswithgender["Id"].astype(str)

In [63]:
Tate = pd.merge(Tate, Artistswithgender[['Id', 'name', 'gender', 'dates', 'yearOfBirth', 'yearOfDeath', 'placeOfBirth', 'placeOfDeath', 'url']],on = 'Id', how = 'left')

In [64]:
pd.set
display(Tate)

Unnamed: 0,Artist,Id,Title,Medium,CreditLine,Date,DateAcquired,URL,Gender,BirthDate,DeathDate,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
0,"Blake, Robert",38,A Figure Bowing before a Seated Old Man with h...,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,0,1922,http://www.tate.org.uk/art/artworks/blake-a-fi...,Male,1762.0,1787.0,,,,,,,,
1,"Cozens, Alexander",118,A Landscape with a Waterfall,"Graphite, ink and watercolour on paper",Purchased as part of the Oppé Collection with ...,0,1997,http://www.tate.org.uk/art/artworks/cozens-a-l...,Male,1717.0,1786.0,,,,,,,,
2,"Cozens, Alexander",118,A Hilly Coast Line,Ink on paper,Purchased as part of the Oppé Collection with ...,0,1997,http://www.tate.org.uk/art/artworks/cozens-a-h...,Male,1717.0,1786.0,,,,,,,,
3,"Cozens, Alexander",118,Trees under a Cliff,Ink on paper,Purchased as part of the Oppé Collection with ...,0,1997,http://www.tate.org.uk/art/artworks/cozens-tre...,Male,1717.0,1786.0,,,,,,,,
4,"Cozens, Alexander",118,"A Rocky Coast, to Right, a Felucca to Left",Watercolour and ink on paper,Purchased as part of the Oppé Collection with ...,0,1997,http://www.tate.org.uk/art/artworks/cozens-a-r...,Male,1717.0,1786.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69196,"Moriyama, Daido",11595,Memory,"Photograph, gelatin silver print on paper",Presented by the artist 2013,2012,2013,http://www.tate.org.uk/art/artworks/moriyama-m...,Male,1938.0,0,,,,,,,,
69197,"Moriyama, Daido",11595,Memory,"Photograph, gelatin silver print on paper",Presented by the artist 2013,2012,2013,http://www.tate.org.uk/art/artworks/moriyama-m...,Male,1938.0,0,,,,,,,,
69198,"Moriyama, Daido",11595,Memory,"Photograph, gelatin silver print on paper",Presented by the artist 2013,2012,2013,http://www.tate.org.uk/art/artworks/moriyama-m...,Male,1938.0,0,,,,,,,,
69199,"Moriyama, Daido",11595,Memory,"Photograph, gelatin silver print on paper",Presented by the artist 2013,2012,2013,http://www.tate.org.uk/art/artworks/moriyama-m...,Male,1938.0,0,,,,,,,,


In [24]:
TateArtists['gender'].value_counts()

Male      2895
Female     521
Name: gender, dtype: int64

In [22]:
MoMaArtists['Gender'].value_counts()

Male          9715
Female        2342
male            17
Non-Binary       2
female           1
Non-binary       1
Name: Gender, dtype: int64

### What are the most represented nationalities?

### MoMa

In [58]:
MoMaArtists['Nationality'].value_counts()

American          5181
German             965
British            860
French             847
Italian            536
                  ... 
Bahamian             1
Bangladeshi          1
Coptic               1
Sierra Leonean       1
Ugandan              1
Name: Nationality, Length: 119, dtype: int64

### Tate

In [59]:
def cleanNationalitiesTate(naz): 
    if ',' in naz: 
        naz = naz.split(',')[1] 
    if naz == 'Blackheath': 
        naz= naz.replace('Blackheath', 'United Kingdom') 
    if naz == 'London': 
        naz= naz.replace('London', 'United Kingdom') 
    if naz == 'Kensington': 
        naz= naz.replace('Kensington', 'United Kingdom') 
    if naz == 'Chung-hua Min-kuo': 
        naz= naz.replace('Chung-hua Min-kuo', 'Taiwan') 
    if naz == 'Solothurn': 
        naz= naz.replace('Solothurn', 'Schweiz') 
    if naz == 'Melmerby': 
        naz= naz.replace('Melmerby', 'United Kingdom') 
    if naz == 'Montserrat': 
        naz= naz.replace('Montserrat', 'España') 
    if naz == 'Canterbury': 
        naz= naz.replace('Canterbury', 'United Kingdom') 
    if naz == 'Staten Island': 
        naz= naz.replace('Staten Island', 'United States') 
    if naz == 'Epsom': 
        naz= naz.replace('Epsom', 'United Kingdom') 
    if naz == 'Plymouth': 
        naz= naz.replace('Plymouth', 'United Kingdom') 
    if naz == 'Wimbledon': 
        naz= naz.replace('Wimbledon', 'United Kingdom') 
    if naz == 'Edinburgh': 
        naz= naz.replace('Edinburgh', 'United Kingdom') 
    if naz == 'Beckington': 
        naz= naz.replace('Beckington', 'United Kingdom') 
    if naz == 'Hertfordshire': 
        naz= naz.replace('Hertfordshire', 'United Kingdom') 
    if naz == 'Isle of Man': 
        naz= naz.replace('Isle of Man', 'United Kingdom') 
    if naz == 'Bristol': 
        naz= naz.replace('Bristol', 'United Kingdom') 
    if naz == 'Liverpool': 
        naz= naz.replace('Liverpool', 'United Kingdom') 
    if naz == 'Braintree': 
        naz= naz.replace('Braintree', 'United Kingdom') 
    if naz == 'Stoke on Trent': 
        naz= naz.replace('Stoke on Trent', 'United Kingdom') 
    if naz == 'Rochdale': 
        naz= naz.replace('Rochdale', 'United Kingdom') 
    if 'D.C.' in naz: 
        naz= naz.replace('D.C.', 'Colombia') 
    if 'Otok' in naz: 
        naz= naz.replace('Otok', 'Hrvatska') 
    if 'Département de la' in naz: 
        naz= naz.replace('Département de la', 'France') 
    if naz == 'Niederschlesien': 
        naz= naz.replace('Niederschlesien', 'Polska') 
    if naz == 'Perth': 
        naz= naz.replace('Perth', 'Australia') 
    if naz == 'Bermondsey': 
        naz= naz.replace('Bermondsey', 'United Kingdom') 
    if naz == 'Egremont': 
        naz= naz.replace('Egremont', 'United Kingdom') 
    if naz == 'Charlotte Amalie': 
        naz= naz.replace('Charlotte Amalie', 'United States') 
    if naz == 'Charlieu': 
        naz= naz.replace('Charlieu', 'France') 
    if naz == 'Stockholm': 
        naz= naz.replace('Stockholm', 'Sverige') 
    if naz == 'Auteuil': 
        naz= naz.replace('Auteuil', 'France') 
 
    if 'Polska' in naz: 
        naz = naz.replace('Polska', 'Poland') 
    if "Yisra'el" in naz: 
        naz = naz.replace("Yisra'el", 'Israel') 
    if 'Deutschland' in naz: 
        naz = naz.replace('Deutschland', 'Germany') 
    if 'Schweiz' in naz: 
        naz = naz.replace('Schweiz', 'Switzerland') 
    if 'Suomi' in naz: 
        naz = naz.replace('Suomi', 'Finland') 
    if 'Zhonghua' in naz: 
        naz = naz.replace('Zhonghua', 'China') 
    if 'Türkiye' in naz: 
        naz = naz.replace('Türkiye', 'Turkey') 
    if 'Al-‘Iraq' in naz: 
        naz = naz.replace('Al-‘Iraq', 'Iraq') 
    if 'België' in naz: 
        naz = naz.replace('België', 'Belgium') 
    if 'Rossiya' in naz: 
        naz = naz.replace('Rossiya', 'Russia') 
    if 'Nihon' in naz: 
        naz = naz.replace('Nihon', 'Japan') 
    if 'Éire' in naz: 
        naz = naz.replace('Éire', 'Ireland') 
    if 'Österreich' in naz: 
        naz = naz.replace('Österreich', 'Austria') 
    if 'Saint Hélier' in naz: 
        naz = naz.replace('Saint Hélier', 'United Kingdom') 
    if 'Ceská Republik' in naz: 
        naz = naz.replace('Ceská Republik', 'Czech Republic') 
    if 'Ukrayina' in naz: 
        naz = naz.replace('Ukrayina', 'Ukraine') 
    if 'Ellás' in naz: 
        naz = naz.replace('Ellás', 'Greece') 
    if 'Latvija ' in naz: 
        naz = naz.replace('Latvija ', 'Latvia') 
    if 'Douglas' in naz: 
        naz = naz.replace('Douglas', 'United Kingdom') 
    if 'România' in naz: 
        naz = naz.replace('România', 'Romania') 
    if 'Sverige' in naz: 
        naz = naz.replace('Sverige', 'Sweden') 
    if 'Bharat' in naz: 
        naz = naz.replace('Bharat', 'India')     
    if 'España' in naz: 
        naz = naz.replace('España', 'Spain')   
    if 'Magyarország' in naz: 
        naz = naz.replace('Magyarország', 'Hungery')  
    if 'Slovenská Republika' in naz: 
        naz = naz.replace('Slovenská Republika', 'Slovenia')  
        
    return naz.strip()

In [60]:
TateArtists = TateArtists[TateArtists['placeOfBirth'].notna()] 
TateArtists["placeOfBirth"] = TateArtists["placeOfBirth"].apply(cleanNationalitiesTate)

In [61]:
TateArtists['placeOfBirth'].value_counts()

United Kingdom    1522
United States      341
France             160
Germany            142
Italia              80
                  ... 
Barbados             1
Nicaragua            1
Iraq                 1
Luxembourg           1
Prathet Thai         1
Name: placeOfBirth, Length: 98, dtype: int64

# Acquisition criteria.

1.  In which years are artists' works mostly acquired?

## Year by year

In [62]:
MoMa['DateAcquired'].value_counts()

1964    12828
2008     7204
1968     6894
0        6682
2001     4170
        ...  
1933       93
1932       18
1929        9
1930        7
1931        3
Name: DateAcquired, Length: 95, dtype: int64

In [63]:
Tate['DateAcquired'].value_counts

<bound method IndexOpsMixin.value_counts of 0        1922
1        1922
2        1922
3        1922
4        1919
         ... 
69196    2013
69197    2013
69198    2013
69199    2013
69200    2013
Name: DateAcquired, Length: 69201, dtype: object>

Let us analyse acquisitions from a larger perspective: not year by year, but for every ten years.

In [64]:
MoMa.to_csv('MoMa.csv') 
with open('MoMa.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    years={} 
    for item in reader: 
        if item['DateAcquired']not in years: 
            years[item['DateAcquired']]= 1 
        else: 
            years[item['DateAcquired']]+= 1 
new_dict={} 
for key in years: 
    key_int=int(key) 
    if key_int in range(1928,1941): 
        if '1930s' not in new_dict.keys(): 
               new_dict['1930s']= years[key] 
        else: 
            new_dict['1930s'] += years[key] 
    if key_int in range(1940,1951): 
        if '1940s' not in new_dict.keys(): 
               new_dict['1940s']= years[key] 
        else: 
            new_dict['1940s'] += years[key] 
     
    if key_int in range(1950,1961): 
        if '1950s' not in new_dict.keys(): 
               new_dict['1950s']= years[key] 
        else: 
            new_dict['1950s'] += years[key] 
     
    if key_int in range(1960,1971): 
        if '1960s' not in new_dict.keys(): 
               new_dict['1960s']= years[key] 
        else: 
            new_dict['1960s'] += years[key] 
     
    if key_int in range(1970,1981): 
        if '1970s' not in new_dict.keys(): 
               new_dict['1970s']= years[key] 
        else: 
            new_dict['1970s'] += years[key] 
    if key_int in range(1980,1991): 
        if '1980s' not in new_dict.keys(): 
               new_dict['1980s']= years[key] 
        else: 
            new_dict['1980s'] += years[key] 
     
    if key_int in range(1990,2001): 
        if '1990s' not in new_dict.keys(): 
               new_dict['1990s']= years[key] 
        else: 
            new_dict['1990s'] += years[key] 
    
    if key_int in range(2000,2011): 
        if '2000s' not in new_dict.keys(): 
               new_dict['2000s']= years[key] 
        else: 
            new_dict['2000s'] += years[key] 
         
     
print('Moma:', new_dict)

Moma: {'1930s': 3318, '1970s': 13868, '1940s': 8274, '1960s': 31950, '1990s': 13332, '1980s': 11497, '1950s': 6846}


In [17]:
Tate.to_csv('Tate.csv') 
with open('Tate.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    years={} 
    for item in reader: 
        if item['DateAcquired']not in years: 
            years[item['DateAcquired']]= 1 
        else: 
            years[item['DateAcquired']]+= 1 
new_dict={} 
for key in years: 
    key_int=int(key) 
    if key_int in range(1928,1941): 
        if '1930s' not in new_dict.keys(): 
               new_dict['1930s']= years[key] 
        else: 
            new_dict['1930s'] += years[key] 
    if key_int in range(1940,1951): 
        if '1940s' not in new_dict.keys(): 
               new_dict['1940s']= years[key] 
        else: 
            new_dict['1940s'] += years[key] 
     
    if key_int in range(1950,1961): 
        if '1950s' not in new_dict.keys(): 
               new_dict['1950s']= years[key] 
        else: 
            new_dict['1950s'] += years[key] 
     
    if key_int in range(1960,1971): 
        if '1960s' not in new_dict.keys(): 
               new_dict['1960s']= years[key] 
        else: 
            new_dict['1960s'] += years[key] 
     
    if key_int in range(1970,1981): 
        if '1970s' not in new_dict.keys(): 
               new_dict['1970s']= years[key] 
        else: 
            new_dict['1970s'] += years[key] 
    if key_int in range(1980,1991): 
        if '1980s' not in new_dict.keys(): 
               new_dict['1980s']= years[key] 
        else: 
            new_dict['1980s'] += years[key] 
     
    if key_int in range(1990,2001): 
        if '1990s' not in new_dict.keys(): 
               new_dict['1990s']= years[key] 
        else: 
            new_dict['1990s'] += years[key] 
    
    if key_int in range(2000,2011): 
        if '2000s' not in new_dict.keys(): 
               new_dict['2000s']= years[key] 
        else: 
            new_dict['2000s'] += years[key] 
         
     
print('Tate:', new_dict)

Tate: {'1930s': 788, '1940s': 776, '1960s': 1048, '1950s': 633, '1970s': 6853, '2000s': 5345, '1980s': 5538, '1990s': 6693}


## Gender Gap <br> 
Does it decrease? 
When?

### MoMa

Number of female and male artists acquired every ten years.

In [66]:
with open('MoMa.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    gender={'1930s': {'Male': 0, 'Female': 0}, '1940s': {'Male': 0, 'Female': 0}, '1950s': {'Male': 0, 'Female': 0}, '1960s': {'Male': 0, 'Female': 0}, '1970s': {'Male': 0, 'Female': 0}, '1980s': {'Male': 0, 'Female': 0}, '1990s': {'Male': 0, 'Female': 0}, '2000s': {'Male': 0, 'Female': 0}} 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if (item['Gender'] == 'Female'):  
                gender['1930s']['Female'] += 1 
            else: 
                gender['1930s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if (item['Gender'] == 'Female'):  
                gender['1940s']['Female'] += 1 
            else: 
                gender['1940s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if (item['Gender'] == 'Female'):  
                gender['1950s']['Female'] += 1 
            else: 
                gender['1950s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if (item['Gender'] == 'Female'):  
                gender['1960s']['Female'] += 1 
            else: 
                gender['1960s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if (item['Gender'] == 'Female'):  
                gender['1970s']['Female'] += 1 
            else: 
                gender['1970s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if (item['Gender'] == 'Female'):  
                gender['1980s']['Female'] += 1 
            else: 
                gender['1980s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if (item['Gender'] == 'Female'):  
                gender['1990s']['Female'] += 1 
            else: 
                gender['1990s']['Male'] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if (item['Gender'] == 'Female'):  
                gender['2000s']['Female'] += 1 
            else: 
                gender['2000s']['Male'] += 1 
    print(gender)

{'1930s': {'Male': 3082, 'Female': 236}, '1940s': {'Male': 7707, 'Female': 567}, '1950s': {'Male': 6476, 'Female': 370}, '1960s': {'Male': 30701, 'Female': 1249}, '1970s': {'Male': 12245, 'Female': 1623}, '1980s': {'Male': 10230, 'Female': 1267}, '1990s': {'Male': 10522, 'Female': 2810}, '2000s': {'Male': 21850, 'Female': 5011}}


Percentage of female-male artists every ten years

In [67]:
for el in gender: 
    tot = gender[el]['Male'] + gender[el]['Female'] 
    percentage = (gender[el]['Male']/tot)*100 
    print (el, 'Male', round(percentage),'%', 'Female', round(100-percentage),'%')

1930s Male 93 % Female 7 %
1940s Male 93 % Female 7 %
1950s Male 95 % Female 5 %
1960s Male 96 % Female 4 %
1970s Male 88 % Female 12 %
1980s Male 89 % Female 11 %
1990s Male 79 % Female 21 %
2000s Male 81 % Female 19 %


## Tate

Number of female and male artists acquired every ten years.

In [68]:
with open('Tate.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    gender={'1930s': {'Male': 0, 'Female': 0}, '1940s': {'Male': 0, 'Female': 0}, '1950s': {'Male': 0, 'Female': 0}, '1960s': {'Male': 0, 'Female': 0}, '1970s': {'Male': 0, 'Female': 0}, '1980s': {'Male': 0, 'Female': 0}, '1990s': {'Male': 0, 'Female': 0}} 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if (item['Gender'] == 'Female'):  
                gender['1930s']['Female'] += 1 
            else: 
                gender['1930s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if (item['Gender'] == 'Female'):  
                gender['1940s']['Female'] += 1 
            else: 
                gender['1940s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if (item['Gender'] == 'Female'):  
                gender['1950s']['Female'] += 1 
            else: 
                gender['1950s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if (item['Gender'] == 'Female'):  
                gender['1960s']['Female'] += 1 
            else: 
                gender['1960s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if (item['Gender'] == 'Female'):  
                gender['1970s']['Female'] += 1 
            else: 
                gender['1970s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if (item['Gender'] == 'Female'):  
                gender['1980s']['Female'] += 1 
            else: 
                gender['1980s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if (item['Gender'] == 'Female'):  
                gender['1990s']['Female'] += 1 
            else: 
                gender['1990s']['Male'] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if (item['Gender'] == 'Female'):  
                gender['2000s']['Female'] += 1 
            else: 
                gender['2000s']['Male'] += 1 
    print(gender)

{'1930s': {'Male': 722, 'Female': 66}, '1940s': {'Male': 700, 'Female': 76}, '1950s': {'Male': 598, 'Female': 35}, '1960s': {'Male': 967, 'Female': 81}, '1970s': {'Male': 6398, 'Female': 455}, '1980s': {'Male': 5202, 'Female': 336}, '1990s': {'Male': 6025, 'Female': 668}}


Percentage of female-male artists every ten years

In [69]:
for el in gender: 
    tot = gender[el]['Male'] + gender[el]['Female'] 
    percentage = (gender[el]['Male']/tot)*100 
    print (el, 'Male', round(percentage),'%', 'Female', round(100-percentage),'%')

1930s Male 92 % Female 8 %
1940s Male 90 % Female 10 %
1950s Male 94 % Female 6 %
1960s Male 92 % Female 8 %
1970s Male 93 % Female 7 %
1980s Male 94 % Female 6 %
1990s Male 90 % Female 10 %


## Nationalities 

In which years artists' nationalities more influent on the selection?

## MoMA

For every ten years, we count the nationalities' frequency.

In [70]:
MoMaArtists.rename(columns = {'ConstituentID':'Id'}, inplace = True)

In [71]:
def cleanDates(date): 
    if '.' in date: 
        date = date.split('.')[0] 
    return date

In [72]:
MoMaNationalities = pd.merge(MoMaArtists,MoMa[['Id', 'DateAcquired']],on='Id', how='left') 
MoMaNationalities.fillna(value='0', inplace=True) 
MoMaNationalities["DateAcquired"] = MoMaNationalities["DateAcquired"].astype(str) 
MoMaNationalities["DateAcquired"] = MoMaNationalities["DateAcquired"].apply(cleanDates) 
MoMaNationalities = MoMaNationalities.drop_duplicates(subset='DisplayName', keep="first") 
MoMaNationalities.to_csv('MoMaNationalities.csv')

In [73]:
from collections import defaultdict  
 
with open('MoMaNationalities.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    nationalities = defaultdict(dict) 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if item['Nationality'] not in nationalities['1930s']: 
                nationalities['1930s'][item['Nationality']] = 1 
            else: 
                nationalities['1930s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if item['Nationality'] not in nationalities['1940s']: 
                nationalities['1940s'][item['Nationality']] = 1 
            else: 
                nationalities['1940s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if item['Nationality'] not in nationalities['1950s']: 
                nationalities['1950s'][item['Nationality']] = 1 
            else: 
                nationalities['1950s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if item['Nationality'] not in nationalities['1960s']: 
                nationalities['1960s'][item['Nationality']] = 1 
            else: 
                nationalities['1960s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if item['Nationality'] not in nationalities['1970s']: 
                nationalities['1970s'][item['Nationality']] = 1 
            else: 
                nationalities['1970s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if item['Nationality'] not in nationalities['1980s']: 
                nationalities['1980s'][item['Nationality']] = 1 
            else: 
                nationalities['1980s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if item['Nationality'] not in nationalities['1990s']: 
                nationalities['1990s'][item['Nationality']] = 1 
            else: 
                nationalities['1990s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if item['Nationality'] not in nationalities['2000s']: 
                nationalities['2000s'][item['Nationality']] = 1 
            else: 
                nationalities['2000s'][item['Nationality']] += 1 
         
                 
 
print(nationalities)

defaultdict(<class 'dict'>, {'1980s': {'American': 634, 'Danish': 10, 'Estonian': 2, 'Swedish': 12, 'French': 91, 'Finnish': 8, 'Romanian': 2, 'Israeli': 8, 'Dutch': 31, 'Norwegian': 4, 'British': 90, 'Austrian': 24, 'Japanese': 60, 'German': 95, 'Russian': 23, 'Swiss': 44, 'Spanish': 12, 'Italian': 32, 'Congolese': 1, 'Brazilian': 2, 'Hungarian': 7, 'Polish': 17, 'Canadian': 33, 'Icelandic': 1, 'Australian': 6, 'Croatian': 3, 'Slovak': 1, 'Cuban': 4, 'Mexican': 9, 'Greek': 2, 'Chinese': 1, 'Belgian': 8, 'Czech': 10, 'Nationality unknown': 5, 'Venezuelan': 1, 'Portuguese': 1, 'Peruvian': 3, 'Indian': 1, 'Moroccan': 1, '0': 14, 'Latvian': 2, 'Irish': 1, 'Native American': 1, 'Chilean': 1, 'Colombian': 1, 'Puerto Rican': 1}, '1960s': {'Spanish': 21, '0': 14, 'American': 619, 'French': 142, 'Japanese': 63, 'British': 77, 'Finnish': 4, 'Argentine': 38, 'Kuwaiti': 1, 'German': 143, 'Italian': 91, 'Nationality unknown': 36, 'Chilean': 20, 'Swiss': 28, 'Czech': 7, 'Danish': 12, 'Brazilian': 2

In [27]:
display(Tate_artists)

Unnamed: 0,Id,name,gender,placeOfBirth
0,10093,"Abakanowicz, Magdalena",Female,Polska
1,0,"Abbey, Edwin Austin",Male,"Philadelphia, United States"
2,2756,"Abbott, Berenice",Female,"Springfield, United States"
3,1,"Abbott, Lemuel Francis",Male,"Leicestershire, United Kingdom"
4,622,"Abrahams, Ivor",Male,"Wigan, United Kingdom"
...,...,...,...,...
3527,12542,"Zorio, Gilberto",Male,"Andorno Micca, Italia"
3528,2186,"Zox, Larry",Male,"Des Moines, United States"
3529,621,"Zuccarelli, Francesco",Male,Italia
3530,2187,"Zuloaga, Ignacio",Male,España


In [22]:
TateNationalities = pd.merge(Tate_artists,Tate[['Id', 'DateAcquired']], on = 'Id', how = 'left')
#TateNationalities = TateNationalities[TateNationalities['placeOfBirth'].notna()]
TateNationalities = TateNationalities.drop_duplicates(subset='name', keep = "first")
#TateNationalities["placeOfBirth"] = TateNationalities["placeOfBirth"].apply(cleanNationalitiesTate)
TateNationalities.fillna(value='0', inplace=True)
TateNationalities["DateAcquired"] = TateNationalities["DateAcquired"].astype(str)
#TateNationalities["DateAcquired"] = TateNationalities["DateAcquired"].apply(cleanDates)
TateNationalities.rename(columns = {'placeOfBirth':'Nationality'}, inplace = True)
TateNationalities.to_csv("TateNationalities.csv")

In [23]:
from collections import defaultdict 

with open('TateNationalities.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    nationalities = defaultdict(dict)
    for item in reader:
        if int(item['DateAcquired']) in range (1928,1941):
            if item['Nationality'] not in nationalities['1930s']:
                nationalities['1930s'][item['Nationality']] = 1
            else:
                nationalities['1930s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1940,1951):
            if item['Nationality'] not in nationalities['1940s']:
                nationalities['1940s'][item['Nationality']] = 1
            else:
                nationalities['1940s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1950,1961):
            if item['Nationality'] not in nationalities['1950s']:
                nationalities['1950s'][item['Nationality']] = 1
            else:
                nationalities['1950s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1960,1971):
            if item['Nationality'] not in nationalities['1960s']:
                nationalities['1960s'][item['Nationality']] = 1
            else:
                nationalities['1960s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1970,1981):
            if item['Nationality'] not in nationalities['1970s']:
                nationalities['1970s'][item['Nationality']] = 1
            else:
                nationalities['1970s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1980,1991):
            if item['Nationality'] not in nationalities['1980s']:
                nationalities['1980s'][item['Nationality']] = 1
            else:
                nationalities['1980s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1990,2001):
            if item['Nationality'] not in nationalities['1990s']:
                nationalities['1990s'][item['Nationality']] = 1
            else:
                nationalities['1990s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (2000,2011): 
            if item['Nationality'] not in nationalities['2000s']: 
                nationalities['2000s'][item['Nationality']] = 1 
            else: 
                nationalities['2000s'][item['Nationality']] += 1 
        
                

print(nationalities)

defaultdict(<class 'dict'>, {'2000s': {'Poland': 9, 'United States': 99, 'Germany': 39, 'Finland': 1, 'China': 10, 'Iraq': 1, 'Russia': 2, 'United Kingdom': 150, 'Belgium': 5, 'México': 7, 'Perú': 3, 'Ukraine': 3, 'Îran': 5, 'Italia': 12, 'Venezuela': 4, 'Turkey': 2, '0': 29, 'France': 14, 'Israel': 5, 'Brasil': 17, 'Jugoslavija': 2, 'Uganda': 1, 'Norge': 1, 'Nederland': 4, 'South Africa': 6, 'Romania': 2, 'Argentina': 8, 'Cuba': 4, 'Canada': 9, 'Ireland': 4, 'Greece': 2, 'Colombia': 5, 'Latvija': 1, 'Sweden': 1, 'Chile': 2, 'Czech Republica': 4, 'Danmark': 5, 'Spain': 5, 'Austria': 4, 'Switzerland': 6, 'Pakistan': 1, 'Mehoz': 2, 'India': 3, 'Japan': 10, 'Bahamas': 1, 'Hungery': 3, 'Bangladesh': 1, 'Hrvatska': 2, 'Slovenia': 2, "Taehan Min'guk": 1, 'Zimbabwe': 1, 'Sri Lanka': 1, 'New Zealand': 4, 'Luxembourg': 1, 'Ísland': 1, 'Pilipinas': 1, 'Lietuva': 2, 'Australia': 2, 'Al-Lubnan': 3, 'Kenya': 1, "Al-Jaza'ir": 1, 'Lao': 1, 'Malta': 1, 'Panamá': 1, 'Misr': 1, 'Portugal': 3, 'Shqipëria