# Collage, the shapes of arte
This notebook was created by [Francesca Borriello](https://github.com/Fran-cesca), [Lorenza Pierucci](https://github.com/LorenzaPierucci) and [Laura Travaglini](https://github.com/lauratravaglini) as part of their final project for the [Digital Pubishing and Electronic Storytelling](https://www.unibo.it/it/didattica/insegnamenti/insegnamento/2021/443749) course at the University of Bologna (academic year 221/2022).

# About the project
Starting from the datasets made publicly available by the NY Museum of Modern Art (**MoMA**) and by the **Tate galleries**, *Collage, the shapes of art* analyses artworks acquisitions throughout the years, with the aim of understanding which criteria brought together them Museums' collection in a historical and social perspective. 
Art and history of art are no sealed compartments: they are heavily inter-dependent with social, political, economic factors, which in turn influence our very perception of what art is. 
Cultural institutions – museums in particular – play a fundamental role in this intertwined dynamics: through their selection, they have the potential to shape the public understanding of arts and its modifications throughout time.  
In some way, what makes into museums makes into history of art and viceversa. 
From these considerations stems our analysis: how do external (social, political, economic) factors influence the perception of art and its history? 
A way to investigate it is by looking at the greatest and most representative museums around the world, and at their acquisition policies and campaigns in particular. 

## Our key questions: 
1.<br>
2.<br>
3.<br>



# 1. Creating dataframes.
After importing all the necessary libraries, we can read our Museums' online CSV files containing information about artworks and artists as `Pandas Dataframes` in order to better manipulate and analyse them.

## Import

In [1]:
import pandas as pd
import csv
import re
from collections import defaultdict
from rdflib import Namespace , Literal , URIRef
from rdflib.namespace import RDF , RDFS
import ssl
from json import JSONDecodeError
from qwikidata.sparql import return_sparql_query_results 

For both Museums, we gather data directly from the remote files available on their Github pages ([MoMA](https://github.com/MuseumofModernArt/collection), [Tate](https://github.com/tategallery/collection)). 
In particular, we work on two separate datasets: one carries information about **artworks** (their title, date, acquisition year atc.), the other provides data on the **artists** (their name, nationality, gender etc.).
In addition, we merge these two dataframes on some selected columns to create a new one for analysing **acquisition**-related issues.

# MoMA

In [85]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')
MoMA_artworks = spreadsheet[['ConstituentID','Title','Date', 'DateAcquired']]
MoMA_artworks = MoMA_artworks.rename(columns = {'ConstituentID':'Id'})
MoMA_artists = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artists.csv')
MoMA_artists = MoMA_artists[['ConstituentID', 'DisplayName', 'Nationality', 'Gender', 'Wiki QID']]
MoMA_artists['ConstituentID'] = MoMA_artists['ConstituentID'].astype(str)
MoMA_artists = MoMA_artists.rename(columns = {'ConstituentID':'Id', 'DisplayName':'Name'})

In [115]:
MoMA_acquisitions = pd.merge(MoMA_artists, MoMA_artworks[['Id', 'DateAcquired']], on='Id', how='left')
MoMA_acquisitions = MoMA_acquisitions.drop_duplicates(subset='Name', keep="first")

# Tate

In [116]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artwork_data.csv')
Tate_artworks = spreadsheet[['artistId','title', 'year', 'acquisitionYear']]
Tate_artworks = Tate_artworks.rename(columns = {'artistId':'Id', 'acquisitionYear':'DateAcquired', 'year':'Date', 'title':'Title'})
Tate_artworks['Id'] = Tate_artworks['Id'].astype(str)
Tate_artists = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artist_data.csv')
Tate_artists = Tate_artists[['id', 'name','placeOfBirth', 'gender']]
Tate_artists = Tate_artists.rename(columns = {'id':'Id', 'name':'Name', 'gender':'Gender'})
Tate_artists["Id"] = Tate_artists["Id"].astype(str)

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


In [117]:
Tate_acquisitions = pd.merge(Tate_artists, Tate_artworks[['Id', 'DateAcquired']], on='Id', how='left')
Tate_acquisitions = Tate_acquisitions.drop_duplicates(subset='Name', keep="first")

# 2. Cleaning data

Identifying and fixing incoherent, corrupt or defective data is an essential process for ensuring a satisfactory threshold of reliability to any further analysis. Let us delve into it.

## MoMA

### Missing values
First of all, let us deal with missing values, substituting them with zeros to better handle them.

In [118]:
MoMA_artists.fillna(value='0', inplace=True)
MoMA_artworks.fillna(value='0', inplace=True)
MoMA_acquisitions.fillna(value='0', inplace=True)

### Dates
Artworks acquisition dates are in the form `YYYY-MM-DD`.<br>
For the sake of our analysis, we extract the year and transform it into an integer.

In [119]:
def cleanAcquisitionDatesMoMA(date):
    if '-' in date:
        date = date.split('-')[0]
    return date

In [120]:
MoMA_artworks["DateAcquired"] = MoMA_artworks["DateAcquired"].apply(cleanAcquisitionDatesMoMA)
MoMA_acquisitions["DateAcquired"] = MoMA_acquisitions["DateAcquired"].apply(cleanAcquisitionDatesMoMA)

In MoMA database, artworks' creation dates are mostly already represented by just one year.<br>
Nevertheless, there are some exceptions: years separated by a slash or a dash and strings of any kind, such as '(1950).  (Prints executed 1948', '(1883, published 1897)' or '(1911, dated 1912, published c. 1917)'.

We wrote a code to extract the year through a regex only matching the first sequence of four digits for each value.

In [121]:
def cleanDatesMoMA(date):
    if '-' in date:
        splitted = date.split('-')
        date = ' '.join(splitted) 
    if '/' in date:
        splitted = date.split('/')
        date = ' '.join(splitted) 
    if ',' in date:
        splitted = date.split(',')
        date = ' '.join(splitted) 
    if '.' in date:
        splitted = date.split('.')
        date = ' '.join(splitted) 
        
    x = re.search("\d{4}", date)
    if x:
        date = x.group()
    else:
        date = '0'
    return date

In [122]:
MoMA_artworks["Date"] = MoMA_artworks["Date"].astype(str)
MoMA_artworks["Date"] = MoMA_artworks["Date"].apply(cleanDatesMoMA)

## Tate

### Missing values
Again, let us replace missing values and strings indicating lack of information with zeros, thus obtaining a dataframe filled with coherent data.

In [123]:
Tate_artists.fillna(value='0', inplace=True)
Tate_artworks.fillna(value='0', inplace=True)
Tate_acquisitions.fillna(value='0', inplace=True)
Tate_artworks['Date'].replace(to_replace=['no date','c'], value='0', inplace= True)

### Dates
Artworks acquisition dates and artworks creation dates are represented as floats (e.g. 1997.0). We convert them to integers.

In [124]:
def cleanDatesTate(date):
    if '.' in date:
        date = date.split('.')[0] 
    return date

In [125]:
Tate_artworks["Date"] = Tate_artworks["Date"].astype(str)
Tate_artworks["Date"] = Tate_artworks["Date"].apply(cleanDatesTate)
Tate_artworks["DateAcquired"] = Tate_artworks["DateAcquired"].astype(str)
Tate_artworks["DateAcquired"] = Tate_artworks["DateAcquired"].apply(cleanDatesTate)
Tate_acquisitions["DateAcquired"] = Tate_acquisitions["DateAcquired"].astype(str)
Tate_acquisitions["DateAcquired"] = Tate_acquisitions["DateAcquired"].apply(cleanDatesTate)

### Artists' names
In both Tate dataframes artists' names are in the form `Surname, Name`. For clarity purposes, we decided to normalise them as `Name Surname`, and wrote a code to do so. 

In [126]:
def cleanArtistsNames(name):
    if ',' in name:
        name= name.split(',')
        name[0], name[1] = name[1], name[0]
        name = ' '.join(name)
    return name.strip()

In [127]:
Tate_artists["Name"] = Tate_artists["Name"].apply(cleanArtistsNames)
Tate_acquisitions["Name"] = Tate_acquisitions["Name"].apply(cleanArtistsNames)

### Nationalities
In Tate's dataframe about artists, nationalities are often in the form `city, country`(e.g. 'Philadelphia, United States') and sometimes just indicate a city (e.g. 'Wimbledon'). <br>
Finally, all countries' names are in their original form (e.g. 'Nihon' for 'Japan'). <br>
We normalised it indicating, for each artist, its country of origin, in English. <br>
We looked for all diverging values and replaced them one by one through a script.

In [128]:
def cleanNationalitiesTate(naz): 
    if ',' in naz: 
        naz = naz.split(',')[1] 
    if naz == 'Blackheath': 
        naz= naz.replace('Blackheath', 'United Kingdom') 
    if naz == 'London': 
        naz= naz.replace('London', 'United Kingdom') 
    if naz == 'Kensington': 
        naz= naz.replace('Kensington', 'United Kingdom') 
    if naz == 'Chung-hua Min-kuo': 
        naz= naz.replace('Chung-hua Min-kuo', 'Taiwan') 
    if naz == 'Solothurn': 
        naz= naz.replace('Solothurn', 'Schweiz') 
    if naz == 'Melmerby': 
        naz= naz.replace('Melmerby', 'United Kingdom') 
    if naz == 'Montserrat': 
        naz= naz.replace('Montserrat', 'España') 
    if naz == 'Canterbury': 
        naz= naz.replace('Canterbury', 'United Kingdom') 
    if naz == 'Staten Island': 
        naz= naz.replace('Staten Island', 'United States') 
    if naz == 'Epsom': 
        naz= naz.replace('Epsom', 'United Kingdom') 
    if naz == 'Plymouth': 
        naz= naz.replace('Plymouth', 'United Kingdom') 
    if naz == 'Wimbledon': 
        naz= naz.replace('Wimbledon', 'United Kingdom') 
    if naz == 'Edinburgh': 
        naz= naz.replace('Edinburgh', 'United Kingdom') 
    if naz == 'Beckington': 
        naz= naz.replace('Beckington', 'United Kingdom') 
    if naz == 'Hertfordshire': 
        naz= naz.replace('Hertfordshire', 'United Kingdom') 
    if naz == 'Isle of Man': 
        naz= naz.replace('Isle of Man', 'United Kingdom') 
    if naz == 'Bristol': 
        naz= naz.replace('Bristol', 'United Kingdom') 
    if naz == 'Liverpool': 
        naz= naz.replace('Liverpool', 'United Kingdom') 
    if naz == 'Braintree': 
        naz= naz.replace('Braintree', 'United Kingdom') 
    if naz == 'Stoke on Trent': 
        naz= naz.replace('Stoke on Trent', 'United Kingdom') 
    if naz == 'Rochdale': 
        naz= naz.replace('Rochdale', 'United Kingdom') 
    if 'D.C.' in naz: 
        naz= naz.replace('D.C.', 'Colombia') 
    if 'Otok' in naz: 
        naz= naz.replace('Otok', 'Hrvatska') 
    if 'Département de la' in naz: 
        naz= naz.replace('Département de la', 'France') 
    if naz == 'Niederschlesien': 
        naz= naz.replace('Niederschlesien', 'Polska') 
    if naz == 'Perth': 
        naz= naz.replace('Perth', 'Australia') 
    if naz == 'Bermondsey': 
        naz= naz.replace('Bermondsey', 'United Kingdom') 
    if naz == 'Egremont': 
        naz= naz.replace('Egremont', 'United Kingdom') 
    if naz == 'Charlotte Amalie': 
        naz= naz.replace('Charlotte Amalie', 'United States') 
    if naz == 'Charlieu': 
        naz= naz.replace('Charlieu', 'France') 
    if naz == 'Stockholm': 
        naz= naz.replace('Stockholm', 'Sverige') 
    if naz == 'Auteuil': 
        naz= naz.replace('Auteuil', 'France') 
 
    if 'Polska' in naz: 
        naz = naz.replace('Polska', 'Poland') 
    if "Yisra'el" in naz: 
        naz = naz.replace("Yisra'el", 'Israel') 
    if 'Deutschland' in naz: 
        naz = naz.replace('Deutschland', 'Germany') 
    if 'Schweiz' in naz: 
        naz = naz.replace('Schweiz', 'Switzerland') 
    if 'Suomi' in naz: 
        naz = naz.replace('Suomi', 'Finland') 
    if 'Zhonghua' in naz: 
        naz = naz.replace('Zhonghua', 'China') 
    if 'Türkiye' in naz: 
        naz = naz.replace('Türkiye', 'Turkey') 
    if 'Al-‘Iraq' in naz: 
        naz = naz.replace('Al-‘Iraq', 'Iraq') 
    if 'België' in naz: 
        naz = naz.replace('België', 'Belgium') 
    if 'Rossiya' in naz: 
        naz = naz.replace('Rossiya', 'Russia') 
    if 'Nihon' in naz: 
        naz = naz.replace('Nihon', 'Japan') 
    if 'Éire' in naz: 
        naz = naz.replace('Éire', 'Ireland') 
    if 'Österreich' in naz: 
        naz = naz.replace('Österreich', 'Austria') 
    if 'Saint Hélier' in naz: 
        naz = naz.replace('Saint Hélier', 'United Kingdom') 
    if 'Ceská Republik' in naz: 
        naz = naz.replace('Ceská Republik', 'Czech Republic') 
    if 'Ukrayina' in naz: 
        naz = naz.replace('Ukrayina', 'Ukraine') 
    if 'Ellás' in naz: 
        naz = naz.replace('Ellás', 'Greece') 
    if 'Latvija ' in naz: 
        naz = naz.replace('Latvija ', 'Latvia') 
    if 'Douglas' in naz: 
        naz = naz.replace('Douglas', 'United Kingdom') 
    if 'România' in naz: 
        naz = naz.replace('România', 'Romania') 
    if 'Sverige' in naz: 
        naz = naz.replace('Sverige', 'Sweden') 
    if 'Bharat' in naz: 
        naz = naz.replace('Bharat', 'India')     
    if 'España' in naz: 
        naz = naz.replace('España', 'Spain')   
    if 'Magyarország' in naz: 
        naz = naz.replace('Magyarország', 'Hungery')  
    if 'Slovenská Republika' in naz: 
        naz = naz.replace('Slovenská Republika', 'Slovenia')  
        
    return naz.strip()

In [129]:
Tate_artists["placeOfBirth"] = Tate_artists["placeOfBirth"].apply(cleanNationalitiesTate)
Tate_acquisitions["placeOfBirth"] = Tate_acquisitions["placeOfBirth"].apply(cleanNationalitiesTate)

# Exploration
We can now start exploring our Museums, to get to know them better through available data.

## How many artworks?

In [28]:
museums=[MoMA_artworks, Tate_artworks]
names = ['MoMA','Tate']
for museum in museums:
    selected_rows = museum[~museum['Title'].isnull()]
    name = names.pop(0)
    print("Total artworks at", name, ":", len(selected_rows.index))

Total artworks at MoMA : 140848
Total artworks at Tate : 69201


## When do artworks date back?

In [31]:
museums=[MoMA_artworks, Tate_artworks]
names = ['MoMA','Tate']
for museum in museums:
    museum["Date"] = museum["Date"].astype(int)
    museum.sort_values(by=['Date'], inplace=True)
    museumWithoutZeros = museum[museum['Date'] != 0]
    firstDate = museumWithoutZeros['Date'].iat[0]
    lastDate = museumWithoutZeros['Date'].iat[-1]
    name = names.pop(0)
    print("Most ancient artwork at", name, "dates back to",firstDate )
    print("Most recent artwork at", name, "dates back to",lastDate )    

Most ancient artwork at MoMA dates back to 1768
Most recent artwork at MoMA dates back to 2022
Most ancient artwork at Tate dates back to 1545
Most recent artwork at Tate dates back to 2012


## Artists

For examining artist-related issues, we rely on the two CSV files from the Museums containing information about them, which we already transformed into dataframes (Tate_artists and MoMA_artists). <br> 
In doing so, we avoid duplicates (the same artists may have more than one artwork in the same museum).

### How many artists?

In [32]:
print('Total number of artists at MoMA', len(MoMA_artists))

Total number of artists at MoMA 15243


In [33]:
print('Total number of artists at Tate', len(Tate_artists))

Total number of artists at Tate 3532


### Artists' gender: which is the most represented overall?

### Tate

### Wikidata integration

Since Tate dataset lacks of some gender information, we took advantage of Wikidata in order to integrate it through some remote SPARQL queries.

1. We create a subset of the dataframe containing all and only the the rows in which the gender information is missing.

In [35]:
Tate_to_integrate = Tate_artists[Tate_artists['Gender']== '0']

2. We exclude some entities to which a gender cannot be attributed (e.g., collective or anonymous artists).

In [37]:
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Anonymous']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'born 1945; Mel Ramsden Art & Language (Michael Baldwin  born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'born 1939; David Bainbridge Art & Language (Terry Atkinson  born 1941; Michael Baldwin  born 1945; Harold Hurrell  born']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'born 1939; Michael Baldwin Art & Language (Terry Atkinson  born 1945)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != '1939-1993; Mel Ramsden Art & Language (Ian Burn  born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Atlas Group']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Black Audio Film Collective (John Akomfrah; Reece Auguis; Edward George; Lina Gopaul; Avril Johnson; David Lawson; Trevo']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Fionnuala and Leslie Boyd and Evans']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British (?) School']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British (?) School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 16th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 17th or 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 20th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Chinese School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'French School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'French School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'International Local (Sarah Charlesworth; Joseph Kosuth; Anthony McCall)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Italian or German (?) School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Langlands and Bell, Ben and Nikki']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Italian or German (?) School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Ben and Nikki Langlands and Bell']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Lucy and Eegyudluk']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'France) M/M (Paris']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'T R Uthco (Doug Hall born 1944, Diane Andrews Hall born 1945, Jody Procter 1944-1998)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Art & Language (Ian Burn, 1939-1993; Mel Ramsden, born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Unknown']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Marc Voge) Young-Hae Chang Heavy Industries (Young-Hae Chang']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'K.O.S.']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Diane Andrews Hall born 1945 T R Uthco (Doug Hall born 1944  Jody Procter 1944-1998)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Mel Ramsden Art & Language (Ian Burn  born 1944)']

3. We proceed to search the artists' **Wikidata entities**: this will then allow us to look up for their gender.<br>
The SPARQL query you can read below searches for human individuals with a specific artistic occupation (photograpers, artists, graphic artists, painters, video artists, sculptors and visual artists). The `{}` placeholder will be replaced by the artist's name from the dataframe via python `format()` method.<br>
We directly apply the query to our dataframe through a function taking advantage of `qwikidata` is a Python package allowing to interact with Wikidata, and we insert the result in a new colum created on the fly, named `Artist Entity`.<br>
Since Wikidata SPARQL endpoint does not support heavy queries, we search for one occupation at a time and create a CSV file for all the artists for which the corresponding wikidata entity was found (e.g., all photographers).<br>
We then continue the research on the rest of the dataframe, from which we remove the rows for which we already found the wikidata entity. We work on copies to avoid compromising the original dataframe.<br>
Finally, we integrate all profession-specific CSV files into one dataframe, which contains all the rows without gender information with the corresponding Wikidata entity (the information which could not be found on Wikidata was integrated manually).

In [74]:
#Define the SPARQL query.
artists_genders_from_ids = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?artist
WHERE {{
    ?artist wdt:P31 wd:Q5 .
    ?artist wdt:P106 ?occupation
                  FILTER (?occupation IN (wd:Q1281618) ) 
    ?artist rdfs:label ?o
    FILTER regex(?o, \"^{}$\" )
            FILTER (langMatches(lang(?o), "EN")).
}}

"""

In [75]:
# Define the function for applying the query to the dataframe and returning the wanted results.
def find_artists_genders_from_ids(name):
    query = artists_genders_from_ids.format(name.strip())
    res = return_sparql_query_results(query_string=query)
   
    try:
        wdt_uri = res['results']['bindings'][0]['artist']['value']
    except (IndexError, KeyError):
        return ""
    return wdt_uri.split("/")[-1]

In [77]:
# Apply the query,
Tate_to_integrate["Artist Entity"] = Tate_to_integrate["name"].apply(find_artists_genders_from_ids)

In [80]:
# Create a CSV file for profession, e.g., photographers, visual artists etc. 
copy = Tate_to_integrate.copy(deep=True) 
sculptors =  copy[copy['Artist Entity']!= ''] 
sculptors.to_csv('Sculptors.csv')

In [None]:
# Apply the query iteratively, changing the occupation
to_integrate =  copy[copy['Artist Entity']== ''] 
to_integrate["Artist Entity"] = to_integrate["name"].apply(find_artists_genders_from_ids)

In [39]:
# Integrate all CSV files in one dataframe
Tate_artists_integrated = pd.concat(map(pd.read_csv, ['Artists.csv', 'Photographers.csv', 'Videoartists.csv', 'Graphicartists.csv', 'Painters.csv', 'Integratedmanually.csv']), ignore_index=True)

In [40]:
display(Tate_artists_integrated)

Unnamed: 0,Id,Name,Gender,PlaceOfBirth,Artist Entity
0,657,Shusaku Arakawa,0,"Nagoya, Nihon",Q478264
1,14424,Kiyohiko Komura,0,0,Q64826662
2,16926,Len Lye,0,0,Q1288566
3,5672,Vladimir Mayakovsky,0,0,Q132964
4,18266,Mithu Sen,0,0,Q43136922
...,...,...,...,...,...
77,17129,Vara,0,0,0
78,2107,Dee Villers,0,0,0
79,17732,Francis Vivares,0,0,Q5493438
80,11825,Imke Wagener,0,0,0


4. Once we have all Wikidata entities, we interrogate Wikidata to retrieve artists' gender.<br>
We apply the following SPARQL query to our new dataframe and add the retrieved gender information in a new `gender` column, created on the fly (the information which could not be found on Wikidata was integrated manually).

In [12]:
#Define the SPARQL query.
artists_genders = """ 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX wd: <http://www.wikidata.org/entity/> 
SELECT DISTINCT (SAMPLE(?genderLabel) AS ?genderL)
WHERE {{ 
     wd:{}  wdt:P21 ?gender . 
     ?gender rdfs:label ?genderLabel
    FILTER (langMatches(lang(?genderLabel), "EN"))
}} 
"""

In [None]:
# Define the function for applying the query to the dataframe and returning the wanted results.
def find_artists_genders(wikiId): 
    query = artists_genders.format(wikiId.strip()) 
    res = return_sparql_query_results(query_string=query) 
    print(query) 
    try: 
        gender= res['results']['bindings'][0]['genderL']['value'] 
    except (IndexError, KeyError, JSONDecodeError, ChunkedEncodingError): 
        return "" 
    return gender

In [None]:
# Apply the query.
Tate_artists_integrated['gender'] = ArtistIntegrated['Artist Entity'].apply(find_artists_genders)

In [41]:
# Manually integrate on a CSV file the missing information.
Tate_artists_with_gender = pd.read_csv('ArtistIntegratedManually.csv')
Tate_artists_with_gender["Id"] = Tate_artists_with_gender["Id"].astype(str)

5. Finally, we can add the dataframe with integrated information to the the dataframe already containing gender data (excluding collective and anonym artists), and analyse the gender distribution leveraging Pandas built-in function `value_counts()`, which counts values for our gender column.

In [42]:
Tate_gender = Tate_artists[Tate_artists['Gender'] != '0']

In [43]:
gender_count = Tate_gender.append(Tate_artists_with_gender)

In [None]:
# Wikidata results are lowercase, let us capitalize them and count occurrences for each gender.
gender_count['Gender'].replace(to_replace=['male'], value='Male', inplace= True)
gender_count['Gender'].replace(to_replace=['female'], value='Female', inplace= True)

In [61]:
gender_count['Gender'].value_counts()

Male      2954
Female     534
0           10
Name: Gender, dtype: int64

In [73]:
genderCount = {'Male': 2954, 'Female': 534}
tot = len(gender_count)
print ("Male artists' percentage is:", round((genderCount['Male']/tot)*100),'%;', "Female artists' percentage is:", round((genderCount['Female']/tot)*100),'%')

Male artists' percentage is: 84 %; Female artists' percentage is: 15 %


### MoMA

As for Tate, let us analyse the frequency of male and female artists in the collection.

In [75]:
MoMA_artists['Gender'].replace(to_replace=['male'], value='Male', inplace= True)
MoMA_artists['Gender'].replace(to_replace=['female'], value='Female', inplace= True)

In [76]:
MoMA_artists['Gender'].value_counts()

Male          9732
0             3165
Female        2343
Non-Binary       2
Non-binary       1
Name: Gender, dtype: int64

In [78]:
genderCount = {'Male': 9732, 'Female': 2343}
tot = len(MoMA_artists)
print ("Male artists' percentage is:", round((genderCount['Male']/tot)*100),'%;', "Female artists' percentage is:", round((genderCount['Female']/tot)*100),'%')

Male artists' percentage is: 64 %; Female artists' percentage is: 15 %


### Nationalities: which are the most represented nationalities overall?
As for genres, we examine the distribution of artists' nationalities in out datasets.

### MoMa

In [79]:
MoMA_artists['Nationality'].value_counts()

American     5181
0            2472
German        965
British       860
French        847
             ... 
Cambodian       1
Cypriot         1
Sahrawi         1
Kuwaiti         1
Beninese        1
Name: Nationality, Length: 120, dtype: int64

### Tate

In [80]:
Tate_artists['placeOfBirth'].value_counts()

United Kingdom    1522
0                  492
United States      341
France             160
Germany            142
                  ... 
Tunis                1
Iraq                 1
Bénin                1
Malaysia             1
Makedonija           1
Name: placeOfBirth, Length: 99, dtype: int64

# Acquisition criteria.

## 1.  In which years are artists' works mostly acquired?

### Year by year

In [94]:
MoMA['DateAcquired'].value_counts()

1964    12828
2008     7204
1968     6894
0        6682
2001     4170
        ...  
1933       93
1932       18
1929        9
1930        7
1931        3
Name: DateAcquired, Length: 95, dtype: int64

In [95]:
Tate['DateAcquired'].value_counts

<bound method IndexOpsMixin.value_counts of 0        1922
1        1922
2        1922
3        1922
4        1919
         ... 
69196    2013
69197    2013
69198    2013
69199    2013
69200    2013
Name: DateAcquired, Length: 69201, dtype: object>

### Every ten years

Let us analyse acquisitions from a larger perspective: not year by year, but for every ten years.

### MoMA

In [102]:
MoMA_artworks.to_csv('MoMA.csv') 
with open('MoMA.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    years={} 
    for item in reader: 
        if item['DateAcquired']not in years: 
            years[item['DateAcquired']]= 1 
        else: 
            years[item['DateAcquired']]+= 1 
new_dict={} 
for key in years: 
    key_int=int(key) 
    if key_int in range(1928,1941): 
        if '1930s' not in new_dict.keys(): 
               new_dict['1930s']= years[key] 
        else: 
            new_dict['1930s'] += years[key] 
    if key_int in range(1940,1951): 
        if '1940s' not in new_dict.keys(): 
               new_dict['1940s']= years[key] 
        else: 
            new_dict['1940s'] += years[key] 
     
    if key_int in range(1950,1961): 
        if '1950s' not in new_dict.keys(): 
               new_dict['1950s']= years[key] 
        else: 
            new_dict['1950s'] += years[key] 
     
    if key_int in range(1960,1971): 
        if '1960s' not in new_dict.keys(): 
               new_dict['1960s']= years[key] 
        else: 
            new_dict['1960s'] += years[key] 
     
    if key_int in range(1970,1981): 
        if '1970s' not in new_dict.keys(): 
               new_dict['1970s']= years[key] 
        else: 
            new_dict['1970s'] += years[key] 
    if key_int in range(1980,1991): 
        if '1980s' not in new_dict.keys(): 
               new_dict['1980s']= years[key] 
        else: 
            new_dict['1980s'] += years[key] 
     
    if key_int in range(1990,2001): 
        if '1990s' not in new_dict.keys(): 
               new_dict['1990s']= years[key] 
        else: 
            new_dict['1990s'] += years[key] 
    
    if key_int in range(2000,2011): 
        if '2000s' not in new_dict.keys(): 
               new_dict['2000s']= years[key] 
        else: 
            new_dict['2000s'] += years[key] 
         
     
print('MoMA:', new_dict)

MoMA: {'1990s': 13332, '1960s': 31950, '1970s': 13868, '1980s': 11497, '2000s': 26861, '1940s': 8274, '1950s': 6846, '1930s': 3318}


In [103]:
Tate_artworks.to_csv('Tate.csv') 
with open('Tate.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    years={} 
    for item in reader: 
        if item['DateAcquired']not in years: 
            years[item['DateAcquired']]= 1 
        else: 
            years[item['DateAcquired']]+= 1 
new_dict={} 
for key in years: 
    key_int=int(key) 
    if key_int in range(1928,1941): 
        if '1930s' not in new_dict.keys(): 
               new_dict['1930s']= years[key] 
        else: 
            new_dict['1930s'] += years[key] 
    if key_int in range(1940,1951): 
        if '1940s' not in new_dict.keys(): 
               new_dict['1940s']= years[key] 
        else: 
            new_dict['1940s'] += years[key] 
     
    if key_int in range(1950,1961): 
        if '1950s' not in new_dict.keys(): 
               new_dict['1950s']= years[key] 
        else: 
            new_dict['1950s'] += years[key] 
     
    if key_int in range(1960,1971): 
        if '1960s' not in new_dict.keys(): 
               new_dict['1960s']= years[key] 
        else: 
            new_dict['1960s'] += years[key] 
     
    if key_int in range(1970,1981): 
        if '1970s' not in new_dict.keys(): 
               new_dict['1970s']= years[key] 
        else: 
            new_dict['1970s'] += years[key] 
    if key_int in range(1980,1991): 
        if '1980s' not in new_dict.keys(): 
               new_dict['1980s']= years[key] 
        else: 
            new_dict['1980s'] += years[key] 
     
    if key_int in range(1990,2001): 
        if '1990s' not in new_dict.keys(): 
               new_dict['1990s']= years[key] 
        else: 
            new_dict['1990s'] += years[key] 
    
    if key_int in range(2000,2011): 
        if '2000s' not in new_dict.keys(): 
               new_dict['2000s']= years[key] 
        else: 
            new_dict['2000s'] += years[key] 
         
     
print('Tate:', new_dict)

Tate: {'1990s': 6693, '2000s': 5345, '1980s': 5538, '1940s': 776, '1950s': 633, '1970s': 6853, '1930s': 788, '1960s': 1048}


## Gender Gap 
We have already analysed the total number of male and female artists. Let us now anlysed the gender of acquired artists in time. In this way, we will try to investigate the gender gap in our museum's collections and try to understand how it changes (if it does) throughout the years. 

### MoMa

Number of female and male artists acquired every ten years.

In [130]:
MoMA_acquisitions.to_csv('MoMA_acquisitions.csv')
with open('MoMA_acquisitions.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    gender={'1930s': {'Male': 0, 'Female': 0}, '1940s': {'Male': 0, 'Female': 0}, '1950s': {'Male': 0, 'Female': 0}, '1960s': {'Male': 0, 'Female': 0}, '1970s': {'Male': 0, 'Female': 0}, '1980s': {'Male': 0, 'Female': 0}, '1990s': {'Male': 0, 'Female': 0}, '2000s': {'Male': 0, 'Female': 0}} 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if (item['Gender'] == 'Female'):  
                gender['1930s']['Female'] += 1 
            else: 
                gender['1930s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if (item['Gender'] == 'Female'):  
                gender['1940s']['Female'] += 1 
            else: 
                gender['1940s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if (item['Gender'] == 'Female'):  
                gender['1950s']['Female'] += 1 
            else: 
                gender['1950s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if (item['Gender'] == 'Female'):  
                gender['1960s']['Female'] += 1 
            else: 
                gender['1960s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if (item['Gender'] == 'Female'):  
                gender['1970s']['Female'] += 1 
            else: 
                gender['1970s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if (item['Gender'] == 'Female'):  
                gender['1980s']['Female'] += 1 
            else: 
                gender['1980s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if (item['Gender'] == 'Female'):  
                gender['1990s']['Female'] += 1 
            else: 
                gender['1990s']['Male'] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if (item['Gender'] == 'Female'):  
                gender['2000s']['Female'] += 1 
            else: 
                gender['2000s']['Male'] += 1 
    print(gender)

{'1930s': {'Male': 292, 'Female': 32}, '1940s': {'Male': 739, 'Female': 109}, '1950s': {'Male': 942, 'Female': 101}, '1960s': {'Male': 1561, 'Female': 159}, '1970s': {'Male': 962, 'Female': 174}, '1980s': {'Male': 1246, 'Female': 238}, '1990s': {'Male': 1031, 'Female': 310}, '2000s': {'Male': 1771, 'Female': 503}}


Percentage of female-male artists every ten years

In [131]:
for el in gender: 
    tot = gender[el]['Male'] + gender[el]['Female'] 
    percentage = (gender[el]['Male']/tot)*100 
    print (el, 'Male', round(percentage),'%', 'Female', round(100-percentage),'%')

1930s Male 90 % Female 10 %
1940s Male 87 % Female 13 %
1950s Male 90 % Female 10 %
1960s Male 91 % Female 9 %
1970s Male 85 % Female 15 %
1980s Male 84 % Female 16 %
1990s Male 77 % Female 23 %
2000s Male 78 % Female 22 %


## Tate

Number of female and male artists acquired every ten years.

In [132]:
# We use the file with gender integration, adding acquisition dates and dropping duplicates
Tate_acquisitions_integrated = pd.merge(gender_count,Tate_artworks[['Id', 'DateAcquired']], on='Id', how='left')
Tate_acquisitions_integrated.fillna(value ='0', inplace = True )
Tate_acquisitions_integrated = Tate_acquisitions_integrated.drop_duplicates(subset='Name', keep="first")
Tate_acquisitions_integrated.to_csv('Tate_acquisitions_integrated.csv')
with open('Tate_acquisitions_integrated.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    gender={'1930s': {'Male': 0, 'Female': 0}, '1940s': {'Male': 0, 'Female': 0}, '1950s': {'Male': 0, 'Female': 0}, '1960s': {'Male': 0, 'Female': 0}, '1970s': {'Male': 0, 'Female': 0}, '1980s': {'Male': 0, 'Female': 0}, '1990s': {'Male': 0, 'Female': 0}, '2000s': {'Male': 0, 'Female': 0}} 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if (item['Gender'] == 'Female'):  
                gender['1930s']['Female'] += 1 
            else: 
                gender['1930s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if (item['Gender'] == 'Female'):  
                gender['1940s']['Female'] += 1 
            else: 
                gender['1940s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if (item['Gender'] == 'Female'):  
                gender['1950s']['Female'] += 1 
            else: 
                gender['1950s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if (item['Gender'] == 'Female'):  
                gender['1960s']['Female'] += 1 
            else: 
                gender['1960s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if (item['Gender'] == 'Female'):  
                gender['1970s']['Female'] += 1 
            else: 
                gender['1970s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if (item['Gender'] == 'Female'):  
                gender['1980s']['Female'] += 1 
            else: 
                gender['1980s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if (item['Gender'] == 'Female'):  
                gender['1990s']['Female'] += 1 
            else: 
                gender['1990s']['Male'] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if (item['Gender'] == 'Female'):  
                gender['2000s']['Female'] += 1 
            else: 
                gender['2000s']['Male'] += 1 
    print(gender)

{'1930s': {'Male': 189, 'Female': 40}, '1940s': {'Male': 106, 'Female': 14}, '1950s': {'Male': 161, 'Female': 13}, '1960s': {'Male': 230, 'Female': 19}, '1970s': {'Male': 499, 'Female': 71}, '1980s': {'Male': 301, 'Female': 45}, '1990s': {'Male': 348, 'Female': 88}, '2000s': {'Male': 402, 'Female': 145}}


Percentage of female-male artists acquired every ten years

In [133]:
for el in gender: 
    tot = gender[el]['Male'] + gender[el]['Female'] 
    percentage = (gender[el]['Male']/tot)*100 
    print (el, 'Male', round(percentage),'%', 'Female', round(100-percentage),'%')

1930s Male 83 % Female 17 %
1940s Male 88 % Female 12 %
1950s Male 93 % Female 7 %
1960s Male 92 % Female 8 %
1970s Male 88 % Female 12 %
1980s Male 87 % Female 13 %
1990s Male 80 % Female 20 %
2000s Male 73 % Female 27 %


## Nationalities 

Does nationality affect the artists' selection? Does acquisition campaigns show different tendencies and patterns throughout the years, when it comes to the artists' nationality?

### MoMA
For every ten years, we count the nationalities' frequency.

In [107]:
def cleanDates(date): 
    if '.' in date: 
        date = date.split('.')[0] 
    return date

In [232]:
display(MomA_artists)

NameError: name 'MomA_artists' is not defined

In [230]:
MoMA_nationalities = pd.merge(MoMA_artists,MoMA[['Id', 'DateAcquired']],on='Id', how='left') 
MoMA_nationalities.fillna(value='0', inplace=True) 
MoMA_nationalities["DateAcquired"] = MoMA_nationalities["DateAcquired"].astype(str) 
#MoMA_nationalities["DateAcquired"] = MoMA_nationalities["DateAcquired"].apply(cleanDates) 
MoMA_nationalities = MoMA.drop_duplicates(subset='Artist', keep="first") 
MoMA_nationalities.to_csv('MoMaNationalities.csv')

In [231]:
from collections import defaultdict  
 
with open('MoMaNationalities.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    nationalities = defaultdict(dict) 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if item['Nationality'] not in nationalities['1930s']: 
                nationalities['1930s'][item['Nationality']] = 1 
            else: 
                nationalities['1930s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if item['Nationality'] not in nationalities['1940s']: 
                nationalities['1940s'][item['Nationality']] = 1 
            else: 
                nationalities['1940s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if item['Nationality'] not in nationalities['1950s']: 
                nationalities['1950s'][item['Nationality']] = 1 
            else: 
                nationalities['1950s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if item['Nationality'] not in nationalities['1960s']: 
                nationalities['1960s'][item['Nationality']] = 1 
            else: 
                nationalities['1960s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if item['Nationality'] not in nationalities['1970s']: 
                nationalities['1970s'][item['Nationality']] = 1 
            else: 
                nationalities['1970s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if item['Nationality'] not in nationalities['1980s']: 
                nationalities['1980s'][item['Nationality']] = 1 
            else: 
                nationalities['1980s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if item['Nationality'] not in nationalities['1990s']: 
                nationalities['1990s'][item['Nationality']] = 1 
            else: 
                nationalities['1990s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if item['Nationality'] not in nationalities['2000s']: 
                nationalities['2000s'][item['Nationality']] = 1 
            else: 
                nationalities['2000s'][item['Nationality']] += 1 
         
                 
 
print(nationalities)

defaultdict(<class 'dict'>, {'1990s': {'(Austrian)': 20, '(French)': 70, '()': 15, '(American)': 630, '(Dutch) (Dutch)': 3, '(Swedish)': 5, '(Swedish) (Swedish)': 2, '(British)': 85, '(American) (American) (American)': 5, '(Japanese)': 68, '(Dutch) (British) (British) (Dutch)': 2, '(Argentine)': 7, '(Brazilian) (Brazilian)': 4, '(American) (American)': 38, '() (American) (American)': 3, '(Italian)': 53, '(Spanish)': 29, '(Dutch) (British) (Dutch) (British)': 1, '(Austrian) (Polish)': 1, '(American) (American) (American) (American)': 3, '(Iranian)': 2, '(Dutch) (German) (Dutch) (German) (Canadian) (Belgian) (American) (Dutch) (Spanish)': 1, '(Dutch)': 37, '(Swiss) (Swiss) (Swiss)': 1, '(French) (French)': 7, '() (American)': 4, '(Japanese) (Japanese)': 5, '(Swiss)': 41, '(Dutch) (British)': 2, '(Italian) (Italian)': 6, '(Italian) (Italian) (Italian)': 2, '(Italian) (Italian) (Italian) (Italian) (Italian) (Italian)': 4, '(German) () ()': 2, '(British) (British)': 4, '(American) (American

In [113]:
Tate_nationalities = pd.merge(Tate_artists,Tate[['Id', 'DateAcquired']], on = 'Id', how = 'left')
Tate_nationalities = Tate_nationalities.drop_duplicates(subset='name', keep = "first")
Tate_nationalities.fillna(value='0', inplace=True)
Tate_nationalities["DateAcquired"] = Tate_nationalities["DateAcquired"].astype(str)
Tate_nationalities.rename(columns = {'placeOfBirth':'Nationality'}, inplace = True)
Tate_nationalities.to_csv("TateNationalities.csv")

In [114]:
from collections import defaultdict 

with open('TateNationalities.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    nationalities = defaultdict(dict)
    for item in reader:
        if int(item['DateAcquired']) in range (1928,1941):
            if item['Nationality'] not in nationalities['1930s']:
                nationalities['1930s'][item['Nationality']] = 1
            else:
                nationalities['1930s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1940,1951):
            if item['Nationality'] not in nationalities['1940s']:
                nationalities['1940s'][item['Nationality']] = 1
            else:
                nationalities['1940s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1950,1961):
            if item['Nationality'] not in nationalities['1950s']:
                nationalities['1950s'][item['Nationality']] = 1
            else:
                nationalities['1950s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1960,1971):
            if item['Nationality'] not in nationalities['1960s']:
                nationalities['1960s'][item['Nationality']] = 1
            else:
                nationalities['1960s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1970,1981):
            if item['Nationality'] not in nationalities['1970s']:
                nationalities['1970s'][item['Nationality']] = 1
            else:
                nationalities['1970s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1980,1991):
            if item['Nationality'] not in nationalities['1980s']:
                nationalities['1980s'][item['Nationality']] = 1
            else:
                nationalities['1980s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (1990,2001):
            if item['Nationality'] not in nationalities['1990s']:
                nationalities['1990s'][item['Nationality']] = 1
            else:
                nationalities['1990s'][item['Nationality']] += 1
        if int(item['DateAcquired']) in range (2000,2011): 
            if item['Nationality'] not in nationalities['2000s']: 
                nationalities['2000s'][item['Nationality']] = 1 
            else: 
                nationalities['2000s'][item['Nationality']] += 1 
        
                

print(nationalities)

defaultdict(<class 'dict'>, {'2000s': {'Poland': 9, 'United States': 99, 'Germany': 39, 'Finland': 1, 'China': 10, 'Iraq': 1, 'Russia': 2, 'United Kingdom': 150, 'Belgium': 5, 'México': 7, 'Perú': 3, 'Ukraine': 3, 'Îran': 5, 'Italia': 12, 'Venezuela': 4, 'Turkey': 2, '0': 29, 'France': 14, 'Israel': 5, 'Brasil': 17, 'Jugoslavija': 2, 'Uganda': 1, 'Norge': 1, 'Nederland': 4, 'South Africa': 6, 'Romania': 2, 'Argentina': 8, 'Cuba': 4, 'Canada': 9, 'Ireland': 4, 'Greece': 2, 'Colombia': 5, 'Latvija': 1, 'Sweden': 1, 'Chile': 2, 'Czech Republica': 4, 'Danmark': 5, 'Spain': 5, 'Austria': 4, 'Switzerland': 6, 'Pakistan': 1, 'Mehoz': 2, 'India': 3, 'Japan': 10, 'Bahamas': 1, 'Hungery': 3, 'Bangladesh': 1, 'Hrvatska': 2, 'Slovenia': 2, "Taehan Min'guk": 1, 'Zimbabwe': 1, 'Sri Lanka': 1, 'New Zealand': 4, 'Luxembourg': 1, 'Ísland': 1, 'Pilipinas': 1, 'Lietuva': 2, 'Australia': 2, 'Al-Lubnan': 3, 'Kenya': 1, "Al-Jaza'ir": 1, 'Lao': 1, 'Malta': 1, 'Panamá': 1, 'Misr': 1, 'Portugal': 3, 'Shqipëria