#COOL NAME (working title)
Art and the history of art are not sealed compartments: they are heavily interdependent with social, political and economic factors, which in turn influence our very perception of what art is.

Cultural institutions, and museums in particular, play a fundamental role in these intertwined dynamics: through their selection activity, they have the potential to shape the public understanding of art and how it changes over time.
In a way, what makes it into museums makes it into the history of art.

Our analysis stems from these considerations: how do external (social, political, economic) factors influence the perception of art and its history?
One way to investigate this is to look at the largest and most representative museums around the world, and at their acquisition policies and campaigns in particular.

Our key question:
In which ways have the acquisition campaigns of the world's major museums changed over the years?


Our workflow:
1. Interrogate WikiData:
    - What are the biggest collections around the world?
2. Find CSV files for some of the major museums.
3. Select some representative time slots (based on both internal and external factors).
4. Analyse acquisitions during these time slots for every museum and compare:
    a) Differences between slots in the same museum;
    b) Differences between museums for the same time slot.
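
Step 4 of the workflow can be sketched as follows. This is a minimal illustration on made-up data, not the analysis itself: the `count_by_slot` helper, the slot boundaries and the toy rows are all assumptions, and only the `DateAcquired` column name comes from the tables used in this notebook.

```python
import pandas as pd

def count_by_slot(frames, slots):
    """Count acquisitions per museum and time slot.

    frames: dict mapping a museum name to a DataFrame with a 'DateAcquired' column.
    slots: dict mapping a slot label to an inclusive (start_year, end_year) pair.
    Both arguments are hypothetical inputs chosen for this sketch.
    """
    rows = {}
    for name, frame in frames.items():
        # Non-numeric or missing years become NaN and are never counted
        years = pd.to_numeric(frame['DateAcquired'], errors='coerce')
        rows[name] = {label: int(years.between(start, end).sum())
                      for label, (start, end) in slots.items()}
    # Rows are museums, columns are slots: read across for (a), down for (b)
    return pd.DataFrame.from_dict(rows, orient='index')

# Toy data, invented for illustration only
frames = {
    'MuseumA': pd.DataFrame({'DateAcquired': [1935, 1936, 1968, 1969, 1969]}),
    'MuseumB': pd.DataFrame({'DateAcquired': [1938, 1965, None]}),
}
slots = {'1930s': (1930, 1939), '1960s': (1960, 1969)}
table = count_by_slot(frames, slots)
print(table)
```

Keeping museums as rows and slots as columns means both comparisons can be read from a single table.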

Our questions:
- What was the initial nucleus of each museum? 
- Internal survey: Is there a significant date or decade for the acquisitions? 
- External survey: What are the acquisition trends around the Xs/between the x and the y? / What are the acquisition trends within and across these museums? 
- During these years, who are the most represented makers? What is the most represented gender? What is the most represented movement? What is the most represented nationality? 


We analysed 5/4 of the (MET, MoMa, N+, Cleveland?, Tate) 

Wikidata interrogation: failure.

1. What are the largest art collections?

SELECT ?museum (COUNT(?work) AS ?works) WHERE {
  ?work wdt:P195 ?museum.
  ?museum wdt:P31 wd:Q207694
  }
GROUP BY ?museum 
ORDER BY DESC(?works)

2.  Which were the most visited museums in 2018?

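SELECT ?museumLabel ?visitors ?year
WHERE {
  ?museum wdt:P31 wd:Q207694;
          wdt:P1705 ?museumLabel;
          p:P1174 ?statement .
  # Read the visitor count and its date from the same statement,
  # so that each count is paired with its own year
  ?statement ps:P1174 ?visitors;
             pq:P585 ?year .
  FILTER(YEAR(?year) = 2018)
}
ORDER BY DESC(?visitors)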
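
The two queries above can be sent to the public Wikidata SPARQL endpoint from Python. The following is a minimal sketch using only the standard library; the helper names `build_query_url` and `run_query` and the User-Agent string are our own assumptions, and the network call is left commented out.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def build_query_url(query):
    """Return the GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WIKIDATA_ENDPOINT + "?" + params

def run_query(query):
    """Send the query and return the list of result bindings (requires network access)."""
    request = urllib.request.Request(
        build_query_url(query),
        # Hypothetical User-Agent, chosen for this sketch
        headers={"User-Agent": "museum-acquisitions-notebook/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]

largest_collections = """
SELECT ?museum (COUNT(?work) AS ?works) WHERE {
  ?work wdt:P195 ?museum.
  ?museum wdt:P31 wd:Q207694
}
GROUP BY ?museum
ORDER BY DESC(?works)
"""

url = build_query_url(largest_collections)
print(url[:60])
# rows = run_query(largest_collections)  # uncomment to query Wikidata
```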

Since WikiData did not provide reliable results, we went back to its source (The Art Newspaper, https://www.theartnewspaper.com/) and manually collected data about the most visited museums for the years 2018-2022.

https://onedrive.live.com/view.aspx?resid=E34DDE1A3F2F2160!138&ithint=file%2cxlsx&authkey=!AN4u-K4bko37iOU
    
For each of the 20 most visited museums, we checked whether an open dataset was available, using this GitHub repository (https://github.com/Ambrosiani/museums-on-github), which lists museums with GitHub accounts.

This led us to focus on four museums:
- Tate Modern, London
- MoMa, NY
- Met, NY
- National Gallery of Art, Washington DC

**General information about the museums.**

In [145]:
import pandas as pd
import csv
import re

#MoMa 

In [146]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')
pd.set_option('display.max_columns', None)
artworks = spreadsheet[['Title', 'Artist', 'ConstituentID', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date', 'Medium', 'CreditLine', 'Classification', 'Department', 'DateAcquired', 'URL']]
artists = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artists.csv')
artists["ConstituentID"] = artists["ConstituentID"].astype(str)
MoMa = pd.merge(artworks,artists[['ConstituentID', 'Wiki QID']],on='ConstituentID', how='left')
MoMa.rename(columns = {'ConstituentID':'Id', 'BeginDate':'BirthDate', 'EndDate':'DeathDate'}, inplace = True)
MoMa.Date = MoMa.Date.fillna('Not available')
MoMa['Date'] = MoMa['Date'].astype(str)
#MoMa.to_csv("MoMa.csv")
#MoMa.head(30)

#Tate

In [147]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artwork_data.csv')
pd.set_option('display.max_columns', None)
# .copy() gives an independent frame, so the in-place edits below do not raise SettingWithCopyWarning
artworks = spreadsheet[['artist', 'artistId', 'title', 'medium', 'creditLine', 'year', 'acquisitionYear', 'url']].copy()
artworks.rename(columns = {'artistId':'id'}, inplace = True)
# Cast the join key to str on both sides so that the merge compares like with like
artworks['id'] = artworks['id'].astype(str)
artists = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artist_data.csv')
artists["id"] = artists["id"].astype(str)
Tate = pd.merge(artworks,artists[['id', 'gender', 'yearOfBirth', 'yearOfDeath']], on='id', how='left')
# Align column names with the MoMa table
Tate.rename(columns = {'artist':'Artist', 'id':'Id', 'title':'Title', 'yearOfBirth':'BirthDate', 'yearOfDeath':'DeathDate', 'medium':'Medium', 'creditLine':'CreditLine', 'year':'Date', 'acquisitionYear':'DateAcquired', 'url':'URL', 'gender':'Gender'}, inplace = True)
Tate.to_csv("Tate.csv")
#Tate.head(3)

ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
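
The `ParserError` above names its own workaround: retrying with pandas' slower Python parsing engine. A minimal sketch of that fallback follows; the helper name `read_csv_with_fallback` is our own and the sample data is invented. The same message can also follow an interrupted download, in which case simply re-running the cell may be enough (an assumption, since the cause was not recorded).

```python
import io
import pandas as pd

def read_csv_with_fallback(source, **kwargs):
    """Read a CSV with the default C engine, retrying with the Python engine on a ParserError."""
    try:
        return pd.read_csv(source, **kwargs)
    except pd.errors.ParserError:
        # The Python engine is slower but more tolerant.
        # A file-like source would need source.seek(0) before this retry;
        # a URL or a path is simply opened again.
        return pd.read_csv(source, engine='python', **kwargs)

# Invented sample standing in for a remote CSV file
sample = io.StringIO("artist,acquisitionYear\nBlake,1922\nTurner,1856\n")
frame = read_csv_with_fallback(sample)
print(frame.shape)
```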

In [None]:
#Met

In [None]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/metmuseum/openaccess/master/MetObjects.csv')
pd.set_option('display.max_columns', None)
# .copy() avoids SettingWithCopyWarning when renaming in place
Met = spreadsheet[['AccessionYear', 'Title', 'Culture', 'Artist Display Name', 'Artist Nationality', 'Artist Begin Date', 'Artist End Date', 'Artist Gender', 'Artist Wikidata URL', 'Object End Date', 'Medium', 'Credit Line', 'Classification', 'Link Resource', 'Object Wikidata URL']].copy()
# Align column names with the MoMa table ('Object End Date' serves as the creation date)
Met.rename(columns = {'Artist Display Name':'Artist', 'Artist Begin Date':'BirthDate', 'Artist End Date':'DeathDate', 'Credit Line':'CreditLine', 'Object End Date':'Date', 'AccessionYear':'DateAcquired', 'Artist Wikidata URL':'Wiki QID', 'Artist Gender':'Gender', 'Link Resource':'URL', 'Artist Nationality':'Nationality'}, inplace = True)
Met.to_csv("Met.csv")

In [None]:
#Nga

In [None]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/NationalGalleryOfArt/opendata/main/data/objects.csv')
pd.set_option('display.max_columns', None)
# .copy() avoids SettingWithCopyWarning when renaming in place
Nga = spreadsheet[['accessionnum', 'title', 'endyear', 'medium', 'attribution', 'creditline', 'classification']].copy()
# Align column names with the MoMa table; the accession number stands in for the
# acquisition date, and its year is extracted in a later cell
Nga.rename(columns = {'attribution':'Artist', 'title':'Title', 'medium':'Medium', 'creditline':'CreditLine', 'endyear':'Date', 'accessionnum':'DateAcquired', 'classification':'Classification'}, inplace = True)
Nga.to_csv("Nga.csv")

#Exploring our Museums
<br>
Next, we explored the four collections:
- How many items does each collection contain?
- Which timespan do the items cover overall?
- What are the first and last acquisition dates for each museum? Note that Tate's CSV was last updated in 2014.
- How many artists are represented in total?
- Which artist, gender and nationality are the most represented overall?
- Which types of artwork are represented in each museum, and in what proportion?

In [None]:
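museums = [MoMa, Met, Tate, Nga]
names = ['MoMa', 'Met', 'Tate', 'Nga']
# Pair each DataFrame with its name instead of consuming the names list with pop()
for name, museum in zip(names, museums):
    # Count only the rows that have a title
    selected_rows = museum[museum['Title'].notna()]
    print("Total items at", name, ":", len(selected_rows))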

In [None]:
#ARTWORKS DATES

#Clean MoMa's artworks' creation dates 

In [None]:
'''
def cleanDates(date):
    if '-' in date:
        splitted = date.split('-')
        year = splitted[0]
        return year
    if '/' in date:
        splitted = date.split('/')
        year = splitted[0]
        return year
    if ',' in date:
        splitted = date.split(',')
        year = splitted[0]
        return year
    if '.' in date:
        splitted = date.split('.')
        year = splitted[0]
        return year
    else:
        return date
        '''

In [None]:
'''
def cleanDates(date):
    date = re.sub('', '', date) 

    return date
    '''

In [148]:
def cleanDatesMoma(date):
    # Replace the separators found in MoMa dates ('-', '/', ',', '.') with spaces
    date = re.sub(r"[-/,.]", " ", date)
    # Keep the first four-digit number as the year; if there is none,
    # return the space-separated string (e.g. 'n d ') so it can be filtered out later
    x = re.search(r"\d{4}", date)
    if x:
        date = x.group()
    return date

In [149]:
MoMaNew = MoMa.copy(deep=True)
MoMaNew["Date"] = MoMaNew["Date"].apply(cleanDatesMoma)
MoMaNew.to_csv("MoMaNew.csv")
#MoMaNew.head(30)

Artworks' timespan

In [None]:
museums = {'MoMa': 'MoMaNew.csv', 'Met': 'Met.csv', 'Nga': 'Nga.csv', 'Tate': 'Tate.csv'}
for name, path in museums.items():
    years = []
    with open(path, mode='r', encoding='utf-8') as csvfile:
        for item in csv.DictReader(csvfile):
            # Keep only values that parse as a year; this drops empty cells and
            # placeholders such as 'n d', 'TBD', 'TBC' or 'Unknown'
            try:
                years.append(int(float(item['Date'])))
            except ValueError:
                continue
    # Compare as integers: sorting the raw strings would order them alphabetically
    print("Oldest artwork at", name, "dates back to", min(years))
    print("Most recent artwork at", name, "dates from", max(years))

#Artworks acquisition

Clean MoMa's artworks' acquisition dates

In [None]:
def cleanAcquisitionDatesMoma(date):
    if '-' in date:
        date = date.split('-')[0]
    return date

In [150]:
MoMaNext = MoMaNew.copy(deep=True)
MoMaNext = MoMaNext[MoMaNext['DateAcquired'].notna()]
MoMaNext["DateAcquired"] = MoMaNext["DateAcquired"].apply(cleanDatesMoma)
MoMaNext.to_csv("MoMaNew.csv")
MoMaNext.head(200)

Unnamed: 0,Title,Artist,Id,Nationality,BirthDate,DeathDate,Gender,Date,Medium,CreditLine,Classification,Department,DateAcquired,URL,Wiki QID
0,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",Otto Wagner,6210,(Austrian),(1841),(1918),(Male),1896,Ink and cut-and-pasted painted pages on paper,Fractional and promised gift of Jo Carole and ...,Architecture,Architecture & Design,1996,http://www.moma.org/collection/works/2,Q84287
1,"City of Music, National Superior Conservatory ...",Christian de Portzamparc,7470,(French),(1944),(0),(Male),1987,Paint and colored pencil on print,Gift of the architect in honor of Lily Auchinc...,Architecture,Architecture & Design,1995,http://www.moma.org/collection/works/3,Q312838
2,"Villa near Vienna Project, Outside Vienna, Aus...",Emil Hoppe,7605,(Austrian),(1876),(1957),(Male),1903,"Graphite, pen, color pencil, ink, and gouache ...",Gift of Jo Carole and Ronald S. Lauder,Architecture,Architecture & Design,1997,http://www.moma.org/collection/works/4,Q1336246
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,(),(1944),(0),(Male),1980,Photographic reproduction with colored synthet...,Purchase and partial gift of the architect in ...,Architecture,Architecture & Design,1995,http://www.moma.org/collection/works/5,Q123966
4,"Villa, project, outside Vienna, Austria, Exter...",Emil Hoppe,7605,(Austrian),(1876),(1957),(Male),1903,"Graphite, color pencil, ink, and gouache on tr...",Gift of Jo Carole and Ronald S. Lauder,Architecture,Architecture & Design,1997,http://www.moma.org/collection/works/6,Q1336246
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,"Ibirapuera Park project, São Paulo, Brazil (Si...","Roberto Burle Marx, Oscar Niemeyer","6934, 8103",(Brazilian) (Brazilian),(1909) (1907),(1994) (2012),(Male) (Male),1953,Gouache and graphite on board,"Gift of Roblee McCarthy, Jr. Fund and Lily Auc...",Architecture,Architecture & Design,1991,http://www.moma.org/collection/works/252,
196,"Ibirapuera Park Project, São Paulo, Brazil (Pe...","Roberto Burle Marx, Oscar Niemeyer","6934, 8103",(Brazilian) (Brazilian),(1909) (1907),(1994) (2012),(Male) (Male),1953,Gouache on board,"Gift of Roblee McCarthy, Jr. Fund and Lily Auc...",Architecture,Architecture & Design,1991,http://www.moma.org/collection/works/253,
197,"Prototype Architecture School No. 5, project, ...",Neil M. Denari,7933,(American),(1957),(0),(Male),1992,"Ink, airbrush, and cut-and-pasted printed self...",Ralph Fehlbaum Purchase Fund,Architecture,Architecture & Design,1998,http://www.moma.org/collection/works/255,Q6135237
198,"Bismarck Monument, project, Bingen, Germany, P...",Ludwig Mies van der Rohe,7166,(American),(1886),(1969),(Male),1910,Gouache on linen,"Rob Beyer Purchase Fund, Edward Larrabee Barne...",Architecture,Architecture & Design,1998,http://www.moma.org/collection/works/256,Q41508


Acquisitions' timespan

In [None]:
museums = {'MoMa': 'MoMaNew.csv', 'Met': 'Met.csv', 'Nga': 'Nga.csv', 'Tate': 'Tate.csv'}
for name, path in museums.items():
    acquisition_years = []
    with open(path, mode='r', encoding='utf-8') as csvfile:
        for item in csv.DictReader(csvfile):
            # Keep only the part before the first '.', e.g. '1996.0' becomes '1996'
            year = item['DateAcquired'].split('.')[0]
            # Keep numeric years only; this also skips empty cells and stray header values
            if year.isdigit():
                acquisition_years.append(int(year))
    # Compare as integers: sorting the raw strings would order them alphabetically
    print("Earliest acquisition at", name, "dates back to", min(acquisition_years))
    print("Most recent acquisition at", name, "dates from", max(acquisition_years))

Total artists' number.

In [207]:
museums=['MoMaNew.csv']
names = ['MoMa']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        artists = set() 
        for item in reader:
            if item['Artist']!= '' and 'Unidentified'not in item['Artist'] and 'Various' not in item['Artist']:
                if ',' in item['Artist']:
                    item['Artist'] = item['Artist'].split(',')
                    for n in range(len(item['Artist'])):
                        artist= item['Artist'][n]
                        artists.add(artist)
                else:
                    artists.add(item['Artist'])
    name = names.pop(0)
    print("Number of artists at", name, "is", len(artists) )

Number of artists at MoMa is 14590
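
Splitting on ',' leaves a leading space on every co-author after the first (' Oscar Niemeyer' versus 'Oscar Niemeyer'), so the same person can be counted twice. A possible variant that strips whitespace before adding names to the set is sketched below. The `count_artists` helper and the sample rows are illustrative assumptions, and the figure above was not recomputed with it.

```python
def count_artists(artist_fields):
    """Count distinct artist names in an iterable of raw 'Artist' cell values."""
    artists = set()
    for field in artist_fields:
        # Same exclusions as the cell above: empty, unidentified or collective makers
        if not field or 'Unidentified' in field or 'Various' in field:
            continue
        for name in field.split(','):
            name = name.strip()  # ' Oscar Niemeyer' and 'Oscar Niemeyer' become one entry
            if name:
                artists.add(name)
    return len(artists)

# Sample values modelled on the MoMa table
sample = ['Otto Wagner', 'Roberto Burle Marx, Oscar Niemeyer', 'Oscar Niemeyer', '', 'Various Artists']
print(count_artists(sample))
```

Names that themselves contain a comma would still be split, so the result remains an approximation.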


Most represented gender in general?

In [204]:
MoMaGender = MoMa.drop_duplicates(subset='Artist', keep="first")
MoMaGender.to_csv('MoMaGender.csv')

In [206]:
museums=['MoMaGender.csv']
names = ['MoMa']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        Gender = {'Male':0, 'Female':0}
        for item in reader:
                if 'Male' in item['Gender']:
                    Gender['Male'] += 1
                if 'Female' in item['Gender']:
                    Gender['Female'] += 1
    print(Gender)

{'Male': 10474, 'Female': 2809}


Most represented nationality in general?

In [272]:
museums=['MoMaGender.csv']
names = ['MoMa']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        nationalities = set() 
        for item in reader:
            if ' ' in item['Nationality']:
                    item['Nationality'] = item['Nationality'].split(' ')
                    for n in range(len(item['Nationality'])):
                        nationality= item['Nationality'][n]
                        nationalities.add(nationality)
            else:
                    nationalities.add(item['Nationality'])

print(nationalities)

{'', '(Indian)', '(Palestinian)', '(Senegalese)', '(Beninese)', '(Cambodian)', '(Chinese)', '(Cameroonian)', '(Swiss)', '(Czechoslovakian)', '(Bahamian)', '(Ghanaian)', '(Korean)', '(Iraqi)', '(Canadian)', '(Colombian)', '(Singaporean)', '(Ukrainian)', '(Turkish)', '(Luxembourger)', '(Polish)', '(Thai)', '(Nigerian)', '(Kuwaiti)', '(Lithuanian)', '(Australian)', '(Irish)', '(Chilean)', '(New', '(Austrian)', '(Albanian)', '(Ethiopian)', '(Nicaraguan)', '(Ugandan)', '(Malaysian)', '(Iranian)', '(Haitian)', '(Hungarian)', '(Azerbaijani)', '(Macedonian)', '(Zimbabwean)', '(Taiwanese)', '(Sahrawi)', '(South', '(Lebanese)', '(Afghan)', '(Filipino)', '(Latvian)', '(Ivorian)', 'Rican)', '(Estonian)', '(Serbian)', '(Namibian)', '(Kyrgyz)', '(Catalan)', '(Paraguayan)', '(Ecuadorian)', '(Moroccan)', '(Coptic)', '(Kenyan)', '(Canadian', '(Tunisian)', '(Portuguese)', '(Greek)', '(French)', 'African)', '(Venezuelan)', 'Leonean)', '(Mexican)', '(Malian)', '(Dutch)', '(Egyptian)', '(Bosnian)', '(Danis

In [253]:
museums=['MoMaGender.csv']
names = ['MoMa']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        Nationalities ={}
        for nationality in nationalities:  
            Nationalities[nationality]= 0
        for item in reader:
            for naz in nationalities:
                    if naz in item['Nationality']:
                        Nationalities[naz] += 1 
    print(Nationalities)

{'': 13698, '(Croatian': 47, '(Indian)': 39, 'Croatian)': 47, 'Chilean)': 68, '(Palestinian)': 3, '(Lebanese': 10, 'Indian)': 39, '(Senegalese)': 2, 'Bosnian)': 9, '(Beninese)': 1, '(Chinese)': 106, 'Italian)': 517, '(Cameroonian)': 2, '(Swiss)': 421, '(Hungarian': 86, '(Czechoslovakian)': 3, 'Slovenian)': 19, 'Swedish)': 123, '(Chinese': 106, '(Palestinian': 3, 'Chilean': 68, '(Bahamian)': 1, '(Thai': 7, '(Finnish': 52, 'Lithuanian)': 5, 'Catalan': 1, '(Cambodian': 1, '(Ghanaian)': 4, '(German': 1163, '(Latvian': 13, '(Icelandic': 21, 'Romanian': 27, 'Australian': 59, '(Korean)': 35, 'Slovak)': 9, '(Norwegian': 35, '(Welsh': 4, '(Iraqi)': 2, '(Portuguese': 16, 'Polish)': 148, '(Canadian)': 202, '(Uruguayan': 24, '(Egyptian': 14, '(Colombian)': 57, 'Egyptian': 14, '(Singaporean)': 2, 'Spanish)': 185, 'Scottish)': 22, '(Czech': 98, '(British': 962, '(Venezuelan': 68, '(Ukrainian)': 40, 'Israeli': 77, '(Turkish)': 20, '(Macedonian)': 5, '(Haitian': 16, '(Luxembourger)': 3, '(Austrian': 2

In [209]:
counts = MoMa['Nationality'].value_counts()
counts.to_csv('nationalities.csv')
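
Splitting the `Nationality` field on spaces breaks multi-word values, which is why fragments such as '(New', '(South' and 'Rican)' appear in the set printed above. Since each nationality is wrapped in parentheses, a regular expression can extract whole values instead. A minimal sketch, assuming that every value follows the '(...)' pattern seen in the table; `count_nationalities` is a hypothetical helper and the sample values are modelled on the outputs above.

```python
import re
from collections import Counter

def count_nationalities(fields):
    """Tally nationalities from raw cells such as '(Brazilian) (Brazilian)'."""
    counts = Counter()
    for field in fields:
        # Capture the text between each pair of parentheses; empty '()' entries are skipped
        for nationality in re.findall(r'\(([^)]*)\)', field):
            if nationality:
                counts[nationality] += 1
    return counts

sample = ['(Austrian)', '(French)', '(Brazilian) (Brazilian)', '()', '(South African)', '(Austrian)']
counts = count_nationalities(sample)
print(counts.most_common(2))
```

Because whole values are matched, each nationality is counted once per occurrence rather than once per fragment.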

#Our analysis.
<br>
1. Which artists are the most acquired by museums (in general)?
    - Is there a gender gap in the selection of artists?
    - What are the most represented nationalities (in general)?
    - What are the most represented movements or genres (in general)?
2. How have acquisition criteria changed (over time) in museums?
    - In which years are the most works acquired?
    - When does the gender gap decrease (if it does)?
    - In which years are artists' nationalities most influential in the selection?
    - In which years are artists' movements/genres most influential in the selection?
3. If we compare the criteria of all museums, in general and over time, do we see any similarities or significant differences?
    - Do certain museums acquire more works than others based on artist, gender, nationality or movement?

Acquisition criteria.
1. In which years are the most works acquired?<br>
To answer, we count how many times each year appears in the DateAcquired column.

In [None]:
MoMa['year'] = pd.DatetimeIndex(MoMa['DateAcquired']).year
MoMaNew['DateAcquired'] = MoMa['year']
MoMaNew = MoMaNew[MoMaNew['DateAcquired'].notna()]
MoMaNew.to_csv("MoMaNew.csv")

In [None]:
with open('MoMaNew.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    years={}
    for item in reader:
        if item['DateAcquired']not in years:
            years[item['DateAcquired']]= 1
        else:
            years[item['DateAcquired']]+= 1

    print(years)
    #all_years=list(years.keys())
    #print(sorted(all_years))
    

TODO: find a way to sort by value, or use the library Laura knows about.
TODO: visualization :)
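
One way to sort the counts by value with the standard library alone is `collections.Counter`, whose `most_common` method returns (key, count) pairs in descending order of count. A small sketch; the `years` dictionary here is a made-up stand-in for the one built in the cell above.

```python
from collections import Counter

# Stand-in for the `years` dictionary built above (values invented for illustration)
years = {'1964.0': 120, '1968.0': 310, '1935.0': 45}

# most_common() sorts by count, highest first
by_count = Counter(years).most_common()
print(by_count)

# The same ordering with plain sorted(), for comparison
by_count_sorted = sorted(years.items(), key=lambda pair: pair[1], reverse=True)
```

`most_common(n)` also accepts a number, which is convenient for listing only the top few years.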

In [None]:
new_dict = {}
for key, count in years.items():
    # Keys look like '1996.0': keep the integer year
    year = int(key.split('.')[0])
    # Each year now falls into exactly one decade. The previous ranges overlapped,
    # so 1940, 1950, ... were counted twice and 2000 was counted in the 1990s.
    if 1928 <= year <= 1999:
        # 1928 and 1929 are grouped with the 1930s
        decade = str(max(year // 10 * 10, 1930)) + 's'
        new_dict[decade] = new_dict.get(decade, 0) + count

print(new_dict)
    

When does the gender gap decrease (if it does)? <br>
For each decade: the percentage of works by men and by women acquired, and the difference between the two.

In [None]:
with open('MoMaNew.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    years={}
    for item in reader:
        if item['DateAcquired']not in years:
            years[item['DateAcquired']]= 1
        else:
            years[item['DateAcquired']]+= 1

    print(years)
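
The per-decade gender comparison described above could be sketched with pandas as follows. This is an illustration under stated assumptions, not a result: `gender_share_by_decade` is a hypothetical helper, the sample rows are invented, and works with several makers (for example '(Male) (Male)') are simply excluded here.

```python
import pandas as pd

def gender_share_by_decade(frame):
    """Return the percentage of works by male and female makers per acquisition decade."""
    data = frame[['DateAcquired', 'Gender']].copy()
    data['DateAcquired'] = pd.to_numeric(data['DateAcquired'], errors='coerce')
    # Keep single-maker rows with a known gender and a valid year
    data = data[data['Gender'].isin(['(Male)', '(Female)'])].dropna(subset=['DateAcquired'])
    data['Decade'] = (data['DateAcquired'] // 10 * 10).astype(int)
    # Row-normalised counts: each decade sums to 100
    shares = pd.crosstab(data['Decade'], data['Gender'], normalize='index') * 100
    # Make sure both columns exist even if one gender is absent from the data
    shares = shares.reindex(columns=['(Male)', '(Female)'], fill_value=0)
    shares['Gap'] = shares['(Male)'] - shares['(Female)']
    return shares

# Invented sample rows, using the MoMa column names and value format
sample = pd.DataFrame({
    'DateAcquired': [1935, 1936, 1938, 1939, 1991, 1995],
    'Gender': ['(Male)', '(Male)', '(Male)', '(Female)', '(Male)', '(Female)'],
})
shares = gender_share_by_decade(sample)
print(shares.round(1))
```

A shrinking `Gap` column from one decade to the next would indicate a narrowing gender gap.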


In which years are artists' nationalities most influential in the selection?
For each decade: the percentage of each nationality acquired, and the difference.
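
As a starting point for this question, the leading nationality of each decade and its share of acquisitions can be computed with a group-by. The sketch below is only an assumption about how this step might look: `top_nationality_by_decade` is a hypothetical helper, the rows are invented, and multi-maker values are treated as a single category.

```python
import pandas as pd

def top_nationality_by_decade(frame):
    """Return, per acquisition decade, the most frequent nationality and its share in percent."""
    data = frame[['DateAcquired', 'Nationality']].copy()
    data['DateAcquired'] = pd.to_numeric(data['DateAcquired'], errors='coerce')
    data = data.dropna(subset=['DateAcquired', 'Nationality'])
    data['Decade'] = (data['DateAcquired'] // 10 * 10).astype(int)
    rows = {}
    for decade, group in data.groupby('Decade'):
        # value_counts() sorts by frequency, so the first entry is the leader
        counts = group['Nationality'].value_counts()
        rows[decade] = {'Nationality': counts.index[0],
                        'Share': 100 * counts.iloc[0] / counts.sum()}
    return pd.DataFrame.from_dict(rows, orient='index')

# Invented sample rows, using the MoMa column names and value format
sample = pd.DataFrame({
    'DateAcquired': [1935, 1936, 1938, 1991],
    'Nationality': ['(American)', '(American)', '(French)', '(German)'],
})
top = top_nationality_by_decade(sample)
print(top)
```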

In which years are artists' movements/genres most influential in the selection?
For each decade: the percentage of each movement/genre acquired, and the difference.

Nga has no Gender column.
In Met, many Gender values are NaN.
