#COOL NAME
Art and art history are not sealed compartments: they are deeply interdependent with social, political and economic factors, which in turn shape our very perception of what art is.

Cultural institutions, and museums in particular, play a fundamental role in these intertwined dynamics: through their selection activity, they can shape the public understanding of art and of how it changes over time.
In a sense, what makes it into museums makes it into the history of art.

Our analysis stems from these considerations: how do external (social, political, economic) factors influence the perception of art and its history?
One way to investigate this is to look at the largest and most representative museums around the world, and at their acquisition policies and campaigns in particular.

Our key question:
How have the acquisition campaigns of the world's major museums changed over the years?


Our workflow:
1. Query Wikidata:
    - What are the biggest collections around the world?
2. Find CSV files for some of the major museums.
3. Select some representative time slots (both internal and external factors).
4. Analyse acquisitions during these time slots for every museum and compare:
    a) Difference between different slots in the same museum;
    b) Difference between different museums for the same time slot;
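
Steps 3 and 4 can be sketched with pandas as follows. This is only an illustration under stated assumptions: the `acquisitions` table, its values, the slot boundaries in `slots` and the helper `assign_slot` are all hypothetical and are not taken from the museums' data.

```python
import pandas as pd

# Hypothetical toy data: one row per acquired artwork
acquisitions = pd.DataFrame({
    'Museum': ['MoMa', 'MoMa', 'MoMa', 'Tate', 'Tate', 'Tate'],
    'DateAcquired': [1935, 1968, 1969, 1936, 1938, 1968],
})

# Hypothetical time slots as (label, first year, last year), both ends included
slots = [('1930s', 1930, 1939), ('1960s', 1960, 1969)]

def assign_slot(year):
    """Return the label of the slot containing `year`, or None if there is none."""
    for label, start, end in slots:
        if start <= year <= end:
            return label
    return None

acquisitions['Slot'] = acquisitions['DateAcquired'].apply(assign_slot)

# Rows: museums; columns: time slots; cells: number of acquisitions
comparison = pd.crosstab(acquisitions['Museum'], acquisitions['Slot'])
print(comparison)
```

Reading along a row compares different slots within the same museum (4a); reading down a column compares different museums for the same slot (4b).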

Our questions:
- What was the initial nucleus of each museum? 
- Internal survey: Is there a significant date or decade for the acquisitions? 
- External survey: What are the acquisition trends around the Xs/between the x and the y? / What are the acquisition trends within and across these museums? 
- During these years, who are the most represented makers? What is the most represented gender? What is the most represented movement? What is the most represented nationality? 


We analysed 4 or 5 of the following museums: Met, MoMa, N+, Cleveland (?), Tate.

Wikidata query: unsuccessful.

1. What are the largest art collections?

SELECT ?museum (COUNT(?work) AS ?works) WHERE {
  ?work wdt:P195 ?museum.
  ?museum wdt:P31 wd:Q207694
  }
GROUP BY ?museum 
ORDER BY DESC(?works)

2.  Which were the most visited museums in 2018?

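SELECT ?museumLabel ?visitors ?year
WHERE {
  ?museum wdt:P31 wd:Q207694;
          wdt:P1705 ?museumLabel;
          p:P1174 ?statement .
  # Read the value and its "point in time" qualifier from the same statement,
  # so that each visitor count is paired with its own year
  ?statement ps:P1174 ?visitors;
             pq:P585 ?year .
  FILTER(YEAR(?year) = 2018)
}
ORDER BY DESC(?visitors)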

Since Wikidata was not providing reliable results, we decided to go back to its source (The Art Newspaper, https://www.theartnewspaper.com/) and manually collect data about the most visited museums in the last four years (2018-2022).

https://onedrive.live.com/view.aspx?resid=E34DDE1A3F2F2160!138&ithint=file%2cxlsx&authkey=!AN4u-K4bko37iOU
    
We checked whether open datasets were available for each of the top 20 most visited museums, using this GitHub repository (https://github.com/Ambrosiani/museums-on-github), which lists museums with GitHub accounts.

Our analysis led us to the decision to focus on four museums:
- Tate Modern, London
- MoMa, NY
- Met, NY
- National Gallery of Art, Washington DC

**General information about the museums.**

In [1]:
import pandas as pd
import csv
import re

First, let us create pandas dataframes containing all the information we need: for each museum, we integrate different csv files and select only the columns we need from each of them.

#MoMa 

In [2]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')
pd.set_option('display.max_columns', None)
artworks = spreadsheet[['Title', 'Artist', 'ConstituentID', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date', 'Medium', 'CreditLine', 'Classification', 'Department', 'DateAcquired', 'URL']]
artists = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artists.csv')
artists["ConstituentID"] = artists["ConstituentID"].astype(str)
MoMa = pd.merge(artworks,artists[['ConstituentID', 'Wiki QID']],on='ConstituentID', how='left')
MoMa.rename(columns = {'ConstituentID':'Id', 'BeginDate':'BirthDate', 'EndDate':'DeathDate'}, inplace = True)
MoMa.Date = MoMa.Date.fillna('Not available')
MoMa['Date'] = MoMa['Date'].astype(str)
#MoMa.to_csv("MoMa.csv")

#Tate

In [3]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artwork_data.csv')
pd.set_option('display.max_columns', None)
# .copy() makes artworks independent of spreadsheet, so the edits below do not raise SettingWithCopyWarning
artworks = spreadsheet[['artist', 'artistId', 'title', 'medium', 'creditLine', 'year', 'acquisitionYear', 'url']].copy()
artworks = artworks.rename(columns = {'artistId':'id'})
# Cast both keys to str so that the merge compares like with like
artworks['id'] = artworks['id'].astype(str)
artists = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artist_data.csv')
artists["id"] = artists["id"].astype(str)
Tate = pd.merge(artworks,artists[['id', 'gender', 'yearOfBirth', 'yearOfDeath']], on='id', how='left')
Tate = Tate.rename(columns = {'artist':'Artist', 'id':'Id', 'title':'Title', 'yearOfBirth':'BirthDate', 'yearOfDeath':'DeathDate', 'medium':'Medium', 'creditLine':'CreditLine', 'year':'Date', 'acquisitionYear':'DateAcquired', 'url':'URL', 'gender':'Gender'})
Tate.to_csv("Tate.csv")



#Met

In [14]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/metmuseum/openaccess/master/MetObjects.csv')
pd.set_option('display.max_columns', None)
# .copy() makes Met independent of spreadsheet, so renaming does not raise SettingWithCopyWarning
Met = spreadsheet[['AccessionYear', 'Title', 'Culture', 'Artist Display Name', 'Artist Nationality', 'Artist Begin Date', 'Artist End Date', 'Artist Gender', 'Artist Wikidata URL', 'Object End Date', 'Medium', 'Credit Line', 'Classification', 'Link Resource', 'Object Wikidata URL']].copy()
Met = Met.rename(columns = {'Artist Display Name':'Artist', 'Artist Begin Date':'BirthDate', 'Artist End Date':'DeathDate', 'Credit Line':'CreditLine', 'Object End Date':'Date', 'AccessionYear':'DateAcquired', 'Artist Wikidata URL':'Wiki QID', 'Artist Gender':'Gender', 'Link Resource':'URL', 'Artist Nationality':'Nationality'})
#Met.to_csv("Met.csv")



#Nga

In [5]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/NationalGalleryOfArt/opendata/main/data/objects.csv')
pd.set_option('display.max_columns', None)
# .copy() makes Nga independent of spreadsheet, so renaming does not raise SettingWithCopyWarning
Nga = spreadsheet[['accessionnum', 'title', 'endyear', 'medium', 'attribution', 'creditline', 'classification']].copy()
# accessionnum is used as DateAcquired: later cells keep the part before the first '.' as the year
Nga = Nga.rename(columns = {'attribution':'Artist', 'title':'Title', 'medium':'Medium', 'creditline':'CreditLine', 'endyear':'Date', 'accessionnum':'DateAcquired', 'classification':'Classification'})
Nga.to_csv("Nga.csv")



#Exploring our Museums
<br>
Now that we have our dataframes, we can explore the four collections.
<br>
- How many items does each collection contain?
- Which timespan do the items cover overall?
- What are the first and last acquisition dates for each museum? (Tate's csv was last updated in 2014.)
- How many artists are there in total?
- Which are the most represented artist, gender and nationality overall?

In [6]:
museums = [MoMa, Met, Tate, Nga]
names = ['Moma', 'Met', 'Tate', 'Nga']
# zip pairs each dataframe with its name without emptying the names list
for name, museum in zip(names, museums):
    # Count only the rows that have a title
    selected_rows = museum[museum['Title'].notna()]
    print("Total artworks at", name, ":", len(selected_rows.index))

Total artworks at Moma : 139912
Total artworks at Met : 448619
Total artworks at Tate : 69201
Total artworks at Nga : 137923


#ARTWORKS DATES

#Clean MoMa's artworks' creation dates 

In [7]:
def cleanDatesMoma(date):
    # Replace common separators with spaces (e.g. 'n.d.' becomes 'n d ')
    for separator in ('-', '/', ',', '.'):
        date = date.replace(separator, ' ')
    # Keep the first four-digit year, if any; otherwise return the normalised string
    x = re.search(r"\d{4}", date)
    if x:
        date = x.group()
    return date

In [8]:
MoMa["Date"] = MoMa["Date"].apply(cleanDatesMoma)
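
As a possible alternative to the row-by-row function, the same "first four digits" rule can be sketched with pandas string methods. The sample values in `dates` are hypothetical. Unlike `cleanDatesMoma`, this version turns values without a year into missing values instead of keeping them as text.

```python
import pandas as pd

# Hypothetical sample of free-text dates
dates = pd.Series(['1976-77', 'c. 1917', 'n.d.', '(1950)', 'Unknown'])

# Capture the first run of four digits; rows without one become NaN
years = dates.str.extract(r'(\d{4})', expand=False)

# Convert to a nullable integer column so that missing years stay missing
years = pd.to_numeric(years).astype('Int64')
print(years.tolist())
```

With missing values marked explicitly, the later steps can call `years.min()` and `years.max()` directly, without a list of placeholder strings to exclude.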

Artworks' timespan

In [9]:
museums = ['MoMa.csv', 'Nga.csv', 'Tate.csv', 'metclean2.csv']
names = ['MoMa', 'Nga', 'Tate', 'Met']
# Placeholder values that do not encode a usable year
invalid_dates = {'', '(n d )', 'TBD', 'nd', 'c  196?', 'no date', 'date of publicati', 'New York', 'Not available', 'Various', 'unknown', 'Unknown', 'n d ', 'n d', 'n  d ', 'Unkown', 'TBC'}
# Substrings that mark approximate or non-numeric dates
invalid_fragments = ('c.', 'century', '(')
# Every file listed in museums must exist in the working directory, otherwise open() raises FileNotFoundError
for name, museum in zip(names, museums):
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        clean = []
        for item in reader:
            date = item['Date']
            if date in invalid_dates or any(fragment in date for fragment in invalid_fragments):
                continue
            # Values such as '1999.0' come from float columns: keep the integer part
            clean.append(int(date.split('.')[0]))
        clean.sort()
    print("Most ancient artwork at", name, "dates back to", clean[0])
    print("Most recent artwork at", name, "dates back to", clean[-1])

Most ancient artwork at MoMa dates back to 1768
Most recent artwork at MoMa dates back to 2022
Most ancient artwork at Nga dates back to -490
Most recent artwork at Nga dates back to 2021
Most ancient artwork at Tate dates back to 1545
Most recent artwork at Tate dates back to 2012


FileNotFoundError: [Errno 2] No such file or directory: 'metclean2.csv'

#Artworks acquisition

Clean MoMa's artworks' acquisition dates

In [12]:
# .copy() avoids SettingWithCopyWarning when the filtered dataframe is modified
MoMa = MoMa[MoMa['DateAcquired'].notna()].copy()
MoMa["DateAcquired"] = MoMa["DateAcquired"].apply(cleanDatesMoma)

Acquisitions' timespan

In [13]:
museums = ['MoMa.csv', 'Met.csv', 'Nga.csv', 'Tate.csv']
names = ['MoMa', 'Met', 'Nga', 'Tate']
for name, museum in zip(names, museums):
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        acquisitionyears = []
        for item in reader:
            # Keep the part before the first '.', which holds the year
            acquired = item['DateAcquired'].split('.')[0]
            if acquired != '' and acquired != 'Object Number':
                acquisitionyears.append(acquired)
        # Values start with a four-digit year, so string order is chronological
        acquisitionyears.sort()
    print("First recorded acquisition", name, "dates back to", acquisitionyears[0])
    print("Last recorded acquisition", name, "dates back to", acquisitionyears[-1])

First recorded acquisition MoMa dates back to 1929-11-19
Last recorded acquisition MoMa dates back to 2022-09-20
First recorded acquisition Met dates back to 1870
Last recorded acquisition Met dates back to 2022
First recorded acquisition Nga dates back to 1937
Last recorded acquisition Nga dates back to 2022
First recorded acquisition Tate dates back to 1823
Last recorded acquisition Tate dates back to 2013


Total number of artists.

In [14]:
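def cleanArtistsTate(name):
    # Remove commas ('Surname, Name' -> 'Surname Name') so that a single name is not split later.
    # Names without a comma are returned unchanged instead of None.
    return name.replace(',', '')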

In [15]:
TateNew = Tate.copy(deep=True)
TateNew["Artist"] = TateNew["Artist"].apply(cleanArtistsTate)
TateNew.to_csv("TateNew.csv")

In [16]:
museums = ['MoMa.csv', 'Met.csv', 'TateNew.csv', 'Nga.csv']
names = ['MoMa', 'Met', 'Tate', 'Nga']
for name, museum in zip(names, museums):
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        artists = set()
        for item in reader:
            artist = item['Artist']
            if artist != '' and 'Unidentified' not in artist and 'Various' not in artist:
                # A cell may list several artists, separated by ',' or '|'
                if ',' in artist:
                    artists.update(artist.split(','))
                elif '|' in artist:
                    artists.update(artist.split('|'))
                else:
                    artists.add(artist)
    print("Number of artists at", name, "is", len(artists))

Number of artists at MoMa is 14738
Number of artists at Met is 60950
Number of artists at Tate is 3281
Number of artists at Nga is 16860
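
The same count can also be sketched directly on a dataframe column, without re-reading the csv file. The `artists` series below is hypothetical sample data, and only the '|' separator is handled, so this is an illustration of the idea rather than a replacement for the loop.

```python
import pandas as pd

# Hypothetical 'Artist' column in which one cell lists two artists
artists = pd.Series([
    'Pablo Picasso',
    'Charles Eames|Ray Eames',
    'Unidentified artist',
    None,
    'Pablo Picasso',
])

names = (
    artists.dropna()
    .str.split('|')   # one list of names per artwork
    .explode()        # one row per name
    .str.strip()      # remove stray spaces around names
)

# Drop placeholders, mirroring the filters used in the loop
names = names[~names.str.contains('Unidentified|Various') & (names != '')]
print(names.nunique())
```

Stripping whitespace before counting prevents 'Ray Eames' and ' Ray Eames' from being counted as two different artists.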


Most represented gender in general?
Only for MoMa and Tate, since the other csv files need Wikidata integration.

In [5]:
MoMaGender = MoMa.drop_duplicates(subset='Artist', keep="first")
MoMaGender.to_csv('MoMaGender.csv')

In [19]:
TateGender = Tate.drop_duplicates(subset='Artist', keep="first")
TateGender.to_csv('TateGender.csv')

In [20]:
museums=['MoMaGender.csv', 'TateGender.csv']
names = ['MoMa', 'Tate']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        Gender = {'Male':0, 'Female':0}
        for item in reader:
                if 'Male' in item['Gender']:
                    Gender['Male'] += 1
                if 'Female' in item['Gender']:
                    Gender['Female'] += 1
    print(Gender)

{'Male': 10474, 'Female': 2809}
{'Male': 2791, 'Female': 492}


Most represented nationality in general?

# Nationalities at MoMa

In [6]:
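def cleanMoMaNationality(nationality):
    # Turn '(American) (French)' into 'American, French,' so that values can later be split on commas.
    # The original test `('(') or (')') in nationality` was always true, so it is dropped.
    return nationality.replace('(', '').replace(')', ',').strip()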
    

In [7]:
# .copy() avoids SettingWithCopyWarning when the filtered dataframe is modified
MoMaGender = MoMaGender[MoMaGender['Nationality'].notna()].copy()
MoMaGender["Nationality"] = MoMaGender["Nationality"].apply(cleanMoMaNationality)
MoMaGender.to_csv('MoMaGender.csv')


In [8]:
museums=['MoMaGender.csv']
names = ['MoMa']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        nationalities = set() 
        for item in reader:
            if ',' in item['Nationality']:
                    item['Nationality'] = item['Nationality'].split(',')
                    for n in range(len(item['Nationality'])):
                        nationality= item['Nationality'][n].strip()
                        nationalities.add(nationality)
            else:
                    nationalities.add(item['Nationality'])
        count_naz= set()
        for el in nationalities:
            if el != '' and el != 'Nationality unknown' and el != ',' and el != ' ':
                count_naz.add(el)
                
print(count_naz)
print(len(count_naz))


{'Swedish', 'Venezuelan', 'Swiss', 'Icelandic', 'Canadian', 'Japanese', 'Kyrgyz', 'Slovak', 'Portuguese', 'Ghanaian', 'Costa Rican', 'Scottish', 'New Zealander', 'Greek', 'Algerian', 'Iranian', 'Ecuadorian', 'Irish', 'American', 'Brazilian', 'Vietnamese', 'Cambodian', 'Moroccan', 'Taiwanese', 'Zimbabwean', 'Norwegian', 'Kuwaiti', 'Sudanese', 'Malian', 'Uruguayan', 'Bolivian', 'Canadian Inuit', 'Ivorian', 'Tanzanian', 'Bulgarian', 'Puerto Rican', 'Finnish', 'Cypriot', 'Bahamian', 'Colombian', 'Indian', 'Congolese', 'Hungarian', 'Senegalese', 'Singaporean', 'Beninese', 'Chilean', 'Mozambican', 'Argentine', 'Malaysian', 'Namibian', 'Salvadoran', 'Czechoslovakian', 'Ugandan', 'Tunisian', 'Macedonian', 'Dutch', 'Palestinian', 'British', 'Thai', 'Catalan', 'Danish', 'South African', 'Albanian', 'Bosnian', 'Paraguayan', 'Yugoslav', 'Bangladeshi', 'Cameroonian', 'Panamanian', 'Nicaraguan', 'Luxembourger', 'Peruvian', 'Pakistani', 'Israeli', 'Lithuanian', 'Kenyan', 'Croatian', 'Azerbaijani', 'A

In [18]:
museums=['MoMaGender.csv']
names = ['MoMa']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        Nationalities ={}
        for nationality in count_naz:  
            Nationalities[nationality]= 0
        for item in reader:
            for naz in count_naz:
                    if naz in item['Nationality']:
                        Nationalities[naz] += 1 
    print(Nationalities)

{'Swedish': 123, 'Venezuelan': 68, 'Swiss': 421, 'Icelandic': 21, 'Canadian': 205, 'Japanese': 545, 'Kyrgyz': 1, 'Slovak': 9, 'Portuguese': 16, 'Ghanaian': 4, 'Costa Rican': 2, 'Scottish': 22, 'New Zealander': 10, 'Greek': 13, 'Algerian': 4, 'Iranian': 11, 'Ecuadorian': 4, 'Irish': 24, 'American': 5584, 'Brazilian': 185, 'Vietnamese': 3, 'Cambodian': 1, 'Moroccan': 8, 'Taiwanese': 4, 'Zimbabwean': 5, 'Norwegian': 35, 'Kuwaiti': 1, 'Sudanese': 2, 'Malian': 3, 'Uruguayan': 24, 'Bolivian': 3, 'Canadian Inuit': 3, 'Ivorian': 2, 'Tanzanian': 1, 'Bulgarian': 5, 'Puerto Rican': 6, 'Finnish': 52, 'Cypriot': 1, 'Bahamian': 1, 'Colombian': 57, 'Indian': 39, 'Congolese': 6, 'Hungarian': 86, 'Senegalese': 2, 'Singaporean': 2, 'Beninese': 1, 'Chilean': 68, 'Mozambican': 1, 'Argentine': 150, 'Malaysian': 2, 'Namibian': 2, 'Salvadoran': 2, 'Czechoslovakian': 3, 'Ugandan': 1, 'Tunisian': 2, 'Macedonian': 5, 'Dutch': 288, 'Palestinian': 3, 'British': 962, 'Thai': 7, 'Catalan': 1, 'Danish': 125, 'South 
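
Note that the loop above tests `naz in item['Nationality']`, which is a substring test: 'Canadian' also matches 'Canadian Inuit', and an empty string matches every row. A minimal sketch of counting whole tokens instead, using `collections.Counter`, is given below. The values in `rows` are hypothetical and only imitate the comma-separated format produced by `cleanMoMaNationality`.

```python
from collections import Counter

# Hypothetical cleaned values, comma-separated as produced by cleanMoMaNationality
rows = ['American,', 'Canadian Inuit,', 'Canadian,', 'American, French,', 'Nationality unknown,', '']

# Tokens that should not be counted as nationalities
ignored = {'', 'Nationality unknown'}

counts = Counter()
for value in rows:
    # Split into tokens and compare whole tokens only
    tokens = {token.strip() for token in value.split(',')}
    counts.update(tokens - ignored)

print(counts.most_common())
```

Here 'Canadian' is counted once, because 'Canadian Inuit' is treated as a separate token. `most_common()` also returns the nationalities already sorted by frequency.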

# Nationalities at Met

In [37]:
def cleanNazMet(naz):
    if ',' in naz:
        naz = naz.split(',')[0]
    if '(' in naz:
        naz = naz.split('(')[0]
    if '?' in naz:
        naz = naz.replace('?', '')
    if '1866–1932' in naz:
        naz = naz.replace(' 1866–1932', '')
    if 'born' in naz:
        naz = naz.replace(' born', '')
    return naz   

In [38]:
MetNew = Met.copy(deep=True)
MetNew = MetNew[MetNew['Nationality'].notna()]
MetNew["Nationality"] = MetNew["Nationality"].apply(cleanNazMet)
MetNew.to_csv("MetNew.csv")

In [41]:
museums=['MetNew.csv']
names = ['Met']
for museum in museums:
    with open(museum, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        nationalities = set() 
        for item in reader:
            if '|' in item['Nationality']:
                    item['Nationality'] = item['Nationality'].split('|')
                    for n in range(len(item['Nationality'])):
                        nationality= item['Nationality'][n].strip()
                        nationalities.add(nationality)
            else:
                nationalities.add(item['Nationality'].strip())
print(nationalities)

{'', 'Swedish', 'Venezuelan', 'Franco-Netherlandish', 'Fench', 'Swiss', 'Ukranian', 'French and English', 'Icelandic', 'Gubbio', 'Ancient Greek', 'Bohemian', 'American or British', 'Canadian', 'Spanish/Mexican', 'Siena', 'Japanese', 'Khitan', 'Lombard', 'Portuguese', 'Ghanaian', 'Yugoslavian', 'Austro-Hungarian', 'Nigeria', 'French Italian', 'Geman', 'Liechtensteiner', 'Costa Rican', 'French/Flemish', 'American and Canadian', 'Scottish or American', 'West Coast Africa', 'German American', 'Scottish', 'Italy', 'French/Dutch', 'New Zealander', 'Greek', 'Algerian', 'Tibetan', 'Iranian', 'Ecuadorian', 'Irish', 'Veronese', 'Austrian or German', 'American', 'Brazilian', 'Alsatian', 'American and French', 'Swiss or Austrian', 'British or American', 'British French', 'Upper Rhine', 'Northern France', 'Japanese American', 'French and Canadian', 'United States', 'North African', 'Danish-Icelandic', 'Democratic Republic of Congo', 'European', 'possibly Iranian', 'U.S.', 'Taiwanese', 'South Africa

In [42]:
with open('MetNew.csv', mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        MetNationalities ={}
        for nationality in nationalities:  
            MetNationalities[nationality]= 0
        for item in reader:
            for naz in nationalities:
                    if naz in item['Nationality']:
                        MetNationalities[naz] += 1 
print(MetNationalities)

{'': 275535, 'Swedish': 294, 'Venezuelan': 15, 'Franco-Netherlandish': 6, 'Fench': 9, 'Swiss': 1008, 'Ukranian': 6, 'French and English': 2, 'Icelandic': 4, 'Gubbio': 2, 'Ancient Greek': 2, 'Bohemian': 1299, 'American or British': 1, 'Canadian': 247, 'Spanish/Mexican': 1, 'Siena': 2, 'Japanese': 6526, 'Khitan': 1, 'Lombard': 1, 'Portuguese': 33, 'Ghanaian': 4, 'Yugoslavian': 4, 'Austro-Hungarian': 42, 'Nigeria': 22, 'French Italian': 6, 'Geman': 7, 'Liechtensteiner': 1, 'Costa Rican': 1, 'French/Flemish': 1, 'American and Canadian': 3, 'Scottish or American': 1, 'West Coast Africa': 1, 'German American': 12, 'Scottish': 416, 'Italy': 4, 'French/Dutch': 33, 'New Zealander': 5, 'Greek': 170, 'Algerian': 17, 'Tibetan': 2, 'Iranian': 42, 'Ecuadorian': 1, 'Irish': 468, 'Veronese': 1, 'Austrian or German': 5, 'American': 68664, 'Brazilian': 46, 'Alsatian': 1, 'American and French': 37, 'Swiss or Austrian': 1, 'British or American': 5, 'British French': 1, 'Upper Rhine': 1, 'Northern France':

# Nationalities at Nga

In [53]:
NgaNationalities = pd.read_csv('https://raw.githubusercontent.com/NationalGalleryOfArt/opendata/main/data/constituents.csv')
pd.set_option('display.max_columns', None)
NgaNationalities = NgaNationalities[['artistofngaobject', 'nationality']]
NgaNationalities = NgaNationalities[NgaNationalities['artistofngaobject']==1]
NgaNationalities = NgaNationalities[NgaNationalities['nationality'].notna()]
NgaNationalities = NgaNationalities[NgaNationalities['nationality'] != 'Unknown']

In [56]:
def cleanNazNga(naz):
    if ' (?)' in naz:
         naz = naz.replace(' (?)', '')
    return naz   

In [57]:
NgaNationalities["nationality"] = NgaNationalities["nationality"].apply(cleanNazNga)
NgaNationalities.to_csv("NgaNationalities.csv")

In [69]:
with open('NgaNationalities.csv', mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        nationalities = set() 
        for item in reader:
            nationality= item['nationality'].strip()
            nationalities.add(nationality)

print(nationalities)

{'Swedish', 'Venezuelan', 'German or Netherlandish', 'Swiss', 'Icelandic', 'Bavarian', 'Bohemian', 'Canadian', 'Japanese', 'Slovak', 'Portuguese', 'Yugoslavian', 'Scottish', 'Veneto-Islamic', 'New Zealander', 'Greek', 'Iranian', 'Irish', 'West Indian', 'Swiss or German', 'American', 'Brazilian', 'European', 'Moroccan', 'Taiwanese', 'Norwegian', 'South German', 'Etruscan', 'Malian', 'Uruguayan', 'Bulgarian', 'Finnish', 'Bahamian', 'New Zealand', 'Colombian', 'Indian', 'Franco-Flemish', 'italian', 'Hungarian', 'Hispano-Flemish', 'British-American', 'Chilean', 'Armenian', 'Nepalese', 'Roman', 'Mexican/American', 'Martiniquais', 'Genoese', 'Argentine', 'Scandinavian', '\u206eItalian', 'Dalmatian', 'Dutch', 'British', 'Danish', 'South African', 'Albanian', 'Bosnian', 'Nicaraguan', 'Peruvian', 'Latin', 'American and Mexican', 'Germany', 'Persian', 'Argentinean', 'Israeli', 'Israeli and Czech', 'Mosan', 'Netherlandish', 'Lithuanian', 'Croatian', 'Austrian', 'Cuban', 'English', 'Welsh', 'Itali

In [70]:
with open('NgaNationalities.csv', mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        NgaNationalities ={}
        for nationality in nationalities:  
            NgaNationalities[nationality]= 0
        for item in reader:
            for naz in nationalities:
                    if naz in item['nationality']:
                        NgaNationalities[naz] += 1 
print(NgaNationalities)

{'Swedish': 39, 'Venezuelan': 4, 'German or Netherlandish': 1, 'Swiss': 158, 'Icelandic': 1, 'Bavarian': 1, 'Bohemian': 11, 'Canadian': 48, 'Japanese': 104, 'Slovak': 1, 'Portuguese': 5, 'Yugoslavian': 14, 'Scottish': 55, 'Veneto-Islamic': 1, 'New Zealander': 2, 'Greek': 23, 'Iranian': 2, 'Irish': 29, 'West Indian': 1, 'Swiss or German': 1, 'American': 6810, 'Brazilian': 16, 'European': 7, 'Moroccan': 1, 'Taiwanese': 1, 'Norwegian': 9, 'South German': 1, 'Etruscan': 1, 'Malian': 1, 'Uruguayan': 2, 'Bulgarian': 3, 'Finnish': 4, 'Bahamian': 1, 'New Zealand': 3, 'Colombian': 2, 'Indian': 7, 'Franco-Flemish': 3, 'italian': 1, 'Hungarian': 27, 'Hispano-Flemish': 1, 'British-American': 1, 'Chilean': 6, 'Armenian': 1, 'Nepalese': 1, 'Roman': 15, 'Mexican/American': 1, 'Martiniquais': 1, 'Genoese': 1, 'Argentine': 22, 'Scandinavian': 2, '\u206eItalian': 2, 'Dalmatian': 1, 'Dutch': 472, 'British': 1145, 'Danish': 26, 'South African': 8, 'Albanian': 1, 'Bosnian': 1, 'Nicaraguan': 1, 'Peruvian': 

#Our analysis.
<br>
1. What are the most acquired artists in museums (in general)?
    - Is there a gender gap in the selection of artists?
    - What are the most represented nationalities (in general)?
    - What are the most represented movements or genres (in general)?
2. How have acquisition criteria changed (over time) in museums?
    - In which years are artists' works mostly acquired?
    - When does the gender gap decrease (if it does)?
    - In which years are artists' nationalities more influential in the selection?
    - In which years are artists' movements/genres more influential in the selection?
3. If we compare criteria of all museums, in general and over time, do we see any similarity or significant difference?
    - Do certain museums acquire more works based on artists/artists' gender/nationality/movement than others?

Acquisition criteria.
1.  In which years are artists' works mostly acquired?<br>
To answer, we need to count how many times each year shows up in the DateAcquired column.
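
A minimal sketch of this count with pandas, assuming the acquisition years are already available as numbers. The `acquired` series is hypothetical sample data.

```python
import pandas as pd

# Hypothetical acquisition years, one per artwork
acquired = pd.Series([1935, 1935, 1964, 1964, 1964, 2001], name='DateAcquired')

# Number of acquisitions per year, most frequent year first
per_year = acquired.value_counts()
print(per_year.head(3))

# The same counts in chronological order
print(per_year.sort_index())
```

`value_counts` sorts by frequency by default, which directly answers "in which years are works mostly acquired", while `sort_index` gives the chronological view needed for a timeline.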

In [None]:
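MoMa['year'] = pd.DatetimeIndex(MoMa['DateAcquired']).year
# MoMaNew must exist before a column can be assigned to it: start from a copy of MoMa
MoMaNew = MoMa.copy()
MoMaNew['DateAcquired'] = MoMa['year']
MoMaNew = MoMaNew[MoMaNew['DateAcquired'].notna()]
MoMaNew.to_csv("MoMaNew.csv")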

In [None]:
with open('MoMaNew.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    years={}
    for item in reader:
        if item['DateAcquired']not in years:
            years[item['DateAcquired']]= 1
        else:
            years[item['DateAcquired']]+= 1

    print(years)
    #all_years=list(years.keys())
    #print(sorted(all_years))
    

To do: find a way to sort by value, or use the library Laura knows.
Visualisation :)
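
One way to sort such a dictionary by value with the standard library alone is sketched here. The `years` dictionary below is a hypothetical stand-in with the same shape as the one built from MoMaNew.csv, where the keys are strings such as '1935.0'.

```python
# Hypothetical counts shaped like the years dictionary built from the csv file
years = {'1935.0': 12, '1964.0': 40, '1929.0': 3}

# Sort (year, count) pairs by count, largest first
by_count = sorted(years.items(), key=lambda pair: pair[1], reverse=True)
print(by_count)

# Sort chronologically instead, converting '1935.0' to 1935
by_year = sorted((int(float(year)), count) for year, count in years.items())
print(by_year)
```

The list `by_year` is a convenient input for a line or bar chart, since the years are integers in ascending order.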

In [None]:
# Inclusive (first year, last year) ranges; 1928 and 1929 are grouped with the 1930s.
# Ranges do not overlap, so boundary years such as 1940 are counted only once.
decades = {
    '1930s': (1928, 1939),
    '1940s': (1940, 1949),
    '1950s': (1950, 1959),
    '1960s': (1960, 1969),
    '1970s': (1970, 1979),
    '1980s': (1980, 1989),
    '1990s': (1990, 1999),
}
new_dict = {}
for key in years:
    # Keys look like '1935.0': keep the integer part
    key_int = int(key.split('.')[0])
    for label, (start, end) in decades.items():
        if start <= key_int <= end:
            new_dict[label] = new_dict.get(label, 0) + years[key]

print(new_dict)
    

When does the gender gap decrease (if it does)? <br>
For each decade: percentage of acquired works by men and by women, and the difference between the two.

In [None]:
with open('MoMaNew.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    years={}
    for item in reader:
        if item['DateAcquired']not in years:
            years[item['DateAcquired']]= 1
        else:
            years[item['DateAcquired']]+= 1

    print(years)
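
The cell above only repeats the per-year count. A minimal sketch of the per-decade gender percentages could look as follows. It assumes a dataframe with a numeric `DateAcquired` column and a `Gender` column holding 'Male' or 'Female'. The sample data and the `Decade` and `Gap` column names are hypothetical.

```python
import pandas as pd

# Hypothetical acquisitions with the artist's gender
df = pd.DataFrame({
    'DateAcquired': [1931, 1935, 1938, 1962, 1965, 1967, 1969],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male'],
})

# Integer division maps e.g. 1935 to the decade 1930
df['Decade'] = (df['DateAcquired'] // 10) * 10

# Row-wise percentages: each decade sums to 100
share = pd.crosstab(df['Decade'], df['Gender'], normalize='index') * 100

# Gap in percentage points between male and female artists
share['Gap'] = share['Male'] - share['Female']
print(share.round(1))
```

A shrinking `Gap` from one decade to the next would indicate that the gender gap is decreasing. Percentages are used rather than raw counts because the number of acquisitions differs between decades.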


In which years are artists' nationalities more influential in the selection?
For each decade: percentage of each nationality among acquisitions, and the differences between them.
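
A possible sketch of this step, again on hypothetical sample data: the share of each nationality within each decade, and the most represented nationality per decade. The column names follow the dataframes built earlier, while `Decade` is an assumed helper column.

```python
import pandas as pd

# Hypothetical acquisitions with the artist's nationality
df = pd.DataFrame({
    'DateAcquired': [1931, 1935, 1938, 1962, 1965, 1967],
    'Nationality': ['American', 'French', 'French', 'American', 'American', 'Japanese'],
})
df['Decade'] = (df['DateAcquired'] // 10) * 10

# Share of each nationality within its decade, as a percentage
share = (
    df.groupby('Decade')['Nationality']
    .value_counts(normalize=True)
    .mul(100)
)
print(share.round(1))

# Most represented nationality per decade, as (decade, nationality) pairs
top = share.groupby(level=0).idxmax()
print(top)
```

The same pattern would apply to movements or genres, provided a column with that information is available.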

In which years are artists' movements/genres more influential in the selection?
For each decade: percentage of each movement/genre among acquisitions, and the differences between them.

Nga has no Gender column.
In Met, many Gender values are NaN.
