## Data Exploration and Description Exercise

Tate is the institution in England that holds the United Kingdom's national collection of British art and of international modern art. It is one of the world's central spaces for the conservation and exhibition of works of art, and its collection includes a representative sample of well-known artists spanning several centuries up to the present day. However, in light of new paradigms of representation, questions arise about the inclusion in its collection of women artists and of artists from non-hegemonic geographical origins. Given this scenario, the research question of this exercise is: can the canon materialized by the Tate collection be said to be eminently male and Western, or is there a significant inclusion of women artists or of artists from non-hegemonic countries? To answer it, this notebook conducts an exploratory analysis of a metadata file named artist_data.csv.

In [54]:
import pandas as pd

df = pd.read_csv('artist_data.csv')  # comma is already the default separator
df.head()

,id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
0,10093,"Abakanowicz, Magdalena",Female,born 1930,1930.0,,Polska,,http://www.tate.org.uk/art/artists/magdalena-a...
1,0,"Abbey, Edwin Austin",Male,1852–1911,1852.0,1911.0,"Philadelphia, United States","London, United Kingdom",http://www.tate.org.uk/art/artists/edwin-austi...
2,2756,"Abbott, Berenice",Female,1898–1991,1898.0,1991.0,"Springfield, United States","Monson, United States",http://www.tate.org.uk/art/artists/berenice-ab...
3,1,"Abbott, Lemuel Francis",Male,1760–1803,1760.0,1803.0,"Leicestershire, United Kingdom","London, United Kingdom",http://www.tate.org.uk/art/artists/lemuel-fran...
4,622,"Abrahams, Ivor",Male,born 1935,1935.0,,"Wigan, United Kingdom",,http://www.tate.org.uk/art/artists/ivor-abraha...


In [55]:
df.head(10)

,id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
0,10093,"Abakanowicz, Magdalena",Female,born 1930,1930.0,,Polska,,http://www.tate.org.uk/art/artists/magdalena-a...
1,0,"Abbey, Edwin Austin",Male,1852–1911,1852.0,1911.0,"Philadelphia, United States","London, United Kingdom",http://www.tate.org.uk/art/artists/edwin-austi...
2,2756,"Abbott, Berenice",Female,1898–1991,1898.0,1991.0,"Springfield, United States","Monson, United States",http://www.tate.org.uk/art/artists/berenice-ab...
3,1,"Abbott, Lemuel Francis",Male,1760–1803,1760.0,1803.0,"Leicestershire, United Kingdom","London, United Kingdom",http://www.tate.org.uk/art/artists/lemuel-fran...
4,622,"Abrahams, Ivor",Male,born 1935,1935.0,,"Wigan, United Kingdom",,http://www.tate.org.uk/art/artists/ivor-abraha...
5,2606,Absalon,Male,1964–1993,1964.0,1993.0,"Tel Aviv-Yafo, Yisra'el","Paris, France",http://www.tate.org.uk/art/artists/absalon-2606
6,9550,"Abts, Tomma",Female,born 1967,1967.0,,"Kiel, Deutschland",,http://www.tate.org.uk/art/artists/tomma-abts-...
7,623,"Acconci, Vito",Male,born 1940,1940.0,,"New York, United States",,http://www.tate.org.uk/art/artists/vito-acconc...
8,624,"Ackling, Roger",Male,1947–2014,1947.0,2014.0,"Isleworth, United Kingdom",,http://www.tate.org.uk/art/artists/roger-ackli...
9,625,"Ackroyd, Norman",Male,born 1938,1938.0,,"Leeds, United Kingdom",,http://www.tate.org.uk/art/artists/norman-ackr...


In [66]:
#What kind of information does the file contain?
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3532 entries, 0 to 3531
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            3532 non-null   int64  
 1   name          3532 non-null   object 
 2   gender        3416 non-null   object 
 3   dates         3470 non-null   object 
 4   yearOfBirth   3472 non-null   float64
 5   yearOfDeath   2228 non-null   float64
 6   placeOfBirth  3040 non-null   object 
 7   placeOfDeath  1453 non-null   object 
 8   url           3532 non-null   object 
dtypes: float64(2), int64(1), object(6)
memory usage: 248.5+ KB


In [67]:
#count how many artists in the collection are women and how many men

df['gender'].value_counts()

Male      2895
Female     521
Name: gender, dtype: int64

In [59]:
df['gender'].value_counts(normalize=True)

Male      0.847482
Female    0.152518
Name: gender, dtype: float64

Considering the above results, two inferences can be drawn. First, the metadata file provides information that is very useful for the research question, such as the artists' gender, years of birth and death, and places of birth and death. Second, the Tate collection has a very unequal proportion of male and female artists: only 15.25% of the artists are women, while 84.75% are men. These shares are computed over the 3,416 artists whose gender is recorded; gender is missing for the remaining 116 of the 3,532 entries.
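By default, `value_counts(normalize=True)` drops missing values, so its shares are relative to the rows that have a gender. The sketch below uses a small hypothetical DataFrame (not real Tate records) to compare that default with `dropna=False`, which keeps missing genders as their own category.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for artist_data.csv (not real Tate records)
toy = pd.DataFrame({"gender": ["Male", "Male", "Male", "Female", np.nan]})

# Default behaviour: NaN is excluded, so shares are over recorded genders only
share_recorded = toy["gender"].value_counts(normalize=True)

# dropna=False keeps NaN as a category, so shares are over all rows
share_all = toy["gender"].value_counts(normalize=True, dropna=False)

print((share_recorded * 100).round(2))
print((share_all * 100).round(2))
```

Reporting both versions makes it explicit whether a percentage refers to all artists or only to those with a recorded gender.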

In [68]:
#What is the average year of birth and death of male artists?

df.loc[df['gender'] == 'Male', ['yearOfBirth', 'yearOfDeath']].mean()

yearOfBirth    1880.688370
yearOfDeath    1917.645310
dtype: float64

In [69]:
#What is the average year of birth and death of female artists?

df.loc[df['gender'] == 'Female', ['yearOfBirth', 'yearOfDeath']].mean()

yearOfBirth    1925.695312
yearOfDeath    1962.678571
dtype: float64

In [70]:
#What is the average year of birth of male artists?

df.loc[df['gender'] == 'Male', 'yearOfBirth'].mean()

1880.6883704735376

In [71]:
#What is the average year of birth of female artists?

df.loc[df['gender'] == 'Female', 'yearOfBirth'].mean()

1925.6953125

The above analysis allows some interesting observations. On average, the male artists in the collection were born around 1881 and the female artists around 1926, roughly 45 years later. An average does not tell us when most artists were born, but the gap suggests that women entered the collection mainly from cohorts born in the twentieth century. This may indicate that the professionalization of the artist's craft among women, as well as their inclusion in museums and exhibitions, began to consolidate during the first half of the twentieth century.
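The separate `.loc` filters per gender can be collapsed into a single `groupby`, and adding the median and count helps distinguish "on average" from "most artists". A minimal sketch on hypothetical toy rows (not real Tate data), assuming the same column names as artist_data.csv:

```python
import pandas as pd

# Hypothetical toy rows using the artist_data.csv column names
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female"],
    "yearOfBirth": [1760.0, 1930.0, 1940.0, 1898.0, 1967.0],
    "yearOfDeath": [1803.0, None, 2014.0, 1991.0, None],
})

# One groupby replaces one .loc filter per gender;
# the median is less sensitive than the mean to a few very early births,
# and count shows how many non-missing values each statistic is based on
summary = toy.groupby("gender")[["yearOfBirth", "yearOfDeath"]].agg(["mean", "median", "count"])
print(summary)
```

In this toy example the male mean birth year (about 1877) sits well below the male median (1930) because of a single eighteenth-century artist, which is the kind of skew worth checking in the real file.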


In [38]:
#determine the birthplaces of the artists in the collection

df['placeOfBirth'].value_counts()

London, United Kingdom       446
Paris, France                 57
Edinburgh, United Kingdom     47
New York, United States       43
Glasgow, United Kingdom       35
                            ... 
Gdansk, Polska                 1
Zundert, Nederland             1
Randfontein, South Africa      1
Tula, Rossiya                  1
España                         1
Name: placeOfBirth, Length: 1263, dtype: int64

In [40]:
df['placeOfBirth'].value_counts(normalize=True)

London, United Kingdom       0.146711
Paris, France                0.018750
Edinburgh, United Kingdom    0.015461
New York, United States      0.014145
Glasgow, United Kingdom      0.011513
                               ...   
Gdansk, Polska               0.000329
Zundert, Nederland           0.000329
Randfontein, South Africa    0.000329
Tula, Rossiya                0.000329
España                       0.000329
Name: placeOfBirth, Length: 1263, dtype: float64

In [75]:
#What is the average year of birth of female artists born in the most frequent birthplace in the collection (London)?

df.loc[(df['gender'] == 'Female') & (df['placeOfBirth'] == 'London, United Kingdom'),
        'yearOfBirth'].mean()

1914.7936507936508

In [76]:
#What is the average year of birth of male artists born in the most frequent birthplace in the collection (London)?

df.loc[(df['gender'] == 'Male') & (df['placeOfBirth'] == 'London, United Kingdom'),
        'yearOfBirth'].mean()

1859.84554973822

In [80]:
# What is the average year of birth of female artists born in Glasgow, a less frequent birthplace than London?

df.loc[(df['gender'] == 'Female') & (df['placeOfBirth'] == 'Glasgow, United Kingdom'),
        'yearOfBirth'].mean()

1946.2

In [81]:
# What is the average year of birth of male artists born in Glasgow, a less frequent birthplace than London?

df.loc[(df['gender'] == 'Male') & (df['placeOfBirth'] == 'Glasgow, United Kingdom'),
        'yearOfBirth'].mean()

1908.4

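Based on the above analysis, several observations emerge. First, most of the artists in the museum's collection whose birthplace is recorded come from Western, hegemonic countries such as the United Kingdom, the United States and France; London alone accounts for 14.67% of the recorded birthplaces. Places in countries such as Spain, Russia, South Africa or Poland appear only with minimal frequencies, and artists born in Africa, Asia or Latin America are rare in the part of the data examined here (for example Randfontein, South Africa; Tel Aviv-Yafo, Yisra'el; Îran). Second, in both London and Glasgow, female artists were on average born much later than male artists (about 1915 vs. 1860 in London, and 1946 vs. 1908 in Glasgow), and artists born in Glasgow are on average younger than those born in London. Because both cities are in the United Kingdom, this comparison cannot by itself show that artists from non-hegemonic countries entered the canon later than artists from hegemonic countries; testing that hypothesis would require comparing birth years grouped by country of birth.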
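A country-level comparison needs a country column, which can be derived from `placeOfBirth` since it mostly follows a "City, Country" pattern. The sketch below assumes that pattern: it keeps the text after the last comma and leaves country-only entries such as "Polska" whole. The toy rows and the `countryOfBirth` column name are hypothetical; deciding which countries count as hegemonic would still be a separate, researcher-defined step.

```python
import pandas as pd

# Hypothetical toy rows in the same "City, Country" format as placeOfBirth
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female", "Male"],
    "yearOfBirth": [1760.0, 1898.0, 1964.0, 1930.0, 1940.0],
    "placeOfBirth": [
        "London, United Kingdom",
        "Springfield, United States",
        "Tel Aviv-Yafo, Yisra'el",
        "Polska",  # some entries give only a country
        None,      # missing birthplace stays missing
    ],
})

# Keep the text after the last comma; entries without a comma are kept whole
toy["countryOfBirth"] = toy["placeOfBirth"].str.rsplit(",", n=1).str[-1].str.strip()

# Number of artists and average birth year per country (missing countries are dropped)
by_country = toy.groupby("countryOfBirth")["yearOfBirth"].agg(["count", "mean"])
print(by_country.sort_values("count", ascending=False))

# Women and men per country of birth
gender_by_country = pd.crosstab(toy["countryOfBirth"], toy["gender"])
print(gender_by_country)
```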

In [73]:
#Count how many artists in the collection were born in each year

df['yearOfBirth'].value_counts()   

1936.0    49
1930.0    48
1967.0    45
1928.0    45
1938.0    41
          ..
1718.0     1
1497.0     1
1680.0     1
1707.0     1
1652.0     1
Name: yearOfBirth, Length: 351, dtype: int64

In [77]:
df['yearOfBirth'].value_counts(normalize=True)

1936.0    0.014113
1930.0    0.013825
1967.0    0.012961
1928.0    0.012961
1938.0    0.011809
            ...   
1718.0    0.000288
1497.0    0.000288
1680.0    0.000288
1707.0    0.000288
1652.0    0.000288
Name: yearOfBirth, Length: 351, dtype: float64

In [83]:
#What is the average year of birth and death of the artists born in the most frequent birthplace in the collection (London)?

df.loc[df['placeOfBirth'] == 'London, United Kingdom', ['yearOfBirth', 'yearOfDeath']].mean()

yearOfBirth    1867.624719
yearOfDeath    1911.439169
dtype: float64

In [84]:
#How many female and male artists were born in each year of birth recorded in the database?

pd.crosstab(df['gender'], df['yearOfBirth'])

yearOfBirth,1497.0,1531.0,1540.0,1547.0,1551.0,1560.0,1561.0,1572.0,1577.0,1580.0,...,1975.0,1976.0,1977.0,1978.0,1979.0,1980.0,1981.0,1982.0,1988.0,2004.0
gender
Female,0,0,0,0,0,0,0,0,0,0,...,5,1,8,2,2,2,1,0,0,0
Male,1,1,2,1,1,1,1,1,2,1,...,10,3,10,4,4,0,0,3,1,1
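A crosstab over all 351 individual birth years is hard to read. Binning birth years by century and normalizing each row shows how the share of women changes over time. A minimal sketch on hypothetical toy rows (not real Tate data), assuming the `gender` and `yearOfBirth` columns of artist_data.csv:

```python
import pandas as pd

# Hypothetical toy rows using the artist_data.csv column names
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Male", "Female", "Female", "Male"],
    "yearOfBirth": [1652.0, 1760.0, 1898.0, 1930.0, 1967.0, 1979.0, None],
})

# Drop rows without a birth year so the century can be stored as an integer
valid = toy.dropna(subset=["yearOfBirth"])

# Floor each birth year to the start of its century (e.g. 1898 -> 1800)
century = (valid["yearOfBirth"] // 100 * 100).astype(int).rename("birthCentury")

# Share of female and male artists within each century (each row sums to 1)
shares = pd.crosstab(century, valid["gender"], normalize="index")
print(shares)
```

Replacing `// 100 * 100` with `// 10 * 10` gives decades instead of centuries if a finer view is needed.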


In [85]:
#How many female and male artists were born in each place of birth recorded in the database?

pd.crosstab(df['gender'], df['placeOfBirth'])

placeOfBirth,"Aachen, Deutschland","Abbotsford, Canada","Aberdare, United Kingdom","Aberdeen, United Kingdom","Aberdeen, United States","Abergavenny, United Kingdom","Abinger, United Kingdom","Accrington, United Kingdom","Acton, United Kingdom","Addlestone, United Kingdom",...,"Zundert, Nederland","Zürich, Schweiz","s Gravenhage, Nederland","Ålesund, Norge",Éire,Îran,Österreich,"Šid, Jugoslavija","Škofja Loka, Slovenija","‘Afula, Yisra'el"
gender
Female,0,0,0,1,0,0,0,0,0,0,...,0,1,0,0,0,2,2,0,0,1
Male,1,1,1,5,2,1,1,1,1,1,...,1,5,3,1,3,0,1,1,1,0


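Overall, the metadata suggest that female artists are a small minority relative to male artists in the Tate collection (about 15% of the artists with a recorded gender). Most of the artists with a recorded birthplace also come from hegemonic countries, above all the United Kingdom. On average, male artists were born around 1881 and female artists around 1926, although the most frequent individual birth years in the file (1936, 1930, 1967, 1928 and 1938) all fall in the twentieth century.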