# Analysis for the Museum of Modern Art

The Museum of Modern Art (MoMA) acquired its first artworks in 1929, the year it was established. Today, the Museum’s
evolving collection contains almost 200,000 works from around the world spanning the last 150 years. The collection
includes an ever-expanding range of visual expression, including painting, sculpture, printmaking, drawing, photography,
architecture, design, film, and media and performance art.
The data, uploaded 2/15/2017, includes the title, artist, date, and medium of every artwork in the MoMA collection.

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
moma = pd.read_csv('cleaned_moma.csv', dtype='unicode')
moma.head()

Unnamed: 0,artwork_id,title,artist_id,name_x,date,acquisition_date,credit,classification,weight_kg,nationality,gender,birth_year,death_year,acquisition_year,age_at_acq,age_at_death
0,33599,Study For Head Bath,1,Robert Arneson,1977,1981-04-28,Gift Of The Friends Of Contemporary Drawing,Drawing,0,American,Male,1930,1992,1981,51,62
1,64139,General Nuke,1,Robert Arneson,1986,1997-05-28,Gift Of Landfall Press,Print,0,American,Male,1930,1992,1997,62,62
2,61629,Bas-Relief,2,Doroteo Arnaiz,0,1965-03-09,Gift Of The Artist,Print,0,Spanish,Male,1936,0,1965,29,0
3,45972,Honey Under Sink,3,Bill Arnold,1971,1972-03-07,Purchase,Photograph,0,American,Male,1941,0,1972,31,0
4,45997,Honey Under Chair,3,Bill Arnold,1971,1972-03-07,Purchase,Photograph,0,American,Male,1941,0,1972,31,0


In [3]:
moma['weight_kg'] = moma['weight_kg'].astype(float).round(0).astype(int)
moma['age_at_acq'] = moma['age_at_acq'].astype(str).astype(int)
moma['age_at_death'] = moma['age_at_death'].astype(str).astype(int)
moma['birth_year'] = moma['birth_year'].astype(str).astype(int)
moma['death_year'] = moma['death_year'].astype(str).astype(int)
moma['acquisition_year'] = moma['acquisition_year'].astype(str).astype(int)
moma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130261 entries, 0 to 130260
Data columns (total 16 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   artwork_id        130261 non-null  object
 1   title             130261 non-null  object
 2   artist_id         130261 non-null  object
 3   name_x            130261 non-null  object
 4   date              130261 non-null  object
 5   acquisition_date  124798 non-null  object
 6   credit            130261 non-null  object
 7   classification    130261 non-null  object
 8   weight_kg         130261 non-null  int32 
 9   nationality       130261 non-null  object
 10  gender            130261 non-null  object
 11  birth_year        130261 non-null  int32 
 12  death_year        130261 non-null  int32 
 13  acquisition_year  130261 non-null  int32 
 14  age_at_acq        130261 non-null  int32 
 15  age_at_death      130261 non-null  int32 
dtypes: int32(6), object(10)
memory usage: 
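Because the file is read with `dtype='unicode'`, every column starts out as a string and has to be converted before any arithmetic. The sketch below shows one way to do that on a toy frame; `to_int_columns` is a hypothetical helper, not part of the notebook, and the sample values are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for the all-string columns produced by dtype='unicode'.
raw = pd.DataFrame({
    "weight_kg": ["0", "12.6", "185068"],
    "birth_year": ["1930", "1936", "0"],
})

def to_int_columns(df, columns):
    """Return a copy of df with the given string columns converted to int."""
    out = df.copy()
    for col in columns:
        # Parse as numbers first so values such as "12.6" can be rounded
        # before the integer cast.
        out[col] = pd.to_numeric(out[col]).round(0).astype(int)
    return out

converted = to_int_columns(raw, ["weight_kg", "birth_year"])
print(converted.dtypes)
```

Looping over a list of column names keeps the conversion in one place, which makes an accidentally repeated line easier to spot.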

####  Let's think of interesting questions to answer with this data
- Average num of artworks per artist
- Top 10 artists by # of art pieces
- Nationality with highest frequency of art pieces
- Frequency of artwork by classification type
- Does the frequency of art per artist change based on the type of art?
- Number of artists by gender
- Largest artwork by weight
- Average age of living artists when artwork was acquired
- Does the average age change by type of art?
- Youngest artist to be featured in MoMA
- Number of pieces acquired by year
- Did the MoMA become more diverse over time, or was the number of artworks per artist consistent over time?
- How has the number of artworks per gender changed over the years?

In [4]:
# average num of artworks per artist
art_per_artist = moma['name_x'].value_counts()
avg_art_per_artist = len(moma) / len(art_per_artist)
template = "Featured artists have an average of {q:.1f} artworks in the Museum of Modern Art."
output = template.format(q=avg_art_per_artist)
print(output)

Featured artists have an average of 9.6 artworks in the Museum of Modern Art.
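The average above divides the number of rows by the number of distinct artist names. A small sketch on made-up data shows how the result can shift if two different artists happen to share a name, in which case counting by `artist_id` may be the safer choice. The toy rows below are assumptions for illustration only.

```python
import pandas as pd

# Toy data: artist_id 10 and 11 are different people who share a name.
toy = pd.DataFrame({
    "artwork_id": [1, 2, 3, 4, 5],
    "artist_id": [10, 10, 11, 12, 12],
    "name_x": ["A. Smith", "A. Smith", "A. Smith", "B. Jones", "B. Jones"],
})

# Average artworks per artist, counting artists by name and by id.
avg_by_name = len(toy) / toy["name_x"].nunique()
avg_by_id = len(toy) / toy["artist_id"].nunique()

print(f"by name: {avg_by_name:.1f}, by id: {avg_by_id:.1f}")
```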


In [132]:
# let's make stuff pretty
def prettifybar(df,x_axis, y_axis, title):
    fig = px.bar(df,x=x_axis, y=y_axis, title=title, text=y_axis, width=700, height=400)
    fig.update_yaxes(showticklabels=False, visible=False)
    fig.update_layout(margin=dict(l=30, r=30, t=60, b=30),title_x=0.5,uniformtext_minsize=8, uniformtext_mode='hide', yaxis_title=None, xaxis_title=None)
    return fig.show()

def prettifybinned(df,x_axis, y_axis, title):
    fig = px.bar(df,x=x_axis, y=y_axis, title=title, labels={'x': 'Binning'}, text=y_axis, width=700, height=400)
    fig.update_yaxes(showticklabels=False, visible=False)
    fig.update_layout(
    xaxis = dict(
        tickmode = 'linear',
        tick0 = 1925,
        dtick = 10))
    fig.update_layout(margin=dict(l=30, r=30, t=60, b=30),title_x=0.5,uniformtext_minsize=8, uniformtext_mode='hide', yaxis_title=None, xaxis_title=None)
    return fig.show()

def prettifyscatter(df,x_axis,y_axis,title,color):
    fig = px.scatter(df,x=x_axis, y=y_axis, title=title, color=color)
    fig.update_yaxes(showticklabels=False, visible=False)
    fig.update_layout(margin=dict(l=30, r=30, t=60, b=30),title_x=0.5,uniformtext_minsize=8, uniformtext_mode='hide', yaxis_title=None, xaxis_title=None)
    return fig.show()

In [6]:
# top 10 artists by # of art pieces
top10_artist = moma['name_x'].value_counts().rename_axis('artists').reset_index(name='counts')
top10_artist = top10_artist[0:12]
top10_artist = top10_artist.drop(index=3)
top10_artist = top10_artist.drop(index=4) # this drops the unknown photographer row
top10_artists = top10_artist.reset_index(drop=True)
top10_artists
prettifybar(top10_artists, 'artists', 'counts', 'Largest # of Artworks By Artist')

Unnamed: 0,artists,counts
0,Eugène Atget,5050
1,Louise Bourgeois,3318
2,Ludwig Mies Van Der Rohe,2566
3,Jean Dubuffet,1435
4,Lee Friedlander,1317
5,Pablo Picasso,1310
6,Marc Chagall,1162
7,Henri Matisse,1063
8,Pierre Bonnard,894
9,Lilly Reich,823
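Dropping rows by positional index works for this snapshot of the data but breaks silently if the ranking changes. A minimal sketch of an alternative is to filter placeholder names by label before taking the top N. `top_n_artists` is a hypothetical helper, and the placeholder label `"Unknown photographer"` is an assumption based on the comment in the cell above; the exact string in the dataset is not shown.

```python
import pandas as pd

def top_n_artists(names, n=10, exclude=()):
    """Count artworks per name, skip placeholder labels, keep the top n."""
    counts = names.value_counts()
    counts = counts[~counts.index.isin(list(exclude))]
    return counts.head(n).rename_axis("artists").reset_index(name="counts")

# Toy series of artist names, one entry per artwork.
names = pd.Series(
    ["Atget"] * 4 + ["Unknown photographer"] * 3 + ["Bourgeois"] * 2 + ["Picasso"]
)

top = top_n_artists(names, n=2, exclude={"Unknown photographer"})
print(top)
```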


In [81]:
# nationality with highest frequency of art pieces
top_nations = moma['nationality'].value_counts().rename_axis('nationality').reset_index(name='counts')
top_nations = top_nations[0:11]
top_nations = top_nations.drop(index=2)
top_nations
prettifybar(top_nations, 'nationality', 'counts', 'Largest # of Artworks By Country')

In [88]:
# frequency of artwork by classification type
top_classifications = moma['classification'].value_counts().rename_axis('classifications').reset_index(name='counts')
top_classifications = top_classifications[0:10]
top_classifications
prettifybar(top_classifications, 'classifications', 'counts', 'Largest # of Artworks By Classification')

In [87]:
# does the frequency of art per artist change based on the type of art?
table = pd.pivot_table(moma, index=['classification'], values=['artist_id', 'artwork_id'], aggfunc=pd.Series.nunique)
table['artwork_per_artist'] = (table['artwork_id'] / table['artist_id']).round(1)
table = table.sort_values(by=['artwork_id'],ascending=False)
table = table.rename(columns={"artist_id": "num_of_artists", "artwork_id": "num_of_artworks"})
table = table.reset_index(drop=False)

# let's drop the classifications that aren't actually types of art
table = table.drop(index=5)
table = table.drop(index=12)
table = table.drop(index=13)
tbl = table.reset_index(drop=True)
tbl
prettifybar(tbl, 'classification', 'artwork_per_artist', 'Artwork Per Artist By Classification')
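The same artworks-per-artist ratio can also be sketched with `groupby` and named aggregations, which produces the renamed columns in one step. The toy rows below are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

# Toy data: three prints by two artists, four photographs by one artist.
toy = pd.DataFrame({
    "classification": ["Print", "Print", "Print",
                       "Photograph", "Photograph", "Photograph", "Photograph"],
    "artist_id": [10, 10, 11, 12, 12, 12, 12],
    "artwork_id": [1, 2, 3, 4, 5, 6, 7],
})

# Distinct artists and artworks per classification, then their ratio.
per_class = toy.groupby("classification").agg(
    num_of_artists=("artist_id", "nunique"),
    num_of_artworks=("artwork_id", "nunique"),
)
per_class["artwork_per_artist"] = (
    per_class["num_of_artworks"] / per_class["num_of_artists"]
).round(1)

print(per_class)
```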

In [89]:
# num of artists by gender
gender = pd.pivot_table(moma, index=['gender'], values=['artist_id'], aggfunc=pd.Series.nunique)
gender = gender.reset_index(drop=False)
gender
prettifybar(gender, 'gender', 'artist_id', '# of Artists By Gender')

In [14]:
# largest artwork by weight
heavy = moma.sort_values(by=['weight_kg'], ascending=False)
heaviest = heavy.iloc[0]
heaviest

artwork_id                                               80990
title                                                   Switch
artist_id                                                 5349
name_x                                           Richard Serra
date                                                      1999
acquisition_date                                    2000-02-07
credit              Gift Of Emily Carroll And Thomas W. Weisel
classification                                       Sculpture
weight_kg                                               185068
nationality                                           American
gender                                                    Male
birth_year                                                1938
death_year                                                   0
acquisition_year                                          2000
age_at_acq                                                  62
age_at_death                                           

In [90]:
# average age of living artists when artwork was acquired
living = moma[moma['age_at_acq'] > 0]
avg_age_living = round(living['age_at_acq'].mean(),1)
template = "There are {q:,} artworks from known living artists with available birth years and acquisition years. The average age of the artist at acquisition year is {y:,} years old."
output = template.format(y=avg_age_living, q=len(living['age_at_acq']))
print(output)

There are 110,174 artworks from known living artists with available birth years and acquisition years. The average age of the artist at acquisition year is 60.9 years old.
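The `age_at_acq > 0` filter matters because the cleaned file appears to use 0 where a value is unknown (an assumption based on the zeros in `date` and `death_year` in the preview). The sketch below, on made-up ages, shows how much the mean moves when those zeros are left in.

```python
import pandas as pd

# Toy ages at acquisition; 0 is assumed to mean "unknown".
ages = pd.Series([51, 62, 0, 31, 0])

# Replace the zeros with NaN so they are ignored by mean() and count().
known = ages.where(ages > 0)

naive_mean = ages.mean()    # zeros pull the average down
known_mean = known.mean()   # average over known ages only

print(f"with zeros: {naive_mean:.1f}, known only: {known_mean:.1f}")
```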


In [92]:
# does the average age change by type of art?
age_class = living.groupby('classification')['age_at_acq'].mean().round(1).reset_index()
age_class = age_class.sort_values(by='age_at_acq', ascending=False)
age_class = age_class.drop(index=9)
age_class = age_class.drop(index=13)
age_class = age_class.drop(index=0)
age_class.head(25)
prettifybar(age_class, 'classification', 'age_at_acq', 'Avg Age at Acquisition by Classification')

In [17]:
# youngest artist to be featured in MoMA
young = living.sort_values(by='age_at_acq', ascending=True)
# looks like the youngest artists were 1 year old, but how many 1-year-olds were featured?
youngest = moma[moma['age_at_acq'] == 1]
len(youngest)

87

In [140]:
# number of pieces acquired by year
yearly = pd.pivot_table(moma, index=['acquisition_year'], values=['artwork_id', 'artist_id'], aggfunc=pd.Series.nunique)
yearly = yearly.reset_index(drop=False)
yearly = yearly.sort_values(by='acquisition_year', ascending=True)
yearly = yearly.drop(index=0)
yearly.head(10)

# let's bucket the years so we can better understand trends over time
bins = [1925, 1934, 1944, 1954, 1964, 1974, 1984, 1994, 2004, 2014, 2024]
labels =[1925, 1935, 1945, 1955, 1965, 1975, 1985, 1995, 2005, 2015]
# labels =['25-34', '35-44', '45-54', '55-64', '65-74', '75-84', '85-94', '95-04', '05-14', '15-24']
yearly2 = yearly.sort_values(by='acquisition_year', ascending=True)
yearly2['binned'] = pd.cut(yearly2['acquisition_year'], bins,labels=labels)
yearly3 = pd.pivot_table(yearly2, index=['binned'], values=['artwork_id', 'artist_id'], aggfunc='sum')
yearly3 = yearly3.reset_index(drop=False)
yearly3.head(10)

Unnamed: 0,binned,artist_id,artwork_id
0,1925,94,390
1,1935,1509,6269
2,1945,1760,5846
3,1955,2931,18488
4,1965,3533,22887
5,1975,3179,10462
6,1985,3311,10738
7,1995,3596,13688
8,2005,5790,30227
9,2015,1209,5803


In [141]:
prettifybinned(yearly3, 'binned', 'artwork_id', '# of Artworks Acquired By 10 Year Window')
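The bin edges used here are right-closed by default in `pd.cut`, so each label covers the years just after the previous edge up to and including the next edge. A short sketch with a shortened version of the notebook's `bins` and `labels` and a few made-up years shows where boundary years land.

```python
import pandas as pd

# First three bins from the notebook's edges; intervals are (1925, 1934],
# (1934, 1944], (1944, 1954] because pd.cut defaults to right=True.
bins = [1925, 1934, 1944, 1954]
labels = [1925, 1935, 1945]

years = pd.Series([1929, 1934, 1935, 1944, 1945, 1925])
binned = pd.cut(years, bins, labels=labels)

# 1925 itself sits on the open left edge of the first bin, so it gets no label.
print(pd.DataFrame({"year": years, "binned": binned}))
```

Passing `include_lowest=True` to `pd.cut` would pull the first edge into the first bin if that were ever needed.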

In [142]:
# Did the MoMA become more diverse over time, or was the number of artworks per artist consistent over time?

# pull the number of artworks and artists
art_count = pd.pivot_table(moma, index=['acquisition_year'], values=['artwork_id', 'artist_id'], aggfunc=pd.Series.nunique)
art_count = art_count.reset_index(drop=False)
bins = [1925, 1934, 1944, 1954, 1964, 1974, 1984, 1994, 2004, 2014, 2024]
labels =['25-34', '35-44', '45-54', '55-64', '65-74', '75-84', '85-94', '95-04', '05-14', '15-24']
art_group = art_count.sort_values(by='acquisition_year', ascending=True)
art_group['binned'] = pd.cut(art_count['acquisition_year'], bins,labels=labels)
art_group = art_group.drop(index=0)
art_group.head(6)

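# yearly_binned is not defined in any cell shown here; rebuilt from art_group (assumed equivalent)
yearly_binned = art_group.groupby('binned')['artwork_id'].sum().rename_axis('year_groups').reset_index(name='# of artworks')
year_group = art_group.groupby('binned')['artist_id'].sum().rename_axis('year_groups').reset_index(name='# of artists')
bin_merge = pd.merge(yearly_binned, year_group, how='outer')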
bin_merge['artwork_per_artist'] = (bin_merge['# of artworks'] / bin_merge['# of artists']).round(1)
bin_merge.head(10)

Unnamed: 0,year_groups,# of artworks,# of artists,artwork_per_artist
0,25-34,390,94,4.1
1,35-44,6269,1509,4.2
2,45-54,5846,1760,3.3
3,55-64,18488,2931,6.3
4,65-74,22887,3533,6.5
5,75-84,10462,3179,3.3
6,85-94,10738,3311,3.2
7,95-04,13688,3596,3.8
8,05-14,30227,5790,5.2
9,15-24,5803,1209,4.8
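One caveat on the artist counts: they are sums of per-year distinct counts, so an artist whose work was acquired in several years of the same decade is counted once per year. The sketch below, on made-up rows, contrasts that sum with a distinct count taken over the whole bin. Column names follow the notebook; the data and the `decade` column are assumptions for illustration.

```python
import pandas as pd

# Toy data: artist 1 has works acquired in 1930, 1931 and 1932.
toy = pd.DataFrame({
    "acquisition_year": [1930, 1930, 1931, 1932, 1940],
    "artist_id": [1, 2, 1, 1, 3],
    "artwork_id": [100, 101, 102, 103, 104],
})
toy["decade"] = pd.cut(
    toy["acquisition_year"], [1925, 1934, 1944], labels=["25-34", "35-44"]
)

# Approach used above: distinct artists per year, then summed per bin.
yearly = (
    toy.groupby(["decade", "acquisition_year"], observed=True)["artist_id"]
    .nunique()
    .reset_index(name="artists")
)
summed = yearly.groupby("decade", observed=True)["artists"].sum()

# Alternative: distinct artists counted directly within each bin.
distinct = toy.groupby("decade", observed=True)["artist_id"].nunique()

print(pd.DataFrame({"summed": summed, "distinct": distinct}))
```

With the distinct count in the denominator, the artworks-per-artist ratio for a bin would come out higher whenever artists recur across years.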


In [144]:
# how has the number of artworks per gender changed over the years?
gender_time = pd.pivot_table(moma, index=['acquisition_year', 'gender'], values=['artwork_id'], aggfunc=pd.Series.nunique)
gender_time = gender_time.reset_index(drop=False)
gender_time = gender_time.drop(index=0)
gender_time = gender_time.drop(index=1)
gender_time = gender_time.drop(index=2)
gender_time.head(15)


bins = [1925, 1934, 1944, 1954, 1964, 1974, 1984, 1994, 2004, 2014, 2024]
labels =[1925, 1935, 1945, 1955, 1965, 1975, 1985, 1995, 2005, 2015]
# labels =['25-34', '35-44', '45-54', '55-64', '65-74', '75-84', '85-94', '95-04', '05-14', '15-24']
gender_time['binned'] = pd.cut(gender_time['acquisition_year'], bins,labels=labels)
gender_time.head(15)

gender_time_grp = pd.pivot_table(gender_time, index=['binned', 'gender'], values=['artwork_id'], aggfunc='sum')
gender_time_grp = gender_time_grp.reset_index(drop=False)
gender_time_grp['binned'] = gender_time_grp['binned'].astype(str)
gender_time_grp.head(30)


Unnamed: 0,binned,gender,artwork_id
0,1925,Female,5
1,1925,Gender Unknown,41
2,1925,Male,344
3,1935,Female,435
4,1935,Gender Unknown,1070
5,1935,Male,4764
6,1945,Female,321
7,1945,Gender Unknown,297
8,1945,Male,5228
9,1955,Female,504


In [146]:
prettifyscatter(gender_time_grp,'binned', 'artwork_id', "# of Artworks by Gender Over 10 Year Periods", 'gender')
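Raw counts per gender grow with total acquisitions, so a share of each period's total may be easier to compare across decades. The sketch below reuses the first nine rows of the table above; the `share_pct` column is an addition for illustration and is not part of the notebook.

```python
import pandas as pd

# First nine rows of gender_time_grp as shown in the output above.
counts = pd.DataFrame({
    "binned": ["1925"] * 3 + ["1935"] * 3 + ["1945"] * 3,
    "gender": ["Female", "Gender Unknown", "Male"] * 3,
    "artwork_id": [5, 41, 344, 435, 1070, 4764, 321, 297, 5228],
})

# Total artworks per period, broadcast back to every row of that period.
totals = counts.groupby("binned")["artwork_id"].transform("sum")
counts["share_pct"] = (counts["artwork_id"] / totals * 100).round(1)

female_share = counts.loc[counts["gender"] == "Female", "share_pct"].tolist()
print(counts)
```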