# MoMA-project
This is a capstone project for the Code:You Data Analytics course. It analyzes MoMA (Museum of Modern Art) data to explore the gender diversity of the museum's artists and to determine whether that diversity has improved over time. It also compares MoMA's artist data with data from prominent art history textbooks.

## Project Goal
 **To determine artist gender diversity at MoMA.**
 
 The end result will show the proportions of male, female, nonbinary, and transgender artists represented in MoMA's collection from 1929 to 2024, using a time series analysis. In addition, the result will show the proportions of artists' gender diversity prior to and after the museum's 2019 renovation/expansion. Finally, MoMA's artist data will be joined with the data from two popular art history textbooks: *Janson's History of Art* and *Gardner's Art Through the Ages*, to compare artists and gender diversity proportions. 
 
 Discovering gender inequalities in the art world, art museums, and in art history textbooks will hopefully inspire institutions, patrons, and authors to be more inclusive. 

 In this data discovery notebook, I will:
 1. Research gender diversity in the present day artist population.
 2. Research gender diversity in US art museums.
 3. Research MoMA's mission statement and changes made to MoMA's collection after their 2019 renovation and expansion. 
 4. Research popular art history textbooks.
 5. Determine the data needed to support the analysis.
 6. Review available data for MoMA and the art history textbooks. 
 7. Identify the type of analysis that can be done and list questions that can be answered.
 8. List the cleaning steps that will be needed. 
 

## 1. Present Day Artist Population
Per the National Endowment for the Arts, 48% of the artists in the U.S. are female. However, female artists earned $0.80 for every dollar earned by male artists.

[National Endowment for the Arts report](https://www.arts.gov/sites/default/files/Artists-in-the-Workforce-Selected-Demographic-Characteristics-Prior-to-COVID%E2%80%9019.pdf)

## 2. Gender Diversity in U.S. Art Museums
Topaz et al. (2019) conducted a large-scale study of artist diversity in U.S. art museums. By analyzing 10,000 artist records from 18 major U.S. museums, they discovered that 87% of artists in U.S. art museums are male and 85% are white.

[Topaz et al. (2019) Diversity of artists in major U.S. museums. *PLoS ONE* 14(3): e0212852](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0212852)

## 3. MoMA's Mission Statement and 2019 Renovation and Expansion
[MoMA's mission statement](https://www.moma.org/about/mission-statement/) states, "The Museum of Modern Art connects people from around the world to the art of our time. We aspire to be a catalyst for experimentation, learning, and creativity, a gathering place for all, and a home for artists and their ideas."

In 2019, the museum underwent a large expansion, adding 40,000 square feet and costing $400 million. One of the goals of this expansion project was to "reflect a more inclusive, diverse vision of modern art" (Kenney 2019).

[Kenney (2019) New York's MoMA to close for four months for renovation. *The Art Newspaper*](https://www.theartnewspaper.com/2019/02/05/new-yorks-moma-to-close-for-four-months-for-renovation)

## 4. Gender Diversity in Art History Textbooks
Women have long been underrepresented in art history textbooks. Two of the most popular art history textbooks used in U.S. universities are [*Janson's History of Art*](https://campusstore.miamioh.edu/jansons-history-art-portable-edition-book/bk/9780205161102) and [*Gardner's Art Through the Ages*](https://www.artsmart.com/top-art-history-books-to-know-and-read/#:~:text=Sold%20in%20two%20volumes%2C%20Art,the%20western%20ideals%20of%20art). In the latest edition of *Janson's History of Art*, of the 509 artists included, only 36 are women (Love 2021).

[Love (2021) Gender and Ethnic Diversity in Traditional Art History Textbooks. *Sartle*](https://www.sartle.com/blog/post/gender-and-ethnic-diversity-in-traditional-art-history-textbooks)


## 5. Data Needed to Support the Analysis
**MoMA Data**
- Artists' names in MoMA's collection from the museum's inception to the present.
- Title of the artwork
- Gender of the artists
- Date when the artwork was acquired by the museum. 

**Art History Textbooks**
- Artists' names in *Janson's History of Art* and *Gardner's Art Through the Ages*
- Gender of the artists




## 6. Data Discovery - Kaggle MoMA and Art History Textbooks
The next step is to see what data is available in the Kaggle data sets.

### MoMA Art Data

Source:
- MoMA data from [Kaggle](https://www.kaggle.com/datasets/ugowda/the-museum-of-modern-art-moma-collection)

This data set contains a list of the artworks in MoMA's collection. To start the analysis, we will load and preview the data:

In [2]:
import pandas as pd

moma_art = pd.read_csv("../data/raw/artworks.csv")
moma_art                       

| | Title | Artist | ConstituentID | ArtistBio | Nationality | BeginDate | EndDate | Gender | Date | Medium | ... | OnView | Circumference (cm) | Depth (cm) | Diameter (cm) | Height (cm) | Length (cm) | Weight (kg) | Width (cm) | Seat Height (cm) | Duration (sec.) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Ferdinandsbrücke Project, Vienna, Austria (Ele... | Otto Wagner | 6210 | (Austrian, 1841–1918) | (Austrian) | (1841) | (1918) | (male) | 1896 | Ink and cut-and-pasted painted pages on paper | ... | | | | | 48.600000 | | | 168.900000 | | |
| 1 | City of Music, National Superior Conservatory ... | Christian de Portzamparc | 7470 | (French, born 1944) | (French) | (1944) | (0) | (male) | 1987 | Paint and colored pencil on print | ... | | | | | 40.640100 | | | 29.845100 | | |
| 2 | Villa project, outside Vienna, Austria (Elevat... | Emil Hoppe | 7605 | (Austrian, 1876–1957) | (Austrian) | (1876) | (1957) | (male) | 1903 | Graphite, pen, color pencil, ink, and gouache ... | ... | | | | | 34.300000 | | | 31.800000 | | |
| 3 | The Manhattan Transcripts Project, New York, N... | Bernard Tschumi | 7056 | (French and Swiss, born Switzerland 1944) | () | (1944) | (0) | (male) | 1980 | Photographic reproduction with colored synthet... | ... | | | | | 50.800000 | | | 50.800000 | | |
| 4 | Villa project, outside Vienna, Austria (Exteri... | Emil Hoppe | 7605 | (Austrian, 1876–1957) | (Austrian) | (1876) | (1957) | (male) | 1903 | Graphite, color pencil, ink, and gouache on tr... | ... | | | | | 38.400000 | | | 19.100000 | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 155911 | In-text plate from The Alphabet of Creation | Ben Shahn | 5366 | (American, born Lithuania. 1898–1969) | (American) | (1898) | (1969) | (male) | 1954 | Page from an illustrated book with forty-three... | ... | | | | | 27.000000 | | | 17.000000 | | |
| 155912 | In-text plate from The Alphabet of Creation | Ben Shahn | 5366 | (American, born Lithuania. 1898–1969) | (American) | (1898) | (1969) | (male) | 1954 | Page from an illustrated book with forty-three... | ... | | | | | 27.000000 | | | 17.000000 | | |
| 155913 | In-text plate from The Alphabet of Creation | Ben Shahn | 5366 | (American, born Lithuania. 1898–1969) | (American) | (1898) | (1969) | (male) | 1954 | Page from an illustrated book with forty-three... | ... | | | | | 27.000000 | | | 17.000000 | | |
| 155914 | German Pavilion, International Exposition, Bru... | Ludwig Mies van der Rohe | 7166 | (American, born Germany. 1886–1969) | (American) | (1886) | (1969) | (male) | 1934 | Pencil and colored pencil on tracing paper | ... | | | | | 36.195072 | | | 108.585217 | | |


In [10]:
moma_art.shape

(155916, 30)

There are 155,916 pieces of artwork included in the data. Each piece of artwork has 30 attributes.

Next we will look at the list of attributes available.

In [11]:
moma_art.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 155916 entries, 0 to 155915
Data columns (total 30 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   Title               155868 non-null  object 
 1   Artist              154656 non-null  object 
 2   ConstituentID       154656 non-null  object 
 3   ArtistBio           149832 non-null  object 
 4   Nationality         154656 non-null  object 
 5   BeginDate           154656 non-null  object 
 6   EndDate             154656 non-null  object 
 7   Gender              154656 non-null  object 
 8   Date                153893 non-null  object 
 9   Medium              146646 non-null  object 
 10  Dimensions          147235 non-null  object 
 11  CreditLine          154393 non-null  object 
 12  AccessionNumber     155916 non-null  object 
 13  Classification      155916 non-null  object 
 14  Department          155916 non-null  object 
 15  DateAcquired        149134 non-nul

In [12]:
moma_art.Artist.describe()

count                       154656
unique                       14014
top       Ludwig Mies van der Rohe
freq                         14375
Name: Artist, dtype: object

There are 154,656 rows with an artist name. Since the data set lists 155,916 pieces of art, 1,260 pieces have no artist identified. There are 14,014 unique artist names, so many artists have more than one piece in the collection.
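The missing-value arithmetic here can also be checked directly in pandas. The following is a minimal sketch that uses a small hypothetical frame (`sample`) in place of `moma_art`; the names and values in it are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for moma_art; the real frame is loaded from artworks.csv.
sample = pd.DataFrame(
    {
        "Title": ["Untitled", "Untitled", None, "Study"],
        "Artist": ["Artist A", None, "Artist B", "Artist A"],
    }
)

# Total rows minus the non-null count gives the missing values per column.
missing = len(sample) - sample.count()

# isna().sum() is the more direct way to get the same numbers.
missing_direct = sample.isna().sum()

# nunique() ignores missing values, like the "unique" line of describe().
unique_artists = sample["Artist"].nunique()

print(missing)
print(unique_artists)
```

Running the same two lines against `moma_art` should reproduce the counts derived by hand in this section.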

In [13]:
moma_art.Title.describe()

count       155868
unique      107275
top       Untitled
freq          8421
Name: Title, dtype: object

There are 155,868 rows with a title, so 48 of the 155,916 pieces of art have no title. There are 107,275 unique titles, so some pieces share a title. The most common title is "Untitled", which appears 8,421 times.

In [14]:
moma_art.Gender.describe()

count     154656
unique       424
top       (male)
freq      119533
Name: Gender, dtype: object

There are 154,656 pieces of art with a gender entry, so 1,260 of the 155,916 pieces have no gender listed. There are 424 unique entries for gender. We will explore this next.

In [15]:
moma_art.Gender.unique()

array(['(male)', '(male) (male)', '(male) (female)',
       '(male) (male) (male)', '(female) (male) ()', '(female)',
       '(male) (female) (male) (female)', '() (male) (male)',
       '(male) (male) () (female)', '() (male) (male) (female)',
       '(male) (male) ()', '(male) (male) (female) (female)',
       '() (male) (female)', '(male) (female) (female) (male)',
       '(male) () (male) (female) (male) (male) (male) (male) (female)',
       '() (male)', '(male) ()', '()',
       '() (male) (male) (male) (male) (male) (male)', '(male) () ()',
       '() (male) (male) (male) (male) (male)', '(female) (male)',
       '() (male) (male) (male) (male) (male) (male) (male)',
       '() (male) (male) (male) (female) (male)',
       '(male) (male) (male) (female) (male) (male) (male) (male) (male) (male)',
       '(male) (male) (male) (male) (female) (male)',
       '(male) (female) (male)', '(male) (male) () ()',
       '(male) (male) (male) (male) (male) (male)',
       '(male) () (male

Here we can see that "(male)" and "(female)" are repeated for the same art piece. I would assume that this means multiple artists worked on the same art piece. This is something that will need to be resolved in the cleaning process. There are also empty parentheses that will need to be removed. 
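One possible way to resolve the repeated and empty parentheses is sketched below. It assumes every gender is wrapped in its own pair of parentheses, as in the output above. The sample values and variable names are hypothetical, and how a multi-artist piece should count toward the gender proportions is still an open decision.

```python
import pandas as pd

# Hypothetical sample of raw Gender values in the format shown above.
gender_raw = pd.Series(["(male)", "(male) (female)", "() (male)", "()", None])

# Pull the text out of every pair of parentheses; "()" yields an empty string.
tokens = gender_raw.fillna("").str.findall(r"\(([^)]*)\)")

# Drop the empty strings left behind by "()".
genders = tokens.apply(lambda items: [g for g in items if g])

# Number of identified genders per artwork (0 means none was identified).
n_identified = genders.str.len()

# One row per (artwork, gender) pair, ready for counting.
exploded = genders.explode().dropna()

print(n_identified.tolist())
print(exploded.value_counts())
```

Keeping the parsed genders as a list per artwork preserves the link to the original row, while the exploded form is convenient for `value_counts()`.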

In [16]:
moma_art.DateAcquired.describe()

count         149134
unique          1767
top       1968-04-11
freq           11690
Name: DateAcquired, dtype: object

There are 149,134 pieces of art with an acquisition date, so 6,782 of the 155,916 pieces have none. In addition, the DateAcquired field has the object data type. I will have to convert it to a datetime data type in order to do a time series analysis.
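A minimal sketch of that conversion follows, using a small hypothetical series rather than the real column. It assumes the dates are in the `YYYY-MM-DD` format seen in the `describe()` output, and uses `errors="coerce"` so that missing or malformed entries become `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical sample in the YYYY-MM-DD format seen in the describe() output.
acquired_raw = pd.Series(["1968-04-11", "2019-10-21", None, "not a date"])

# Missing or unparseable entries become NaT instead of raising an error.
acquired = pd.to_datetime(acquired_raw, errors="coerce")

# The acquisition year is what a yearly time series would group on.
year = acquired.dt.year

print(acquired.dtype)
print(acquired.isna().sum())
```

Comparing the `NaT` count after conversion with the missing count before it is a quick way to spot entries that failed to parse.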

### Art History Textbook Data

Source:
- Art History Textbook data from [Kaggle](https://www.kaggle.com/datasets/joebeachcapital/art-history?select=artists.csv)

This data set contains a list of artists that are included in two popular art history textbooks: *Janson's History of Art* and *Gardner's Art Through the Ages*. To start off the analysis we will load and preview the data:

In [17]:
import pandas as pd

textbook_data = pd.read_csv("../data/raw/art_history.csv")
textbook_data

| | artist_name | edition_number | year | artist_nationality | artist_nationality_other | artist_gender | artist_race | artist_ethnicity | book | space_ratio_per_page_total | artist_unique_id | moma_count_to_year | whitney_count_to_year | artist_race_nwi |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Aaron Douglas | 9.0 | 1991 | American | American | Male | Black or African American | Not Hispanic or Latino origin | Gardner | 0.353366 | 2 | 0 | 0 | Non-White |
| 1 | Aaron Douglas | 10.0 | 1996 | American | American | Male | Black or African American | Not Hispanic or Latino origin | Gardner | 0.373947 | 2 | 0 | 0 | Non-White |
| 2 | Aaron Douglas | 11.0 | 2001 | American | American | Male | Black or African American | Not Hispanic or Latino origin | Gardner | 0.303259 | 2 | 0 | 0 | Non-White |
| 3 | Aaron Douglas | 12.0 | 2005 | American | American | Male | Black or African American | Not Hispanic or Latino origin | Gardner | 0.377049 | 2 | 0 | 0 | Non-White |
| 4 | Aaron Douglas | 13.0 | 2009 | American | American | Male | Black or African American | Not Hispanic or Latino origin | Gardner | 0.398410 | 2 | 0 | 0 | Non-White |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3157 | Winslow Homer | 4.0 | 1991 | American | American | Male | White | Not Hispanic or Latino origin | Janson | 0.377853 | 407 | 1 | 0 | White |
| 3158 | Winslow Homer | 5.0 | 1995 | American | American | Male | White | Not Hispanic or Latino origin | Janson | 0.335776 | 407 | 1 | 0 | White |
| 3159 | Winslow Homer | 6.0 | 2001 | American | American | Male | White | Not Hispanic or Latino origin | Janson | 0.324369 | 407 | 1 | 0 | White |
| 3160 | Winslow Homer | 7.0 | 2007 | American | American | Male | White | Not Hispanic or Latino origin | Janson | 0.323356 | 407 | 1 | 0 | White |


In [18]:
textbook_data.shape

(3162, 14)

There are 3,162 records in the art history textbook data, each with 14 attributes. As the preview shows, a record is one artist in one edition of a textbook, so the same artist can appear in many rows.

Next we will look at the list of attributes available.

In [19]:
textbook_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3162 entries, 0 to 3161
Data columns (total 14 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   artist_name                 3162 non-null   object 
 1   edition_number              3162 non-null   float64
 2   year                        3162 non-null   int64  
 3   artist_nationality          3139 non-null   object 
 4   artist_nationality_other    3162 non-null   object 
 5   artist_gender               3104 non-null   object 
 6   artist_race                 3133 non-null   object 
 7   artist_ethnicity            3104 non-null   object 
 8   book                        3162 non-null   object 
 9   space_ratio_per_page_total  3162 non-null   float64
 10  artist_unique_id            3162 non-null   int64  
 11  moma_count_to_year          3162 non-null   int64  
 12  whitney_count_to_year       3162 non-null   int64  
 13  artist_race_nwi             3162 

In [20]:
textbook_data.artist_name.describe()

count               3162
unique               413
top       Auguste Renoir
freq                  25
Name: artist_name, dtype: object

There are 3,162 records in this data set and 3,162 artist names, so no name fields are empty. Only 413 of the names are unique, so the other 2,749 rows repeat an artist who is already listed, either in another edition of the same textbook or in the other textbook. This count alone does not show how many artists appear in both textbooks.
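To find out how many artists appear in both textbooks, the `book` column can be counted per artist. The sketch below uses a few hypothetical rows with the same column names as `textbook_data`; the artist names and edition numbers are invented.

```python
import pandas as pd

# Hypothetical sample: one row per artist per edition, as in the preview above.
rows = pd.DataFrame(
    {
        "artist_name": ["Artist A", "Artist A", "Artist A", "Artist B", "Artist C"],
        "book": ["Gardner", "Gardner", "Janson", "Gardner", "Janson"],
        "edition_number": [9.0, 10.0, 4.0, 9.0, 4.0],
    }
)

# Count how many distinct books each artist appears in.
books_per_artist = rows.groupby("artist_name")["book"].nunique()

# Artists that appear in both textbooks.
in_both = sorted(books_per_artist[books_per_artist == 2].index)

# Rows that repeat an artist who is already listed (total rows minus unique names).
repeat_rows = len(rows) - rows["artist_name"].nunique()

print(in_both)
print(repeat_rows)
```

Here `repeat_rows` corresponds to the difference between the record count and the unique-name count, while `in_both` answers the separate question of overlap between the two books.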

In [21]:
textbook_data.artist_gender.describe()

count     3104
unique       2
top       Male
freq      2762
Name: artist_gender, dtype: object

There are 3,162 records in this data set, but gender is identified for only 3,104 of them. The gender is therefore unknown or missing for 58 records.

In [22]:
textbook_data.artist_gender.unique()

array(['Male', 'Female', nan], dtype=object)

In the data set, the 58 "nan" genders belong to records whose artist name is listed as "N/A". Because this analysis needs both the artist's name and gender, these records will be excluded.

## 7. Questions to be answered
The purpose of this project is to analyze MoMA data to determine artist gender diversity from the museum's inception to the present, and to compare the MoMA diversity data with the art history textbook data.
- What percentage of artists represented in MoMA's collection are male, female, nonbinary, and transgender?
- Has there been an improvement in gender diversity over time?
- Was there an increase in gender diversity after the museum's 2019 renovation and expansion?
- What percentage of artists in the art history textbooks are male, female, nonbinary, and transgender?
- How many artists represented in MoMA's collection are also listed in the art history textbooks?
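
The time-based questions above could be approached with a cross-tabulation of gender against acquisition date. This is only a sketch on hypothetical, already-cleaned data with one gender label per row. It counts artworks rather than unique artists, and the cutoff date used for the 2019 renovation is an assumption that should be confirmed before use.

```python
import pandas as pd

# Hypothetical cleaned data: one row per artwork with a single gender label.
works = pd.DataFrame(
    {
        "gender": ["male", "male", "female", "male", "female", "female"],
        "date_acquired": pd.to_datetime(
            ["1968-04-11", "1968-05-01", "1968-06-01",
             "2020-01-15", "2020-02-01", "2021-03-01"]
        ),
    }
)

# Share of each gender among works acquired in each year (each row sums to 1).
by_year = pd.crosstab(works["date_acquired"].dt.year, works["gender"], normalize="index")

# Same shares, split at the 2019 renovation; the cutoff date is an assumption.
cutoff = pd.Timestamp("2019-10-21")
period = works["date_acquired"].ge(cutoff).map({False: "before", True: "after"})
by_period = pd.crosstab(period, works["gender"], normalize="index")

print(by_year)
print(by_period)
```

Normalizing by row keeps each year comparable even when the number of acquisitions varies widely from year to year.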

## 8. Cleaning Needed
### MoMA Data Cleaning Needed
**Columns to Rename**
- Title to title
- Artist to artist
- Gender to gender
- DateAcquired to date_acquired

**Fields to Keep and Clean**
- artist
- title
- gender
- date_acquired

**Data Types**
- Change the title, artist, and gender data types from objects to strings.
- Change the date_acquired data type from object to datetime.

**Filter Rows**
- Remove rows with a missing artist name or gender.
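
The MoMA cleaning steps listed above could be chained roughly as follows. This is a sketch on hypothetical rows, not the final cleaning code: `raw` and `rename_map` are illustrative names, and the parenthesized gender values are left as-is because parsing them is a separate step.

```python
import pandas as pd

# Hypothetical raw rows using the column names from moma_art.
raw = pd.DataFrame(
    {
        "Title": ["Work 1", "Work 2", "Work 3"],
        "Artist": ["Artist A", None, "Artist B"],
        "Gender": ["(male)", None, "(female)"],
        "DateAcquired": ["1968-04-11", "1970-01-01", None],
        "Medium": ["Ink", "Paint", "Graphite"],
    }
)

rename_map = {
    "Title": "title",
    "Artist": "artist",
    "Gender": "gender",
    "DateAcquired": "date_acquired",
}

clean = (
    raw.rename(columns=rename_map)[list(rename_map.values())]  # rename, keep four fields
    .dropna(subset=["artist", "gender"])  # drop rows missing a name or gender
    .astype({"title": "string", "artist": "string", "gender": "string"})
    .assign(date_acquired=lambda d: pd.to_datetime(d["date_acquired"], errors="coerce"))
    .reset_index(drop=True)
)

print(clean.dtypes)
```

Rows with a missing acquisition date are kept here, since the list above only filters on name and gender; they would need to be excluded separately for the time series.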

### Art History Textbook Data Cleaning Needed
**Fields to Keep and Clean**
- artist_name
- artist_gender

**Columns to Rename**
- artist_name to artist
- artist_gender to gender
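
A sketch of the textbook cleaning, followed by the join with MoMA artists, might look like the following. Both frames are hypothetical. Lowercasing the gender labels and matching on the exact name string are assumptions made for illustration, and real artist names will likely need normalizing before they match.

```python
import pandas as pd

# Hypothetical textbook rows (one per artist per edition) and cleaned MoMA artists.
textbook = pd.DataFrame(
    {
        "artist_name": ["Artist A", "Artist A", "Artist B", "N/A"],
        "artist_gender": ["Male", "Male", "Female", None],
        "book": ["Gardner", "Janson", "Gardner", "Janson"],
    }
)
moma_artists = pd.DataFrame(
    {"artist": ["Artist A", "Artist C"], "gender": ["male", "female"]}
)

textbook_clean = (
    textbook[["artist_name", "artist_gender"]]
    .rename(columns={"artist_name": "artist", "artist_gender": "gender"})
    .dropna(subset=["gender"])  # removes the rows with no gender
    .assign(gender=lambda d: d["gender"].str.lower())  # assumed: match MoMA's lowercase labels
    .drop_duplicates(subset="artist")  # one row per artist
)

# Inner join on the exact name string keeps only artists found in both sources.
overlap = moma_artists.merge(textbook_clean, on="artist", suffixes=("_moma", "_textbook"))

print(overlap)
```

The length of `overlap` gives the number of artists found in both sources, and the two gender columns make it easy to check whether the sources agree.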