# CSV Export

This notebook is for visualizing the CSV exports from source files to Django.

In [1]:
import pandas as pd

In [2]:
cols = [
    'ExhibitionID',
    'ExhibitionNumber',
    'ExhibitionTitle',
    'ConstituentURL', 
    'FirstName',
    'MiddleName',
    'LastName',
    'Suffix',
    'ExhibitionURL',
    'ExhibitionRole',
    'DisplayName',
]

exh = pd.read_csv(
    '~/data1/moma/exhibitions/MoMAExhibitions1929to1989.csv', 
    usecols=cols,
    dtype={
        'ExhibitionID': 'Int64',
    },
    converters={
        'ExhibitionTitle': str,
        'FirstName': str,
        'LastName': str,
        'MiddleName': str,
        'Suffix': str,
    },
    encoding="iso8859-1",
)

## The complete list of columns

['ExhibitionID',
 'ExhibitionNumber',
 'ExhibitionTitle',
 'ExhibitionCitationDate',
 'ExhibitionBeginDate',
 'ExhibitionEndDate',
 'ExhibitionSortOrder',
 'ExhibitionURL',
 'ExhibitionRole',
 'ExhibitionRoleinPressRelease',
 'ConstituentID',
 'ConstituentType',
 'DisplayName',
 'AlphaSort',
 'FirstName',
 'MiddleName',
 'LastName',
 'Suffix',
 'Institution',
 'Nationality',
 'ConstituentBeginDate',
 'ConstituentEndDate',
 'ArtistBio',
 'Gender',
 'VIAFID',
 'WikidataID',
 'ULANID',
 'ConstituentURL']
 

## Filter for artists

The CSV contains one role for each artist in a given exhibition. So let's filter for only artists.

In [3]:
artists = exh.loc[exh['ExhibitionRole'] == 'Artist']
artists = exh.loc[exh['ExhibitionTitle'] != "No#"]

In [10]:
artists

Unnamed: 0,ExhibitionID,ExhibitionNumber,ExhibitionTitle,ExhibitionURL,ExhibitionRole,DisplayName,FirstName,MiddleName,LastName,Suffix,ConstituentURL
0,2557,1,"Cézanne, Gauguin, Seurat, Van Gogh",moma.org/calendar/exhibitions/1767,Curator,"Alfred H. Barr, Jr.",Alfred,H.,Barr,Jr.,moma.org/artists/9168
1,2557,1,"Cézanne, Gauguin, Seurat, Van Gogh",moma.org/calendar/exhibitions/1767,Artist,Paul Cézanne,Paul,,Cézanne,,moma.org/artists/1053
2,2557,1,"Cézanne, Gauguin, Seurat, Van Gogh",moma.org/calendar/exhibitions/1767,Artist,Paul Gauguin,Paul,,Gauguin,,moma.org/artists/2098
3,2557,1,"Cézanne, Gauguin, Seurat, Van Gogh",moma.org/calendar/exhibitions/1767,Artist,Vincent van Gogh,Vincent,,van Gogh,,moma.org/artists/2206
4,2557,1,"Cézanne, Gauguin, Seurat, Van Gogh",moma.org/calendar/exhibitions/1767,Artist,Georges-Pierre Seurat,Georges-Pierre,,Seurat,,moma.org/artists/5358
5,2724,2,Paintings by 19 Living Americans,moma.org/calendar/exhibitions/1912,Artist,Charles Burchfield,Charles,,Burchfield,,moma.org/artists/870
6,2724,2,Paintings by 19 Living Americans,moma.org/calendar/exhibitions/1912,Artist,Charles Demuth,Charles,,Demuth,,moma.org/artists/1490
7,2724,2,Paintings by 19 Living Americans,moma.org/calendar/exhibitions/1912,Artist,Preston Dickinson,Preston,,Dickinson,,moma.org/artists/1537
8,2724,2,Paintings by 19 Living Americans,moma.org/calendar/exhibitions/1912,Artist,Lyonel Feininger,Lyonel,,Feininger,,moma.org/artists/1832
9,2724,2,Paintings by 19 Living Americans,moma.org/calendar/exhibitions/1912,Artist,"George Overbury (""Pop"") Hart",George,"Overbury (""Pop"")",Hart,,moma.org/artists/2519


## Add a column for the Gensim token

Since the Gensim tokenizer trimmed trailing 'e's and otherwise altered artist names, it would be cool to have a column for the token, so that Django can translate those names when interacting with the model.

In [4]:
from gensim.parsing.preprocessing import preprocess_string

In [5]:
def format_name(names):
    "Join a name with an underscore"
    # process all the names at once.
    return ["".join(preprocess_string(n)) for n in names]

In [9]:
artists_tokenized = artists.assign(
    token=lambda x: format_name(x.DisplayName)
)

TypeError: decoding to str: need a bytes-like object, float found

In [8]:
artists_tokenized

NameError: name 'artists_tokenized' is not defined

## Export time!

All right, now we have a mapping to the Gensim model's token. 

TODO:

This is good, but there are two things that would make it better. This script exports _all_ the exhibitions, but only some of them are included in the model.

There is a weird exhibition that was removed.

```python
exh_numbers.remove("No#")
```

And exhibitions above a threshold were not included.

```python
for en in exh_numbers:
        terms = Moma.exhibition_artists(en)
        # Don't calculate and output big lists
        if len(terms) <= 50:
            Moma.append_to_outfile(terms)
```


In [None]:
artists_tokenized.to_csv(
    index=False,
    path_or_buf='./artists_tokenized.csv',
)