# CSV Export

This notebook is for inspecting and preparing the CSV exports that go from the source files to Django.

In [18]:
import pandas as pd

In [85]:
cols = [
    'ExhibitionID',
    'ExhibitionNumber',
    'ExhibitionTitle',
    'ConstituentURL', 
    'FirstName',
    'MiddleName',
    'LastName',
    'Suffix',
    'ExhibitionURL',
    'ExhibitionRole',
    'DisplayName',
]

exh = pd.read_csv(
    '~/data1/moma/exhibitions/MoMAExhibitions1929to1989.csv', 
    usecols=cols,
    dtype={
        'ExhibitionID': 'Int64',
    },
    converters={
        'ExhibitionTitle': str,
        'FirstName': str,
        'LastName': str,
        'MiddleName': str,
        'Suffix': str,
    },
    encoding="iso8859-1",
)
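
The `dtype` and `converters` arguments do different jobs here. The sketch below runs on a tiny made-up CSV (the rows are invented, not taken from the MoMA file). It shows that `'Int64'` gives a nullable integer column, so a missing ID becomes `<NA>` instead of forcing the column to float. It also shows that a `str` converter (with pandas' default C parser) keeps blank name fields as empty strings rather than `NaN`. That second point is presumably why the name columns go through `converters`.

```python
import io

import pandas as pd

# A tiny stand-in for the MoMA CSV; the values are invented for illustration.
sample = io.StringIO(
    "ExhibitionID,FirstName,MiddleName,Notes\n"
    "2557,Paul,,ignored\n"
    ",George,Overbury,ignored\n"
)

df = pd.read_csv(
    sample,
    usecols=["ExhibitionID", "FirstName", "MiddleName"],  # "Notes" is skipped
    dtype={"ExhibitionID": "Int64"},  # nullable integer: a blank ID becomes <NA>
    converters={"MiddleName": str},   # a blank middle name stays "" instead of NaN
)

print(df.dtypes)
print(df)
```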

## The complete list of columns

['ExhibitionID',
 'ExhibitionNumber',
 'ExhibitionTitle',
 'ExhibitionCitationDate',
 'ExhibitionBeginDate',
 'ExhibitionEndDate',
 'ExhibitionSortOrder',
 'ExhibitionURL',
 'ExhibitionRole',
 'ExhibitionRoleinPressRelease',
 'ConstituentID',
 'ConstituentType',
 'DisplayName',
 'AlphaSort',
 'FirstName',
 'MiddleName',
 'LastName',
 'Suffix',
 'Institution',
 'Nationality',
 'ConstituentBeginDate',
 'ConstituentEndDate',
 'ArtistBio',
 'Gender',
 'VIAFID',
 'WikidataID',
 'ULANID',
 'ConstituentURL']
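
A list like this can be read from the file's header without loading any rows by passing `nrows=0`. It is also worth checking that every name in `cols` really exists, because `usecols` raises a `ValueError` for names missing from the file. A sketch using a stand-in header string (hypothetical; with the real file, pass its path instead) and a hypothetical `wanted` subset:

```python
import io

import pandas as pd

# Stand-in for the real CSV header; with the real file, pass its path instead.
header_only = io.StringIO("ExhibitionID,ExhibitionNumber,ExhibitionTitle,DisplayName\n")

# nrows=0 parses the header and returns an empty frame that still has the column names.
all_cols = pd.read_csv(header_only, nrows=0).columns.tolist()

# Hypothetical subset to validate against the header before using it as usecols.
wanted = ["ExhibitionID", "DisplayName"]
missing = [c for c in wanted if c not in all_cols]

print(all_cols)
print("missing:", missing)
```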
 

## Filter for artists

The CSV has one row for each constituent's role in each exhibition, and artists are only one of those roles. So let's keep only the rows whose `ExhibitionRole` is `Artist`.

In [116]:
artists = exh.loc[exh['ExhibitionRole'] == 'Artist']
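
Before filtering on an exact string, it can help to check which role values actually occur, since a typo or different capitalization would silently drop rows. A small sketch on invented rows (the `Curator` value is made up for the example):

```python
import pandas as pd

# Invented rows standing in for `exh`.
exh = pd.DataFrame(
    {
        "DisplayName": ["A. Director", "Paul Cézanne", "Paul Gauguin"],
        "ExhibitionRole": ["Curator", "Artist", "Artist"],
    }
)

# See every role and how often it appears.
role_counts = exh["ExhibitionRole"].value_counts()
print(role_counts)

# The boolean mask keeps only artist rows; the original index labels are preserved.
artists = exh.loc[exh["ExhibitionRole"] == "Artist"]
print(artists)
```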

## Add a column for the Gensim token

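Gensim's `preprocess_string` lowercases each name, strips punctuation, drops stopwords and very short words, and stems what is left, so "Cézanne" becomes `cézann` and "Charles" becomes `charl`. It would be handy to keep that token in its own column, so that Django can translate between artist names and the model's tokens.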
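
On the Django side, the translation could be as simple as two lookups built from the exported rows. This is a hypothetical sketch (`token_to_names` and `name_to_token` are illustrative names, not from the project); the pairs are copied from this notebook's output. Because stemming can map different names onto the same token, each token keeps a list of names.

```python
from collections import defaultdict

# (DisplayName, token) pairs copied from this notebook's output.
rows = [
    ("Paul Cézanne", "paulcézann"),
    ("Charles Demuth", "charldemuth"),
    ("Lyonel Feininger", "lyonelfeining"),
]

# token -> list of display names (a list, in case two names stem alike)
token_to_names = defaultdict(list)
# display name -> token, for turning a user's choice into a model query
name_to_token = {}

for name, token in rows:
    token_to_names[token].append(name)
    name_to_token[name] = token

print(token_to_names["charldemuth"])
print(name_to_token["Paul Cézanne"])
```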

In [117]:
from gensim.parsing.preprocessing import preprocess_string

In [134]:
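def format_name(names):
    """Turn each display name into its Gensim token.

    preprocess_string returns a list of processed words; they are
    joined with no separator, e.g. 'Paul Cézanne' -> 'paulcézann'.
    """
    # process all the names at once.
    return ["".join(preprocess_string(n)) for n in names]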

In [132]:
artists_tokenized = artists.assign(
    token=lambda x: format_name(x.DisplayName)
)
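
`DataFrame.assign` calls the lambda with the frame itself and returns a new DataFrame, so `artists` is left unchanged. Here is the same pattern with a hypothetical stand-in tokenizer, `simple_token` (lowercase and keep letters only, no stemming), so it runs without Gensim; its tokens therefore differ from the real ones.

```python
import pandas as pd


def simple_token(name):
    """Hypothetical stand-in for the Gensim pipeline: lowercase, keep letters only."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


artists = pd.DataFrame({"DisplayName": ["Paul Cézanne", "Georges-Pierre Seurat"]})

# The lambda receives the whole DataFrame; returning a Series keeps index alignment.
tokenized = artists.assign(token=lambda df: df["DisplayName"].map(simple_token))

print(tokenized)
```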

In [133]:
artists_tokenized

| | ExhibitionID | ExhibitionNumber | ExhibitionTitle | ExhibitionURL | ExhibitionRole | DisplayName | FirstName | MiddleName | LastName | Suffix | ConstituentURL | token |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2557 | 1 | Cézanne, Gauguin, Seurat, Van Gogh | moma.org/calendar/exhibitions/1767 | Artist | Paul Cézanne | Paul | | Cézanne | | moma.org/artists/1053 | paulcézann |
| 2 | 2557 | 1 | Cézanne, Gauguin, Seurat, Van Gogh | moma.org/calendar/exhibitions/1767 | Artist | Paul Gauguin | Paul | | Gauguin | | moma.org/artists/2098 | paulgauguin |
| 3 | 2557 | 1 | Cézanne, Gauguin, Seurat, Van Gogh | moma.org/calendar/exhibitions/1767 | Artist | Vincent van Gogh | Vincent | | van Gogh | | moma.org/artists/2206 | vincentvangogh |
| 4 | 2557 | 1 | Cézanne, Gauguin, Seurat, Van Gogh | moma.org/calendar/exhibitions/1767 | Artist | Georges-Pierre Seurat | Georges-Pierre | | Seurat | | moma.org/artists/5358 | georgpierrseurat |
| 5 | 2724 | 2 | Paintings by 19 Living Americans | moma.org/calendar/exhibitions/1912 | Artist | Charles Burchfield | Charles | | Burchfield | | moma.org/artists/870 | charlburchfield |
| 6 | 2724 | 2 | Paintings by 19 Living Americans | moma.org/calendar/exhibitions/1912 | Artist | Charles Demuth | Charles | | Demuth | | moma.org/artists/1490 | charldemuth |
| 7 | 2724 | 2 | Paintings by 19 Living Americans | moma.org/calendar/exhibitions/1912 | Artist | Preston Dickinson | Preston | | Dickinson | | moma.org/artists/1537 | prestondickinson |
| 8 | 2724 | 2 | Paintings by 19 Living Americans | moma.org/calendar/exhibitions/1912 | Artist | Lyonel Feininger | Lyonel | | Feininger | | moma.org/artists/1832 | lyonelfeining |
| 9 | 2724 | 2 | Paintings by 19 Living Americans | moma.org/calendar/exhibitions/1912 | Artist | George Overbury ("Pop") Hart | George | Overbury ("Pop") | Hart | | moma.org/artists/2519 | georgoverburipophart |
| 10 | 2724 | 2 | Paintings by 19 Living Americans | moma.org/calendar/exhibitions/1912 | Artist | Edward Hopper | Edward | | Hopper | | moma.org/artists/2726 | edwardhopper |
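
Since Django will use `token` to look names up, it may be worth checking for tokens shared by more than one `DisplayName` before exporting. A sketch on made-up rows (the names and tokens are invented; the first two deliberately share a token to show what a hit looks like):

```python
import pandas as pd

# Invented rows; the first two share a token to illustrate a collision.
df = pd.DataFrame(
    {
        "DisplayName": ["Anne Example", "Ann Example", "Edward Hopper"],
        "token": ["annexampl", "annexampl", "edwardhopper"],
    }
)

# Count distinct display names per token; more than one means a collision.
names_per_token = df.groupby("token")["DisplayName"].nunique()
collisions = names_per_token[names_per_token > 1]

print(collisions)
```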


## Export time!

All right, now every artist row carries its Gensim model token, so let's write the table out for Django.

In [136]:
artists_tokenized.to_csv(
    index=False,
    path_or_buf='./artists_tokenized.csv',
)
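
When the exported file is read back (for example, in a Django import script), pandas by default turns blank cells such as an empty `MiddleName` back into `NaN`. Passing `keep_default_na=False` keeps them as empty strings, which may map more naturally onto Django `CharField`s. A sketch using an in-memory buffer in place of `./artists_tokenized.csv`, with two rows copied from the output above (subset of columns):

```python
import io

import pandas as pd

# Two rows in the same shape as the export (subset of columns).
out = pd.DataFrame(
    {
        "DisplayName": ["Paul Cézanne", 'George Overbury ("Pop") Hart'],
        "MiddleName": ["", 'Overbury ("Pop")'],
        "token": ["paulcézann", "georgoverburipophart"],
    }
)

buf = io.StringIO()
out.to_csv(buf, index=False)  # same call shape as the export cell
buf.seek(0)

# keep_default_na=False: blank cells come back as "" rather than NaN.
back = pd.read_csv(buf, keep_default_na=False)

print(back)
```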