In [2]:
import pandas as pd
import numpy as np
from pathlib import Path 

# Expand '~' in the path; the encoding is an option of read_csv, not of Path
csv = Path('~/data1/moma/exhibitions/MoMAExhibitions1929to1989.csv').expanduser()
exd = pd.read_csv(csv, encoding='utf-8')

In [57]:
exd.columns.tolist()

['ExhibitionID',
 'ExhibitionNumber',
 'ExhibitionTitle',
 'ExhibitionCitationDate',
 'ExhibitionBeginDate',
 'ExhibitionEndDate',
 'ExhibitionSortOrder',
 'ExhibitionURL',
 'ExhibitionRole',
 'ExhibitionRoleinPressRelease',
 'ConstituentID',
 'ConstituentType',
 'DisplayName',
 'AlphaSort',
 'FirstName',
 'MiddleName',
 'LastName',
 'Suffix',
 'Institution',
 'Nationality',
 'ConstituentBeginDate',
 'ConstituentEndDate',
 'ArtistBio',
 'Gender',
 'VIAFID',
 'WikidataID',
 'ULANID',
 'ConstituentURL']

## John Cage 

Let's take one artist, John Cage, and try to get a sense of the data that surrounds him.

In [3]:
cage = exd.loc[exd['DisplayName'] == 'John Cage']

Cage was in 7 shows.

In [59]:
len(cage)

7

In [60]:
cage[['ExhibitionNumber', 'ExhibitionID', 'ExhibitionTitle']]

Unnamed: 0,ExhibitionNumber,ExhibitionID,ExhibitionTitle
25073,1054b,4114.0,Works on Paper
26020,1117,2964.0,Drawing Now: 1955–1975
26679,1157,443.0,"Prints: Acquisitions, 1973–1976"
26860,1163,3822.0,Projects: Buckminster Fuller and John Cage
27302,1186,3758.0,American Drawn and Matched
28485,1252a,10555.0,The Stage Show
33216,1468,911.0,For 25 Years: Crown Point Press


What other artists were in these shows?

In [61]:
wp = exd.loc[exd['ExhibitionID'] == 4114]
wp['DisplayName'].tolist()

['Martha Beck',
 'James Bishop',
 'John Cage',
 'John Edward Dowell',
 'Steve Gianakos',
 'Michael Goldberg',
 'Robert Grosvenor',
 'Michael Heizer',
 'Hans Hollein',
 'Robert Israel',
 'Jack Krueger',
 'Walter Pichler',
 'Alan Saret',
 'Michelle Stuart',
 'Richard Tuttle',
 'Michael Venezia']

Let's get a list of all the artists who appear in any of Cage's shows (duplicates included).

In [47]:
artists = []
for i in cage['ExhibitionID']:
    show = exd.loc[exd['ExhibitionID'] == i]
    artists.extend(show['DisplayName'].tolist())

artists.sort(); artists

['Agnes Martin',
 'Al Held',
 'Alan Saret',
 'Alan Saret',
 'Alberto Giacometti',
 'Alberto Magnelli',
 'Alex Katz',
 'Alex Katz',
 'Alex Katz',
 'Alexander Calder',
 'André Beaudin',
 'André Derain',
 'André Dunoyer de Segonzac',
 'André Masson',
 'Andy Warhol',
 'Andy Warhol',
 'Andy Warhol',
 'Antonio Frasconi',
 'Arnaldo Pomodoro',
 'Art & Language',
 'Art Lending Service, The Museum of Modern Art, New York',
 'Audrey Flack',
 'Ay-O',
 'Bea Maddock',
 'Ben Schonzeit',
 'Ben Shahn',
 'Ben Vautier',
 'Benny Andrews',
 'Bernice Rose',
 'Bernice Rose',
 'Blinky Palermo',
 'Blinky Palermo',
 'Brice Marden',
 'Brice Marden',
 'Brice Marden',
 'Bridget Riley',
 'Bruce Nauman',
 'Bruce Nauman',
 'Camille Bryen',
 'Carel Visser',
 'Carl Andre',
 'Carl Andre',
 'Charles Hinman',
 'Chris Burden',
 'Christo (Christo Javacheff)',
 'Christopher Knowles',
 'Chuck Close',
 'Chuck Close',
 'Chuck Close',
 'Claes Oldenburg',
 'Claes Oldenburg',
 'Claire (Claire Mahl) Moore',
 'Cletus Johnson',
 'Cy 

In [48]:
a = pd.DataFrame(artists)

How often does each artist occur in this list?

In [49]:
cage_count = {}

for artist in artists:
    cage_count.setdefault(artist, 0)
    cage_count[artist] += 1

cage_count_df = pd.DataFrame({'count': cage_count})
cage_count_df.loc[cage_count_df['count'] >= 4].sort_values('count', ascending=False)

Unnamed: 0,count
John Cage,7
Robert Morris,4
Sol LeWitt,4
William T. Wiley,4
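
The same tally can also be written with `collections.Counter`. This is a minimal sketch on invented (ExhibitionID, DisplayName) pairs, not the MoMA rows; the helper name `co_exhibitor_counts` is hypothetical.

```python
from collections import Counter

# Toy stand-in for the (ExhibitionID, DisplayName) pairs in `exd`;
# these rows are invented for illustration only.
rows = [
    (1, 'John Cage'), (1, 'Sol LeWitt'), (1, 'Alex Katz'),
    (2, 'John Cage'), (2, 'Sol LeWitt'),
    (3, 'Alex Katz'), (3, 'Agnes Martin'),
]

def co_exhibitor_counts(rows, name):
    """Count rows per artist across the shows that include `name`."""
    show_ids = {show for show, artist in rows if artist == name}
    return Counter(artist for show, artist in rows if show in show_ids)

counts = co_exhibitor_counts(rows, 'John Cage')
print(counts.most_common())
```

With the real data, the pairs could probably be taken from `exd[['ExhibitionID', 'DisplayName']].itertuples(index=False)`.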


According to this, I would expect these three artists (Morris, LeWitt, and Wiley) to have the highest similarity to Cage in the trained model.

In [7]:
import word2vec

In [8]:
model = word2vec.load('../word2vec/word2vec_output.bin')

In [24]:
i, m = model.similar('john_cage', n=20)
model.generate_response(i, m).tolist()

[('art_&_language', 0.6979109085805235),
 ('fred_sandback', 0.652470928871357),
 ('mark_di_suvero', 0.6413962847506356),
 ('larry_poons', 0.6350292764581003),
 ('dan_flavin', 0.6283743217440056),
 ('mel_bochner', 0.6206653480549449),
 ('öyvind_fahlström', 0.6113966768420281),
 ('chuck_close', 0.6089421353703656),
 ('james_lee_byars', 0.6063705641385664),
 ('lawrence_weiner', 0.6000357451473296),
 ('richard_tuttle', 0.5913245035524064),
 ('michael_heizer', 0.5860156692353393),
 ('piero_manzoni', 0.5723869873349714),
 ('kazuko', 0.5672932496972538),
 ('hanne_darboven', 0.5622355329555835),
 ('dorothea_rockburne', 0.5551089291719715),
 ('bruce_nauman', 0.5207539405980035),
 ('william_t._wiley', 0.5190067307141962),
 ('panamarenko', 0.5155142134717646),
 ('leon_polk_smith', 0.515099089559946)]

But none of them comes up in the top ten most similar; only Wiley makes the top twenty, at rank 18.

Let's double-check that.

In [158]:
model.distance('john_cage', 'robert_morris')

[('john_cage', 'robert_morris', 0.41157096481483973)]

In [161]:
model.distance('john_cage', 'sol_lewitt')

[('john_cage', 'sol_lewitt', 0.31788696646092496)]

In [162]:
model.distance('john_cage', 'william_t._wiley')

[('john_cage', 'william_t._wiley', 0.5190067307141963)]

So cosine similarity in the model is not the same thing as how often two artists appear in the same show.
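
For reference, cosine similarity compares the direction of two vectors, not a count of shared shows. Below is a minimal sketch with made-up three-dimensional vectors; the real vectors live inside `model` and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented vectors for illustration only.
vec_a = [0.9, 0.1, 0.3]
vec_b = [0.8, 0.2, 0.4]   # points in nearly the same direction as vec_a
vec_c = [-0.2, 0.9, 0.1]  # points in a very different direction

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```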

John Cage and Art & Language is a really good association. How many exhibitions do they co-occur in? Only one!

In [9]:
cage_count_df.loc['Art & Language']

count    1
Name: Art & Language, dtype: int64

Amazingly, the artist Kazuko does not share any exhibition with Cage ...

In [45]:
artists.index('kazuko')

ValueError: 'kazuko' is not in list

... but ranks above Wiley in the list of the 20 most similar artists.

In [33]:
model.distance('john_cage', 'kazuko')

[('john_cage', 'kazuko', 0.5672932496972538)]
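
One caveat on the `artists.index('kazuko')` lookup: it is case-sensitive, and `artists` holds display names such as 'John Cage', while the model tokens are lowercase with underscores. Here is a hedged sketch of a normalized membership check. It assumes tokens are simply lowercased display names with spaces replaced by underscores, which is inferred from tokens like 'william_t._wiley' and not confirmed; the helper `to_token` is hypothetical.

```python
def to_token(display_name):
    """Hypothetical normalizer: 'John Cage' -> 'john_cage'."""
    return display_name.lower().replace(' ', '_')

# Toy list standing in for `artists`; the real list is much longer.
sample_artists = ['Agnes Martin', 'John Cage', 'William T. Wiley']

tokens = {to_token(name) for name in sample_artists}

print('william_t._wiley' in tokens)
print('kazuko' in tokens)
```

Running the same check against the full `artists` list would confirm whether Kazuko really is absent from Cage's shows.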

### Cage in Conclusion

To summarize:

1. Just because two artists appear in multiple shows together doesn't mean that they will have close cosine similarities.
2. Artists can be similar without co-occurring in an exhibition.
3. Two artists can be the most similar and yet co-occur only once.

This leads me to believe that indirect relationships count for more than direct relationships. (This makes intuitive sense, because there are more indirect relationships than direct ones.)
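
A toy illustration of that point, with invented shows: artists A and B never exhibit together, yet they share all of their co-exhibitors. This is not how word2vec computes its vectors; it only shows how two names can have the same surrounding context without ever co-occurring.

```python
# Invented shows for illustration; A and B never exhibit together.
shows = {
    'show1': {'A', 'C', 'D'},
    'show2': {'B', 'C', 'D'},
    'show3': {'A', 'E'},
    'show4': {'B', 'E'},
}

def neighbours(artist):
    """Everyone who shares at least one show with `artist`."""
    found = set()
    for members in shows.values():
        if artist in members:
            found |= members
    found.discard(artist)
    return found

def direct_count(a, b):
    """Number of shows that include both artists."""
    return sum(1 for members in shows.values() if a in members and b in members)

print(direct_count('A', 'B'))
print(sorted(neighbours('A') & neighbours('B')))
```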

## Looking at Context

* What would a list of all the exhibition titles look like? Since the dataset excludes exhibitions > 25, this could be significant. I would expect the exhibition titles to give a clue to how artists are clustered. The association of Cage and A&L is a good one. (How does that hold up with other similarities?) Cage and A&L are both conceptual. A group show about Americans has less context than one on conceptual art.
* Taking Cage, Kazuko and Art & Language, I wonder how similar their individual cohorts are. What artists co-occur in those lists? What artists don't?
* This data is limited, but does it suggest that a category such as "conceptual art" exists through the artists' associations?
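
One way to approach the cohort question is set overlap. This is a minimal sketch, assuming each cohort is the set of display names gathered as in the Cage example. The Jaccard index is only one possible measure and has not been computed on the real data; the cohorts below are placeholders.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Placeholder cohorts for illustration; real ones would come from `exd`.
cohort_x = {'artist_1', 'artist_2', 'artist_3'}
cohort_y = {'artist_1', 'artist_3', 'artist_4', 'artist_5'}

print(sorted(cohort_x & cohort_y))  # artists in both cohorts
print(sorted(cohort_x - cohort_y))  # artists only in the first cohort
print(jaccard(cohort_x, cohort_y))
```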