### Exploring RILM Data

RILM (https://www.rilm.org/) is the most important source of information about writings on music in all languages, and it is widely used by music scholars.

The RILM team is especially interested in help exploring the scholarship they index: what changes are taking place in its various sub-fields?

Possible terms of interest:

- women’s studies
- Jewish studies
- therapy
- psychology
- activism
- ecology
- sustainability
- migration
- gender


Of course we could also think of particular genres or traditions:

- K-pop
- techno

This notebook shows how to query the RILM API and then sort, slice, group, and analyze the results.

Histograms, bar charts, and especially networks can help us see how fields are changing.


### Load Code

```python
import os
# from decouple import AutoConfig  # install python-decouple
import requests  # install requests
import pandas as pd
import matplotlib.pyplot as plt  # backend for the pandas .plot()/.hist() calls below

import pyvis
from pyvis import network as net
from pyvis.network import Network
import networkx as nx

from copy import deepcopy

from community import community_louvain  # install python-louvain

from itertools import tee


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return tuple(zip(a, b))
```

### Add Your Token as a Hidden .env File in Your JupyterHub

```python
# config = AutoConfig(search_path=".")  # Create a file called .env in the same directory.
#                                       # This is just a text file that contains the BEARER TOKEN
#                                       # so that we don't have to include it in the code.
#                                       # It will have one line like the following (exclude the angle brackets):
#                                       # BEARER_TOKEN=<MY_BEARER_TOKEN>

BASE = "https://api-ibis.rilm.org/200/haverford/"

# with python-decouple: BEARER_TOKEN = config("BEARER_TOKEN")
BEARER_TOKEN = ''

URLS = {
    "year": BASE + "rilm_index_RYs",
    "terms": BASE + "rilm_index_top_terms",
    "index": BASE + "rilm_index"
}

HEADERS = {
    "Authorization": f"Bearer {BEARER_TOKEN}"
}
```

### Example queries

https://api-ibis.rilm.org/200/haverford/rilm_index_RYs?termName=activism

https://api-ibis.rilm.org/200/haverford/rilm_index_top_terms?termName=activism

https://api-ibis.rilm.org/200/haverford/rilm_index?termName=activism

https://api-ibis.rilm.org/200/haverford/rilm_index?termName=activism&includeAuthors=true
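The example queries differ only in the endpoint and the query parameters. As a minimal sketch using only the standard library, this is how such a URL can be assembled; `build_query_url` and `ENDPOINTS` are hypothetical names, with the endpoint paths copied from the `URLS` dictionary above. In the notebook itself, `requests.get(..., params=params)` does this encoding for you, so a helper like this is mainly useful for checking or sharing a query.

```python
from urllib.parse import urlencode

BASE = "https://api-ibis.rilm.org/200/haverford/"
# endpoint paths taken from the example queries above
ENDPOINTS = {
    "year": "rilm_index_RYs",
    "terms": "rilm_index_top_terms",
    "index": "rilm_index",
}


def build_query_url(endpoint: str, term: str, include_authors: bool = False) -> str:
    """Hypothetical helper: return the GET URL for one endpoint and search term."""
    params = {"termName": term}
    if include_authors:
        # the example query spells the flag in lowercase
        params["includeAuthors"] = "true"
    # urlencode escapes spaces and non-ASCII characters (e.g. in "women’s studies")
    return BASE + ENDPOINTS[endpoint] + "?" + urlencode(params)


print(build_query_url("index", "activism", include_authors=True))
# -> https://api-ibis.rilm.org/200/haverford/rilm_index?termName=activism&includeAuthors=true
```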

```python
# here is where we define the search term and author status:
# "termName" is the search term
# "includeAuthors": True will return author names in the data
params = {
    "termName": "Beethoven, Ludwig van",
    "includeAuthors": True
}

# send the request, and stop here if it failed (e.g. a missing or invalid token)
response = requests.get(
    URLS["index"],
    headers=HEADERS,
    params=params
)
response.raise_for_status()
# response.url  # the full query URL that was sent
```
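To compare several of the terms of interest listed at the top, it can help to run the same query for each term and keep the results in one table. This is a sketch under two assumptions: each response's JSON is a list of records that `pd.DataFrame` accepts (as `pd.DataFrame(data)` in the next cell suggests), and `collect_terms` and the `query` column are hypothetical names. The fetching function is passed in, so the stacking logic can be tried without a token:

```python
import pandas as pd


def collect_terms(terms, fetch):
    """Run one query per search term and stack the results into one DataFrame.

    `fetch` is any callable that takes a term and returns the decoded JSON
    records for it (assumed to be a list of dicts). A 'query' column records
    which search term produced each row.
    """
    frames = []
    for term in terms:
        frame = pd.DataFrame(fetch(term))
        frame["query"] = term
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# A possible fetch using the URLS and HEADERS defined above (not run here):
# def fetch(term):
#     r = requests.get(URLS["index"], headers=HEADERS, timeout=60,
#                      params={"termName": term, "includeAuthors": "true"})
#     r.raise_for_status()
#     return r.json()
# combined = collect_terms(["activism", "ecology", "migration"], fetch)

# fake records, only to show the shape of the result
fake = {"activism": [{"ry": 2001, "ac": 7}],
        "ecology": [{"ry": 2010, "ac": 3}, {"ry": 2011, "ac": 4}]}
combined = collect_terms(["activism", "ecology"], fake.get)
print(combined)
```

Keeping the search term in its own column means the combined table can later be grouped by `query` and year to compare sub-fields side by side.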

```python
# get the data
data = response.json()
results = pd.DataFrame(data)
results = results.fillna('')
# combine year and accession number to make a unique id for each item
results['full_acc'] = results.ry.apply(str) + "-" + results.ac.apply(str)
results.rename(columns={'ry': 'year', 'ac': 'item', 'ent': 'entry', 'lvl': 'level',
                        'name': 'term', 'cat': 'category', 'full_acc': 'full_id'},
               inplace=True)

# how many rows (one per index term, repeated per author; not one per publication)
print(len(results))
```

293204
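`len(results)` counts rows, not publications: each publication contributes one row per index term, and that row is repeated once per author when authors are included. A sketch that counts distinct publications per year instead, which is what a histogram of activity in a field usually needs; `publications_per_year` is a hypothetical helper, and the toy rows only mimic the shape of `results`:

```python
import pandas as pd


def publications_per_year(df: pd.DataFrame) -> pd.Series:
    """Count distinct publications (full_id) per publication year."""
    # keep one row per publication, however many terms or authors it has
    items = df.drop_duplicates(subset="full_id")
    return items.groupby("year")["full_id"].count().sort_index()


# toy rows: two index terms for 1845-2, two authors' rows for 1846-1
toy = pd.DataFrame({
    "year": [1845, 1845, 1846, 1846],
    "full_id": ["1845-2", "1845-2", "1846-1", "1846-1"],
    "term": ["Beethoven, Ludwig van", "Festschriften",
             "Beethoven, Ludwig van", "Beethoven, Ludwig van"],
})
print(publications_per_year(toy))
```

On the real data, `results["full_id"].nunique()` gives the number of publications, and `publications_per_year(results).plot.bar()` draws a per-year chart.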


```python
results
```

| | year | item | entry | level | id | term | category | author | pubCC | langItem | langTransFrom | full_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1845 | 2 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Breidenstein, Heinrich Carl | Germany | German | | 1845-2 |
| 1 | 1845 | 2 | 1 | 2 | 254692 | Festschriften | M | Breidenstein, Heinrich Carl | Germany | German | | 1845-2 |
| 2 | 1845 | 2 | 1 | 3 | 1842548 | monument inauguration | | Breidenstein, Heinrich Carl | Germany | German | | 1845-2 |
| 3 | 1845 | 2 | 1 | 4 | 240858 | 1845 | | Breidenstein, Heinrich Carl | Germany | German | | 1845-2 |
| 4 | 1845 | 3 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Breidenstein, Heinrich | Germany | German | | 1845-3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 293199 | 2023 | 2303 | 4 | 5 | 268657 | late period | | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293200 | 2023 | 2303 | 5 | 1 | 191504 | reception | T | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293201 | 2023 | 2303 | 5 | 2 | 220229 | Beethoven, Ludwig van | N | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293202 | 2023 | 2303 | 5 | 3 | 219073 | adagio movts. | | Noorduin, Marten | United Kingdom | English | | 2023-2303 |


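```python
# keep only topic terms (category "T") from publications after 1900
concepts = results[results['category'] == "T"]
# filter with concepts' own column so the boolean index matches its rows
concepts = concepts[concepts['year'] > 1900]
concepts
```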


| | year | item | entry | level | id | term | category | author | pubCC | langItem | langTransFrom | full_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57668 | 1901 | 132 | 3 | 1 | 180939 | aesthetics | T | Ferrarelli, Giuseppe | Italy | Italian | | 1901-132 |
| 57674 | 1902 | 47 | 2 | 1 | 112788 | symphony | T | Livonius, Dr. | Germany | German | | 1902-47 |
| 57676 | 1902 | 47 | 3 | 1 | 53413 | orchestral music | T | Livonius, Dr. | Germany | German | | 1902-47 |
| 57686 | 1902 | 140 | 3 | 1 | 149773 | sonata | T | Ernest, Gustav | United Kingdom | English | | 1902-140 |
| 57690 | 1902 | 140 | 4 | 1 | 192399 | form | T | Ernest, Gustav | United Kingdom | English | | 1902-140 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 293187 | 2023 | 2303 | 2 | 1 | 257776 | tempo | T | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293191 | 2023 | 2303 | 3 | 1 | 252019 | performance practice--by composer | T | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293195 | 2023 | 2303 | 4 | 1 | 285535 | performance practice, historical--by topic | T | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293196 | 2023 | 2303 | 4 | 2 | 257776 | tempo | T | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
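Since the question is how sub-fields change, one way to read `concepts` over time is to rank topic terms within each decade. A sketch, assuming `year` is numeric and counting distinct publications (`full_id`) rather than rows, because rows repeat once per author; `top_terms_by_decade` is a hypothetical helper and the toy rows are made up:

```python
import pandas as pd


def top_terms_by_decade(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """For each decade, the n terms attached to the most distinct publications."""
    counts = (
        df.assign(decade=df["year"] // 10 * 10)  # assumes a numeric 'year'
          .groupby(["decade", "term"], sort=False)["full_id"]
          .nunique()  # publications, not rows
          .rename("publications")
          .reset_index()
    )
    ranked = counts.sort_values(["decade", "publications"], ascending=[True, False])
    return ranked.groupby("decade").head(n).reset_index(drop=True)


toy = pd.DataFrame({
    "year": [1901, 1902, 1902, 1905, 2023],
    "full_id": ["a", "b", "b", "c", "d"],
    "term": ["aesthetics", "symphony", "symphony", "symphony", "tempo"],
})
print(top_terms_by_decade(toy, n=1))
```

With the real data, `top_terms_by_decade(concepts, n=5)` lists the leading topics decade by decade.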


```python
# the 50 most frequent topic terms (by row count)
lvb_top_terms = concepts["term"].value_counts().head(50).index.to_list()
```

```python
# geographic index terms (category "G"), summarized for each publication
places = results[results['category'] == "G"]
places.groupby(['full_id'])['term'].describe()
```

```python
# the same place terms summarized by entry and level
places.groupby(['entry', 'level'])['term'].describe().head()
```

```python
# number of rows for each term, most frequent first
terms = results.groupby(['term'])['entry'].count()
df = pd.DataFrame(terms)
df.sort_values('entry', ascending=False).head(25)
```

| term | entry (row count) |
|---|---|
| Beethoven, Ludwig van | 26252 |
| compositions included in Festschriften | 20592 |
| works | 16026 |
| life | 5474 |
| writings | 3806 |
| piano music | 2142 |
| reception | 2004 |
| performances | 1984 |
| aesthetics | 1877 |
| Mozart, Wolfgang Amadeus | 1800 |

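The term counts above are row counts, so a term attached to a publication with several authors, or used in several entries of one publication, is counted more than once. A sketch that counts each publication at most once per term; `publications_per_term` is a hypothetical helper, assuming the renamed `term` and `full_id` columns:

```python
import pandas as pd


def publications_per_term(df: pd.DataFrame, top: int = 25) -> pd.Series:
    """Number of distinct publications indexed with each term, largest first."""
    # nunique ignores repeated rows for the same publication
    counts = df.groupby("term")["full_id"].nunique()
    return counts.sort_values(ascending=False).head(top)


toy = pd.DataFrame({
    "term": ["works", "works", "works", "life"],
    "full_id": ["1846-1", "1846-1", "1900-5", "1900-5"],
})
print(publications_per_term(toy))
```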

```python
# for each entry number: how many rows, how many distinct terms, and the most common term
results.groupby(['entry'])['term'].describe().head(25)
```

| entry | count | unique | top | freq |
|---|---|---|---|---|
| 1 | 48953 | 8354 | Beethoven, Ludwig van | 8366 |
| 2 | 43899 | 8127 | Beethoven, Ludwig van | 6340 |
| 3 | 35589 | 7523 | Beethoven, Ludwig van | 4033 |
| 4 | 28097 | 6490 | Beethoven, Ludwig van | 2638 |
| 5 | 20808 | 5376 | Beethoven, Ludwig van | 1652 |
| 6 | 15595 | 4282 | Beethoven, Ludwig van | 1055 |
| 7 | 11576 | 3458 | works | 752 |
| 8 | 8783 | 2645 | works | 571 |
| 9 | 6762 | 2085 | works | 399 |
| 10 | 5054 | 1579 | works | 284 |


```python
# how many rows and how many distinct terms sit at each (entry, level) position
results.groupby(['entry', 'level'])['term'].agg(['count', 'nunique'])
```


### What do the column names mean?

- **year** = year of publication
- **item** = an accession number, i.e. the id of that item within its year
- **full_id** = the combined year and accession number, and thus a unique ID for the publication
- **term** = the index term
- **entry** and **level** = ways of grouping the index terms: each entry (`ent`) can have more than one level (`lvl`), and the terms at the levels of an entry are combined to make a full index string
- **id** = the id number of the index term
- **category** = a category code for the index term (see below), such as:
    - **G** = geographic name
    - **O** = an organization
    - **N** = name of a person
- **author** = author of the publication
- **pubCC** = country where the item was published
- **langItem** = language of the publication
- **langTransFrom** = original language, if the item is a translation (empty otherwise)
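The column notes say that the terms at the levels of an entry combine into a full index string. A minimal sketch of that step, assuming the renamed columns (`full_id`, `entry`, `level`, `term`); `index_strings` is a hypothetical helper, and the `" -- "` separator is an arbitrary display choice, not necessarily how RILM prints its index strings:

```python
import pandas as pd


def index_strings(df: pd.DataFrame, sep: str = " -- ") -> pd.Series:
    """Join each entry's terms, in level order, into one string.

    Returns a Series indexed by (full_id, entry).
    """
    # rows repeat once per author, so keep one row per (item, entry, level)
    rows = df.drop_duplicates(subset=["full_id", "entry", "level"])
    rows = rows.sort_values(["full_id", "entry", "level"])
    # some terms (e.g. years) may be numbers, so convert to str before joining
    return rows.groupby(["full_id", "entry"])["term"].agg(
        lambda terms: sep.join(map(str, terms))
    )


toy = pd.DataFrame({
    "full_id": ["1845-2"] * 4,
    "entry": [1, 1, 1, 1],
    "level": [1, 2, 3, 4],
    "term": ["Beethoven, Ludwig van", "Festschriften", "monument inauguration", 1845],
})
toy = pd.concat([toy, toy])  # as if the item had a second author
print(index_strings(toy).loc[("1845-2", 1)])
# -> Beethoven, Ludwig van -- Festschriften -- monument inauguration -- 1845
```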



### What are the codes in the category field?

The API returns this field as `cat`; the notebook renames it to `category`.

```
B = broadcasts, radio, TV, and podcasts
C = title of choreographic work
D = dictionary
E = ethnic group
F = films and videos
G = geographic name
I = instrument
L = literary work (poetry and prose)
M = margin
N = personal name
O = organization (other than a school)
P = periodical
Q = databases
R = treatise
S = school
T = topic
V = visual art
W = work title
```

Terms without a code, such as "monument inauguration" or "late period", have an empty category.
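For charts and tables it can help to show readable labels instead of single-letter codes. A sketch that maps the codes listed above onto a new `category_label` column; the dictionary is transcribed from that list, the `"(no category)"` label for empty codes is our own choice, and any code not in the list becomes missing (NaN). `add_category_labels` is a hypothetical helper:

```python
import pandas as pd

# codes transcribed from the list above; "" (no code) gets a label of our own
CATEGORY_LABELS = {
    "B": "broadcasts, radio, TV, and podcasts",
    "C": "title of choreographic work",
    "D": "dictionary",
    "E": "ethnic group",
    "F": "films and videos",
    "G": "geographic name",
    "I": "instrument",
    "L": "literary work (poetry and prose)",
    "M": "margin",
    "N": "personal name",
    "O": "organization (other than a school)",
    "P": "periodical",
    "Q": "databases",
    "R": "treatise",
    "S": "school",
    "T": "topic",
    "V": "visual art",
    "W": "work title",
    "": "(no category)",
}


def add_category_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a readable 'category_label' column."""
    out = df.copy()
    out["category_label"] = out["category"].map(CATEGORY_LABELS)
    return out


toy = pd.DataFrame({"term": ["Beethoven, Ludwig van", "reception", "late period"],
                    "category": ["N", "T", ""]})
print(add_category_labels(toy)["category_label"].tolist())
# -> ['personal name', 'topic', '(no category)']
```

For example, `add_category_labels(results)["category_label"].value_counts()` summarizes the kinds of index terms in a result set.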

```python
# another basic plot: year of publication vs. the geographic places (category "G") in the index
places = results[results['category'] == "G"]
# places.plot.scatter(x='year', y='term', s=100, figsize=(10, 15))
```

```python
# ethnic groups (category "E") mentioned in the index
communities = results[results['category'] == "E"]
communities
```

| | year | item | entry | level | id | term | category | author | pubCC | langItem | langTransFrom | full_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 65413 | 1948 | 346 | 3 | 2 | 560718 | Bantu-speaking peoples | E | SAPA (South African Press Association) | South Africa | Afrikaans | | 1948-346 |
| 69059 | 1956 | 846 | 3 | 2 | 350070 | Romani people | E | Liszt, Franz | United States | English | | 1956-846 |
| 69060 | 1956 | 846 | 3 | 2 | 350070 | Romani people | E | Morgenstern, Sam | United States | English | | 1956-846 |
| 88279 | 1976 | 2267 | 8 | 3 | 656 | Uyghur people | E | Aravin, Pëtr Vasil'evič | Kazakhstan | Russian | | 1976-2267 |
| 153902 | 1998 | 28859 | 2 | 3 | 59873 | Indigenous peoples | E | Smith Brindle, Reginald | United Kingdom | English | | 1998-28859 |
| 175590 | 2001 | 13578 | 1 | 4 | 255181 | Jewish people | E | Gotthold, Zew W. | Germany | German | | 2001-13578 |
| 175591 | 2001 | 13578 | 1 | 4 | 255181 | Jewish people | E | Deuring, Dagmar | Germany | German | | 2001-13578 |
| 175592 | 2001 | 13578 | 1 | 4 | 255181 | Jewish people | E | Licht, Rainer | Germany | German | | 2001-13578 |
| 175593 | 2001 | 13578 | 1 | 4 | 255181 | Jewish people | E | Jospe, Erwin | Germany | German | | 2001-13578 |
| 175594 | 2001 | 13578 | 1 | 4 | 255181 | Jewish people | E | Jacobsen, Joseph | Germany | German | | 2001-13578 |


```python
# histogram of publications by place of publication (results['pubCC']), here Italy
italy = results[results['pubCC'].str.contains("Italy")]
# italy.hist('year', figsize=(10, 5), bins=100)
```
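The Italy filter keeps every row, so a histogram of it counts index terms and authors rather than publications. A sketch that counts distinct publications per decade for each country of publication, which makes it easier to compare where writing on a topic appears over time. `publications_by_decade_and_country` is a hypothetical helper, and it assumes `year` is numeric, as the `year > 1900` filter above implies:

```python
import pandas as pd


def publications_by_decade_and_country(df: pd.DataFrame, countries=None) -> pd.DataFrame:
    """Distinct publications per decade (rows) and country of publication (columns)."""
    # one row per publication, so authors and index terms are not double-counted
    items = df.drop_duplicates(subset="full_id")
    if countries is not None:
        items = items[items["pubCC"].isin(countries)]
    decade = (items["year"] // 10 * 10).rename("decade")
    return pd.crosstab(decade, items["pubCC"])


toy = pd.DataFrame({
    "full_id": ["a", "a", "b", "c", "d"],
    "year": [1901, 1901, 1905, 1912, 1915],
    "pubCC": ["Italy", "Italy", "Germany", "Italy", "Italy"],
})
table = publications_by_decade_and_country(toy)
print(table)
```

On the real data, `publications_by_decade_and_country(results, ["Italy", "Germany"]).plot()` would draw one line per country.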

### A Concept Map


Here we find all of the terms associated with a given initial term, as follows:

- Limit the 'term' field to the "T" category (concepts). This could also be done for a person, with "N".
```
t_concepts = results[results['category'] == "T"]
```

- Now find all the **full_id numbers** that feature that term and save them as a list:
```
selected_concept = t_concepts[t_concepts['term'].str.contains('Black')]
selected_items = selected_concept['full_id'].to_list()
```
- Filter the original df so we have only the given full_id numbers (publications), and in turn filter that set so we only have terms in the "T" category. This could instead be done for "N" or "G", depending on your goal!
```
filtered_results = results[results['full_id'].isin(selected_items)]
filtered_results_t_concepts = filtered_results[filtered_results['category'] == "T"]
```

- Within each publication (`full_id`), pair each term with the term that follows it (`pairwise`).
- Remove the pairs that are just one term twice.
```
topic_as_pairs = filtered_results_t_concepts.groupby('full_id')['term'].apply(pairwise).explode().dropna().unique()
final_topic_pairs = []
for pair in topic_as_pairs:
    if len(set(pair)) > 1:
        final_topic_pairs.append(pair)
final_topic_pairs
```
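`pairwise` links only terms that sit next to each other in row order. A swapped-in alternative (not the notebook's own method) links every pair of topic terms that co-occur in the same publication and counts how many publications share each pair; the counts can then serve as edge weights or as a threshold. `cooccurrence_counts` is a hypothetical helper, and the toy items are made up:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(terms_by_item):
    """Count, for every unordered pair of terms, how many items contain both.

    terms_by_item maps an item id (e.g. full_id) to the terms indexed for it.
    """
    counts = Counter()
    for terms in terms_by_item.values():
        # the set drops repeated rows (one per author); str() and sorted()
        # give each pair a single, stable order such as ('form', 'sonata')
        unique_terms = sorted({str(t) for t in terms})
        counts.update(combinations(unique_terms, 2))
    return counts


toy = {
    "item-a": ["sonata", "form", "sonata"],
    "item-b": ["form", "sonata", "tempo"],
}
pairs = cooccurrence_counts(toy)
print(pairs.most_common(1))  # [(('form', 'sonata'), 2)]
```

In the notebook, the input could be built with `filtered_results_t_concepts.groupby("full_id")["term"].apply(list).to_dict()`, and an edge list kept with something like `[pair for pair, n in pairs.items() if n >= 5]` (the cutoff of 5 is arbitrary). The number of pairs grows quadratically with the terms per publication, so a cutoff keeps the network readable.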

```python
# here we find all of the terms associated with a given set of terms
# (to limit to topic terms first: t_concepts = results[results['category'] == "T"])

# find all the full_id numbers that feature one of the top topic terms and save as a list
selected_concept = results[results['term'].isin(lvb_top_terms)]
# alternative, matching a word: t_concepts[t_concepts['term'].str.contains('Black')]
selected_items = selected_concept['full_id'].to_list()
# and return the original results, now filtered for just those items
filtered_results = results[results['full_id'].isin(selected_items)]

# filter those results to one category, such as "T", "G", or "N"
filtered_results_t_concepts = filtered_results[filtered_results['category'] == "T"]
# find the pairs of consecutive terms within each item
topic_as_pairs = filtered_results_t_concepts.groupby('full_id')['term'].apply(pairwise).explode().dropna().unique()
final_topic_pairs = []
# remove pairs that are just one term twice
for pair in topic_as_pairs:
    if len(set(pair)) > 1:
        final_topic_pairs.append(pair)

filtered_results
```

| | year | item | entry | level | id | term | category | author | pubCC | langItem | langTransFrom | full_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 1846 | 1 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Schilling, Gustav | Germany | German | | 1846-1 |
| 25 | 1846 | 1 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Seidl, Johann Gabriel | Germany | German | | 1846-1 |
| 26 | 1846 | 1 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Silcher, Friedrich | Germany | German | | 1846-1 |
| 27 | 1846 | 1 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Schröder-Devrient, Wilhelmine | Germany | German | | 1846-1 |
| 28 | 1846 | 1 | 1 | 1 | 220229 | Beethoven, Ludwig van | N | Rungenhagen, Carl Friedrich | Germany | German | | 1846-1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 293199 | 2023 | 2303 | 4 | 5 | 268657 | late period | | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293200 | 2023 | 2303 | 5 | 1 | 191504 | reception | T | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293201 | 2023 | 2303 | 5 | 2 | 220229 | Beethoven, Ludwig van | N | Noorduin, Marten | United Kingdom | English | | 2023-2303 |
| 293202 | 2023 | 2303 | 5 | 3 | 219073 | adagio movts. | | Noorduin, Marten | United Kingdom | English | | 2023-2303 |


### Make a Simple Network

- You will need to pass in the list of pairs created above and name the HTML file.

```python
G = nx.Graph()
# use the Network class directly so the imported `net` module is not overwritten
topic_net = Network(notebook=True, width="1000px", height="800px")
for a, b in final_topic_pairs:
    G.add_edge(a, b)
topic_net.from_nx(G)
# show the network
topic_net.show("final_topic_pairs.html")
```

### Community Network

```python
# helper functions: no need to edit

def add_communities(G):
    """Return a copy of G with a Louvain community id in each node's "group" attribute."""
    G = deepcopy(G)
    partition = community_louvain.best_partition(G)
    nx.set_node_attributes(G, partition, "group")
    return G


def create_node_html(node: str, source_df: pd.DataFrame, node_col: str):
    """Build an HTML list of the author and item id of every row matching `node`."""
    rows = source_df.loc[source_df[node_col] == node].itertuples()

    html_lis = []
    for r in rows:
        # the id column was renamed from full_acc to full_id when the data was loaded
        html_lis.append(f"<li>author: {r.author}<br>id: {r.full_id}</li>")

    return f"<ul>{''.join(html_lis)}</ul>"


def add_nodes_from_edgelist(edge_list: list,
                            source_df: pd.DataFrame,
                            graph: nx.Graph,
                            node_col: str):
    """Return a copy of `graph` with every node in `edge_list`, each with an HTML tooltip."""
    graph = deepcopy(graph)

    node_list = pd.Series(edge_list).apply(pd.Series).stack().unique()

    for n in node_list:
        graph.add_node(n, title=create_node_html(n, source_df, node_col))

    return graph
```

### Create Community Network Here

```python
pyvis_graph = Network(notebook=False, width="1800px", height="1400px", bgcolor="black", font_color="white")
G = nx.Graph()

try:
    # node labels are in the 'term' column (renamed from the API's 'name')
    G = add_nodes_from_edgelist(edge_list=final_topic_pairs, source_df=filtered_results, graph=G, node_col='term')
except Exception as e:
    print(e)

G.add_edges_from(final_topic_pairs)
G = add_communities(G)
pyvis_graph.from_nx(G)
pyvis_graph.show('lvb_topic_communities.html')
```
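Once `add_communities` has stored a Louvain community id in each node's `group` attribute, it can be useful to read the communities as lists of terms rather than only as colors. A minimal sketch (hypothetical helper, toy partition) that inverts a node-to-community mapping, largest community first; in the notebook, the mapping could come from `nx.get_node_attributes(G, "group")` or directly from `community_louvain.best_partition(G)`:

```python
from collections import defaultdict


def members_by_community(partition):
    """Invert a node -> community mapping into community -> sorted list of nodes.

    Communities are returned largest first.
    """
    groups = defaultdict(list)
    for node, community in partition.items():
        groups[community].append(node)
    ordered = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return {community: sorted(nodes, key=str) for community, nodes in ordered}


toy_partition = {"sonata": 1, "form": 1, "tempo": 0}
print(members_by_community(toy_partition))
# -> {1: ['form', 'sonata'], 0: ['tempo']}
```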
