# Composer Trends

One thing I’m interested in is how the popularity of various composers has changed over time, so let’s take a look at how many times each composer was programmed per season.

In [30]:
import altair as alt
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv("../data/works.csv")

In [4]:
df.head()

,season,program,work,movement,composer,title
0,1842-43,3853,52446,,"Beethoven, Ludwig van","SYMPHONY NO. 5 IN C MINOR, OP.67"
1,1842-43,3853,8834,4.0,"Weber, Carl Maria Von",OBERON
2,1842-43,3853,3642,,"Hummel, Johann","QUINTET, PIANO, D MINOR, OP. 74"
3,1842-43,3853,8834,3.0,"Weber, Carl Maria Von",OBERON
4,1842-43,3853,8835,1.0,"Rossini, Gioachino",ARMIDA


Looking at the distribution of the number of times each composer was programmed, we can see that at least 50% of all composers ever performed appeared on only one program, and only about 25% appeared on four or more. We’ll need to find some minimum threshold for sufficiently popular composers.

In [28]:
# Count distinct seasons, programs, and works per composer.
composers = df.groupby("composer")[["season", "program", "work"]].nunique()
composers.describe()

,season,program,work
count,2778.0,2778.0,2778.0
mean,5.696904,19.177826,4.294816
std,16.50341,132.931394,17.536582
min,1.0,1.0,1.0
25%,1.0,1.0,1.0
50%,1.0,1.0,1.0
75%,3.0,4.0,3.0
max,176.0,3423.0,645.0


In [32]:
composers.quantile(np.arange(0, 1.1, .1))

,season,program,work
0.0,1.0,1.0,1.0
0.1,1.0,1.0,1.0
0.2,1.0,1.0,1.0
0.3,1.0,1.0,1.0
0.4,1.0,1.0,1.0
0.5,1.0,1.0,1.0
0.6,2.0,2.0,1.0
0.7,2.0,3.0,2.0
0.8,4.0,5.0,3.0
0.9,10.0,14.0,7.0
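
One way to turn these quantiles into a cutoff is to keep only composers above the 90th percentile of total programs, which in the table above corresponds to more than 14 programs. Here is a minimal sketch of that idea; the `toy_composers` frame and its counts are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy stand-in for the `composers` table: distinct programs per composer.
# The composer labels and counts are invented for illustration only.
toy_composers = pd.DataFrame(
    {"program": [1, 1, 1, 1, 2, 2, 3, 5, 14, 40]},
    index=pd.Index(list("ABCDEFGHIJ"), name="composer"),
)

# Use the 90th percentile of program counts as the popularity cutoff.
threshold = toy_composers["program"].quantile(0.9)

# Keep only composers strictly above the cutoff.
popular = toy_composers[toy_composers["program"] > threshold]

print(threshold)
print(popular)
```

Whether the 90th percentile is the right cutoff is a judgment call; the same two lines work for any other quantile.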


In [157]:
programs = pd.DataFrame({
    "count": df.groupby(["season", "composer"])["program"].nunique()
})

programs.head()

season,composer,count
1842-43,"Beethoven, Ludwig van",4
1842-43,"Bellini, Vincenzo",1
1842-43,"Haydn, Franz Joseph",1
1842-43,"Herz, Henri",1
1842-43,"Hummel, Johann",3


In [158]:
l, r = programs.align(programs.groupby("season")["count"].sum(), axis=0, level="season")
programs["pct"] = l["count"] / r
programs.head()

season,composer,count,pct
1842-43,"Beethoven, Ludwig van",4,0.142857
1842-43,"Bellini, Vincenzo",1,0.035714
1842-43,"Haydn, Franz Joseph",1,0.035714
1842-43,"Herz, Henri",1,0.035714
1842-43,"Hummel, Johann",3,0.107143


In [159]:
# Calculate the revealed comparative advantage for the composers by season.
l, r = programs.align(
    programs.groupby("composer")["count"].sum() / programs["count"].sum(),
    axis=0,
    level="composer")

programs["rca"] = l["pct"] / r
programs.head()

season,composer,count,pct,rca
1842-43,"Beethoven, Ludwig van",4,0.142857,2.223446
1842-43,"Bellini, Vincenzo",1,0.035714,47.567857
1842-43,"Haydn, Franz Joseph",1,0.035714,2.543736
1842-43,"Herz, Henri",1,0.035714,1902.714286
1842-43,"Hummel, Johann",3,0.107143,237.839286


In [160]:
programs["log(rca)"] = np.log(programs["rca"])
programs.head()

season,composer,count,pct,rca,log(rca)
1842-43,"Beethoven, Ludwig van",4,0.142857,2.223446,0.799058
1842-43,"Bellini, Vincenzo",1,0.035714,47.567857,3.862157
1842-43,"Haydn, Franz Joseph",1,0.035714,2.543736,0.933634
1842-43,"Herz, Henri",1,0.035714,1902.714286,7.551037
1842-43,"Hummel, Johann",3,0.107143,237.839286,5.471595


The revealed comparative advantage (RCA) computed above is a composer’s share of a season’s programming divided by that composer’s share of programming across all seasons. Its log gives us a way to gauge how unusual each composer’s presence is in a given season: scores above 0 indicate that a composer was performed more than expected that season, and scores below 0 less than expected. We can look at how a composer’s RCA changes over time to get a sense of how that composer’s popularity has changed over the course of the Philharmonic’s history.
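
To make the calculation concrete, here is a minimal sketch on invented toy data. It assumes RCA is computed as in the cells above: a composer’s share of a season’s programming divided by that composer’s share across all seasons. It swaps the `align` calls for `groupby(...).transform("sum")`, which is just another way to broadcast the totals back onto each row. The `toy` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Invented toy data: distinct programs per (season, composer).
toy = pd.DataFrame(
    {
        "season": ["S1", "S1", "S2", "S2"],
        "composer": ["A", "B", "A", "B"],
        "count": [3, 1, 1, 3],
    }
).set_index(["season", "composer"])

# Composer's share of each season's programming.
season_total = toy.groupby(level="season")["count"].transform("sum")
toy["pct"] = toy["count"] / season_total

# Composer's share of programming across all seasons.
composer_total = toy.groupby(level="composer")["count"].transform("sum")
overall_share = composer_total / toy["count"].sum()

# RCA is the ratio of the two shares; its log is 0 when they are equal.
toy["rca"] = toy["pct"] / overall_share
toy["log(rca)"] = np.log(toy["rca"])

print(toy)
```

In this toy example both composers have the same overall share, so a composer’s log RCA is positive in the season where they dominate and negative in the other.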

In [89]:
alt.Chart(
    programs.xs("Beethoven,  Ludwig  van", level="composer").reset_index(),
    width=600,
    title="Beethoven’s Popularity"
).mark_bar().encode(
    x="season:O",
    y="log(rca):Q"
)

<VegaLite 2 object>

For comparison, let’s take a look at one of the most distinctive composers from 2016–17.

In [114]:
(programs[programs.align(composers["program"] > 14, axis=0, level="composer")[1]]
 .loc["2016-17"]
 .sort_values("log(rca)", ascending=False)
 .head(3))

composer,count,pct,rca,log(rca)
"Adams, John",9,0.022727,26.907071,3.292389
"Salonen, Esa-Pekka",6,0.015152,26.039101,3.259599
"Janacek [Janácek], Leoš",5,0.012626,15.288108,2.727075


In [164]:
adams_pop = alt.Chart(
    programs.xs("Adams,  John", level="composer").reset_index(),
    width=600,
    title="John Adams’s Popularity"
).mark_bar().encode(
    x="season:O",
    y="log(rca):Q"
)

adams_pop

<VegaLite 2 object>

Here we can look at the top three most distinctive composers for each season. We exclude the bottom 90% of composers by total programs (those with 14 or fewer) because so many composers were programmed only a handful of times, which makes their RCA spike in the few seasons in which they were actually programmed. Within each season, we also ignore composers who appeared on only one program.

In [111]:
(programs[programs.align(composers["program"] > 14, axis=0, level="composer")[1]]
 .reset_index()
 .groupby("season", as_index=False)
 .apply(lambda g: g[g["count"] > 1].sort_values("log(rca)", ascending=False).iloc[:3])
 .loc[:, ["season", "composer", "count", "log(rca)"]]
 .set_index(["season", "composer"])
 .head(12))

season,composer,count,log(rca)
1842-43,"Hummel, Johann",3,5.471595
1842-43,"Rossini, Gioachino",3,2.482133
1842-43,"Weber, Carl Maria Von",4,2.150614
1843-44,"Hummel, Johann",2,5.307292
1843-44,"Donizetti, Gaetano",3,4.534102
1843-44,"Weber, Carl Maria Von",3,2.104094
1844-45,"Meyerbeer, Giacomo",2,4.48965
1844-45,"Donizetti, Gaetano",2,4.084185
1844-45,"Weber, Carl Maria Von",2,1.654177
1845-46,"Cherubini, Luigi",2,3.824913


## Debuts

Looking at the comparison above of John Adams to Beethoven, we can see that composers who have been in the repertoire longer are at a disadvantage in later years when popularity is measured this way. The later a composer debuts, the fewer seasons have had the opportunity to program that composer, which makes that composer look more distinctive in the seasons in which they are programmed.

This leads me to believe we may need to calculate RCA relative to the composer’s debut.

In [143]:
# A composer's debut is the earliest season in which they were programmed.
# Season labels such as "1842-43" sort chronologically as strings.
debuts = df.groupby("composer", as_index=False)["season"].min()
debuts.head()

,composer,season
0,"ACT,",2010-11
1,"Abert, Johann Joseph",1926-27
2,"Abt, Franz",1852-53
3,"Achron, Isidor",1937-38
4,"Acosta, Daniel",2013-14


In [151]:
alt.Chart(
    debuts.groupby("season", as_index=False).count(),
    width=600,
    title="Debuts by Season"
).mark_bar().encode(
    x="season:O",
    y="composer:Q"
)

<VegaLite 2 object>

In [163]:
# Take only seasons from 1982 onward so that we can recalculate John Adams's
# RCA using only seasons in which he could conceivably have been programmed
p2 = programs.loc["1982-83":, :]

p2, r = p2.align(p2.groupby("composer")["count"].sum() / p2["count"].sum(), axis=0, level="composer")

p2["rca"] = p2["pct"] / r
p2["log(rca)"] = np.log(p2["rca"])

p2.head()

season,composer,count,pct,rca,log(rca)
1982-83,"Adams, John",1,0.003106,0.906349,-0.098331
1982-83,"Anthem,",1,0.003106,0.326286,-1.119982
1982-83,"Bach, Johann Sebastian",4,0.012422,0.647392,-0.434803
1982-83,"Balada, Leonardo",1,0.003106,40.785714,3.708332
1982-83,"Balassa, Sandor",1,0.003106,40.785714,3.708332


If we only consider seasons after a composer has debuted, we get a more interesting picture of their RCA in each season. John Adams, for instance, wasn’t really programmed much for the first 10 years after his debut. The first time he was really a distinctive element of a season’s programming was in the 1991–92 season.
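
The 1982 cutoff above is specific to John Adams. One possible way to generalize it, sketched here under my own assumptions rather than taken from the notebook, is to give every composer a baseline share computed only over the seasons from their debut onward. The `toy` frame, its values, and the `rca_since_debut` column name are invented for illustration.

```python
import pandas as pd

# Invented toy data: distinct programs per (season, composer).
toy = pd.DataFrame(
    {
        "season": ["1900-01", "1900-01", "1901-02", "1901-02", "1901-02"],
        "composer": ["A", "B", "A", "B", "C"],
        "count": [2, 2, 1, 1, 2],
    }
)

# Share of each season's programming, as in `pct` above.
season_total = toy.groupby("season")["count"].transform("sum")
toy["pct"] = toy["count"] / season_total

# Debut season per composer (season labels sort chronologically as strings).
toy["debut"] = toy.groupby("composer")["season"].transform("min")

# Total programming in every season from a given season onward:
# reverse the seasons, take a running sum, and reverse back.
per_season = toy.groupby("season")["count"].sum().sort_index()
from_season_on = per_season.iloc[::-1].cumsum().iloc[::-1]
eligible_total = toy["debut"].map(from_season_on)

# Baseline share restricted to the seasons in which the composer was eligible.
composer_total = toy.groupby("composer")["count"].transform("sum")
toy["rca_since_debut"] = toy["pct"] / (composer_total / eligible_total)

print(toy)
```

In this toy example composer C debuts in the second season, so their baseline only counts that season. Measured against all seasons, C’s RCA would be 2.0; measured from their debut, it is 1.0.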

In [166]:
alt.vconcat(
    adams_pop,
    alt.Chart(
        p2.xs("Adams,  John", level="composer").reset_index(),
        width=600,
        title="John Adams’s Popularity, 1982 Onward"
    ).mark_bar().encode(
        x="season:O",
        y="log(rca):Q"
    )
)

<VegaLite 2 object>

## Oldest Composers

In [187]:
_, premiere_composers = programs.align(debuts.set_index("composer")["season"] == "1842-43", axis=0, level="composer")
_, top_composers = programs.align(composers["program"] > 14, axis=0, level="composer")
programs[premiere_composers & top_composers].head(12)

season,composer,count,pct,rca,log(rca)
1842-43,"Beethoven, Ludwig van",4,0.142857,2.223446,0.799058
1842-43,"Bellini, Vincenzo",1,0.035714,47.567857,3.862157
1842-43,"Haydn, Franz Joseph",1,0.035714,2.543736,0.933634
1842-43,"Hummel, Johann",3,0.107143,237.839286,5.471595
1842-43,"Mendelssohn, Felix",1,0.035714,1.661759,0.507877
1842-43,"Mozart, Wolfgang Amadeus",2,0.071429,1.801813,0.588793
1842-43,"Rossini, Gioachino",3,0.107143,11.966757,2.482133
1842-43,"Spohr, Louis",1,0.035714,34.594805,3.543704
1842-43,"Weber, Carl Maria Von",4,0.142857,8.590132,2.150614
1843-44,"Beethoven, Ludwig van",2,0.090909,1.41492,0.347073


In [188]:
alt.Chart(
    programs[premiere_composers & top_composers].reset_index(),
    width=600,
    height=100
).mark_bar().encode(
    x="season:O",
    y="log(rca):Q"
).facet(row="composer:N")

<VegaLite 2 object>

At this point, I’m a little unsure whether RCA is a good measure of a composer’s popularity, but perhaps that’s just because Beethoven’s popularity seems to have declined since the 19th century. The question is: is his popularity really dwindling, or is an increasing number of composers to choose from simply watering down his RCA?
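
One rough way to start separating those two explanations would be to put a composer’s raw program count next to their share of the season and the number of distinct composers programmed that season. The sketch below uses invented toy data, and the column names `composers`, `total`, `A_count`, and `A_pct` are hypothetical; it illustrates the dilution effect and says nothing about the real data.

```python
import pandas as pd

# Invented toy data: composer A appears on 4 programs in both seasons,
# but the second season features twice as many composers.
toy = pd.DataFrame(
    {
        "season": ["S1", "S1", "S2", "S2", "S2", "S2"],
        "composer": ["A", "B", "A", "B", "C", "D"],
        "count": [4, 4, 4, 4, 4, 4],
    }
)

# Per-season context: how many composers, and how much programming overall.
per_season = toy.groupby("season").agg(
    composers=("composer", "nunique"),
    total=("count", "sum"),
)

# Composer A's raw count and share, aligned to the per-season table by season.
per_season["A_count"] = toy[toy["composer"] == "A"].set_index("season")["count"]
per_season["A_pct"] = per_season["A_count"] / per_season["total"]

print(per_season)
```

Here A’s raw count never changes, yet A’s share halves as the roster doubles. If the real data showed a flat count alongside a falling share, that would point toward dilution rather than a genuine drop in programming.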