<a href="https://colab.research.google.com/github/dsilvestro/LiteRate/blob/master/1_diversity_and_diversification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modeling the Dynamics of Cultural Diversification

# 1. Diversity and Diversification

Culture can be understood as circulating populations of mental representations (e.g. knowledge, norms, values, beliefs, practices) and their public representations expressed through cultural practices, novel art, organizations, and material culture (e.g. cultural things) [(Koch et al, in progress)](https://osf.io/preprints/socarxiv/659bt). This tutorial will introduce you to how we can quantify changes in long-lasting representations (e.g. cultural lineages), and broader cultural forms through time. 


\\
In this tutorial, you will learn:

* How to think about cultural change in terms of the diversity of cultural lineages 
* What we can learn from diversity indices, and how to calculate them
* What we can learn from diversification rates, and how to calculate them
* How to simulate diversification within a population of cultural lineages

---


# a. Quantifying Diversity
 
Diversity broadly describes the distribution of elements across categories (Leonard and Jones 1989; [Stirling 2007](https://doi.org/10.1098/rsif.2007.0213)). In macroevolutionary biology, these elements are often species and the categories of interest are taxonomic clades, ecological niches, or geographic ranges. In cultural contexts, we are interested in the distribution of mental and material representations. Here, our elements are **cultural lineages** and our categories of interest are **cultural forms** that may include art genres, scientific disciplines, technologies, or institutional structures.

Below we list some diversity measures (adapted from [Stirling 2007](https://doi.org/10.1098/rsif.2007.0213)) that are commonly used across disciplines and what they quantify. The notation is as follows:
- $N$ is the total number of cultural lineages
- $i$ indexes a particular cultural lineage
- $n_i$ is the number of individual things within cultural lineage $i$
- $p_i = n_i / \sum_j n_j$ is the proportion of all things that belong to lineage $i$

\\

| Property  | Description | Measurement | Questions |
|-----------|:-------------:|:-------------:|:------:|
| **Variety/Richness**  | Number of lineages |      $$N$$      | How many types of lineages do we have? |
| **Abundance**  | Count within a lineage |      $$n_i$$      | What is the mass of each lineage? |
| **Distinctiveness** | Distance between lineages | Pairwise comparisons on distance metric | How different are lineages from one another? |
| **Evenness** |  Simpson Index  |        $$\sum_i p_i^2$$     | What is the probability two randomly selected things are in the same lineage? |
| **Evenness** |  Gini Index  |       $$1-\sum_i p_i^2$$      |  What is the probability two randomly selected things are NOT in the same lineage? |
| **Evenness** |  Shannon Entropy  |    $$ -\sum_i p_i \ln p_i$$         | How evenly distributed are things across lineages (information theory)? |
| **Normalized Evenness**  | Pielou Index | $$ \frac{-\sum_i p_i \ln p_i}{\ln N}$$   | How evenly distributed are things across lineages, compared to a perfectly even distribution? |
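
The indices in the table can be computed directly from a list of per-lineage abundances. The sketch below is a minimal illustration, not part of the original analysis: the function name `diversity_indices` and the example abundances are hypothetical, and proportions are taken as each lineage's share of all things counted.

```python
import math

def diversity_indices(counts):
    """Return common diversity indices for a list of per-lineage abundances."""
    counts = [c for c in counts if c > 0]   # lineages with no things are treated as absent
    richness = len(counts)                  # N: number of lineages
    total = sum(counts)                     # total number of things across all lineages
    shares = [c / total for c in counts]    # share of all things held by each lineage
    simpson = sum(p ** 2 for p in shares)
    shannon = -sum(p * math.log(p) for p in shares)
    # Pielou's index is undefined for a single lineage because ln(1) = 0
    pielou = shannon / math.log(richness) if richness > 1 else float("nan")
    return {
        "richness": richness,
        "simpson": simpson,
        "gini": 1 - simpson,
        "shannon": shannon,
        "pielou": pielou,
    }

# Hypothetical abundances: four equally common lineages vs. one dominant lineage
even = diversity_indices([25, 25, 25, 25])
skewed = diversity_indices([97, 1, 1, 1])
print(even)
print(skewed)
```

Both examples have the same richness, yet the evenness indices separate them clearly, which is why abundance data (when available) adds information beyond a simple count of lineages.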


\\
The appropriateness of different diversity indices is contingent on the frame of analysis and resolution of the data available. In the populational analyses often conducted in ecological studies or anthropological ethnographies, evenness indices work well because we have a fairly reliable measure of the number of individuals in a group or the overall size of a group within a given region. However, over the historical time scales of interest in macroevolution, archaeology, history, and cultural/historical sociology, we may not have reliable estimates of the number of things within a lineage. For example, it is difficult to know how many pots were manufactured in a particular style or how many people shared a cultural trait, value, or practice. In these circumstances, variety/richness indices are often most appropriate to use.

The tools we introduce in the following analyses rely on measures of variety/richness or the **"standing diversity"** of cultural lineages. Note that in some contexts where you have abundance data (e.g. the number of tweets belonging to a hashtag), it may be possible to weave these other diversity indices into cultural macroevolutionary analyses. 

We note now that even if you don't have complete abundance data, the **methods in these tutorials assume that the long-lasting representations or lineages in your data either comprise the complete set of the cultural form circulating among the population of people, or minimally a representative sample of this set.** Theoretically, you can't identify the operation of evolutionary mechanisms on cultural diversity without at least a representative sample. Empirically, we have used complete populations of cultural forms whenever possible, such as our analysis of all car models from 1896-2018 or all Metal bands created during the 20th century. 

\\

---


# b. From Diversity to Lineages

Diversity indices are useful tools to evaluate and compare the variety, balance, and disparity between different cultural systems. *However, one of the key limitations of diversity indices is that they do not disentangle the underlying processes of cultural innovation and cultural death.* Ideas and products are "born" (i.e. begin to circulate within a population of individuals) when they are publicly produced and shared with others. Ideas and products culturally "die" when they stop circulating between people.



# c. Diversity in American Automobiles

Below we demonstrate some exploratory analyses to look at cultural diversity over time. We feature work on the diversification dynamics of American car models manufactured between 1896 and 2018. We highlight automobiles as they represent one of the most transformative technologies in human history. Since their introduction in the late 19th century, automobiles have radically changed how people move, where people live, and even the global climate. Automobiles are also a diverse technological system (or cultural form) that has gone through many changes but also stayed relatively similar in overall form. 

In our analysis, we use car models as an example of cultural lineages. This is because we consider each car model to have a commercial and cultural reality that persists through time despite small generational changes in features and physical appearances. In addition, car models have an established classification/categorization scheme that has emerged throughout the history of the automotive industry. The benefit of using an established classification scheme is that we do not need to quantify all the traits of each car model, as would be required for phylogenetic analysis; we only need to know the starting and ending years of production (occurrence data) for each car model. 

The list and production years of each car model derive from the Master Vehicle List maintained by eBay for the listing and selling of automobile parts. We believe the dataset is fairly complete and represents a significant majority of the automobiles manufactured over the last 120 years. The analysis and data were recently featured in a PLOS ONE article [(Gjesfjeld et al. 2020)](https://doi.org/10.1371/journal.pone.0227579) with the associated data available [here](https://doi.org/10.6084/m9.figshare.9816326). 

\\
### Richness Through Time
One of the most straightforward plots we can create is the richness or standing diversity through time. More simply, this is the number of car models that are produced each year by American companies. However, since we do not know how many of each car model were made or sold each year, we are unable to calculate the evenness or abundance of car models. A plot of car model richness from 1896 to 2018 looks like this:

<figure align="center">
<img src="https://drive.google.com/uc?id=1Id03hKEmMptnJcZHkHliI1ty41qeb7Zi" alt="" width="500" height="500" border="0">
</figure>


<details>
<summary><font size="4" color="dodgerblue"> R Code </font></summary>

```
library(tidyverse) 

cars_mvl<-read.csv("~/Downloads/cars_mvl_mm_rev.csv")

#Using only American Automobiles
cars_mvl_america <- cars_mvl %>% filter(continent=="America")

#Taking the first and last production years of each car model
cars_mvl_america <- cars_mvl_america %>% group_by(make_model_rev) %>%
  arrange(year) %>%
  summarise(make=first(make),
            model=first(model),
            first_year=first(year),
            last_year=last(year))

#Removing 2019 and 2020 as this data is incomplete (at the time of writing)
cars_mvl_america <- cars_mvl_america %>% filter (last_year <= 2018)

#Counting up the number of cars originating and going "extinct" each year
orig_counts<-cars_mvl_america %>% group_by(first_year) %>%
  summarize(or_counts=n()) %>% mutate(or_sum=cumsum(or_counts),year=first_year)
ex_counts <- cars_mvl_america %>% group_by(last_year) %>%
  summarize(ex_counts=n()) %>% mutate(ex_sum=cumsum(ex_counts),year=last_year)
raw_diversity <- left_join(orig_counts, ex_counts, by="year") %>%
  mutate(richness=or_sum-ex_sum) %>% mutate(log_rich=log10(richness))

#Plotting the richness through time
plot(raw_diversity$year[-nrow(raw_diversity)],
     raw_diversity$richness[-nrow(raw_diversity)],
     type="l",bty="n",ylab="",xlab="",col="springgreen4",lwd=3,xaxt="n",yaxt="n")
axis(1,at=c(1900,1920,1940,1960,1980,2000,2020),
     labels=c("1900","1920","1940","1960","1980","2000","2020"),cex.axis=1)
axis(2,at=c(0,50,100,150,200),labels=c("0","50","100","150","200"),cex.axis=1,las=2)
mtext("Year (AD)", side = 1, line = 2, cex=1)
mtext("Richness", side = 2, line = 3, cex=1)

```

</details>

This plot shows the number of different car models produced by American automobile companies from 1896 to 2018. The general trend is a steady increase until the early 1990s, followed by a fairly steady decrease in the range of car models produced.
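
For readers who prefer Python, the core of the richness calculation can be sketched from first and last production years alone. This is a minimal illustration on made-up data, not a port of the R code: the function name `richness_through_time` and the example spans are hypothetical, and it assumes a model counts toward richness in every year from its first through its last production year, inclusive.

```python
def richness_through_time(spans, start, end):
    """Count the lineages in existence in each year from start to end (inclusive)."""
    return {
        year: sum(first <= year <= last for first, last in spans)
        for year in range(start, end + 1)
    }

# Hypothetical (first_year, last_year) pairs for four car models
spans = [(1900, 1903), (1901, 1901), (1902, 1905), (1903, 1905)]
richness = richness_through_time(spans, 1900, 1905)
for year, n in richness.items():
    print(year, n)
```

Looping over every year and every lineage is slow for large datasets, but it keeps the definition of standing richness explicit.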


An alternative way to view richness through time is a log lineage-through-time (LTT) plot, which follows the common practice of log-transforming the y-axis (lineage richness). In semi-logarithmic space, the null expectation of constant exponential growth appears as a straight line. Deviations from a straight line indicate time periods in which lineages originate or die at rates higher or lower than expected under constant growth.  

<figure align="center">
<img src="https://drive.google.com/uc?id=1k7lX6u0xj1yJDYreb54z19b6Gt-0FaJr" alt="" width="500" height="500" border="0">
</figure>


<details>
<summary><font size="4" color="dodgerblue"> R Code </font></summary>

```
par(mar=c(4.1,4.1,1.1,1.1))
plot(raw_diversity$year[-nrow(raw_diversity)],
     log10(raw_diversity$richness[-nrow(raw_diversity)]),
     type="l",bty="n",ylab="",xlab="",col="springgreen4",lwd=3,xaxt="n",yaxt="n")
axis(1,at=c(1900,1920,1940,1960,1980,2000,2020),
     labels=c("1900","1920","1940","1960","1980","2000","2020"),cex.axis=1)
axis(2,at=c(0,1,2),labels=c("0","1","2"),cex.axis=1,las=2)
segments(x0=1899,y0=0,y1=log10(max(raw_diversity$richness,na.rm=TRUE)),x1=2018,lty=2,lwd=2)
mtext("Year (AD)", side = 1, line = 2, cex=1)
mtext("Richness (Log10)", side = 2, line = 3, cex=1)

```

</details>
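
To see why constant growth gives a straight line in the log lineage-through-time plot, consider a brief derivation. It assumes a constant net growth rate per lineage per year; the symbols $r$ (net rate) and $N_0$ (starting richness) are notation introduced here for illustration only.

$$ N(t) = N_0 e^{rt} \quad\Rightarrow\quad \log_{10} N(t) = \log_{10} N_0 + \frac{r}{\ln 10}\, t $$

The slope in semi-logarithmic space is therefore proportional to the net growth rate, so a bend in the curve signals a period in which that rate changed.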


\\
### Longevity
In addition to simply plotting changes in richness through time, we can also evaluate the longevity of different car models. As we can see in the plot below, some car models have incredibly long lifespans ("living fossils") while others have very short lifespans. We can also see that shorter lifespans tend to be more prevalent during the early history of automobile production (before 1945), as visualized below. 


<figure align="center">
<img src="https://drive.google.com/uc?id=1N0f08ogJJCJR8pbc_d2xBTEMJo9vV_vD" alt="" width="467" height="333" border="0">
</figure>

<details>
<summary><font size="4" color="dodgerblue"> R Code </font></summary>

```
cars_2018_america_draw<-arrange(cars_mvl_america,first_year)
cars_2018_america_draw$ID<-as.numeric(row.names(cars_2018_america_draw))

# Function to draw points and lines
draw_func_range<-function(x,point_color,point_shape,point_size,line_color,line_width){
  for (i in 1:nrow(x)){
    points(as.integer(x[i,4]),as.integer(x[i,6]),pch=point_shape,cex=point_size,col=as.character(point_color))
    points(as.integer(x[i,5]),as.integer(x[i,6]),pch=point_shape,cex=point_size,col=as.character(point_color))
    segments(as.integer(x[i,4]),as.integer(x[i,6]),as.integer(x[i,5]),as.integer(x[i,6]),col=alpha(as.character(line_color),0.7),
             lwd=line_width,lty=1)
  }
}

#Plotting the image
par(mar=c(5.1,1.1,1.1,1.1))
plot(cars_2018_america_draw$first_year,
     cars_2018_america_draw$make_model_rev,type="n",
     ylim=c(0,nrow(cars_2018_america_draw)),
     xlim=c(min(cars_2018_america_draw$first_year),max(cars_2018_america_draw$last_year)),
     xlab="",ylab="",col.lab="black", col.main="black",bty="n",xaxt="n",yaxt="n")
draw_func_range(cars_2018_america_draw,"springgreen4",20,1,"springgreen4",2)
axis(1,at=c(1900,1920,1940,1960,1980,2000,2020),
     labels=c("1900","1920","1940","1960","1980","2000","2020"),cex.axis=1.5)
mtext("Year (AD)", side = 1, line = 3, cex=1.5 )

```

</details>

\\

We can also create a histogram of the lifespans presented above to view the relative proportion of short-lived and long-lived car models. As expected, our dataset contains far more car models that have very short lifespans and only a few models with long lifespans.

<figure align="center">
<img src="https://drive.google.com/uc?id=1e6ZTf3lgLzABl60zK4RIYVQ4UOKD6QFK" alt="" width="500" height="500" border="0">
</figure>

<details>
<summary><font size="4" color="dodgerblue"> R Code </font></summary>

```
#Counting up the lifespans
cars_2018_america_hist<-cars_mvl_america %>% mutate(lifespan=last_year-first_year)
cars_2018_america_counts<-cars_2018_america_hist %>% group_by(lifespan) %>%
  summarise(counts=n())

# Histogram
par(mar=c(4.1,4.1,1.1,1.1))
plot(y=log10(cars_2018_america_counts$counts),x=cars_2018_america_counts$lifespan,
     type="h",yaxt="n",col="springgreen4",lwd=3,bty="n",ylab="",xlab="",xaxt="n")
axis(1,at=seq(0,70,10),labels=c("0","10","20","30","40","50","60","70"),cex.axis=1)
axis(2,at=c(0,1,2,3),labels=c("0","10","100","1000"),las=2,cex.axis=1)
mtext("Lifespans of Car Models", side = 1, line = 2, cex=1)
mtext("Frequency", side = 2, line = 3, cex=1)

```

</details>
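
The lifespan tally behind the histogram can also be sketched in Python. As in the R code, a lifespan is taken as the last production year minus the first, so a model produced in a single year has a lifespan of 0. The function name `lifespan_counts` and the example spans are hypothetical.

```python
from collections import Counter

def lifespan_counts(spans):
    """Tally how many lineages share each lifespan (last_year - first_year)."""
    return Counter(last - first for first, last in spans)

# Hypothetical (first_year, last_year) pairs: two one-year models and two longer-lived ones
spans = [(1900, 1900), (1901, 1901), (1902, 1904), (1903, 1950)]
counts = lifespan_counts(spans)
for lifespan in sorted(counts):
    print(lifespan, counts[lifespan])
```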

---

# d. From Diversity Indices to Diversification Rates

Diversity indices are useful tools to evaluate and compare the variety, balance, and disparity within a cultural form. *However, one of the key limitations of diversity indices is that they do not disentangle the underlying processes of cultural innovation and cultural death.* Therefore, instead of diversity indices, many macroevolutionary approaches (including ours) focus on birth/origination and death/extinction rates as the primary metric of interest. Compared to variety/richness, diversification rates provide additional insight into the dynamics of stability and change over time.

For example, the log lineage through time plot of car models shows a dramatic decline in the diversity of car models from the mid-1990s until today. But this decline could be caused by:

 1) A stable origination rate and rising extinction rate \
 2) A stable extinction rate and declining origination rate \
 3) A combination of a declining origination rate and rising extinction rate

<figure align="center">
<img src="https://drive.google.com/uc?id=1DXJLnP7D8Nz-rhO17_JKowd-bdmcgdh6" alt="" width="600" height="333" border="0">
</figure>

Diversification rates provide more information about the processes underlying changes in diversity than diversity indices do. For example, one could hypothesize that each of the three diversification scenarios presented above represents a different set of underlying cultural, economic, or social processes.

1) The first scenario might represent an economic slowdown where some car models are simply no longer produced, similar to what we saw in the 2008 Recession.

2) The second scenario may suggest a pattern of competition where diversity is slowly being reduced over time due to the emergence of a "dominant design", as has been suggested in economics (Abernathy 1978). 

3) The third scenario suggests the possible combination of interacting processes, such as an economic slowdown that will raise extinction rates in combination with an uncertain environment that would tend to decrease origination rates. 
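
The three scenarios can be made concrete with a small calculation. The sketch below is illustrative only: the rates are hypothetical values chosen so that every scenario has the same net rate, the function name `expected_diversity` is made up, and the model is a simple discrete-time expectation rather than a fitted estimate.

```python
import math

def expected_diversity(n0, birth_rate, death_rate, steps):
    """Expected standing diversity when each lineage gains birth_rate and
    loses death_rate lineages per time step, on average."""
    trajectory = [float(n0)]
    for _ in range(steps):
        trajectory.append(trajectory[-1] * (1 + birth_rate - death_rate))
    return trajectory

# Hypothetical (birth rate, death rate) pairs; a stable baseline would be 0.05 and 0.05
scenarios = {
    "1) stable origination, rising extinction": (0.05, 0.07),
    "2) declining origination, stable extinction": (0.03, 0.05),
    "3) declining origination, rising extinction": (0.04, 0.06),
}
final = {
    name: expected_diversity(200, birth, death, 25)[-1]
    for name, (birth, death) in scenarios.items()
}
for name, value in final.items():
    print(f"{name}: {value:.1f}")
```

All three scenarios end at essentially the same expected diversity, which is the point: a richness curve alone cannot tell them apart, while separate origination and extinction rates can.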

In any of these circumstances, understanding the relationship between origination and extinction rates is imperative to gaining further insight into processes driving the patterns of diversity. This is clear when we view the diversification of American automobiles from 1896-2018 (shown below), as opposed to simply the change in car model richness over time (shown above). By estimating both origination and extinction rates, we are able to identify historical periods in which both rates were impacted (Great Depression, WWII) and those in which only one was impacted (Arab Oil Crisis, 2008 Recession). Furthermore, we can also identify with far greater clarity the timing of significant shifts in either origination or extinction rates, which are highlighted in the figure below from [Gjesfjeld et al. (2020)](https://doi.org/10.1371/journal.pone.0227579).

<figure align="center">
<img src="https://drive.google.com/uc?id=1wXgp_9hFBu5qCsGqD4JH-DqN5DNXYrR8" alt="" width="500" height="700" border="0">
</figure>

## Why look at diversification rates?

Diversification rates capture how cultural lineages "beget" other lineages through learning and innovation by people, as well as how cultural lineages "die" from disuse and forgetting. They do not make strong assumptions about individual-level circumstances (as in agent-based simulations), or the exact sequence of transmission and variation events (as in networks and phylogenies). Diversification rates consider the entire population or cultural form without privileging some lineages as more important than others. We all have our own heterogeneous personal cultures in which some representations and things are more relevant than others, so in many cases this is a reasonable simplifying assumption. However, it is a strong assumption and should be evaluated in context.

Diversification rate analyses of cultural lineages should be considered as a complement, not a competitor, to actor-based analyses like social network analysis or agent-based simulations. While those methods start from actor-level transmission processes to extrapolate population-scale cultural outcomes, diversification rate analysis takes the opposite approach. Diversification rate analysis starts from population-level cultural phenomena and identifies trends that are consistent with individual-level processes occurring. This approach can be helpful in corroborating simulations, or when actor-level structural data simply aren't available.

---


# e. Formal Introduction to Diversification Rates

In macroevolutionary biology, certain patterns in diversification rates are recognized as theoretically consistent with evolutionary mechanisms that have shaped the dynamics of the species or clade over time. In cultural contexts, birth and death rates can highlight the role of major events and evolutionary mechanisms in the histories of cultural forms.

\\

The birth rate is defined as the expected number of birth events per lineage per time unit (e.g. 1 year) and can be approximated by the following formula:
$$ \lambda = \frac{\text{number of lineage births}}{\text{total time lived}}$$

\\

The death rate is defined as the expected number of death events per lineage per time unit and can be approximated as:
 $$\mu=\frac{\text{number of lineage deaths}}{\text{total time lived}}$$

\\

where *total time lived* is the total time collectively lived by all lineages in the period of analysis. If the period of analysis is just a single unit of time (e.g., year), this reduces to:

\\

$$ \lambda = \frac{\text{number of lineage births}}{\text{standing diversity}}$$
and
 $$\mu=\frac{\text{number of lineage deaths}}{\text{standing diversity}}$$
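
As a worked example of these formulas, the sketch below computes both rates for a handful of lineages within a time window. It is a minimal illustration under stated assumptions: the function name `empirical_rates` and the toy data are hypothetical, lineages still alive are marked with `np.inf` as their death time, and an event is counted when it falls after the window start and at or before the window end.

```python
import numpy as np

def empirical_rates(birth_times, death_times, start, end):
    """Return (birth rate, death rate) as events divided by total time lived in [start, end]."""
    births = np.asarray(birth_times, dtype=float)
    deaths = np.asarray(death_times, dtype=float)  # np.inf marks lineages that are still alive
    # time each lineage spends inside the window
    lived = np.clip(deaths, start, end) - np.clip(births, start, end)
    total_time_lived = lived.sum()
    n_births = np.count_nonzero((births > start) & (births <= end))
    n_deaths = np.count_nonzero((deaths > start) & (deaths <= end))
    return n_births / total_time_lived, n_deaths / total_time_lived

# Toy data: two founding lineages, two later births, and one death at time 3
birth_times = [0, 0, 2, 3]
death_times = [np.inf, 3, np.inf, np.inf]
lam, mu = empirical_rates(birth_times, death_times, start=0, end=5)
print(lam, mu)
```

Here the lineages live 5 + 3 + 3 + 2 = 13 time units in total, so the two births give a birth rate of 2/13 and the single death gives a death rate of 1/13.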

\\


---


# f. Creating a Diversification Rate Simulator

To clarify what diversification rates actually represent, we are going to create a diversification rate simulator.

In this first example, we can simulate population dynamics over time by randomly selecting individual lineages to reproduce or die in each time unit. Remember, the number of lineages in the population at any moment is sometimes called the net or **standing diversity**. If we calculate the raw diversification rates from the standing diversity over time, we call these the **empirical** birth and death rates (as opposed to **estimated or theoretical** rates from statistical models).

Let's start by creating a population object that can birth individuals, kill individuals, and keep track of the standing diversity.

To create this object, hover over the brackets below (next to SHOW CODE) and then click on the "play" button. If you're comfortable programming, you can double-click to get the gist of the code (even if you're not familiar with Python). 

**Just make sure you run the code blocks in sequential order otherwise you will get errors. Please also note that no output will occur until you get to the third code block.**  

```python
#@title
import numpy as np

class Population:
  '''
  tracks the birth and death times of every individual (lineage) in the simulation
  '''

  def __init__(self,starting_diversity,total_time):
    self.total_time=total_time #total number of time units (e.g. years)
    self.birth_times=np.zeros(starting_diversity) #every individual in starting population is born at time 0
    self.death_times=np.repeat(total_time,starting_diversity) #for now, set all individual death times to the last time period. We will kill them later ;)

    #positions (in birth_times/death_times) of the individuals that are still alive
    self.alive_index=np.arange(starting_diversity)

  def currently_alive(self):
    return self.alive_index

  def create_individuals(self,num_individuals,time):
    '''
    create "num_individuals" at "time"
    '''

    #update alive index
    self.alive_index=np.concatenate( (self.alive_index, np.arange(len(self.birth_times), len(self.birth_times)+num_individuals) ) )
    #update birth times
    self.birth_times=np.concatenate( ( self.birth_times,np.repeat(time, num_individuals) ) )
    #new individuals get a placeholder death time at the end of the simulation
    self.death_times=np.concatenate( ( self.death_times,np.repeat(self.total_time, num_individuals) ) )


  def kill_individuals(self,indices,time):
    '''
    kill off individuals with "indices" in alive_index at "time"
    '''
    #update death times
    kill_index=self.alive_index[indices] #need to get back from alive index to times indices
    self.death_times[kill_index]=time

    #update alive index
    self.alive_index=np.delete(self.alive_index,indices)

  def calc_time_lived(self, frame_start, frame_end): #found in literate library as get_br
    '''
    calculates total time lived by all individuals within a time window
    '''
    s, e  = self.birth_times.astype(float), self.death_times.astype(float)
    s[s<frame_start] = frame_start #set elements born before timeframe to start of timeframe
    e[e>frame_end] = frame_end #set elements dying after timeframe to end of timeframe
    dt = e - s
    return np.sum(dt[dt>0])
```
   

Now we will also create an object that simulates population dynamics over time. This object has parameters for all the characteristics we are interested in. These include:

- Time Length: How many time units we want to simulate for
- Starting Diversity: The diversity or richness of lineages at the first time step
- Theoretical Birthrate: The rate at which new lineages emerge in each time step
- Theoretical Deathrate: The rate at which existing lineages are lost in each time step
- Random Seed: A random value with which to reproduce the stochastic simulation

Once again, hover over the brackets below and then click on the "play" button.  

```python
#@title
import numpy as np
import pandas as pd
from tqdm.auto import tqdm #progress bar that works in notebooks and terminals
import warnings
warnings.filterwarnings('ignore') #silence deprecation warnings raised by dependencies

class Simulator:
  def __init__(self,
    theoretical_lambda,
    theoretical_mu,
    epoch=50,
    starting_div=1000,
    ):
    self.theoretical_lambda=theoretical_lambda #probability that a lineage gives birth in a time unit
    self.theoretical_mu=theoretical_mu #probability that a lineage dies in a time unit
    self.epoch=epoch #number of time units to simulate
    self.starting_div=starting_div #standing diversity at time 0

  def run_simulation(self, seed=None):
    if seed is not None: np.random.seed(seed)
    pop=Population(self.starting_div,self.epoch) #Population is defined in the previous code block

    #storage arrays
    empirical_lambda=[]
    empirical_mu=[]
    standing_diversity=[]

    for t in tqdm(range(1,self.epoch+1)):
      #stochastically birth and kill individuals in each time unit
      r=np.random.sample(len(pop.currently_alive())) #each living individual gets a random number in [0, 1)
      #individuals with r below the birth rate spawn a new element
      birther_indices = (r < self.theoretical_lambda).nonzero()[0]
      #individuals with r in [birth rate, birth rate + death rate) die
      #(the comparisons return boolean arrays; nonzero returns the indices that are True)
      dying_indices = ((r >= self.theoretical_lambda) & (r < self.theoretical_lambda+self.theoretical_mu)).nonzero()[0]

      time_lived=pop.calc_time_lived(t-1,t)
      assert time_lived == len(pop.currently_alive()) #just to show you

      #update population
      pop.create_individuals(len(birther_indices),t)
      pop.kill_individuals(dying_indices,t)

      #calculate statistics
      empirical_lambda.append(len(birther_indices)/time_lived)
      empirical_mu.append(len(dying_indices)/time_lived)
      standing_diversity.append(time_lived)

    return pd.DataFrame({
        'Time': np.arange(self.epoch),
        'Theoretical Birthrate': self.theoretical_lambda,
        'Theoretical Deathrate': self.theoretical_mu,
        'Empirical Birthrate': empirical_lambda,
        'Empirical Deathrate': empirical_mu,
        'Standing Diversity': standing_diversity
    })
```

---
# g. Simulating constant diversification rates
Note the default settings: 

- Time Length: 50
- Starting Diversity: 1005
- Theoretical Birthrate: 0.01
- Theoretical Deathrate: 0.005
- Random Seed: 12345

Before moving on to plot the outcome of the simulation, what would be your initial expectation for how the standing diversity changes through time? Would it increase or decrease? And by how much? Adjust the sliding scales below to match these values if needed and press the "play" button. 

We've made the diversification rates unrealistically small here to minimize major stochastic swings. We'll account for these in more elegant ways in the future, but we want to keep the code simple. Please note that we are primarily interested in the relative differences between birth and death rates rather than their absolute magnitudes.


```python
time_length=50 #@param {type:"slider", min:5, max:100, step:5}
starting_diversity= 1005 #@param {type:"slider", min:5, max:10000, step:100}
theoretical_birthrate=.01 #@param {type:"number"}
theoretical_deathrate=.005 #@param {type:"number"}
random_seed=12345 #@param {type:"number"}
mySimulator=Simulator(
    theoretical_birthrate,
    theoretical_deathrate,
    time_length,
    starting_diversity,
    )

results=mySimulator.run_simulation(random_seed)

#@title 
# NOTE IF PLOTS DO NOT SHOW MAKE SURE YOU ARE NOT BLOCKING POPUPS FROM THIS PAGE
import altair as alt
div_plot = alt.Chart(results).mark_line().encode(
  x=alt.X('Time'),
  y=alt.Y('Standing Diversity', scale=alt.Scale(zero=False)),
  color=alt.value('green')
).properties(
  width=600,
  height=200
).interactive()

birth_plot = alt.Chart(results).mark_line(opacity=0.9).transform_fold(['Empirical Birthrate','Theoretical Birthrate',]).encode(
  x=alt.X('Time:Q'),
  y=alt.Y('value:Q', scale=alt.Scale(zero=False), title='Rates'),
  color=alt.Color('key:N',scale=alt.Scale(scheme='blues'),title=''),
).properties(
  width=600,
  height=200,
).interactive()

death_plot = alt.Chart(results).mark_line(opacity=0.9).transform_fold(['Empirical Deathrate','Theoretical Deathrate']).encode(
  x=alt.X('Time:Q'),
  y=alt.Y('value:Q', scale=alt.Scale(zero=False), title='Rates'),
  color=alt.Color('key:N',scale=alt.Scale(scheme='reds'),title=''),
).properties(
  width=600,
  height=200
).interactive()

(div_plot & birth_plot & death_plot).resolve_scale(color='independent')
```

With the settings above, the standing diversity increases steadily over the 50 time steps. Because the theoretical birth rate (0.01) is higher than the theoretical death rate (0.005), each lineage adds a net 0.005 lineages per time step in expectation, so starting from roughly 1000 lineages we should expect growth of about $1.005^{50} \approx 1.28$-fold, to roughly 1300 lineages. 

However, the empirical birth and death rates show much greater variability from time step to time step. Over long enough periods and with big enough populations, the empirical rates will match the theoretical rates, but as you can see they are quite variable between each time unit in this simulation. 
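
The link between population size and noise in the empirical rates can be checked with a simplified sketch. It is not the simulator itself: it holds the population size fixed, draws the number of births per time unit from a binomial distribution (equivalent to giving each lineage an independent chance to give birth), and uses a hypothetical function name, `empirical_birthrate_spread`.

```python
import numpy as np

def empirical_birthrate_spread(population_size, birth_rate, n_steps, seed):
    """Standard deviation of the per-step empirical birth rate at a fixed population size."""
    rng = np.random.default_rng(seed)
    # number of lineages giving birth in each of n_steps independent time units
    births = rng.binomial(population_size, birth_rate, size=n_steps)
    return (births / population_size).std()

small = empirical_birthrate_spread(100, 0.01, 2000, seed=12345)
large = empirical_birthrate_spread(10_000, 0.01, 2000, seed=12345)
print(small, large)
```

Under these assumptions the spread shrinks roughly with the square root of the population size, so a population 100 times larger gives empirical rates that are about 10 times less noisy.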

<font color='firebrick'><h2>Check Your Understanding:</h2></font>

Using the simulator above, try to answer the following questions:

<font size=3 align="left">

1. What happens if you change the theoretical birth rate to a much higher value, such as 0.1?  Before you run it, what type of growth do you expect to see (linear, exponential, logistic) in the standing diversity? <br> What happens to the empirical rates over time, and why? 
 
\\

2. Keeping the birth rate at 0.1 and the death rate at 0.05, change the time length from 50 to 5. What happens to the standing diversity? What type of growth does it look like now?

\\

3. Finally, let's make the birth and death rates equal to each other. Change the time length back to 50 and set both the birth and death rates to 0.1. What does the standing diversity look like with these settings? <br>
Now, change the random seed from 12345 to 111, then to 222, and then to 333. How much does the standing diversity change between these different settings? **What does this tell us about the stochasticity in our simulation?**  
</font>

\\

Beyond getting a feel for the relationship between diversification rates and standing diversity, it's important that you notice that **the observed empirical rates are noisy stochastic representations of the true, theoretical underlying rates.** If we are to use rate analyses to understand the histories of cultural forms, we must be able to differentiate between stochastic noise and meaningful changes in the rates over time. This motivates the usage of statistical models for birth and death rates that cut through this noise with some modest assumptions, as we will highlight in future tutorials. 

---


# Key Takeaways

- **Culture can be understood as circulating populations of cultural representations, which we refer to as cultural lineages. Changes in the diversity of these cultural lineages can explain how culture emerges, stabilizes, or changes over time.**

\\

- **There are a variety of indices for highlighting different aspects of diversity and its change over time.**

\\

- **Diversification (birth and death) rates contain more information than diversity indices because they describe processes of cultural origination and extinction.**

\\

- **Empirical diversification rates are calculated as the number of births/deaths over total time lived in a time window. These snapshots are noisy representations of the true/theoretical rates.** 


# Up Next...

In the next lesson, we introduce the linear birth-death process, the core statistical framework for our diversification rate methods, and the LiteRate algorithm for estimating statistically significant rate shifts.



---
# References

Abernathy, William. The Productivity Dilemma: Roadblock to Innovation in the Automobile Industry. Baltimore; London: Johns Hopkins University Press, 1978.

Gjesfjeld, Erik, Daniele Silvestro, Jonathan Chang, Bernard Koch, Jacob G. Foster, and Michael E. Alfaro. ‘A Quantitative Workflow for Modeling Diversification in Material Culture’. PLOS ONE 15, no. 2 (6 February 2020): e0227579. https://doi.org/10.1371/journal.pone.0227579.

Koch, Bernard, Daniele Silvestro, and Jacob G. Foster. ‘The Evolutionary Dynamics of Cultural Change (as Told Through the Birth and Brutal, Blackened Death of Metal Music)’. SocArXiv, n.d. [https://doi.org/10.31235/osf.io/659bt](https://doi.org/10.31235/osf.io/659bt).

Leonard, Robert D., and George T. Jones, eds. Quantifying Diversity in Archaeology. Cambridge: Cambridge University Press, 1989.

Stirling, Andy. ‘A General Framework for Analysing Diversity in Science, Technology and Society’. Journal of the Royal Society Interface 4, no. 15 (2007): 707–719. https://doi.org/10.1098/rsif.2007.0213.



**License:** These tutorials are licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).