<img src=../materials/banner.png width="70%">

# Exercises for experienced programmers
1. [Introduction](#Introduction)
1. [Exercise 1: Characterizing human speech rhythm](#Exercise1)
1. [Exercise 2: Creating and characterizing a musical rhythm](#Exercise2)
1. [Exercise 3: Sperm whale clicks](#Exercise3)
1. [Bonus: Play around with bat data](#PlayAround)

In this notebook we'll use the following Python packages to explore some human, sperm whale, and bat vocalizations, and do some rhythmic analyses:

* [thebeat](https://thebeat.readthedocs.io/en/latest/): A package for working with temporal sequences and rhythms. It allows you to analyze and visualize rhythms and other temporal data.
* [matplotlib](https://matplotlib.org/): A package for plotting and visualizing data, based on Matlab's plotting functions.
* [pandas](https://pandas.pydata.org/): A package for working with dataframes, which are like tables in R or Matlab.

Optionally, you can use [seaborn](https://seaborn.pydata.org/) for plotting and [numpy](https://numpy.org/) for calculations; both are installed on this server.

***Take a quick look at the documentation of the packages that you're not yet familiar with.***

---

In this notebook for more experienced programmers, we'll try to give you a bit more freedom to explore and play around with the data. We'll give you some hints and suggestions, but you can also try to come up with your own ideas. If you get stuck, you can always ask for help!

## Exercise 1: Characterizing human speech rhythm <a name="Exercise1"></a>

### Background <a name="Exercise1-background"></a>

Let us start with the craziest animal of all: the human. We will use a dataset of human speech, and extract some rhythmic measures from it. The dataset is an abridged version of [this speech corpus](https://www.ortolang.fr/market/corpora/sldr000033?lang=en). It contains a number of phrases, their syllables, and the durations of the syllables.

![human speech rhythm](https://www.cell.com/cms/attachment/2c7a6e28-551a-40dd-a229-5ab061539285/gr1b2_lrg.jpg)
*From [Kotz, Ravignani, & Fitch, 2018](https://doi.org/10.1016/j.tics.2018.08.002)*

Speech consists of rhythms at different timescales. In English, for instance, there is an underlying rhythm indicating stressed words, but also one that indicates stressed syllables.

### Importing packages and loading the data <a name="Exercise1-dataloading"></a>

***Use the code block below to import `thebeat` and `pandas`, and load the speech data, which is stored in `data/syllables.csv`:***

In [None]:
# import thebeat and pandas

# read in the 'syllables.csv' dataframe


***Display the data quickly (e.g. using [this](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html) function), check out the column names, and calculate some general descriptive statistics for the duration of the syllables (e.g. using [this](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html) function). In addition, extract the median syllable duration.***

In [None]:
# Display data quickly

# Calculate some general descriptive statistics

# Calculate and print the median syllable duration


**Question:** More often than not, we use the median duration rather than the mean duration. Why would that be?

### First look at one phrase of English

We will use the syllables' durations as so-called 'inter-onset intervals' (IOIs). IOIs are widely used in timing research; here, each one represents the time between the onset of one syllable and the onset of the next. Similar terms are 'inter-stimulus interval' (ISI) and 'stimulus-onset asynchrony' (SOA), though they are used in slightly different contexts.
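
To make the relationship between IOIs and onsets concrete, here is a minimal sketch in plain Python. The durations are made-up values in milliseconds (not taken from the dataset), and the first onset is assumed to be at t = 0.

```python
from itertools import accumulate

# Hypothetical syllable durations in ms (illustrative values only)
iois = [180, 220, 150, 300]

# Each time point is the running sum of the preceding IOIs, starting at t = 0
onsets = [0] + list(accumulate(iois))

# Differencing consecutive time points recovers the IOIs
recovered = [later - earlier for earlier, later in zip(onsets, onsets[1:])]

print(onsets)
print(recovered)
```

Note that n IOIs imply n + 1 time points; when syllable durations are used as IOIs, the last time point marks the end of the final syllable rather than a new onset.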

***Select one of the phrases using Pandas, and plot the phrase using [thebeat](https://thebeat.readthedocs.io). To do so, first make the phrase into a ``thebeat.Sequence`` object using the syllable durations as the IOIs (instructions are [here](https://thebeat.readthedocs.io/en/stable/api_reference/core/Sequence.html#thebeat.core.Sequence.__init__)). Then, use one of the ``Sequence``'s [plotting methods](https://thebeat.readthedocs.io/en/stable/api_reference/core/Sequence.html#thebeat.core.Sequence.plot_sequence) to plot an 'event plot'.***


In [None]:
# Select the fourth phrase using Pandas slicing (use df[df.phrase == 4] syntax)
phrase =

# Make a 'Sequence' object, using the syllable durations as inter-onset intervals
sequence =

# Plot an event plot


The plotted lines here indicate the syllable onsets, and the distance between the lines indicates the IOIs.

### Plotting distribution of IOIs (syllable durations)

***Plot a histogram of all the syllable durations in the dataset, e.g. using the Pandas `.plot()` method on the duration column:***

In [None]:
# Plot a histogram of all the syllable durations in the dataframe


**Question:** Are there any conclusions we can draw based on the histogram? How are the syllable durations distributed?

---


*thebeat* contains a function in the stats module ([thebeat.stats](https://thebeat.readthedocs.io/en/stable/api_reference/stats.html)) to quickly check whether the IOIs in a ``Sequence`` follow a certain distribution. 

***Check whether the IOIs in the ``Sequence`` object created above follow a normal distribution.***

In [None]:
# Check whether the syllable durations in the fourth phrase are normally distributed



**Question:** What does the *p*-value tell us?

### Calculating the nPVI

In the code block below, calculate the normalized Pairwise Variability Index (nPVI) for the phrase selected above. The nPVI was mentioned in the presentation. The function to calculate it is also included in the *thebeat.stats* module ([thebeat.stats](https://thebeat.readthedocs.io/en/stable/api_reference/stats.html)).

In [None]:
# Finish this code:
npvi =

# Print the result
print(npvi)

**Question:** What does this value say about the rhythm class of English?
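
For intuition about what the number means: the nPVI is commonly defined as 100 times the mean, over all pairs of adjacent intervals, of the absolute difference between the two intervals divided by their mean. The sketch below writes that definition out by hand in plain Python. `npvi_manual` is a hypothetical helper name (not part of *thebeat*), and the input values are made up.

```python
def npvi_manual(intervals):
    """Normalized Pairwise Variability Index of a list of durations."""
    if len(intervals) < 2:
        raise ValueError("nPVI needs at least two intervals")
    # One term per pair of adjacent intervals: |a - b| relative to the pair's mean
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(intervals, intervals[1:])
    ]
    return 100 * sum(terms) / len(terms)


print(npvi_manual([200, 200, 200, 200]))  # isochronous: no variability
print(npvi_manual([100, 300, 100, 300]))  # strictly alternating short-long
```

Because each difference is normalized by the local mean, the measure is insensitive to overall speech rate and responds only to contrast between neighbouring intervals.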

## Exercise 2: Creating and characterizing a musical rhythm <a name="Exercise2"></a>

Moving away from speech, let us now see how we can create a rhythm using *thebeat*, plot it, and visualize some of its properties.

***Start by importing *thebeat*:***

In [None]:
# Import thebeat



We create a new ``Rhythm`` object and plot it in musical notation. We will enter the notes as fractions. The value we supply to ``beat_ms`` indicates the duration of a quarter note in milliseconds (if the time signature is e.g. 4/4).

In [None]:
# Create a rhythm object from a list of fractions
rhythm = thebeat.music.Rhythm.from_fractions([1/8, 1/4, 3/8, 1/4, 1/4, 3/8, 1/4, 1/8], time_signature=(4, 4), beat_ms=250)
# Plot it in musical notation
rhythm.plot_rhythm()
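
The arithmetic behind ``beat_ms`` can be sketched in plain Python. This assumes that each fraction is expressed relative to a whole note and that the beat is the note value given by the time signature's denominator; it is an illustration of the conversion, not *thebeat*'s internal code.

```python
from fractions import Fraction

beat_ms = 250            # duration of one beat (a quarter note in 4/4)
time_signature = (4, 4)  # denominator 4: the beat is a quarter note

note_fractions = [
    Fraction(1, 8), Fraction(1, 4), Fraction(3, 8), Fraction(1, 4),
    Fraction(1, 4), Fraction(3, 8), Fraction(1, 4), Fraction(1, 8),
]

# A whole note spans `denominator` beats, so a note that is a
# fraction f of a whole note lasts f * denominator * beat_ms
whole_note_ms = time_signature[1] * beat_ms
durations_ms = [float(f * whole_note_ms) for f in note_fractions]

print(durations_ms)
print(sum(note_fractions))  # total length in whole notes
```

The fractions add up to two whole notes, so this rhythm fills exactly two bars of 4/4.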

### Make a phase space plot

Phase space plots give an intuitive understanding of the rhythmic structure of a sequence. Check out the documentation to *thebeat*'s visualization module ([thebeat.visualization](https://thebeat.readthedocs.io/en/stable/api_reference/visualization.html)) to find the function for making one. Note that before we can make the phase space plot we have to convert the ``Rhythm`` object to a ``Sequence`` object.
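
As a rough idea of what goes into such a plot: a phase space plot of a rhythm typically shows each IOI against the IOI that follows it, so every point is a pair of adjacent intervals. The sketch below builds those pairs by hand for made-up IOIs; it illustrates the concept only and is not the *thebeat* function you are asked to find.

```python
# Hypothetical IOIs in ms (illustrative values only)
iois = [125, 250, 375, 250]

# Pair each IOI (x coordinate) with the next IOI (y coordinate)
pairs = list(zip(iois, iois[1:]))

print(pairs)
```

For a perfectly isochronous sequence all pairs would be identical, so every point would fall on the same spot on the diagonal.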

***Finish the code below to make a phase space plot:***

In [None]:
# Convert Rhythm to Sequence
rhythm_sequence = rhythm.to_sequence()

# Make phase space plot, enter your code below:


**Question:** What does the phase space plot reveal about the rhythm?

---


Another visualization technique is a so-called recurrence plot. What do you think the following recurrence plot reveals? The *x* and *y* axes denote the indices of the sound onsets. The colors indicate the distance between two IOIs.

In [None]:
seq = thebeat.Sequence.generate_isochronous(10, 500)
seq.change_tempo_linearly(0.5)
thebeat.visualization.recurrence_plot(seq, colorbar=True)
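
The matrix underlying such a plot can be sketched by hand. The example below assumes that the 'distance' between two IOIs is their absolute difference, and it uses a made-up alternating pattern (not the sequence plotted above) so that the structure is easy to read.

```python
# Hypothetical alternating short-long IOIs in ms
iois = [200, 400, 200, 400]

# Entry [i][j] is the distance between the i-th and the j-th IOI
distance_matrix = [[abs(a - b) for b in iois] for a in iois]

for row in distance_matrix:
    print(row)
```

The matrix is symmetric with zeros on the diagonal, and the alternating rhythm shows up as a checkerboard: identical intervals recur at every second position.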

## Exercise 3: Sperm whale clicks <a name="Exercise3"></a>

Now, let's look at some bioacoustics. We will use an abridged version of the dataset from [Hersh, Gero, Rendell, & Whitehead (2021)](https://doi.org/10.1111/2041-210X.13644).

As a bit of background, sperm whales produce so-called 'clicks' for echolocation and communication. These clicks are very loud, and can be heard over large distances (hundreds/thousands of miles). They are also very short, with durations of only a few milliseconds. Sperm whales live in 'clans' with specific dialects that are characterized by differences in click patterns.

A single string of clicks is called a 'coda'.

For this exercise, we will use *thebeat*, *pandas*, and *matplotlib*.

In [None]:
import thebeat
import pandas as pd
import matplotlib.pyplot as plt

***Load the data using Pandas and save it in a variable called `clicks`. The dataset's file location is `data/whales.csv`.***

In [None]:
# Load the data
clicks =

# Show first 5 rows


***Now, make a list containing one *thebeat* ``Sequence`` object per coda (i.e. per sperm whale click sequence).***

***Hint: One way to do this is to loop over a grouped version of the DataFrame (grouped by column 'codanum')***
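
If looping over a grouped DataFrame is new to you, the pattern looks roughly like this on a toy table. Only the column name `codanum` comes from the exercise; the column `ici` and all values are hypothetical stand-ins, so check the real column names in `clicks` first.

```python
import pandas as pd

# Toy stand-in for the real data; 'ici' is a hypothetical column name
toy = pd.DataFrame({
    "codanum": [1, 1, 1, 2, 2],
    "ici": [0.20, 0.21, 0.35, 0.15, 0.16],
})

intervals_per_coda = []

# groupby yields one (key, sub-DataFrame) pair per distinct coda number
for codanum, group in toy.groupby("codanum"):
    intervals_per_coda.append(group["ici"].tolist())

print(intervals_per_coda)
```

In the exercise itself, each group's intervals would be turned into a ``Sequence`` object before being appended, rather than being stored as a plain list.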

In [None]:
# Make empty list
sequences = []

# Loop over each coda, make a thebeat.Sequence, and add it to the list


Once we have that, we can quickly visualize these codas using *thebeat*'s ``plot_multiple_sequences`` function from the visualization module:

In [None]:
# Plot multiple sequences. It might be smart to adjust the dpi parameter in the plotting function:


*Note that the x axis is now in seconds, unlike in the previous exercises.*

**Question:** Can you see any differences between the codas?

---


Again, plot the distribution of the inter-click intervals (i.e. the IOIs):

In [None]:
# Plot the distribution of the inter-click intervals:


***To discover patterns, let's plot recurrence plots for the first twelve codas:***

In [None]:
# Select first twelve codas
sample = sequences[:12]

# Make a grid
fig, axs = plt.subplots(3, 4, tight_layout=True, figsize=(8, 6))

# Loop over each sequence and plot it
for sequence, ax in zip(sample, axs.flatten()):
    thebeat.visualization.recurrence_plot(sequence, ax=ax)

**Question:** Which codas are similar and may have been produced by members of the same clan?

## Bonus: Play around with bat data <a name="PlayAround"></a>

There is one additional dataset in the `data` folder, called `bats.csv`. This is a dataset from [Burchardt & Knörnschild (2020)](https://doi.org/10.1371/journal.pcbi.1007755) of bat vocalizations (species *Carollia perspicillata*). You can load it in the same way as the other datasets, and play around with it. You can for instance try to plot the distribution of the IOIs, make recurrence plots, etc.

If you want to learn about other functions that are included in *thebeat*, take a look at the [package documentation](https://thebeat.readthedocs.io/en/latest/), for instance under `Examples`. Another thing that might be cool is to plot the distributions of so-called 'interval ratios'. You can find the relevant functions (and explanations of what interval ratios are) in the documentation of the ``thebeat.stats`` module.
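
As a starting point, one common definition of an interval ratio divides each IOI by the sum of itself and the following IOI. The sketch below computes that by hand for made-up values; `interval_ratios` is a hypothetical helper, and the exact definition used by *thebeat* should be checked in its documentation.

```python
def interval_ratios(iois):
    """Ratio of each IOI to the sum of itself and the next IOI."""
    return [a / (a + b) for a, b in zip(iois, iois[1:])]


# Hypothetical IOIs in ms (illustrative values only)
ratios = interval_ratios([250, 250, 500, 250])
print(ratios)
```

Under this definition, two equal adjacent intervals give a ratio of 0.5, and every ratio lies strictly between 0 and 1.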

In [None]:
# Play around with data and code!