In [None]:
!pip install "numpy<2,>=1.13" "pandas~=1.1" "matplotlib<4,>=2.1" "scipy<2,>=0.18" "scikit-learn>=0.19" "mpl-axes-aligner<2,>=1.1"

In [1]:
import math
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats
from sklearn.feature_selection import chi2

In [2]:
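def plot_trend(column, df, line_color='grey', xlim=(1810, 1930)):
    """Plot the values of `column` in `df` per year, with a linear trend line."""
    # Fit a least-squares line; missing values are treated as zero.
    slope, intercept, _, _, _ = scipy.stats.linregress(
        df.index, df[column].fillna(0).values)
    ax = df[column].plot(style='o', label=column)
    # Draw the fitted trend as a dashed line and keep it out of the legend.
    ax.plot(df.index, intercept + slope * df.index, '--',
            color=line_color, label='_nolegend_')
    ax.set_ylabel("fraction of recipes")
    ax.set_xlabel("year of publication")
    ax.set_xlim(xlim)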

## Exercises

### Easy
1. Load the cookbook data set, and extract the "region" column. Print the number of unique
   regions in the data set.
2. Using the same "region" column, produce a frequency distribution of the regions in the
   data.
3. Create a bar plot of the different regions annotated in the data set.
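
As a possible starting point for these exercises, the sketch below shows the pandas calls involved. It uses a tiny made-up `region` Series rather than the real cookbook data; the region names and counts are placeholders, not values taken from the data set.

```python
import pandas as pd

# Hypothetical stand-in for the "region" column of the cookbook data.
region = pd.Series(
    ["northeast", "midwest", "northeast", "south", "midwest", "northeast"],
    name="region")

# Number of distinct regions.
n_regions = region.nunique()

# Frequency distribution, sorted from most to least frequent.
region_counts = region.value_counts()

print(n_regions)
print(region_counts)

# A bar plot could then be drawn with: region_counts.plot(kind='bar')
```

With the real data, `region` would instead be the column selected from the loaded data frame.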

### Moderate
1. Use the function `plot_trend()` to create a time series plot for three or more ingredients of your
   own choice. 
2. Go back to section {ref}`sec-cooking-chp-culinary-taste-trends`. Create a bar plot of
   the ten most distinctive ingredients for the pre- and postwar eras, using Pearson's
   $\chi^2$ test statistic as the keyness measure.
3. With the invention of baking powder, cooking efficiency went up. Could there be a
   relationship between the increased use of baking powder and the number of recipes
   describing cakes, sweets, and bread? (The latter recipes have the value `"breadsweets"`
   in the `recipe_class` column in the original data.) Produce two time series plots: one
   for the absolute number of recipes involving baking powder as an ingredient, and a second
   for the absolute number of `"breadsweets"` recipes over time.
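
For the $\chi^2$ exercise, the following is a minimal sketch of how `chi2` from scikit-learn can rank features by keyness. The recipe-by-ingredient matrix, the ingredient names, and the era labels are all invented for illustration; the real exercise would use the ingredient counts and era split from the chapter.

```python
import pandas as pd
from sklearn.feature_selection import chi2

# Hypothetical recipe-by-ingredient indicator matrix (1 = ingredient used).
X = pd.DataFrame({
    "baking powder": [0, 0, 0, 1, 1, 1],
    "pearlash":      [1, 1, 0, 0, 0, 0],
    "butter":        [1, 0, 1, 1, 0, 1],
})

# Hypothetical era label for each recipe (one label per row of X).
y = ["prewar", "prewar", "prewar", "postwar", "postwar", "postwar"]

# chi2 returns one test statistic and one p-value per column of X.
scores, _ = chi2(X, y)

# Rank ingredients from most to least distinctive.
keyness = pd.Series(scores, index=X.columns).sort_values(ascending=False)
print(keyness.head(10))

# A bar plot could then be drawn with: keyness.head(10).plot(kind='bar')
```

Note that the statistic alone does not say which era an ingredient is distinctive for; that has to be read off from the per-era counts.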

### Challenging
1. Reuse the code that produced the scatter plot in section
   {ref}`sec-cooking-chp-culinary-taste-trends`, and experiment with different time frame
   settings to find distinctive words for other time periods. For example, can you compare
   twentieth-century recipes to nineteenth-century recipes?
2. Adapt the scatter plot code from section
   {ref}`sec-cooking-chp-foreign-cooking-influences` to find distinctive ingredients for
   two specific ethnic groups. (You could, for instance, contrast typical ingredients from
   Jewish cuisine with those from the Creole culinary tradition.) How do these results
   differ from the ethnicity plot we created before? (Hint: For this
   exercise, you could simply adapt the very final code block of the chapter and use the
   somewhat simplified keyness measure proposed there.)
3. Use the "region" column to create a scatter plot of distinctive ingredients in the
   northeast of the United States versus the ingredients used in the midwest. To make
   things harder on yourself, you could use Pearson's $\chi^2$ test statistic as a keyness
   measure.