# Empirical Project 7

---
**Download the code**

To download the code used in this project as a notebook that can be run in Visual Studio Code, Google Colab, or Jupyter Notebook, right-click [here]() and select 'Save Link As', then save it as a `.ipynb` file.

Don’t forget to also download the data into your working directory by following the steps in this project.

---

## Getting started in Python

For this project, you will need the following packages:

- **pandas** for data analysis
- **matplotlib** for data visualisation
- **numpy** for numerical methods

You'll also be using the **warnings** and **pathlib** packages, but these come built-in with Python.

Remember, you can install packages in Visual Studio Code's integrated terminal (click "View > Terminal") by running `conda install packagename` (if using the Anaconda distribution of Python) or `pip install packagename` if not.
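
If you are unsure whether everything is installed, a quick check along the following lines may help. This is only a sketch: the list of names is our own, and **openpyxl** is included as an assumption because **pandas** normally relies on it to read `.xlsx` files, even though it is not imported directly in this project.

```python
from importlib.util import find_spec

# Packages used in this project. openpyxl is an assumption on our part:
# pandas typically needs it behind the scenes to read .xlsx files.
required = ["pandas", "matplotlib", "numpy", "openpyxl"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if find_spec(name) is None]
print("Missing packages:", missing if missing else "none")
```

Any names printed as missing can then be installed with `conda install` or `pip install` as described above.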

Once you have the Python packages installed, you will need to import them into your Python session and configure any other initial settings.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import warnings
import matplotlib_inline.backend_inline

# Set the plot style for prettier charts:
plt.style.use("plot_style.txt")
# Make output charts in 'svg' format
matplotlib_inline.backend_inline.set_matplotlib_formats("svg")

# Ignore warnings to make nice output
warnings.simplefilter("ignore")

## Python Walkthrough 7.1

**Importing data into Python and creating tables and charts**

First, ensure the data are stored within a subfolder of your working directory called `data`. Now let's read in the data using the **pandas** `pd.read_excel` function.

In [None]:
df = pd.read_excel(Path("data/Project-7-datafile.xlsx"), sheet_name="Sheet1")
df.head()
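
If `pd.read_excel` raises a `FileNotFoundError`, the file is probably not where Python expects it to be. The short check below is a sketch (the variable name `data_path` is ours) that reports whether the file can be seen from the current working directory.

```python
from pathlib import Path

# Same relative path as used in the pd.read_excel call
data_path = Path("data/Project-7-datafile.xlsx")

if data_path.exists():
    print(f"Found {data_path}")
else:
    # Show where Python is looking, to help track down the problem
    print(f"Could not find {data_path.resolve()}")
    print(f"Current working directory: {Path.cwd()}")
```

If the file is not found, compare the working directory that is printed with the folder where you saved the data.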

We're going to use the `np.exp` function to create the variables `"p"`, `"q"`, and `"h"` (harvest) from their log counterparts. 

The column names are a bit awkward too: they contain whitespace, including the new line character (`"\n"`), so we'll clean them up first. We'll use a *regular expression* to replace each run of whitespace (including the new line character) with an underscore.

In [None]:
# Use a raw string (r"...") so that Python passes the backslash in \s through to the regex
df.columns = df.columns.str.replace(r"\s+", "_", regex=True)
df.head()
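
To see what the regular expression does, here is a small self-contained sketch. The column names below are made up to mimic headers that contain spaces and new lines; they are not read from the project's data file.

```python
import pandas as pd

# Hypothetical column names containing spaces and new line characters
cols = pd.Index(["Year", "log q\n(Q)", "log h \n(X)"])

# \s+ matches one or more whitespace characters in a row, so a space
# followed by a new line is replaced by a single underscore
cleaned = cols.str.replace(r"\s+", "_", regex=True)
print(list(cleaned))  # ['Year', 'log_q_(Q)', 'log_h_(X)']
```

Because of the `+`, a space followed directly by a new line becomes one underscore rather than two.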

Now we can transform some of the columns with `np.exp`. As we're applying the same function multiple times, we can use a loop.

In [None]:
cols_to_convert = {"log_q_(Q)": "q", "log_p_(P)": "p", "log_h_(X)": "h"}
for key, value in cols_to_convert.items():
    df[value] = np.exp(df[key])

df.head()
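
As a sanity check on this step, the sketch below uses made-up numbers (not the project data) and assumes the logged variables are natural logs, which is what `np.exp` undoes. Applying `np.exp` to a logged column should recover the original levels.

```python
import numpy as np
import pandas as pd

# Made-up quantities for illustration only
levels = [10.0, 20.0, 40.0]
toy = pd.DataFrame({"log_q": np.log(levels)})

# np.exp works element by element on the whole column
toy["q"] = np.exp(toy["log_q"])
print(toy)

# The recovered values match the levels we started from
print(np.allclose(toy["q"], levels))
```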

Let’s plot prices over time, with year as the horizontal axis variable and price (`"p"`) as the vertical axis variable.

In [None]:
fig, ax = plt.subplots()
ax.plot(df["Year"], df["p"])
ax.set_xlabel("Year")
ax.set_ylabel("Price")
plt.show()

**Figure 7.2** *Line chart for prices of watermelons.*

Now we create the line chart for harvest and crop quantities (the variables `"h"` and `"q"`, respectively). First, we plot the harvest data as a solid line, then add the crop quantities as a dashed line. Calling the `legend` method on our axes object, `ax`, adds a chart legend; it knows what to call each line because we named the lines with the `label=` keyword argument when calling `ax.plot`.

In [None]:
fig, ax = plt.subplots()
ax.plot(df["Year"], df["h"], label="Harvest")
ax.plot(df["Year"], df["q"], label="Crop", linestyle="dashed")
ax.set_xlabel("Year")
ax.set_ylabel("Quantity")
ax.legend()
plt.show()

**Figure 7.3** *Line chart for harvest and crop for watermelons.*
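
To illustrate how the legend picks up its entries, here is a minimal sketch with made-up numbers and placeholder series names. Only lines that were given a `label=` appear in the legend.

```python
import matplotlib.pyplot as plt

# Made-up data purely for illustration
x = [1, 2, 3]

fig, ax = plt.subplots()
ax.plot(x, [1.0, 1.2, 1.1], label="Series A")
ax.plot(x, [0.9, 1.0, 1.3], label="Series B", linestyle="dashed")
ax.plot(x, [0.5, 0.6, 0.7])  # no label, so this line is left out of the legend
legend = ax.legend()

# List the entries that ended up in the legend
print([text.get_text() for text in legend.get_texts()])  # ['Series A', 'Series B']
```

If you run this outside a notebook, add `plt.show()` at the end to display the chart.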