# Empirical Project 7

## Getting Started in Python

## Preliminary Settings

Let's import the packages we'll need and also configure the settings we want:

In [None]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import seaborn as sns
import seaborn.objects as so
import pingouin as pg
import warnings


### You don't need to use these settings yourself
### — they are just here to make the book look nicer!
# Set the plot style for prettier charts:
plt.style.use("plot_style.txt")
# Make seaborn work consistently with this
so.Plot.config.theme.update(mpl.rcParams)
# Ignore warnings to make nice output
warnings.simplefilter("ignore")

## Python Walkthrough 7.1

**Importing data into Python and creating tables and charts**

First, ensure that the data file, `Project-7-datafile.xlsx`, is stored in a subfolder of your working directory called `data`. The following code, which uses Python's built-in `glob` module, will list the file if you're in the right place:

In [None]:
import glob

glob.glob("data/*.xlsx")

If you're in the wrong place, you can change the working directory with `import os` followed by `os.chdir("path/to/your/working/directory")`. However, it's better practice to open the project folder directly in an editor such as Visual Studio Code, so that the working directory is set for you.
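
As a quick check on the folder layout described above, here is a minimal sketch using `pathlib`. The helper name `find_excel_files` is hypothetical (it is not part of the walkthrough), and the demonstration runs on a temporary directory with a placeholder file so that it works anywhere:

```python
from pathlib import Path
import tempfile


def find_excel_files(base_dir):
    """Return the sorted names of .xlsx files in the `data` subfolder of base_dir."""
    data_dir = Path(base_dir) / "data"
    if not data_dir.is_dir():
        # No `data` subfolder here, so this is probably the wrong directory
        return []
    return sorted(p.name for p in data_dir.glob("*.xlsx"))


# Demonstrate on a throwaway directory (the file name is a placeholder)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "example.xlsx").touch()
    found = find_excel_files(tmp)
    missing = find_excel_files(Path(tmp) / "no_such_folder")

print(found)
print(missing)
```

In your own project you could call `find_excel_files(Path.cwd())`; an empty list suggests the working directory is not the one you intended.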

Now let's read in the data using the **pandas** `pd.read_excel` function.

In [None]:
df = pd.read_excel(Path("data/Project-7-datafile.xlsx"), sheet_name="Sheet1")
df.head()

We're going to use the `np.exp` function to create the variables `"p"` (price), `"q"` (quantity), and `"h"` (harvest) from their log counterparts.

The column names are awkward to work with, though (`"\n"` is the newline character), so we'll clean them up first. We'll use a *regular expression* to replace each run of whitespace (including the newline character) with a single underscore.

In [None]:
# Use a raw string so that "\s" is passed to the regex engine unchanged
df.columns = df.columns.str.replace(r"\s+", "_", regex=True)
df.head()
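
To see what the whitespace substitution does in isolation, here is a small sketch on made-up column names. The headers below are illustrative assumptions, not the actual headers in the data file:

```python
import pandas as pd

# Hypothetical headers containing spaces, a newline, and a double space
raw_names = pd.Index(["Year", "log q\n(Q)", "log  p (P)"])

# Each run of whitespace (spaces, tabs, newlines) becomes one underscore
cleaned = list(raw_names.str.replace(r"\s+", "_", regex=True))
print(cleaned)
```

Because the pattern is `\s+` rather than `\s`, consecutive whitespace characters collapse into a single underscore instead of producing several.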

Now we can transform some of the columns with `np.exp`. As we're applying the same function multiple times, we can use a loop.

In [None]:
cols_to_convert = {"log_q_(Q)": "q", "log_p_(P)": "p", "log_h_(X)": "h"}
for key, value in cols_to_convert.items():
    df[value] = np.exp(df[key])

df.head()
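
As a sanity check on this kind of transformation, the sketch below confirms that `np.exp` undoes a natural logarithm. It uses toy values and hypothetical column names (`log_x` and `x`), and it assumes the logged columns are natural logs, as the use of `np.exp` implies:

```python
import numpy as np
import pandas as pd

# Toy data: natural logs of some known values
original = [1.0, 10.0, 250.0]
toy = pd.DataFrame({"log_x": np.log(original)})

# Recover the levels from the logs
toy["x"] = np.exp(toy["log_x"])

# Taking logs again should return the logged column (up to rounding error)
round_trip_ok = bool(np.allclose(np.log(toy["x"]), toy["log_x"]))
print(toy)
print(round_trip_ok)
```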

Let's plot prices over time, with year on the horizontal axis and price (`"p"`) on the vertical axis.

In [None]:
fig, ax = plt.subplots()
ax.plot(df["Year"], df["p"])
ax.set_xlabel("Year")
ax.set_ylabel("Price")
plt.show()

**Figure 7.2** *Line chart for prices of watermelons.*

Now we create the line chart for harvest and crop quantities (the variables `"h"` and `"q"`, respectively). First, we plot the harvest data as a solid line, then add the crop quantities as a dashed line. Calling the `legend` method on our axes object, `ax`, adds a chart legend. The legend takes its entries from the names passed to each `ax.plot` call via the `label=` keyword argument.

In [None]:
fig, ax = plt.subplots()
ax.plot(df["Year"], df["h"], label="Harvest")
ax.plot(df["Year"], df["q"], label="Crop", linestyle="dashed")
ax.set_xlabel("Year")
ax.set_ylabel("Quantity")
ax.legend()
plt.show()
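
The link between `label=` and the legend can be seen in a minimal sketch with made-up numbers. The series names and values here are purely illustrative, and the `Agg` backend is selected only so the example runs without a display (you would not need that line in a notebook):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch
import matplotlib.pyplot as plt

years = [2000, 2001, 2002, 2003]
# Hypothetical series: name -> (values, line style)
series = {
    "Series A": ([1.0, 1.5, 1.2, 1.8], "solid"),
    "Series B": ([0.8, 1.1, 1.0, 1.4], "dashed"),
}

fig, ax = plt.subplots()
for name, (values, style) in series.items():
    # The label given here is what the legend will display
    ax.plot(years, values, label=name, linestyle=style)

legend = ax.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
plt.close(fig)
```

Looping over a dictionary like this keeps the label and style for each line in one place, which can be convenient when a chart has more than two series.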

**Figure 7.3** *Line chart for harvest and crop for watermelons.*

This chapter used the following packages, where *sys* is the Python version:

In [None]:
%load_ext watermark
%watermark --iversions