# Main Jupyter Notebook

### Problem Statement
How did World War 1 and the Civil War affect commodity prices in America?

I will be looking at data in the Allen-Unger dataset to see how prices rose and fell during World War 1 and the Civil War. The analysis will focus on the state of Vermont, as its data is robust and covers a diverse spread of commodities.

* By what percentage did each commodity's price rise during each war compared to its pre-war average, and by how much did it fall afterwards?
    * I will be comparing the wartime peak and the wartime average.
    * If a commodity stayed stable, is there a historical reason?
    * Do the prices recover to normal after the war?
* Which commodities were most affected by each war?
* How did the effects compare across each war? For example, was cloth more affected by the Civil War or World War 1?
* How volatile was the price before/during/after each war?
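
The questions above can be turned into a few numbers per commodity and war. The following is only a minimal sketch of one way to do that, not the method used later in this notebook. The war year ranges, the 10-year comparison `window`, the function name `war_summary`, and the use of the coefficient of variation as the volatility measure are all assumptions made for illustration, and the prices are toy values rather than Allen-Unger data.

```python
import pandas as pd

# Assumed war windows (inclusive years); the notebook does not define these.
WARS = {"Civil War": (1861, 1865), "World War 1": (1914, 1918)}


def war_summary(prices, start, end, window=10):
    """Summarise one commodity around one war.

    prices : pd.Series of prices indexed by year.
    window : hypothetical number of years before and after the war
             used as the comparison periods.
    """
    before = prices.loc[start - window:start - 1]  # label-based, inclusive
    during = prices.loc[start:end]
    after = prices.loc[end + 1:end + window]
    base = before.mean()
    return {
        # Wartime peak and wartime average, both relative to the pre-war average
        "peak_rise_pct": (during.max() - base) / base * 100,
        "avg_rise_pct": (during.mean() - base) / base * 100,
        # Post-war average relative to the wartime average
        "post_war_change_pct": (after.mean() - during.mean()) / during.mean() * 100,
        # Coefficient of variation (std / mean) as a unit-free volatility measure
        "cv_before": before.std() / before.mean(),
        "cv_during": during.std() / during.mean(),
        "cv_after": after.std() / after.mean(),
    }


# Toy series (not real data): flat before, elevated during, partly recovered after
toy = pd.Series(
    [10.0] * 10 + [15.0, 20.0, 25.0, 20.0, 20.0] + [12.0] * 10,
    index=range(1851, 1876),
)
summary = war_summary(toy, *WARS["Civil War"])
print(summary)
```

Dividing the standard deviation by the mean keeps the volatility figures comparable between commodities that are priced at very different levels.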

Also taken into consideration is the period directly after World War 1, which includes the Roaring Twenties, the Great Depression, and the beginning of World War 2.

### Commodities Considered
Cloth, Tea, Axes, Flour, Coffee, Bread, Codfish, Sugar


In [10]:
# Library and custom function imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from helper.helper import miss_years
from helper.helper import df_plot

### Data Exploration

In [11]:
df_raw = pd.read_csv("data_raw/all_commodities.csv")
df_raw.head()

Unnamed: 0,Item Year,Original Value,Standard Value,Original Currency,Standard Currency,Orignal Measure,Standard Measure,Sources,Notes,Location,Commodity,Variety
0,1570,7.65,0.703421,Tours Livre,Silver,Aix Charge,Litre,(Rene Baehrel) (Une croissance: La Basse Prove...,(ECPdb),Aix,Wheat,
1,1571,10.65,0.979272,Tours Livre,Silver,Aix Charge,Litre,(Rene Baehrel) (Une croissance: La Basse Prove...,(ECPdb),Aix,Wheat,
2,1572,8.9,0.818359,Tours Livre,Silver,Aix Charge,Litre,(Rene Baehrel) (Une croissance: La Basse Prove...,(ECPdb),Aix,Wheat,
3,1573,7.75,0.684111,Tours Livre,Silver,Aix Charge,Litre,(Rene Baehrel) (Une croissance: La Basse Prove...,(ECPdb),Aix,Wheat,
4,1574,7.8,0.662043,Tours Livre,Silver,Aix Charge,Litre,(Rene Baehrel) (Une croissance: La Basse Prove...,(ECPdb),Aix,Wheat,


In [59]:
# Let's make some basic lists and dataframes that we will reference

com_ls = ['Axe', 'Bread', 'Cloth', 'Codfish', 'Coffee', 'Flour', 'Sugar', 'Tea']

# This dataframe will filter for Vermont and the commodities we are analyzing
df_v = df_raw[(df_raw["Location"] == "Vermont") & (df_raw["Commodity"].isin(com_ls))].copy()

# Sugar has two varieties, one being a NaN and the other being Maple. Maple data cuts off at 1909, so we'll exclude it.
# Filter with a boolean mask instead of an in-place drop on a slice, which can raise SettingWithCopyWarning.
df_v = df_v[~((df_v['Commodity'] == 'Sugar') & (df_v['Variety'] == 'Maple'))]

In [60]:
# Let's find the min and max years for our commodities, and then see if they have any missing years that will need to be interpolated

print(f"Minimum Year:\n{df_v.groupby(['Commodity'])['Item Year'].min()}\n")
print(f"Maximum Year:\n{df_v.groupby(['Commodity'])['Item Year'].max()}\n")

for x in com_ls:
    yrs = miss_years(df_v[df_v["Commodity"] == x])
    print(f"{x}: {yrs}")

Minimum Year:
Commodity
Axe        1790
Bread      1893
Cloth      1804
Codfish    1790
Coffee     1791
Flour      1828
Sugar      1790
Tea        1790
Name: Item Year, dtype: int64

Maximum Year:
Commodity
Axe        1940
Bread      1940
Cloth      1940
Codfish    1938
Coffee     1940
Flour      1940
Sugar      1940
Tea        1940
Name: Item Year, dtype: int64

Axe: [1791, 1792, 1795, 1797, 1798]
Bread: none
Cloth: none
Codfish: [1797, 1798, 1799, 1800, 1801, 1812]
Coffee: [1793, 1797, 1799, 1812]
Flour: none
Sugar: [1800]
Tea: none


We can see above that the earliest year for every commodity except Bread falls well before the Civil War. Some commodities have missing years, but the latest gap is 1812, so if we restrict our analysis to 1813 and beyond, every series will be complete. If we further restrict our analysis to 1828 and beyond, we will be comparing the same time frame for all commodities except Bread.

All of the commodities stretch to 1940 except for Codfish, which ends at 1938, so we will end our analysis at 1938. We will analyze our data from 1828 to 1938, excluding Bread, which starts in 1893 and therefore only has data for World War 1.
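
The choice of window can be double-checked with a little arithmetic on the printed output. The sketch below copies the first years, last years, and missing years from the output above. The function `find_missing_years` is a hypothetical stand-in written for illustration; it is not the `miss_years` helper imported from `helper.helper`, whose implementation is not shown here.

```python
# Values copied from the printed output above
first_year = {"Axe": 1790, "Bread": 1893, "Cloth": 1804, "Codfish": 1790,
              "Coffee": 1791, "Flour": 1828, "Sugar": 1790, "Tea": 1790}
last_year = {"Axe": 1940, "Bread": 1940, "Cloth": 1940, "Codfish": 1938,
             "Coffee": 1940, "Flour": 1940, "Sugar": 1940, "Tea": 1940}
missing = {"Axe": [1791, 1792, 1795, 1797, 1798],
           "Codfish": [1797, 1798, 1799, 1800, 1801, 1812],
           "Coffee": [1793, 1797, 1799, 1812],
           "Sugar": [1800]}


def find_missing_years(years):
    """Hypothetical helper: years absent between the first and last observed year."""
    observed = {int(y) for y in years}
    return [y for y in range(min(observed), max(observed) + 1) if y not in observed]


# Toy check of the helper on a short list of years
toy_gaps = find_missing_years([1790, 1793, 1794, 1796, 1799])

# Every series is gap-free once we pass the latest missing year
gap_free_from = max(y for gaps in missing.values() for y in gaps) + 1

# Bread is handled separately, so it is left out of the common start year
latest_start = max(y for c, y in first_year.items() if c != "Bread")
start = max(latest_start, gap_free_from)

# The common end year is the earliest last year across all commodities
end = min(last_year.values())

print(toy_gaps, gap_free_from, start, end)
```

With the values above this gives a gap-free start of 1813 and a common window of 1828 to 1938, matching the reasoning in the text.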

In [68]:
# Let's drop our unwanted data and check that our data has been formatted how we intend
df_v = df_v.drop(df_v[(df_v['Item Year'] <= 1827) | (df_v['Item Year'] >= 1939)].index)

print(f"Minimum Year:\n{df_v.groupby(['Commodity'])['Item Year'].min()}\n")
print(f"Maximum Year:\n{df_v.groupby(['Commodity'])['Item Year'].max()}\n")

for x in com_ls:
    yrs = miss_years(df_v[df_v["Commodity"] == x])
    print(f"{x}: {yrs}")

Minimum Year:
Commodity
Axe        1828
Bread      1893
Cloth      1828
Codfish    1828
Coffee     1828
Flour      1828
Sugar      1828
Tea        1828
Name: Item Year, dtype: int64

Maximum Year:
Commodity
Axe        1938
Bread      1938
Cloth      1938
Codfish    1938
Coffee     1938
Flour      1938
Sugar      1938
Tea        1938
Name: Item Year, dtype: int64

Axe: none
Bread: none
Cloth: none
Codfish: none
Coffee: none
Flour: none
Sugar: none
Tea: none
