Original data set in CSV.

- Check data types and convert where needed
- Other cleaning: rename the price column, fill the missing October 2012 value, and drop rows before 1990

### Note: All prices are represented in dollars per pound

The following code cleans the data and exports it for use.

## Important:
Price data is missing for October 2012. Because this was the only missing price point, I filled it with the average of the preceding and following months rather than deleting the row. We may wish to remove this value or replace it with a null later.

In [1]:
import pandas as pd

In [None]:
file_path = "fred_beef.csv"
beef_df = pd.read_csv(file_path)
beef_df

In [None]:
beef_df.dtypes

In [None]:
# Rename price column
beef_df.rename(mapper={"APU0000703112" : "Price"}, axis=1, inplace=True)
beef_df.head()

In [None]:
# Convert date to datetime 
beef_df["DATE"] = pd.to_datetime(beef_df["DATE"], format="%Y-%m-%d")

beef_df.head()

In [None]:
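# Converting Price to float raised a ValueError: "." could not be converted to float
# Workaround: remove all periods, convert to float, and then divide by 1000
# After the periods are removed, prices are expressed in tenths of a cent

# regex=False treats "." as a literal period rather than a regex wildcard
beef_df["Price"] = beef_df["Price"].str.replace(".", "", regex=False)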
beef_df.head()

In [None]:
beef_df.Price.astype(float)

In [None]:
# Not sure why above error occurs. Why would an empty string cause a problem?
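
A possible answer to the question above: Python cannot parse an empty string as a number, so a single `''` entry is enough to make `astype(float)` raise. The sketch below reproduces this on a small made-up Series (the values are illustrative, not taken from the dataset) and shows `pd.to_numeric` with `errors="coerce"` as one way to turn such entries into NaN instead of failing.

```python
import pandas as pd

# Made-up prices in the same "periods removed" form used above;
# the empty string stands in for the missing observation.
prices = pd.Series(["3400", "", "3500"], dtype=object)

try:
    prices.astype(float)
except ValueError as err:
    print(f"astype(float) failed: {err}")

# errors="coerce" converts unparseable entries to NaN instead of raising.
numeric_prices = pd.to_numeric(prices, errors="coerce")
print(numeric_prices)
```

With the bad entry coerced to NaN, `isna()` can locate it directly, which the null check in the next cell cannot do while the value is still an empty string.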

In [None]:
# Check for null rows
beef_df[beef_df["Price"].isna()]

In [None]:
# Try strip method 
beef_df["Price"] = beef_df["Price"].str.strip()
beef_df.head()

In [None]:
# Strip method did not help

In [None]:
# Print the raw values in the Price column to look for anything that cannot be converted
# In this process, found the empty string value
print(beef_df["Price"].values) # Viewing this is how I found it
print("-----")
print(beef_df[beef_df["Price"] == ''])
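
The empty string found here appears to be what is left of a lone `"."` in the raw file once the periods were removed. As a sketch of an alternative, assuming the raw CSV marks the missing month with `"."`, the placeholder can be declared as a missing value when the file is read. The three-row sample below is hypothetical and only mimics the layout of the FRED export.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the FRED export;
# the prices are made up for illustration.
sample_csv = io.StringIO(
    "DATE,APU0000703112\n"
    "2012-09-01,3.400\n"
    "2012-10-01,.\n"
    "2012-11-01,3.500\n"
)

# na_values tells read_csv to treat "." as missing, so the price column
# is parsed as float; parse_dates converts DATE to datetime on load.
sample_df = pd.read_csv(sample_csv, na_values=["."], parse_dates=["DATE"])
print(sample_df.dtypes)
print(sample_df)
```

Loaded this way, the prices stay in dollars per pound, so the period removal and the later division by 1000 would not be needed.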

In [None]:
# Checking original dataset on FRED website
# Confirmed no data on FRED website for October 2012
# Filling with average of preceding and following months
mid_point = (int(beef_df.Price.values[344]) + int(beef_df.Price.values[346])) / 2
beef_df.at[345, "Price"] = mid_point

In [None]:
beef_df.iloc[345]
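
If the missing month is stored as NaN instead, linear interpolation gives the same midpoint for a single gap without hard-coding row positions. This is a minimal sketch on made-up values, not the method used in this notebook. The `was_imputed` flag is a hypothetical helper that records which rows were filled, so they can be removed or reset to null later.

```python
import numpy as np
import pandas as pd

# Made-up prices in tenths of a cent; NaN marks the missing month.
prices = pd.Series([3400.0, np.nan, 3500.0])

# Record which rows are missing before filling them.
was_imputed = prices.isna()

# For one missing value between two known neighbours, linear
# interpolation equals the average of those neighbours.
filled = prices.interpolate(method="linear")
print(filled)
print(was_imputed)
```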

In [None]:
# Retry conversion of all column data to float
beef_df.Price = beef_df.Price.astype(float) / 1000
beef_df.head()

In [None]:
# Remove all data before 1990-01-01 and store to new df
cleaned_beef_df = beef_df[beef_df["DATE"] >= "1990-01-01"]
cleaned_beef_df

In [None]:
# Store new DF as new csv file
output_path = "FRED_beef_cleaned.csv"
cleaned_beef_df.to_csv(output_path)
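
Note that `to_csv` writes the DataFrame index by default, and that index comes back as an `Unnamed: 0` column when the file is read again. The sketch below shows the difference on a small illustrative frame, using in-memory buffers in place of real files. The name `demo_df` is hypothetical.

```python
import io
import pandas as pd

# Small illustrative frame with the same column names as beef_df.
demo_df = pd.DataFrame(
    {"DATE": ["1990-01-01", "1990-02-01"], "Price": [1.557, 1.572]}
)

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
demo_df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)

# index=False leaves the index out of the file.
without_index = io.StringIO()
demo_df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)
```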

## Add Percent Change column to data
Import the most recently cleaned dataset.

In [2]:
# import file
beef_df = pd.read_csv("../Edited Data/Output/FRED_beef_cleaned.csv")
beef_df.head()

Unnamed: 0,date_time,Beef $/LB
0,1990-01-01,1.557
1,1990-02-01,1.572
2,1990-03-01,1.571
3,1990-04-01,1.593
4,1990-05-01,1.577


In [3]:
# Calculate and add pct_change column
beef_df["Beef_Pct_Change"] = beef_df["Beef $/LB"].pct_change()
beef_df.head(20)

Unnamed: 0,date_time,Beef $/LB,Beef_Pct_Change
0,1990-01-01,1.557,
1,1990-02-01,1.572,0.009634
2,1990-03-01,1.571,-0.000636
3,1990-04-01,1.593,0.014004
4,1990-05-01,1.577,-0.010044
5,1990-06-01,1.593,0.010146
6,1990-07-01,1.581,-0.007533
7,1990-08-01,1.576,-0.003163
8,1990-09-01,1.594,0.011421
9,1990-10-01,1.582,-0.007528
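
As a check on the column above, `pct_change` with default arguments computes each value's fractional change from the previous row, (current - previous) / previous. The sketch below recomputes the first rows by hand from the prices shown in the output. The first row is NaN because it has no previous value.

```python
import pandas as pd

# First three prices from the output above.
prices = pd.Series([1.557, 1.572, 1.571])

# Manual calculation: change from the previous row divided by the previous row.
previous = prices.shift(1)
manual = (prices - previous) / previous

# Built-in equivalent used in the cell above.
builtin = prices.pct_change()

print(manual.round(6).tolist())
print(builtin.round(6).tolist())
```

The values are fractions, not percentages: 0.009634 corresponds to an increase of roughly 0.96%.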


In [4]:
# save csv file
output_path = "../Edited Data/Output/FRED_beef_cleaned.csv"
beef_df.to_csv(output_path, index=False)