Original data set in CSV.

- Check data types and convert if needed
- Other cleaning

### Note: All prices are represented in dollars per pound

The following code cleans the data and exports it for use.

## Important:
Price data is missing for October 2012. As this was the only missing price point, I filled it with the average of the preceding and following months rather than deleting the row. We may wish to remove this value or replace it with a null later.
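
For a single interior gap, the average of the two neighboring months is the same value that linear interpolation produces. A minimal sketch of that equivalence, using made-up prices rather than the actual series:

```python
import numpy as np
import pandas as pd

# Hypothetical three-month window with the middle observation missing
prices = pd.Series(
    [3.0, np.nan, 3.2],
    index=pd.to_datetime(["2012-09-01", "2012-10-01", "2012-11-01"]),
)

# Linear interpolation fills a single gap with the midpoint of its neighbors
filled = prices.interpolate(method="linear")
neighbor_mean = (prices.iloc[0] + prices.iloc[2]) / 2

print(filled)
print(neighbor_mean)
```

Either way the filled value is an estimate, so flagging the row (or leaving it null) remains an option for later analysis.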

In [1]:
import pandas as pd

In [2]:
file_path = "C:/Users/jhillman/OneDrive/Desktop/Data Analytics Bootcamp/Three_Meals/Raw Data/fred_beef.csv"
beef_df = pd.read_csv(file_path)
beef_df

Unnamed: 0,DATE,APU0000703112
0,1984-01-01,1.290
1,1984-02-01,1.340
2,1984-03-01,1.308
3,1984-04-01,1.331
4,1984-05-01,1.301
...,...,...
465,2022-10-01,4.836
466,2022-11-01,4.853
467,2022-12-01,4.800
468,2023-01-01,4.791


In [3]:
beef_df.dtypes

DATE             object
APU0000703112    object
dtype: object
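
Both columns load as `object`, meaning pandas read them as strings. One possible way to get usable types at load time, sketched below on a small in-memory sample, is to pass `parse_dates` and `na_values` to `pd.read_csv`. The sample rows are made up, and the idea that the missing observation appears as a lone `.` is an assumption based on the error message seen further down.

```python
import io
import pandas as pd

# Illustrative sample; the lone "." stands in for a missing observation (assumption)
sample_csv = io.StringIO(
    "DATE,APU0000703112\n"
    "2012-09-01,3.000\n"
    "2012-10-01,.\n"
    "2012-11-01,3.200\n"
)

sample_df = pd.read_csv(
    sample_csv,
    parse_dates=["DATE"],  # parse the date column while reading
    na_values=["."],       # treat a cell containing only "." as NaN
)

print(sample_df.dtypes)
```

With this approach the price column would arrive as `float64` with a `NaN` in the missing month, so the string manipulation in the later cells would not be needed.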

In [4]:
# Rename price column
beef_df.rename(mapper={"APU0000703112" : "Beef $/LB"}, axis=1, inplace=True)
beef_df.head()

Unnamed: 0,DATE,Beef $/LB
0,1984-01-01,1.29
1,1984-02-01,1.34
2,1984-03-01,1.308
3,1984-04-01,1.331
4,1984-05-01,1.301


In [5]:
# Convert date to datetime 
beef_df["DATE"] = pd.to_datetime(beef_df["DATE"], format="%Y-%m-%d")

beef_df.head()

Unnamed: 0,DATE,Beef $/LB
0,1984-01-01,1.29
1,1984-02-01,1.34
2,1984-03-01,1.308
3,1984-04-01,1.331
4,1984-05-01,1.301


In [6]:
# Received a ValueError saying that "." could not be converted to float
# Workaround: remove all periods, convert to float, and then divide by 1000
# Prices are currently listed to 1/10th of a cent
# regex=False treats "." as a literal period rather than a regex wildcard

beef_df["Beef $/LB"] = beef_df["Beef $/LB"].str.replace(".", "", regex=False)
beef_df.head()



Unnamed: 0,DATE,Beef $/LB
0,1984-01-01,1290
1,1984-02-01,1340
2,1984-03-01,1308
3,1984-04-01,1331
4,1984-05-01,1301


In [7]:
beef_df["Beef $/LB"].astype(float)

ValueError: could not convert string to float: 

In [8]:
# Not sure why above error occurs. Why would an empty string cause a problem?

In [9]:
# Check for null rows
beef_df[beef_df['Beef $/LB'].isna()]

Unnamed: 0,DATE,Beef $/LB
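
The null check comes back empty because an empty string is a valid string value, not a missing one, so `isna()` does not flag it, while `float("")` still raises `ValueError`. A minimal sketch of one way to surface such entries, using `pd.to_numeric` with `errors="coerce"` on a made-up series:

```python
import pandas as pd

# Made-up string column with one empty entry, mimicking the situation above
raw = pd.Series(["1290", "1340", "", "1331"])

# Empty strings are not counted as missing values
print(raw.isna().sum())

# Coerce unparseable entries to NaN, then select the entries that failed to parse
as_number = pd.to_numeric(raw, errors="coerce")
bad_rows = raw[as_number.isna()]
print(bad_rows)
```

The index of `bad_rows` points directly at the offending row, which avoids scanning the printed values by eye.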


In [10]:
# Try strip method 
beef_df['Beef $/LB'] = beef_df['Beef $/LB'].str.strip()
beef_df.head()

Unnamed: 0,DATE,Beef $/LB
0,1984-01-01,1290
1,1984-02-01,1340
2,1984-03-01,1308
3,1984-04-01,1331
4,1984-05-01,1301


In [11]:
# Strip method did not help

In [12]:
# Print the raw values in the price column to look for anything unusual
# Then select the rows where the price is an empty string
print(beef_df['Beef $/LB'].values) # Viewing this is how I found it
print("-----")
print(beef_df[beef_df['Beef $/LB'] == ''])

['1290' '1340' '1308' '1331' '1301' '1269' '1249' '1280' '1268' '1246'
 '1266' '1298' '1284' '1283' '1282' '1267' '1213' '1205' '1200' '1207'
 '1205' '1189' '1238' '1277' '1275' '1263' '1267' '1219' '1187' '1165'
 '1189' '1225' '1234' '1225' '1278' '1257' '1300' '1271' '1278' '1294'
 '1318' '1301' '1313' '1322' '1323' '1326' '1351' '1322' '1314' '1315'
 '1342' '1336' '1365' '1388' '1373' '1370' '1372' '1394' '1409' '1399'
 '1397' '1366' '1434' '1424' '1437' '1436' '1438' '1447' '1458' '1449'
 '1491' '1501' '1557' '1572' '1571' '1593' '1577' '1593' '1581' '1576'
 '1594' '1582' '1622' '1630' '1647' '1632' '1610' '1611' '1624' '1604'
 '1586' '1578' '1553' '1552' '1567' '1577' '1601' '1587' '1545' '1563'
 '1545' '1535' '1487' '1533' '1521' '1546' '1535' '1500' '1565' '1562'
 '1562' '1588' '1560' '1562' '1577' '1573' '1551' '1556' '1600' '1574'
 '1561' '1555' '1576' '1482' '1489' '1511' '1456' '1460' '1463' '1416'
 '1376' '1380' '1381' '1426' '1392' '1365' '1322' '1333' '1365' '1328'
 '1376

In [13]:
# Checking original dataset on FRED website
# Confirmed no data on FRED website for October 2012
# Filling with average of preceding and following months
mid_point = (int(beef_df['Beef $/LB'].values[344]) + int(beef_df['Beef $/LB'].values[346])) / 2
beef_df.at[345, 'Beef $/LB'] = mid_point

In [14]:
beef_df.iloc[345]

DATE         2012-10-01 00:00:00
Beef $/LB                 3099.5
Name: 345, dtype: object
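
The fill above depends on the hard-coded positions 344 to 346. A sketch of the same neighbor-average fill that looks the row up by date instead, on a made-up frame (the prices, `demo_df`, and `target` are illustrative assumptions, not part of the original notebook):

```python
import pandas as pd

# Made-up frame in the same shape as beef_df at this step (prices as digit strings)
demo_df = pd.DataFrame({
    "DATE": pd.to_datetime(["2012-09-01", "2012-10-01", "2012-11-01"]),
    "Beef $/LB": ["3000", "", "3200"],
})

# Find the row label for the month with the missing price
target = demo_df.index[demo_df["DATE"] == "2012-10-01"][0]

# Average the rows immediately before and after it (assumes a default RangeIndex)
mid_point = (int(demo_df.at[target - 1, "Beef $/LB"])
             + int(demo_df.at[target + 1, "Beef $/LB"])) / 2
demo_df.at[target, "Beef $/LB"] = mid_point

print(demo_df)
```

Looking the row up by date keeps the fill correct if rows are ever added or removed ahead of October 2012.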

In [15]:
# Retry conversion of all column data to float
beef_df['Beef $/LB'] = beef_df['Beef $/LB'].astype(float) / 1000
beef_df.head()

Unnamed: 0,DATE,Beef $/LB
0,1984-01-01,1.29
1,1984-02-01,1.34
2,1984-03-01,1.308
3,1984-04-01,1.331
4,1984-05-01,1.301


In [16]:
# Remove all data before 1990-01-01 and store to new df
cleaned_beef_df = beef_df[beef_df["DATE"] >= "1990-01-01"]
cleaned_beef_df

Unnamed: 0,DATE,Beef $/LB
72,1990-01-01,1.557
73,1990-02-01,1.572
74,1990-03-01,1.571
75,1990-04-01,1.593
76,1990-05-01,1.577
...,...,...
465,2022-10-01,4.836
466,2022-11-01,4.853
467,2022-12-01,4.800
468,2023-01-01,4.791


## Add Percent Change column to data
Uses the cleaned dataset from the previous step.

In [17]:
# Calculate and add pct_change column
cleaned_beef_df["Beef_Pct_Change (Monthly)"] = beef_df["Beef $/LB"].pct_change(periods=1)
cleaned_beef_df.head(20)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,DATE,Beef $/LB,Beef_Pct_Change (Monthly)
72,1990-01-01,1.557,0.037308
73,1990-02-01,1.572,0.009634
74,1990-03-01,1.571,-0.000636
75,1990-04-01,1.593,0.014004
76,1990-05-01,1.577,-0.010044
77,1990-06-01,1.593,0.010146
78,1990-07-01,1.581,-0.007533
79,1990-08-01,1.576,-0.003163
80,1990-09-01,1.594,0.011421
81,1990-10-01,1.582,-0.007528
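
`pct_change(periods=1)` computes each value divided by the previous one, minus 1. Because the call above runs on the full `beef_df` and pandas aligns the result by index on assignment, the January 1990 row gets a value based on December 1989 instead of `NaN`. A short check of that first value, using the December 1989 and January 1990 prices from the printed array and table above:

```python
import pandas as pd

# December 1989 (index 71) and January 1990 (index 72) prices in $/lb
prices = pd.Series([1.501, 1.557], index=[71, 72])

# pct_change(periods=1) is current / previous - 1
change = prices.pct_change(periods=1)
manual = prices.loc[72] / prices.loc[71] - 1

print(round(change.loc[72], 6))
print(round(manual, 6))
```

Both values agree with the 0.037308 shown in the first row of the table.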


In [19]:
cleaned_beef_df["Beef_Pct_Change (Monthly)"] = cleaned_beef_df["Beef_Pct_Change (Monthly)"]*100
cleaned_beef_df.head(20)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Unnamed: 0,DATE,Beef $/LB,Beef_Pct_Change (Monthly)
72,1990-01-01,1.557,3.730846
73,1990-02-01,1.572,0.963391
74,1990-03-01,1.571,-0.063613
75,1990-04-01,1.593,1.400382
76,1990-05-01,1.577,-1.004394
77,1990-06-01,1.593,1.014585
78,1990-07-01,1.581,-0.753296
79,1990-08-01,1.576,-0.316256
80,1990-09-01,1.594,1.142132
81,1990-10-01,1.582,-0.752823
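
The warnings above appear because `cleaned_beef_df` was created by filtering `beef_df`, so pandas cannot tell whether the new column is meant for the original frame or the filtered one. A common way to avoid this, sketched here on a made-up frame (`demo_df` and `recent_df` are hypothetical names), is to take an explicit `.copy()` when filtering:

```python
import pandas as pd

# Made-up frame standing in for beef_df
demo_df = pd.DataFrame({
    "DATE": pd.to_datetime(["1989-12-01", "1990-01-01", "1990-02-01"]),
    "Beef $/LB": [1.501, 1.557, 1.572],
})

# .copy() makes the filtered frame independent of demo_df
recent_df = demo_df[demo_df["DATE"] >= "1990-01-01"].copy()

# Compute on the full series so the first kept row has a prior month,
# then scale the fraction to a percentage in the same step
recent_df["Beef_Pct_Change (Monthly)"] = demo_df["Beef $/LB"].pct_change(periods=1) * 100

print(recent_df)
```

Applied to this notebook, that would mean adding `.copy()` to the filtering line in cell 16; the results in the tables would be unchanged.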


In [21]:
# save csv file
output_path = "C:/Users/jhillman/OneDrive/Desktop/Data Analytics Bootcamp/Three_Meals/Edited Data/Output/FRED_beef_cleaned.csv"
cleaned_beef_df.to_csv(output_path, index=False)