# Commodity Prices in USA

* [Introduction](#int)
* [Questions](#q)
    1. [Is there a correlation between gold prices and various commodity prices?](#q1)
    2. [Is there a correlation between gold prices and metal prices?](#q2)
    3. [How did food prices change over the years?](#q3)
    4. [What is the cheapest commodity in total?](#q4)
    5. [What is the most expensive commodity in total?](#q5)
    6. [What was the most expensive commodity each year?](#q6)


<h3 id="int">Introduction</h3>

* The dataset contains monthly commodity prices in the USA between 1992 and 2016.
* Index values are calculated by taking each commodity's average price in 2005 as 100.
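
The base-100 indexing described above can be sketched on a few made-up numbers. The `rebase_to_100` helper and the toy prices are hypothetical and only illustrate the arithmetic; they are not taken from the dataset.

```python
import pandas as pd

def rebase_to_100(series, dates, base_year=2005):
    """Scale a price series so that its mean over base_year equals 100."""
    base = series[dates.dt.year == base_year].mean()
    return series * 100 / base

# toy data (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2004-12-01", "2005-01-01", "2005-02-01", "2006-01-01"]),
    "Price": [400.0, 420.0, 460.0, 660.0],
})
toy["Index"] = rebase_to_100(toy["Price"], toy["Date"])
print(toy)
```

Here the 2005 average is 440, so a price of 660 maps to an index of 150, meaning 50% above the base-year level.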

In [None]:
# packages
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from matplotlib import dates

In [None]:
# load data
golddf = pd.read_csv("../input/gold-prices/monthly_csv.csv")
commdf = pd.read_csv("../input/usa-commodity-prices/commodity-prices-2016.csv")
commdf

In [None]:
# clean data
commdf["Date"] = pd.to_datetime(commdf["Date"])
golddf['Date'] = golddf['Date'].astype(str) + '-01' 
golddf["Date"] = pd.to_datetime(golddf["Date"])
# use only years from 1992 to 2016
# .copy() avoids pandas' SettingWithCopyWarning when the average columns are added below
commdf = commdf.loc[(commdf['Date'] > "1991-12-01") & (commdf['Date'] <= "2016-02-02")].copy()
golddf = golddf.loc[(golddf['Date'] > "1991-12-01") & (golddf['Date'] <= "2016-02-02")]
df = commdf.loc[: , "Coffee Other Mild Arabicas":"Coffee Robusta"]
commdf['Coffe avg'] = df.mean(axis=1)
df = commdf.iloc[: , 31:34]
commdf['Natural gas avg'] = df.mean(axis=1)
df = commdf.iloc[: , 53:56]
commdf['Sugar avg'] = df.mean(axis=1)
df = commdf.iloc[: , 28:30]
commdf['Logs avg'] = df.mean(axis=1)
df = commdf.iloc[: , 61:63]
commdf['Wool avg'] = df.mean(axis=1)
prices = commdf.merge(golddf)
prices = prices.iloc[:, np.r_[0:16, 19:28, 30, 34, 39:47, 49:53, 56:61, 63:70]]
prices = prices.rename(columns={'China import Iron Ore Fines 62% FE spot': 'Iron ore', "Price": "Gold"})
prices

In [None]:
# get 2005 prices for creating index
prices2005 = prices.loc[(prices['Date'] > "2004-12-01") & (prices['Date'] <= "2005-12-02")]
prices2005

In [None]:
# gold index is calculated
goldbase = prices2005["Gold"].mean()
prices["Gold index"] = prices["Gold"]*100/goldbase
prices

<h3 id="q">Questions</h3>
<h4 id="q1">1. Is there a correlation between gold prices and various commodity prices?</h4>

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(prices["Date"], prices["Gold index"], label = "Gold index")
plt.plot(prices["Date"], prices["Food Price Index"], label = "Food Price Index")
plt.plot(prices["Date"], prices["Fuel Energy Index"], label = "Fuel Energy Index")
plt.plot(prices["Date"], prices["Industrial Inputs Price Index"], label = "Industrial Inputs Price Index")
plt.plot(prices["Date"], prices["Metals Price Index"], label = "Metals Price Index")
plt.legend()
plt.show()

* The metals price index is the one most similar to the gold index.
* After 2008, gold rises more than any of the commodity indexes.
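
The visual comparison could be backed by a number. Below is a minimal sketch that computes Pearson correlations with `DataFrame.corr`. It runs on synthetic series (the `demo` frame and its values are hypothetical), but the same call should work on the index columns of `prices`.

```python
import numpy as np
import pandas as pd

# synthetic monthly indexes (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 120
trend = np.linspace(50, 200, n)
demo = pd.DataFrame({
    "Gold index": trend + rng.normal(0, 5, n),
    "Metals Price Index": trend + rng.normal(0, 15, n),
    "Food Price Index": 100 + rng.normal(0, 10, n),
})

# Pearson correlation of every other column with the gold index, highest first
corr_with_gold = (
    demo.corr()["Gold index"].drop("Gold index").sort_values(ascending=False)
)
print(corr_with_gold)
```

A value near 1 means the two series move together, and a value near 0 means there is no linear relationship.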

<h4 id="q2">2. Is there a correlation between gold prices and metal prices?</h4>

In [None]:
# other metal indexes are calculated
prices["Aluminum index"] = prices["Aluminum"]*100/(prices2005["Aluminum"].mean())
prices["Copper index"] = prices["Copper"]*100/(prices2005["Copper"].mean())
prices["Iron ore index"] = prices["Iron ore"]*100/(prices2005["Iron ore"].mean())
prices["Lead index"] = prices["Lead"]*100/(prices2005["Lead"].mean())
prices["Nickel index"] = prices["Nickel"]*100/(prices2005["Nickel"].mean())
prices["Tin index"] = prices["Tin"]*100/(prices2005["Tin"].mean())
prices["Zinc index"] = prices["Zinc"]*100/(prices2005["Zinc"].mean())
prices

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(prices["Date"], prices["Gold index"], label = "Gold index")
plt.plot(prices["Date"], prices["Aluminum index"], label = "Aluminum index")
plt.plot(prices["Date"], prices["Copper index"], label = "Copper index")
plt.plot(prices["Date"], prices["Iron ore index"], label = "Iron ore index")
plt.plot(prices["Date"], prices["Lead index"], label = "Lead index")
plt.plot(prices["Date"], prices["Nickel index"], label = "Nickel index")
plt.plot(prices["Date"], prices["Tin index"], label = "Tin index")
plt.plot(prices["Date"], prices["Zinc index"], label = "Zinc index")

plt.legend()
plt.show()

* Gold correlates most strongly with tin.
* Iron ore actually rises even more than gold after 2009.
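
Series that all trend upward tend to look correlated in a level plot. One common alternative is to compare month-over-month percentage changes. The sketch below ranks metals by that measure using `pct_change` and `corrwith`. The data is synthetic and the column values are hypothetical, so the result says nothing about the real prices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
# shared monthly shocks drive "Gold" and "Tin"; "Zinc" moves independently (all hypothetical)
common = rng.normal(0, 0.03, n)
returns = pd.DataFrame({
    "Gold": common + rng.normal(0, 0.01, n),
    "Tin": common + rng.normal(0, 0.02, n),
    "Zinc": rng.normal(0, 0.03, n),
})
# turn the monthly returns into price levels
levels = 100 * (1 + returns).cumprod()

# month-over-month percentage change, then correlation of each metal with gold
changes = levels.pct_change().dropna()
ranking = (
    changes.drop(columns="Gold")
    .corrwith(changes["Gold"])
    .sort_values(ascending=False)
)
print(ranking)
```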

<h4 id="q3">3. How did food prices change over the years?</h4>

In [None]:
foods = ["Beef", "Lamb", "Poultry chicken", "Rice", "Tea", "Wheat", "Sugar avg"]
# snapshot date -> position in a 2x3 grid (position 3 is left empty for the legend)
snapshots = {"1995-01-01": 1, "2000-01-01": 2, "2005-01-01": 4, "2010-01-01": 5, "2015-01-01": 6}
# plt.figure instead of plt.subplots, so no empty full-size axes is drawn behind the pies
fig = plt.figure(figsize=(8, 6))
for date, pos in snapshots.items():
    plt.subplot(2, 3, pos)
    plt.gca().set_title(date)
    # the single row for that month, as a Series indexed by commodity
    plt.pie(prices.loc[prices['Date'] == date, foods].iloc[0], autopct='%1.1f%%')
plt.figlegend(labels=foods)
plt.show()

* The charts show food commodities and their proportions every 5 years.
* The proportions don't change much.
* Only in 2010 had rice prices risen a bit.
* Beef and wheat are also more expensive in recent years.
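
The proportions shown in the pie charts can also be computed directly, which makes small changes easier to compare than slice sizes. This is a minimal sketch on hypothetical prices for three foods; the `snap` frame is made up for illustration.

```python
import pandas as pd

# hypothetical prices for three foods on two snapshot dates
snap = pd.DataFrame(
    {"Beef": [100.0, 150.0], "Rice": [50.0, 150.0], "Tea": [50.0, 200.0]},
    index=pd.to_datetime(["1995-01-01", "2010-01-01"]),
)
# divide each row by its own total to get shares, expressed in percent
shares = snap.div(snap.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

Each row of `shares` sums to 100, matching what `plt.pie` computes internally for one snapshot.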

<h4 id="q4">4. What is the cheapest commodity in total? </h4>

In [None]:
# total of every numeric column over the whole period
# tsum: the 5 smallest totals, lsum: the 5 largest totals
tsum = prices.sum(numeric_only=True).sort_values()[:5]
lsum = prices.sum(numeric_only=True).sort_values(ascending=False)[:5]

tsum = {"Name":list(tsum.index), "Sum":list(tsum)}
tsum = pd.DataFrame(tsum)

lsum = {"Name":list(lsum.index), "Sum":list(lsum)}
lsum = pd.DataFrame(lsum)

plt.bar(tsum["Name"], tsum["Sum"])
plt.show()

<h4 id="q5">5. What is the most expensive commodity in total? </h4>

In [None]:
plt.bar(lsum["Name"], lsum["Sum"])
plt.ticklabel_format(style='plain', axis='y')
plt.show()

<h4 id="q6">6. What was the most expensive commodity each year? </h4>

In [None]:
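# yearly average of every column (one row per year after dropping duplicates)
yearlyp = prices.groupby(prices.Date.dt.year).transform('mean').drop_duplicates()
pd.set_option('display.max_rows', None)
# make sure every value column is numeric (column 0 is the date)
for i in range(1, yearlyp.shape[1]):
    yearlyp.iloc[:,i] = pd.to_numeric(yearlyp.iloc[:,i])
# most expensive every year; printed explicitly because a cell only displays its last expression
print(yearlyp.iloc[:,1:].idxmax(axis=1))
yearlyp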

In [None]:
# as a graph
pd.set_option('display.max_rows', 6)
yearlyp = yearlyp[["Date","Tin","Nickel"]]
fig, ax = plt.subplots(figsize=(15, 6))
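# plot each metal separately so the legend can tell them apart
plt.plot(yearlyp["Date"], yearlyp["Tin"], 'o', label="Tin")
plt.plot(yearlyp["Date"], yearlyp["Nickel"], 'o', label="Nickel")
ax.xaxis.set_major_formatter(dates.DateFormatter('%Y'))
# one tick per calendar year
ax.xaxis.set_major_locator(dates.YearLocator())
plt.legend()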
plt.show()

* There is an interesting spike in nickel in 2007.
* In recent years, tin is more expensive.
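
A compact way to get one row per year and the most expensive column in each is `groupby(...).mean()` followed by `idxmax(axis=1)`. The sketch below uses hypothetical monthly prices for two metals, so the resulting labels are illustrative only.

```python
import pandas as pd

# hypothetical monthly prices for two metals
monthly = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-01", "2006-07-01", "2007-01-01", "2007-07-01"]),
    "Tin": [8000.0, 9000.0, 21000.0, 23000.0],
    "Nickel": [15000.0, 25000.0, 18000.0, 16000.0],
})
# average each column within a calendar year (index becomes the year)
yearly = monthly.groupby(monthly["Date"].dt.year)[["Tin", "Nickel"]].mean()
# name of the column with the highest yearly average, one label per year
top = yearly.idxmax(axis=1)
print(top)
```

Unlike `transform('mean')` followed by `drop_duplicates()`, this returns a frame indexed by year, so a single year can be looked up with `yearly.loc[2007]`.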