# Economic Network Graphs Data Exploration & Ramblings

@author: Mark Oussoren

**This notebook was written before the dataset was finalized, so expect warnings or errors if you try to run it. It exists solely to help me understand what to do with the data.**

I would like to straighten out exactly what the goal of this project would be. We would like to develop network models and understand the stability of these systems under supply-chain shocks and the dynamics of the system thereafter. That is, if an industry like oil experiences a shortage, we would be interested in understanding the linkage or input/output changes if any exist. 

There is a growing consensus among researchers that many social, economic, and financial phenomena can be described by a network of agents and their interactions. In line with that literature, we believe graph theory can help us understand the complicated dynamics and interactions of the various commodities and sectors within the US.
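To make the "network of agents and their interactions" concrete, here is a minimal sketch of turning a small flow matrix into a weighted, directed edge list. The sector names and numbers are made up for illustration and are not BEA values.

```python
import pandas as pd

# Toy flow matrix (made-up numbers): each row sector supplies the column sectors.
sectors = ["oil", "chemicals", "farms"]
flows = pd.DataFrame(
    [[5.0, 40.0, 10.0],
     [2.0, 8.0, 15.0],
     [0.0, 3.0, 6.0]],
    index=sectors,
    columns=sectors,
)
flows.index.name = "source"
flows.columns.name = "target"

# Long format: one row per directed edge, dropping links with zero flow.
edges = flows.stack().rename("weight").reset_index()
edges = edges[edges["weight"] > 0].reset_index(drop=True)

# Out-strength: total value a sector ships to other sectors (self-loops excluded).
no_self = edges[edges["source"] != edges["target"]]
out_strength = no_self.groupby("source")["weight"].sum()

print(edges)
print(out_strength)
```

An edge list like this is one possible starting point for any graph library; whether flows, shares, or some other weight is the right edge attribute is still an open modeling choice.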

Insert literature review here...

In [1]:
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotnine
import seaborn as sns

The **Supply Table** describes how goods and services become available in an economy during a certain period of time. Products are either produced in the domestic industry or imported.

The **Use Table** shows how goods and services are used in the economy during a certain period of time.
Products can be used either as intermediate consumption or as final use.

The **Input-Output Tables** are derived analytically from the supply and use tables under specific assumptions. In the product-by-product version, they show as intermediate consumption the product inputs that were necessary to manufacture the entire domestic supply of a particular product. Similarly, they show the value-added components that had to be expended to manufacture that domestic supply.
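The definitions above can be sketched with a two-product toy economy. All numbers and column names below are invented for illustration. The sketch assumes the textbook accounting identity that, for each product, total supply equals total use.

```python
import pandas as pd

products = ["food", "fuel"]

# Toy supply table (made-up numbers): where each product comes from.
supply_toy = pd.DataFrame(
    {"domestic_output": [100.0, 60.0], "imports": [20.0, 40.0]},
    index=products,
)
supply_toy["total_supply"] = supply_toy["domestic_output"] + supply_toy["imports"]

# Toy use table: intermediate consumption by industry plus final use.
use_toy = pd.DataFrame(
    {"used_by_food_industry": [30.0, 25.0],
     "used_by_fuel_industry": [10.0, 35.0],
     "final_use": [80.0, 40.0]},
    index=products,
)
use_toy["total_use"] = use_toy.sum(axis=1)

# In a balanced system, everything that is supplied is used somewhere.
balanced = (supply_toy["total_supply"] == use_toy["total_use"]).all()
print(balanced)
```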

In [95]:
io_table = pd.read_excel('~/Economic_Networks/data/raw/AllTablesIO/IOUse_After_Redefinitions_PRO_1997-2020_Summary.xlsx', sheet_name=None)
supply = pd.read_excel('~/Economic_Networks/data/raw/AllTablesSUP/Supply_1997-2020_SUM.xlsx', sheet_name=None)
io_late = pd.read_excel('~/Economic_Networks/data/raw/1947-1997-Historical/IOUse_Before_Redefinitions_PRO_1963-1996_Summary.xlsx', sheet_name=None)
io_early = pd.read_excel('~/Economic_Networks/data/raw/1947-1997-Historical/IOUse_Before_Redefinitions_PRO_1947-1962_Summary.xlsx', sheet_name=None)
make_late = pd.read_excel('~/Economic_Networks/data/raw/1947-1997-Historical/IOMake_Before_Redefinitions_1963-1996_Summary.xlsx', sheet_name=None)
make_early = pd.read_excel('~/Economic_Networks/data/raw/1947-1997-Historical/IOMake_Before_Redefinitions_1947-1962_Summary.xlsx', sheet_name=None)

In [96]:
make_early['1950']

Unnamed: 0,"The Make of Commodities by Industries, Before Redefinitions",Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51
0,(Millions of dollars),,,,,,,,,,...,,,,,,,,,,
1,Bureau of Economic Analysis,,,,,,,,,,...,,,,,,,,,,
2,1950,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,Industry / Commodity,,Farms,"Forestry, fishing, and related activities",Oil and gas extraction,"Mining, except oil and gas",Support activities for mining,Utilities,Construction,Wood products,...,Accommodation,Food services and drinking places,"Other services, except government",Federal general government,Federal government enterprises,State and local general government,State and local government enterprises,"Scrap, used and secondhand goods",Noncomparable imports and rest-of-the-world ad...,Total Industry Output
5,Code,Industry Description,111CA,113FF,211,212,213,22,23,321,...,721,722,81,GFG,GFE,GSLG,GSLE,Used,Other,T008
6,111CA,Farms,28782,587,...,...,...,...,...,226,...,...,...,...,...,...,...,...,...,...,31221
7,113FF,"Forestry, fishing, and related activities",...,2173,...,...,...,...,1,...,...,...,...,...,...,...,...,...,...,...,2175
8,211,Oil and gas extraction,...,...,5082,...,14,...,...,...,...,...,...,...,...,...,...,...,...,...,5241
9,212,"Mining, except oil and gas",...,...,...,6215,11,...,205,...,...,...,...,...,...,...,...,...,...,...,6503


In [67]:
# input feature names and length
def print_names(df, year, make):
    if make:
        print(list(df[year].iloc[4]))
    else:
        print(list(df[year].iloc[5]))

print_names(io_table, '2020', False)

['IOCode', 'Name', 'Farms', 'Forestry, fishing, and related activities', 'Oil and gas extraction', 'Mining, except oil and gas', 'Support activities for mining', 'Utilities', 'Construction', 'Wood products', 'Nonmetallic mineral products', 'Primary metals', 'Fabricated metal products', 'Machinery', 'Computer and electronic products', 'Electrical equipment, appliances, and components', 'Motor vehicles, bodies and trailers, and parts', 'Other transportation equipment', 'Furniture and related products', 'Miscellaneous manufacturing', 'Food and beverage and tobacco products', 'Textile mills and textile product mills', 'Apparel and leather and allied products', 'Paper products', 'Printing and related support activities', 'Petroleum and coal products', 'Chemical products', 'Plastics and rubber products', 'Wholesale trade', 'Motor vehicle and parts dealers', 'Food and beverage stores', 'General merchandise stores', 'Other retail', 'Air transportation', 'Rail transportation', 'Water transporta

In [78]:
# output feature names and length
print('IO Current Dim:', io_table['2020'].shape)
print('IO Late Dim:', io_late['1980'].shape)
print('IO Early Dim:', io_early['1950'].shape)
print('Supply Dim:', supply['2020'].shape)
print('Make Late Dim:', make_late['1993'].shape)
print('Make Early Dim:', make_early['1950'].shape)

IO Current Dim: (89, 96)
IO Late Dim: (76, 86)
IO Early Dim: (58, 65)
Supply Dim: (80, 85)
Make Late Dim: (72, 70)
Make Early Dim: (54, 52)


The current tables span 1997 to 2020 and the historical tables span 1947 to 1996, all collected annually. This is an ugly dataset. Let's first make the tables bearable to look at by renaming the columns and dropping rows that are blank or irrelevant.

In [97]:
def rename_columns(tables, is_io, is_historical):
    """
    :param: tables: dictionary of pandas dataframes, keyed by sheet name
    :param: is_io: boolean - if true, io data is fed in
    :param: is_historical: boolean - should be true for the historical (pre-1997) files
    # outputs dictionary of dataframes with columns renamed and the leading title rows removed
    """
    for key in list(tables.keys()):
        if is_historical:
            # the header row sits at index 4 in the historical files
            tables[key].columns = list(tables[key].iloc[4, :])
            if is_io:
                # drop the first 6 rows (titles and headers) and the first column
                tables[key] = tables[key].iloc[6:, 1:]
            else:
                tables[key] = tables[key].iloc[6:, :]
        else:
            # the header row sits at index 5 in the 1997-2020 files
            tables[key].columns = list(tables[key].iloc[5, :])
            if is_io:
                # drop the first 6 rows, the last 4 rows, and the first column
                tables[key] = tables[key].iloc[6:-4, 1:]
            else:
                tables[key] = tables[key].iloc[6:, 1:]
    return tables

io_table = rename_columns(io_table, is_io=True, is_historical=False)
io_early = rename_columns(io_early, is_io=True, is_historical=True)
io_late = rename_columns(io_late, is_io=True, is_historical=True)
supply = rename_columns(supply, is_io=False, is_historical=False)
make_early = rename_columns(make_early, is_io=True, is_historical=True)
make_late = rename_columns(make_late, is_io=True, is_historical=True)
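The header-promotion step can be seen in isolation on a toy sheet. This is only a sketch: `promote_header` is a hypothetical helper (not part of the notebook), and the toy frame merely imitates the shape of the raw export.

```python
import pandas as pd

# Toy sheet shaped like the raw export: a title row, a header row, then data.
raw = pd.DataFrame(
    [["(Millions of dollars)", None, None],
     ["Code", "Name", "Farms"],
     ["111CA", "Farms", 28782],
     ["211", "Oil and gas extraction", 0]]
)

def promote_header(df, header_row, first_data_row):
    """Hypothetical helper: use one row as column labels, keep the rows below it."""
    out = df.iloc[first_data_row:].copy()
    out.columns = df.iloc[header_row].tolist()
    return out.reset_index(drop=True)

tidy = promote_header(raw, header_row=1, first_data_row=2)
print(tidy)
```

Unlike the function above, this version returns a copy instead of overwriting the input. `pd.read_excel` also accepts `header` and `skiprows`, which might allow the header to be set at load time, though that is untested against these files.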

Let's see if there are any missing values. A check with isna() reports none for 2020, even though some values are absent from the actual BEA table. Checking individual entries below shows that missing values are stored as the string '...'. We shall replace '...' with NA for clarity.

In [104]:
True in list(io_table['2020'].isna().any())

False

In [105]:
io_table['2000'].iloc[0,12]

'...'
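A small sketch of why the isna() check comes back False: the placeholder is an ordinary string, so pandas does not treat it as missing. The toy frame below is made up; only the '...' convention is taken from the tables.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Farms": [8, "...", 55], "Utilities": ["...", "...", 2]})

# Placeholder strings are not NaN, so isna() finds nothing.
print(toy.isna().any().any())

# Comparing against the placeholder finds them.
dots = toy.eq("...")
print(int(dots.sum().sum()))

# Swap the placeholder for a real missing value.
cleaned = toy.mask(dots, np.nan)
print(cleaned.isna().sum().to_dict())
```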

We can get a sense of the values in each of the columns for say the year 2020.

In [106]:
for col in io_table['2020'].columns.tolist():
    print(io_table['2020'][col].value_counts().sort_values(ascending=False)[:3])

Farms                                      1
Textile mills and textile product mills    1
Oil and gas extraction                     1
Name: Name, dtype: int64
...    18
8       2
55      2
Name: Farms, dtype: int64
...    19
0       7
1       6
Name: Forestry, fishing, and related activities, dtype: int64
...    23
1       2
30      2
Name: Oil and gas extraction, dtype: int64
...    16
0       2
2       2
Name: Mining, except oil and gas, dtype: int64
...    21
115     2
19      2
Name: Support activities for mining, dtype: int64
...      20
16908     1
2         1
Name: Utilities, dtype: int64
...      18
17089     1
943       1
Name: Construction, dtype: int64
...    16
32      2
86      2
Name: Wood products, dtype: int64
...    16
0       3
201     2
Name: Nonmetallic mineral products, dtype: int64
...     19
17       1
9938     1
Name: Primary metals, dtype: int64
...    18
418     2
1       2
Name: Fabricated metal products, dtype: int64
...     16
0        1
2017     1
Name: M

In [107]:
for col in supply['2020'].columns.tolist():
    print(supply['2020'][col].value_counts().sort_values(ascending=False)[:3])

Farms                          1
Miscellaneous manufacturing    1
Mining, except oil and gas     1
Name: Name, dtype: int64
...       69
440636     1
4794       1
Name: Farms, dtype: int64
...      70
18        1
54970     1
Name: Forestry, fishing, and related activities, dtype: int64
...       62
196026     1
74         1
Name: Oil and gas extraction, dtype: int64
...      63
0         2
75375     1
Name: Mining, except oil and gas, dtype: int64
...    64
38      1
64      1
Name: Support activities for mining, dtype: int64
...       65
104        1
494480     1
Name: Utilities, dtype: int64
...        70
1852389     1
482         1
Name: Construction, dtype: int64
...    59
0       2
164     1
Name: Wood products, dtype: int64
...    50
0       3
38      2
Name: Nonmetallic mineral products, dtype: int64
...    56
0       2
1       2
Name: Primary metals, dtype: int64
...     49
676      1
1232     1
Name: Fabricated metal products, dtype: int64
...    51
317     1
661     1
Name: M

The input data has far fewer missing values per column (roughly 20 on average for the input data vs. 50 for the output data). Why? Not sure... What about total counts of missingness, and missingness per column and row?

In [108]:
# check percentage of '...' in the dataframe
def check_missing(df, year):
    print(f"Num Missing: {df[year].stack().value_counts()['...']}")
    print(f"Num Total: {len(df[year].index.tolist()) * len(df[year].columns.tolist())}")
    print(100 * df[year].stack().value_counts()['...'] / (len(df[year].index.tolist()) * len(df[year].columns.tolist())))
    
print('IO table')
check_missing(io_table, '2020')

print('\n IO table early')
check_missing(io_early, '1950')

print('\n IO table late')
check_missing(io_late, '1980')

print('\n Supply table')
check_missing(supply, '2020')

print('\n Make early table')
check_missing(make_early, '1950')

print('\n Make late table')
check_missing(make_late, '1980')

IO table
Num Missing: 2463
Num Total: 7505
32.81812125249834

 IO table early
Num Missing: 762
Num Total: 3328
22.896634615384617

 IO table late
Num Missing: 1516
Num Total: 5950
25.478991596638654

 Supply table
Num Missing: 4590
Num Total: 6216
73.84169884169884

 Make early table
Num Missing: 1904
Num Total: 2448
77.77777777777777

 Make late table
Num Missing: 3774
Num Total: 4554
82.87220026350461


In [109]:
# check missingness by column
def col_missingness(df, year):
    missing_pct = {}
    for col in df[year].columns.tolist():
        try:
            missing_pct[col] = 100 * df[year][col].value_counts()['...'] / len(df[year].index.tolist())
        except KeyError:
            # no '...' entries in this column
            missing_pct[col] = 0
    print(missing_pct)
    print('\n')
    
print('IO table')
col_missingness(io_table, '2020')

print('\n IO table early')
col_missingness(io_early, '1950')

print('\n IO table late')
col_missingness(io_late, '1980')

print('\n Supply table')
col_missingness(supply, '2020')

print('\n Make early table')
col_missingness(make_early, '1950')

print('\n Make late table')
col_missingness(make_late, '1980')

IO table
{'Name': 0, 'Farms': 22.78481012658228, 'Forestry, fishing, and related activities': 24.050632911392405, 'Oil and gas extraction': 29.11392405063291, 'Mining, except oil and gas': 20.253164556962027, 'Support activities for mining': 26.582278481012658, 'Utilities': 25.31645569620253, 'Construction': 22.78481012658228, 'Wood products': 20.253164556962027, 'Nonmetallic mineral products': 20.253164556962027, 'Primary metals': 24.050632911392405, 'Fabricated metal products': 22.78481012658228, 'Machinery': 20.253164556962027, 'Computer and electronic products': 26.582278481012658, 'Electrical equipment, appliances, and components': 24.050632911392405, 'Motor vehicles, bodies and trailers, and parts': 21.518987341772153, 'Other transportation equipment': 26.582278481012658, 'Furniture and related products': 25.31645569620253, 'Miscellaneous manufacturing': 18.9873417721519, 'Food and beverage and tobacco products': 20.253164556962027, 'Textile mills and textile product mills': 22.7
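The per-row question raised earlier can be answered with the same comparison used per column. This is a sketch on a made-up frame, not the BEA tables: the mean of a boolean mask gives the share of '...' along either axis.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Name": ["Farms", "Utilities", "Construction", "Machinery"],
     "Farms": [8, "...", "...", 55],
     "Utilities": ["...", 2, "...", "..."]}
).set_index("Name")

is_dots = toy.eq("...")
col_pct = 100 * is_dots.mean()          # share of '...' per column
row_pct = 100 * is_dots.mean(axis=1)    # share of '...' per row

print(col_pct.to_dict())
print(row_pct.to_dict())
```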

Nothing looks wrong here intuitively. I think some of these tables are just sparser, and the BEA wrote entries that round to zero as '...' instead of 0. To test this claim, I check whether the total commodity output equals the sum of the industry outputs/inputs in the preceding columns. We first replace '...' with np.nan for convenience.

In [110]:
def replace_dots(tables):
    """
    Replace '...' with np.nan
    :param: tables - dictionary of dataframes to replace values in
    :return: new dictionary with np.nan instead of '...'
    """
    return {k: v.replace('...', np.nan) for k, v in tables.items()}

io_table = replace_dots(io_table)
io_early = replace_dots(io_early)
io_late = replace_dots(io_late)
supply = replace_dots(supply)
make_early = replace_dots(make_early)
make_late = replace_dots(make_late)

Now we would like to see what these NaNs should be replaced by. Notice that one of the columns is labeled Total Commodity Output. This column should be the sum of the columns before it. To test this, we sum the preceding columns and take the difference with this column. Interestingly, for the supply table the maximum absolute difference is at most \$5 million in any year (\$3 million in 2020), a very small deviation from the total commodity output. This implies that it is sensible to replace NaNs with 0s, as they do not influence the sum.

In [213]:
# look at maximum absolute difference between Total Commodity Output and the sum of the industry columns before it (skipping Name and the last 12 columns)
max_diff_year = {}
for year in supply.keys():
    max_diff_year[year] = max(abs(supply[year]['Total Commodity Output'] - supply[year].apply(lambda x: x[1:-12].sum(), axis=1)))
max_diff_year

{'1997': 4.0,
 '1998': 2.0,
 '1999': 4.0,
 '2000': 4.0,
 '2001': 4.0,
 '2002': 3.0,
 '2003': 3.0,
 '2004': 3.0,
 '2005': 4.0,
 '2006': 5.0,
 '2007': 2.0,
 '2008': 3.0,
 '2009': 4.0,
 '2010': 3.0,
 '2011': 3.0,
 '2012': 3.0,
 '2013': 2.0,
 '2014': 5.0,
 '2015': 4.0,
 '2016': 3.0,
 '2017': 4.0,
 '2018': 3.0,
 '2019': 4.0,
 '2020': 3.0}
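The reasoning behind this check can be sketched on a toy frame (made-up numbers, hypothetical column names). pandas skips NaN when summing by default, so comparing the row sums to the total is the same as first filling the NaNs with zero; a small gap then means that treating NaN as 0 reproduces the totals up to rounding.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Farms": [10.0, np.nan, 4.0],
     "Utilities": [np.nan, 7.0, 1.0],
     "Total": [10.0, 7.0, 6.0]}
)
parts = toy[["Farms", "Utilities"]]

# sum() ignores NaN, so both gaps below are computed from the same numbers.
gap_skipna = (toy["Total"] - parts.sum(axis=1)).abs()
gap_filled = (toy["Total"] - parts.fillna(0).sum(axis=1)).abs()

print(gap_skipna.max(), gap_filled.max())
```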

In [211]:
# look at maximum absolute difference between the total commodity output and the sum of individual industry outputs
max_diff_year = {}
for year in io_table.keys():
    max_diff_year[year] = max(abs(io_table[year]['Total Commodity Output'] - (io_table[year].apply(lambda x: x[1:-2].sum(), axis=1) - io_table[year]['Total Intermediate'])))
max_diff_year

{'1997': 8.0,
 '1998': 7.0,
 '1999': 6.0,
 '2000': 5.0,
 '2001': 7.0,
 '2002': 6.0,
 '2003': 6.0,
 '2004': 6.0,
 '2005': 6.0,
 '2006': 5.0,
 '2007': 7.0,
 '2008': 6.0,
 '2009': 6.0,
 '2010': 7.0,
 '2011': 7.0,
 '2012': 7.0,
 '2013': 5.0,
 '2014': 6.0,
 '2015': 7.0,
 '2016': 5.0,
 '2017': 8.0,
 '2018': 7.0,
 '2019': 6.0,
 '2020': 7.0}

In [169]:
# look at maximum absolute difference between the total commodity output and the sum of individual industry outputs for early times
max_diff_year = {}
for year in list(io_early.keys())[2:]:
    intermediate_idx = io_early[year].columns.tolist().index('Total Intermediate')
    max_diff_year[year] = max(abs(io_early[year]['Total Commodity Output'] - io_early[year].apply(lambda x: x[1:intermediate_idx - 1].sum() + x[intermediate_idx + 1:-2].sum(), axis=1))[:-3])
max_diff_year

{'1947': 45.0,
 '1948': 46.0,
 '1949': 49.0,
 '1950': 50.0,
 '1951': 58.0,
 '1952': 67.0,
 '1953': 71.0,
 '1954': 91.0,
 '1955': 106.0,
 '1956': 115.0,
 '1957': 131.0,
 '1958': 146.0,
 '1959': 189.0,
 '1960': 289.0,
 '1961': 381.0,
 '1962': 499.0}

In [189]:
# look at maximum absolute difference between the total commodity output and the sum of individual industry outputs for the later historical tables (1963-1996)
max_diff_year = {}
for year in list(io_late.keys())[2:]:
    intermediate_idx = io_late[year].columns.tolist().index('Total Intermediate')
    max_diff_year[year] = max(abs(io_late[year]['Total Commodity Output'] - io_late[year].apply(lambda x: x[1:intermediate_idx - 1].sum() + x[intermediate_idx + 1:-2].sum(), axis=1))[:-3])
max_diff_year

{'1963': 626.0,
 '1964': 699.0,
 '1965': 772.0,
 '1966': 844.0,
 '1967': 971.0,
 '1968': 923.0,
 '1969': 874.0,
 '1970': 858.0,
 '1971': 807.0,
 '1972': 715.0,
 '1973': 1089.0,
 '1974': 1369.0,
 '1975': 2039.0,
 '1976': 2532.0,
 '1977': 3061.0,
 '1978': 4274.0,
 '1979': 5183.0,
 '1980': 5838.0,
 '1981': 7335.0,
 '1982': 9456.0,
 '1983': 10274.0,
 '1984': 9952.0,
 '1985': 8983.0,
 '1986': 9297.0,
 '1987': 7731.0,
 '1988': 8769.0,
 '1989': 9689.0,
 '1990': 10923.0,
 '1991': 12150.0,
 '1992': 13894.0,
 '1993': 14563.0,
 '1994': 15155.0,
 '1995': 14897.0,
 '1996': 14640.0}

In [205]:
max_diff_year = {}
for year in list(make_early.keys())[2:]:
    max_diff_year[year] = max(abs(make_early[year]['Total Industry Output'] - make_early[year].apply(lambda x: x[1:-1].sum(), axis=1)))
max_diff_year

{'1947': 2.0,
 '1948': 2.0,
 '1949': 3.0,
 '1950': 2.0,
 '1951': 3.0,
 '1952': 2.0,
 '1953': 3.0,
 '1954': 2.0,
 '1955': 3.0,
 '1956': 2.0,
 '1957': 2.0,
 '1958': 2.0,
 '1959': 3.0,
 '1960': 4.0,
 '1961': 3.0,
 '1962': 2.0}

In [206]:
max_diff_year = {}
for year in list(make_late.keys())[2:]:
    max_diff_year[year] = max(abs(make_late[year]['Total Industry Output'] - make_late[year].apply(lambda x: x[1:-1].sum(), axis=1)))
max_diff_year

{'1963': 4.0,
 '1964': 2.0,
 '1965': 2.0,
 '1966': 2.0,
 '1967': 2.0,
 '1968': 3.0,
 '1969': 2.0,
 '1970': 3.0,
 '1971': 2.0,
 '1972': 2.0,
 '1973': 2.0,
 '1974': 2.0,
 '1975': 3.0,
 '1976': 2.0,
 '1977': 2.0,
 '1978': 4.0,
 '1979': 3.0,
 '1980': 3.0,
 '1981': 2.0,
 '1982': 3.0,
 '1983': 3.0,
 '1984': 2.0,
 '1985': 3.0,
 '1986': 3.0,
 '1987': 2.0,
 '1988': 3.0,
 '1989': 3.0,
 '1990': 2.0,
 '1991': 2.0,
 '1992': 3.0,
 '1993': 4.0,
 '1994': 3.0,
 '1995': 4.0,
 '1996': 3.0}

So for the most part, these columns sum up nicely to the output column. There is an exception, though: the historical IO tables (both 1947-1962 and 1963-1996) do not add up, with maximum gaps growing from 45 in 1947 to over 15,000 in the mid-1990s (in millions of dollars). The question is what to do about this. One remedy would be to not set the NA values to 0 and instead fill them with whatever is left over from the difference of the column sums. Alternatively, my column-sum formula for these tables may be wrong, which would explain why they deviate when the other tables do not. For the time being, I will just note that they do not add up and examine the difference further during EDA. It may be safe, though perhaps a bit too naive, to set every NaN in all of these dictionaries to zero right off the bat.
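The "fill with whatever is left over" idea could look something like the sketch below. `fill_from_residual` is a hypothetical helper, and splitting the residual evenly across a row's missing cells is an assumed allocation rule, not something the BEA documents.

```python
import numpy as np
import pandas as pd

def fill_from_residual(df, part_cols, total_col):
    """Hypothetical helper: split each row's unexplained total evenly
    across that row's NaN cells (an assumed allocation rule)."""
    out = df.copy()
    parts = out[part_cols]
    residual = out[total_col] - parts.sum(axis=1)
    n_missing = parts.isna().sum(axis=1)
    # Rows without NaNs get no share; this also avoids dividing by zero.
    share = residual / n_missing.where(n_missing > 0)
    out[part_cols] = parts.apply(lambda col: col.fillna(share))
    return out

toy = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan],
     "b": [2.0, 3.0, np.nan],
     "total": [3.0, 5.0, 8.0]}
)
filled = fill_from_residual(toy, ["a", "b"], "total")
print(filled)
```

By construction the filled rows sum to their totals, so this hides the discrepancy rather than explaining it. It is probably only worth using once the column-sum formula itself has been ruled out as the cause.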

In [215]:
# replace the remaining NaN values with zeros, as these entries are essentially zero
def replace_zeros(tables, is_input):
    """
    :param: tables: dictionary of pandas dataframes
    :param: is_input: boolean - if true, data is input dictionary
    # outputs dictionary of dataframes with NaNs replaced by zeros
    """
    for key in list(tables.keys()):
        if is_input:
            # fill NaNs in every row except the first
            tables[key].iloc[1:] = tables[key].iloc[1:].replace(np.nan, 0)
        else:
            # note: .iloc[:-12] slices rows, not columns, so the last 12 rows keep their NaNs
            tables[key].iloc[:-12] = tables[key].iloc[:-12].replace(np.nan, 0)
    return tables

io_table = replace_zeros(io_table, True)
io_early = replace_zeros(io_early, True)
io_late = replace_zeros(io_late, True)
supply = replace_zeros(supply, False)
make_early = replace_zeros(make_early, True)
make_late = replace_zeros(make_late, True)

In [217]:
# looks good
supply['2020']

Unnamed: 0,Name,Farms,"Forestry, fishing, and related activities",Oil and gas extraction,"Mining, except oil and gas",Support activities for mining,Utilities,Construction,Wood products,Nonmetallic mineral products,...,CIF/FOB Adjustments on Imports,Total product supply (basic prices),Trade margins,Transport margins,Total trade and transportation margins,Import duties,Tax on products,Subsidies,Total tax less subsidies on products,Total product supply (purchaser prices)
6,Farms,440636.0,18.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,485451,91965.0,61234.0,153198.0,133.0,5683,-45282.0,-39466,599183
7,"Forestry, fishing, and related activities",4794.0,54970.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,80960,11377.0,4414.0,15791.0,166.0,923,-2146.0,-1057,95694
8,Oil and gas extraction,0.0,0.0,196026.0,0.0,38.0,104.0,0.0,0.0,5.0,...,0.0,282147,8246.0,59233.0,67478.0,126.0,11159,-841.0,10444,360070
9,"Mining, except oil and gas",0.0,0.0,74.0,75375.0,64.0,0.0,0.0,0.0,689.0,...,0.0,75962,9248.0,23419.0,32667.0,9.0,3124,-1053.0,2079,110708
10,Support activities for mining,0.0,0.0,33324.0,7915.0,50558.0,0.0,0.0,0.0,1.0,...,0.0,92664,0.0,0.0,0.0,0.0,0,-3186.0,-3186,89478
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,State and local general government,,,,,,,,,,...,,1916544,,,,,0,,0,1916544
76,State and local government enterprises,,,,,,1915.0,,,,...,,111092,,,,,0,,0,111092
77,Noncomparable imports and rest-of-the-world ad...,,,,,,,,,,...,-53.0,182129,,,,,0,,0,182129
78,"Scrap, used and secondhand goods",,,,,,,,49.0,10.0,...,,25500,127027.0,33329.0,160356.0,57.0,16649,0.0,16706,202563
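The comment in the cell above talks about the last 12 columns, while `.iloc[:-12]` selects rows, which is why the bottom rows of this output still show NaN. If the column-wise behavior is what is wanted, a sketch on a toy frame might look like this (`n_tail` and the toy column names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Name": ["Farms", "Utilities", "Government"],
     "Farms": [5.0, np.nan, np.nan],
     "Utilities": [np.nan, 7.0, np.nan],
     "Trade margins": [1.0, np.nan, np.nan]}
)

n_tail = 1  # number of trailing non-industry columns to leave untouched
industry_cols = toy.columns[1:-n_tail]   # skip 'Name' and the tail columns

filled = toy.copy()
filled[industry_cols] = filled[industry_cols].fillna(0)
print(filled)
```

For the supply table, `n_tail` would presumably be 12, but that should be checked against the actual column list first.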


# Getting the Futures Data


In [3]:
import os
import sys
import yfinance as yf
from datetime import datetime
from pathlib import Path
py_path = Path(os.getcwd()).parent
future_path = str(py_path) + '/data/processed/futures.csv'

In [None]:
# get the data with yfinance api call
futures = yf.download(tickers='SB=F GC=F SI=F CL=F ZC=F ZO=F ZS=F HE=F LE=F CC=F KC=F LBS=F ZB=F NG=F')
df = futures['Open']

# rename the columns with more readable names
df = df.rename(columns={'SB=F': 'sugar', 'GC=F': 'gold', 'SI=F': 'silver',
                   'CL=F': 'crude', 'ZC=F': 'corn', 'ZO=F': 'oat',
                   'ZS=F': 'soybean', 'HE=F': 'lean_hog', 'LE=F': 'live_cattle',
                   'CC=F': 'cocoa', 'KC=F': 'coffee', 'LBS=F': 'lumber',
                   'ZB=F': 'treasury_bond', 'NG=F': 'natural_gas'})

# linearly interpolate the missing entries in each futures series
df = df[df.index >= datetime(2002, 3, 4)].interpolate()

# send to csv in the processed data bin
df.to_csv(future_path)
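To see what the filter-then-interpolate step does, here is a sketch on a toy price frame. The dates and prices are made up, and no yfinance call is involved.

```python
from datetime import datetime

import numpy as np
import pandas as pd

idx = pd.date_range("2002-03-01", periods=6, freq="D")
prices = pd.DataFrame(
    {"crude": [20.0, 21.0, np.nan, 23.0, np.nan, 25.0],
     "corn": [np.nan, 2.0, 2.5, np.nan, np.nan, 4.0]},
    index=idx,
)

# Keep rows on or after the cutoff, then fill interior gaps linearly.
recent = prices[prices.index >= datetime(2002, 3, 2)].interpolate()
print(recent)
```

The default `method='linear'` treats rows as equally spaced, so it ignores uneven gaps such as weekends; `method='time'` is an alternative for a datetime index. Leading NaNs (a series that starts after the cutoff) are not filled by default.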

# Works Cited

Papers Libor gave us:
- Describes the graph-theory stuff w/ BEA data: https://www.nber.org/papers/w21344
- Describes Cobb-Douglas setup w/ graph theory stuff: https://economics.mit.edu/files/12671

Other sources:
- Where I will be pulling the futures and economic data if we are doing this problem: https://finaeon.globalfinancialdata.com
- Describes how to set up the supply chain problem as an optimization problem: https://supernet.isenberg.umass.edu/articles/440rev2.pdf
- The first paper on the topic, I believe: https://www.kellogg.northwestern.edu/research/math/papers/1098.pdf
- Pg. 292 onward describes how to set up the graph model, statistically speaking: https://www.ucl.ac.uk/~uctpand/econometrics_of_network_models_2017.pdf
- Describes a multivariate time series forecasting method: https://arxiv.org/pdf/2103.07719.pdf ((Andrej) Do we *need* a deep neural net for time series?)