# CSCI 184 Machine Learning Final Project: Inflation Detector
# Group Members: Arjun Chimni, Patrick Callahan, Andrew Schulz


## Project Idea: 	
  >With the recent CPI numbers coming out and a recession on the loom, we wanted to identify what conditions cause inflation before the inflation happens. We think that this topic is especially pertinent at the current moment. We want to identify the features/conditions under which inflation happens and try to predict if there will be inflation to the money supply. 
  
  >To do this, we will analyze employment data, CPI (consumer price index), PPI (producer price index), and confidence indexes in order to obtain an idea about what causes inflation and how to detect it early. We will train historical data of all these features to draw conclusions.
  
  > Ideally, the program will be able to suggest some sort of action in order to combat inflation or just general advice for the average consumer heading into a high inflation landscape.



## Step 1: Upload the the .csv files that we will be using for the project
> Here we used data from the following sources: 
* https://www.kaggle.com/datasets/varpit94/us-inflation-data-updated-till-may-2021 : U.S. Inflation Data
* https://www.kaggle.com/datasets/federalreserve/interest-rates : Federal Reserve INterest Rates (1954-Present)
* https://www.kaggle.com/datasets/ambrosm/oecd-consumer-confidence-index : OECD Consumer Confidence Index
    

In [231]:
import pandas as pd
import csv

#opening the data and initializing the variables associated with the data. 
inflation_dt = pd.read_csv("US CPI.csv")
fed_res_inrts = pd.read_csv("index.csv")
cus_conf_index = pd.read_csv("cus_conf_index.csv")


In [232]:
# Now we want to initially convert the data into a dataframe so we can explore how we should begin preprocessing the data. 

inflation_dt_df = pd.DataFrame(data = inflation_dt)
fed_res_inrts_df = pd.DataFrame(data = fed_res_inrts)
cus_conf_index_df = pd.DataFrame(data = cus_conf_index)

In [233]:
# Now that the data is of the dataframe type, we can explore the features and understand the target variable that we are looking to predict
print("These are the columns and datatypes for 'U.S. Inflation Data' dataset: ")
print(inflation_dt_df.dtypes)
print("The shape of the dataframe is: ", inflation_dt_df.shape)
print()
print("These are the columns and datatypes for 'Federal Reserve INterest Rates (1954-Present)' dataset: ")
print(fed_res_inrts_df.dtypes)
print("The shape of the dataframe is: ", fed_res_inrts_df.shape)
print()
print("These are the columns and datatypes for 'OECD Consumer Confidence Index' dataset: ")
print(cus_conf_index_df.dtypes)
print("The shape of the dataframe is: ", cus_conf_index_df.shape)
print()

These are the columns and datatypes for 'U.S. Inflation Data' dataset: 
Yearmon     object
CPI        float64
dtype: object
The shape of the dataframe is:  (1303, 2)

These are the columns and datatypes for 'Federal Reserve INterest Rates (1954-Present)' dataset: 
Year                              int64
Month                             int64
Day                               int64
Federal Funds Target Rate       float64
Federal Funds Upper Target      float64
Federal Funds Lower Target      float64
Effective Federal Funds Rate    float64
Real GDP (Percent Change)       float64
Unemployment Rate               float64
Inflation Rate                  float64
dtype: object
The shape of the dataframe is:  (904, 10)

These are the columns and datatypes for 'OECD Consumer Confidence Index' dataset: 
LOCATION       object
INDICATOR      object
SUBJECT        object
MEASURE        object
FREQUENCY      object
TIME           object
Value         float64
Flag Codes    float64
dtype: object
The s

## Step 2: Merging the Data
> Before we begin with full preprocessing, we want to merge all the data sets together into one ordered dataset so we can observe the power of all the features when building out our predictive model. 

> To do this, we noticed that all the datasets have a date/time feature that we can use to align the datasets and match the data. The following code is the process we took to do this. 

In [234]:
# To complete the merge, we must first observe how each date is formated in each dataset
print("The 'time' feature of dataset 'U.S. Inflation Data' is: ")
print(inflation_dt_df['Yearmon'].head)
print("The 'time' feature of dataset 'Federal Reserve INterest Rates (1954-Present)' is: ")
print(fed_res_inrts_df['Year'].head, fed_res_inrts_df['Month'].head)
print("The 'time' feature of dataset 'OECD Consumer Confidence Index' is: ")
print(cus_conf_index_df['TIME'].head)

The 'time' feature of dataset 'U.S. Inflation Data' is: 
<bound method NDFrame.head of 0       01-01-1913
1       01-02-1913
2       01-03-1913
3       01-04-1913
4       01-05-1913
           ...    
1298    01-03-2021
1299    01-04-2021
1300    01-05-2021
1301    01-06-2021
1302    01-07-2021
Name: Yearmon, Length: 1303, dtype: object>
The 'time' feature of dataset 'Federal Reserve INterest Rates (1954-Present)' is: 
<bound method NDFrame.head of 0      1954
1      1954
2      1954
3      1954
4      1954
       ... 
899    2016
900    2017
901    2017
902    2017
903    2017
Name: Year, Length: 904, dtype: int64> <bound method NDFrame.head of 0       7
1       8
2       9
3      10
4      11
       ..
899    12
900     1
901     2
902     3
903     3
Name: Month, Length: 904, dtype: int64>
The 'time' feature of dataset 'OECD Consumer Confidence Index' is: 
<bound method NDFrame.head of 0        1973-01
1        1973-02
2        1973-03
3        1973-04
4        1973-05
          ...

In [235]:
# With this, we can now structure the data for each dataset into a new column named 'Year-Month' with the format to be YYYY-MM. This will allow easier matching when we perform the merging process. 
# The easiest way to perform restructuring for the data is to create a function that perform this process that can be called for each dataset:
    #PARAMETERS: 
    #           --> dataset = the given dataset that the process will be performed on 
    #           --> columns = the necessary columns that will be used in the process
    #RETURN: 
    #           --> The inputted dataset with a new column named 'Year-Month' for each dataset and the columns that were inputted are now dropped. 

import datetime

def restructure(dataset, columns):
    if len(columns) > 1: 
        dataset[columns[0]] = dataset[columns[0]].astype(str)
        dataset['Temp'] = dataset[columns[1]].astype(str)
        for i in dataset['Temp'].index: 
            if len(dataset['Temp'].iloc[i]) < 2:  
                dataset[columns[1]].iloc[i] = '0' + dataset['Temp'].iloc[i]
            else: 
                dataset[columns[1]].iloc[i] = dataset['Temp'].iloc[i]
        dataset['Year-Month'] = dataset[columns[0]] + "-" + dataset[columns[1]]
        del dataset['Temp']
        del dataset[columns[0]]
        del dataset[columns[1]]
    else: 
        format = "%Y-%m"
        res = True
        try: 
            res = bool(datetime.datetime.strptime(dataset[columns[0]][0], format))
        except: 
            res = False
        print(str(res))
        if res == False:
            dataset['Year-Month'] = 0
            for i in dataset[columns[0]].index:
                date = datetime.datetime.strptime(dataset[columns[0]].iloc[i], '%m-%d-%Y') 
                dataset['Year-Month'].iloc[i] = datetime.date.strftime(date, "%Y-%m")
        else:
            dataset['Year-Month'] = dataset[columns[0]]
        del dataset[columns[0]]
        
    return dataset

In [236]:
restructure(fed_res_inrts_df, ['Year', 'Month']) #Run the function on the first dataset

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataset[columns[1]].iloc[i] = '0' + dataset['Temp'].iloc[i]


Unnamed: 0,Day,Federal Funds Target Rate,Federal Funds Upper Target,Federal Funds Lower Target,Effective Federal Funds Rate,Real GDP (Percent Change),Unemployment Rate,Inflation Rate,Year-Month
0,1,,,,0.80,4.6,5.8,,1954-07
1,1,,,,1.22,,6.0,,1954-08
2,1,,,,1.06,,6.1,,1954-09
3,1,,,,0.85,8.0,5.7,,1954-10
4,1,,,,0.83,,5.3,,1954-11
...,...,...,...,...,...,...,...,...,...
899,14,,0.75,0.50,,,,,2016-12
900,1,,0.75,0.50,0.65,,4.8,2.3,2017-01
901,1,,0.75,0.50,0.66,,4.7,2.2,2017-02
902,1,,0.75,0.50,,,,,2017-03


In [237]:
restructure(inflation_dt_df, ['Yearmon']) #Run the function on the second dataset

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataset['Year-Month'].iloc[i] = datetime.date.strftime(date, "%Y-%m")


False


Unnamed: 0,CPI,Year-Month
0,9.800,1913-01
1,9.800,1913-01
2,9.800,1913-01
3,9.800,1913-01
4,9.700,1913-01
...,...,...
1298,264.877,2021-01
1299,267.054,2021-01
1300,269.195,2021-01
1301,271.696,2021-01


In [238]:
restructure(cus_conf_index_df, ['TIME']) #Run the function on the 3rd dataset

True


Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,Value,Flag Codes,Year-Month
0,NLD,CCI,AMPLITUD,LTRENDIDX,M,101.50280,,1973-01
1,NLD,CCI,AMPLITUD,LTRENDIDX,M,101.48150,,1973-02
2,NLD,CCI,AMPLITUD,LTRENDIDX,M,101.30810,,1973-03
3,NLD,CCI,AMPLITUD,LTRENDIDX,M,101.01730,,1973-04
4,NLD,CCI,AMPLITUD,LTRENDIDX,M,100.84560,,1973-05
...,...,...,...,...,...,...,...,...
18221,CRI,CCI,AMPLITUD,LTRENDIDX,M,98.93296,,2021-07
18222,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.29540,,2021-08
18223,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.49060,,2021-09
18224,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.57128,,2021-10


In [239]:
# Now that all the dates are in the same format, we can start to compare the datasets to combine them into one. 
# We will initially add the column names of the other datasets to the fed_res_inrts_df dataset



In [247]:
merge_1 = pd.merge(fed_res_inrts_df, cus_conf_index_df, on="Year-Month")
        

In [248]:
final_data = pd.merge(merge_1, inflation_dt_df, on="Year-Month")

In [249]:
final_data # The final merged dataset

Unnamed: 0,Day,Federal Funds Target Rate,Federal Funds Upper Target,Federal Funds Lower Target,Effective Federal Funds Rate,Real GDP (Percent Change),Unemployment Rate,Inflation Rate,Year-Month,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,Value,Flag Codes,CPI
0,1,,,,3.99,9.2,5.2,2.0,1960-01,USA,CCI,AMPLITUD,LTRENDIDX,M,101.68200,,29.300
1,1,,,,3.99,9.2,5.2,2.0,1960-01,USA,CCI,AMPLITUD,LTRENDIDX,M,101.68200,,29.400
2,1,,,,3.99,9.2,5.2,2.0,1960-01,USA,CCI,AMPLITUD,LTRENDIDX,M,101.68200,,29.400
3,1,,,,3.99,9.2,5.2,2.0,1960-01,USA,CCI,AMPLITUD,LTRENDIDX,M,101.68200,,29.500
4,1,,,,3.99,9.2,5.2,2.0,1960-01,USA,CCI,AMPLITUD,LTRENDIDX,M,101.68200,,29.500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19951,1,,0.75,0.5,0.65,,4.8,2.3,2017-01,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.48966,,245.519
19952,1,,0.75,0.5,0.65,,4.8,2.3,2017-01,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.48966,,246.819
19953,1,,0.75,0.5,0.65,,4.8,2.3,2017-01,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.48966,,246.663
19954,1,,0.75,0.5,0.65,,4.8,2.3,2017-01,CRI,CCI,AMPLITUD,LTRENDIDX,M,99.48966,,246.669
