# Learning Model
## Objective

This notebook builds a prediction model on top of the data processing done in the other notebooks.

## Data Loading and Preprocessing
- **Data Sources**: CSV files produced by the previous notebooks.
- **Preprocessing Steps**:
  - Load data from the CSV files and remove unused columns.
  - Assign meaningful column names based on the data structure.
  - Convert the 'Date' column to datetime format for easier manipulation.
  - Merge the datasets and give the attributes meaningful names.

## Analysis Overview
- **Prediction**:
  - Created a prediction model to predict whether the gas price will go up or down based on the month.
  - Created a prediction model to predict whether the gas price will go up or down based on inflation, the current president's party, month, year, and any current world event.


## Results
- **Prediction**:
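  - Received about 60 percent accuracy when predicting the direction of the gas price from the month alone.
  - Received about 55 percent accuracy when predicting the direction of the gas price from inflation, the current president's party, month, year, and any current world event. The run recorded in this notebook scored about 50 percent (0.497); results vary between runs because the train/test split is random.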
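
To put these accuracies in context, it can help to compare them with a baseline that always predicts the most common direction. The sketch below uses scikit-learn's `DummyClassifier` on synthetic labels; the data, the variable names, and the 55/45 class split are illustrative assumptions, not values from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the notebook's data (an assumption, not the real gas prices)
rng = np.random.default_rng(0)
months = pd.DataFrame({"Month": rng.integers(1, 13, size=200)})
direction = pd.Series(rng.choice(["Up", "Down"], size=200, p=[0.55, 0.45]))

# A baseline that ignores the features and always predicts the most frequent label
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(months, direction)
baseline_accuracy = accuracy_score(direction, baseline.predict(months))

# The baseline accuracy equals the share of the majority class
majority_share = direction.value_counts(normalize=True).max()
print("Majority-class baseline accuracy:", baseline_accuracy)
```

A model is only adding information if its accuracy is clearly above this baseline on the same rows.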



## Conclusion
- The month carries some signal about whether the gas price will go up or down; adding the other features did not improve accuracy.

## Future Steps to Improve the Model
- Investigate which additional features or modeling choices would increase the accuracy of the prediction model.
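
One possible direction, sketched here as an assumption rather than something this notebook does, is to score the model with time-ordered cross-validation so that each fold trains only on earlier months. The data and the names `X_demo`, `y_demo`, and `model_demo` are synthetic placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic monthly data in chronological order (an assumption, not the real dataset)
rng = np.random.default_rng(0)
n_months = 240
X_demo = pd.DataFrame({"Month": np.arange(n_months) % 12 + 1})
y_demo = pd.Series(rng.choice(["Up", "Down"], size=n_months))

# Each split trains on an initial stretch of months and tests on the months that follow
cv = TimeSeriesSplit(n_splits=5)
model_demo = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(model_demo, X_demo, y_demo, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Averaging over several folds gives a steadier estimate than a single random split, which makes it easier to tell whether a new feature actually helps.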





In [19]:
import pandas as pd
import matplotlib.pyplot as plt

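df = pd.read_csv("Data/Data1_Regular_Conventional.csv", header=2)
# Keep the first two columns and drop the last row; copy so later assignments do not act on a slice
df = df.iloc[:-1, :2].copy()
# Rename the columns first so the steps below do not depend on the raw header names
df.columns = ['Date', 'Gas Price']
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year

# Keep the first observation of each month; each row is labeled with the month-end date
df = df.set_index('Date').resample('M').first().reset_index()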
df

Unnamed: 0,Date,Gas Price,Month,Year
0,1990-08-31,1.191,8,1990
1,1990-09-30,1.242,9,1990
2,1990-10-31,1.321,10,1990
3,1990-11-30,1.334,11,1990
4,1990-12-31,1.341,12,1990
...,...,...,...,...
400,2023-12-31,3.104,12,2023
401,2024-01-31,2.966,1,2024
402,2024-02-29,3.021,2,2024
403,2024-03-31,3.243,3,2024


In [51]:
df['Price Change'] = df['Gas Price'].diff()

# Label price changes as 'Up', 'Down', or 'No Change'
# (the first row has no previous price, so its NaN change also falls through to 'No Change')
def label_change(change):
    if change > 0:
        return 'Up'
    elif change < 0:
        return 'Down'
    else:
        return 'No Change'

# Apply the function to create the 'Change Direction' column
df['Change Direction'] = df['Price Change'].apply(label_change)


# Display the DataFrame with the new columns
df

Unnamed: 0,Date,Gas Price,Month,Year,Price Change,Change Direction
0,1990-08-31,1.191,8,1990,,No Change
1,1990-09-30,1.242,9,1990,0.051,Up
2,1990-10-31,1.321,10,1990,0.079,Up
3,1990-11-30,1.334,11,1990,0.013,Up
4,1990-12-31,1.341,12,1990,0.007,Up
...,...,...,...,...,...,...
400,2023-12-31,3.104,12,2023,-0.141,Down
401,2024-01-31,2.966,1,2024,-0.138,Down
402,2024-02-29,3.021,2,2024,0.055,Up
403,2024-03-31,3.243,3,2024,0.222,Up
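
Before training, it may be worth checking how the labels are distributed and dropping the first row, whose change is undefined. The sketch below reuses the same labeling rule on a tiny frame; the first three prices come from the table above, while the last two and the name `labeled` are hypothetical.

```python
import pandas as pd

# First three prices are from the table above; the last two are made up for illustration
prices = pd.DataFrame({"Gas Price": [1.191, 1.242, 1.321, 1.321, 1.300]})
prices["Price Change"] = prices["Gas Price"].diff()

def label_change(change):
    """Map a numeric price change to 'Up', 'Down', or 'No Change'."""
    if change > 0:
        return "Up"
    elif change < 0:
        return "Down"
    return "No Change"

# Drop the first row, whose change is NaN, instead of labeling it 'No Change'
labeled = prices.dropna(subset=["Price Change"]).copy()
labeled["Change Direction"] = labeled["Price Change"].apply(label_change)

# Count how many rows fall in each class
counts = labeled["Change Direction"].value_counts()
print(counts)
```

If one class dominates the counts, a plain accuracy figure can look better than the model really is.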


In [50]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# df holds the monthly gas prices together with the 'Change Direction' label

# Define the feature (month) and the target variable (change direction)
X = df[['Month']]
Y = df['Change Direction']

# Split the data into training and testing sets.
# Note: test_size=0.8 holds out 80% of the rows for testing and trains on only 20%;
# the split is random, so the accuracy varies between runs.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.8)

# Initialize and train the model (Random Forest Classifier)
model = RandomForestClassifier()
model.fit(X_train, Y_train)

# Predict on the testing set
y_pred = model.predict(X_test)

# Evaluate model performance (accuracy)
accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.5987654320987654
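
Because the split above is random, the printed accuracy changes from run to run. A minimal sketch of a reproducible alternative follows, assuming a more conventional 80/20 train/test split, a fixed `random_state`, and stratification on the label. These choices and the synthetic data are assumptions for illustration, not what the cell above does.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic features and labels (an assumption, not the notebook's data)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({"Month": np.arange(100) % 12 + 1})
y_demo = pd.Series(np.where(rng.random(100) < 0.5, "Up", "Down"))

# test_size=0.2 trains on 80% of the rows; random_state fixes the shuffle;
# stratify keeps the Up/Down proportions similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Repeating the call with the same random_state returns the same rows
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(len(X_tr), len(X_te), X_te.index.equals(X_te2.index))
```

Fixing the seed does not make the model better, but it makes two runs comparable when a feature is added or removed.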


In [22]:
# adding inflation

new_headers = ['Date', 'Inflation Rate']

df_inflation = pd.read_csv("Data/Inflation_rate_in_US_yearly.csv")
df_inflation.columns = new_headers
df_inflation['Date'] = pd.to_datetime(df_inflation['Date'])
df_inflation['Year'] = df_inflation['Date'].dt.year
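# Keep 1990 onward and drop 'Date' without modifying a filtered slice in place
df_inflation = df_inflation[df_inflation['Year'] >= 1990].drop(columns=['Date'])

# Attach the yearly inflation rate to every monthly row of the same year
merged_df = pd.merge(df, df_inflation, on='Year', how='inner')
merged_df = merged_df.reset_index(drop=True)


# Inflation is yearly, so the month-to-month change is 0 except where the year changes
merged_df['Inflation Change'] = merged_df['Inflation Rate'].diff()
merged_df['Inflation Direction'] = merged_df['Inflation Change'].apply(label_change)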


# adding world events

iraq_war_start, iraq_war_end = '2003-01-01', '2003-12-31'
financial_crisis_start, financial_crisis_end = '2007-12-01', '2009-06-30'
covid_start, covid_end = '2020-01-01', '2023-12-31'  # Adjust end date as per your data availability

# Create new columns for world events
merged_df['Iraq War'] = (merged_df['Date'] >= iraq_war_start) & (merged_df['Date'] <= iraq_war_end)
merged_df['Financial Crisis'] = (merged_df['Date'] >= financial_crisis_start) & (merged_df['Date'] <= financial_crisis_end)
merged_df['COVID-19 Pandemic'] = (merged_df['Date'] >= covid_start) & (merged_df['Date'] <= covid_end)

# Convert boolean values to integers (0 or 1)
merged_df[['Iraq War', 'Financial Crisis', 'COVID-19 Pandemic']] = merged_df[['Iraq War', 'Financial Crisis', 'COVID-19 Pandemic']].astype(int)




# adding presidential terms

presidential_terms = {
    'Democrat': [(1993, 1996), (1997, 2000), (2009, 2012), (2013, 2016), (2021, 2024)],
    'Republican': [(1989, 1992), (2001, 2004), (2005, 2008), (2017, 2020)]
}

# Create new columns for presidential terms
merged_df['Democrat Term'] = 0
merged_df['Republican Term'] = 0

# Iterate through the presidential terms dictionary
for party, terms in presidential_terms.items():
    for term_start, term_end in terms:
        merged_df.loc[(merged_df['Year'] >= term_start) & (merged_df['Year'] <= term_end), f'{party} Term'] = 1

# Display the modified DataFrame
merged_df



Unnamed: 0,Date,Gas Price,Month,Year,Price Change,Change Direction,Inflation Rate,Inflation Change,Inflation Direction,Iraq War,Financial Crisis,COVID-19 Pandemic,Democrat Term,Republican Term
0,1990-08-31,1.191,8,1990,,No Change,5.397956,,No Change,0,0,0,0,1
1,1990-09-30,1.242,9,1990,0.051,Up,5.397956,0.0,No Change,0,0,0,0,1
2,1990-10-31,1.321,10,1990,0.079,Up,5.397956,0.0,No Change,0,0,0,0,1
3,1990-11-30,1.334,11,1990,0.013,Up,5.397956,0.0,No Change,0,0,0,0,1
4,1990-12-31,1.341,12,1990,0.007,Up,5.397956,0.0,No Change,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
384,2022-08-31,4.034,8,2022,-0.585,Down,8.002800,0.0,No Change,0,0,1,1,0
385,2022-09-30,3.617,9,2022,-0.417,Down,8.002800,0.0,No Change,0,0,1,1,0
386,2022-10-31,3.592,10,2022,-0.025,Down,8.002800,0.0,No Change,0,0,1,1,0
387,2022-11-30,3.628,11,2022,0.036,Up,8.002800,0.0,No Change,0,0,1,1,0


In [23]:
X = merged_df[['Month', 'Year', 'Inflation Rate', 'Iraq War', 'Financial Crisis', 'COVID-19 Pandemic', 'Democrat Term', 'Republican Term']]
Y = merged_df['Change Direction']

# Split the data into training and testing sets.
# Note: test_size=0.8 holds out 80% of the rows for testing and trains on only 20%.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.8)

# Initialize and train the model (Random Forest Classifier)
model = RandomForestClassifier()
model.fit(X_train, Y_train)

# Predict on the testing set
y_pred = model.predict(X_test)

# Evaluate model performance (accuracy)
accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.4967948717948718
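
When a larger feature set does not improve accuracy, inspecting the forest's feature importances can hint at which columns the model actually relies on. The sketch below is an illustration on synthetic data in which the label is deliberately tied to the month; the frame, the labels, and the name `forest` are assumptions, not the notebook's model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features (an assumption); only 'Month' is related to the label here
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "Month": rng.integers(1, 13, size=n),
    "Year": rng.integers(1990, 2024, size=n),
    "Inflation Rate": rng.uniform(0, 8, size=n),
})
y_demo = np.where(X_demo["Month"] <= 6, "Up", "Down")

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Impurity-based importances are non-negative and sum to 1 across features
importances = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values, so they are best read as a rough guide rather than a ranking to trust blindly.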
