## Purpose
This notebook applies ML model 1 (originally built for the data in the `same_date` folder) to a new data set that pairs each month's food price data with the DJIA's movement in the following month. As noted in the Jupyter Notebook in the `same_date` folder where the model was built, it performed moderately well there, so as a first step it is applied here without any adjustments.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [2]:
file_path = "ML1_pred_data.csv"
price_df = pd.read_csv(file_path)
price_df

,date_time,Beef $/LB,Beef_Pct_Change,Wheat_Price,CPI_Price,Milk Cost per Gallon,Next_Month_DJIA_Change
0,1995-07-01,1.365,0.024006,1.147,138.200,2.477,0
1,1995-08-01,1.328,-0.027106,1.161,138.800,2.482,1
2,1995-09-01,1.376,0.036145,1.159,139.500,2.459,0
3,1995-10-01,1.371,-0.003634,1.175,140.600,2.473,1
4,1995-11-01,1.368,-0.002188,1.169,141.000,2.493,1
...,...,...,...,...,...,...,...
325,2022-08-01,4.937,0.008992,2.298,317.433,4.194,0
326,2022-09-01,4.862,-0.015191,2.362,318.374,4.181,1
327,2022-10-01,4.836,-0.005348,2.386,319.917,4.184,1
328,2022-11-01,4.853,0.003515,2.419,320.034,4.218,0


In [3]:
# Make datetime the index
price_df = price_df.set_index("date_time")
price_df.head()

date_time,Beef $/LB,Beef_Pct_Change,Wheat_Price,CPI_Price,Milk Cost per Gallon,Next_Month_DJIA_Change
1995-07-01,1.365,0.024006,1.147,138.2,2.477,0
1995-08-01,1.328,-0.027106,1.161,138.8,2.482,1
1995-09-01,1.376,0.036145,1.159,139.5,2.459,0
1995-10-01,1.371,-0.003634,1.175,140.6,2.473,1
1995-11-01,1.368,-0.002188,1.169,141.0,2.493,1


In [4]:
# Separate features from target

# The target is whether the DJIA went up or down
y = price_df["Next_Month_DJIA_Change"]

# Features are all other data
X = price_df.drop(columns="Next_Month_DJIA_Change")

In [5]:
# Split into training and testing sets
# First try without stratifying data
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [6]:
# Create logistic regression model
classifier = LogisticRegression(solver='lbfgs', max_iter=200)

classifier.fit(X_train, y_train)

LogisticRegression(max_iter=200)

In [7]:
# Make predictions
y_pred = classifier.predict(X_test)
results_df = pd.DataFrame({"Prediction": y_pred, "Actual": y_test}).reset_index(drop=True)
results_df.head(20)

,Prediction,Actual
0,1,1
1,1,0
2,1,0
3,1,1
4,1,1
5,1,1
6,1,1
7,1,1
8,1,0
9,1,1


In [8]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

0.6024096385542169


In [9]:
from sklearn.metrics import confusion_matrix, classification_report

matrix = confusion_matrix(y_test, y_pred)
print(matrix)

[[ 4 30]
 [ 3 46]]


In [10]:
report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

           0       0.57      0.12      0.20        34
           1       0.61      0.94      0.74        49

    accuracy                           0.60        83
   macro avg       0.59      0.53      0.47        83
weighted avg       0.59      0.60      0.51        83



## Notes:
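This model performed slightly worse at predicting DJIA movement one month ahead than it did at predicting DJIA movement for the same month. Here its accuracy was about 60%, compared to the 65% it received on the `same_date` data. Notably, precision went up in the negative (`0`) category. However, recall for that category is only 0.12: the model predicted `1` for 76 of the 83 test rows, so its accuracy is barely above the 59% (49 of 83) that always predicting `1` would achieve.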
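
To put the 60% accuracy in context, the figures in the classification report can be recomputed directly from the confusion matrix printed earlier, alongside the accuracy of always predicting the most common class. This is a minimal sketch that hard-codes that matrix; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Confusion matrix copied from the notebook output:
# rows = actual class (0, 1), columns = predicted class (0, 1)
matrix = np.array([[4, 30],
                   [3, 46]])

total = matrix.sum()

# Correct predictions sit on the diagonal
accuracy = np.trace(matrix) / total

# Accuracy of a naive baseline that always predicts the most common actual class
majority_baseline = matrix.sum(axis=1).max() / total

# Precision divides by column totals (predicted), recall by row totals (actual)
precision = np.diag(matrix) / matrix.sum(axis=0)
recall = np.diag(matrix) / matrix.sum(axis=1)

print(f"accuracy:          {accuracy:.3f}")
print(f"majority baseline: {majority_baseline:.3f}")
print(f"precision (0, 1):  {precision.round(2)}")
print(f"recall (0, 1):     {recall.round(2)}")
```

The gap between `accuracy` and `majority_baseline` is a more informative measure of what the model has learned than accuracy alone, since the test set contains more `1` rows than `0` rows.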

### Suggestions to try to improve the model

- increase maximum iterations
- stratify the testing and training data
- include percent change data for milk, wheat, and food CPI prices
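
The three suggestions above could be combined roughly as follows. This is only a sketch under stated assumptions: `demo_df` is randomly generated placeholder data that reuses the column names from `ML1_pred_data.csv`, the new `*_Pct_Change` column names are hypothetical, and `max_iter=1000` and `random_state=1` are arbitrary choices rather than tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for ML1_pred_data.csv (random values, not real prices)
rng = np.random.default_rng(0)
n_rows = 120
demo_df = pd.DataFrame({
    "Beef $/LB": rng.uniform(1.3, 5.0, n_rows),
    "Beef_Pct_Change": rng.normal(0.0, 0.02, n_rows),
    "Wheat_Price": rng.uniform(1.1, 2.5, n_rows),
    "CPI_Price": rng.uniform(138.0, 320.0, n_rows),
    "Milk Cost per Gallon": rng.uniform(2.4, 4.3, n_rows),
    "Next_Month_DJIA_Change": rng.integers(0, 2, n_rows),
})

# Add month-over-month percent change for wheat, CPI, and milk
# (the new column names are hypothetical)
pct_columns = {
    "Wheat_Price": "Wheat_Pct_Change",
    "CPI_Price": "CPI_Pct_Change",
    "Milk Cost per Gallon": "Milk_Pct_Change",
}
for source_col, new_col in pct_columns.items():
    demo_df[new_col] = demo_df[source_col].pct_change()

# The first row has no previous month, so its percent changes are NaN
demo_df = demo_df.dropna()

y = demo_df["Next_Month_DJIA_Change"]
X = demo_df.drop(columns="Next_Month_DJIA_Change")

# Stratify so the training and testing sets keep a similar share of 0s and 1s;
# random_state is fixed only to make this sketch repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)

# Raise the iteration cap above the 200 used earlier (1000 is arbitrary)
classifier = LogisticRegression(solver="lbfgs", max_iter=1000)
classifier.fit(X_train, y_train)

# Share of class 1 in each split; stratifying should make these close
print(y_train.mean(), y_test.mean())
```

Because the placeholder data is random, any score this sketch produces says nothing about the real data set; it only shows where each change would slot into the existing cells.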