## Purpose
This notebook fits a logistic regression model and prints its accuracy score, confusion matrix, and classification report.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [2]:
file_path = "../Edited Data/Output/all_price_data_withdatetime.csv"
price_df = pd.read_csv(file_path)
price_df

,date_time,Beef $/LB,Beef_Pct_Change,Wheat_Price,CPI_Price,Milk Cost per Gallon,DJIA_change
0,1995-07-01,1.365,0.024006,1.147,138.200,2.477,1
1,1995-08-01,1.328,-0.027106,1.161,138.800,2.482,0
2,1995-09-01,1.376,0.036145,1.159,139.500,2.459,1
3,1995-10-01,1.371,-0.003634,1.175,140.600,2.473,0
4,1995-11-01,1.368,-0.002188,1.169,141.000,2.493,1
...,...,...,...,...,...,...,...
326,2022-09-01,4.862,-0.015191,2.362,318.374,4.181,0
327,2022-10-01,4.836,-0.005348,2.386,319.917,4.184,1
328,2022-11-01,4.853,0.003515,2.419,320.034,4.218,1
329,2022-12-01,4.800,-0.010921,2.419,322.507,4.211,0


In [3]:
# Make datetime the index
price_df = price_df.set_index("date_time")
price_df.head()

date_time,Beef $/LB,Beef_Pct_Change,Wheat_Price,CPI_Price,Milk Cost per Gallon,DJIA_change
1995-07-01,1.365,0.024006,1.147,138.2,2.477,1
1995-08-01,1.328,-0.027106,1.161,138.8,2.482,0
1995-09-01,1.376,0.036145,1.159,139.5,2.459,1
1995-10-01,1.371,-0.003634,1.175,140.6,2.473,0
1995-11-01,1.368,-0.002188,1.169,141.0,2.493,1


In [16]:
# Check the proportion of up months to down months
down_months = len(price_df.loc[price_df["DJIA_change"] == 0])
all_months = len(price_df)
up_months = all_months - down_months
print(down_months)
print(all_months)
print("----------")
prp_dwn = down_months / all_months
prp_up = up_months / all_months
print(f'The proportion of down months is {prp_dwn:.0%}.')
print(f'The proportion of up months is {prp_up:.0%}.')

125
331
----------
The proportion of down months is 38%.
The proportion of up months is 62%.


In [None]:
# The DJIA declined in slightly more than one third of the months in the dataset,
# so it rose in slightly less than two thirds of them.
# Because the classes are imbalanced, we may want to stratify the data in train_test_split.

In [4]:
# Separate features from target

# The target is whether the DJIA went up or down
y = price_df["DJIA_change"]

# Features are all other data
X = price_df.drop(columns="DJIA_change")

In [9]:
# Split into training and testing sets
# First try without stratifying data
X_train, X_test, y_train, y_test = train_test_split(X, y)
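
For comparison, a stratified split could look like the sketch below. It is not part of the original run: `X_demo` and `y_demo` are synthetic stand-ins with the same class counts as the dataset (125 down months out of 331), and `random_state=42` is an arbitrary value chosen only to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's data: 331 months, 125 of them down (0).
# In the notebook itself, X and y come from price_df; these values are illustrative.
y_demo = pd.Series([0] * 125 + [1] * 206, name="DJIA_change")
X_demo = pd.DataFrame({"feature": np.arange(len(y_demo))})

# stratify=y_demo keeps the class proportions nearly identical in both splits.
# random_state makes the split reproducible (42 is an arbitrary choice).
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=42
)

train_down = (y_train_s == 0).mean()
test_down = (y_test_s == 0).mean()
print(f"Down months in train: {train_down:.1%}")
print(f"Down months in test:  {test_down:.1%}")
```

With stratification, the share of down months in the test set should stay close to the share in the full dataset, which makes accuracy scores easier to compare against the always-up baseline.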

In [10]:
# Create logistic regression model
classifier = LogisticRegression(solver='lbfgs', max_iter=200)

classifier.fit(X_train, y_train)

LogisticRegression(max_iter=200)

In [11]:
# Make predictions
y_pred = classifier.predict(X_test)
results_df = pd.DataFrame({"Prediction": y_pred, "Actual": y_test}).reset_index(drop=True)
results_df.head(20)

,Prediction,Actual
0,1,1
1,1,1
2,1,1
3,0,0
4,1,1
5,1,1
6,1,0
7,1,1
8,1,1
9,1,1


In [12]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

0.6506024096385542


In [13]:
from sklearn.metrics import confusion_matrix, classification_report

In [14]:
matrix = confusion_matrix(y_test, y_pred)
print(matrix)

[[ 4 22]
 [ 7 50]]


In [15]:
report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

           0       0.36      0.15      0.22        26
           1       0.69      0.88      0.78        57

    accuracy                           0.65        83
   macro avg       0.53      0.52      0.50        83
weighted avg       0.59      0.65      0.60        83



## Notes:
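As noted above, about 62% of all months in the dataset are positive. That means if you had predicted that the DJIA would go up every month, you would have been right 62% of the time. With an accuracy score of 65%, the model looks slightly better than that baseline. The test split tells a different story, though: 57 of its 83 months (about 69%) were up months, so always predicting an increase would have beaten the model on the same data. The model also identified only 4 of the 26 down months (recall of 0.15), which suggests it is mostly predicting "up".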
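
The baseline comparison can also be checked on the test split itself rather than on the full dataset. The sketch below uses only the confusion matrix printed earlier and plain Python; the variable names are illustrative and not from the original notebook.

```python
# Confusion matrix printed above: rows are actual classes, columns are predictions.
matrix = [[4, 22],
          [7, 50]]

total = sum(sum(row) for row in matrix)   # number of test months
correct = matrix[0][0] + matrix[1][1]     # correct predictions lie on the diagonal
model_accuracy = correct / total

# Accuracy of always predicting the most frequent actual class in this test set.
actual_counts = [sum(row) for row in matrix]
baseline_accuracy = max(actual_counts) / total

print(f"Model accuracy:    {model_accuracy:.3f}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```

Because `train_test_split` was called without a fixed `random_state`, the class mix in the test set changes from run to run, so this comparison is worth repeating after each split.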

### Suggestions to try to improve the model

- increase maximum iterations
- stratify the testing and training data
- include percent-change data for milk, wheat, and food CPI prices
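
As a rough illustration of the last suggestion, percent-change columns could be derived with pandas `pct_change`, which matches how `Beef_Pct_Change` appears to be computed in the data (1.328 / 1.365 - 1 ≈ -0.027106). The frame below copies a few values from the first rows shown earlier, and the new column names are hypothetical.

```python
import pandas as pd

# Small illustrative frame; values copied from the first rows shown above.
demo_df = pd.DataFrame(
    {
        "Wheat_Price": [1.147, 1.161, 1.159, 1.175, 1.169],
        "CPI_Price": [138.2, 138.8, 139.5, 140.6, 141.0],
        "Milk Cost per Gallon": [2.477, 2.482, 2.459, 2.473, 2.493],
    },
    index=pd.to_datetime(
        ["1995-07-01", "1995-08-01", "1995-09-01", "1995-10-01", "1995-11-01"]
    ),
)

# Hypothetical new column names; the notebook does not define them.
pct_cols = {
    "Wheat_Price": "Wheat_Pct_Change",
    "CPI_Price": "CPI_Pct_Change",
    "Milk Cost per Gallon": "Milk_Pct_Change",
}
for source_col, new_col in pct_cols.items():
    # Month-over-month relative change: current / previous - 1
    demo_df[new_col] = demo_df[source_col].pct_change()

# The first row has no previous month, so its percent change is NaN.
# Drop it before fitting, since LogisticRegression does not accept missing values.
demo_df = demo_df.dropna()
print(demo_df[list(pct_cols.values())])
```

Whether these features help is an open question; they would need to be evaluated against the same baseline as the current model.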