#### Objective
* Develop a Logistic Regression model to predict whether it will rain tomorrow.

In [1]:
import pandas as pd
pd.set_option('display.max_rows', 150)

from sklearn.linear_model import LogisticRegression
from joblib import dump, load
from sklearn.metrics import accuracy_score

In [2]:
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

In [3]:
%%javascript 
//Disable autoscrolling to see entire graph
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

##### 1. Develop a Logistic Regression model with an arbitrary random_state.
* You can also choose the underlying optimization library by setting the `solver` parameter.
* The features and labels were split into training and testing sets with a 20% test size in an earlier step; the saved splits are read back here.


In [4]:
# Read training, test, labels
X_train = pd.read_pickle("../data/X_train.pkl")
X_test  = pd.read_pickle("../data/X_test.pkl")
y_train = pd.read_pickle("../data/y_train.pkl")
y_test  = pd.read_pickle("../data/y_test.pkl")
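
The pickled splits come from an earlier step that is not shown in this notebook. As a rough sketch of how a 20% test split could be produced, the cell below uses scikit-learn's `train_test_split` on synthetic data. The `_demo` names, the synthetic dataset, and the `stratify` option are assumptions made for illustration, not the original preparation code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real weather features and labels (assumption).
X_arr, y_arr = make_classification(n_samples=1000, n_features=5, random_state=0)
X_demo = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])
y_demo = pd.Series(y_arr, name="label")

# Hold out 20% of the rows for testing.
# stratify=y_demo keeps the class ratio similar in both parts (optional choice).
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(X_train_demo.shape, X_test_demo.shape)  # (800, 5) (200, 5)
```

Each part could then be saved with `DataFrame.to_pickle` and read back with `pd.read_pickle`, as done above.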

##### 2. Train the model with the prepared training features and labels.

In [5]:
log_regression = LogisticRegression(solver='liblinear', random_state=0)

In [6]:
log_regression.fit(X_train, y_train)

LogisticRegression(random_state=0, solver='liblinear')

##### 3. Predict the next day's rain forecast for the prepared testing data.
* Calculate the probabilities for negative and positive classes.

In [7]:
y_prediction_test = log_regression.predict(X_test)

In [8]:
y_prediction_test

array([0, 0, 0, ..., 0, 0, 1], dtype=int8)

In [9]:
# Probability of No rain (0)
log_regression.predict_proba(X_test)[:,0]

array([0.92399271, 0.87930502, 0.8637743 , ..., 0.98321549, 0.8307573 ,
       0.39542123])

In [10]:
# Probability of Yes rain (1)
log_regression.predict_proba(X_test)[:,1]

array([0.07600729, 0.12069498, 0.1362257 , ..., 0.01678451, 0.1692427 ,
       0.60457877])
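
The two probability columns and the hard predictions are closely related. The sketch below, which uses synthetic data and hypothetical `_demo` names rather than the weather dataset, checks that each row of `predict_proba` sums to 1 and that `predict` returns the class with the larger probability. It also shows how a custom threshold (0.3 here, an arbitrary value chosen for illustration) could be applied to the positive-class column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (assumption, not the weather data).
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
model_demo = LogisticRegression(solver="liblinear", random_state=0).fit(X_demo, y_demo)

proba = model_demo.predict_proba(X_demo)  # shape (n_samples, 2): columns follow classes_

# The two class probabilities in each row add up to 1.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# predict() corresponds to picking the class with the larger probability.
labels_from_proba = model_demo.classes_[np.argmax(proba, axis=1)]
matches_predict = np.array_equal(labels_from_proba, model_demo.predict(X_demo))

# A lower threshold on the positive class flags at least as many samples as positive.
positives_at_default = int((proba[:, 1] >= 0.5).sum())
positives_at_lower = int((proba[:, 1] >= 0.3).sum())

print(rows_sum_to_one, matches_predict)
print(positives_at_default, positives_at_lower)
```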

##### 4. Calculate the accuracy score of the model for the predicted results.

In [11]:
test_accuracy_score  = accuracy_score(y_test, y_prediction_test)

In [12]:
print(f"Accuracy Score is: {test_accuracy_score*100:.2f}%")

Accuracy Score is: 84.80%
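
Accuracy is the fraction of predictions that match the true labels. When one class is much more common than the other, it can help to compare the score with a baseline that always predicts the majority class. The sketch below uses a small set of hypothetical labels (not taken from the weather data) to show the calculation alongside a confusion matrix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 8 dry days (0) and 2 rainy days (1).
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# accuracy_score is the mean of element-wise matches: 8 of 10 correct.
acc_demo = accuracy_score(y_true_demo, y_pred_demo)
manual_acc_demo = (y_true_demo == y_pred_demo).mean()

# Baseline that always predicts the majority class (0).
baseline_acc_demo = accuracy_score(y_true_demo, np.zeros_like(y_true_demo))

# Rows are true classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

print(acc_demo, manual_acc_demo, baseline_acc_demo)  # 0.8 0.8 0.8
print(cm_demo)
```

In this toy case the model is no more accurate than the baseline, even though it does find one of the two rainy days. The confusion matrix makes that difference visible.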


##### 5. Interpret the model results by checking feature importance:
* Check the learned weights for each feature.
* Check the bias term.

In [13]:
# Learned weights for each feature
model_weights = log_regression.coef_[0]
# Bias (intercept)
model_bias = log_regression.intercept_[0]

In [14]:
# Create a dataframe with feature and weights for easier displaying
column_names = X_train.columns
data = {'Feature':column_names, 'Weights':model_weights}
df = pd.DataFrame(data=data)

In [15]:
df.head(117)

|   | Feature | Weights |
|---|---------|---------|
| 0 | MinTemp | 0.93356 |
| 1 | MaxTemp | -2.836504 |
| 2 | Rainfall | 1.499087 |
| 3 | Evaporation | 0.18593 |
| 4 | Sunshine | -1.5324 |
| 5 | WindGustSpeed | 4.117872 |
| 6 | WindSpeed9am | -0.251537 |
| 7 | WindSpeed3pm | -0.880586 |
| 8 | Humidity9am | 0.53821 |
| 9 | Humidity3pm | 5.782803 |
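
One simple way to read these weights is to rank the features by absolute value. The sketch below does this for the ten weights listed above, re-entered by hand into a hypothetical `weights_demo` frame. Comparing magnitudes is only meaningful if the features are on comparable scales, which is assumed here and not verified in this notebook.

```python
import pandas as pd

# The ten weights shown above, copied by hand for illustration.
weights_demo = pd.DataFrame({
    "Feature": ["MinTemp", "MaxTemp", "Rainfall", "Evaporation", "Sunshine",
                "WindGustSpeed", "WindSpeed9am", "WindSpeed3pm",
                "Humidity9am", "Humidity3pm"],
    "Weights": [0.93356, -2.836504, 1.499087, 0.18593, -1.5324,
                4.117872, -0.251537, -0.880586, 0.53821, 5.782803],
})

# Order rows by the magnitude of the weight, largest first.
order = weights_demo["Weights"].abs().sort_values(ascending=False).index
ranked_demo = weights_demo.loc[order].reset_index(drop=True)

print(ranked_demo.head(3))
```

Among these ten features, `Humidity3pm` and `WindGustSpeed` carry the largest positive weights, while `MaxTemp` has the largest negative one. The sign indicates whether a higher feature value pushes the prediction toward rain (positive) or away from it (negative).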


In [16]:
# print bias
print(f"The bias (intercept) for this model is: {model_bias}")

The bias (intercept) for this model is: -3.8766223644030373
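
The weights and the bias combine into a probability through the logistic (sigmoid) function: the positive-class probability is `sigmoid(w · x + b)`. The sketch below checks this against `predict_proba` on synthetic data with hypothetical `_demo` names. It also evaluates the sigmoid of the bias printed above, which is the probability the model would assign to a row whose features are all zero. What such a row means depends on how the features were scaled, which is not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary classification data (assumption, not the weather data).
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
model_demo = LogisticRegression(solver="liblinear", random_state=0).fit(X_demo, y_demo)

w_demo = model_demo.coef_[0]       # one weight per feature
b_demo = model_demo.intercept_[0]  # bias term

# Linear score followed by the sigmoid gives the positive-class probability.
p_manual = sigmoid(X_demo @ w_demo + b_demo)
p_sklearn = model_demo.predict_proba(X_demo)[:, 1]
manual_matches_sklearn = np.allclose(p_manual, p_sklearn)

# Sigmoid of the bias reported above: probability for an all-zero feature row.
p_all_zero_features = sigmoid(-3.8766223644030373)

print(manual_matches_sklearn)
print(round(p_all_zero_features, 4))
```

A strongly negative bias such as this one gives a low probability for an all-zero row (about 0.02), so the features have to contribute a sizable positive score before the model predicts rain.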


In [17]:
# From Project 3 - Milestone 1, save sklearn model
dump(log_regression,'../data/log_regression_project_1.joblib')

['../data/log_regression_project_1.joblib']
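
The `load` function imported at the top reads a saved model back. The sketch below is a minimal round trip on a synthetic model, written to a temporary directory so it does not touch the `../data` folder. The `_demo` names and the file name `model_demo.joblib` are hypothetical.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (assumption, not the weather data).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
model_demo = LogisticRegression(solver="liblinear", random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model_demo.joblib")
    dump(model_demo, model_path)        # serialize the fitted estimator
    restored_demo = load(model_path)    # read it back into memory

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(
    restored_demo.predict(X_demo), model_demo.predict(X_demo)
)
print(same_predictions)
```

Joblib files are pickle-based, so they should only be loaded from trusted sources, and ideally with the same scikit-learn version that was used to save them.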