# Rainfall Prediction - Training a Binary Classifier
## Model Building and Evaluation
As usual, we start by loading the libraries and data we will be using.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

X_train = pd.read_csv("X_train.csv", header=None)
X_test = pd.read_csv("X_test.csv", header=None)
y_train = pd.read_csv("y_train.csv")
y_test = pd.read_csv("y_test.csv")

We can now import a logistic regression model from sklearn and train it on the data we've prepared.

In [2]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='liblinear')
fit = model.fit(X_train, y_train['RainTomorrow'].values.ravel())

Now we can use `predict` to see whether our model expects rain on each day in our test dataset.

In [3]:
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
y_pred

array([0., 0., 0., ..., 0., 0., 0.])

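We can also use `predict_proba` to get our model's probability estimates for each day in the test dataset. The columns follow the order of `model.classes_`, so here the first column is the estimated probability of class 0 and the second is the estimated probability of class 1.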

In [4]:
model.predict_proba(X_test)

array([[0.93853527, 0.06146473],
       [0.97598477, 0.02401523],
       [0.78287216, 0.21712784],
       ...,
       [0.96370134, 0.03629866],
       [0.6662715 , 0.3337285 ],
       [0.97254134, 0.02745866]])
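
To make the link between `predict` and `predict_proba` concrete, here is a minimal sketch. It uses synthetic data from `make_classification` rather than the rainfall data, and the names `X_demo`, `y_demo` and `demo_model` are made up for illustration. It checks that each row of probabilities sums to 1 and that picking the more probable class reproduces the output of `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; this is not the rainfall dataset).
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
demo_model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

# One row per sample, one column per class, in the order of demo_model.classes_.
proba = demo_model.predict_proba(X_demo)

# The two class probabilities in each row should sum to 1.
row_sums = proba.sum(axis=1)

# Taking the most probable class per row should reproduce predict().
labels_from_proba = demo_model.classes_[np.argmax(proba, axis=1)]
matches_predict = np.array_equal(labels_from_proba, demo_model.predict(X_demo))

print(proba[:3])
print(matches_predict)
```

In other words, for a binary model like this one, `predict` amounts to applying a 0.5 cut-off to the second column of `predict_proba`. Working with the probabilities directly lets you pick a different cut-off if, say, missing a rainy day is more costly than a false alarm.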

sklearn also provides the function `accuracy_score`, which reports the fraction of test days our model classified correctly.

In [5]:
accuracy_score(y_test['RainTomorrow'], y_pred)

0.8328000281303843
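
An accuracy figure is easier to judge next to a baseline. The sketch below uses a small set of hypothetical labels (not the rainfall data) to show how a model that always predicts "no rain" can match the accuracy of a model that actually tries, and how `confusion_matrix` exposes the difference. Whether this applies to our dataset depends on how imbalanced the classes are, which is worth checking.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 8 dry days (0) and 2 rainy days (1).
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# A hypothetical model that catches one rainy day and raises one false alarm.
y_pred_demo = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# A baseline that always predicts the majority class ("no rain").
y_baseline = np.zeros_like(y_true_demo)

model_acc = accuracy_score(y_true_demo, y_pred_demo)
baseline_acc = accuracy_score(y_true_demo, y_baseline)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)

print(model_acc, baseline_acc)
print(cm)
```

Both scores come out at 0.8 here, yet only the first model ever predicts rain. On the real data, comparing `accuracy_score(y_test['RainTomorrow'], y_pred)` against the share of dry days in `y_test` gives the same kind of sanity check.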

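Finally, we can examine the weights learned by our model to see which features it assigns importance to. A positive weight pushes the prediction towards class 1 and a negative weight pushes it towards class 0; the magnitudes are only directly comparable if the features were put on similar scales during preparation.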

In [6]:
features = np.load("features.npy")
dict(zip(features, model.coef_[0]))

{'Cloud3pm': 0.7608707389905341,
 'Cloud9am': 0.16433916944567153,
 'Day': -0.03228726635302278,
 'Evaporation': 0.059189749181818165,
 'Humidity3pm': 1.907437264584414,
 'Humidity9am': 0.5181593001392079,
 'Location=Adelaide': 0.600732352226965,
 'Location=Albany': -0.1807678664782725,
 'Location=Albury': 0.18989802838965175,
 'Location=AliceSprings': -0.13483391681985654,
 'Location=BadgerysCreek': 0.28040887308700985,
 'Location=Ballarat': -0.3720034042058957,
 'Location=Bendigo': -0.006899845430245617,
 'Location=Brisbane': 0.6071459220511582,
 'Location=Cairns': 0.2745590074853293,
 'Location=Canberra': -0.12246779499984588,
 'Location=Cobar': 0.033032452339599845,
 'Location=CoffsHarbour': 0.04710452261352329,
 'Location=Dartmoor': 0.08102066926179907,
 'Location=Darwin': -0.17749131910176014,
 'Location=GoldCoast': 0.10222901397832544,
 'Location=Hobart': -0.4197597923105655,
 'Location=Katherine': -0.27402073554645834,
 'Location=Launceston': -0.38286381021118543,
 'Location=Me