# My Machine Learning Template Notebook
## By Brennan Casey
##### This Python notebook assists with the modeling stage of a project, after you have completed feature engineering and data preprocessing. Load your data, create the train and test split to your liking, and run the models to compare performance. When you find a few models that you like, each classifier section links to its documentation to help you tune hyperparameters.

### Import Statements

In [4]:
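import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Used by every classifier cell below to score predictions
from sklearn.metrics import accuracy_score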

### Loading Dataset

In [None]:
df = pd.read_csv("Data.csv")

### Create Train and Test split

You must define the name of your y column (the output variable) here. The notebook will remove this column from the X columns.

In [5]:
output_variable = "column_name"

In [None]:
# Select y as a 1-D Series; scikit-learn estimators expect a 1-D target
y = df[output_variable]

X = df.drop([output_variable], axis=1)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
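
When the classes are imbalanced, a plain random split can leave the test set with very few minority rows. One option is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both splits. The sketch below uses made-up data; `X_demo`, `y_demo`, and the other `*_demo`-style names are hypothetical and not part of this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90 rows of class 0 and 10 rows of class 1
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 90 + [1] * 10)

# stratify=y_demo preserves the 90/10 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

print("Positive share in train:", y_tr.mean())
print("Positive share in test:", y_te.mean())
```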

##### Imbalanced Dataset?
Use the cell below to oversample the training set with SMOTE from imbalanced-learn (imblearn).

In [None]:
from imblearn.over_sampling import SMOTE

# Resample only the training set so the test set keeps the original class distribution
oversample = SMOTE()
X_train, y_train = oversample.fit_resample(X_train, y_train)
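
To judge whether oversampling is needed, and whether it had the intended effect, it can help to compare class shares before and after resampling. The following is a minimal sketch using only the standard library; `class_balance` and the two label lists are hypothetical stand-ins for `y_train` before and after `fit_resample`.

```python
from collections import Counter

def class_balance(labels):
    """Return the share of each class label as a fraction of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical labels: 10% positives before, balanced after oversampling
before = [0] * 90 + [1] * 10
after = [0] * 90 + [1] * 90

print(class_balance(before))
print(class_balance(after))
```

In the notebook itself, the same helper could be called on `y_train` immediately before and after the SMOTE cell.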

#### Confusion Matrix to check if the model is learning
With imbalanced datasets, or complex models trained on little data, a model can overfit or underfit, for example by predicting the majority class for every row. The confusion matrix counts make these problems visible. The function below assumes binary labels encoded as 0 and 1.

In [7]:
def confusionCheck(y_pred, y_test):
    # Flatten both inputs so Series, single-column DataFrames, and arrays all work
    y_check = np.ravel(y_test)
    y_pred = np.ravel(y_pred)

    true_positive = 0
    true_negative = 0
    false_positive = 0
    false_negative = 0

    # Compare every prediction with its true label (binary labels 0 and 1)
    for pred, actual in zip(y_pred, y_check):
        if pred == 1 and actual == 1:
            true_positive += 1
        elif pred == 1 and actual == 0:
            false_positive += 1
        elif pred == 0 and actual == 0:
            true_negative += 1
        else:
            false_negative += 1

    print("True Positive " + str(true_positive))
    print("True Negative " + str(true_negative))
    print("False Positive " + str(false_positive))
    print("False Negative " + str(false_negative))
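
The four counts are easier to interpret once they are turned into rates. As a rough sketch (the helper name `summarize_counts` is hypothetical, not part of this notebook), accuracy, precision, and recall can be derived from the counts like this:

```python
def summarize_counts(tp, tn, fp, fn):
    """Derive accuracy, precision, and recall from confusion matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the rows predicted positive, how many really were positive
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the rows that really were positive, how many were found
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# A model that predicts class 0 for every row of a 95/5 dataset:
# accuracy looks high, but recall shows it never finds a positive
metrics = summarize_counts(tp=0, tn=95, fp=0, fn=5)
print(metrics)
```

High accuracy combined with zero recall is a typical sign that the model has only learned the majority class.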

## ML Classifiers

### RandomForestClassifier
[RandomForestClassifier Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)


In [None]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
# confusionCheck prints its own results, so no outer print() is needed
confusionCheck(y_pred=y_pred, y_test=y_test)

### Linear Regression - Lasso
[Linear Regression Lasso Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso)


In [None]:
from sklearn import linear_model

clf = linear_model.Lasso(alpha=0.1)
clf.fit(X_train, y_train)

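# Lasso is a regressor and returns continuous values, so threshold them
# at 0.5 to get class labels (assumes binary labels encoded as 0 and 1)
y_pred = (clf.predict(X_test) >= 0.5).astype(int)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)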

### Ridge Classifier
[Ridge Classifier Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html#sklearn.linear_model.RidgeClassifier)


In [None]:
from sklearn.linear_model import RidgeClassifier

clf = RidgeClassifier()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)

### Stochastic Gradient Descent - hinge
[Stochastic Gradient Descent Documentation](https://scikit-learn.org/stable/modules/sgd.html#classification)
###### penalty options: l1, l2, and elasticnet


In [None]:
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)

### Stochastic Gradient Descent - modified_huber
[Stochastic Gradient Descent Documentation](https://scikit-learn.org/stable/modules/sgd.html#classification)
###### penalty options: l1, l2, and elasticnet


In [None]:
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="modified_huber", penalty="l2", max_iter=5)
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)

### Stochastic Gradient Descent - Log
[Stochastic Gradient Descent Documentation](https://scikit-learn.org/stable/modules/sgd.html#classification)

###### penalty options: l1, l2, and elasticnet

In [None]:
from sklearn.linear_model import SGDClassifier

# scikit-learn 1.1+ names this loss "log_loss"; older releases use "log"
clf = SGDClassifier(loss="log_loss", penalty="l2", max_iter=5)
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)


### Neural Network - Multi-layer Perceptron
[NN Multi-layer Perceptron Documentation](https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification)

In [None]:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(15,), random_state=1)

clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)

### Decision Tree
[Decision Tree Documentation](https://scikit-learn.org/stable/modules/tree.html#classification)
                                          

In [None]:
from sklearn import tree

clf = tree.DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)

### Gaussian Naive Bayes
[Gaussian Naive Bayes Documentation](https://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes)
                    

In [None]:
from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)


### AdaBoost
[AdaBoost Documentation](https://scikit-learn.org/stable/modules/ensemble.html#adaboost)
                                          

In [None]:
from sklearn.ensemble import AdaBoostClassifier

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)

### Gradient Boosting
[GradientBoostingClassifier Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)


In [None]:
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)

# Store the predictions so the metrics below score this model
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred, normalize=True))
confusionCheck(y_pred=y_pred, y_test=y_test)
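
Once a few models look promising, their hyperparameters can be tuned with a cross-validated grid search. The sketch below is one possible approach using scikit-learn's `GridSearchCV`, not a prescribed workflow. The synthetic data (`X_demo`, `y_demo`) and the small parameter grid are assumptions chosen for illustration; in this notebook you would pass `X_train` and `y_train` and pick grid values from the linked documentation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for X_train / y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# Illustrative grid only; real grids depend on the model and the data
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Evaluate every parameter combination with 3-fold cross-validation
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Because the search uses cross-validation on the training data, the held-out test set stays untouched until the final evaluation.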