## AdaBoost Classifier Mini Project

In this project, we will again try to classify emails, this time using an AdaBoost classifier.


An **AdaBoost** classifier is a meta-estimator: it first fits a base classifier on the original dataset, then fits additional copies of that classifier on the same dataset, with the weights of incorrectly classified instances increased so that subsequent classifiers focus more on the difficult cases.
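
The reweighting step described above can be sketched in a few lines of plain Python. This is a minimal illustration of one round of the discrete AdaBoost update for a binary problem, not necessarily what scikit-learn does internally. The function name `adaboost_reweight` and the toy weights are assumptions made for this example, and the sketch assumes the weighted error is strictly between 0 and 1.

```python
import math

def adaboost_reweight(weights, is_misclassified, learning_rate=1.0):
    """One discrete-AdaBoost reweighting round (hypothetical helper, binary case).

    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the current weak learner
    total = sum(weights)
    err = sum(w for w, miss in zip(weights, is_misclassified) if miss) / total
    # Vote weight of this learner: larger when the error is smaller
    alpha = learning_rate * math.log((1.0 - err) / err)
    # Up-weight only the misclassified samples, then renormalize to sum to 1
    new_weights = [w * math.exp(alpha) if miss else w
                   for w, miss in zip(weights, is_misclassified)]
    norm = sum(new_weights)
    return alpha, [w / norm for w in new_weights]

# Toy data: five equally weighted samples, the third one misclassified
weights = [0.2] * 5
misses = [False, False, True, False, False]
alpha, new_weights = adaboost_reweight(weights, misses)
print(alpha, new_weights)
```

With a learning rate of 1, the single misclassified sample ends up carrying half of the total weight, which is what forces the next classifier to pay attention to it.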

---

### Libraries

In [1]:
#Specify external dir
import sys
sys.path.append('../tools/')

#Import preprocess for email data and visualization
from class_vis import prettyPicture
from email_preprocess import preprocess

In [2]:
#Import Dependencies
import pandas as pd
from time import time
import matplotlib.pyplot as plt
import numpy as np
import pylab as pl

### Data

In [3]:
### features_train and features_test are the features for the training
### and testing datasets, respectively
### labels_train and labels_test are the corresponding item labels
features_train, features_test, labels_train, labels_test = preprocess()

no. of Chris training emails: 7936
no. of Sara training emails: 7884


### Create Decision Tree

In [4]:
### Import module to create Decision Tree classifier
from sklearn import tree

#### Create the model: max_depth=1 limits each tree to a single split (a decision stump)
dtree = tree.DecisionTreeClassifier(criterion='gini', max_depth=1)
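
To make the idea of a depth-1 tree concrete, here is a rough sketch of fitting a decision stump on a single feature. It is a simplification: it picks the threshold by counting misclassifications, whereas the `DecisionTreeClassifier` above uses Gini impurity and searches over all features. The function `fit_stump` and the toy data are hypothetical and exist only for this illustration.

```python
def fit_stump(xs, ys):
    """Pick the threshold on one feature that minimizes misclassifications.

    xs: list of floats; ys: list of 0/1 labels (hypothetical toy setup).
    Returns (errors, threshold, right_label), where the stump predicts
    right_label when x > threshold and the other label otherwise.
    """
    best = None
    sorted_xs = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values
    candidates = [(a + b) / 2 for a, b in zip(sorted_xs, sorted_xs[1:])]
    for threshold in candidates:
        for right_label in (0, 1):
            errors = sum(
                (right_label if x > threshold else 1 - right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, right_label)
    return best

xs = [0.1, 0.4, 0.5, 0.9, 1.2]
ys = [0, 0, 0, 1, 1]
errors, threshold, right_label = fit_stump(xs, ys)
print(errors, threshold, right_label)
```

On this toy data a single split separates the two classes perfectly. On real email features a stump is only slightly better than guessing, which is why AdaBoost combines many of them.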

### Import AdaBoost

In [5]:
from sklearn.ensemble import AdaBoostClassifier

#### Create AdaBoost Classifier

In [6]:
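### Boost 1000 decision stumps with a small learning rate
### Note: scikit-learn 1.2 renamed `base_estimator` to `estimator`; on newer versions pass estimator=dtree instead
adabst_fit = AdaBoostClassifier(base_estimator=dtree, n_estimators=1000, learning_rate=0.05, random_state=42)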

#### Train

In [7]:
### Fit the classifier on the training features and labels
t0 = time()
adabst_fit.fit(features_train, labels_train)
print("Training time: ", round(time() - t0, 3), "secs")

Training time:  3320.626 secs


#### Predict

In [8]:
### use the trained classifier to predict labels for the test features
t1 = time()
pred = adabst_fit.predict(features_test)
print("Prediction time: ", round(time() - t1, 3), "secs")

Prediction time:  2.183 secs
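
At prediction time, a boosted ensemble combines the outputs of its weak learners through a weighted vote. The sketch below illustrates the idea for the classic binary formulation with labels in {-1, +1}. The helper `weighted_vote`, the votes, and the vote weights are all made up for this example and are not taken from the fitted model above.

```python
def weighted_vote(learner_predictions, alphas):
    """Combine weak-learner outputs in {-1, +1} into one label (hypothetical helper)."""
    # Each learner's vote counts in proportion to its weight alpha
    score = sum(a * p for a, p in zip(alphas, learner_predictions))
    # Ties are broken in favor of +1 in this sketch
    return 1 if score >= 0 else -1

# Three hypothetical weak learners vote on one email
votes = [+1, -1, -1]
alphas = [1.2, 0.4, 0.5]
label = weighted_vote(votes, alphas)
print(label)
```

Here one accurate learner outvotes two weaker ones, because its weight exceeds their combined weight.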


#### Model Accuracy

In [9]:
#Get accuracy for the model (accuracy_score expects the true labels first, then the predictions)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(labels_test, pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Accuracy: 96.93%
