## Data Breach Analytics 2005 - 2017 (Part IV - Supervised Text Classification Modeling part 2)
#### by Miriam Rodriguez

Classification models need to be created to determine the risk of a data breach:
- the likelihood of a specific type of breach
- which organization types are likely to be hit
- what the trend is

The features are separated so that each breach type and each organization type can be predicted on its own.

Organizations could use the results of this study to develop their security systems.


scoring dataset : breach_scoringdataset.csv

# Importing packages 

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

#import decisiontreeclassifier
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
#import logisticregression classifier
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
#import knn classifier
from sklearn.neighbors import KNeighborsClassifier

#for validating your classification model
#(sklearn.cross_validation was removed in newer scikit-learn; model_selection provides the same functions)
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import roc_curve, auc

# feature selection
from sklearn.feature_selection import RFE
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

## Read cleaned data

In [118]:
# read the cleaned breach data
df = pd.read_csv("databreach_cleaned.csv")
print(df.head())


   Breach_Year                                         Company          State  \
0         2006                       Deloitte & Touche, McAfee  United States   
1         2007                   TennCare, Americhoice Inc.     United States   
2         2007                      Mercer Health and Benefits  United States   
3         2007  Fidelity Investments, Dairy Farmers of America  United States   
4         2007                                      Dai Nippon  United States   

  Breach_Type Organization_Type  Total_Recs  \
0        PORT               BSO        9290   
1        PORT               MED       67000   
2        PORT               BSF       10500   
3        PORT               BSF          69   
4        INSD               BSO           0   

                                         Description   Latitude  Longitude  \
0  An external auditor lost a CD with names, Soci...  37.090240 -95.712891   
1  There are 67,000 TennCare \r\n            enro...  35.960638 -83.920739

# Data Prep

In [162]:
# drop the columns that are not used in the analysis (raw text, location and uncoded categorical columns) and display the result
df_stat = df.drop(['GDP', 'Latitude', 'Longitude', 'Breach_Type', 'Organization_Type', 'State', 'Company', 'Description', 'Year_CAT'], axis=1)
df_stat.head()


Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT
0,2006,9290,5,2,0
1,2007,67000,5,6,0
2,2007,10500,5,1,0
3,2007,69,5,1,0
4,2007,0,3,2,0


In [163]:
# Convert Total_Recs to a binary flag: harm (1, at least one record breached) or no harm (0).

# .loc avoids chained assignment, which can silently fail to update df_stat
df_stat.loc[df_stat["Total_Recs"] > 0, "Total_Recs"] = 1
df_stat.head()


Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT
0,2006,1,5,2,0
1,2007,1,5,6,0
2,2007,1,5,1,0
3,2007,1,5,1,0
4,2007,0,3,2,0
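
The same binarization can be written as a comparison followed by a cast, which leaves the original counts untouched. This is a minimal sketch on a toy frame: the record counts are illustrative and `harm` is a hypothetical column name, not one used in this notebook.

```python
import pandas as pd

# Toy stand-in for df_stat; the record counts are illustrative only.
toy = pd.DataFrame({"Total_Recs": [9290, 67000, 0, 69, 0]})

# True where at least one record was breached, then cast to 0/1.
toy["harm"] = (toy["Total_Recs"] > 0).astype(int)

print(toy["harm"].tolist())
```

Keeping the raw counts in a separate column makes it possible to revisit the harm threshold later without reloading the data.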


In [164]:
#describe the data

df_stat.describe()


Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT
count,8177.0,8177.0,8177.0,8177.0,8177.0
mean,2012.204965,0.728996,4.015042,4.534915,23.191513
std,3.425944,0.444505,1.992745,1.825506,15.301632
min,2005.0,0.0,0.0,1.0,0.0
25%,2010.0,0.0,2.0,3.0,9.0
50%,2012.0,1.0,4.0,6.0,21.0
75%,2015.0,1.0,5.0,6.0,36.0
max,2018.0,1.0,7.0,7.0,52.0


In [165]:
#show the information about the dataset
df_stat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8177 entries, 0 to 8176
Data columns (total 5 columns):
Breach_Year              8177 non-null int64
Total_Recs               8177 non-null int64
Breach_Type_CAT          8177 non-null int64
Organization_Type_CAT    8177 non-null int64
State_CAT                8177 non-null int64
dtypes: int64(5)
memory usage: 319.5 KB


# Classification Model building

In [166]:
#set X, y value
y = df_stat['Total_Recs']
X = df_stat.drop(['Total_Recs'], axis=1)

## Decision Tree Model Building, Validation, Evaluation


In [167]:
# evaluate the model by splitting into train (70%) and test sets (30%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
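
The imports include `cross_val_score`, while the evaluation here relies on a single 70/30 split. As a hedged sketch of how a k-fold estimate could be obtained, the example below uses synthetic data from `make_classification` rather than the breach data; `X_demo`, `y_demo` and `dt_demo` are illustrative names, and fixing `random_state` is an assumption made for repeatability.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for X and y; this is not the breach data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0)

dt_demo = DecisionTreeClassifier(random_state=0)

# Five folds; each fold is held out once while the tree is fit on the rest.
scores = cross_val_score(dt_demo, X_demo, y_demo, cv=5, scoring="accuracy")

print(scores.mean())
print(scores.std())
```

The spread across folds gives a rough sense of how much a single-split accuracy figure could move with a different split.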

In [168]:
#Model evaluation

# predict once and reuse the result for every metric
y_pred = dt.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
print(metrics.roc_auc_score(y_test, y_pred))

0.740831295844
[[ 355  317]
 [ 319 1463]]
             precision    recall  f1-score   support

          0       0.53      0.53      0.53       672
          1       0.82      0.82      0.82      1782

avg / total       0.74      0.74      0.74      2454

0.674630731922


Question: Interpret the results of the confusion matrix

- The decision tree model is 74% accurate on the test set, so we expect roughly 74% accuracy when the model is applied to comparable real-world data.
- 355 incidents were correctly classified as not causing harm; 319 incidents that did cause harm were misclassified as not causing harm.
- 1463 incidents were correctly classified as causing harm; 317 incidents that caused no harm were misclassified as causing harm.
- The test dataset contains 2454 incidents: 672 with no harm (355 + 317) and 1782 with harm (319 + 1463).
- Overall accuracy is the share of correctly classified incidents: (355 + 1463) / 2454 = 74%.
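
The figures in the classification report can be recomputed by hand from the confusion matrix. The sketch below assumes scikit-learn's layout (rows are actual classes, columns are predicted classes) and uses only the four counts printed above. Because `roc_auc_score` was given hard 0/1 predictions rather than probabilities, its value reduces to the mean of the two per-class recalls.

```python
# Confusion matrix reported above: rows are actual classes, columns are predicted.
cm = [[355, 317],
      [319, 1463]]

tn, fp = cm[0]   # actual no harm: predicted no harm, predicted harm
fn, tp = cm[1]   # actual harm:    predicted no harm, predicted harm
total = float(tn + fp + fn + tp)

accuracy = (tn + tp) / total
precision_harm = tp / float(tp + fp)
recall_harm = tp / float(tp + fn)
precision_no_harm = tn / float(tn + fn)
recall_no_harm = tn / float(tn + fp)

# With hard predictions the ROC curve has one interior point,
# so the area under it equals the mean of the two recalls.
auc_from_labels = (recall_harm + recall_no_harm) / 2

print(round(accuracy, 4))
print(round(auc_from_labels, 4))
print(round(precision_harm, 2))
print(round(recall_harm, 2))
print(round(precision_no_harm, 2))
print(round(recall_no_harm, 2))
```

The no-harm class is recovered only about half the time, which the 74% headline accuracy hides.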

In [169]:
Breach_Type_dm = pd.get_dummies(df_stat['Breach_Type_CAT'], prefix='Breach_Type')
df_stat_sp = df_stat.join(Breach_Type_dm)
df_stat_sp.head()

Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT,Breach_Type_0,Breach_Type_1,Breach_Type_2,Breach_Type_3,Breach_Type_4,Breach_Type_5,Breach_Type_6,Breach_Type_7
0,2006,1,5,2,0,0,0,0,0,0,1,0,0
1,2007,1,5,6,0,0,0,0,0,0,1,0,0
2,2007,1,5,1,0,0,0,0,0,0,1,0,0
3,2007,1,5,1,0,0,0,0,0,0,1,0,0
4,2007,0,3,2,0,0,0,0,1,0,0,0,0


In [170]:
df_stat_sp = df_stat_sp.drop(['Breach_Type_0'], axis=1)
df_stat_sp.head()

Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT,Breach_Type_1,Breach_Type_2,Breach_Type_3,Breach_Type_4,Breach_Type_5,Breach_Type_6,Breach_Type_7
0,2006,1,5,2,0,0,0,0,0,1,0,0
1,2007,1,5,6,0,0,0,0,0,1,0,0
2,2007,1,5,1,0,0,0,0,0,1,0,0
3,2007,1,5,1,0,0,0,0,0,1,0,0
4,2007,0,3,2,0,0,0,1,0,0,0,0


In [171]:
# rename the breach type dummy columns to readable names
# (each key may appear only once; a repeated key is silently overwritten by the later entry)
df_stat_sp = df_stat_sp.rename(columns={'Breach_Type_1': 'CreditCard', 'Breach_Type_2': 'Hacking', 'Breach_Type_3': 'Insider', 'Breach_Type_4': 'Physical', 'Breach_Type_5': 'Portable', 'Breach_Type_6': 'Disclosure', 'Breach_Type_7': 'Unknown'})

In [172]:
df_stat_sp.head()

Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,5,2,0,0,0,0,0,1,0,0
1,2007,1,5,6,0,0,0,0,0,1,0,0
2,2007,1,5,1,0,0,0,0,0,1,0,0
3,2007,1,5,1,0,0,0,0,0,1,0,0
4,2007,0,3,2,0,0,0,1,0,0,0,0
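
Renaming dummy columns by hand depends on remembering which integer code stands for which breach type. One alternative is to map the codes to labels first and let `get_dummies` name the columns. In this sketch, `code_to_name` is a hypothetical partial mapping covering only the two codes that can be read off the first rows of the data (3 for insider, 5 for portable); the full mapping would have to come from the step that created `Breach_Type_CAT`.

```python
import pandas as pd

# Hypothetical partial mapping; the real one must match the encoding
# that produced Breach_Type_CAT.
code_to_name = {3: "Insider", 5: "Portable"}

# Toy stand-in for df_stat['Breach_Type_CAT'].
toy = pd.DataFrame({"Breach_Type_CAT": [5, 5, 3, 5]})

# Map integer codes to labels first, then one-hot encode the labels.
labels = toy["Breach_Type_CAT"].map(code_to_name)
dummies = pd.get_dummies(labels).astype(int)

print(dummies.columns.tolist())
print(dummies.values.tolist())
```

With this approach the column names come straight from the mapping, so a mismatch between a code and its label shows up in one place only.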


In [173]:
df_stat_sp = df_stat_sp.drop(['Breach_Type_CAT'], axis=1)
df_stat_sp.head()

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,2,0,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,0,1,0,0
4,2007,0,2,0,0,0,1,0,0,0,0


## Model Creation and Deployment: Predict y values 
- Create and load a scoring dataset (one CSV per breach type, for example scoringdataset_hack.csv). The scoring dataset has no y value, so it stands in for future, unlabeled incidents. The decision tree model is then deployed on it to see whether the type of breach can be predicted.

In [174]:
#set X, y value
y = df_stat_sp['Hacking']
X = df_stat_sp.drop(['Hacking'], axis=1)
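
One point worth checking when reading the scores for this model: `X` still contains the other breach type dummies, and those are mutually exclusive with the target. The toy sketch below (illustrative labels, not the breach data) shows that a row with any sibling dummy set can never be a Hacking row, so much of the target is already encoded in the features.

```python
import pandas as pd

# Toy one-hot frame; the labels are illustrative, not taken from the breach data.
breach = pd.Series(["Hacking", "Portable", "Insider", "Hacking", "Portable"])
toy = pd.get_dummies(breach).astype(int)

# The sibling dummies are every breach type column except the target.
siblings = toy.drop(["Hacking"], axis=1)
any_sibling = siblings.sum(axis=1) > 0

# Rows where another breach type is set are never Hacking.
print(toy.loc[any_sibling, "Hacking"].sum())

# Rows where no sibling is set are the only candidates for Hacking.
print(toy.loc[~any_sibling, "Hacking"].tolist())
```

Dropping the sibling dummies from `X` would show how well the remaining features (year, organization type, state, harm flag) predict the breach type by themselves.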

In [175]:
# evaluate the model by splitting into train (70%) and test sets (30%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

In [176]:
#Model evaluation

# predict once and reuse the result for every metric
y_pred = dt.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
print(metrics.roc_auc_score(y_test, y_pred))

0.954360228199
[[1660   48]
 [  64  682]]
             precision    recall  f1-score   support

          0       0.96      0.97      0.97      1708
          1       0.93      0.91      0.92       746

avg / total       0.95      0.95      0.95      2454

0.943053035393


#### Model Deployment

In [177]:
df_score_h = df_stat_sp.drop(['Hacking'], axis=1)
df_score_h.head()

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,2,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,1,0,0
4,2007,0,2,0,0,1,0,0,0,0


In [178]:
df_score_h.to_csv("scoringdataset_hack.csv",index=False)

In [179]:
# load scoringdataset_hack.csv
score_h=pd.read_csv("scoringdataset_hack.csv")
print(score_h.head())

   Breach_Year  Total_Recs  Organization_Type_CAT  State_CAT  CreditCard  \
0         2006           1                      2          0           0   
1         2007           1                      6          0           0   
2         2007           1                      1          0           0   
3         2007           1                      1          0           0   
4         2007           0                      2          0           0   

   Insider  Physical  Portable  Disclosure  Unknown  
0        0         0         1           0        0  
1        0         0         1           0        0  
2        0         0         1           0        0  
3        0         0         1           0        0  
4        1         0         0           0        0  


In [180]:
# finally generate the predicted y value
predictedY = dt.predict(score_h)
print(predictedY)

[0 0 0 ..., 1 0 0]


In [181]:
predictedY = pd.DataFrame(predictedY, columns=['predicted Y'])
predictedY.head(20)

Unnamed: 0,predicted Y
0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,1
9,0


In [182]:
# Check hacking column to see if matches predicted Y
df_stat_sp.head(20)

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,2,0,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,0,1,0,0
4,2007,0,2,0,0,0,1,0,0,0,0
5,2007,1,6,0,0,0,0,0,0,0,1
6,2008,0,2,0,0,0,0,0,1,0,0
7,2009,0,5,0,0,0,0,1,0,0,0
8,2009,1,4,0,0,1,0,0,0,0,0
9,2010,1,6,0,0,0,0,1,0,0,0


In [183]:
data_h = score_h.join(predictedY) 
data_h.head(20)

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Insider,Physical,Portable,Disclosure,Unknown,predicted Y
0,2006,1,2,0,0,0,0,1,0,0,0
1,2007,1,6,0,0,0,0,1,0,0,0
2,2007,1,1,0,0,0,0,1,0,0,0
3,2007,1,1,0,0,0,0,1,0,0,0
4,2007,0,2,0,0,1,0,0,0,0,0
5,2007,1,6,0,0,0,0,0,0,1,0
6,2008,0,2,0,0,0,0,1,0,0,0
7,2009,0,5,0,0,0,1,0,0,0,0
8,2009,1,4,0,0,0,0,0,0,0,1
9,2010,1,6,0,0,0,1,0,0,0,0


### Hacking was predicted correctly for the rows shown
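
Rather than comparing the two tables by eye, the match can be computed. The sketch below uses only the ten rows displayed above; on the full data the same comparison would use `df_stat_sp['Hacking']` and `data_h['predicted Y']`. Since the scoring file is built from the same rows the model was trained and tested on, this checks consistency rather than performance on unseen incidents.

```python
import pandas as pd

# The ten Hacking values and ten predictions displayed above.
actual = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 0], name="Hacking")
predicted = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 0], name="predicted Y")

# Element-wise comparison; the mean of the booleans is the share of matches.
matches = actual == predicted
print(matches.mean())

# Cross-tabulation of actual against predicted values.
print(pd.crosstab(actual, predicted))
```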

#### Model Creation

In [184]:
#set X, y value
y = df_stat_sp['CreditCard']
X = df_stat_sp.drop(['CreditCard'], axis=1)

In [185]:
# evaluate the model by splitting into train (70%) and test sets (30%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

In [186]:
#Model evaluation

# predict once and reuse the result for every metric
y_pred = dt.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
print(metrics.roc_auc_score(y_test, y_pred))

0.993072534637
[[2425   11]
 [   6   12]]
             precision    recall  f1-score   support

          0       1.00      1.00      1.00      2436
          1       0.52      0.67      0.59        18

avg / total       0.99      0.99      0.99      2454

0.831075533662
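
The 99.3% accuracy has to be read against the class balance: only 18 of the 2454 test incidents are credit card breaches. The sketch below recomputes the figures from the confusion matrix above and compares them with a baseline that always predicts "not a credit card breach"; the variable names are illustrative.

```python
# Confusion matrix reported above: rows are actual classes, columns are predicted.
cm = [[2425, 11],
      [6, 12]]

tn, fp = cm[0]
fn, tp = cm[1]
total = float(tn + fp + fn + tp)

model_accuracy = (tn + tp) / total

# A model that always predicts 0 gets every actual-0 row right.
baseline_accuracy = (tn + fp) / total

precision_card = tp / float(tp + fp)
recall_card = tp / float(tp + fn)

print(round(model_accuracy, 4))
print(round(baseline_accuracy, 4))
print(round(precision_card, 2))
print(round(recall_card, 2))
```

The model beats the trivial baseline by a very small margin on accuracy, so precision and recall for class 1 are the more informative numbers for this target.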


#### Model Deployment

In [187]:
df_score_cc = df_stat_sp.drop(['CreditCard'], axis=1)
df_score_cc.head()

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,Hacking,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,2,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,1,0,0
4,2007,0,2,0,0,1,0,0,0,0


In [188]:
df_score_cc.to_csv("scoringdataset_card.csv",index=False)

In [189]:
# load scoringdataset_card.csv
score_cc=pd.read_csv("scoringdataset_card.csv")
print(score_cc.head())

   Breach_Year  Total_Recs  Organization_Type_CAT  State_CAT  Hacking  \
0         2006           1                      2          0        0   
1         2007           1                      6          0        0   
2         2007           1                      1          0        0   
3         2007           1                      1          0        0   
4         2007           0                      2          0        0   

   Insider  Physical  Portable  Disclosure  Unknown  
0        0         0         1           0        0  
1        0         0         1           0        0  
2        0         0         1           0        0  
3        0         0         1           0        0  
4        1         0         0           0        0  


In [190]:
# finally generate the predicted y value
predictedY = dt.predict(score_cc)
predictedY

array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)

In [191]:
predictedY = pd.DataFrame(predictedY, columns=['predicted Y'])
predictedY.head(20)

Unnamed: 0,predicted Y
0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0


In [192]:
data_cc = score_cc.join(predictedY) 
data_cc.head(20)

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,Hacking,Insider,Physical,Portable,Disclosure,Unknown,predicted Y
0,2006,1,2,0,0,0,0,1,0,0,0
1,2007,1,6,0,0,0,0,1,0,0,0
2,2007,1,1,0,0,0,0,1,0,0,0
3,2007,1,1,0,0,0,0,1,0,0,0
4,2007,0,2,0,0,1,0,0,0,0,0
5,2007,1,6,0,0,0,0,0,0,1,0
6,2008,0,2,0,0,0,0,1,0,0,0
7,2009,0,5,0,0,0,1,0,0,0,0
8,2009,1,4,0,1,0,0,0,0,0,0
9,2010,1,6,0,0,0,1,0,0,0,0


In [193]:
# Check credit card column to see if matches predicted Y
df_stat_sp.head(20)

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,2,0,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,0,1,0,0
4,2007,0,2,0,0,0,1,0,0,0,0
5,2007,1,6,0,0,0,0,0,0,0,1
6,2008,0,2,0,0,0,0,0,1,0,0
7,2009,0,5,0,0,0,0,1,0,0,0
8,2009,1,4,0,0,1,0,0,0,0,0
9,2010,1,6,0,0,0,0,1,0,0,0


### Credit Card was predicted correctly for the rows shown


#### Model Creation

In [194]:
#set X, y value
y = df_stat_sp['Disclosure']
X = df_stat_sp.drop(['Disclosure'], axis=1)

In [195]:
# evaluate the model by splitting into train (70%) and test sets (30%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

In [196]:
#Model evaluation

# predict once and reuse the result for every metric
y_pred = dt.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
print(metrics.roc_auc_score(y_test, y_pred))

0.976365118174
[[2351   30]
 [  28   45]]
             precision    recall  f1-score   support

          0       0.99      0.99      0.99      2381
          1       0.60      0.62      0.61        73

avg / total       0.98      0.98      0.98      2454

0.80191930408


#### Model Deployment

In [197]:
df_score_d = df_stat_sp.drop(['Disclosure'], axis=1)
df_score_d.head()

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Unknown
0,2006,1,2,0,0,0,0,0,1,0
1,2007,1,6,0,0,0,0,0,1,0
2,2007,1,1,0,0,0,0,0,1,0
3,2007,1,1,0,0,0,0,0,1,0
4,2007,0,2,0,0,0,1,0,0,0


In [198]:
df_score_d.to_csv("scoringdataset_disc.csv",index=False)

In [199]:
# load scoringdataset_disc.csv
score_d=pd.read_csv("scoringdataset_disc.csv")
print(score_d.head())

   Breach_Year  Total_Recs  Organization_Type_CAT  State_CAT  CreditCard  \
0         2006           1                      2          0           0   
1         2007           1                      6          0           0   
2         2007           1                      1          0           0   
3         2007           1                      1          0           0   
4         2007           0                      2          0           0   

   Hacking  Insider  Physical  Portable  Unknown  
0        0        0         0         1        0  
1        0        0         0         1        0  
2        0        0         0         1        0  
3        0        0         0         1        0  
4        0        1         0         0        0  


In [200]:
# finally generate the predicted y value
predictedY = dt.predict(score_d)
predictedY

array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)

In [201]:
predictedY = pd.DataFrame(predictedY, columns=['predicted Y'])
predictedY.head(20)

Unnamed: 0,predicted Y
0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0


### Disclosure was predicted correctly for the first 20 rows


In [202]:
# Check Disclosure column to see if matches predicted Y
df_stat_sp.head(20)

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Disclosure,Unknown
0,2006,1,2,0,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,0,1,0,0
4,2007,0,2,0,0,0,1,0,0,0,0
5,2007,1,6,0,0,0,0,0,0,0,1
6,2008,0,2,0,0,0,0,0,1,0,0
7,2009,0,5,0,0,0,0,1,0,0,0
8,2009,1,4,0,0,1,0,0,0,0,0
9,2010,1,6,0,0,0,0,1,0,0,0


In [203]:
data_d = score_d.join(predictedY) 
data_d.head(20)

Unnamed: 0,Breach_Year,Total_Recs,Organization_Type_CAT,State_CAT,CreditCard,Hacking,Insider,Physical,Portable,Unknown,predicted Y
0,2006,1,2,0,0,0,0,0,1,0,0
1,2007,1,6,0,0,0,0,0,1,0,0
2,2007,1,1,0,0,0,0,0,1,0,0
3,2007,1,1,0,0,0,0,0,1,0,0
4,2007,0,2,0,0,0,1,0,0,0,0
5,2007,1,6,0,0,0,0,0,0,1,0
6,2008,0,2,0,0,0,0,0,1,0,0
7,2009,0,5,0,0,0,0,1,0,0,0
8,2009,1,4,0,0,1,0,0,0,0,0
9,2010,1,6,0,0,0,0,1,0,0,0
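
The Hacking, CreditCard and Disclosure passes above repeat the same create and evaluate steps. A helper function could run them for any target column. This is a hedged sketch, not the notebook's method: `evaluate_target` is a hypothetical name, the data is random rather than the breach data, and `random_state` is fixed on the tree for repeatability, which the cells above do not do.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_target(frame, target, random_state=0):
    """Fit a decision tree for one binary target column and return test metrics."""
    y = frame[target]
    X = frame.drop([target], axis=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    model = DecisionTreeClassifier(random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": metrics.accuracy_score(y_test, y_pred),
        "confusion_matrix": metrics.confusion_matrix(y_test, y_pred, labels=[0, 1]),
    }


# Synthetic stand-in for df_stat_sp: column names mirror the notebook, values are random.
rng = np.random.RandomState(0)
toy = pd.DataFrame({
    "Breach_Year": rng.randint(2005, 2018, size=200),
    "Organization_Type_CAT": rng.randint(1, 8, size=200),
    "Hacking": rng.randint(0, 2, size=200),
    "Disclosure": rng.randint(0, 2, size=200),
})

# Run the same pipeline for each target column.
results = {t: evaluate_target(toy, t) for t in ["Hacking", "Disclosure"]}
for name in sorted(results):
    print(name)
    print(results[name]["accuracy"])
```

On the real data the call would be `evaluate_target(df_stat_sp, 'Hacking')` and so on, which keeps the three evaluations consistent with each other.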


In [204]:
Org_Type_dm = pd.get_dummies(df_stat['Organization_Type_CAT'], prefix='Org_Type')
df_stat_sp = df_stat.join(Org_Type_dm)
df_stat_sp.head()

Unnamed: 0,Breach_Year,Total_Recs,Breach_Type_CAT,Organization_Type_CAT,State_CAT,Org_Type_1,Org_Type_2,Org_Type_3,Org_Type_4,Org_Type_5,Org_Type_6,Org_Type_7
0,2006,1,5,2,0,0,1,0,0,0,0,0
1,2007,1,5,6,0,0,0,0,0,0,1,0
2,2007,1,5,1,0,1,0,0,0,0,0,0
3,2007,1,5,1,0,1,0,0,0,0,0,0
4,2007,0,3,2,0,0,1,0,0,0,0,0


# Logistic Regression
- More will be added here