## Joseph Rochelle
## DSC 550 Data Mining
## 9.3 Neural Network Classifiers 

1. Neural Network Classifier with Scikit

Using the multi-label classifier dataset from earlier exercises (categorized-comments.jsonl in the reddit folder), fit a neural network classifier using scikit-learn. Use the code found in chapter 12 of the Applied Text Analysis with Python book as a guideline. Report the accuracy, precision, recall, F1-score, and confusion matrix.

2. Neural Network Classifier with Keras

Using the multi-label classifier dataset from earlier exercises (categorized-comments.jsonl in the reddit folder), fit a neural network classifier using Keras. Use the code found in chapter 12 of the Applied Text Analysis with Python book as a guideline. Report the accuracy, precision, recall, F1-score, and confusion matrix.

3. Classifying Images

In chapter 20 of the Machine Learning with Python Cookbook, implement the code found in section 20.15 to classify MNIST images using a convolutional neural network. Report the accuracy of your results.

In [202]:
import pandas as pd
import json 
import random


In [203]:
df= pd.read_json("categorized-comments1.jsonl", lines=True)

In [204]:
df.head()

Unnamed: 0,cat,txt
0,sports,Barely better than Gabbert? He was significant...
1,sports,Fuck the ducks and the Angels! But welcome to ...
2,sports,Should have drafted more WRs.\n\n- Matt Millen...
3,sports,[Done](https://i.imgur.com/2YZ90pm.jpg)
4,sports,No!! NOO!!!!!


In [205]:
# Load libraries
import unicodedata
import sys
import re
import string

# Create a dictionary of punctuation characters
punctuation = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P'))

# For each comment, remove any punctuation characters
# (use `text` as the loop variable so the imported `string` module is not shadowed)
df['txt'] = [text.translate(punctuation) for text in df.txt]
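The cell above deletes every character whose Unicode category starts with `P`. A minimal sketch of the same `str.translate` technique on comments from the sample output (the helper name `strip_punctuation` is hypothetical) shows why the imgur link collapses into a single token and why "well-regarded" becomes "wellregarded": brackets, colons, slashes, periods, and hyphens are all punctuation, while symbols such as `$` and `+` (categories `Sc` and `Sm`) are kept.

```python
import sys
import unicodedata

# Map every punctuation code point (Unicode category P*) to None;
# str.translate deletes characters whose mapping is None.
PUNCTUATION = dict.fromkeys(
    i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P')
)


def strip_punctuation(text):
    """Hypothetical helper: remove all punctuation characters from text."""
    return text.translate(PUNCTUATION)


# A Markdown link collapses into one token.
print(strip_punctuation('[Done](https://i.imgur.com/2YZ90pm.jpg)'))
print(strip_punctuation('No!! NOO!!!!!'))
# Hyphens are punctuation too, so hyphenated words are joined.
print(strip_punctuation('two well-regarded RB coaches'))
# Currency and math symbols are not punctuation and survive.
print(strip_punctuation('$5 + tax'))
```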

In [206]:
df.head(11)

Unnamed: 0,cat,txt
0,sports,Barely better than Gabbert He was significantl...
1,sports,Fuck the ducks and the Angels But welcome to a...
2,sports,Should have drafted more WRs\n\n Matt Millen p...
3,sports,Donehttpsiimgurcom2YZ90pmjpg
4,sports,No NOO
5,sports,Ding dong the Kaepers gone Yes Friday off to a...
6,sports,yup\n\nThat would be best case scenario Still ...
7,sports,I think Larry Kruger made a good point on KNBR...
8,sports,This is great to have two wellregarded RB coac...
9,sports,79 next season confirmed


In [207]:
# Stop words

#load library
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Load stop words
stopWords = stopwords.words('english')

# tokenize the words
df['txt'] = [word_tokenize(text) for text in df.txt]

# remove stop words
df['txt'] = df['txt'].apply(lambda x: [item for item in x if item not in stopWords])
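The output below still contains "He" and "But". NLTK's English stop-word list is all lowercase and the membership test is case-sensitive, so capitalized stop words survive. A small sketch, assuming a hypothetical hand-picked subset of that list rather than the real download, shows the difference lowercasing makes:

```python
# Hypothetical hand-picked subset of NLTK's English stop words (that list is all lowercase).
STOP_WORDS = {'he', 'was', 'but', 'than', 'the', 'to', 'and'}

tokens = ['Barely', 'better', 'than', 'Gabbert', 'He', 'was', 'significantly']

# Case-sensitive test, as in the cell above: 'He' is not in the lowercase set.
case_sensitive = [t for t in tokens if t not in STOP_WORDS]

# Lowercasing before the membership test also removes 'He'.
case_insensitive = [t for t in tokens if t.lower() not in STOP_WORDS]

print(case_sensitive)    # ['Barely', 'better', 'Gabbert', 'He', 'significantly']
print(case_insensitive)  # ['Barely', 'better', 'Gabbert', 'significantly']
```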

In [208]:
df.head(11)

Unnamed: 0,cat,txt
0,sports,"[Barely, better, Gabbert, He, significantly, b..."
1,sports,"[Fuck, ducks, Angels, But, welcome, new, niner..."
2,sports,"[Should, drafted, WRs, Matt, Millen, probably]"
3,sports,[Donehttpsiimgurcom2YZ90pmjpg]
4,sports,"[No, NOO]"
5,sports,"[Ding, dong, Kaepers, gone, Yes, Friday, good,..."
6,sports,"[yup, That, would, best, case, scenario, Still..."
7,sports,"[I, think, Larry, Kruger, made, good, point, K..."
8,sports,"[This, great, two, wellregarded, RB, coaches, ..."
9,sports,"[79, next, season, confirmed]"


In [209]:
# Check the shape: text cleaning shortens each comment but keeps every row
df.shape

(606475, 2)

In [210]:
# Stemming of the words

# load library
from nltk.stem.porter import PorterStemmer

# create stemmer
porter = PorterStemmer()

# Apply stemmer

df['txt'] = df['txt'].apply(lambda x: [porter.stem(word) for word in x])

In [211]:
df.head(11)

Unnamed: 0,cat,txt
0,sports,"[bare, better, gabbert, He, significantli, bet..."
1,sports,"[fuck, duck, angel, but, welcom, new, niner, fan]"
2,sports,"[should, draft, wr, matt, millen, probabl]"
3,sports,[donehttpsiimgurcom2yz90pmjpg]
4,sports,"[No, noo]"
5,sports,"[ding, dong, kaeper, gone, ye, friday, good, s..."
6,sports,"[yup, that, would, best, case, scenario, still..."
7,sports,"[I, think, larri, kruger, made, good, point, k..."
8,sports,"[thi, great, two, wellregard, RB, coach, team,..."
9,sports,"[79, next, season, confirm]"


In [212]:
# Part of Speech
#Libraries
from nltk import pos_tag
#from nltk import word_tokenize

# Use pre-trained part of speech tagger
textTagged = df['txt'].apply(lambda x: [pos_tag(x)])


In [213]:
textTagged

0         [[(bare, NN), (better, RBR), (gabbert, NN), (H...
1         [[(fuck, JJ), (duck, NN), (angel, NN), (but, C...
2         [[(should, MD), (draft, VB), (wr, NN), (matt, ...
3                    [[(donehttpsiimgurcom2yz90pmjpg, NN)]]
4                                   [[(No, DT), (noo, NN)]]
                                ...                        
606470    [[(gtani, NN), (chanc, NN), (instal, JJ), (ent...
606471    [[(No, DT), (it, PRP), (probabl, VBZ), (happen...
606472    [[(I, PRP), (think, VBP), (disappoint, NN), (c...
606473    [[(dishonor, NN), (12, CD), (look, NN), (like,...
606474                                      [[(remov, NN)]]
Name: txt, Length: 606475, dtype: object

In [214]:
# Add the part-of-speech tags to the data frame (each cell is a list wrapping one list of (word, tag) tuples)
df['pos'] = textTagged
df.head()

Unnamed: 0,cat,txt,pos
0,sports,"[bare, better, gabbert, He, significantli, bet...","[[(bare, NN), (better, RBR), (gabbert, NN), (H..."
1,sports,"[fuck, duck, angel, but, welcom, new, niner, fan]","[[(fuck, JJ), (duck, NN), (angel, NN), (but, C..."
2,sports,"[should, draft, wr, matt, millen, probabl]","[[(should, MD), (draft, VB), (wr, NN), (matt, ..."
3,sports,[donehttpsiimgurcom2yz90pmjpg],"[[(donehttpsiimgurcom2yz90pmjpg, NN)]]"
4,sports,"[No, noo]","[[(No, DT), (noo, NN)]]"


In [216]:
# Export the cleaned data frame to CSV so the slow preprocessing steps
# (about an hour on the full data) do not have to be rerun before modeling

df.to_csv('categComments.csv', index = False)

In [221]:
p = 0.0016  # keep about 0.16% of the rows
# skip each data row (i > 0 keeps the header) when a random draw from [0, 1) is greater than p
df = pd.read_csv('categComments.csv', 
         skiprows=lambda i: i>0 and random.random() > p
)
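Because `random.random()` is never seeded, each run of the cell above draws a different 0.16% sample, so the metrics below change from run to run. A sketch of a repeatable version, assuming a private `random.Random(seed)` generator inside the `skiprows` callable; the toy CSV text and the `sample_csv` helper are hypothetical stand-ins for `categComments.csv`:

```python
import io
import random

import pandas as pd

# Toy stand-in for categComments.csv: a header plus 10,000 data rows.
csv_text = 'cat,txt\n' + '\n'.join('sports,comment %d' % i for i in range(10_000))


def sample_csv(text, p, seed):
    """Hypothetical helper: keep the header and each data row with probability p."""
    rng = random.Random(seed)  # private, seeded generator, so the sample repeats
    return pd.read_csv(io.StringIO(text),
                       skiprows=lambda i: i > 0 and rng.random() > p)


sample_a = sample_csv(csv_text, p=0.0016, seed=0)
sample_b = sample_csv(csv_text, p=0.0016, seed=0)
print(len(sample_a), sample_a.equals(sample_b))  # same rows both times
```

If the full data frame is already in memory, `df.sample(frac=p, random_state=0)` is a simpler alternative; the `skiprows` approach avoids loading the skipped rows into memory.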


In [222]:
# Set up the feature and target variables for the neural network classifiers
X = df.drop(['cat', 'pos'], axis = 1) 
y = df['cat']


In [223]:
# X: the feature (cleaned comment text)
# y: the target variable (comment category)
X = df.txt 
y = df.cat

In [224]:
X.head()

0       ['ani', 'thought', 'anyon', 'besid', 'pepper']
1         ['charl', 'want', 'go', 'contend', 'anyway']
2               ['you', 'didnt', 'answer', 'question']
3    ['gtwin', 'class', 'fuck', 'let', 'win', 'yeah...
4    ['worst', 'case', 'scenario', 'play', 'somewhe...
Name: txt, dtype: object
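The token lists printed above are now strings such as `"['ani', 'thought', ...]"`, because a CSV file can only store text. `TfidfVectorizer` re-tokenizes strings, so this does not break the models below, but any step that expects real lists would. A minimal sketch of the round trip with a one-row toy frame, restoring the lists with `ast.literal_eval`:

```python
import ast
import io

import pandas as pd

# One-row toy frame whose 'txt' cell holds a real Python list of tokens.
toy = pd.DataFrame({'cat': ['sports'], 'txt': [['ani', 'thought', 'anyon']]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

raw_cell = restored.loc[0, 'txt']
print(type(raw_cell).__name__, raw_cell)  # the list came back as its string form

# ast.literal_eval safely parses the Python literal back into a list.
restored['txt'] = restored['txt'].apply(ast.literal_eval)
print(restored.loc[0, 'txt'])
```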

In [225]:
y.head()

0    sports
1    sports
2    sports
3    sports
4    sports
Name: cat, dtype: object

In [226]:
from sklearn.preprocessing import LabelEncoder
#from sklearn.preprocessing import OneHotEncoder
#from sklearn.compose import ColumnTransformer
#ordinal encode target variable
label_encoder = LabelEncoder()
label_encoder.fit(y)
y = label_encoder.transform(y)
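`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the category names, so `label_encoder.classes_` is the key for reading the rows and columns of the confusion matrices below. A sketch with hypothetical category names (not the dataset's actual labels):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category names, for illustration only.
labels = ['sports', 'video_games', 'science', 'sports']

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(list(encoder.classes_))   # sorted names; position = integer code
print(codes.tolist())
# inverse_transform maps integer codes (e.g. predictions) back to names
print(list(encoder.inverse_transform([0, 2])))
```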

In [227]:
print(y)

[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 1 1 1 1 1 1 

In [228]:
type(y)

numpy.ndarray

In [229]:
type(X)

pandas.core.series.Series

In [230]:
print(X)

0         ['ani', 'thought', 'anyon', 'besid', 'pepper']
1           ['charl', 'want', 'go', 'contend', 'anyway']
2                 ['you', 'didnt', 'answer', 'question']
3      ['gtwin', 'class', 'fuck', 'let', 'win', 'yeah...
4      ['worst', 'case', 'scenario', 'play', 'somewhe...
                             ...                        
956    ['the', 'plant', 'move', 'prison', 'escap', 'd...
957    ['I', 'hadnt', 'thought', 'use', 'filter', 'tl...
958    ['I', 'download', 'updat', '8', 'time', '3', '...
959                           ['transistor', 'headphon']
960                           ['infam', 'second', 'son']
Name: txt, Length: 961, dtype: object


In [231]:
y.shape

(961,)

In [234]:
#y.reshape((961, 0))

## 1. Neural Network Classifier with Scikit

In [235]:
# use a function to train the neural network classifier model
import joblib
from sklearn.model_selection import cross_validate

def train_model(model, X, y, saveto=None, cv=12):
    """
    Computes cv-fold cross-validation scores for the model, then fits the
    model on the full data and, if saveto is given, writes it to disk at
    that path. The cross-validation scores are not returned; the function
    returns the fitted model.
    """
    # The corpus here is the data frame's text (X) and labels (y)
    X = list(X)
    y = list(y)
    scoring = {'accuracy': 'accuracy',
               'precision': 'precision_macro',
               'recall': 'recall_macro',
               'f1': 'f1_macro'}

    # Compute cross-validation scores (macro-averaged over the classes)
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

    # Fit the model on the entire data set
    model1 = model.fit(X, y)

    # Write to disk if specified
    if saveto:
        joblib.dump(model1, saveto)

    # Return the fitted model
    return model1
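`train_model` computes cross-validation scores and then discards them, while the metrics reported later are measured on the same comments the model was trained on. A sketch of reading the held-out scores that `cross_validate` returns under `test_<metric>` keys, using a small invented corpus and a smaller network so it runs in seconds (the documents, labels, and `hidden_layer_sizes=(10,)` are assumptions for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration: two topics, six comments each.
docs = ['great game last night', 'the team won again', 'playoff season starts',
        'coach made a good call', 'quarterback threw three touchdowns', 'fans loved the match',
        'new console update released', 'this level is too hard', 'graphics look amazing',
        'the boss fight was fun', 'multiplayer servers are down', 'speedrun record broken']
labels = [0] * 6 + [1] * 6

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('ann', MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)),
])

scoring = {'accuracy': 'accuracy', 'precision': 'precision_macro',
           'recall': 'recall_macro', 'f1': 'f1_macro'}
scores = cross_validate(pipeline, docs, labels, cv=3, scoring=scoring)

# Held-out scores are stored under 'test_<name>'; their mean estimates
# performance on comments the model did not see during training.
summary = {name: scores['test_' + name].mean() for name in scoring}
for name, value in summary.items():
    print('CV %s: %.3f' % (name, value))
```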

In [236]:
# build a pipeline to create the model for training

# import libraries
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate

# build pipeline: TF-IDF features feed a neural network with one hidden layer of 100 units
classifier = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('ann', MLPClassifier(hidden_layer_sizes=(100, ), verbose=True))
])
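After the CSV round trip each document is the string form of a token list. The default `TfidfVectorizer` token pattern keeps runs of two or more word characters, so brackets and quotes are harmless, but one-character tokens such as `I` or `8` are silently dropped. A sketch on two comments shortened from the sample output above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two documents shortened from the sample output: string forms of token lists.
docs = ["['ani', 'thought', 'anyon', 'besid', 'pepper']",
        "['I', 'download', 'updat', '8', 'time', '3']"]

# Default settings: lowercase=True, token_pattern=r"(?u)\b\w\w+\b" (2+ word characters).
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # no brackets or quotes; 'I', '8', '3' are gone
print(tfidf_matrix.shape)              # (documents, vocabulary size)
```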

In [237]:
# call the train model function and return the fitted model
model1 = train_model(classifier, X, y)

Iteration 1, loss = 1.11584235
Iteration 2, loss = 1.05868134
Iteration 3, loss = 1.00627971
...

(Verbose MLPClassifier output for the cross-validation fits and the final fit is truncated in this export. In the runs shown, the loss starts between about 0.97 and 1.21 at iteration 1 and falls to roughly 0.03 to 0.04 after about 100 iterations. scikit-learn also raised warnings from precision_recall_fscore_support in several folds, which typically means precision was undefined for a class with no predicted samples in that fold.)

In [238]:
# make predictions from fitted model
predicts1 = model1.predict(X)

In [239]:
# return measurement calculations

from sklearn.metrics import accuracy_score 
from sklearn.metrics import precision_score 
from sklearn.metrics import recall_score 
from sklearn.metrics import f1_score 
from sklearn.metrics import confusion_matrix  

# accuracy: correct predictions / all predictions
accuracy = accuracy_score(y, predicts1)
print('Accuracy: %f' % accuracy)

# precision: tp / (tp + fp) for each class, then averaged over classes ('macro')
precision = precision_score(y, predicts1, average = 'macro')
print('Precision: %f' % precision)

# recall: tp / (tp + fn) for each class, macro-averaged
recall = recall_score(y, predicts1, average = 'macro')
print('Recall: %f' % recall)

# f1: 2 tp / (2 tp + fp + fn) for each class, macro-averaged
f1 = f1_score(y, predicts1, average = 'macro')
print('F1 score: %f' % f1)   


Accuracy: 0.986472
Precision: 0.993914
Recall: 0.973882
F1 score: 0.983567


In [240]:
# create confusion matrix 
matrix = confusion_matrix(y, predicts1)

In [241]:
# print confusion matrix
print(matrix)

[[ 41   0   1]
 [  0 208  12]
 [  0   0 699]]
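With `average='macro'`, scikit-learn computes precision, recall, and F1 for each class and takes their unweighted mean. A stdlib sketch (the helper `macro_metrics` is hypothetical) reproduces the four printed numbers from the confusion matrix above. It also shows why precision is higher than recall here: the only errors are 13 comments wrongly predicted as class 2, so classes 0 and 1 have perfect precision but lose recall. Classes with no predictions get 0.0, mirroring scikit-learn's default behavior (which also emits a warning).

```python
# Confusion matrix printed above (rows = true class, columns = predicted class).
cm_sklearn = [[41, 0, 1],
              [0, 208, 12],
              [0, 0, 699]]


def macro_metrics(cm):
    """Return accuracy and macro-averaged precision, recall, F1 from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true class k, predicted differently
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


acc, prec, rec, f1 = macro_metrics(cm_sklearn)
print('Accuracy: %f' % acc)
print('Precision: %f' % prec)
print('Recall: %f' % rec)
print('F1 score: %f' % f1)
```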


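**Discussion:** The scikit-learn model scores close to 100% (accuracy 0.986, macro F1 0.984), but these metrics are computed on the same 961 comments the model was trained on, so they overstate how well it would classify new comments. Only a random sample of about 0.16% of the comments was used, because preprocessing and training on the full data set took too long.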

## 2. Neural Network Classifier with Keras

In [242]:
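# Create the tf-idf feature matrix (limited to the 3,000 most frequent terms)
# Note: this vectorizes the 'pos' column, so part-of-speech tags (NN, VB, ...) become tokens alongside the stemmed words
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(max_features = 3000)
X = tfidf.fit_transform(df['pos'])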
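The Keras features come from the `pos` column rather than `txt`. After the CSV round trip that column is the string form of nested `(word, tag)` tuples, so the default tokenizer turns each Penn Treebank tag into a lowercase token next to the words. A sketch using two cells from the tagged output shown earlier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two 'pos' cells from the tagged output earlier, as strings after the CSV round trip.
pos_docs = ["[[('donehttpsiimgurcom2yz90pmjpg', 'NN')]]",
            "[[('No', 'DT'), ('noo', 'NN')]]"]

vectorizer = TfidfVectorizer(max_features=3000)
features = vectorizer.fit_transform(pos_docs)

# Tags such as NN and DT become ordinary lowercase features next to the words.
print(sorted(vectorizer.vocabulary_))
print(features.shape)
```

If the tags are not meant to be features, vectorizing `df['txt']` instead would match the input used in section 1.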

In [243]:
# TfidfVectorizer returns a SciPy sparse matrix, which caused errors when fitting the Keras model,
# so convert it to a dense NumPy array
X = X.toarray()

In [244]:
X.shape

(961, 3000)

In [185]:
from keras.utils.np_utils import to_categorical
from keras.preprocessing.text import Tokenizer
from keras import models
from keras import layers
from keras.layers import Dense
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier

In [245]:
nFeatures = 3000
nClasses = 3

In [246]:
# build the model 
def build_network():
    """
    Create a function that returns a compiled neural network
    """
    nn = Sequential()
    nn.add(Dense(500, activation = 'relu', input_shape =(nFeatures,)))
    nn.add(Dense(150, activation = 'relu'))
    nn.add(Dense(nClasses, activation = 'softmax'))
    nn.compile(loss = 'categorical_crossentropy',
              optimizer = 'adam',
              metrics = ['accuracy']
              )
    return nn

In [247]:
# train the model
nn2 = KerasClassifier(build_fn = build_network, 
                            epochs = 200,
                            batch_size = 128)
nn2.fit(X,y, validation_split=0.33)

Epoch 1/200
Epoch 2/200
...
Epoch 199/200
Epoch 200/200


<tensorflow.python.keras.callbacks.History at 0x1f7c985d448>
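Keras documents that `validation_split` takes the validation data from the last samples of `x` and `y`, before any shuffling. The sampled CSV keeps the original file order, in which comments appear in runs by category, so the last third may not represent all three classes; note also that `predicts2` below is computed on all 961 comments, including the two thirds used for training. A sketch with the class sizes from the section 1 confusion matrix (42, 220, and 699 comments), arranged in fully sorted order (an assumption that exaggerates the real ordering), compares the tail split with a stratified random split from `train_test_split`:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Class sizes from the section 1 confusion matrix rows (42 + 220 + 699 = 961),
# arranged in sorted order for illustration; the real file is only partly grouped.
y_ordered = [0] * 42 + [1] * 220 + [2] * 699
n = len(y_ordered)

# validation_split=0.33 holds out roughly the final third of the rows, unshuffled.
split_at = int(n * (1 - 0.33))
tail_counts = Counter(y_ordered[split_at:])
print('last 33% of rows:', dict(tail_counts))

# A stratified random split keeps roughly the same class mix in the held-out part.
indices = list(range(n))
train_idx, val_idx = train_test_split(indices, test_size=0.33,
                                      stratify=y_ordered, random_state=0)
val_counts = Counter(y_ordered[i] for i in val_idx)
print('stratified 33%:', dict(sorted(val_counts.items())))
```

The stratified rows could then be supplied to `nn2.fit` through `validation_data` instead of `validation_split`.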

In [248]:
# make predictions from fitted model
predicts2 = nn2.predict(X)

In [249]:
from sklearn.metrics import accuracy_score 
from sklearn.metrics import precision_score 
from sklearn.metrics import recall_score 
from sklearn.metrics import f1_score 
from sklearn.metrics import cohen_kappa_score 
from sklearn.metrics import roc_auc_score 
from sklearn.metrics import confusion_matrix  

# accuracy: correct predictions / all predictions
accuracy = accuracy_score(y, predicts2)
print('Accuracy: %f' % accuracy)

# precision: tp / (tp + fp) for each class, then averaged over classes ('macro')
precision = precision_score(y, predicts2, average = 'macro')
print('Precision: %f' % precision)

# recall: tp / (tp + fn) for each class, macro-averaged
recall = recall_score(y, predicts2, average = 'macro')
print('Recall: %f' % recall)

# f1: 2 tp / (2 tp + fp + fn) for each class, macro-averaged
f1 = f1_score(y, predicts2, average = 'macro')
print('F1 score: %f' % f1)   


Accuracy: 0.919875
Precision: 0.908437
Recall: 0.940247
F1 score: 0.921547


In [250]:
# confusion matrix 
matrix2 = confusion_matrix(y, predicts2)

In [251]:
# print confusion matrix
print(matrix2)

[[ 41   0   1]
 [  0 205  15]
 [  1  60 638]]
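Comparing per-class precision and recall from the two confusion matrices shows where the Keras model loses ground: 60 class-2 comments are predicted as class 1, so class-1 precision falls to 205/265 (about 0.774), while the scikit-learn model never mislabels anything as class 1. A stdlib sketch with a hypothetical `per_class` helper:

```python
# Confusion matrices printed above (rows = true class, columns = predicted class).
cm_sklearn = [[41, 0, 1], [0, 208, 12], [0, 0, 699]]
cm_keras = [[41, 0, 1], [0, 205, 15], [1, 60, 638]]


def per_class(cm):
    """Return (precision, recall) for each class; 0.0 when a denominator is zero."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column total: predicted as k
        actual = sum(cm[k])                          # row total: truly k
        results.append((tp / predicted if predicted else 0.0,
                        tp / actual if actual else 0.0))
    return results


for name, cm in [('scikit-learn', cm_sklearn), ('Keras', cm_keras)]:
    rows = ['class %d: precision %.3f, recall %.3f' % (k, p, r)
            for k, (p, r) in enumerate(per_class(cm))]
    print(name, '|', '; '.join(rows))
```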


## 3. Classifying Images

In [147]:
# import libraries
import numpy as np
import pandas as pd
import random
from keras.datasets import mnist
from keras.preprocessing.text import Tokenizer
from keras import models
from keras import layers
from keras.layers import Dense, Dropout, Flatten
from keras.models import Sequential
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K


In [148]:
# Put the color channel last: each image is height x width x channels
K.set_image_data_format("channels_last")

In [149]:
# Set seed
np.random.seed(0)

In [150]:
# Set image information
channels = 1
height = 28
width = 28

In [151]:
# Load data and target from MNIST data
(data_train, target_train), (data_test, target_test) = mnist.load_data()

In [152]:
# Reshape training image data into features: samples x height x width x channels
data_train = data_train.reshape(data_train.shape[0], height, width, channels)

In [153]:
# Reshape test image data into features
data_test = data_test.reshape(data_test.shape[0], height, width, channels)

In [154]:
# Rescale pixel intensity to between 0 and 1
features_train = data_train / 255
features_test = data_test / 255
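`mnist.load_data()` returns images as `uint8` arrays of shape `(samples, 28, 28)`. With `channels_last`, `Conv2D` expects `(samples, height, width, channels)`, so the reshape only adds a trailing axis of length 1, and dividing by 255 produces floats in [0, 1]. A NumPy sketch on a random fake batch standing in for MNIST:

```python
import numpy as np

# Random fake batch standing in for MNIST: 3 images, uint8 pixels in 0..255, shape (3, 28, 28).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)

# channels_last layout for Conv2D: (samples, height, width, channels)
images = raw.reshape(raw.shape[0], 28, 28, 1)

# True division by 255 returns float64 values in [0, 1]
scaled = images / 255

print(images.shape, scaled.dtype, float(scaled.min()), float(scaled.max()))
```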

In [155]:
# One-hot encode target
target_train = np_utils.to_categorical(target_train)
target_test = np_utils.to_categorical(target_test)
number_of_classes = target_test.shape[1]
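For integer labels 0 to 9, `to_categorical` builds the same matrix as indexing rows of a 10 x 10 identity matrix, which is why `number_of_classes` comes out as 10 (matching the 10 outputs of the final `Dense` layer in the summary below). A NumPy sketch with example digit labels:

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # example digit labels (integers 0..9)

# Row i of the 10 x 10 identity matrix is the one-hot vector for digit i,
# so fancy indexing with the labels one-hot encodes them in one step.
one_hot = np.eye(10)[labels]

print(one_hot.shape)  # one row per label, one column per class
print(one_hot[0])     # digit 5 -> 1.0 in column 5
```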

In [156]:
# Start neural network
network = Sequential()

In [157]:
# Add convolutional layer with 64 filters, a 5x5 window, and ReLU activation function
network.add(Conv2D(filters = 64,
                  kernel_size = (5, 5),
                  input_shape=(height, width, channels),
                  activation = 'relu'))

In [158]:
# Add max pooling layer with a 2x2 window
network.add(MaxPooling2D(pool_size = (2, 2)))

In [159]:
# Add dropout layer
network.add(Dropout(0.5))

In [160]:
# Add layer to flatten input
network.add(Flatten())

In [161]:
# Add fully connected layer of 128 units with a ReLU activation function
network.add(Dense(128, activation = 'relu'))

In [162]:
# Add dropout layer
network.add(Dropout(0.5))

In [163]:
# Add fully connected output layer with a softmax activation function
network.add(Dense(number_of_classes, activation = 'softmax'))

In [164]:
# Compile neural network
network.compile(loss = "categorical_crossentropy", # Cross-entropy
               optimizer = "rmsprop", # Root Mean Square Propagation
               metrics = ['accuracy']) # Accuracy performance metric

In [165]:
# Train neural network
network.fit(features_train, # Features
           target_train, # Target
           epochs = 2, # Number of epochs
           verbose = 0, # Don't print progress after each epoch
           batch_size = 1000, # Number of observations per batch
           validation_data = (features_test, target_test)) # Data for evaluation

<tensorflow.python.keras.callbacks.History at 0x2c681d20908>

In [166]:
preds = network.predict(features_test)

In [174]:
network.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 24, 24, 64)        1664      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_10 (Dense)             (None, 128)               1179776   
_________________________________________________________________
dropout_11 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 10)               
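The shapes and parameter counts in the summary follow from the layer settings. A stdlib sketch of the arithmetic (the `conv_output` helper is hypothetical) reproduces the printed values and also gives the count for the final `Dense(10)` layer, whose line is cut off above: 128 x 10 weights plus 10 biases. Pooling, dropout, and flatten layers have no parameters.

```python
def conv_output(size, kernel, stride=1):
    """Output length along one axis for an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


height = width = 28
channels, filters, kernel = 1, 64, 5

conv_hw = conv_output(height, kernel)                      # 28 - 5 + 1 = 24
conv_params = (kernel * kernel * channels + 1) * filters   # 5x5x1 weights + 1 bias, per filter

pool_hw = conv_output(conv_hw, 2, stride=2)                # 2x2 max pooling, stride 2: 12
flat = pool_hw * pool_hw * filters                         # Flatten: 12 * 12 * 64

dense_params = flat * 128 + 128                            # Dense(128): weights + biases
output_params = 128 * 10 + 10                              # Dense(10) softmax output layer

print(conv_hw, conv_params, pool_hw, flat, dense_params, output_params)
print('total trainable parameters:', conv_params + dense_params + output_params)
```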

In [188]:
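# Recompiling keeps the trained weights; it is not needed for evaluate() and only swaps in a fresh Adam optimizer
network.compile(loss='categorical_crossentropy', 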
              optimizer='adam',
              metrics=['accuracy'])

In [189]:
loss, acc = network.evaluate(features_test, target_test, batch_size=1000)
print("\nTest accuracy: %.1f%%" % (100.0 * acc))


Test accuracy: 97.7%
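`preds` (from `network.predict`) holds one row of 10 softmax probabilities per test image. Taking the argmax of each row and comparing it with the argmax of the one-hot targets gives the fraction of correct predictions, which should agree with the accuracy reported by `evaluate` (apart from ties). A NumPy sketch with toy probabilities over 3 classes:

```python
import numpy as np

# Toy softmax outputs for 4 images over 3 classes (MNIST has 10) and matching one-hot targets.
probs = np.array([[0.1, 0.8, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.6, 0.3, 0.1]])
targets = np.eye(3)[[1, 0, 2, 1]]

predicted = np.argmax(probs, axis=1)  # most probable class per image
actual = np.argmax(targets, axis=1)   # undo the one-hot encoding
accuracy = np.mean(predicted == actual)
print(accuracy)  # 3 of 4 correct
```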


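**Discussion:** The convolutional neural network reached 97.7% accuracy on the 10,000 MNIST test images after only 2 epochs of training. This result depends on the hyperparameters used in this model, such as the number of filters, the dropout rate, the batch size, and the number of epochs.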