## D. Kinney DSC 550 9.3 Exercise: Neural Network Classifiers

1. **Neural Network Classifier with Scikit**

Using the multi-label classifier dataset from earlier exercises (categorized-comments.jsonl in the reddit folder), fit a neural network classifier using scikit-learn. Use the code found in chapter 12 of the Applied Text Analysis with Python book as a guideline. Report the accuracy, precision, recall, F1-score, and confusion matrix.

2. **Neural Network Classifier with Keras**

Using the multi-label classifier dataset from earlier exercises (categorized-comments.jsonl in the reddit folder), fit a neural network classifier using Keras. Use the code found in chapter 12 of the Applied Text Analysis with Python book as a guideline. Report the accuracy, precision, recall, F1-score, and confusion matrix.

3. **Classifying Images**

In chapter 20 of the Machine Learning with Python Cookbook, implement the code found in section 20.15 to classify MNIST images using a convolutional neural network. Report the accuracy of your results.

*********************************************
#### 1. Neural Network Classifier with Scikit

In [0]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd

In [4]:
df = pd.read_csv('categorized-comments.csv')
df.dropna(inplace=True)
df.shape

(606467, 3)

In [0]:
# The full dataset has 606,467 rows; work with a random 50,000-row sample to keep run times manageable
df = df.sample(50000)

In [9]:
df.shape

(50000, 3)

In [10]:
# Tokenize the comments and build a bag-of-words term-count matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(df.txt)
X_train_counts.shape

(50000, 41435)

In [11]:
# TF-IDF
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_train_tfidf.shape

(50000, 41435)
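`TfidfTransformer` rescales the raw counts so that terms appearing in many comments carry less weight. The plain-Python sketch below mirrors its default settings as we understand them (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`): idf(t) = ln((1 + n) / (1 + df(t))) + 1, each count is multiplied by its idf, and each row is scaled to unit length. The helper name `tfidf`, the toy comments, and the lowercase whitespace tokenizer are assumptions for illustration; `CountVectorizer`'s real tokenizer uses a regular expression and drops one-character tokens.

```python
import math
from collections import Counter

def tfidf(docs):
    """Sketch of TF-IDF with smoothed idf and L2 row normalization."""
    # Simplifying assumption: lowercase + whitespace split instead of CountVectorizer's regex
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    vocab = sorted({t for toks in tokenized for t in toks})
    # Document frequency: number of documents that contain each term
    df = {t: sum(t in toks for toks in tokenized) for t in vocab}
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so no term gets weight zero
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # Scale each document vector to unit Euclidean length
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows

# Hypothetical mini-corpus of three comments
vocab, rows = tfidf(["great game last night", "new game patch", "new phone"])
print(vocab)
```

Because "game" occurs in two of the three toy comments and "great" in only one, "great" ends up with the larger weight in the first row.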

In [0]:
# Exploratory fit: train a default Multi-layer Perceptron (MLP) on the full 50k TF-IDF matrix (no hold-out yet; the evaluated model is the pipeline below)
from sklearn.neural_network import MLPClassifier
mlp_clf = MLPClassifier().fit(X_train_tfidf, df.cat)

In [21]:
# Hold out 33% of the sample as a test set for the final evaluation
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df.txt, df.cat, test_size=0.33, random_state=42)

# Consolidate vectorization, TF-IDF weighting, and the MLP into one pipeline
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('ann', MLPClassifier(hidden_layer_sizes=[30, 30],  # two hidden layers of 30 units
                          max_iter=50,
                          verbose=True))])

# Cross-validated micro-averaged F1 on the training split (5 folds, per the output below)
scoring = 'f1_micro'
scores = cross_val_score(text_clf, X_train, y_train, scoring=scoring)

# Refit on the full training split for the test-set evaluation
text_clf = text_clf.fit(X_train, y_train)

Iteration 1, loss = 0.84887997
Iteration 2, loss = 0.52073163
Iteration 3, loss = 0.35671557
Iteration 4, loss = 0.25550017
Iteration 5, loss = 0.19867513
Iteration 6, loss = 0.16683300
Iteration 7, loss = 0.14523074
Iteration 8, loss = 0.13189494
Iteration 9, loss = 0.12203169
Iteration 10, loss = 0.11379164
Iteration 11, loss = 0.10850873
Iteration 12, loss = 0.10200501
Iteration 13, loss = 0.09708455
Iteration 14, loss = 0.09192045
Iteration 15, loss = 0.08938363
Iteration 16, loss = 0.08477371
Iteration 17, loss = 0.08184813
Iteration 18, loss = 0.07996164
Iteration 19, loss = 0.07608177
Iteration 20, loss = 0.07397218
Iteration 21, loss = 0.07279138
Iteration 22, loss = 0.07070569
Iteration 23, loss = 0.06909621
Iteration 24, loss = 0.06777370
Iteration 25, loss = 0.06679954
Iteration 26, loss = 0.06505657
Iteration 27, loss = 0.06428570
Iteration 28, loss = 0.06264496
Iteration 29, loss = 0.06290985
Iteration 30, loss = 0.06146980
Iteration 31, loss = 0.06171653
Iteration 32, los
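The pipeline is scored with `scoring='f1_micro'`. Because every comment has exactly one category, micro-averaging pools true positives, false positives, and false negatives across classes, and each wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class). Micro precision, micro recall, and micro-F1 therefore all equal plain accuracy, so the cross-validation scores can be compared directly with the test-set accuracy. A minimal plain-Python sketch, with a hypothetical `micro_f1` helper and made-up labels:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then score once."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1          # correct prediction of this label
            elif p == label:
                fp += 1          # predicted this label, but the true label differs
            elif t == label:
                fn += 1          # the true label was this one, but it was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical true and predicted categories for five comments
y_true = ["sports", "video_games", "video_games", "science_and_technology", "sports"]
y_pred = ["sports", "video_games", "sports", "video_games", "sports"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_f1(y_true, y_pred), accuracy)
```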

In [24]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Micro-F1 from cross-validation on the training split
print(f"f1 scores: {scores}")

# Evaluate the refit pipeline on the held-out test split
predicted = text_clf.predict(X_test)
print("Confusion Matrix:\n", confusion_matrix(y_test, predicted))
print("Classification Report:\n", classification_report(y_test, predicted))
print("Accuracy: ", accuracy_score(y_test, predicted))

f1 scores: [0.78402985 0.7819403  0.77985075 0.78701493 0.79104478]
Confusion Matrix:
 [[  301    65   287]
 [   55  2433  1490]
 [  203  1341 10325]]
Classification Report:
                          precision    recall  f1-score   support

science_and_technology       0.54      0.46      0.50       653
                sports       0.63      0.61      0.62      3978
           video_games       0.85      0.87      0.86     11869

              accuracy                           0.79     16500
             macro avg       0.68      0.65      0.66     16500
          weighted avg       0.79      0.79      0.79     16500

Accuracy:  0.7914545454545454
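Every number in the classification report can be recovered from the confusion matrix, whose rows are true labels and columns are predicted labels (in sorted label order). The sketch below copies the matrix reported above and recomputes per-class precision (diagonal over column total), recall (diagonal over row total), F1, and accuracy in plain Python; the helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix copied from the output above; rows = true labels, columns = predicted labels
labels = ["science_and_technology", "sports", "video_games"]
cm = [[301, 65, 287],
      [55, 2433, 1490],
      [203, 1341, 10325]]

def per_class_metrics(cm):
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                          # row total: true instances of class k
        predicted = sum(cm[i][k] for i in range(n))   # column total: predictions of class k
        precision = tp / predicted
        recall = tp / support
        f1 = 2 * tp / (support + predicted)           # algebraically equal to 2PR / (P + R)
        metrics.append((precision, recall, f1, support))
    return metrics

metrics = per_class_metrics(cm)
total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
for name, (p, r, f, s) in zip(labels, metrics):
    print(f"{name:>22} {p:.2f} {r:.2f} {f:.2f} {s}")
print(f"accuracy {accuracy:.4f}")
```

Run as written, it reproduces the report to two decimals (for example, 301/559 ≈ 0.54 precision and 301/653 ≈ 0.46 recall for science_and_technology) and an accuracy of 13059/16500 ≈ 0.7915. The small science_and_technology class (653 test comments) is where the model struggles most, which the macro average (0.66 F1) exposes and the weighted average (0.79 F1) hides.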


*********************************************
#### 2. Neural Network Classifier with Keras

In [0]:
from keras.layers import Dense
from keras.models import Sequential

N_FEATURES = 5000  # vocabulary size kept by TfidfVectorizer
N_CLASSES = 3      # science_and_technology, sports, video_games

def build_network():
    """
    Return a compiled feed-forward network: two ReLU hidden layers
    and a softmax output layer with one unit per class.
    """
    nn = Sequential()
    nn.add(Dense(500, activation='relu', input_shape=(N_FEATURES,)))
    nn.add(Dense(150, activation='relu'))
    nn.add(Dense(N_CLASSES, activation='softmax'))
    nn.compile(
        loss='categorical_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])
    return nn

Using TensorFlow backend.
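The output layer and the loss work together: softmax turns the final layer's scores into a probability distribution over the classes, and categorical cross-entropy is the negative log of the probability assigned to the true class. The output layer therefore needs exactly one unit per category; the classification report in part 1 lists three (science_and_technology, sports, video_games). The plain-Python sketch below uses hypothetical logits and a small clip value of our own choosing to avoid log(0), and also counts the network's trainable parameters (weights plus biases per Dense layer) for N_FEATURES = 5000 and three classes.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    # -sum_k y_k * log(p_k); with a one-hot target only the true-class term survives.
    # The eps clip is an assumption of this sketch, guarding against log(0).
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

def dense_params(n_in, n_out):
    return n_in * n_out + n_out           # weight matrix + one bias per unit

N_FEATURES, N_CLASSES = 5000, 3
probs = softmax([2.0, 0.5, -1.0])         # hypothetical logits for one comment
loss = categorical_crossentropy([1, 0, 0], probs)
total = (dense_params(N_FEATURES, 500)
         + dense_params(500, 150)
         + dense_params(150, N_CLASSES))
print(probs, loss, total)
```

Under these assumptions the first Dense layer alone accounts for 2,500,500 of the 2,576,103 parameters, so `max_features` is the main lever on model size.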


In [0]:
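from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix

# TfidfVectorizer tokenizes, counts, and weights in one step; a CountVectorizer
# in front of it would hand it a count matrix instead of raw text
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=N_FEATURES)),
    ('nn', KerasClassifier(build_fn=build_network,
                           epochs=200,
                           batch_size=128))])

# Cross-validated accuracy on the training split
scores = cross_val_score(pipeline, X_train, y_train, cv=12, scoring='accuracy', n_jobs=-1)
print(scores)

# Refit on the full training split and evaluate on the held-out test split
pipeline.fit(X_train, y_train)
predicted = pipeline.predict(X_test)
print("Accuracy: ", np.mean(predicted == y_test))
print("Confusion Matrix:\n", confusion_matrix(y_test, predicted))
print("Classification Report:\n", classification_report(y_test, predicted))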
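`y_train` holds string labels, while `categorical_crossentropy` expects one-hot target vectors whose width matches `N_CLASSES`, and predictions must be mapped back to strings before `predicted == y_test` is meaningful. To our understanding, the legacy `keras.wrappers.scikit_learn.KerasClassifier` does this internally (it sorts the unique labels, one-hot encodes them for a categorical cross-entropy loss, and returns labels from `predict`). The plain-Python sketch below only illustrates the idea; `encode_labels` and `decode` are hypothetical helpers.

```python
def encode_labels(y):
    classes = sorted(set(y))                       # fixed, sorted class order
    index = {c: i for i, c in enumerate(classes)}
    # One-hot row per label: 1 in the label's column, 0 elsewhere
    one_hot = [[1 if index[label] == k else 0 for k in range(len(classes))]
               for label in y]
    return classes, one_hot

def decode(classes, probs):
    # Map a softmax output back to the label with the highest probability
    return classes[max(range(len(probs)), key=probs.__getitem__)]

classes, Y = encode_labels(["sports", "video_games", "science_and_technology", "sports"])
print(classes)
print(Y[0])
print(decode(classes, [0.1, 0.7, 0.2]))
```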

*********************************************
#### 3. Classifying Images

In [0]:
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

# Put the color channel first: images are shaped (channels, height, width)
K.set_image_data_format("channels_first")

# Set seed
np.random.seed(0)

# Set image information
channels = 1
height = 28
width = 28

# Load data and target from MNIST data
(data_train, target_train), (data_test, target_test) = mnist.load_data()

# Reshape training image data into features
data_train = data_train.reshape(data_train.shape[0], channels, height, width)

# Reshape test image data into features
data_test = data_test.reshape(data_test.shape[0], channels, height, width)

# Rescale pixel intensity to between 0 and 1
features_train = data_train / 255
features_test = data_test / 255

# One-hot encode target
target_train = np_utils.to_categorical(target_train)
target_test = np_utils.to_categorical(target_test)
number_of_classes = target_test.shape[1]

# Start neural network
network = Sequential()

# Add convolutional layer with 64 filters, a 5x5 window, and ReLU activation function
network.add(Conv2D(filters=64,
                   kernel_size=(5, 5),
                   input_shape=(channels, height, width),
                   activation='relu'))

# Add max pooling layer with a 2x2 window
network.add(MaxPooling2D(pool_size=(2, 2)))

# Add dropout layer
network.add(Dropout(0.5))

# Add layer to flatten input
network.add(Flatten())

# Add fully connected layer of 128 units with a ReLU activation function
network.add(Dense(128, activation="relu"))

# Add dropout layer
network.add(Dropout(0.5))

# Add fully connected layer with a softmax activation function
network.add(Dense(number_of_classes, activation="softmax"))

# Compile neural network
network.compile(loss="categorical_crossentropy", # Cross-entropy
                optimizer="rmsprop", # Root Mean Square Propagation
                metrics=["accuracy"]) # Accuracy performance metric

# Train neural network
network.fit(features_train, # Features
            target_train, # Target
            epochs=2, # Number of epochs
            verbose=0, # Don't print description after each epoch
            batch_size=1000, # Number of observations per batch
            validation_data=(features_test, target_test)) # Data for evaluation

Using TensorFlow backend.


<keras.callbacks.callbacks.History at 0x120168f3f98>
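The layer sizes follow from simple arithmetic. With Keras's default `'valid'` padding, a 5x5 convolution shrinks each 28-pixel side to 28 - 5 + 1 = 24, the 2x2 max pooling halves that to 12, and `Flatten` hands 64 x 12 x 12 = 9,216 values to the 128-unit dense layer. The sketch below, with hypothetical helper names, repeats that arithmetic and counts trainable parameters (dropout, pooling, and flattening add none).

```python
def conv_output_size(n, kernel, stride=1):
    # 'valid' padding (the Keras default): the kernel never hangs over the border
    return (n - kernel) // stride + 1

def pool_output_size(n, pool):
    # Non-overlapping pooling with 'valid' padding: the stride equals the pool size
    return (n - pool) // pool + 1

height = width = 28
channels, filters, kernel, pool, hidden, classes = 1, 64, 5, 2, 128, 10

conv_h, conv_w = conv_output_size(height, kernel), conv_output_size(width, kernel)  # 24, 24
pool_h, pool_w = pool_output_size(conv_h, pool), pool_output_size(conv_w, pool)    # 12, 12
flat = filters * pool_h * pool_w                                                  # 9216

conv_params = kernel * kernel * channels * filters + filters  # one bias per filter
dense_params = flat * hidden + hidden
out_params = hidden * classes + classes
print(conv_h, pool_h, flat, conv_params + dense_params + out_params)
```

Under these assumptions the total is 1,182,730 parameters, almost all of them in the first dense layer; `network.summary()` should report the same count if the assumptions hold.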

In [0]:
score = network.evaluate(features_test, target_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.09088441950436682
Test accuracy: 0.9721999764442444


*************************
**Additional References**

Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK. Jul 23, 2017
https://towardsdatascience.com/machine-learning-nlp-text-classification-using-scikit-learn-python-and-nltk-c52b92a7c73a