### DSC - 550

#### 1. Neural Network Classifier with Scikit

Using the multi-label classifier dataset (categorized-comments.jsonl in the reddit folder), fit a neural network classifier using scikit-learn. Report the accuracy, precision, recall, F1-score, and confusion matrix.

In [1]:
import pandas as pd

# load the data
df = pd.read_json('categorized-comments/categorized-comments.jsonl', lines=True, encoding='utf-8')
df.head()

| | cat | txt |
|---|---|---|
| 0 | sports | Barely better than Gabbert? He was significant... |
| 1 | sports | Fuck the ducks and the Angels! But welcome to ... |
| 2 | sports | Should have drafted more WRs.\n\n- Matt Millen... |
| 3 | sports | [Done](https://i.imgur.com/2YZ90pm.jpg) |
| 4 | sports | No!! NOO!!!!! |


In [2]:
# take a random sample of 10,000 comments to keep training time manageable
# (no random_state is set, so the sample differs from run to run)
df = df.sample(n=10000)



In [3]:
# list the unique categories in the 'cat' column
print(df['cat'].unique())

['video_games' 'sports' 'science_and_technology']


In [4]:
# import packages
from sklearn.model_selection import train_test_split

# separate the predictor (comment text) and the target (category)
data_model_X = df['txt']
data_model_y = df['cat']

# hold out 20% of the sample as a test set
X_train, X_test, y_train, y_test = train_test_split(data_model_X, data_model_y, test_size=0.2, random_state=11)


In [5]:
# import packages
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# build a pipeline: TF-IDF features feeding a multilayer perceptron
# with three hidden layers of 30 units each
Classifier_new = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('ann', MLPClassifier(hidden_layer_sizes=(30, 30, 30), verbose=False))
])

# fit classifier
Classifier_new.fit(X_train, y_train)

# make predictions
predictions = Classifier_new.predict(X_test)

# fix the row/column order of the confusion matrix
labels = ['video_games', 'science_and_technology', 'sports']

# create confusion matrix (rows = true labels, columns = predicted labels)
cm = confusion_matrix(y_test, predictions, labels=labels)

# print confusion matrix
print(cm)



[[1227   16  207]
 [  54   35    9]
 [ 188    4  260]]


In [6]:
# define accuracy function
def accuracy(matrix):
    """Return overall accuracy: correct predictions (diagonal) over all predictions."""
    diagonal_sum = matrix.trace()
    sum_of_all_elements = matrix.sum()
    return diagonal_sum / sum_of_all_elements

# print the accuracy
print("Accuracy of MLPClassifier : ", accuracy(cm))

Accuracy of MLPClassifier :  0.761
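
To put the 0.761 in context, it can be compared with the accuracy of always predicting the most frequent class. The following is a minimal sketch, assuming the confusion matrix printed above (rows are true labels in the order video_games, science_and_technology, sports); the variable names `support`, `model_accuracy`, and `baseline_accuracy` are illustrative and not part of the original notebook.

```python
import numpy as np

# Confusion matrix copied from the MLPClassifier output above
# (rows = true labels, columns = predicted labels).
cm = np.array([[1227, 16, 207],
               [54, 35, 9],
               [188, 4, 260]])

# Row sums give the number of test comments per true class.
support = cm.sum(axis=1)

# Accuracy of the model: correct predictions over all predictions.
model_accuracy = cm.trace() / cm.sum()

# Accuracy of a hypothetical baseline that always predicts the largest class.
baseline_accuracy = support.max() / support.sum()

print(model_accuracy, baseline_accuracy)
```

Because video_games accounts for 1,450 of the 2,000 test comments, such a baseline would already score 0.725, so the classifier's gain over it is modest.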


In [7]:
# print precision, recall, F1-score
print(classification_report(y_test,predictions))

                        precision    recall  f1-score   support

science_and_technology       0.64      0.36      0.46        98
                sports       0.55      0.58      0.56       452
           video_games       0.84      0.85      0.84      1450

              accuracy                           0.76      2000
             macro avg       0.67      0.59      0.62      2000
          weighted avg       0.76      0.76      0.76      2000
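
The per-class figures in the report can be recovered directly from the confusion matrix. The sketch below assumes the matrix printed for the MLPClassifier, with rows and columns in the order given by `labels`; note that the report itself lists the classes alphabetically. It is an illustration of the definitions, not a replacement for `classification_report`.

```python
import numpy as np

labels = ['video_games', 'science_and_technology', 'sports']

# Confusion matrix copied from the MLPClassifier output
# (rows = true labels, columns = predicted labels).
cm = np.array([[1227, 16, 207],
               [54, 35, 9],
               [188, 4, 260]])

# Correct predictions per class sit on the diagonal.
true_positives = np.diag(cm)

# Precision: correct predictions divided by everything predicted as that class.
precision = true_positives / cm.sum(axis=0)

# Recall: correct predictions divided by everything truly in that class.
recall = true_positives / cm.sum(axis=1)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:24s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```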



#### 2. Neural Network Classifier with Keras

Using the multi-label classifier dataset (categorized-comments.jsonl in the reddit folder), fit a neural network classifier using Keras. Report the accuracy, precision, recall, F1-score, and confusion matrix.

In [11]:
# import packages
from keras.layers import Dense
from keras.models import Sequential

# number of input features; this must equal the vocabulary size produced by
# the vectorizer fitted on the training comments (15917 for this sample)
N_FEATURES = 15917
N_CLASSES = 3

def build_network():
    """
    Return a compiled neural network with two hidden layers of 30 units.
    """
    nn = Sequential()
    nn.add(Dense(30, activation='relu', input_shape=(N_FEATURES,)))
    nn.add(Dense(30, activation='relu'))
    nn.add(Dense(N_CLASSES, activation='softmax'))
    nn.compile(
        loss='categorical_crossentropy',
        optimizer='adam',
        metrics=['accuracy']
    )
    return nn
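
The value of `N_FEATURES` is tied to the vocabulary that the vectorizer learns from the training text, so it changes whenever the sample changes. One way to avoid hard-coding it is to read the width of the vectorized matrix. The sketch below assumes a `CountVectorizer` with default settings and uses a hypothetical three-comment corpus (`toy_comments`) in place of the real training data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the training comments.
toy_comments = [
    "great game last night",
    "the new console game is great",
    "new telescope images released",
]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(toy_comments)

# The number of columns equals the size of the learned vocabulary,
# which is the input width the first Dense layer has to accept.
n_features = features.shape[1]
print(n_features, len(vectorizer.vocabulary_))
```

In the notebook, the same idea would mean fitting the vectorizer on `X_train` first and passing the resulting width to the network, rather than relying on a constant.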


In [12]:
# import packages
from sklearn.pipeline import Pipeline
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix

# build a pipeline: word-count features feeding the Keras network
Classifier_keras = Pipeline([
    ('norm', CountVectorizer()),
    ('nn', KerasClassifier(build_fn=build_network, epochs=50, batch_size=128))
])

# fit classifier
Classifier_keras.fit(X_train, y_train)

# make predictions
predictions_keras = Classifier_keras.predict(X_test)

# fix the row/column order of the confusion matrix
labels = ['video_games', 'science_and_technology', 'sports']

# create confusion matrix (rows = true labels, columns = predicted labels)
cm_keras = confusion_matrix(y_test, predictions_keras, labels=labels)

# print confusion matrix
print("confusion matrix:", cm_keras)



Epoch 1/50
...
Epoch 50/50
confusion matrix: [[1215   24  211]
 [  50   33   15]
 [ 209    7  236]]


In [13]:
# print the accuracy
print("Accuracy of KerasClassifier : ", accuracy(cm_keras))

Accuracy of KerasClassifier :  0.742


In [14]:
# print precision, recall, F1-score
print(classification_report(y_test,predictions_keras))

                        precision    recall  f1-score   support

science_and_technology       0.52      0.34      0.41        98
                sports       0.51      0.52      0.52       452
           video_games       0.82      0.84      0.83      1450

              accuracy                           0.74      2000
             macro avg       0.62      0.57      0.58      2000
          weighted avg       0.74      0.74      0.74      2000



#### 3. Classifying Images

Classify MNIST images using a convolutional neural network. Report the accuracy of your results.

In [15]:
# import packages
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

# Use the channels-first image format: (channels, height, width)
K.set_image_data_format("channels_first")
# Set seed
np.random.seed(0)

# Set image information
channels = 1
height = 28
width = 28

# Load data and target from MNIST data
(data_train, target_train), (data_test, target_test) = mnist.load_data()

# Reshape training image data into features
data_train = data_train.reshape(data_train.shape[0], channels, height, width)

# Reshape test image data into features
data_test = data_test.reshape(data_test.shape[0], channels, height, width)

# Rescale pixel intensity to between 0 and 1
features_train = data_train / 255
features_test = data_test / 255

# One-hot encode target
target_train = np_utils.to_categorical(target_train)
target_test = np_utils.to_categorical(target_test)
number_of_classes = target_test.shape[1]

# Start neural network
network = Sequential()

# Add convolutional layer with 64 filters, a 5x5 window, and ReLU activation function
network.add(Conv2D(filters=64,
                   kernel_size=(5, 5),
                   input_shape=(channels, height, width),
                   activation='relu'))

# Add max pooling layer with a 2x2 window
network.add(MaxPooling2D(pool_size=(2, 2)))

# Add dropout layer
network.add(Dropout(0.5))

# Add layer to flatten input
network.add(Flatten())

# Add fully connected layer of 128 units with a ReLU activation function
network.add(Dense(128, activation="relu"))

# Add dropout layer
network.add(Dropout(0.5))

# Add fully connected layer with a softmax activation function
network.add(Dense(number_of_classes, activation="softmax"))

# Compile neural network
network.compile(loss="categorical_crossentropy", # Cross-entropy
                optimizer="rmsprop", # Root Mean Square Propagation
                metrics=["accuracy"]) # Accuracy performance metric

# Train neural network
network.fit(features_train, # Features
            target_train, # Target
            epochs=2, # Number of epochs
            verbose=0, # Don't print description after each epoch
            batch_size=1000, # Number of observations per batch
            validation_data=(features_test, target_test)) # Data for evaluation






<keras.callbacks.callbacks.History at 0x2a10ffcf320>
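
The layer sizes in the network above follow from simple arithmetic. The sketch below works through the output shapes and parameter counts, assuming the Keras defaults for the layers used here (no padding and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`) and 10 digit classes. The variable names are illustrative only.

```python
# Layer settings taken from the network defined above.
height, width, channels = 28, 28, 1
filters, kernel, pool = 64, 5, 2
dense_units, n_classes = 128, 10

# A convolution without padding shrinks each side by kernel - 1.
conv_h, conv_w = height - kernel + 1, width - kernel + 1

# Max pooling with a 2x2 window halves each side.
pool_h, pool_w = conv_h // pool, conv_w // pool

# Flatten turns the pooled feature maps into one vector.
flat = filters * pool_h * pool_w

# Trainable parameters: weights plus one bias per unit or filter.
conv_params = filters * (kernel * kernel * channels + 1)
dense_params = flat * dense_units + dense_units
output_params = dense_units * n_classes + n_classes
total_params = conv_params + dense_params + output_params

print((conv_h, conv_w), (pool_h, pool_w), flat, total_params)
```

Nearly all of the parameters sit in the first fully connected layer, which is why the dropout layers around it matter for regularization.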

In [16]:
# evaluate model on the test set; returns [loss, accuracy]
network.evaluate(features_test, target_test)



[0.09184574899375439, 0.9714999794960022]

The convolutional neural network reaches an accuracy of about 97% (0.9715) on the MNIST test set after two epochs.
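
The list returned by `network.evaluate` holds the test loss followed by the accuracy metric. As a rough illustration, the accuracy can be turned into a count of misclassified images, assuming the standard MNIST test set of 10,000 images; the variable names are illustrative.

```python
# Values copied from the network.evaluate output above: [loss, accuracy].
loss, acc = 0.09184574899375439, 0.9714999794960022

# Size of the standard MNIST test set (an assumption about the data loaded).
n_test = 10000

# Convert the accuracy into counts of correct and incorrect predictions.
n_correct = round(acc * n_test)
n_errors = n_test - n_correct

print(n_correct, n_errors)
```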