# Overview

We will build a deep neural network with a recurrent (RNN) layer and a softmax activation on the output layer. The steps are:

- Instantiate required Python components.
- Set Hyperparameters
- Read the CSV data
- Remove unused fields.
- Keep only the message in the JSON.
- Define two lists: messages and labels.
- Split data between training and validation sets.
- Tokenize words
- Pad sequences so they are the same size.
- Build LSTM
- Train several epochs.
- Plot Loss and Accuracy to view model's performance.
- Make predictions.
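
To make the role of the softmax output layer concrete: it converts the network's raw per-class scores into probabilities that sum to 1, and the predicted class is the one with the highest probability. The following is a minimal pure-Python sketch; the function name `softmax` and the three example scores are illustrative assumptions, not values from this notebook.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
```

In a Keras model the same transformation would typically be requested with `activation='softmax'` on the final `Dense` layer.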


# Instantiate required Python components.

We use TensorFlow to build the model, pandas to read the CSV data, and NLTK for its English stop-word list.

In [1]:
import pandas as pd
import csv
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
STOPWORDS = set(stopwords.words('english'))

2023-01-01 03:14:18.949107: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-01 03:14:19.056076: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-01-01 03:14:19.056101: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-01-01 03:14:19.648119: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory

# Set Hyperparameters

This section collects the parameters that control tokenization, sequence padding, the embedding size, and the train/validation split.

In [2]:
vocab_size = 5000
embedding_dim = 64
max_length = 200
trunc_type = 'post'
padding_type = 'post'
oov_tok = '<OOV>'  # token substituted for out-of-vocabulary words
training_portion = .8
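
The sketch below illustrates what `max_length`, `trunc_type`, `padding_type`, and `training_portion` mean in practice. The helper `pad_or_truncate` is a hypothetical pure-Python stand-in meant to mirror the documented behavior of Keras `pad_sequences` for a single sequence. It is not the library implementation, and the toy values are made up.

```python
def pad_or_truncate(seq, max_length, padding="post", truncating="post", value=0):
    """Return a copy of seq that is exactly max_length long."""
    if len(seq) > max_length:
        # 'post' drops tokens from the end, 'pre' drops them from the start.
        seq = seq[:max_length] if truncating == "post" else seq[-max_length:]
    pad = [value] * (max_length - len(seq))
    # 'post' appends the padding, 'pre' prepends it.
    return list(seq) + pad if padding == "post" else pad + list(seq)

# A short sequence is padded, a long one is truncated.
short = pad_or_truncate([5, 8, 2], max_length=5)
long_post = pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4)
long_pre = pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4, truncating="pre")

# training_portion decides how many rows go to training; the rest validate.
training_portion = 0.8
num_rows = 10  # hypothetical dataset size
train_size = int(num_rows * training_portion)
```

With `'post'` for both settings, as chosen above, the start of each message is preserved and zeros are appended at the end.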

# Read the CSV data

Read the CSV contents into a pandas DataFrame. The columns we need are selected in the preprocessing step.

In [14]:
labels = []
messages = []

# Open file and save to dataframe.
df = pd.read_csv("./rnn-softmax-multi-class-text-classify/data/20221220-message-incidents.csv")

# print(df.columns)
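
The `labels` and `messages` lists defined above can also be filled directly with the standard-library `csv` module. This is a sketch only: the in-memory sample and its values are invented, and it assumes the file has `reason` and `messages` columns, as the column selection later in this notebook suggests.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    'reason,messages\n'
    'outage,"[{""message"": ""db down""}]"\n'
    'billing,"[]"\n'
)

labels = []
messages = []

# DictReader maps each row to a dict keyed by the header names.
for row in csv.DictReader(sample_csv):
    labels.append(row["reason"])
    messages.append(row["messages"])
```

Note that each `messages` value arrives as a plain string, not as a parsed list.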

# Preprocess Data

As part of the machine learning process, we will remove fields that are not required, fix missing values, remove noisy data, and take any additional steps needed to prepare the data for training.

## Keep Labels and Messages

We will keep only the columns that matter to the model: `reason` (the label) and `messages` (the text).

In [15]:
# Keep specific columns.
df = df[["reason", "messages"]]

print(df.columns)

Index(['reason', 'messages'], dtype='object')


## Remove Empty Messages Data

Let's drop any row whose `messages` value is an empty array.

In [52]:
# Create a boolean mask that is True for rows whose 'messages' value is an empty list
removeEmptyMessages = df['messages'].apply(lambda x: x == '[]')

# Use the mask to drop those rows
df = df.drop(index=df[removeEmptyMessages].index)

print(f'Total number of rows after removing empty lists: {len(df)}')

Total number of rows after removing empty lists: 4042
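
The comparison `x == '[]'` only matches that exact string. If the column may also contain variants such as `'[ ]'` or missing values (an assumption; the source data is not shown), a slightly more tolerant check could look like the sketch below. `is_empty_message_list` is a hypothetical helper name.

```python
import json

def is_empty_message_list(raw):
    """Return True when raw is missing or encodes an empty JSON array."""
    # Missing values (None, NaN) are not strings; treat them as empty.
    if not isinstance(raw, str):
        return True
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Not valid JSON: keep the row so it can be inspected separately.
        return False
    return parsed == []

# Hypothetical cell values: two empty variants, one real entry, one missing.
checks = [is_empty_message_list(v) for v in ["[]", "[ ]", '[{"message": "hi"}]', None]]
```

With pandas, this could replace the lambda: `df = df[~df['messages'].apply(is_empty_message_list)]`.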


## Remove JSON and Keep Message Field

Each `messages` value is a JSON array. We will drop the JSON structure and keep only the `message` field of each entry.

In [49]:
import json

# Define a function to extract the message field from one parsed JSON object
def extract_message(item):
  return item['message']

# The 'messages' column holds JSON text, so parse each value with json.loads
# before extracting the 'message' field of every object into 'new_message'
df['new_message'] = df['messages'].apply(lambda x: list(map(extract_message, json.loads(x))))

print(df.head(10))
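
Indexing a string with a key such as `'message'` raises `TypeError: string indices must be integers`, which is what happens when the `messages` values are still raw text. Assuming each value is a valid JSON array of objects that each carry a `message` key (the data itself is not shown), a sketch of parsing before extracting might look like this. `extract_messages` and the sample string are hypothetical.

```python
import json

def extract_messages(raw):
    """Parse a JSON array string and return the 'message' field of each item."""
    items = json.loads(raw)
    return [item["message"] for item in items]

# Hypothetical cell value as it would appear after reading the CSV.
raw_value = '[{"message": "disk full", "id": 1}, {"message": "retrying", "id": 2}]'
extracted = extract_messages(raw_value)
```

Applied to the DataFrame, this would read `df['new_message'] = df['messages'].apply(extract_messages)`.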

# Remove unused fields.

In [13]:
!ls -la

total 32
drwxr-xr-x 4 jovyan users  4096 Jan  1 02:49 .
drwxrwxr-x 5 jovyan  1000  4096 Dec 24 19:16 ..
drwxr-xr-x 2 jovyan users  4096 Jan  1 02:49 .ipynb_checkpoints
-rw-r--r-- 1 jovyan users 12484 Dec 28 17:16 rnn-softmax.ipynb
drwxr-xr-x 4 jovyan users  4096 Jan  1 03:22 rnn-softmax-multi-class-text-classify


# Keep only the message in the JSON.