# Text Classification

## Problem Description

Train a deep neural network to classify tickets by ‘urgency’, using the text in the ‘body’ column as input.

__The purpose here is to get familiar with recurrent neural networks (RNNs), LSTMs, the Keras API, hyperparameter tuning, and validating the results.__

## Load libs

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence
from keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding
from keras.models import Model, Sequential, model_from_json, load_model
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical

# Fix the random seed for reproducibility
np.random.seed(7)

Using TensorFlow backend.


## Load dataset

In [2]:
# Load the ticket dataset
ticketData = pd.read_csv(
    "/Users/samyam/Documents/Samya/GIT/insofe/Cute_AI_DL/TextAnalysis/all_tickets-1551435513304.csv", 
    delimiter=",")

print(ticketData.shape)
ticketData.head()

(48549, 9)


| | title | body | ticket_type | category | sub_category1 | sub_category2 | business_service | urgency | impact |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | hi since recruiter lead permission approve req... | 1 | 4 | 2 | 21 | 71 | 3 | 4 |
| 1 | connection with icon | icon dear please setup icon per icon engineers... | 1 | 6 | 22 | 7 | 26 | 3 | 4 |
| 2 | work experience user | work experience user hi work experience studen... | 1 | 5 | 13 | 7 | 32 | 3 | 4 |
| 3 | requesting for meeting | requesting meeting hi please help follow equip... | 1 | 5 | 13 | 7 | 32 | 3 | 4 |
| 4 | reset passwords for external accounts | re expire days hi ask help update passwords co... | 1 | 4 | 2 | 76 | 4 | 3 | 4 |


In [3]:
ticketData_1 = ticketData[['body', 'urgency']]
ticketData_1.head()

| | body | urgency |
|---|---|---|
| 0 | hi since recruiter lead permission approve req... | 3 |
| 1 | icon dear please setup icon per icon engineers... | 3 |
| 2 | work experience user hi work experience studen... | 3 |
| 3 | requesting meeting hi please help follow equip... | 3 |
| 4 | re expire days hi ask help update passwords co... | 3 |


In [10]:
# Use the ticket bodies from the dataset as input
X_test = ticketData_1.body

# Alternatively, predict on a single sentence (this overrides the assignment above)
X_test = pd.Series("connection with icon,icon dear please setup icon per icon engineers please let other details needed thanks lead")
print(X_test.shape)


(1,)


In [11]:
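# Maximum ticket body length, in words, after padding/truncation
max_ticket_body_length_in_words = 500
# Upper bound on the vocabulary size
max_words = 5000

# num_words: the maximum number of words to keep, based on word frequency.
# Only the most common num_words-1 words will be kept.
tok = Tokenizer(num_words=max_words)

# The tokenizer must be fitted before texts_to_sequences is called; an unfitted
# tokenizer has an empty vocabulary and maps every text to an empty sequence.
# Assumption: the model was trained with a tokenizer fitted on the ticket bodies.
# Ideally, reuse the exact tokenizer from the training run instead.
tok.fit_on_texts(ticketData_1.body)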
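
To make the `num_words` comment concrete, here is a minimal pure-Python sketch of frequency-ranked word indexing. It is an illustration only, not the Keras implementation: `build_word_index` and `to_sequences` are hypothetical helpers, the three training sentences are made up, and tokenisation is assumed to be a plain whitespace split.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequences(texts, word_index, num_words):
    """Map each text to word indices, keeping only indices below num_words."""
    return [[word_index[w] for w in text.lower().split()
             if w in word_index and word_index[w] < num_words]
            for text in texts]

# Made-up training texts (hypothetical, not taken from the ticket dataset)
train_texts = ["please reset password", "please setup icon", "please reset account"]
word_index = build_word_index(train_texts)

# With num_words=3 only the two most frequent words ("please", "reset") survive
fitted = to_sequences(["please reset icon"], word_index, num_words=3)
print(fitted)    # [[1, 2]]

# An empty vocabulary (nothing fitted) yields an empty sequence
unfitted = to_sequences(["please reset icon"], {}, num_words=3)
print(unfitted)  # [[]]
```

The second call shows why the vocabulary has to be built from the training text before any text is converted: with an empty word index every word is dropped.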


### Load the weights

In [12]:
# Load the model architecture from JSON
with open('/Users/samyam/Documents/Samya/GIT/insofe/Cute_AI_DL/TextAnalysis/Trained_weights/v3/model_text_1.json', 'r') as json_file:
    loaded_model_json_1 = json_file.read()
# Despite its name, loaded_model_json_1 holds the Keras model from here on
loaded_model_json_1 = model_from_json(loaded_model_json_1)

# Load the trained weights into the model
loaded_model_json_1.load_weights("/Users/samyam/Documents/Samya/GIT/insofe/Cute_AI_DL/TextAnalysis/Trained_weights/v3/model_text_1.h5")


### Predict on test data

In [13]:
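# Convert each text to a sequence of word indices
test_sequences = tok.texts_to_sequences(X_test)
# Pad or truncate every sequence to the same fixed length
test_sequences_matrix = sequence.pad_sequences(test_sequences, maxlen=max_ticket_body_length_in_words)
# Y_pred holds one row of class scores per input text
Y_pred = loaded_model_json_1.predict(test_sequences_matrix)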
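
As a rough illustration of what the padding step does, the sketch below mimics the default behaviour of `pad_sequences` (zeros added at the front, overly long sequences cut from the front) in plain Python. `pad_pre` is a hypothetical helper written for this example, and the input sequences are made up.

```python
def pad_pre(sequences, maxlen):
    """Left-pad with zeros and keep only the last maxlen items of each sequence."""
    padded = []
    for seq in sequences:
        trimmed = list(seq)[-maxlen:]  # drop items from the front if too long
        padded.append([0] * (maxlen - len(trimmed)) + trimmed)
    return padded

# Made-up sequences: one short, one too long, one empty
padded = pad_pre([[1, 2], [3, 4, 5, 6, 7], []], maxlen=4)
print(padded)  # [[0, 0, 1, 2], [4, 5, 6, 7], [0, 0, 0, 0]]
```

Every row ends up with the same length, which is what the model expects as input. Note that an empty sequence becomes a row of zeros, so the model still returns a prediction for it.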

In [14]:
# Pick the highest-scoring class index for each input
y_pred = Y_pred.argmax(axis=1).tolist()
print(y_pred)

[3]
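
The decoding step above can be sketched without any dependencies. The scores below are made up, and four classes are assumed purely for illustration; `argmax_rows` is a hypothetical helper that does in plain Python what `np.argmax(Y_pred, axis=1)` does on an array.

```python
def argmax_rows(scores):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# Hypothetical class scores for two inputs (one row per input)
example_scores = [
    [0.1, 0.2, 0.1, 0.6],
    [0.7, 0.1, 0.1, 0.1],
]
predicted = argmax_rows(example_scores)
print(predicted)  # [3, 0]
```

The returned indices are positions in the model's output layer, so they only match the original ‘urgency’ values if the labels were encoded in that same order during training.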


## References

1. https://www.kaggle.com/kredy10/simple-lstm-for-text-classification
2. https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
3. https://www.kaggle.com/ngyptr/multi-class-classification-with-lstm
4. Notebooks from Insofe