ICP 12
In this ICP, we learned the difference between a feedforward neural network (FNN) and a recurrent neural network (RNN), and what RNN and LSTM are.
- Python version 3 or Google Colab
- PyCharm
- Keras and TensorFlow installed
- Anaconda
- GitHub
1. Save the model and use the saved model to predict on new text data (e.g., “A lot of good things are happening. We are respected again throughout the world, and that's a great thing.@realDonaldTrump”)
- The data set used here is Sentiment.csv.
- The source code is available at the link below:
https://umkc.box.com/s/xrxrv8un2xen18yb1p9nq2xw70tm849g
- Import the necessary libraries.
- Read the CSV into a pandas data frame.
- Keep only the two necessary columns in the data frame.
- The next step is data pre-processing.
- Replace the special characters and "rt" in the text data with an empty space.
- The maximum number of features is 2000.
- Convert the text data to lower case.
- Use the Tokenizer API, applying its fit_on_texts and texts_to_sequences methods to the text data.
- The number of neurons and the embedding dimension are 128 and 196.
- Next is the model creation.
- Initialize the Sequential model.
- Add the embedding layer; dropout is 0.2 and the output layer uses softmax.
- The loss is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
- Convert the categorical Y data to a numerical format and split it into train and test data.
- The model is then run, and the accuracy and loss values are printed.
- Save and load the model.
- To predict on new text with the saved model, take the string and convert it into a pandas column. Pre-process it by converting the text to lower case and removing the special characters. Then convert the pre-processed text into numerical format using the Tokenizer API (fit_on_texts and texts_to_sequences). The predicted value is shown in the output.
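The pre-processing and tokenization steps listed above can be illustrated with a minimal sketch that uses only the standard library. It is not the Keras Tokenizer itself, only a plain-Python approximation of what fitting on texts, converting texts to sequences, and padding do. The function names and the sample tweets are made up for illustration, and treating "rt" as a standalone retweet marker is an assumption.

```python
import re
from collections import Counter

MAX_FEATURES = 2000  # vocabulary cap mentioned in the text


def clean_text(text):
    """Lowercase, keep only letters, digits and whitespace, and blank out 'rt'."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Assumption: 'rt' is removed only as a standalone token (retweet marker),
    # so words such as 'part' are left intact.
    return re.sub(r"\brt\b", " ", text)


def fit_vocabulary(texts, max_features=MAX_FEATURES):
    """Map the most frequent words to indices 1..max_features-1 (0 is padding)."""
    counts = Counter(word for t in texts for word in t.split())
    most_common = counts.most_common(max_features - 1)
    return {word: i for i, (word, _) in enumerate(most_common, start=1)}


def texts_to_sequences(texts, vocab):
    """Replace each known word by its index; unknown words are skipped."""
    return [[vocab[w] for w in t.split() if w in vocab] for t in texts]


def pad_sequences(seqs, length=None):
    """Left-pad with zeros (and keep the last `length` tokens) so rows align."""
    length = length or max(len(s) for s in seqs)
    return [[0] * (length - len(s)) + s[-length:] for s in seqs]


# Made-up sample tweets, not rows from Sentiment.csv.
tweets = ["RT @user: Great debate tonight!", "Great, great news..."]
cleaned = [clean_text(t) for t in tweets]
vocab = fit_vocabulary(cleaned)
X = pad_sequences(texts_to_sequences(cleaned, vocab))
```

Each row of `X` has the same length, which is the shape an embedding layer expects as input.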
Below is the source code:
The output images are shown below:
Below is the link for complete output.
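For reference, the model creation, save, load, and predict steps described in this task might look roughly like the sketch below. It is not the linked source code. The LSTM layer, the mapping of 128 to the embedding dimension and 196 to the number of units, the three output classes, the file name, and the random placeholder data are all assumptions made for illustration.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential, load_model

MAX_FEATURES = 2000  # vocabulary size from the text
EMBED_DIM = 128      # assumption: 128 is the embedding dimension
LSTM_UNITS = 196     # assumption: 196 is the number of LSTM units
NUM_CLASSES = 3      # assumption: three sentiment classes


def create_model():
    """Embedding -> LSTM (dropout 0.2) -> softmax output."""
    model = Sequential()
    model.add(Embedding(MAX_FEATURES, EMBED_DIM))
    model.add(LSTM(LSTM_UNITS, dropout=0.2))
    model.add(Dense(NUM_CLASSES, activation="softmax"))
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model


# Random placeholder data standing in for padded sequences and one-hot labels.
X_train = np.random.randint(1, MAX_FEATURES, size=(8, 28))
Y_train = np.eye(NUM_CLASSES)[np.random.randint(0, NUM_CLASSES, size=8)]
X_new = np.random.randint(1, MAX_FEATURES, size=(2, 28))

model = create_model()
model.fit(X_train, Y_train, epochs=1, verbose=0)

# Save to disk, load it back, and predict with the restored model.
model.save("sentiment_model.keras")  # hypothetical file name
restored = load_model("sentiment_model.keras")
probs = restored.predict(X_new, verbose=0)  # one row of class probabilities per input
predicted_class = probs.argmax(axis=1)
```

New text would first have to go through the same cleaning and tokenization as the training data before being passed to `predict`.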
2. Apply GridSearchCV on the source code provided in the class
GridSearchCV is applied to the code above.
- Grid search CV is used to find the best hyperparameters for training the model.
- The batch sizes tried are 10 and 20. The numbers of epochs tried are 1 and 2.
- The grid search model is initialized with the above parameters and fitted on the train data.
- The best parameters chosen are batch_size 10 and epochs 2.
- The accuracy is then found for the tuned parameters.
- Below is the source code:
Output Images are shown below:
Below is the link for the complete output.
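To make the search space in this task concrete, here is a small standard-library sketch of the exhaustive enumeration that a grid search performs over the batch sizes and epochs listed above. It is not GridSearchCV itself, which also scores each candidate with cross-validation. The helper names `iter_candidates` and `grid_search` are hypothetical, and `score_fn` stands in for "train the model with these parameters and return its accuracy".

```python
from itertools import product

# Values taken from the text: batch sizes 10 and 20, epochs 1 and 2.
param_grid = {"batch_size": [10, 20], "epochs": [1, 2]}


def iter_candidates(grid):
    """Yield every combination of the listed hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score_fn):
    """Return (best_params, best_score) under score_fn; higher is better."""
    scored = ((params, score_fn(params)) for params in iter_candidates(grid))
    return max(scored, key=lambda item: item[1])


# Two batch sizes times two epoch settings gives four candidates to train.
candidates = list(iter_candidates(param_grid))
```

With two values per parameter the model is trained four times per cross-validation fold, which is why small grids are used here.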
3. Apply the code on the spam data set available in the source code (text classification on the spam.csv data set)
- Here the data set considered is spam.csv.
- In the source code of the first program, the data set is replaced with spam.csv.
- Loss and accuracy are calculated.
- Prediction is also done on the sample input text.
Below is the source code:
Output images are shown below:
Below is the link for complete output.
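Swapping in the spam data mainly changes how the frame is loaded and how the labels are encoded. The sketch below assumes that spam.csv uses the column names `v1` (label) and `v2` (message text) with the labels "ham" and "spam"; this layout is an assumption, and the rows shown are made up so that the example runs without the real file.

```python
import io

import pandas as pd

# Made-up rows in the assumed layout; the real file would be read with
# pd.read_csv("spam.csv", ...) instead of this in-memory sample.
sample_csv = io.StringIO(
    "v1,v2\n"
    "ham,See you at lunch tomorrow\n"
    "spam,WINNER!! Claim your free prize now\n"
    "ham,Can you call me later\n"
)

data = pd.read_csv(sample_csv)
# Keep only the two needed columns and give them descriptive names.
data = data[["v2", "v1"]].rename(columns={"v2": "text", "v1": "label"})

# Integer codes follow the sorted label order: ham -> 0, spam -> 1.
codes, classes = pd.factorize(data["label"], sort=True)

# One-hot rows, the label format that categorical_crossentropy expects.
y = pd.get_dummies(data["label"]).to_numpy(dtype="float32")
```

With two classes, the softmax output layer of the model would need two units instead of the number used for the sentiment data.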