## Description:

- This simple demo shows that **my fine-tuned transformer NLP model, built on a pre-trained DistilBERT, works as an MVP** that classifies a customer review as positive or negative, and it does so correctly on both sample reviews below.

- **TFDistilBertForSequenceClassification** was configured to train, test, and validate on a data frame of **85,981 data points**. The **training data set made up 30% of this**.

- To **evaluate the model, I pass unseen data** into it in the form of sample reviews. The sample reviews are taken from actual Amazon (US) personal care appliance reviews: **test_review1 is a positive review** while **test_review2 is a negative review**.
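
As a rough check of the sizes involved, the sketch below works out how many rows the stated 30% training share corresponds to. It assumes the share is applied to the full data frame and rounded to the nearest row. The notebook does not say how the remaining rows were divided between testing and validation, so they are only reported as a combined figure. The variable names are illustrative.

```python
total_rows = 85_981      # size of the data frame stated above
train_fraction = 0.30    # training share stated above

# Assumption: the share is applied to the full data frame and rounded to the nearest row.
train_rows = round(total_rows * train_fraction)

# Rows left over for testing and validation combined (their split is not stated).
remaining_rows = total_rows - train_rows

print(train_rows, remaining_rows)
```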

### I. Installing libraries and dependencies

**Importing TensorFlow, the machine learning framework**

In [None]:
import tensorflow as tf

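**Checking that a Google Colab GPU is available, which is needed to run a large ML model**

In [None]:
# List the GPUs visible to TensorFlow and stop early if none is attached
num_gpus_available = len(tf.config.list_physical_devices('GPU'))
print('Num GPUs Available: ', num_gpus_available)
assert num_gpus_available > 0, 'No GPU found: enable one under Runtime > Change runtime type'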

Num GPUs Available:  1


**Installing the Transformers library from Hugging Face**

In [None]:
!pip3 install transformers

**Importing modules for NLP tasks**

In [None]:
from transformers import DistilBertTokenizerFast
from transformers import TFDistilBertForSequenceClassification

### II. Implementing Tokenizer from DistilBertTokenizerFast

In [None]:
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

### III. Retrieving the re-trained model

**Mounting Google Drive to access saved model**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Loading the model**

In [None]:
loaded_model = TFDistilBertForSequenceClassification.from_pretrained('drive/MyDrive/tdb_sentiment')

All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at drive/MyDrive/tdb_sentiment.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


### IV. Predicting on unseen data

**Positive review**

In [None]:
# A 5-star rating from https://www.amazon.com/Philips-PrecisionPerfect-Precision-HP6390-51/dp/B00I471M9E/ref=lp_17395279011_1_8#customerReviews
test_review1 = '''    
I've been painfully plucking my chin hairs 
and lady moustache for years. This thing does 
wonders!! The blade is close enough to get a smooth 
shave, and it also has a protector so it doesn't cut your 
skin. Easy to use; MUCH faster than plucking; and less 
painful than waxing and plucking. Would definitely recommend!!
'''

predict_input = tokenizer.encode(test_review1,
                                 truncation=True,
                                 padding=True,
                                 return_tensors='tf')
tf_output = loaded_model.predict(predict_input)[0]
tf_prediction = tf.nn.softmax(tf_output, axis=1)  # Softmax converts the model's logits into class probabilities
labels = ['Negative','Positive']
label = tf.argmax(tf_prediction, axis=1)
label = label.numpy()
print(labels[label[0]])

Positive
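
To make the last steps of the cell above more concrete, here is a minimal, dependency-free sketch of what the softmax and argmax calls do with the model's output. The logits are hypothetical values chosen for illustration, not numbers produced by the model, and the `softmax` helper is a plain-Python stand-in for `tf.nn.softmax`.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ['Negative', 'Positive']

# Hypothetical logits for one review; real values would come from loaded_model.predict(...)
example_logits = [-1.5, 2.0]

probabilities = softmax(example_logits)

# The index of the largest probability selects the label (the argmax step).
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(labels[predicted_index], probabilities)
```

Because softmax preserves the ordering of the logits, the argmax of the probabilities is the same as the argmax of the raw logits. The softmax step matters mainly when the probabilities themselves are of interest.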


**Negative review**

In [None]:
# A 1-star rating from https://www.amazon.com/product-reviews/B00I471M9E/ref=acr_dp_hist_1?ie=UTF8&filterByStar=one_star&reviewerType=all_reviews#reviews-filter-bar
test_review2 = '''
This product lasted about 4 months - thought the battery was bad 
so I replaced it and it still didn’t work - it was dead ! I used it 
a total of 3 times !!! Had to toss , would not buy again ... I’m going 
back to my old school methods of using a scissor and a comb ...
'''

predict_input = tokenizer.encode(test_review2,
                                 truncation=True,
                                 padding=True,
                                 return_tensors='tf')
tf_output = loaded_model.predict(predict_input)[0]
tf_prediction = tf.nn.softmax(tf_output, axis=1)  # Softmax converts the model's logits into class probabilities
labels = ['Negative','Positive']
label = tf.argmax(tf_prediction, axis=1)
label = label.numpy()
print(labels[label[0]])

Negative
