<a href="https://colab.research.google.com/github/blessondensil294/Coursera-Sentiment-Analysis-using-BERT/blob/master/Sentiment_Analysis_using_BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis using BERT Deep Learning

BERT (Bidirectional Encoder Representations from Transformers) is a language representation model introduced in a paper by researchers at Google AI Language.

BERT’s key technical innovation is applying the bidirectional training of Transformer, a popular attention model, to language modelling. 


This is in contrast to previous efforts which looked at a text sequence either from left to right or combined left-to-right and right-to-left training.
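The difference between left-to-right and bidirectional context can be pictured as an attention mask. The snippet below is a simplified sketch, not BERT's actual implementation: `seq_len` is a hypothetical sequence length, and a 1 at row i, column j means token i is allowed to look at token j.

```python
import numpy as np

seq_len = 4  # hypothetical sequence length, for illustration only

# Left-to-right model: token i may attend only to positions <= i.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))

# Bidirectional encoder: every token may attend to every position.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=int)

print(causal_mask)
print(bidirectional_mask)
```

In the causal mask the upper triangle is zero, so a token never sees the words that follow it; the bidirectional mask has no such restriction.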


In the feature-based approach to transfer learning, a pre-trained neural network produces word embeddings, which are then used as features in NLP models.

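## How BERT Works

BERT makes use of the Transformer, an attention mechanism that learns contextual relations between the words in a text.

In its original form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. Because BERT's goal is to produce a language representation, it uses only the encoder.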

[Link to BERT Medium Blog](https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270)

## Exploratory Data Analysis

In [0]:
import torch
import pandas as pd
import numpy as np

In [0]:
df_Train = pd.read_csv('https://raw.githubusercontent.com/blessondensil294/Coursera-Sentiment-Analysis-using-BERT/master/Data/smile-annotations-final.csv', 
                       names=['id', 'tweet', 'category'])
df_Train.set_index('id', inplace=True)

In [3]:
df_Train.head()

| id | tweet | category |
|---|---|---|
| 611857364396965889 | @aandraous @britishmuseum @AndrewsAntonio Merc... | nocode |
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | happy |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | happy |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | happy |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | happy |


In [4]:
df_Train['category'].value_counts()

nocode               1572
happy                1137
not-relevant          214
angry                  57
surprise               35
sad                    32
happy|surprise         11
happy|sad               9
disgust|angry           7
disgust                 6
sad|angry               2
sad|disgust             2
sad|disgust|angry       1
Name: category, dtype: int64
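The counts above are heavily imbalanced. As a rough sketch of how to quantify that, the snippet below rebuilds the counts by hand (copied from the output above) and computes each category's share; the threshold of 10 used to flag rare categories is an arbitrary assumption for illustration.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    'nocode': 1572, 'happy': 1137, 'not-relevant': 214, 'angry': 57,
    'surprise': 35, 'sad': 32, 'happy|surprise': 11, 'happy|sad': 9,
    'disgust|angry': 7, 'disgust': 6, 'sad|angry': 2, 'sad|disgust': 2,
    'sad|disgust|angry': 1,
})

# Share of each category as a percentage of all tweets.
shares = (counts / counts.sum() * 100).round(1)
print(shares)

# Categories with fewer than 10 tweets (arbitrary cut-off, for illustration).
rare = counts[counts < 10].index.tolist()
print(rare)
```

On the real DataFrame, `df_Train['category'].value_counts(normalize=True)` should give the same proportions directly.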

Encoding the Category Column

In [0]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Fit only: overwriting 'category' with integers would replace the string
# labels that label_dict and the later outputs rely on.
le.fit(df_Train['category'])

In [0]:
# Map each category string to an integer, in order of first appearance.
label_dict = {}
possible_labels = df_Train['category'].unique()
for index, possible_label in enumerate(possible_labels):
  label_dict[possible_label] = index

In [6]:
label_dict

{'angry': 3,
 'disgust': 5,
 'disgust|angry': 4,
 'happy': 1,
 'happy|sad': 9,
 'happy|surprise': 6,
 'nocode': 0,
 'not-relevant': 2,
 'sad': 7,
 'sad|angry': 11,
 'sad|disgust': 10,
 'sad|disgust|angry': 12,
 'surprise': 8}
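The same mapping can be written as a dict comprehension, and an inverse mapping is handy later for turning predicted integers back into category names. This is a minimal sketch on a toy list: `categories` stands in for `df_Train['category'].unique()`, and `label_dict_inverse` is a name introduced here for illustration.

```python
# Toy category list standing in for df_Train['category'].unique().
categories = ['nocode', 'happy', 'not-relevant', 'angry']

# Category name -> integer label, in order of first appearance.
label_dict = {category: index for index, category in enumerate(categories)}

# Integer label -> category name (hypothetical helper for decoding predictions).
label_dict_inverse = {index: category for category, index in label_dict.items()}

print(label_dict)
print(label_dict_inverse)
```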

In [0]:
df_Train['label'] = df_Train['category'].replace(label_dict)

In [8]:
df_Train.head()

| id | tweet | category | label |
|---|---|---|---|
| 611857364396965889 | @aandraous @britishmuseum @AndrewsAntonio Merc... | nocode | 0 |
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | happy | 1 |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | happy | 1 |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | happy | 1 |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | happy | 1 |


## Train and Test Split

In [0]:
from sklearn.model_selection import train_test_split

In [0]:
x_train, x_test, y_train, y_test = train_test_split(
    df_Train.index.values,
    df_Train['label'].values,
    test_size=0.15,
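    random_state=294,
    # Note: enabling stratify would raise an error on this data, because
    # 'sad|disgust|angry' has a single tweet and a stratified split needs
    # at least two samples per class.
    # stratify=df_Train['label'].values,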
)
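If a stratified split is wanted, one option is to drop classes that are too small before splitting. The snippet below is a sketch on toy data, not the notebook's method: the `toy` DataFrame and the minimum of two rows per class are assumptions for illustration, and dropping rows changes the dataset, so it may not suit every use.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labels (hypothetical): two common classes and one class with a single row.
toy = pd.DataFrame({'label': [0] * 20 + [1] * 10 + [2]})

# Keep only classes with at least two rows, the minimum a stratified split accepts.
class_counts = toy['label'].value_counts()
kept = toy[toy['label'].isin(class_counts[class_counts >= 2].index)]

x_train, x_test, y_train, y_test = train_test_split(
    kept.index.values,
    kept['label'].values,
    test_size=0.15,
    random_state=294,
    stratify=kept['label'].values,  # preserve class proportions in both splits
)
print(len(x_train), len(x_test))
```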

In [0]:
df_Train['Data_Type'] = ['not set']*df_Train.shape[0]

In [12]:
df_Train.head()

| id | tweet | category | label | Data_Type |
|---|---|---|---|---|
| 611857364396965889 | @aandraous @britishmuseum @AndrewsAntonio Merc... | nocode | 0 | not set |
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | happy | 1 | not set |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | happy | 1 | not set |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | happy | 1 | not set |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | happy | 1 | not set |
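Since `x_train` and `x_test` hold index values, a plausible next step (an assumption, as it is not shown in this part of the notebook) is to fill `Data_Type` with label-based `.loc` assignment. The sketch below uses a toy DataFrame; `toy`, `train_ids`, and `test_ids` are hypothetical stand-ins for `df_Train`, `x_train`, and `x_test`.

```python
import pandas as pd

# Toy frame indexed by id, standing in for df_Train.
toy = pd.DataFrame({'tweet': ['a', 'b', 'c', 'd']}, index=[101, 102, 103, 104])
toy.index.name = 'id'
toy['Data_Type'] = 'not set'

# Hypothetical split of the index values, standing in for x_train / x_test.
train_ids = [101, 102, 104]
test_ids = [103]

# Label-based assignment: select rows by id, then set the column value.
toy.loc[train_ids, 'Data_Type'] = 'train'
toy.loc[test_ids, 'Data_Type'] = 'test'

print(toy)
```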
