# Advanced Natural Language Processing

### Chandana

In this task you will develop a system to detect irony in text. We will use the data from the SemEval-2018 task on irony detection. You should use the file `SemEval2018-T3-train-taskA.txt` from Blackboard; it consists of examples such as the following:

```csv
Tweet index     Label   Tweet text
1       1       Sweet United Nations video. Just in time for Christmas. #imagine #NoReligion  http://t.co/fej2v3OUBR
2       1       @mrdahl87 We are rumored to have talked to Erv's agent... and the Angels asked about Ed Escobar... that's hardly nothing    ;)
3       1       Hey there! Nice to see you Minnesota/ND Winter Weather 
4       0       3 episodes left I'm dying over here
```


# Task 1 

Read all the data and find the size of vocabulary of the dataset (ignoring case) and the number of positive and negative examples.

In [1]:
# importing the necessary packages
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize
from sklearn.model_selection import train_test_split

# tokenizer models used by word_tokenize, and WordNet (used later for lemmatization)
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


True

In [0]:
# Loading the data
data = pd.read_csv('SemEval2018-T3-train-taskA.txt',sep='\t')

# initializations
positive_count=0
negative_count=0
full_vocab=[]
full_vocab_dict={}

# Count the ironic (positive) and non-ironic (negative) examples

for label in data['Label']:
  if label==1:
    positive_count+=1
  else:
    negative_count+=1
  
# parsing each tweet to build the corpus by lowering the case and tokenizing the tweets
for index,row in data.iterrows():
  full_vocab+= word_tokenize(row['Tweet text'].lower())
  
# iterating over all the words and building a {word - freq} dictionary  
for item in full_vocab:
  if item in full_vocab_dict :
    full_vocab_dict[item] += 1
  else:
    full_vocab_dict[item] = 1 
    
print("Number of Positive Examples : "+ str(positive_count))
print("Number of Negative Examples : "+ str(negative_count))
print("Size of the vocabulary : " + str(len(full_vocab_dict)))


Number of Positive Examples : 1901
Number of Negative Examples : 1916
Size of the vocabulary : 13460
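The same statistics can be computed more compactly with `collections.Counter`. The sketch below assumes the data is available as `(label, text)` pairs and accepts any tokenizer callable; when none is passed it falls back to `str.split`, which is only a dependency-free stand-in for `word_tokenize`. `corpus_stats` is a hypothetical helper name.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def corpus_stats(rows: Iterable[Tuple[int, str]],
                 tokenize: Callable[[str], List[str]] = str.split):
    """Return (label_counts, word_freqs) for (label, text) rows, ignoring case."""
    label_counts = Counter()
    word_freqs = Counter()
    for label, text in rows:
        label_counts[label] += 1
        word_freqs.update(tokenize(text.lower()))
    return label_counts, word_freqs


labels, freqs = corpus_stats([(1, "Sweet video. Just in time"),
                              (0, "3 episodes left"),
                              (1, "Sweet TIME")])
print(labels[1], labels[0], len(freqs))  # → 2 1 8
```

With the notebook's DataFrame and NLTK tokenizer the call would be `corpus_stats(zip(data['Label'], data['Tweet text']), word_tokenize)`, and `len(freqs)` then plays the role of the vocabulary size printed above.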


# Task 2

Develop a classifier using the Naive Bayes model to predict if an example is ironic. The model should convert each Tweet into a bag-of-words and calculate

$p(\text{Ironic}|w_1,\ldots,w_n) \propto p(\text{Ironic}) \prod_{i=1,\ldots,n} p(w_i \in \text{tweet}| \text{Ironic})$

$p(\text{NotIronic}|w_1,\ldots,w_n) \propto p(\text{NotIronic}) \prod_{i=1,\ldots,n} p(w_i \in \text{tweet}| \text{NotIronic})$

You should use add-alpha smoothing to calculate the probabilities.
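For reference, a standard form of the add-α estimate for each likelihood term is shown below, written for a class $c \in \{\text{Ironic}, \text{NotIronic}\}$, with $V$ the vocabulary shared by both classes and $\text{count}(w, c)$ the number of times word $w$ occurs in training tweets of class $c$; $\alpha = 1$ gives add-one (Laplace) smoothing. Taking logarithms turns the product above into a sum, which selects the same class while avoiding floating-point underflow on long inputs.

$$
p(w \in \text{tweet} \mid c) = \frac{\text{count}(w, c) + \alpha}{\sum_{w' \in V} \text{count}(w', c) + \alpha\,|V|}
$$

$$
\hat{c} = \arg\max_{c} \Big[ \log p(c) + \sum_{i=1}^{n} \log p(w_i \in \text{tweet} \mid c) \Big]
$$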

**Steps involved in Naive Bayes Classifier :**

1. Convert each tweet to lower case.
2. Tokenize the tweets into words.
3. Build the token lists for the ironic and non-ironic classes.
4. From these lists, build ironic and non-ironic dictionaries that map each word to its frequency in that class.
5. Calculate the prior probabilities of both classes.
6. Using the formulas above, calculate the (unnormalized) posterior probability of each class given the tweet, and assign the class with the higher value as the label of that tweet.

**Advantages**

1. Simple and efficient
2. Based purely on word frequency counts

**Disadvantages**

Treats each tweet as an unordered bag of words, so it ignores word order and fails to capture the semantics of the tweet.

In [0]:
class NaiveBayes(): 
  
  def __init__(self): #initializations   
    self.tweet_words=[]
    self.Ironic_vocab=[]
    self.notIronic_vocab=[]
    self.Ironic_dict={}
    self.notIronic_dict={}
    self.positive_count=0
    self.negative_count=0   
    self.Prob_notIronic=0
    self.Prob_Ironic=0

  def model(self, train):

    #Building the Ironic and notIronic Vocabularies  
    for index,row in train.iterrows():
      if row['Label'] ==1:
        self.Ironic_vocab+=word_tokenize(row['Tweet text'].lower())
      else:
        self.notIronic_vocab+=word_tokenize(row['Tweet text'].lower())

    #Ironic and notIronic dictionaries 
    for item in self.notIronic_vocab:
      if item in self.notIronic_dict :
        self.notIronic_dict[item] += 1
      else:
        self.notIronic_dict[item] = 1
    for item in self.Ironic_vocab:
      if item in self.Ironic_dict :
        self.Ironic_dict[item] += 1
      else:
        self.Ironic_dict[item] = 1

    # Ironic and notIronic Label count
    for label in train['Label']:
      if label==1:
        self.positive_count+=1
      else:
        self.negative_count+=1

    # Calculating prior probabilites for Ironic and notIronic classes
    self.Prob_Ironic = self.positive_count/(self.positive_count+self.negative_count)
    self.Prob_notIronic = self.negative_count/(self.positive_count+self.negative_count)

  def predict(self,test):
    count = 0
    for tweet,label in zip(test['Tweet text'],test['Label']):
      tweet_words=word_tokenize(tweet.lower())
      prob_Ironic_given_tweet = 1
      prob_notIronic_given_tweet = 1

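      for word in tweet_words:
        # NOTE: unseen test words are inserted into both dictionaries with count 0, so
        # len(self.Ironic_dict) and len(self.notIronic_dict) keep growing while the test set is scored
        if word not in self.Ironic_dict:
          self.Ironic_dict[word]=0
        if word not in self.notIronic_dict:
          self.notIronic_dict[word]=0
        # Calculating likelihoods with add-one smoothing (alpha = 1):
        # (count in class + 1) / (tokens in class + distinct words in that class's dictionary)
        prob_Ironic_given_tweet = prob_Ironic_given_tweet*(self.Ironic_dict[word]+1)/(len(self.Ironic_vocab)+len(self.Ironic_dict))
        prob_notIronic_given_tweet = prob_notIronic_given_tweet*(self.notIronic_dict[word]+1)/(len(self.notIronic_vocab)+len(self.notIronic_dict))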
      
      # Calculating Posterior probability of Ironic given tweet
      prob_Ironic_given_tweet = prob_Ironic_given_tweet * self.Prob_Ironic
      # Calculating Posterior probability of notIronic given tweet
      prob_notIronic_given_tweet = prob_notIronic_given_tweet * self.Prob_notIronic

      # assign the class with the larger (unnormalized) posterior; ties go to Ironic (label 1)
      if prob_notIronic_given_tweet > prob_Ironic_given_tweet:
        NB_label = 0
      else:
        NB_label = 1
      #checking whether the actual label and NB predicted label are same
      if label==NB_label:
        count+=1
      
          
    #Calculating the accuracy      
    accuracy= count/len(test)
    print("Accuracy of NaiveBayes : {:0.2f}%\n".format(accuracy*100))
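The `predict` method above inserts every unseen test word into both dictionaries, so the smoothing denominator grows while the test set is being scored, and it uses each class's own dictionary size rather than one shared vocabulary size. Below is a minimal sketch of an alternative, assuming add-α smoothing over a vocabulary fixed at training time, skipping words never seen in training, and scoring in log space. `AddAlphaNaiveBayes` is a hypothetical name, `str.split` again stands in for `word_tokenize`, and this variant's accuracy on the Task 3 split has not been measured here.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


class AddAlphaNaiveBayes:
    """Bag-of-words Naive Bayes with add-alpha smoothing, scored in log space."""

    def __init__(self, alpha: float = 1.0,
                 tokenize: Callable[[str], List[str]] = str.split):
        self.alpha = alpha
        self.tokenize = tokenize

    def fit(self, rows: Iterable[Tuple[str, int]]) -> "AddAlphaNaiveBayes":
        """rows: (tweet_text, label) pairs with label 1 = ironic, 0 = not ironic.
        Both classes must occur at least once (otherwise log(0) fails)."""
        self.word_counts = {0: Counter(), 1: Counter()}
        doc_counts = Counter()
        for text, label in rows:
            doc_counts[label] += 1
            self.word_counts[label].update(self.tokenize(text.lower()))
        # One vocabulary shared by both classes, fixed after training
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        n_docs = doc_counts[0] + doc_counts[1]
        self.log_prior = {c: math.log(doc_counts[c] / n_docs) for c in (0, 1)}
        # Denominator of the smoothed likelihood: tokens in class c + alpha * |V|
        self.denom = {c: sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
                      for c in (0, 1)}
        return self

    def log_score(self, text: str, c: int) -> float:
        """log p(c) plus the sum of log p(w | c) over the tweet's in-vocabulary tokens."""
        score = self.log_prior[c]
        for w in self.tokenize(text.lower()):
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + self.alpha) / self.denom[c])
        return score

    def predict(self, text: str) -> int:
        # Ties go to the ironic class, like the `>` comparison in the original code
        return 0 if self.log_score(text, 0) > self.log_score(text, 1) else 1

    def accuracy(self, rows: Iterable[Tuple[str, int]]) -> float:
        rows = list(rows)
        return sum(self.predict(t) == y for t, y in rows) / len(rows)


train_rows = [("sweet video just in time", 1), ("great another monday", 1),
              ("3 episodes left", 0), ("watching episodes tonight", 0)]
nb_sketch = AddAlphaNaiveBayes(alpha=1.0).fit(train_rows)
print(nb_sketch.predict("great monday"), nb_sketch.predict("episodes tonight"))  # → 1 0
```

On the notebook's data it would be fitted with `AddAlphaNaiveBayes(tokenize=word_tokenize).fit(zip(train_raw['Tweet text'], train_raw['Label']))`. Because scoring never writes to the counters, predicting on the test set leaves the model unchanged.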


# Task 3

Divide the data into a training and test set and justify your split.

Choose a suitable evaluation metric and implement it. Explain why you chose this evaluation metric.

Evaluate the method in Task 2 according to this metric.

In [0]:
# train - test split
train_raw, test_raw = train_test_split(data, test_size=0.25, random_state=42)

# keep separate copies of the split for the improved model in Task 5
train_lstm_imp = train_raw.copy()
test_lstm_imp  = test_raw.copy()
# Naive Bayes model create and predict
nb = NaiveBayes()
nb.model(train_raw)
nb.predict(test_raw)


Accuracy of NaiveBayes : 64.71%
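Task 3 also asks for a justified split and metric. The classes are close to balanced (1901 ironic vs 1916 non-ironic tweets), so accuracy is a meaningful headline number, while precision, recall and F1 show whether the errors fall mainly on one class. A from-scratch sketch of these metrics follows; `binary_metrics` and `macro_f1` are hypothetical helper names, and `macro_f1` averages the two per-class F1 scores in the same way as `sklearn.metrics.f1_score(..., average="macro")` does for two classes. Passing `stratify=data['Label']` to `train_test_split` would additionally keep the class ratio identical in both parts of the 75/25 split.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the F1 scores of class 1 (ironic) and class 0 (not ironic)."""
    return (binary_metrics(y_true, y_pred, 1)["f1"]
            + binary_metrics(y_true, y_pred, 0)["f1"]) / 2


m = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
print(m["accuracy"], m["precision"], m["recall"])  # → 0.75 1.0 0.5
```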



# Task 4

Run the following code to generate a model from your training set. The training set should be in a variable called `train` and is assumed to be of the form:

```
[(1, 1, ['sweet', 'united', 'nations', 'video', '.', 'just', 'in', 'time', 'for', 'christmas', '.', '#', 'imagine', '#', 'noreligion', 'http', ':', '//t.co/fej2v3oubr']), 
 (2, 1, ['@', 'mrdahl87', 'we', 'are', 'rumored', 'to', 'have', 'talked', 'to', 'erv', "'s", 'agent', '...', 'and', 'the', 'angels', 'asked', 'about', 'ed', 'escobar', '...', 'that', "'s", 'hardly', 'nothing', ';', ')']), 
 (3, 1, ['hey', 'there', '!', 'nice', 'to', 'see', 'you', 'minnesota/nd', 'winter', 'weather']), 
 (4, 0, ['3', 'episodes', 'left', 'i', "'m", 'dying', 'over', 'here']), 
 ...
]
 ```



In [0]:
# converting the data into the format above, which is compatible with the Keras model given below
# each row -> (index, label, [tokens_of_tweet])

Train = []
Test = []
for index, row in train_raw.iterrows():
  # to convert to this form : (index,label,[tokenized_tweet])
  r = (index, row['Label'], word_tokenize(row['Tweet text'].lower()))
  Train.append(r)
print(Train[1:10])

for index, row in test_raw.iterrows():
  r = (index, row['Label'], word_tokenize(row['Tweet text'].lower()))
  Test.append(r)
print(Test[1:10])

[(1044, 0, ['#', 'lebron', '#', 'james', ':', "'", '#', 'violence', '#', 'is', '#', 'the', '#', 'answer', "'", ':', 'lebron', 'james', 'said', 'thursday', 'that', '``', 'violence', 'is', 'not', 'the', 'answer', 'and', '...', 'http', ':', '//t.co/gpz3d9tiuv']), (2820, 0, ['by', '@', 'imjayded_', '``', ':', 'fire', ':', ':fire', ':', ':fire', ':', '@', 'dipmag', 'gears', 'up', 'to', 'release', 'its', 'next', 'issue', ':', 'soon_with_rightwards_arrow_above', ':', "'wifey", 'series', "'", 'with', '#', 'covermodels', '...', 'http', ':', '//t.co/dqrkfjmyvl']), (69, 1, ['just', 'delivered', '@', 'dominiqueansel', '#', 'cronuts', 'to', '@', 'bouchonbakeryrc', 'hmmm', 'its', 'for', 'a', 'customer', 'i', 'hope', '!', '!', '!', 'http', ':', '//t.co/kdx0fhvqcw']), (3705, 1, ['i', "'m", 'not', 'in', 'enough', 'dance', 'practice', 'whatsapp', 'groups', 'yet', '.', '#', 'december', '#', 'winter', '#', 'weddingseason']), (3136, 0, ['@', 'gemheartbeat', '@', 'mooglexox', 'neither', 'can', 'i', ',', 'i'

In [0]:
from keras.models import Sequential, load_model
from keras.layers import Dense, Activation, Embedding, Dropout, TimeDistributed
from keras.layers import LSTM
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint
import numpy as np

## These values should be set from Task 3
train, test = Train,Test

def make_dictionary(train, test):
    dictionary = {}
    for d in train+test:
        for w in d[2]:
            if w not in dictionary:
                dictionary[w] = len(dictionary)
    return dictionary

class KerasBatchGenerator(object):
    def __init__(self, data, num_steps, batch_size, vocabulary, skip_step=5):
        self.data = data
        self.num_steps = num_steps
        self.batch_size = batch_size
        self.vocabulary = vocabulary
        self.current_idx = 0
        self.current_sent = 0
        self.skip_step = skip_step

    def generate(self):
        x = np.zeros((self.batch_size, self.num_steps))
        y = np.zeros((self.batch_size, self.num_steps, 2))
        while True:
            for i in range(self.batch_size):
                # Choose a sentence and position with at least num_steps more words
                while self.current_idx + self.num_steps >= len(self.data[self.current_sent][2]):
                    self.current_idx = self.current_idx % len(self.data[self.current_sent][2])
                    self.current_sent += 1
                    if self.current_sent >= len(self.data):
                        self.current_sent = 0
                # The rows of x are set to values like [1,2,3,4,5]
                x[i, :] = [self.vocabulary[w] for w in self.data[self.current_sent][2][self.current_idx:self.current_idx + self.num_steps]]
                # The rows of y are set to values like [[1,0],[1,0],[1,0],[1,0],[1,0]]
                y[i, :, :] = [[self.data[self.current_sent][1], 1-self.data[self.current_sent][1]]] * self.num_steps
                self.current_idx += self.skip_step
            yield x, y

# Hyperparameters for model
vocabulary = make_dictionary(train, test)
num_steps = 5
batch_size = 20
num_epochs = 50 # Reduce this if the model is taking too long to train (or increase for performance)
hidden_size = 50 # Increase this to improve performance
use_dropout=True

# Create batches for RNN
train_data_generator = KerasBatchGenerator(train, num_steps, batch_size, vocabulary,
                                           skip_step=num_steps)
valid_data_generator = KerasBatchGenerator(test, num_steps, batch_size, vocabulary,
                                           skip_step=num_steps)

# Two stacked LSTM layers, optional dropout, and a per-timestep softmax over the 2 classes
model = Sequential()
model.add(Embedding(len(vocabulary), hidden_size, input_length=num_steps))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(LSTM(hidden_size, return_sequences=True))
if use_dropout:
    model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(2)))
model.add(Activation('softmax'))

# Set optimizer and build model
optimizer = Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['categorical_accuracy'])

# Train the model
model.fit_generator(train_data_generator.generate(), len(train)//(batch_size*num_steps), num_epochs,
                        validation_data=valid_data_generator.generate(),
                        validation_steps=len(test)//(batch_size*num_steps))

# Save the model
model.save("final_model.hdf5")



Now consider the following code:

In [0]:
model = load_model("final_model.hdf5")

x = np.zeros((1,num_steps))
x[0,:] = [vocabulary["this"],vocabulary["is"],vocabulary["an"],vocabulary["easy"],vocabulary["test"]]
print(model.predict(x))


[[[0.22709972 0.7729003 ]
  [0.2189427  0.7810573 ]
  [0.3058866  0.69411343]
  [0.02666874 0.9733313 ]
  [0.19957547 0.8004246 ]]]


Using the code above, write a function that predicts labels with the LSTM model above, and compare its performance with the evaluation performed in Task 3.

In [0]:
# function to check the performance of the LSTM model
def prediction_using_lstm_model(test):
  '''
  The LSTM model above accepts exactly num_steps (= 5) tokens at a time, so each
  tweet is split into consecutive, non-overlapping chunks of num_steps tokens.
  current_position and to_position mark the start and end of the current chunk
  and advance by num_steps after each iteration.
  Note: max_chunks uses floor division, so trailing tokens that do not fill a whole
  chunk are ignored, and tweets shorter than num_steps get no chunk at all
  (both class scores stay at 1, so they are labelled 0).
  '''
  y_predict = list()

  for row in test:
    current_position = 0
    to_position = 0
    p_predict = np.ones(2)
    # number of complete chunks in the tweet
    max_chunks = len(row[2]) // num_steps
    for i in range(max_chunks):
      x = np.zeros((1, num_steps))
      to_position = to_position + num_steps
      # boundary case: clamp to_position to the tweet length and shift the window back
      # (never triggered while max_chunks is computed with floor division)
      if to_position > len(row[2]):
        to_position = len(row[2])
        current_position = to_position - num_steps
      # map the chunk's tokens to vocabulary indices
      x[0, :] = [vocabulary[w] for w in row[2][current_position:to_position]]
      # per-timestep class probabilities from the LSTM, shape (1, num_steps, 2)
      p_temp = model.predict(x)
      # multiply the probabilities over all timesteps and chunks to score the whole tweet
      p_predict = p_predict * np.prod(p_temp[0], axis=0)
      # move to the next chunk
      current_position += num_steps

    # column 0 corresponds to label 1 (training targets were one-hot [label, 1-label])
    y_predict.append(int(p_predict[0] > p_predict[1]))

  return y_predict
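In the function above, `max_chunks` is computed with floor division, so tokens after the last full chunk are never scored and the boundary branch never executes. One way to keep every token is to shift the final window back so that it ends at the last token, overlapping the previous chunk. Below is a minimal sketch, assuming a callable `step_probs` that returns one `[p_col0, p_col1]` pair per token of a window (as `model.predict` does for a single row); `windows`, `predict_label` and `step_probs` are hypothetical names, and the product of probabilities is replaced by the equivalent sum of logarithms.

```python
import math
from typing import Callable, List, Sequence


def windows(tokens: Sequence[str], num_steps: int) -> List[Sequence[str]]:
    """Consecutive chunks of num_steps tokens; if tokens remain after the last full
    chunk, one extra window ending at the final token is added (it overlaps the
    previous chunk). Tweets shorter than num_steps yield no window."""
    if len(tokens) < num_steps:
        return []
    starts = list(range(0, len(tokens) - num_steps + 1, num_steps))
    if starts[-1] + num_steps < len(tokens):
        starts.append(len(tokens) - num_steps)
    return [tokens[s:s + num_steps] for s in starts]


def predict_label(tokens: Sequence[str], num_steps: int,
                  step_probs: Callable[[Sequence[str]], Sequence[Sequence[float]]],
                  default: int = 0) -> int:
    """Sum per-token log-probabilities over all windows and pick the larger column."""
    chunks = windows(tokens, num_steps)
    if not chunks:
        return default  # the original code also labels such tweets 0
    log_p = [0.0, 0.0]
    for chunk in chunks:
        for p0, p1 in step_probs(chunk):
            log_p[0] += math.log(p0)
            log_p[1] += math.log(p1)
    # Column 0 is label 1 (ironic): training targets were one-hot [label, 1 - label]
    return int(log_p[0] > log_p[1])


tokens = ["a", "b", "c", "d", "e", "f", "g"]
print(windows(tokens, 5))  # → [['a', 'b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f', 'g']]
```

With the notebook's model, `step_probs` could be `lambda chunk: model.predict(np.array([[vocabulary[w] for w in chunk]]))[0]`, which returns a `(num_steps, 2)` array.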

In [0]:
# calling the above method to get the labels
y_predict_using_lstm = prediction_using_lstm_model(test)

In [0]:
# Performance metrics: accuracy and macro-averaged F1 score

from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix

def metrics_demo(actual_label,predicted_label):
    # fraction of predictions that match the gold labels
    result = [i for i,j in zip(actual_label,predicted_label) if i==j]
    accuracy = len(result)/len(actual_label)
    # F1 averaged over both classes (macro)
    F1_score = f1_score(actual_label, predicted_label, average="macro")
    # confusion-matrix counts (computed but not printed)
    tn, fp, fn, tp = confusion_matrix(actual_label,predicted_label).ravel()

    print("---------------------------------")
    print("Accuracy of keras LSTM model : {:0.2f}%\n".format(accuracy*100))
    print("F1 Score of keras LSTM model : {:0.2f}\n".format(F1_score*100))
    print("---------------------------------")

In [0]:
# calculate the performance metrics for the above lstm model
actual_lab=list()
for row in Test:
  actual_lab.append(row[1])#actual test labels
  
# calling the metrics_demo function with actual and predicted labels
metrics_demo(actual_lab,y_predict_using_lstm)

---------------------------------
Accuracy of keras LSTM model : 52.88%

F1 Score of keras LSTM model : 51.43

---------------------------------


# Task 5

Suggest an improvement to either the system developed in Task 2 or 4 and show that it improves according to your evaluation metric.

Please note this task is marked according to: demonstration of knowledge from the lectures (10), originality and appropriateness of solution (10), completeness of description (10), technical correctness (5) and improvement in evaluation metric (5).

In this section, I describe the improvements made to the LSTM model given in Task 4.

As the data is collected from a social media platform (Twitter), it contains a lot of noise: unnecessary extra words, extra spaces and multiple hashtags. Twitter data therefore needs careful preprocessing that removes unnecessary characters without losing the context or content of the tweet.

**1. Preprocessing the Tweets:**

1. Converting the tweets to lower case
2. Replacing the URLs present in the tweets with the word 'URL'
3. Replacing each @user mention with the word 'USER'
4. Removing the '#' symbol from hashtags (the hashtag text is kept as a word)
5. Stripping extra spaces and runs of two or more dots
6. Keeping only valid words (words that start with a letter)
7. Handling emoticons by replacing them with the word 'positive' or 'negative'
8. Lemmatizing each remaining word with the WordNet lemmatizer

In [0]:
import re
import sys
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()


def preprocess_word(word):
    # Remove punctuation
    word = word.strip('\'"?!,.():;')
    # Remove hyphens and apostrophes
    word = re.sub(r'(-|\')', '', word)
    return word


def is_valid_word(word):
    # Check that the word starts with a letter and contains only letters, digits, '.' or '_'
    return (re.search(r'^[a-zA-Z][a-z0-9A-Z\._]*$', word) is not None)

def preprocess_tweet(tweet):
    
    processed_tweet = []
    # Convert to lower case
    tweet = tweet.lower()
    # Replaces URLs with the word URL
    tweet = re.sub(r'((www\.[\S]+)|(https?://[\S]+))', ' URL ', tweet)
    # Replace @handle with the word USER
    tweet = re.sub(r'@[\S]+', 'USER', tweet)
    # Replaces #hashtag with hashtag
    tweet = re.sub(r'#(\S+)', r' \1 ', tweet)
    # Replace 2+ dots with space
    tweet = re.sub(r'\.{2,}', ' ', tweet)
    # Strip space, " and ' from tweet
    tweet = tweet.strip(' "\'')
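    # Replace emoticons with 'positive' or 'negative'
    # (this runs after lower(), so patterns containing an uppercase 'D', e.g. ':D' or 'XD', cannot match)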
    tweet = handle_emojis(tweet)
    # Replace multiple spaces with a single space
    tweet = re.sub(r'\s+', ' ', tweet)
    words = tweet.split()

    for word in words:
        word = preprocess_word(word)
        if is_valid_word(word):
            word = lemmatizer.lemmatize(word)
            processed_tweet.append(word)
    return ' '.join(processed_tweet)
  
def handle_emojis(tweet):
    # Smile -- :), : ), :-), (:, ( :, (-:, :')
    tweet = re.sub(r'(:\s?\)|:-\)|\(\s?:|\(-:|:\'\))', ' positive ', tweet)
    # Laugh -- :D, : D, :-D, xD, x-D, XD, X-D
    tweet = re.sub(r'(:\s?D|:-D|x-?D|X-?D)', ' positive ', tweet)
    # Love -- <3, :*
    tweet = re.sub(r'(<3|:\*)', ' positive ', tweet)
    # Wink -- ;-), ;), ;-D, ;D, (;,  (-;
    tweet = re.sub(r'(;-?\)|;-?D|\(-?;)', ' positive ', tweet)
    # Sad -- :-(, : (, :(, ):, )-:
    tweet = re.sub(r'(:\s?\(|:-\(|\)\s?:|\)-:)', ' negative ', tweet)
    # Cry -- :,(, :'(, :"(
    tweet = re.sub(r'(:,\(|:\'\(|:"\()', ' negative ', tweet)
    return tweet
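`preprocess_tweet` lowercases the tweet before calling `handle_emojis`, so the patterns that contain an uppercase `D` (`:D`, `:-D`, `xD`, `XD`, `;D`) can never match. Below is a minimal sketch of one fix, assuming the same pattern set: replace emoticons while the original case is intact and lowercase afterwards. `EMOTICON_RULES`, `replace_emoticons` and `emoticons_then_lower` are hypothetical names. The `x-?D` pattern can also fire inside ordinary words or URLs, so in a full pipeline it may be safer to apply it after URL replacement.

```python
import re

# Same emoticon patterns and replacements as handle_emojis above
EMOTICON_RULES = [
    (r'(:\s?\)|:-\)|\(\s?:|\(-:|:\'\))', ' positive '),   # smile
    (r'(:\s?D|:-D|x-?D|X-?D)', ' positive '),             # laugh
    (r'(<3|:\*)', ' positive '),                          # love
    (r'(;-?\)|;-?D|\(-?;)', ' positive '),                # wink
    (r'(:\s?\(|:-\(|\)\s?:|\)-:)', ' negative '),         # sad
    (r'(:,\(|:\'\(|:"\()', ' negative '),                 # cry
]


def replace_emoticons(tweet: str) -> str:
    for pattern, replacement in EMOTICON_RULES:
        tweet = re.sub(pattern, replacement, tweet)
    return tweet


def emoticons_then_lower(tweet: str) -> str:
    """Replace emoticons while the original case is intact, then lowercase."""
    return replace_emoticons(tweet).lower()


print(replace_emoticons("so funny :D".lower()))  # current order: ':d' is left unchanged
print(emoticons_then_lower("So funny :D"))        # ':D' is replaced by 'positive'
```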

In [0]:
# preprocessing the tweets

def preprocess_data(data):
  # apply preprocess_tweet to every tweet in the 'Tweet text' column
  return [preprocess_tweet(tweet) for tweet in data['Tweet text']]


In [0]:
from keras import models
from keras import layers
from keras import optimizers
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

#preprocessing the train data
train_data = preprocess_data(train_raw)
train_labels = train_raw['Label'].values
#preprocessing the test data
test_data = preprocess_data(test_raw)
test_labels = test_raw['Label'].values

tokenizer = Tokenizer()
# fitting the tokenizer's word index
# NOTE: it is fitted on the raw text of the full dataset (train + test) rather than on the
# preprocessed training tweets; preprocessed tokens that never occur in the raw text are
# silently dropped by texts_to_sequences below
tokenizer.fit_on_texts(data['Tweet text'])

# padding length: the largest number of whitespace-separated words in any raw tweet
max_length = max([len(w.split()) for w in data['Tweet text']])

# vocabulary size (+1 because index 0 is reserved for padding)
vocabulary_size = len(tokenizer.word_index)+1
print(vocabulary_size)

# converting the texts to integer sequences so that they can be given as input to the network
train_tokens = tokenizer.texts_to_sequences(train_data)
test_tokens = tokenizer.texts_to_sequences(test_data)

# zero-padding every sequence at the end ('post') up to max_length
train_padding = pad_sequences(train_tokens,max_length,padding='post')
test_padding = pad_sequences(test_tokens,max_length,padding='post')

12941
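The tokenizer above is fitted on the raw text of the whole dataset, including the test tweets, while the network is fed preprocessed tweets. Below is a dependency-free sketch of the alternative, assuming the index is built from the preprocessed training tweets only, with index 0 reserved for padding and index 1 for out-of-vocabulary words. `build_index` and `encode` are hypothetical names that mimic, but are not, Keras' `Tokenizer` and `pad_sequences`; with Keras itself the corresponding change would be `Tokenizer(oov_token='<OOV>')` followed by `tokenizer.fit_on_texts(train_data)`.

```python
from collections import Counter
from typing import Dict, List


def build_index(train_texts: List[str], oov_index: int = 1) -> Dict[str, int]:
    """Map words to ids by descending training frequency (ties broken alphabetically).
    Id 0 is left for padding and oov_index for unknown words."""
    counts = Counter(w for text in train_texts for w in text.split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i for i, w in enumerate(ordered, start=oov_index + 1)}


def encode(texts: List[str], index: Dict[str, int], max_length: int,
           oov_index: int = 1) -> List[List[int]]:
    """Convert texts to id lists, cut them to max_length and post-pad with 0.
    (Keras' pad_sequences truncates from the start by default; this cuts the end.)"""
    rows = []
    for text in texts:
        ids = [index.get(w, oov_index) for w in text.split()][:max_length]
        rows.append(ids + [0] * (max_length - len(ids)))
    return rows


train_texts = ["user love monday", "love url"]
index = build_index(train_texts)
max_len = max(len(t.split()) for t in train_texts)
print(encode(["love tuesday"], index, max_len))  # → [[2, 1, 0]]
```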


**Bidirectional Recurrent Neural Networks:**

Bidirectional LSTMs extend the traditional unidirectional LSTM used in Task 4.

A bidirectional LSTM trains two LSTMs instead of one on the input sequence: the first on the input sequence as-is and the second on a reversed copy of it. The outputs of the two are combined (concatenated by default in Keras), so the network sees both the preceding and the following context of each word. This additional context can lead to faster and more complete learning on the problem.
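To make the forward/backward idea concrete, here is a toy sketch that uses a scalar simple-RNN cell $h_t = \tanh(w_x x_t + w_h h_{t-1})$ instead of an LSTM cell, purely for brevity; the weights are arbitrary assumptions, and the two directions get separate weights as in a real bidirectional layer. One pass reads the sequence left to right, a second reads it right to left, and the two state sequences are paired per position. With `return_sequences=False`, as in the model below, Keras keeps only the last forward state and the last backward state (the one that has read the whole tweet from right to left). `rnn_pass` and `bidirectional` are hypothetical names.

```python
import math
from typing import List, Tuple


def rnn_pass(xs: List[float], w_x: float, w_h: float) -> List[float]:
    """Hidden states of a scalar simple RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}), h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional(xs: List[float], fwd=(0.5, 0.8), bwd=(-0.3, 0.6)) -> List[Tuple[float, float]]:
    forward = rnn_pass(xs, *fwd)
    # run over the reversed input, then re-reverse so that index t lines up with x_t
    backward = rnn_pass(xs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


pairs = bidirectional([1.0, 0.0, -1.0])
# at t = 0 the forward state has seen only x_0, while the backward state has seen the whole sequence
print(pairs)
```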

In [0]:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Bidirectional

Embedding_dimensionality = 1000
# defining the model
model = Sequential()
# Embedding parameters: vocabulary size (12941), embedding dimensionality, maximum tweet length
model.add(Embedding(vocabulary_size, Embedding_dimensionality, input_length=max_length))
# wrapping the LSTM layer in the Bidirectional wrapper creates two copies of the layer,
# one reading the sequence in forward order and one in reverse order
model.add(Bidirectional(LSTM(units=20, dropout=0.2, recurrent_dropout=0.2, return_sequences=False)))
# single sigmoid unit: probability that the tweet is ironic
model.add(Dense(1, activation='sigmoid'))

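To build the bidirectional LSTM network we use several hyperparameters (embedding dimensionality, number of LSTM units, dropout and recurrent dropout rates) that can be tuned to achieve better results. The output layer is a single unit with a sigmoid activation, which gives the probability that a tweet is ironic.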

In [0]:
#compile the model
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])

In [0]:
#summary of the model
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 187, 1000)         12941000  
_________________________________________________________________
bidirectional_1 (Bidirection (None, 40)                163360    
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 41        
Total params: 13,104,401
Trainable params: 13,104,401
Non-trainable params: 0
_________________________________________________________________
None
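The parameter counts in the summary can be checked by hand. An LSTM layer with $u$ units and input dimension $d$ has four gates, each with an input weight matrix ($d \times u$), a recurrent weight matrix ($u \times u$) and a bias ($u$); the `Bidirectional` wrapper doubles this, the embedding has one row of 1000 weights per vocabulary entry, and the dense layer has one weight per input plus a bias. A small sketch (the helper names are hypothetical):

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    return vocab_size * dim


def lstm_params(units: int, input_dim: int) -> int:
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * input_dim + units * units + units)


def dense_params(inputs: int, outputs: int) -> int:
    return inputs * outputs + outputs


emb = embedding_params(12941, 1000)   # 12,941,000
bilstm = 2 * lstm_params(20, 1000)    # 163,360
dense = dense_params(2 * 20, 1)       # 41
print(emb + bilstm + dense)           # → 13104401
```

The embedding alone accounts for almost 99% of the 13,104,401 trainable parameters.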


In [0]:
model.fit(train_padding,train_labels,batch_size=100,epochs=10,validation_data=(test_padding,test_labels),verbose=2)

Train on 2862 samples, validate on 955 samples
Epoch 1/10
 - 96s - loss: 0.6885 - acc: 0.5360 - val_loss: 0.6855 - val_acc: 0.5393
Epoch 2/10
 - 90s - loss: 0.5973 - acc: 0.7379 - val_loss: 0.6547 - val_acc: 0.5990
Epoch 3/10
 - 90s - loss: 0.4039 - acc: 0.8382 - val_loss: 0.7399 - val_acc: 0.6209
Epoch 4/10
 - 91s - loss: 0.2347 - acc: 0.9217 - val_loss: 0.8686 - val_acc: 0.6000
Epoch 5/10
 - 90s - loss: 0.1410 - acc: 0.9577 - val_loss: 0.9639 - val_acc: 0.6168
Epoch 6/10
 - 90s - loss: 0.0918 - acc: 0.9762 - val_loss: 1.0662 - val_acc: 0.6000
Epoch 7/10
 - 91s - loss: 0.0631 - acc: 0.9832 - val_loss: 1.1844 - val_acc: 0.6209
Epoch 8/10
 - 92s - loss: 0.0475 - acc: 0.9899 - val_loss: 1.2476 - val_acc: 0.6136
Epoch 9/10
 - 90s - loss: 0.0409 - acc: 0.9892 - val_loss: 1.3312 - val_acc: 0.6105
Epoch 10/10
 - 91s - loss: 0.0378 - acc: 0.9923 - val_loss: 1.4081 - val_acc: 0.6052


<keras.callbacks.History at 0x7ff1814810f0>
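The log shows the training accuracy climbing to 0.99 while `val_loss` is lowest after epoch 2 and rises steadily afterwards, a typical sign of overfitting. Keras provides `keras.callbacks.EarlyStopping(monitor='val_loss', patience=...)`, passed through `fit(..., callbacks=[...])`, to stop training at such a point. The sketch below mirrors its patience rule in plain Python on the logged `val_loss` values; it is a simplification that ignores options such as `min_delta`, and `early_stopping_epoch` is a hypothetical name. Here the validation data is the test set, so choosing the stopping epoch this way would leak test information; a validation split held out from the training data would avoid that.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-based, for patience-based early stopping."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# val_loss values copied from the training log above
logged = [0.6855, 0.6547, 0.7399, 0.8686, 0.9639, 1.0662, 1.1844, 1.2476, 1.3312, 1.4081]
print(early_stopping_epoch(logged, patience=2))  # → (2, 4)
```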

In [0]:
Improved_LSTM_Labels = model.predict(test_padding)

In [0]:
# threshold the sigmoid outputs at 0.5 to obtain hard labels
Imp_labels = [1 if p > 0.5 else 0 for p in Improved_LSTM_Labels.ravel()]

# accuracy and macro-F1 of the bidirectional LSTM, using the same metrics as Task 4
metrics_demo(list(test_labels), Imp_labels)


**Conclusion**

The stacked unidirectional LSTM given in Task 4 fails to capture the semantics of the tweets well. I therefore use the Keras bidirectional LSTM, which runs two LSTMs over the sequence, one in each direction, and so captures the context more effectively.

As there are many hyperparameters involved, they can be tuned for the application we are interested in.

With the stacked unidirectional LSTM from Task 4:

---------------------------------
Accuracy of keras LSTM model : 52.88%

F1 Score of keras LSTM model : 51.43

---------------------------------

With the bidirectional LSTM and the preprocessing described above:

---------------------------------
Accuracy of keras LSTM model : 60.79%

F1 Score of keras LSTM model : 58.43

---------------------------------

Compared with the unidirectional LSTM, the bidirectional LSTM improves accuracy from 52.88% to 60.79% and macro-F1 from 51.43 to 58.43, although both LSTM models remain below the 64.71% accuracy of the Naive Bayes classifier from Task 3. Tuning the hyperparameters appropriately and using more training data may improve performance further.



