In this task you will develop a system to detect irony in text. We will use the data from the SemEval-2018 task on irony detection. You should use the file `SemEval2018-T3-train-taskA.txt` from Blackboard; it consists of tab-separated examples as follows:

```csv
Tweet index     Label   Tweet text
1       1       Sweet United Nations video. Just in time for Christmas. #imagine #NoReligion  http://t.co/fej2v3OUBR
2       1       @mrdahl87 We are rumored to have talked to Erv's agent... and the Angels asked about Ed Escobar... that's hardly nothing    ;)
3       1       Hey there! Nice to see you Minnesota/ND Winter Weather 
4       0       3 episodes left I'm dying over here
```


Student Name : Anitha Govindaraju
Student ID: 19230254

# Task 1 (5 Marks)

Read all the data and find the size of vocabulary of the dataset (ignoring case) and the number of positive and negative examples.

In [0]:
from google.colab import files
files.upload()

Saving semevaluation.txt to semevaluation.txt



In [0]:
#Importing libraries
import pandas as pd
import nltk

In [0]:
#Loading the data through pandas
data = pd.read_csv('semevaluation.txt', sep = "\t")
tweet_text = data['Tweet text']

#Code to find the vocabulary size of the dataset, ignoring case
wordfreq = []
for sentence in tweet_text:
    #Lowercase each whitespace-separated token so that case is ignored
    wordfreq.extend(word.lower() for word in sentence.split())

print("Number of Vocabulary in the dataset: ",len(set(wordfreq)))

Number of Vocabulary in the dataset:  17052


In [0]:
#Code to count the number of positive and negative sentences in the dataset
label = data['Label']
labelfreq = {}
for number in label:
    if number not in labelfreq.keys():
        labelfreq[number] = 1
    else:
        labelfreq[number] += 1
print("Number of Positive(1) and negative(0) labels: ",labelfreq)

Number of Positive(1) and negative(0) labels:  {1: 1901, 0: 1916}
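
The two counts above can also be obtained with sets and `collections.Counter`. The following is a minimal sketch on plain Python lists rather than the pandas columns; `vocabulary_size`, `label_counts` and the sample data are illustrative names and values, not part of the notebook. Tokens are split on whitespace only, as in the cell above, so punctuation stays attached to words.

```python
from collections import Counter

def vocabulary_size(texts):
    """Return the number of distinct whitespace-separated tokens, ignoring case."""
    vocab = set()
    for text in texts:
        vocab.update(token.lower() for token in text.split())
    return len(vocab)

def label_counts(labels):
    """Return a Counter mapping each label to its number of occurrences."""
    return Counter(labels)

# Tiny illustrative sample (not the real dataset)
sample_texts = ["Hey there! Nice to see you", "hey THERE again"]
sample_labels = [1, 0, 1]
print(vocabulary_size(sample_texts))
print(label_counts(sample_labels))
```

Note that "there!" and "there" count as two different vocabulary entries under this tokenisation.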


# Task 2 (15 Marks)

Divide the data into a training and test set and justify your split.

Implement a function that calculates the precision, recall and F-Measure for this task.

In [0]:
#Split the data into training (first 80%) and test (last 20%) sets, in file order.
#A larger training set generally helps the model learn; a larger test set gives a less noisy (lower-variance) estimate of performance.
#An 80/20 split balances the two.
#[Reference: Lab solution 3]
train_test_cutoff = int(.80 * len(data)) 
training_sentences = data[:train_test_cutoff]
testing_sentences = data[train_test_cutoff:]
X_train = training_sentences['Tweet text'].tolist()
y_train = training_sentences['Label'].tolist()
X_test = testing_sentences['Tweet text'].tolist()
y_test = testing_sentences['Label'].tolist()
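
The cell above takes the first 80% of the rows in file order. If the file happened to be ordered by topic or label, a shuffled, class-balanced split would be a safer choice. The following is a minimal sketch of such a split using only the standard library; `stratified_split`, its parameters and the demo data are hypothetical and not part of the notebook.

```python
import random
from collections import defaultdict

def stratified_split(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle within each class, then hold out `test_fraction` of every class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = int(round(test_fraction * len(indices)))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle again so the classes are interleaved within each split
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    X_tr = [texts[i] for i in train_idx]
    y_tr = [labels[i] for i in train_idx]
    X_te = [texts[i] for i in test_idx]
    y_te = [labels[i] for i in test_idx]
    return X_tr, y_tr, X_te, y_te

# Illustrative data: 10 examples, 5 per class
demo_texts = ["tweet %d" % i for i in range(10)]
demo_labels = [i % 2 for i in range(10)]
X_tr, y_tr, X_te, y_te = stratified_split(demo_texts, demo_labels)
print(len(X_tr), len(X_te), sorted(y_te))
```

Fixing the seed keeps the split reproducible, so later models are compared on the same test tweets.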

In [0]:
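#Function implemented to update the counts of true positives(tp), true negatives(tn), false positives(fp) and false negatives(fn) for one example.
#The counters are global and keep accumulating across calls, so they must be reset to 0 before evaluating a different model.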
tp = 0
tn = 0
fp = 0
fn = 0
def calc(y_true, y_pred): 
  global tp
  global tn
  global fp
  global fn
  if y_true == 1 and y_pred == 1:
    tp = tp + 1
  elif y_true == 0 and y_pred == 1:
    fp = fp + 1
  elif y_true == 1 and y_pred == 0:
    fn = fn + 1
  elif y_true == 0 and y_pred == 0:
    tn = tn + 1
  return tp, fp, fn, tn

In [0]:
#Function to calculate precision
def precision(tp, fp):  
  precision = tp / (tp + fp) 
  print('Precision: %f' % precision)
  return precision

In [0]:
#Function to calculate recall
def recall(tp, fn):
  recall = tp / (tp + fn)
  print('Recall: %f' % recall)
  return recall

In [0]:
#Function to calculate f1 score
def f1(tp, fp, fn):
  f1 = (2 * tp) / ((2 * tp) + fp + fn)
  print('F1 score: %f' % f1)
  return f1
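
Because `calc` keeps its counts in global variables, calling it for several models in a row adds their counts together. A variant that recomputes the counts from scratch on every call avoids this. The following is a minimal sketch, not the notebook's implementation; `confusion_counts` and `precision_recall_f1` are illustrative names, and returning 0.0 when a denominator is zero is an assumption made here for convenience.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one list of predictions."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1); 0.0 is used when a denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * tp) / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f1

# Illustrative labels: 2 true positives, 1 false positive, 1 false negative
demo_true = [1, 1, 1, 0, 0, 0]
demo_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(demo_true, demo_pred))
```

Each call starts from zero counts, so the scores of one model cannot leak into the scores of the next.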

# Task 3 (15 Marks)

Suggest some features to extract from each sentence. Implement a simple log-linear model to classify tweets as ironic or not ironic.

Train this method and evaluate the results using precision, recall and F-Measure
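
Besides word-level TF-IDF weights, simple surface cues of a tweet could serve as features. The following is a minimal sketch of such a feature extractor; the function name `surface_features` and the chosen cues (hashtags, mentions, punctuation, emoticons, URLs) are assumptions for illustration only, and they are not used by the model trained below.

```python
import re

def surface_features(tweet):
    """Return a dict of simple counts that might correlate with irony."""
    tokens = tweet.split()
    return {
        "n_tokens": len(tokens),
        "n_hashtags": sum(1 for t in tokens if t.startswith("#")),
        "n_mentions": sum(1 for t in tokens if t.startswith("@")),
        "n_exclamations": tweet.count("!"),
        "n_questions": tweet.count("?"),
        "has_ellipsis": int("..." in tweet),
        "has_url": int(bool(re.search(r"https?://\S+", tweet))),
        # Very rough emoticon pattern such as :) ;) :-( :D :P
        "has_emoticon": int(bool(re.search(r"[:;]-?[)(DP]", tweet))),
        "n_uppercase_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
    }

example = "@mrdahl87 We are rumored to have talked to Erv's agent... that's hardly nothing ;)"
print(surface_features(example))
```

Such a dict could be turned into a numeric vector and concatenated with the TF-IDF features, though whether it helps on this dataset would need to be checked experimentally.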

In [0]:
#Importing libraries for Task 3
from sklearn.linear_model import LogisticRegression
from nltk.corpus import stopwords
nltk.download('stopwords')
from sklearn.feature_extraction.text import TfidfVectorizer

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [0]:
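#Vectorizing the tweet text with a TF-IDF vectorizer (English stop words removed).
#Note: the vectorizer is fitted on the training and test tweets together, so its vocabulary and IDF weights also reflect the test set.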
vectorizer = TfidfVectorizer(stop_words = stopwords.words('english'))
X = X_train + X_test
vectorizer.fit(X)
X_train = vectorizer.transform(X_train)
X_test = vectorizer.transform(X_test)

In [0]:
#Building Logistic Regression model
logisticRegr = LogisticRegression()
logisticRegr.fit(X_train, y_train)
y_pred = logisticRegr.predict(X_test)

In [0]:
# Use score method to get accuracy of model
score = logisticRegr.score(X_test, y_test)
print("Accuracy is:",score)

Accuracy is: 0.6243455497382199


In [0]:
#Function call for calculating precision, Recall, and F1
for i in range(len(y_test)):
  true_pos, false_pos, false_neg, true_neg = calc(y_test[i], y_pred[i])
precision_score = precision(true_pos, false_pos)
recall_score = recall(true_pos, false_neg)
f1_score = f1(true_pos, false_pos, false_neg)

Precision: 0.618076
Recall: 0.576087
F1 score: 0.596343


# Task 4 (25 Marks)

Develop an acceptor or a transducer recurrent neural network that classifies the sentence as ironic or not ironic.

Evaluate this according to precision, recall or F-Measure.

In [0]:
#[References]
#https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/text/text_classification_rnn.ipynb#scrollTo=zIwH3nto596k
#https://www.youtube.com/watch?v=dzoh8cfnvnI&feature=youtu.be
#https://jovianlin.io/keras-one-hot-encode-decode-sequence-data/
#https://keras.io/models/model/#fit

#Imported libraries for Task 4
import keras
from keras import optimizers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import tensorflow as tf
import numpy as np

#Same 80/20 split in file order as in Task 2, stored in separate variables for the RNN pipeline.
#A larger training set generally helps the model learn; a larger test set gives a less noisy estimate of performance.
train_test_cutoff = int(.80 * len(data)) 
training_sentences = data[:train_test_cutoff]
testing_sentences = data[train_test_cutoff:]
X_train_4 = training_sentences['Tweet text'].tolist()
y_train_4 = training_sentences['Label'].tolist()
X_test_4 = testing_sentences['Tweet text'].tolist()
y_test_4 = testing_sentences['Label'].tolist()

vocab_size = len(set(wordfreq))

#Preprocessing the data
tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(X_train_4)

X_train_4 = tokenizer.texts_to_sequences(X_train_4)
X_train_4 = pad_sequences(X_train_4)
y_train_4 = to_categorical(y_train_4)

X_test_4 = tokenizer.texts_to_sequences(X_test_4)
X_test_4 = pad_sequences(X_test_4)
y_test_4 = to_categorical(y_test_4)
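
To make the preprocessing step concrete, the following is a simplified pure-Python sketch of what a word-index tokenizer followed by left-padding does. It is not the Keras implementation: `build_word_index` and `texts_to_padded` are hypothetical helpers, punctuation is not filtered here (the Keras `Tokenizer` strips it by default), and the sample sentences are invented.

```python
from collections import Counter

def build_word_index(texts):
    """Map each lowercased token to an integer rank; 1 is the most frequent word."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Index 0 is reserved for padding, so ranks start at 1
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_padded(texts, word_index, maxlen=None):
    """Convert texts to index lists, drop unknown words, and left-pad with zeros."""
    sequences = [[word_index[t] for t in text.lower().split() if t in word_index]
                 for text in texts]
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # keep only the last `maxlen` tokens
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

train_texts = ["so happy today", "so so tired"]
test_texts = ["happy and tired"]

word_index = build_word_index(train_texts)   # built from training text only
train_padded = texts_to_padded(train_texts, word_index)
# Reuse the training width so both matrices have the same number of columns
test_padded = texts_to_padded(test_texts, word_index, maxlen=len(train_padded[0]))
print(train_padded)
print(test_padded)
```

The word "and" never occurs in the training sentences, so it is dropped from the test sequence. In Keras, passing the same `maxlen` to `pad_sequences` for the training and test data gives both matrices the same width in the same way.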

In [0]:
#Keras acceptor RNN for text classification: an embedding layer, two stacked LSTM layers (only the final state of the second is kept) and a 2-unit output layer
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64)
])
model.add(tf.keras.layers.LSTM(64,return_sequences=True))
model.add(tf.keras.layers.LSTM(32,return_sequences=False))
model.add(tf.keras.layers.Dense(2, activation='relu'))

In [0]:
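#Compile the model with mean squared error loss and SGD.
#Note: the string 'sgd' selects the default SGD optimizer, so the configured `sgd` object (momentum, decay, Nesterov) is not the one used for training.
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['accuracy'])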

In [0]:
X_train_4.shape, y_train_4.shape

((3053, 196), (3053, 2))

In [0]:
#Model fit method
history = model.fit(X_train_4, y_train_4, epochs=10,
                    validation_split=0.2,shuffle=True, 
                    validation_steps=30)

Train on 2442 samples, validate on 611 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [0]:
#Code to evaluate model 
loss, acc = model.evaluate(X_test_4, y_test_4)

print('Test Loss: {}'.format(loss))
print('Test Accuracy: {}'.format(acc))

Test Loss: 0.2489225359643317
Test Accuracy: 0.5340313911437988


In [0]:
#predicted values
y_pred_4 = model.predict(X_test_4)

In [0]:
#Decode a one-hot (or score) vector to its class index using argmax
def decode(x):
    return np.argmax(x)

In [0]:
#Decode the predicted score vectors to class indices (argmax over each row)
y_prediction_4 = np.argmax(y_pred_4, axis=1)

In [0]:
#Decode the one-hot test labels back to class indices (argmax over each row)
y_test_data_4 = np.argmax(y_test_4, axis=1)

In [0]:
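#Function call for calculating Precision, Recall and F1.
#calc() adds to the global counters, so tp, tn, fp and fn still include the Task 3 counts here unless they are reset to 0 first.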
for i in range(len(y_test_data_4)):
  true_pos, false_pos, false_neg, true_neg = calc(y_test_data_4[i], y_prediction_4[i])
precision_score_4 = precision(true_pos, false_pos)
recall_score_4 = recall(true_pos, false_neg)
f1_score_4 = f1(true_pos, false_pos, false_neg)

Precision: 0.573370
Recall: 0.573370
F1 score: 0.573370


# Task 5 (40 Marks)

Suggest an improvement to either the system developed in Task 3 or 4 and show that it improves according to your evaluation metric.

Please note this task is marked according to: demonstration of knowledge from the lectures (10), originality and appropriateness of solution (10), completeness of description (10), technical correctness (5) and improvement in evaluation metric (5).

In [0]:
#Splitting the data into train and test dataset
train_test_cutoff = int(.80 * len(data)) 
training_sentences = data[:train_test_cutoff]
testing_sentences = data[train_test_cutoff:]
X_train_5 = training_sentences['Tweet text'].tolist()
y_train_5 = training_sentences['Label'].tolist()
X_test_5 = testing_sentences['Tweet text'].tolist()
y_test_5 = testing_sentences['Label'].tolist()

In [0]:
#Preprocessing the X and y data accordingly
vocab_size = len(set(wordfreq))

tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(X_train_5)

X_train_5 = tokenizer.texts_to_sequences(X_train_5)
X_train_5 = pad_sequences(X_train_5)
y_train_5 = to_categorical(y_train_5)

X_test_5 = tokenizer.texts_to_sequences(X_test_5)
X_test_5 = pad_sequences(X_test_5)
y_test_5 = to_categorical(y_test_5)


In [0]:
#Model for text classification: adds dropout and recurrent dropout to both LSTM layers and an extra dense layer to improve performance
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64)
])
model.add(tf.keras.layers.LSTM(64,dropout=0.4, recurrent_dropout=0.4,return_sequences=True))
model.add(tf.keras.layers.LSTM(32,dropout=0.4, recurrent_dropout=0.4,return_sequences=False))

model.add(tf.keras.layers.Dense(10, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='relu'))

In [0]:
#Compile the model with binary cross-entropy loss and the Adam optimizer (learning rate 1e-4) to improve accuracy
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

In [0]:
#Model fit method
history = model.fit(X_train_5, y_train_5, epochs=10,
                    validation_split=0.2,shuffle=True, 
                    validation_steps=30)

Train on 2442 samples, validate on 611 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [0]:
#Code to evaluate the model: accuracy on the test dataset has improved to about 56%, from about 53% in Task 4
loss, acc = model.evaluate(X_test_5, y_test_5)

print('Test Loss: {}'.format(loss))
print('Test Accuracy: {}'.format(acc))

Test Loss: 0.6854712922535642
Test Accuracy: 0.5621727705001831


In [0]:
#predicted values
y_pred_5 = model.predict(X_test_5)

In [0]:
#Decode a one-hot (or score) vector to its class index using argmax (same helper as in Task 4)
def decode(x):
    return np.argmax(x)

In [0]:
#Decode the predicted score vectors to class indices (argmax over each row)
y_prediction_5 = np.argmax(y_pred_5, axis=1)

In [0]:
#Decode the one-hot test labels back to class indices (argmax over each row)
y_test_data_5 = np.argmax(y_test_5, axis=1)

In [0]:
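#Function call for calculating Precision, Recall and F1.
#calc() adds to the global counters, so tp, tn, fp and fn still include the counts from the earlier evaluations unless they are reset to 0 first.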
for i in range(len(y_test_data_5)):
  true_pos, false_pos, false_neg, true_neg = calc(y_test_data_5[i], y_prediction_5[i])
precision_score_5 = precision(true_pos, false_pos)
recall_score_5 = recall(true_pos, false_neg)
f1_score_5 = f1(true_pos, false_pos, false_neg)

Precision: 0.562581
Recall: 0.592391
F1 score: 0.577101
