# Quora Dataset Challenge - To predict if a pair of questions are duplicates

### The Quora dataset challenge is about predicting whether two questions have the same meaning. Detecting duplicate questions matters to Quora because it saves space and spares users from answering the same question repeatedly.
### The goal of this project is to explore natural language processing techniques and integrate them with neural networks. 

### Let's start by importing the required libraries

In [1]:
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import nltk
from nltk.corpus import stopwords
import string
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
import sklearn
from sklearn.utils import shuffle
import gensim
import fuzzywuzzy
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from collections import Counter
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
os.chdir("C:\\Users\\aksha\\Desktop\\quora_data")



### Reading the dataset from a csv file and storing it in a dataframe

In [2]:
quora_data = pd.read_csv('train.csv')
quora_data.head(10)

Unnamed: 0,id,qid1,qid2,question1,question2,is_duplicate
0,0,1,2,What is the step by step guide to invest in sh...,What is the step by step guide to invest in sh...,0
1,1,3,4,What is the story of Kohinoor (Koh-i-Noor) Dia...,What would happen if the Indian government sto...,0
2,2,5,6,How can I increase the speed of my internet co...,How can Internet speed be increased by hacking...,0
3,3,7,8,Why am I mentally very lonely? How can I solve...,Find the remainder when [math]23^{24}[/math] i...,0
4,4,9,10,"Which one dissolve in water quikly sugar, salt...",Which fish would survive in salt water?,0
5,5,11,12,Astrology: I am a Capricorn Sun Cap moon and c...,"I'm a triple Capricorn (Sun, Moon and ascendan...",1
6,6,13,14,Should I buy tiago?,What keeps childern active and far from phone ...,0
7,7,15,16,How can I be a good geologist?,What should I do to be a great geologist?,1
8,8,17,18,When do you use シ instead of し?,"When do you use ""&"" instead of ""and""?",0
9,9,19,20,Motorola (company): Can I hack my Charter Moto...,How do I hack Motorola DCX3400 for free internet?,0


### Each row of the dataset contains a pair of questions and a label that says whether they are duplicates: 1 marks a duplicate pair and 0 marks a non-duplicate pair.

### We do not need the id, qid1, qid2 columns as they do not provide any useful information. Let's drop them.

In [3]:
quora_data = quora_data.drop(['id','qid1','qid2'],axis=1)

### Imbalanced datasets are problematic. Let's check if our dataset is imbalanced

In [4]:
quora_data['is_duplicate'].value_counts()/len(quora_data)

0    0.630802
1    0.369198
Name: is_duplicate, dtype: float64

### The dataset is imbalanced: about 63% of the labels belong to class '0'. Splitting it into train and test sets may make the problem more serious. So we split the original dataset by label, keep all the rows labelled '1' and an equal number of rows labelled '0' to make the distribution uniform, and then shuffle the data. (The remaining '0' rows are lost in the process.)

In [5]:
quora_data_positive = quora_data['is_duplicate'] ==1
quora_data_negative = quora_data['is_duplicate'] ==0

In [6]:
# Rows labelled as duplicates; the original row index is kept until after shuffling.
quora_df_1 = quora_data[quora_data_positive]

In [7]:
# An equal number of non-duplicate rows, taken in file order.
quora_df_2 = quora_data[quora_data_negative][:len(quora_df_1)]

In [8]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way.
quora_df = pd.concat([quora_df_1, quora_df_2])

In [9]:
quora_df = shuffle(quora_df)
quora_df = quora_df.reset_index()
quora_df.head(10)

Unnamed: 0,index,question1,question2,is_duplicate
0,227636,Should I heat my room with a ceramic tower hea...,How can I keep my room warm without heater?,0
1,173323,How much does Arijit Singh charge to sing a song?,Which is an easy Hindi song by Arijit to sing?,0
2,276645,What are the best hotels in Rajasthan for stay...,Where can I find best hotels at Rajasthan for ...,1
3,56091,What the purpose of life on earth?,What is the meaning of life? Whats our purpose...,1
4,199525,Which is the best pilot training academy in In...,What are the best commercial pilot training sc...,1
5,96466,How do aromatherapy diffusers work?,How does aromatherapy help depression?,0
6,213811,Is religion the biggest scam mankind has ever ...,"Is religion a scam, and if so, how?",0
7,197533,How can I find a pro bono lawyer?,How do I find a pro bono lawyer?,1
8,228727,Is there a way to learn about literature while...,I am a used car dealer in Uttar Pradesh.I want...,0
9,101100,How does NEFT and RTGS differ?,How does NEFT/RTGS work?,0
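
### The cells above keep the first N non-duplicate rows in file order. A minimal sketch of a randomized alternative on a toy dataframe follows; `toy`, `balanced` and the seed value are illustrative names and choices, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for quora_data: 6 non-duplicate rows and 3 duplicate rows.
toy = pd.DataFrame({
    "question1": [f"q1_{i}" for i in range(9)],
    "question2": [f"q2_{i}" for i in range(9)],
    "is_duplicate": [0, 0, 1, 0, 1, 0, 0, 1, 0],
})

positives = toy[toy["is_duplicate"] == 1]
# Draw as many negatives as there are positives, at random instead of in file order.
negatives = toy[toy["is_duplicate"] == 0].sample(n=len(positives), random_state=0)

# Combine, shuffle, and renumber the rows.
balanced = (
    pd.concat([positives, negatives])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
print(balanced["is_duplicate"].value_counts())
```

### Sampling at random avoids any bias that the file order of the '0' rows might carry. Whether that matters for this dataset is not something the notebook tests.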


### Checking if there are any missing values 

In [10]:
quora_df.isnull().sum()

index           0
question1       0
question2       2
is_duplicate    0
dtype: int64

### Since the number of missing values is very low I am deleting the rows with missing values

In [11]:
quora_df.dropna(inplace=True)
quora_df.reset_index(drop=True, inplace=True)

### Now that we have cleaned the dataset, we can start creating the features. The first feature that I consider is the difference between the number of unique words in the two questions.

In [12]:
word_count_difference = [abs(len(set(quora_df['question1'][i].lower().split(" "))) - len(set(quora_df['question2'][i].lower().split(" ")))) for i in range(len(quora_df))]
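
### As a worked example of this feature, the helper below applies the same logic to one pair of questions from the dataset preview. The function name is hypothetical and only serves the illustration.

```python
def unique_word_count_difference(q1, q2):
    """Absolute difference in the number of distinct lowercase words."""
    words1 = set(q1.lower().split(" "))
    words2 = set(q2.lower().split(" "))
    return abs(len(words1) - len(words2))


# 7 distinct words versus 9 distinct words.
print(unique_word_count_difference(
    "How can I be a good geologist?",
    "What should I do to be a great geologist?",
))  # 2

# Repeated words are counted once because the words go through a set.
print(unique_word_count_difference("the the cat", "the cat"))  # 0
```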

### Duplicate questions often end with the same word. So the next feature is a boolean feature whose value is '1' if the last words of the two questions match and '0' if they do not.

In [13]:
last_word_check = [ 1 if (quora_df['question1'][i].replace("?","").split(" ")[-1].lower() == quora_df['question2'][i].replace("?","").split(" ")[-1].lower()) else 0 for i in range(len(quora_df))  ]

### The first word of a question is usually an interrogative word such as when, where or which. It sets the tone of the question and tells us whether the question is about a time, a place or an object. Two similar questions tend to have the same tone and generally start with the same interrogative word.
### So the next feature is a boolean feature whose value is '1' if the first words of the two questions match and '0' if they do not.

In [14]:
first_word_check =  [ 1 if (quora_df['question1'][i].split(" ")[0].lower() == quora_df['question2'][i].split(" ")[0].lower()) else 0 for i in range(len(quora_df))]
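
### The two boolean checks can be sketched as small functions and tried on pairs from the preview table. The function names are hypothetical; the logic mirrors the list comprehensions above.

```python
def first_word_match(q1, q2):
    """1 if both questions start with the same word (case-insensitive), else 0."""
    return int(q1.split(" ")[0].lower() == q2.split(" ")[0].lower())


def last_word_match(q1, q2):
    """1 if both questions end with the same word, ignoring '?' and case, else 0."""
    last1 = q1.replace("?", "").split(" ")[-1].lower()
    last2 = q2.replace("?", "").split(" ")[-1].lower()
    return int(last1 == last2)


pair_a = ("How can I find a pro bono lawyer?", "How do I find a pro bono lawyer?")
pair_b = ("How do aromatherapy diffusers work?", "How does aromatherapy help depression?")

print(first_word_match(*pair_a), last_word_match(*pair_a))  # 1 1
print(first_word_match(*pair_b), last_word_match(*pair_b))  # 1 0
```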

### For some of the next features I remove the stop words and compare the remaining words. However, I have not used the built-in nltk stop word list as is, because it treats certain important words such as "before" and "after" as stop words. So I have created my own stop word list by removing those important words from the built-in list.

In [15]:
# Note: a missing comma between "hasn't" and 'haven' used to merge them into one bogus entry.
stop_words = ['i','me','my','we','our','ours','ourselves', 'you', "you're","you've","you'll","you'd",'your','yours','yourself','yourselves',
              'he','him', 'his','himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's",'its', 'itself',
              'they', 'them', 'their', 'theirs','themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those',
              'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having',
              'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'of', 'at', 'by', 'to', 'from',
              'when', 'where', 'why', 'how', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'don', "don't", 'should', "should've",
              'd', 'll', 'm', 'o', 're', 've', 'y','ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't",
              'hadn', "hadn't",'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't",
              'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

### In the next two cells I have removed stop words from every question and stored the remaining words in a set. So for every question I have a set of important words   

In [16]:
word_tokens_question1 = [(set(word_tokenize(question.strip(string.punctuation).lower()))-set(stop_words)) if (len(set(word_tokenize(question.strip(string.punctuation).lower()))-set(stop_words))!=0) else set(word_tokenize(question.strip(string.punctuation).lower()))  for question in quora_df['question1']]

In [17]:
word_tokens_question2 = [(set(word_tokenize(question.strip(string.punctuation).lower()))-set(stop_words)) if (len(set(word_tokenize(question.strip(string.punctuation).lower()))-set(stop_words))!=0) else set(word_tokenize(question.strip(string.punctuation).lower()))  for question in quora_df['question2']]
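
### The two comprehensions above tokenize each question up to three times. A minimal sketch of the same logic as a helper follows. The name `important_words` is hypothetical, and `str.split` stands in for nltk's `word_tokenize` so that the sketch runs without nltk data; the two tokenizers do not split text identically.

```python
import string


def important_words(question, stop_words, tokenize=str.split):
    """Return the question's non-stop-word tokens, or all tokens if none remain."""
    tokens = set(tokenize(question.strip(string.punctuation).lower()))
    remaining = tokens - set(stop_words)
    # Fall back to the full token set so that no question ends up empty.
    return remaining if remaining else tokens


# A shortened stop word list, for illustration only.
demo_stop_words = ["how", "do", "i", "a", "what", "is", "the"]

print(important_words("How do I find a pro bono lawyer?", demo_stop_words))
print(important_words("What is the what?", demo_stop_words))
```

### In the notebook the helper would be called as `[important_words(q, stop_words, word_tokenize) for q in quora_df['question1']]`.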

### The next feature is the character difference between the two questions. For every question I build a dictionary (a Counter) of the characters in its important words and their counts. I then compare the dictionaries of the two questions and count how many distinct characters occur a different number of times in the two questions. Questions with a higher count generally tend to be different.

In [18]:
character_difference = []
for i in range(len(word_tokens_question1)):
    counter_question1 = Counter()
    counter_question2 = Counter()
    for words in word_tokens_question1[i]:
        counter_question1 = counter_question1 + Counter(words)
    for words in word_tokens_question2[i]:
        counter_question2 = counter_question2 + Counter(words)
    character_difference.append(len(((counter_question1 - counter_question2) + (counter_question2 - counter_question1)).values())) 
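
### Counter subtraction keeps only positive counts, so adding the two one-sided differences gives the symmetric difference. The sketch below works this through on two small word sets and shows both the number of differing characters (what the cell above stores) and the sum of the differences. The word sets and names are illustrative.

```python
from collections import Counter


def char_counts(words):
    """Count every character across a collection of words."""
    counts = Counter()
    for word in words:
        counts.update(word)
    return counts


c1 = char_counts({"find", "lawyer"})
c2 = char_counts({"find", "attorney"})

# (c1 - c2) holds characters that c1 has more of; (c2 - c1) the reverse.
diff = (c1 - c2) + (c2 - c1)

distinct = len(diff)        # number of characters whose counts differ
total = sum(diff.values())  # summed size of those differences
print(distinct, total)      # 5 6
```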

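### The next feature is word similarity: the number of important words the two questions share, divided by the number of distinct important words across both questions (the Jaccard similarity). Two questions that share a larger fraction of their words are more likely to be similar.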

In [19]:
word_similarity = []
for i in range(len(word_tokens_question1)):
    #print(i)
    word_similarity.append(len((word_tokens_question1[i]) & (word_tokens_question2[i]))/len((word_tokens_question1[i]) | (word_tokens_question2[i])))
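
### A small worked example of this ratio is shown below. The function name is hypothetical, and the guard for two empty sets is an addition that the cell above does not need because its token sets are never empty.

```python
def jaccard(a, b):
    """Shared elements divided by all distinct elements of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


q1_words = {"find", "pro", "bono", "lawyer"}
q2_words = {"find", "pro", "bono", "attorney"}

# 3 shared words out of 5 distinct words.
print(jaccard(q1_words, q2_words))  # 0.6
```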

### I have used the FuzzyWuzzy library for the next 4 features. The FuzzyWuzzy package for Python was developed and open-sourced by SeatGeek to tackle the ticket search use case on their website. Fuzzy string matching is the process of finding strings that match a given pattern approximately rather than exactly. The fuzzywuzzy package supports four popular types of fuzzy matching logic:
### 1) Simple Ratio - uses pure Levenshtein distance based matching
### 2) Partial Ratio - matches based on the best substrings
### 3) Token Sort Ratio - tokenizes the strings and sorts the tokens alphabetically before matching
### 4) Token Set Ratio - tokenizes the strings and compares the intersection and the remainder
### I have used all of the above scores as features.

### Calculating the ratio score 

In [20]:
simple_ratio = []
for i in range(len(quora_df)):
    #print(i)
    simple_ratio.append(fuzz.ratio(quora_df['question1'][i],quora_df['question2'][i]))

### Calculating the partial ratio

In [21]:
partial_ratio = []
for i in range(len(quora_df)):
    #print(i)
    partial_ratio.append(fuzz.partial_ratio(quora_df['question1'][i],quora_df['question2'][i]))

### Calculating the token sort ratio score

In [22]:
token_sort_ratio = []
for i in range(len(quora_df)):
    #print(i)
    token_sort_ratio.append(fuzz.token_sort_ratio(quora_df['question1'][i],quora_df['question2'][i]))

### Calculating the token set ratio

In [23]:
token_set_ratio = []
for i in range(len(quora_df)):
    #print(i)
    token_set_ratio.append(fuzz.token_set_ratio(quora_df['question1'][i],quora_df['question2'][i]))
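
### To show why sorting tokens helps, here is a rough sketch of a simple ratio and a token sort ratio built on the standard library's `difflib`. These functions are approximations written for illustration; they are not fuzzywuzzy's implementation, which also normalizes the strings in ways this sketch skips.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a, b):
    """Same score after lowercasing and sorting the tokens alphabetically."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return simple_ratio(sorted_a, sorted_b)


a = "new york mets vs atlanta braves"
b = "atlanta braves vs new york mets"

print(simple_ratio(a, b))      # below 100: the word order differs
print(token_sort_ratio(a, b))  # 100: the same tokens once sorted
```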

### Word2vec is an important NLP technique. It is a two-layer neural net that processes text: its input is a text corpus and its output is a set of feature vectors, one for each word in that corpus. It turns text into a numerical form that deep nets can understand, so that the similarity between two words can be computed from their vectors.

### GloVe is an unsupervised learning algorithm for obtaining vector representations of words, developed at Stanford University. I have used a file of pre-trained vectors obtained by running the algorithm on a Wikipedia corpus. The file is 'glove.6B.100d.txt'. I then convert the GloVe file format into the word2vec file format. The path of the converted file is stored in the variable 'word2vec_output_file_new'.

In [24]:
glove_input_file = 'glove.6B.100d.txt'
word2vec_output_file_new = 'glove.6B.100d.txt.word2vec'
glove2word2vec(glove_input_file, word2vec_output_file_new)

(400000, 100)

### The next cell loads the converted word2vec-format embeddings into a KeyedVectors model.

In [25]:
model = KeyedVectors.load_word2vec_format(word2vec_output_file_new, binary=False)

### I have used this model to look up the vector of every word in a question and sum the vectors, which gives one vector per question. I then take the difference of the two question vectors. If two questions are related, the elements of the difference vector should be close to zero. Finally I sum the elements of the difference vector. The closer this value is to zero, the higher the chance that the two questions are similar.

In [26]:
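diff_vec=[]
for i in range(len(quora_df)):
    # Sum the 100-dimensional vectors of the known words in question 1.
    sum_j=np.zeros((100,))
    for j in (word_tokens_question1[i]):
        # `in model` tests vocabulary membership without the deprecated model.wv.vocab.
        if (j in model):
            sum_j = sum_j + model[j]
    # Same for question 2.
    sum_k = np.zeros((100,))
    for k in (word_tokens_question2[i]):
        if(k in model):
            sum_k = sum_k + model[k]
    # Signed sum of the element-wise difference between the two question vectors.
    diff_vec.append((sum_j-sum_k).sum())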
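
### The toy example below traces the same computation with made-up 2-dimensional vectors. It also prints two alternatives, the sum of absolute differences and the cosine similarity, because positive and negative elements can cancel in a signed sum. The vectors, names and alternatives are assumptions for illustration, not part of the notebook's feature set.

```python
import numpy as np

# Made-up embeddings; real GloVe vectors have 100 dimensions here.
toy_vectors = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
}


def sentence_vector(words, vectors, dim=2):
    """Sum the vectors of the words that are in the vocabulary."""
    total = np.zeros(dim)
    for word in words:
        if word in vectors:
            total += vectors[word]
    return total


v1 = sentence_vector({"cat"}, toy_vectors)
v2 = sentence_vector({"dog"}, toy_vectors)

signed_sum = (v1 - v2).sum()           # 1 + (-1): the elements cancel
abs_sum = np.abs(v1 - v2).sum()        # cancellation is not possible here
cosine = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

print(signed_sum, abs_sum, cosine)     # 0.0 2.0 0.0
```

### Here the signed sum is zero although the two vectors are orthogonal, which is why a value near zero suggests similarity but does not guarantee it.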



### Let's append the features columns to the dataset dataframe

In [27]:
quora_df['character_difference'] = character_difference
quora_df['word_count_difference'] = word_count_difference
quora_df['last_word_check'] = last_word_check
quora_df['first_word_check'] = first_word_check
quora_df['word_similarity'] = word_similarity
quora_df['simple_ratio'] = simple_ratio
quora_df['partial_ratio'] = partial_ratio
quora_df['token_sort_ratio'] = token_sort_ratio
quora_df['token_set_ratio'] = token_set_ratio
quora_df['word2vec_score'] = diff_vec
quora_df.head(10)

Unnamed: 0,index,question1,question2,is_duplicate,character_difference,word_count_difference,last_word_check,first_word_check,word_similarity,simple_ratio,partial_ratio,token_sort_ratio,token_set_ratio,word2vec_score
0,227636,Should I heat my room with a ceramic tower hea...,How can I keep my room warm without heater?,0,15,5,0,0,0.181818,45,58,46,55,28.184022
1,173323,How much does Arijit Singh charge to sing a song?,Which is an easy Hindi song by Arijit to sing?,0,11,0,0,0,0.333333,44,44,58,59,-9.798698
2,276645,What are the best hotels in Rajasthan for stay...,Where can I find best hotels at Rajasthan for ...,1,2,1,1,0,0.777778,83,81,83,85,6.219662
3,56091,What the purpose of life on earth?,What is the meaning of life? Whats our purpose...,1,11,4,1,1,0.571429,56,70,76,100,15.366938
4,199525,Which is the best pilot training academy in In...,What are the best commercial pilot training sc...,1,11,1,1,0,0.625,70,66,70,79,-0.317527
5,96466,How do aromatherapy diffusers work?,How does aromatherapy help depression?,0,11,0,0,1,0.142857,66,63,62,65,-2.745762
6,213811,Is religion the biggest scam mankind has ever ...,"Is religion a scam, and if so, how?",0,16,2,0,1,0.25,54,60,49,67,9.255804
7,197533,How can I find a pro bono lawyer?,How do I find a pro bono lawyer?,1,2,0,1,1,0.8,92,91,92,95,8.841726
8,228727,Is there a way to learn about literature while...,I am a used car dealer in Uttar Pradesh.I want...,0,18,14,0,0,0.038462,14,30,39,34,-2.702942
9,101100,How does NEFT and RTGS differ?,How does NEFT/RTGS work?,0,8,2,0,1,0.166667,74,75,69,88,-7.135651


### Now that we have our features ready, we can build a neural network and try to train the model on this feature set. 
### X dataframe contains the training features and Y dataframe contains the labels. Then we create dummy columns to represent duplicate and non duplicate labels

In [29]:
X = quora_df[quora_df.columns[4:14]]
Y = quora_df[quora_df.columns[3]]
Y = pd.get_dummies(Y)
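
### A quick illustration of what `pd.get_dummies` does to a binary label column, using a toy Series. The cast to float32 is added here only to make the printed values uniform across pandas versions.

```python
import pandas as pd

labels = pd.Series([0, 1, 1, 0], name="is_duplicate")

# One column per class: column 0 marks non-duplicates, column 1 marks duplicates.
one_hot = pd.get_dummies(labels).astype("float32")
print(one_hot)
```

### Each row has exactly one 1, so the two columns can be used directly as targets for a two-class softmax output.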

### Splitting the dataset into train and test sets 

In [30]:
train_x,test_x,train_y,test_y = train_test_split(X,Y,test_size=0.2,random_state=0)

In [31]:
train_x = train_x.astype('float32');
test_x = test_x.astype('float32');

In [32]:
#train_y = np.reshape(train_y,(train_y.shape[0],1))

In [33]:
train_x.shape

(238819, 10)

In [34]:
#test_y = np.reshape(test_y,(test_y.shape[0],1))

### Defining the no. of dimensions, no. of classes and hidden layers. 

In [35]:
n_dim = train_x.shape[1]
n_class = train_y.shape[1]

In [36]:
n_hidden_1 = 60
n_hidden_2 = 60
n_hidden_3 = 60
n_hidden_4 = 60

### The weights and biases variables are defined below. They are defined as variables as we need to update them with every iteration in training the neural network. To initialize their values I have used the tf.truncated_normal function.

In [37]:
############new
weights = {
    'h1': tf.Variable(tf.truncated_normal([n_dim,n_hidden_1])),
    'h2': tf.Variable(tf.truncated_normal([n_hidden_1,n_hidden_2])),
    'h3': tf.Variable(tf.truncated_normal([n_hidden_2,n_hidden_3])),
    'h4': tf.Variable(tf.truncated_normal([n_hidden_3,n_hidden_4])),
    'out': tf.Variable(tf.truncated_normal([n_hidden_4,n_class]))
}

biases = {
    'b1':tf.Variable(tf.truncated_normal([n_hidden_1])),
    'b2':tf.Variable(tf.truncated_normal([n_hidden_2])),
    'b3':tf.Variable(tf.truncated_normal([n_hidden_3])),
    'b4':tf.Variable(tf.truncated_normal([n_hidden_4])),
    'out':tf.Variable(tf.truncated_normal([n_class]))
}

### The number of epochs is set to 1000. An epoch is one full pass of the algorithm over the entire training set.
### Batch size is set to 100. Batch size is the number of data points that the algorithm processes in every iteration.
### The display step sets the interval, in epochs, at which statistics of the model are displayed.
### x and y are placeholders for the feature and label data. They are defined as placeholders so that we can feed the data to them later.

In [38]:
training_epochs = 1000
display_step = 50
batch_size = 100

x = tf.placeholder("float", [None, n_dim])
y = tf.placeholder("float", [None, n_class])
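
### To make the batch arithmetic concrete, the sketch below splits row indices the way `np.array_split` is used in the training loop further down, using the 238819 training rows reported by `train_x.shape`. The index array is a stand-in for the real data.

```python
import numpy as np

n_rows = 238819   # training rows reported by train_x.shape
batch_size = 100

total_batch = n_rows // batch_size
batches = np.array_split(np.arange(n_rows), total_batch)
sizes = sorted({len(batch) for batch in batches})

# array_split spreads the 19 leftover rows over the first batches,
# so some batches hold 101 rows instead of 100.
print(total_batch, sizes)  # 2388 [100, 101]
```

### With these settings one epoch performs 2388 weight updates.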

### Next we define our network structure. I have used 4 hidden layers with the sigmoid activation function.

In [39]:
def multilayer_perceptron(x, weights, biases):
    
    layer1 = tf.add(tf.matmul(x,weights['h1']),biases['b1'])
    #layer1 = tf.reshape(layer1,[x.shape[0]*n_hidden_1,1])
    layer1 = tf.nn.sigmoid(layer1)
    
    
    layer2 = tf.add(tf.matmul(layer1,weights['h2']),biases['b2'])
    #layer2 = tf.reshape(layer2,[layer1.shape[0]*n_hidden_2,1])
    layer2 = tf.nn.sigmoid(layer2)
    
    layer3 = tf.add(tf.matmul(layer2,weights['h3']),biases['b3'])
    #layer3 = tf.reshape(layer3,[layer2.shape[0]*n_hidden_3,1])
    layer3 = tf.nn.sigmoid(layer3)
    
    layer4 = tf.add(tf.matmul(layer3,weights['h4']),biases['b4'])
    #layer4 = tf.reshape(layer4,[layer3.shape[0]*n_hidden_4,1])
    layer4 = tf.nn.sigmoid(layer4)
    
    out_layer = tf.matmul(layer4,weights['out']) + biases['out']
    #out_layer = tf.reshape(out_layer,[layer4.shape[0]*n_class,1])
    return out_layer                                       
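
### The same forward pass can be sketched in plain NumPy to check the shapes: 10 input features, four hidden layers of 60 units, and 2 output logits. This is an illustration only. It draws weights from an ordinary normal distribution instead of a truncated one, and `forward` and `layer_sizes` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(inputs, weights, biases):
    """Sigmoid hidden layers followed by a linear output layer (raw logits)."""
    activation = inputs
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = sigmoid(activation @ w + b)
    return activation @ weights[-1] + biases[-1]


rng = np.random.default_rng(0)
layer_sizes = [10, 60, 60, 60, 60, 2]
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=n) for n in layer_sizes[1:]]

batch = rng.normal(size=(5, 10))   # 5 rows with 10 features each
logits = forward(batch, weights, biases)
print(logits.shape)                # (5, 2)
```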

### The next cell builds the network graph for the given inputs (x, weights and biases). The resulting tensor holds the values our model predicts when data passes through the network once; nothing is computed until the graph is run in a session.

In [40]:
predictions = multilayer_perceptron(x, weights, biases)

### Next we define the cost of our model. I have used the cross entropy loss function. 

In [41]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=predictions, labels=y))
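
### For reference, the per-row quantity that softmax cross entropy computes can be written in a few lines of NumPy. The function below is a sketch for illustration, not TensorFlow's implementation, and the logits are made up.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Cross entropy between one-hot labels and softmax(logits), per row."""
    # Subtracting the row maximum keeps the exponentials numerically stable.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


logits = np.array([[0.0, 0.0],    # undecided between the two classes
                   [2.0, 0.0]])   # leans towards class 0
labels = np.array([[1.0, 0.0],
                   [1.0, 0.0]])   # both rows truly belong to class 0

losses = softmax_cross_entropy(logits, labels)
print(losses)          # approximately [0.693, 0.127]
print(losses.mean())   # the mean over rows is the quantity reduce_mean returns
```

### An undecided prediction on two classes costs log(2), about 0.693, which is a useful baseline when reading the cost values printed during training.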




### The optimizer defines the function used to optimize the model (weights and biases) based on the cost. I have used the Adam optimizer, a gradient descent variant that adapts the step size for each parameter. I have set the learning rate to 0.0001.

In [42]:
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)
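
### A single Adam update on one scalar parameter can be sketched as follows. The function is a hand-written illustration of the standard update rule, with beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 assumed as hyperparameters; it is not TensorFlow's code.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.0001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# First step (t=1) with a large gradient of 50.
param, m, v = adam_step(param=1.0, grad=50.0, m=0.0, v=0.0, t=1)
print(param)   # about 0.9999
```

### On the first step the parameter moves by roughly the learning rate regardless of how large the gradient is, so a learning rate of 0.0001 means small, steady steps.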

### Next we train our neural network. We start the TensorFlow session, and global_variables_initializer assigns the variables (weights and biases) their initial values around zero. Training then runs for the set number of epochs, after which we have the optimized weights and biases and evaluate the accuracy on the test set.

In [44]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Split the training data into mini-batches once; the split is identical in every epoch.
    total_batch = int(len(train_x) / batch_size)
    x_batches = np.array_split(train_x.values, total_batch)
    y_batches = np.array_split(train_y.values, total_batch)
    for epoch in range(training_epochs):
        print ("epoch: ", epoch)
        avg_cost = 0.0
        for i in range(total_batch):
            batch_x, batch_y = x_batches[i], y_batches[i]
            # Run one optimization step and fetch the cost of this batch.
            _, c = sess.run([optimizer, cost],
                            feed_dict={
                                x: batch_x,
                                y: batch_y
                            })
            avg_cost += c / total_batch
        # Report the average cost every display_step epochs.
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    # Test accuracy: the fraction of rows whose predicted class matches the label.
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: test_x, y: test_y}))

epoch:  0
Epoch: 0001 cost= 0.604484250
epoch:  1
...
epoch:  50
Epoch: 0051 cost= 0.518627671
epoch:  51
...
epoch:  700
Epoch: 0701 cost= 0.485473974
epoch:  701
...

### Conclusion: The project helped in exploring different natural language processing techniques that can be used to extract semantics from a given sentence. The accuracy achieved by the model on the test data is around 74%. To increase the accuracy further we could:

### 1) Add some more features
### 2) Use cross-validation to expose our model to different types of sentence structures
### 3) Increase the depth of the network
