In this notebook, we will see how to do sentiment analysis of text data using deep neural networks.

### Read the dataset (tweets.csv)

In [0]:
import pandas as pd

tw = pd.read_csv('gdrive/My Drive/AIML/Project/CV/External/tweets.csv', engine='python')

In [9]:
tw.shape

(9093, 3)

### Change the labels for Positive and Negative emotions to 1 and 0 respectively

Hint: use `map` on that column to assign the labels, or use `LabelEncoder`.

In [10]:
tw.is_there_an_emotion_directed_at_a_brand_or_product.value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
I can't tell                           156
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

In [11]:
# Keep only the tweets labelled as positive or negative
df = tw[tw['is_there_an_emotion_directed_at_a_brand_or_product'].isin(['Positive emotion', 'Negative emotion'])]
df.head(10)

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |
| 7 | #SXSW is just starting, #CTIA is around the co... | Android | Positive emotion |
| 8 | Beautifully smart and simple idea RT @madebyma... | iPad or iPhone App | Positive emotion |
| 9 | Counting down the days to #sxsw plus strong Ca... | Apple | Positive emotion |
| 10 | Excited to meet the @samsungmobileus at #sxsw ... | Android | Positive emotion |
| 11 | Find &amp; Start Impromptu Parties at #SXSW Wi... | Android App | Positive emotion |


In [12]:
label_col = 'is_there_an_emotion_directed_at_a_brand_or_product'
# Work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
df = df.copy()
df[label_col] = df[label_col].map({'Positive emotion': 1, 'Negative emotion': 0})
df[label_col].value_counts()

1    2978
0     570
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64
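
The hint above also mentions `LabelEncoder`. As a minimal sketch on a small made-up Series (the three sample labels are illustrative, not rows from `tweets.csv`), both approaches give the same encoding here, because `LabelEncoder` assigns integers in sorted order of the class names and "Negative emotion" sorts before "Positive emotion":

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the real column comes from tweets.csv
labels = pd.Series(['Positive emotion', 'Negative emotion', 'Positive emotion'])

# Option 1: explicit mapping, which makes the label assignment visible
mapped = labels.map({'Positive emotion': 1, 'Negative emotion': 0})

# Option 2: LabelEncoder, which numbers the classes in sorted order
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(list(mapped))             # integers from the explicit mapping
print(list(encoder.classes_))   # class names in the order they were numbered
print(list(encoded))            # integers from LabelEncoder
```

The explicit `map` is arguably safer when the 0/1 meaning matters, since it does not depend on how the class names happen to sort.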


### Convert Text Into numbers

In [0]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

#### Build a Keras Tokenizer with `num_words=3000` and fit it on the text using `fit_on_texts`

In [0]:
t = Tokenizer(num_words=3000)
t.fit_on_texts(df['tweet_text'])


#### Convert Text Into numbers using `texts_to_matrix` with `TF-IDF` mode

In [15]:
dtm = t.texts_to_matrix(df['tweet_text'], mode='tfidf')
df_with_dtm = pd.DataFrame(dtm)
print(df_with_dtm.shape)
print(df['is_there_an_emotion_directed_at_a_brand_or_product'].shape)

(3548, 3000)
(3548,)
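
To make the TF-IDF step more concrete, here is a small pure-Python sketch of one common weighting: log-scaled term frequency multiplied by a smoothed inverse document frequency. The three toy documents are made up, and the exact formula used inside `texts_to_matrix` may differ between Keras versions, so treat this as an illustration of the idea rather than a reimplementation.

```python
import math
from collections import Counter

# Toy, pre-tokenized documents (illustrative only)
docs = [
    ["great", "ipad", "app"],
    ["ipad", "battery", "dead"],
    ["great", "great", "launch"],
]

vocab = sorted({word for doc in docs for word in doc})
# Number of documents each word appears in
doc_freq = Counter(word for doc in docs for word in set(doc))

def tfidf_row(doc):
    """Return one row of the document-term matrix for a tokenized document."""
    counts = Counter(doc)
    row = []
    for word in vocab:
        count = counts.get(word, 0)
        if count == 0:
            row.append(0.0)
            continue
        tf = 1 + math.log(count)                             # log-scaled term frequency
        idf = math.log(1 + len(docs) / (1 + doc_freq[word]))  # smoothed inverse document frequency
        row.append(tf * idf)
    return row

matrix = [tfidf_row(doc) for doc in docs]
print(len(matrix), len(matrix[0]))  # one row per document, one column per vocabulary word
```

Each row has one column per vocabulary word, which is why the real matrix above has 3000 columns: words absent from a tweet get 0, and repeated or rare words get larger weights.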


### Build the Graph

#### Normalize the data using a BatchNormalization layer, add fully connected layers with `200, 100, 60, 30, 1` neurons, with `relu` activations for the hidden layers and `sigmoid` activation for the output layer. Use `binary_crossentropy` loss and the `adam` optimizer to train the model, and report the final validation accuracy.

In [0]:
model = tf.keras.models.Sequential()

# First hidden layer on the 3000-dimensional TF-IDF input,
# followed by batch normalization of its activations
model.add(tf.keras.layers.Dense(200, activation='relu', input_shape=(3000,)))
model.add(tf.keras.layers.BatchNormalization())
# Remaining hidden layers
model.add(tf.keras.layers.Dense(100, activation='relu'))
model.add(tf.keras.layers.Dense(60, activation='relu'))
model.add(tf.keras.layers.Dense(30, activation='relu'))
# Single sigmoid unit for the binary positive/negative label
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

adam_optimizer = tf.keras.optimizers.Adam()

# Compile the model
model.compile(optimizer=adam_optimizer, loss='binary_crossentropy', metrics=['accuracy'])
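
As a quick sanity check on the size of this network, the parameter count can be worked out by hand. The sketch below assumes the standard counts for these layers (weights plus biases for each `Dense` layer, and four vectors per feature for `BatchNormalization`, two of which are trainable); the totals should agree with what `model.summary()` reports.

```python
# Layer widths, starting from the 3000 TF-IDF input features
layer_sizes = [3000, 200, 100, 60, 30, 1]

# Each Dense layer has n_in * n_out weights plus n_out biases
dense_params = [n_in * n_out + n_out
                for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# BatchNormalization over 200 features: gamma, beta, moving mean, moving variance
bn_params = 4 * 200
bn_non_trainable = 2 * 200  # the moving statistics are not trained by gradient descent

total_params = sum(dense_params) + bn_params
trainable_params = total_params - bn_non_trainable

print(dense_params)
print(total_params, trainable_params)
```

Almost all of the weights sit in the first layer, and the total is far larger than the few thousand tweets available for training, so some overfitting would not be surprising.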

In [0]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df_with_dtm, df['is_there_an_emotion_directed_at_a_brand_or_product'], test_size=0.3, random_state=5)
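
Because positive tweets outnumber negative ones by roughly five to one (2978 vs 570), a plain random split can leave the training and validation sets with slightly different class ratios. One possible variation, which is not what the cell above does, is to pass `stratify` to `train_test_split`. The sketch below uses made-up toy labels with a similar imbalance and placeholder features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels with roughly the same imbalance as the tweets (84% positive)
y = np.array([1] * 84 + [0] * 16)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, illustrative only

# stratify=y asks for the class proportions to be preserved in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=5, stratify=y
)

print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())  # share of positive labels in each split
```

Switching the real split to a stratified one would change which rows land in each set, so the accuracy reported below would not be reproduced exactly.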

In [18]:
model.fit(X_train.values, y_train.values, validation_data=(X_test.values, y_test.values), epochs=30, batch_size=32)

Train on 2483 samples, validate on 1065 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7f558e42ac50>

In [19]:
loss_and_metrics_test = model.evaluate(X_test, y_test)
print(loss_and_metrics_test)

[0.7559794236796563, 0.8619718311538159]


In [20]:
print('Validation Accuracy: ', loss_and_metrics_test[1])

Validation Accuracy:  0.8619718311538159
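
To put this number in context, it helps to compare it with the accuracy of always predicting the majority class. The sketch below uses the class counts printed earlier for the whole filtered dataset; the ratio inside the 1065-row validation split may differ slightly, so this is an approximate baseline.

```python
# Class counts from the value_counts() output above
positive, negative = 2978, 570

# Accuracy of a classifier that always predicts "positive"
majority_baseline = positive / (positive + negative)

reported_accuracy = 0.8619718311538159  # validation accuracy printed above
margin = reported_accuracy - majority_baseline

print(round(majority_baseline, 4))
print(round(margin, 4))
```

The model is only a couple of percentage points above this baseline, so with classes this imbalanced, accuracy alone says little about how well negative tweets are detected.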
