# Transfer Learning MNIST

* Train a simple convnet on the first 5 digits [0..4] of the MNIST dataset.
* Freeze convolutional layers and fine-tune dense layers for the classification of digits [5..9].

## 1. Import necessary libraries for the model

In [0]:
from __future__ import absolute_import, division, print_function
import numpy as np
import pandas as pd
import keras
from keras.datasets import cifar10, mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten, Reshape
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
import pickle
from matplotlib import pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (15, 8)

## 2. Import MNIST data and create 2 datasets, one with digits from 0 to 4 and the other with digits from 5 to 9

In [34]:
%matplotlib inline
# Load/Prep the Data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print('x_train shape Before:', x_train.shape)
# create two datasets one with digits below 5 and one with 5 and above
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5

x_train shape Before: (60000, 28, 28)
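The boolean-mask split used above can be illustrated on a tiny made-up array. This is only a sketch: the toy images and labels below are hypothetical, not MNIST data. It shows how a mask on the labels selects whole samples, and why subtracting 5 maps the labels 5..9 onto 0..4 so that both tasks can share a 5-class output layer.

```python
import numpy as np

# Hypothetical toy data: 6 "images" of shape 2x2 and their digit labels.
x = np.arange(6 * 2 * 2).reshape(6, 2, 2)
y = np.array([0, 7, 4, 5, 9, 2])

# A boolean mask on the labels selects whole samples along the first axis.
lt5_mask = y < 5
x_lt5, y_lt5 = x[lt5_mask], y[lt5_mask]

# Shift labels 5..9 down to 0..4 so they fit a 5-unit softmax layer.
x_gte5, y_gte5 = x[~lt5_mask], y[~lt5_mask] - 5

print(x_lt5.shape, y_lt5)    # (3, 2, 2) [0 4 2]
print(x_gte5.shape, y_gte5)  # (3, 2, 2) [2 0 4]
```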


## 3. Print x_train, y_train, x_test and y_test for both the datasets

In [6]:
print(x_train_lt5,y_train_lt5,x_test_lt5,y_test_lt5)

[[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]] [0 4 1 ... 2 1 3] [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0

In [7]:
print(x_train_gte5,y_train_gte5,x_test_gte5,y_test_gte5)

[[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]] [0 4 0 ... 0 1 3] [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0

## 4. Let us take only the dataset (x_train, y_train, x_test, y_test) for integers 0 to 4 in MNIST
## Reshape x_train and x_test to a 4-dimensional array (channel = 1) to pass it into a Conv2D layer

In [0]:
x_train_lt5 = x_train_lt5.reshape(x_train_lt5.shape[0], 28, 28, 1).astype('float32')
x_test_lt5 = x_test_lt5.reshape(x_test_lt5.shape[0], 28, 28, 1).astype('float32')

## 5. Normalize x_train and x_test by dividing them by 255

In [0]:
x_train_lt5/=255
x_test_lt5/=255
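Steps 4 and 5 can be sketched together on random stand-in data. The `images` array below is hypothetical and only mimics the shape and dtype of MNIST. The sketch adds a trailing channel axis (channels-last layout) and scales the pixel values to the range [0, 1].

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 grayscale images of 28x28 uint8 pixels.
images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis (channels-last layout) and convert to float32.
images_4d = images.reshape(images.shape[0], 28, 28, 1).astype('float32')

# Scale pixel intensities from [0, 255] to [0, 1].
images_4d /= 255

print(images_4d.shape, images_4d.dtype)
print(images_4d.min() >= 0.0, images_4d.max() <= 1.0)
```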

## 6. Use one-hot encoding to convert y_train and y_test into the required number of output classes

In [0]:
y_train_lt5 = np_utils.to_categorical(y_train_lt5, 5)
y_test_lt5 = np_utils.to_categorical(y_test_lt5, 5)
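As a rough illustration of what one-hot encoding produces for integer class labels, here is a NumPy-only sketch. The helper name `one_hot` is hypothetical and is not part of Keras; it only mirrors the idea of turning each label into a row with a single 1.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Labels 0, 4 and 1 become rows with a 1 in columns 0, 4 and 1.
print(one_hot([0, 4, 1], 5))
```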

## 7. Build a sequential model with 2 Convolutional layers with 32 kernels of size (3,3), followed by a Max pooling layer of size (2,2) and a dropout layer, to be trained for classification of digits 0-4

In [38]:
BATCH_SIZE = 32
EPOCHS = 10

# Define model
model = Sequential()

# 1st Conv Layer
model.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))  # 32 filters, 3x3 kernel passed as a tuple
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.5))
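To see what this block does to the tensor shape, the arithmetic can be sketched in plain Python. The sketch assumes a 3x3 convolution with stride 1 and no padding, and a 2x2 pooling window with stride 2. The helper `conv_output_size` is a hypothetical name introduced for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28
after_conv = conv_output_size(side, kernel=3)                  # 3x3 convolution, no padding
after_pool = conv_output_size(after_conv, kernel=2, stride=2)  # 2x2 max pooling
flattened = after_pool * after_pool * 32                       # 32 feature maps

print(after_conv, after_pool, flattened)  # 26 13 5408
```

Dropout does not change the shape, so under these assumptions the dense part of the network receives 5408 features per image once the output is flattened.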

  


## 8. After that, flatten the data and add 2 Dense layers with 128 neurons and neurons = output classes with activation = 'relu' and 'softmax' respectively. Add a dropout layer in between if necessary

In [0]:
# Fully Connected Layer
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))

model.add(Dense(128))
model.add(Activation('relu'))

# Prediction Layer
model.add(Dense(5))
model.add(Activation('softmax'))

# Loss and Optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
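The softmax output and the categorical cross-entropy loss named in the compile call can be sketched in NumPy. This is an illustrative version, not the Keras implementation. The logits and targets below are made-up values and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -sum(y_true * log(y_prob)) over the batch."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Hypothetical logits for two samples and their one-hot targets (5 classes).
logits = np.array([[2.0, 0.5, 0.1, 0.0, -1.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0]])
targets = np.array([[1, 0, 0, 0, 0],
                    [0, 0, 1, 0, 0]], dtype='float32')

probs = softmax(logits)
print(probs.sum(axis=1))                         # each row sums to 1
print(categorical_crossentropy(targets, probs))  # mean loss over the two samples
```

The second sample has equal logits, so every class gets probability 0.2 and its individual loss is ln(5).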

## 9. Print the training and test accuracy

In [40]:
# Store Training Results
early_stopping = keras.callbacks.EarlyStopping(monitor='val_acc', patience=10, verbose=1, mode='auto')
callback_list = [early_stopping]

# Train the model
model.fit(x_train_lt5, y_train_lt5, batch_size=BATCH_SIZE, epochs=EPOCHS,
          validation_data=(x_test_lt5, y_test_lt5), callbacks=callback_list)

  


Train on 30596 samples, validate on 5139 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fd1f01f24a8>
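The effect of the `patience` setting can be sketched with a simplified stopping rule in plain Python. This is an approximation of the idea, not the Keras callback itself. The function name and the validation scores below are hypothetical.

```python
def early_stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: stop once the score has failed to improve on its best
    value for `patience` consecutive epochs (higher score is better).
    """
    best = float('-inf')
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation accuracies for a 10-epoch run.
scores = [0.97, 0.98, 0.985, 0.984, 0.983, 0.985, 0.984, 0.984, 0.983, 0.982]

print(early_stopping_epoch(scores, patience=10))  # None
print(early_stopping_epoch(scores, patience=3))   # 6
```

Under this rule, a patience of 10 cannot trigger within a 10-epoch run, because the first epoch always counts as an improvement. A smaller patience would be needed for early stopping to have any effect here.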

## 10. Make only the dense layers trainable and the convolutional layers non-trainable

In [0]:
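# Freeze the convolutional block (conv, pooling, dropout); keep the dense layers trainable
for layer in model.layers[:3]:
    layer.trainable = False
for layer in model.layers[3:]:
    layer.trainable = True

# Recompile so that the new trainable flags take effect in the next fit() call
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])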

In [42]:
#Module to print colourful statements
from termcolor import colored

#Check which layers have been frozen 
for layer in model.layers:
  print (colored(layer.name, 'blue'))
  print (colored(layer.trainable, 'red'))

conv2d_4
False
max_pooling2d_4
False
dropout_4
False
flatten_5
True
dense_9
True
activation_9
True
dense_10
True
activation_10
True
dense_11
True
activation_11
True
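To get a feel for how much of the network stays trainable, the parameter counts can be worked out by hand. The sketch below assumes a 3x3 convolution without padding followed by 2x2 pooling, which turns a 28x28x1 input into 13x13x32 features. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

# Frozen part: the single convolutional layer (pooling and dropout have no weights).
frozen = conv2d_params(3, 3, 1, 32)

# Trainable part: Dense(128), Dense(128) and Dense(5) on 13 * 13 * 32 flattened inputs.
trainable = dense_params(13 * 13 * 32, 128) + dense_params(128, 128) + dense_params(128, 5)

print(frozen, trainable)  # 320 709509
```

Under these assumptions only 320 parameters are frozen, while roughly 710 thousand remain trainable in the dense layers.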


## 11. Use the model trained on 0 to 4 digit classification and train it on the dataset which has digits 5 to 9 (using transfer learning, keeping only the dense layers trainable)

In [43]:
x_train_gte5 = x_train_gte5.reshape(x_train_gte5.shape[0], 28, 28, 1).astype('float32')
x_train_gte5/=255
x_test_gte5 = x_test_gte5.reshape(x_test_gte5.shape[0], 28, 28, 1).astype('float32')
x_test_gte5/=255
y_train_gte5 = np_utils.to_categorical(y_train_gte5, 5)
y_test_gte5 = np_utils.to_categorical(y_test_gte5, 5)
model.fit(x_train_gte5, y_train_gte5, batch_size=BATCH_SIZE, epochs=EPOCHS,
          validation_data=(x_test_gte5, y_test_gte5), callbacks=callback_list)

Train on 29404 samples, validate on 4861 samples
Epoch 1/10
  992/29404 [>.............................] - ETA: 4s - loss: 1.2742 - acc: 0.7288

  
  'Discrepancy between trainable weights and collected trainable'


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fd1aa3e0550>

## 12. Print the accuracy for classification of digits 5 to 9

In [44]:
# Evaluate the model on the train set
score = model.evaluate(x_train_gte5, y_train_gte5)
print('Train loss:', score[0])
print('Train accuracy:', score[1])

Train loss: 0.0022912165159602784
Train accuracy: 0.9993198204325942


In [45]:
#Testing the model on test set
score = model.evaluate(x_test_gte5, y_test_gte5)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.03820245536642542
Test accuracy: 0.991154083521909


## Sentiment analysis

The objective of the second problem is to perform sentiment analysis on tweets that users posted about various mobile devices.
Based on the text of a tweet, we will classify whether the user's sentiment toward a particular mobile device is positive or not.

### 13. Read the dataset (tweets.csv) and drop the NA's while reading the dataset

In [46]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
df=pd.read_csv(r'/content/drive/My Drive/tweets.csv',engine='python')
df.dropna(inplace=True)

### 14. Preprocess the text and add the preprocessed text in a column with name `text` in the dataframe.

In [0]:
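def preprocess(text):
    # Decode byte strings to ASCII; str values (the usual case in Python 3) pass through unchanged
    if isinstance(text, bytes):
        try:
            return text.decode('ascii')
        except UnicodeDecodeError:
            return text
    return text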

In [86]:
df['text'] = [preprocess(text) for text in df.tweet_text]
df.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product | text |
|---|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion | .@wesley83 I have a 3G iPhone. After 3 hrs twe... |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion | @jessedee Know about @fludapp ? Awesome iPad/i... |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion | @swonderlin Can not wait for #iPad 2 also. The... |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion | @sxsw I hope this year's festival isn't as cra... |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion | @sxtxstate great stuff on Fri #SXSW: Marissa M... |


### 15. Consider only rows having Positive emotion and Negative emotion and remove other rows from the dataframe.

In [87]:
df['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

Positive emotion                      2672
Negative emotion                       519
No emotion toward brand or product      91
I can't tell                             9
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

In [0]:
df=df[(df['is_there_an_emotion_directed_at_a_brand_or_product'] != "I can't tell")]
df=df[(df['is_there_an_emotion_directed_at_a_brand_or_product'] != "No emotion toward brand or product")]

In [101]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3191 entries, 0 to 9088
Data columns (total 4 columns):
tweet_text                                            3191 non-null object
emotion_in_tweet_is_directed_at                       3191 non-null object
is_there_an_emotion_directed_at_a_brand_or_product    3191 non-null object
text                                                  3191 non-null object
dtypes: object(4)
memory usage: 124.6+ KB


### 16. Represent text as numerical data using `CountVectorizer` and get the document term frequency matrix

#### Use `vect` as the variable name for initialising CountVectorizer.

In [0]:
# import and instantiate CountVectorizer (with the default parameters)
from sklearn.feature_extraction.text import CountVectorizer

In [0]:
vect = CountVectorizer(ngram_range=(1, 1))

In [102]:
vect.fit(df['text'])

CountVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(1, 1), preprocessor=None, stop_words=None,
                strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, vocabulary=None)
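What fitting and transforming amount to can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation. It assumes lowercasing and the default token pattern shown in the output above. The function names and the two-document corpus are hypothetical.

```python
import re
from collections import Counter

# Same token pattern as in the repr above: words of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (alphabetical order)."""
    tokens = {tok for doc in docs for tok in TOKEN_PATTERN.findall(doc.lower())}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}

def document_term_matrix(docs, vocabulary):
    """One row per document, one column per vocabulary term, cells hold counts."""
    matrix = []
    for doc in docs:
        counts = Counter(TOKEN_PATTERN.findall(doc.lower()))
        matrix.append([counts.get(tok, 0) for tok in vocabulary])
    return matrix

# Hypothetical mini-corpus, not taken from tweets.csv.
docs = ["I love my iPhone", "my iPad battery died, my bad"]
vocab = build_vocabulary(docs)
print(vocab)
print(document_term_matrix(docs, vocab))
```

The single-letter token "I" is dropped because the pattern requires at least two word characters, and "my" is counted twice in the second document.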

### 17. Find number of different words in vocabulary

In [103]:
# examine the fitted vocabulary
len(vect.vocabulary_)  # number of distinct terms learned by fit()

5613

#### Tip: To see all available functions for an object, use `dir`

### 18. Find out how many Positive and Negative emotions are there.

Hint: Use value_counts on that column

In [104]:
df['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

Positive emotion    2672
Negative emotion     519
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

### 19. Change the labels for Positive and Negative emotions to 1 and 0 respectively and store them in a new column of the same dataframe (the code below names it `target`)

Hint: use map on that column and give labels

In [105]:
df["target"]=df["is_there_an_emotion_directed_at_a_brand_or_product"].map(lambda x: 1 if x=='Positive emotion' else 0)
df["target"].value_counts()

1    2672
0     519
Name: target, dtype: int64

### 20. Define the feature set (independent variable or X) to be the `text` column and `labels` as the target (or dependent variable), and divide into train and test datasets

In [0]:
df_dtm=vect.transform(df['text'])

In [114]:
# examine the vocabulary and document-term matrix together
# (get_feature_names_out requires scikit-learn >= 1.0; older releases used get_feature_names)
df_dtm_1=pd.DataFrame(df_dtm.toarray(), columns=vect.get_feature_names_out())
df_dtm_1.head()

Unnamed: 0,000,02,03,08,10,100,100s,100tc,101,106,10am,10k,10mins,10pm,10x,11,11ntc,11th,12,12b,12th,13,130,14,1406,1413,1415,15,150,1500,150m,157,15am,15k,16162,16gb,16mins,17,188,1986,...,yield,yikes,yo,yobongo,yonkers,york,you,youneedthis,your,yours,yourself,youtube,yowza,yr,yrs,yummy,yup,zaarly,zaarlyiscoming,zagg,zaggle,zappos,zazzle,zazzlesxsw,zazzlsxsw,ze,zelda,zeldman,zero,zimride,zing,zip,zite,zms,zombies,zomg,zone,zoom,zzzs,ύ_
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [115]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_dtm_1,df['target'], test_size=0.2)
print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)

(2552, 5613) (2552,)
(639, 5613) (639,)
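The split sizes printed above can be reproduced with a small shuffle-and-slice sketch. This is an illustration of the idea rather than scikit-learn's code. The function name `shuffle_split` and the fixed seed are assumptions made for repeatability.

```python
import math
import numpy as np

def shuffle_split(n_samples, test_size, seed=0):
    """Return (train_idx, test_idx) from a random permutation of row indices."""
    n_test = math.ceil(test_size * n_samples)  # test share rounded up
    rng = np.random.default_rng(seed)          # seed is an assumption, for repeatability
    permuted = rng.permutation(n_samples)
    return permuted[n_test:], permuted[:n_test]

train_idx, test_idx = shuffle_split(3191, test_size=0.2)
print(len(train_idx), len(test_idx))  # 2552 639
```

Because the rows are shuffled before slicing, a different seed gives a different split, which is one reason accuracy figures vary from run to run when no seed is fixed.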


## 21. **Predicting the sentiment:**


### Use Naive Bayes and Logistic Regression to predict the sentiment of the given text and report their accuracy scores

In [0]:
# import and instantiate a Multinomial Naive Bayes model
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

In [117]:
# train the model using X_train (the document-term matrix of the training rows)
nb.fit(X_train, y_train)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
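For intuition about what a multinomial Naive Bayes fit computes from a count matrix, here is a compact NumPy sketch with additive smoothing (`alpha=1.0`, matching the value in the output above). It is a teaching version, not scikit-learn's implementation. The function names and the toy counts are hypothetical.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Return (classes, log_prior, log_likelihood) with additive smoothing alpha."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Total count of each term within each class, smoothed by alpha.
    term_counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(term_counts / term_counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_multinomial_nb(X, classes, log_prior, log_likelihood):
    """Pick the class with the highest log posterior for each row of X."""
    scores = X @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Hypothetical count matrix: columns could stand for ("great", "love", "crash").
X = np.array([[2, 1, 0], [1, 2, 0], [0, 0, 3], [0, 1, 2]])
y = np.array([1, 1, 0, 0])

model = fit_multinomial_nb(X, y)
print(predict_multinomial_nb(np.array([[1, 1, 0], [0, 0, 2]]), *model))  # [1 0]
```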

In [0]:
# make class predictions for X_test
y_pred_class = nb.predict(X_test)

In [120]:
# calculate accuracy of class predictions
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred_class)

0.8528951486697965

In [121]:
# print the confusion matrix
metrics.confusion_matrix(y_test, y_pred_class)

array([[ 33,  62],
       [ 32, 512]])

In [0]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train, y_train)

In [123]:
y_pred_class = clf.predict(X_test)
metrics.accuracy_score(y_test, y_pred_class)

0.8826291079812206

In [124]:
# print the confusion matrix
metrics.confusion_matrix(y_test, y_pred_class)

array([[ 36,  59],
       [ 16, 528]])
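Accuracy alone can hide how each class is handled, so it may help to derive a few more numbers from the two confusion matrices above. The sketch assumes rows are true labels and columns are predicted labels, in the order [0, 1]. The helper name `summarize` is hypothetical.

```python
import numpy as np

def summarize(confusion):
    """Accuracy, per-class recall, and majority-class baseline from a 2x2 matrix.

    Rows are true labels and columns are predicted labels, in the order [0, 1].
    """
    confusion = np.asarray(confusion)
    total = confusion.sum()
    accuracy = np.trace(confusion) / total
    recall = np.diag(confusion) / confusion.sum(axis=1)
    baseline = confusion.sum(axis=1).max() / total
    return accuracy, recall, baseline

for name, matrix in [("Naive Bayes", [[33, 62], [32, 512]]),
                     ("Logistic Regression", [[36, 59], [16, 528]])]:
    accuracy, recall, baseline = summarize(matrix)
    print(name, round(accuracy, 4), recall.round(3), round(baseline, 4))
```

With 544 of the 639 test tweets being positive, always predicting the positive class would already score about 0.851. Both models find well under half of the 95 negative tweets (33 and 36 respectively), which the accuracy figures alone do not reveal.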

## 22. Create a function called `tokenize_predict` (named `tokenize_test` in the code below) which takes a count vectorizer object as input and prints the accuracy for x (text) and y (labels)

In [0]:
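def tokenize_test(vect):
    """Fit `vect` on the training text, train Multinomial Naive Bayes, and print test accuracy.

    Uses the global x_train, x_test, y_train and y_test, which must hold the
    text split before the function is called.
    """
    # Learn the vocabulary from the training text only and build its document-term matrix
    x_train_dtm = vect.fit_transform(x_train)
    print('Features: ', x_train_dtm.shape[1])
    # Reuse the fitted vocabulary for the test text
    x_test_dtm = vect.transform(x_test)
    nb = MultinomialNB()
    nb.fit(x_train_dtm, y_train)
    y_pred_class = nb.predict(x_test_dtm)
    print('Accuracy: ', metrics.accuracy_score(y_test, y_pred_class))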

### Create a count vectorizer with `ngram_range=(1, 2)` and pass it to the tokenize_predict function to print the accuracy score

In [127]:
x_train, x_test, y_train, y_test = train_test_split(df['text'],df['target'], test_size=0.2)
vect = CountVectorizer(ngram_range=(1, 2))
tokenize_test(vect)

Features:  25801
Accuracy:  0.8607198748043818
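The jump in feature count comes from adding word pairs to the single words. A plain-Python sketch of the idea follows. It assumes lowercasing and a token pattern of two or more word characters. The function name `word_ngrams` and the sample sentence are hypothetical.

```python
import re

def word_ngrams(text, ngram_range=(1, 2)):
    """Return all word n-grams with n between ngram_range[0] and ngram_range[1]."""
    tokens = re.findall(r"(?u)\b\w\w+\b", text.lower())
    low, high = ngram_range
    grams = []
    for n in range(low, high + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams

# Five unigrams followed by four bigrams.
print(word_ngrams("Can not wait for iPad"))
```

Bigrams such as "can not" keep some word-order information that single words lose, at the cost of a much larger and sparser vocabulary.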


In [129]:
vect = CountVectorizer(ngram_range=(1, 2),stop_words='english')
tokenize_test(vect)

Features:  19857
Accuracy:  0.863849765258216


### Create a count vectorizer with `stop_words='english'` and `max_features=300` and pass it to the tokenize_predict function to print the accuracy score

In [130]:
vect = CountVectorizer(ngram_range=(1, 2),stop_words='english',max_features =300)
tokenize_test(vect)

Features:  300
Accuracy:  0.7715179968701096


### Create a count vectorizer with `ngram_range=(1, 2)` and `max_features=15000` and pass it to the tokenize_predict function to print the accuracy score

In [131]:
vect = CountVectorizer(ngram_range=(1, 2),stop_words='english',max_features =15000)
tokenize_test(vect)

Features:  15000
Accuracy:  0.8575899843505478


### Create a count vectorizer with `ngram_range=(1, 2)` that includes only terms appearing in at least 2 documents (`min_df=2`) and pass it to the tokenize_predict function to print the accuracy score

In [132]:
vect = CountVectorizer(ngram_range=(1, 2),stop_words='english',max_features =15000,min_df=2)
tokenize_test(vect)

Features:  5628
Accuracy:  0.8450704225352113
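The drop in feature count comes from discarding terms that occur in fewer than `min_df` documents. A minimal sketch of that filter in plain Python follows, assuming an integer threshold. The function name and the tokenized mini-corpus are hypothetical.

```python
from collections import Counter

def filter_by_min_df(tokenized_docs, min_df):
    """Keep terms that occur in at least `min_df` documents (integer threshold)."""
    document_frequency = Counter()
    for tokens in tokenized_docs:
        document_frequency.update(set(tokens))  # count each term once per document
    return sorted(term for term, df in document_frequency.items() if df >= min_df)

# Hypothetical tokenized mini-corpus.
docs = [["great", "ipad", "great"], ["ipad", "crash"], ["great", "app"]]
print(filter_by_min_df(docs, min_df=2))  # ['great', 'ipad']
```

Note that "great" appears twice in the first document but is counted once there. The threshold is about how many documents contain a term, not how many times it occurs overall.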
