# Transfer Learning MNIST

* Train a simple convnet on the MNIST dataset the first 5 digits [0..4].
* Freeze convolutional layers and fine-tune dense layers for the classification of digits [5..9].

### Import MNIST data and create 2 datasets with one dataset having digits from 0 to 4 and other from 5 to 9 

#### Import the mnist dataset from keras datasets

In [17]:
#Importing important modules
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.callbacks import ModelCheckpoint, EarlyStopping
#Installing Tensorboard for Colab
!pip install tensorboardcolab



In [0]:
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

#### Creating two datasets one with digits below 5 and one with 5 and above

In [0]:
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]



In [119]:
set(y_train_lt5)

{0, 1, 2, 3, 4}

In [0]:
x_train_gt5 = x_train[y_train >= 5]
y_train_gt5 = y_train[y_train >= 5] - 5  # make classes start at 0 for
x_test_gt5 = x_test[y_test >= 5]         # np_utils.to_categorical
y_test_gt5 = y_test[y_test >= 5] - 5

In [57]:
set(y_train_gt5)

{0, 1, 2, 3, 4}

In [58]:
set(y_train)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

### Check 

Verify shapes of x_train, y_train, x_test and y_test for both the datasets with the below given shapes.

In [121]:
x_test.shape

(10000, 28, 28)

In [122]:
print(x_train_lt5.shape)
print(y_train_lt5.shape)
print(x_test_lt5.shape)
print(y_test_lt5.shape)

(30596, 28, 28)
(30596,)
(5139, 28, 28)
(5139,)


In [123]:
print(x_train_gt5.shape)
print(y_train_gt5.shape)
print(x_test_gt5.shape)
print(y_test_gt5.shape)

(29404, 28, 28)
(29404,)
(4861, 28, 28)
(4861,)


### Let us take only the dataset (x_train, y_train, x_test, y_test) for Integers 0 to 4 in MNIST
### Reshape x_train and x_test to a 4 Dimensional array (channel = 1) to pass it into a Conv2D layer

In [0]:
x_train_lt5 = x_train_lt5.reshape(x_train_lt5.shape[0],28,28,1)

In [9]:
x_train_lt5[0].shape

(28, 28, 1)

In [0]:
x_test_lt5 = x_test_lt5.reshape(x_test_lt5.shape[0],28,28,1)

In [11]:
x_test_lt5[0].shape

(28, 28, 1)

### Change into float32 datatype and Normalize x_train and x_test by dividing it by 255.0

In [0]:
x_train_lt5 = x_train_lt5.astype('float32')
x_test_lt5 = x_test_lt5.astype('float32')

#Normalizing the input
x_train_lt5 /= 255.0
x_test_lt5 /= 255.0


### Check

Verify the shapes of the X_train and X_test with the shapes given below.

In [127]:
print('x_train shape:', x_train_lt5.shape)
print(x_train_lt5.shape[0], 'train samples')
print(x_test_lt5.shape[0], 'test samples')

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples


In [128]:
print('X_train shape:', x_train_lt5.shape)
print('X_test shape:', x_test_lt5.shape)

X_train shape: (30596, 28, 28, 1)
X_test shape: (5139, 28, 28, 1)


In [0]:
batch_size = 128
num_classes = 5
epochs = 5

### Use One-hot encoding to divide y_train and y_test into required no of output classes

In [0]:
# convert class vectors to binary class matrices
y_train_lt5 = keras.utils.to_categorical(y_train_lt5, num_classes)
y_test_lt5 = keras.utils.to_categorical(y_test_lt5, num_classes)

In [21]:
set(y_train_lt5)

{0, 1, 2, 3, 4}

In [0]:
# input image dimensions
img_rows, img_cols = 28, 28

#Keras expects data to be in the format (N_E.N_H,N_W,N_C)
#N_E = Number of Examples, N_H = height, N_W = Width, N_C = Number of Channels.
x_train_lt5 = x_train_lt5.reshape(x_train_lt5.shape[0], img_rows, img_cols, 1)
x_test_lt5 = x_test_lt5.reshape(x_test_lt5.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

### Build a sequential model with 2 Convolutional layers with 32 kernels of size (3,3) followed by a Max pooling layer of size (2,2) followed by a drop out layer to be trained for classification of digits 0-4  

In [0]:
#Initialize the model
model = Sequential()

#Add a Convolutional Layer with 32 filters of size 3X3 and activation function as 'ReLU' 
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))

#Add a Convolutional Layer with 64 filters of size 3X3 and activation function as 'ReLU' 
model.add(Conv2D(64, (3, 3), activation='relu'))

#Add a MaxPooling Layer of size 2X2 
model.add(MaxPooling2D(pool_size=(2, 2)))

#Apply Dropout with 0.25 probability 
model.add(Dropout(0.25))

### Post that flatten the data and add 2 Dense layers with 128 neurons and neurons = output classes with activation = 'relu' and 'softmax' respectively. Add dropout layer inbetween if necessary  

In [0]:
#Flatten the layer
model.add(Flatten())

#Add Fully Connected Layer with 128 units and activation function as 'ReLU'
model.add(Dense(128, activation='relu'))
#Apply Dropout with 0.5 probability 
model.add(Dropout(0.5))

#Add Fully Connected Layer with 10 units and activation function as 'softmax'
model.add(Dense(num_classes, activation='softmax'))

### Print the training and test accuracy for 5 epochs

In [0]:
from keras.optimizers import Adam
from keras.losses import categorical_crossentropy

#To use adam optimizer for learning weights with learning rate = 0.001
optimizer = Adam(lr=0.001)
#Set the loss function and optimizer for the model training
model.compile(loss=categorical_crossentropy,
              optimizer=optimizer,
              metrics=['accuracy'])

In [135]:
#Import tensorboardcolab modules for creating a tensorboard call back which will passed in model.fit function.
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback

#Tensorboard callback is going to be added to model.fit function to draw graphs of loss values after every epoch
tbc = TensorBoardColab()

Wait for 8 seconds...
TensorBoard link:
http://c5246f93.ngrok.io


In [0]:
#Adding Early stopping callback to the fit function is going to stop the training,
#if the val_loss is not going to change even '0.001' for more than 10 continous epochs
import os
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=10)

#Adding Model Checkpoint callback to the fit function is going to save the weights whenever val_loss achieves a new low value. 
#Hence saving the best weights occurred during training

model_checkpoint =  ModelCheckpoint('mnist_cnn_checkpoint_{epoch:02d}_loss{val_loss:.4f}.h5',
                                                           monitor='val_loss',
                                                           verbose=1,
                                                           save_best_only=True,
                                                           save_weights_only=True,
                                                           mode='auto',
                                                           period=1)

checkpoint_path = "mnist.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
cp_callback = ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

In [141]:
#Training on the dataset and adding the all the callbacks to the fit function.
#Once the training starts, results start appearing on Tensorboard after 1 epoch
model.fit(x_train_lt5, y_train_lt5,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test_lt5, y_test_lt5),
          callbacks=[TensorBoardColabCallback(tbc),early_stopping,model_checkpoint,cp_callback])

Train on 30596 samples, validate on 5139 samples
Epoch 1/5

Epoch 00001: val_loss improved from inf to 0.00947, saving model to mnist_cnn_checkpoint_01_loss0.0095.h5

Epoch 00001: saving model to mnist.ckpt
Epoch 2/5

Epoch 00002: val_loss improved from 0.00947 to 0.00919, saving model to mnist_cnn_checkpoint_02_loss0.0092.h5

Epoch 00002: saving model to mnist.ckpt
Epoch 3/5

Epoch 00003: val_loss improved from 0.00919 to 0.00627, saving model to mnist_cnn_checkpoint_03_loss0.0063.h5

Epoch 00003: saving model to mnist.ckpt
Epoch 4/5

Epoch 00004: val_loss did not improve from 0.00627

Epoch 00004: saving model to mnist.ckpt
Epoch 5/5

Epoch 00005: val_loss improved from 0.00627 to 0.00475, saving model to mnist_cnn_checkpoint_05_loss0.0047.h5

Epoch 00005: saving model to mnist.ckpt


<keras.callbacks.History at 0x7f6feb1611d0>

In [149]:
#Testing the model on test set
score = model.evaluate(x_test_lt5, y_test_lt5)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.003304791110768503
Test accuracy: 0.9990270480638257


In [143]:
!ls {checkpoint_dir}

Graph				       mnist_cnn_checkpoint_03_loss0.0076.h5
mnist.ckpt			       mnist_cnn_checkpoint_03_loss0.0101.h5
mnist_cnn_checkpoint_01_loss0.0095.h5  mnist_cnn_checkpoint_04_loss0.0053.h5
mnist_cnn_checkpoint_01_loss0.0147.h5  mnist_cnn_checkpoint_04_loss0.0059.h5
mnist_cnn_checkpoint_01_loss0.0155.h5  mnist_cnn_checkpoint_05_loss0.0044.h5
mnist_cnn_checkpoint_01_loss0.0199.h5  mnist_cnn_checkpoint_05_loss0.0047.h5
mnist_cnn_checkpoint_02_loss0.0085.h5  mnist_cnn_checkpoint_08_loss0.0054.h5
mnist_cnn_checkpoint_02_loss0.0092.h5  mnist_cnn_checkpoint_10_loss0.0033.h5
mnist_cnn_checkpoint_02_loss0.0116.h5  sample_data
mnist_cnn_checkpoint_03_loss0.0063.h5


### Use the model trained on 0 to 4 digit classification and train it on the dataset which has digits 5 to 9  (Using Transfer learning keeping only the dense layers to be trainable)

### Make only the dense layers to be trainable and convolutional layers to be non-trainable

#### Check model summary to see model layer names

In [150]:
for layers in model.layers:
    print(layers.name)
    if('dense' not in layers.name):
        layers.trainable = False
        print(layers.name + 'is not trainable\n')
    if('dense' in layers.name):
        print(layers.name + ' is trainable\n')

conv2d_5
conv2d_5is not trainable

conv2d_6
conv2d_6is not trainable

max_pooling2d_3
max_pooling2d_3is not trainable

dropout_5
dropout_5is not trainable

flatten_3
flatten_3is not trainable

dense_5
dense_5 is trainable

dropout_6
dropout_6is not trainable

dense_6
dense_6 is trainable



### Do the required preprocessing for `x_train_gt5` also same as `x_train_lt5` and for `y_train_gt5` same as `y_train_lt5`

1. Reshape
2. Change to float32 datatype
3. Normalize (dividing with 255)
4. y_train and y_test Convert into one-hot vectors

Reshape

In [0]:
x_train_gt5 = x_train_gt5.reshape(x_train_gt5.shape[0],28,28,1)

In [152]:
x_train_gt5.shape

(29404, 28, 28, 1)

In [80]:
set(y_train_gt5)

{0, 1, 2, 3, 4}

In [0]:
x_test_gt5 = x_test_gt5.reshape(x_test_gt5.shape[0],28,28,1)

In [0]:
Change to Float and Normalize

In [0]:
x_train_gt5 = x_train_gt5.astype('float32')
x_test_gt5 = x_test_gt5.astype('float32')

#Normalizing the input
x_train_gt5 /= 255.0
x_test_gt5 /= 255.0

OneHot

In [0]:
# convert class vectors to binary class matrices
y_train_gt5 = keras.utils.to_categorical(y_train_gt5, num_classes)
y_test_gt5 = keras.utils.to_categorical(y_test_gt5, num_classes)

### Check

Verify the shapes with the given below.

In [144]:
!ls {checkpoint_dir}

Graph				       mnist_cnn_checkpoint_03_loss0.0076.h5
mnist.ckpt			       mnist_cnn_checkpoint_03_loss0.0101.h5
mnist_cnn_checkpoint_01_loss0.0095.h5  mnist_cnn_checkpoint_04_loss0.0053.h5
mnist_cnn_checkpoint_01_loss0.0147.h5  mnist_cnn_checkpoint_04_loss0.0059.h5
mnist_cnn_checkpoint_01_loss0.0155.h5  mnist_cnn_checkpoint_05_loss0.0044.h5
mnist_cnn_checkpoint_01_loss0.0199.h5  mnist_cnn_checkpoint_05_loss0.0047.h5
mnist_cnn_checkpoint_02_loss0.0085.h5  mnist_cnn_checkpoint_08_loss0.0054.h5
mnist_cnn_checkpoint_02_loss0.0092.h5  mnist_cnn_checkpoint_10_loss0.0033.h5
mnist_cnn_checkpoint_02_loss0.0116.h5  sample_data
mnist_cnn_checkpoint_03_loss0.0063.h5


In [0]:
#The pre-trained weights must exist in a folder called "data" in the current folder
model.load_weights('mnist_cnn_checkpoint_10_loss0.0033.h5')

In [157]:
print(x_train_gt5.shape)
print(y_train_gt5.shape)
print(x_test_gt5.shape)
print(y_test_gt5.shape)

(29404, 28, 28, 1)
(29404, 5)
(4861, 28, 28, 1)
(4861, 5)


## Print the accuracy for classification of digits 5 to 9

In [158]:
#Training on the dataset and adding the all the callbacks to the fit function.
#Once the training starts, results start appearing on Tensorboard after 1 epoch
model.fit(x_train_gt5, y_train_gt5,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test_gt5, y_test_gt5),
          callbacks=[early_stopping,model_checkpoint])

Train on 29404 samples, validate on 4861 samples
Epoch 1/5
  896/29404 [..............................] - ETA: 5s - loss: 3.1050 - acc: 0.5335

  'Discrepancy between trainable weights and collected trainable'



Epoch 00001: val_loss did not improve from 0.00475
Epoch 2/5

Epoch 00002: val_loss did not improve from 0.00475
Epoch 3/5

Epoch 00003: val_loss did not improve from 0.00475
Epoch 4/5

Epoch 00004: val_loss did not improve from 0.00475
Epoch 5/5

Epoch 00005: val_loss did not improve from 0.00475


<keras.callbacks.History at 0x7f6feb879f60>

In [159]:
#Testing the model on test set
score = model.evaluate(x_test_gt5, y_test_gt5)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.022041278657298862
Test accuracy: 0.994651306315573


# Text classification using TF-IDF

###  Load the dataset from sklearn.datasets

In [0]:
from sklearn.datasets import fetch_20newsgroups

In [0]:
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']

### Training data

In [89]:
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)

Downloading 20news dataset. This may take a few minutes.
Downloading dataset from https://ndownloader.figshare.com/files/5975967 (14 MB)


### Test data

In [0]:
twenty_test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=42)

###  a.  You can access the values for the target variable using .target attribute 
###  b. You can access the name of the class in the target variable with .target_names


In [91]:
twenty_train.target

array([1, 1, 3, ..., 2, 2, 2])

In [97]:
twenty_train.target_names

['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian']

In [98]:
twenty_train.target

array([1, 1, 3, ..., 2, 2, 2])

In [93]:
twenty_train.data[0:5]

['From: sd345@city.ac.uk (Michael Collier)\nSubject: Converting images to HP LaserJet III?\nNntp-Posting-Host: hampton\nOrganization: The City University\nLines: 14\n\nDoes anyone know of a good way (standard PC application/PD utility) to\nconvert tif/img/tga files into LaserJet III format.  We would also like to\ndo the same, converting to HPGL (HP plotter) files.\n\nPlease email any response.\n\nIs this the correct group?\n\nThanks in advance.  Michael.\n-- \nMichael Collier (Programmer)                 The Computer Unit,\nEmail: M.P.Collier@uk.ac.city                The City University,\nTel: 071 477-8000 x3769                      London,\nFax: 071 477-8565                            EC1V 0HB.\n',
 "From: ani@ms.uky.edu (Aniruddha B. Deglurkar)\nSubject: help: Splitting a trimming region along a mesh \nOrganization: University Of Kentucky, Dept. of Math Sciences\nLines: 28\n\n\n\n\tHi,\n\n\tI have a problem, I hope some of the 'gurus' can help me solve.\n\n\tBackground of the probl

### Now with dependent and independent data available for both train and test datasets, using TfidfVectorizer fit and transform the training data and test data and get the tfidf features for both

In [0]:
# define X and y
X = twenty_train.data
y = twenty_train.target


In [107]:
y

array([1, 1, 3, ..., 2, 2, 2])

In [0]:
from sklearn.model_selection import train_test_split
# split the new DataFrame into training and testing sets [Default test size = 25%]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

### Use logisticRegression with tfidf features as input and targets as output and train the model and report the train and test accuracy score

In [0]:
# define a function that accepts a vectorizer and calculates the accuracy
def tokenize_test(vect):
    X_train_dtm = vect.fit_transform(X_train)
    print('Features: ', X_train_dtm.shape[1])
    X_test_dtm = vect.transform(X_test)
    print('For NB:')
    nb = MultinomialNB()
    nb.fit(X_train_dtm, y_train)
    train_pred=nb.predict(X_train_dtm)
    print('Train Accuracy:',metrics.accuracy_score(y_train,train_pred))
    y_pred_class = nb.predict(X_test_dtm)
    print('Test Accuracy: ', metrics.accuracy_score(y_test, y_pred_class))
    
    # use logistic regression with all features
    print('Logistic Regression.....')
    logreg = LogisticRegression(C=1e9)
    logreg.fit(X_train_dtm, y_train)
    train_pred = logreg.predict(X_train_dtm)
    print('Train Accuracy:',metrics.accuracy_score(y_train,train_pred))
    y_pred_class = logreg.predict(X_test_dtm)
    print('Test Accuracy: ', metrics.accuracy_score(y_test, y_pred_class))

In [114]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
# include 1-grams and 2-grams
vect = CountVectorizer(ngram_range=(1, 2))
tokenize_test(vect)

Features:  239842
For NB:
Train Accuracy: 0.9994089834515366
Test Accuracy:  0.9610619469026549
Logistic Regression.....




Train Accuracy: 1.0
Test Accuracy:  0.952212389380531


In [115]:
# remove English stop words
vect = CountVectorizer(stop_words='english')
tokenize_test(vect)

Features:  30227
For NB:
Train Accuracy: 0.9988179669030733
Test Accuracy:  0.968141592920354
Logistic Regression.....




Train Accuracy: 1.0
Test Accuracy:  0.9451327433628318
