# Sign Language MNIST
### Contents of the notebook.
* Importing dataset.
* Data Preprocessing.
* Image Augmentation.
* Model creation and training.
* Testing model on test data.

### About the dataset
* The dataset format closely follows the classic MNIST.
* Each training and test case carries a label (0-25) that maps one-to-one to a letter A-Z (**there are no cases for 9=J or 25=Z because those signs involve motion**).
* The training data (27,455 cases) and test data (7,172 cases) are approximately half the size of the standard MNIST but otherwise similar: a header row of label, pixel1, pixel2, …, pixel784, where each row represents a single 28x28 pixel image with grayscale values between 0 and 255.
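
The label scheme described above can be sketched in plain Python. This is only an illustration: `label_to_letter` and `present_labels` are hypothetical names that do not appear elsewhere in the notebook.

```python
import string

# Labels 0-25 map one-to-one onto A-Z; 9 (J) and 25 (Z) never occur in the data.
LETTERS = string.ascii_uppercase

def label_to_letter(label: int) -> str:
    """Map a dataset label (0-25) to its letter (hypothetical helper)."""
    if not 0 <= label <= 25:
        raise ValueError(f"label out of range: {label}")
    return LETTERS[label]

# The 24 labels that actually appear in the CSV files.
present_labels = [i for i in range(26) if i not in (9, 25)]

print(len(present_labels))                      # 24
print(label_to_letter(0), label_to_letter(24))  # A Y
```

Because two labels are missing, the data contains 24 classes even though label values run up to 24.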

![](https://storage.googleapis.com/kagglesdsdata/datasets%2F3258%2F5337%2Famer_sign2.png?GoogleAccessId=databundle-worker-v2@kaggle-161607.iam.gserviceaccount.com&Expires=1596398964&Signature=cUKmt2o%2F060VyoeUu9jpOYUhkcJ%2F639zVXND24JizRxQ1q0qxVQYYg3OYK0huHN9prmoh1yGEkbF9H4ipkmZmbwEN5wyWC2xjhqpjArXDlv%2BWUr9i7G%2BVQiPrdr%2F06BFyooOjsjJ5t7D%2FKwgp%2BAStYtGHrOyhaOxFfJcmphxG1PYz7qGTQtJ6EL9qDn%2BdshCtI1qbJb%2FYawL9azzBSbpj86ju%2F3QSkGlitK%2BYk8R9z9ZWDC6Hpe9Z89WbTnhIPYMqgMho6GfYuEVJenAdw8bJ2fdLVUV0XL06afQseEXVxiBOrqI8W1xWcO2gm94l1qBjRL%2BmHsAI4moEHrtJv3EFA%3D%3D)

In [None]:
import numpy as np
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D , MaxPool2D , Flatten , Dropout , BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau,EarlyStopping

# Importing Dataset

In [None]:
train=pd.read_csv('../input/sign-language-mnist/sign_mnist_train/sign_mnist_train.csv')
test=pd.read_csv('../input/sign-language-mnist/sign_mnist_test/sign_mnist_test.csv')

# Data Preprocessing
### Separating the label column (the dependent variable y) from the dataframe

In [None]:
y_train=train['label']
y_test=test['label']
del train['label']
del test['label']

In [None]:
x_train=train.values
x_test=test.values

In [None]:
print(x_train.shape, x_test.shape)

### Reshaping the arrays into 28x28 single-channel images for the CNN layers

In [None]:
x_train=x_train.reshape(-1,28,28,1)
x_test=x_test.reshape(-1,28,28,1)

In [None]:
# Normalize the data
x_train = x_train / 255
x_test = x_test / 255
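
As a rough illustration of the reshape and normalization steps, the sketch below runs them on a small synthetic array instead of the real CSV data. The `rows` array is a made-up stand-in for two rows of 784 pixel values.

```python
import numpy as np

# Hypothetical stand-in for two CSV rows of 784 pixel values each (0-255).
rows = np.arange(2 * 784).reshape(2, 784) % 256

# -1 lets NumPy infer the batch size; the layout is (batch, height, width, channels).
images = rows.reshape(-1, 28, 28, 1)

# Scale grayscale values from [0, 255] to [0, 1].
images = images / 255

print(images.shape)  # (2, 28, 28, 1)
```

Pixel `k` of a row lands at image position `(k // 28, k % 28)`, so the first 28 values form the top row of the image.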


In [None]:
import matplotlib.pyplot as plt
plt.imshow(x_train[0][:,:,0],cmap='gray')
plt.title(y_train[0])
plt.show()

In [None]:
import seaborn as sns


### Check the distribution of the dataset.

In [None]:
# Pass the labels by keyword; recent seaborn versions no longer accept them positionally.
g = sns.countplot(x=y_train)

y_train.value_counts()

### Use LabelBinarizer to convert dependent variables into [one-hot vectors](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f)

In [None]:
from sklearn.preprocessing import LabelBinarizer
label_binarizer = LabelBinarizer()
y_train = label_binarizer.fit_transform(y_train)
y_test = label_binarizer.transform(y_test)  # reuse the encoding fitted on the training labels

In [None]:
y_train[0]
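
Since labels 9 and 25 never occur, the one-hot vectors have 24 columns, and the column index differs from the original label for every label above 9. The sketch below mimics the encoding with NumPy only. `classes`, `one_hot` and `decode` are hypothetical names; with the notebook's objects, `label_binarizer.classes_` holds the same sorted label array and `label_binarizer.inverse_transform` performs the decoding.

```python
import numpy as np

# Stand-in for label_binarizer.classes_: the sorted labels present in the data.
classes = np.array([i for i in range(26) if i not in (9, 25)])

def one_hot(labels, classes):
    """One column per class; 1 where the label equals that class."""
    labels = np.asarray(labels)
    return (labels[:, None] == classes[None, :]).astype(int)

def decode(one_hot_rows, classes):
    """Recover the original labels from one-hot rows."""
    return classes[np.argmax(one_hot_rows, axis=1)]

encoded = one_hot([3, 10, 24], classes)
print(encoded.shape)                       # (3, 24)
print(np.argmax(encoded, axis=1).tolist()) # [3, 9, 23]
print(decode(encoded, classes).tolist())   # [3, 10, 24]
```

When interpreting model predictions, the argmax of the softmax output is therefore a column index and must be mapped back through the class array before it is read as a letter.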

# Image Augmentation
### Image augmentation reduces overfitting by training the model on randomly rotated, zoomed, and shifted copies of the images.

Refer to the documentation for [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator)

In [None]:
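datagen = ImageDataGenerator(
        rotation_range=10,       # rotate randomly by up to 10 degrees
        zoom_range=0.1,          # zoom randomly between 0.9x and 1.1x
        width_shift_range=0.1,   # shift horizontally by up to 10% of the width
        height_shift_range=0.1)  # shift vertically by up to 10% of the height


# fit() is only needed for featurewise statistics (centering, std normalization, ZCA);
# none of those options are enabled here, so this call is harmless but optional.
datagen.fit(x_train)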

Refer to the documentation for [ReduceLROnPlateau](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau)

In [None]:
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy', patience = 2, verbose=1,factor=0.5, min_lr=0.00001)
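
To get a feel for these settings, the sketch below traces the learning rate over successive reductions. It assumes the Keras default Adam learning rate of 0.001 (the model is compiled with `optimizer = 'adam'`) and the callback's update rule `new_lr = max(lr * factor, min_lr)`. `lr_after_reductions` is a hypothetical helper, not part of Keras.

```python
def lr_after_reductions(initial_lr: float, factor: float, min_lr: float, n: int) -> float:
    """Learning rate after n plateau-triggered reductions (hypothetical helper)."""
    lr = initial_lr
    for _ in range(n):
        # Each reduction multiplies by `factor` but never goes below `min_lr`.
        lr = max(lr * factor, min_lr)
    return lr

# Assumed starting point: Adam's default learning rate in Keras.
schedule = [lr_after_reductions(1e-3, 0.5, 1e-5, n) for n in range(9)]

for n, lr in enumerate(schedule):
    print(n, lr)
```

Under these assumptions the rate halves six times and is clipped to `min_lr` on the seventh reduction, after which further plateaus no longer change it.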

# Model Creation and Training

In [None]:
model = Sequential()
model.add(Conv2D(100 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (28,28,1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(50 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Flatten())
model.add(Dense(units = 512 , activation = 'relu'))
model.add(Dropout(0.3))
model.add(Dense(units = 24 , activation = 'softmax'))
model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy'])
model.summary()
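
The shapes reported by `model.summary()` can be checked by hand. The sketch below follows the spatial size through the two pooling layers and computes the size of the flattened vector. `same_pool_out` is a hypothetical helper written for this illustration.

```python
import math

def same_pool_out(size: int, stride: int) -> int:
    """Output length of a pooling layer with padding='same' (hypothetical helper)."""
    return math.ceil(size / stride)

side = 28                      # Conv2D with padding='same', strides=1 keeps 28x28
side = same_pool_out(side, 2)  # first MaxPool2D  -> 14
side = same_pool_out(side, 2)  # second MaxPool2D -> 7

flat_features = side * side * 50          # Flatten after the 50-filter conv block
dense_params = flat_features * 512 + 512  # weights + biases of Dense(512)

print(side, flat_features, dense_params)  # 7 2450 1254912
```

Most of the model's parameters sit in the first dense layer, which is why dropout is applied right after it.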

In [None]:
history = model.fit(
    datagen.flow(x_train, y_train, batch_size=128),
    epochs=30,
    validation_data=(x_test, y_test),
    callbacks=[learning_rate_reduction,
               EarlyStopping(monitor='val_accuracy', patience=3)])

In [None]:
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.legend()
plt.show()

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy on test data", score[1]*100)