**The following cell lets us download Kaggle data in Google Colab.**

Before you execute the code in the following cell, complete these two steps:

1. Go to your Kaggle account, scroll to the API section, and click Expire API Token to remove previous tokens.

2. Click Create New API Token. This downloads a kaggle.json file to your machine.

In [0]:
# Install kaggle
! pip install -q kaggle

# Upload the kaggle.json file that you downloaded locally
from google.colab import files
files.upload()

# Make a directory named .kaggle (-p avoids an error if it already exists) and copy kaggle.json there
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/

# Change the permissions of the file
! chmod 600 ~/.kaggle/kaggle.json

# Download the dataset we need: ASL Alphabet
!kaggle datasets download -d grassknoted/asl-alphabet

# Unzip your downloaded dataset
!unzip asl-alphabet.zip -d asl-alphabet
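
For readers who prefer to stay in Python, the directory, copy and permission steps from the cell above can be sketched as a small function. This is only an illustration: the name `install_kaggle_token` and its `home_dir` parameter are hypothetical and are not part of the Kaggle tooling.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path, home_dir):
    """Copy a Kaggle API token into <home_dir>/.kaggle with owner-only permissions."""
    kaggle_dir = os.path.join(home_dir, '.kaggle')
    # exist_ok=True makes the step safe to re-run, like `mkdir -p`
    os.makedirs(kaggle_dir, exist_ok=True)
    target = os.path.join(kaggle_dir, 'kaggle.json')
    shutil.copyfile(token_path, target)
    # Equivalent of `chmod 600`: read and write for the owner only
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

In Colab it could be called as `install_kaggle_token('kaggle.json', os.path.expanduser('~'))` after the upload step.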

**The following cell also needs to be run only once. It connects Colab to your Google Drive. When you run this cell, Colab will ask you to authorize access to your Drive; if it gives you an authorization code, paste it into the prompt shown under the cell.**

In [0]:
from google.colab import drive
drive.mount('/content/drive')

**The code in the remaining cells reads all photos and their class labels from the dataset into NumPy arrays, splits them into training and testing sets, and finally saves them to an .h5 file.**

Declaring some constant values.

In [0]:
CLASSES = ['A', 'B', 'C', 'D', 'del', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'nothing', 'O', 'P', 'Q', 'R', 'S', 'space',
           'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
TRAINING_DATA_FOLDER = 'asl-alphabet/asl_alphabet_train/asl_alphabet_train/'
NUMBER_OF_CLASSES = 29
SPLIT_RATIO = 0.1
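
The number 87000 that appears in the reader class follows from these constants. A short sketch of the arithmetic, assuming every class folder holds 3000 images (the name `IMAGES_PER_CLASS` is introduced here only for illustration):

```python
SPLIT_RATIO = 0.1          # same value as in the cell above
NUMBER_OF_CLASSES = 29     # same value as in the cell above
IMAGES_PER_CLASS = 3000    # assumption: every class folder holds 3000 images

total_images = NUMBER_OF_CLASSES * IMAGES_PER_CLASS
testing_size = round(SPLIT_RATIO * total_images)
training_size = total_images - testing_size

print(total_images, training_size, testing_size)  # 87000 78300 8700
```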

**We use the TrainingDataReader class to read images.**

Initially we only read them into the training_data and training_class_labels attributes.

In [0]:
import os
import numpy as np
import cv2 as cv


class TrainingDataReader:

    def __init__(self):
        """
        Attributes:
        training_data: data that we will use to train our model
        training_class_labels: actual class that each training sample belongs to
        testing_data: data that we will use to test the accuracy of our trained model
        testing_class_labels: actual class that each testing sample belongs to
        """
        self.number_of_classes = NUMBER_OF_CLASSES
        # All 87000 images (29 classes x 3000 images) are first read into the training
        # arrays, so these must hold the full dataset until train_test_split runs.
        self.training_data = np.zeros(shape=(87000, 64, 64, 3), dtype='float32')
        self.training_class_labels = np.zeros(shape=87000, dtype='uint8')
        # Placeholders; train_test_split later replaces them with the real test split.
        self.testing_data = np.zeros(shape=(int(SPLIT_RATIO * 87000), 64, 64, 3), dtype='float32')
        self.testing_class_labels = np.zeros(shape=int(SPLIT_RATIO * 87000), dtype='uint8')

    def read_training_data(self):
        """
        Obtains the folder with data for the next class and calls 'read_data_for_one_class' to actually read the data
        """
        for category in CLASSES:
            path = os.path.join(TRAINING_DATA_FOLDER, category)
            class_num = CLASSES.index(category)
            self.read_data_for_one_class(path, class_num)

    def read_data_for_one_class(self, path, class_num):
        """
        Main method of this first part of the project.
        Takes the path to a folder belonging to one class, iterates through the photos, reads them,
        normalizes them and adds them to the training data, along with the corresponding class labels.
        """
        print('Currently reading data for class number {}...'.format(class_num))
        # Each class has 3000 images, so class k fills slots [k * 3000, (k + 1) * 3000)
        counter = class_num * 3000
        for image in os.listdir(path):
            try:
                image_array = cv.imread(os.path.join(path, image), cv.IMREAD_COLOR)
                new_array = cv.resize(image_array, (64, 64))

                # Min-max scale the pixel values of this image to the range [0, 1]
                self.training_data[counter] = cv.normalize(new_array, None, alpha=0, beta=1, norm_type=cv.NORM_MINMAX, dtype=cv.CV_32F)
                self.training_class_labels[counter] = class_num
                counter = counter + 1
            except Exception as e:
                # Skip files that cannot be read or resized, but report them
                print('Skipping {}: {}'.format(image, e))
                    
training_data_reader = TrainingDataReader()
training_data_reader.read_training_data()
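
To make the normalization step more concrete, here is a minimal NumPy sketch of min-max scaling to the range [0, 1], which is roughly what the `cv.normalize` call with `NORM_MINMAX`, `alpha=0` and `beta=1` is used for above. The function `min_max_normalize` is a hypothetical helper written for this illustration, and its handling of constant images is a choice made here, not necessarily what OpenCV does.

```python
import numpy as np


def min_max_normalize(image):
    """Scale an image array to [0, 1] using its own minimum and maximum."""
    image = image.astype('float32')
    lowest, highest = image.min(), image.max()
    if highest == lowest:
        # A constant image has no range to stretch; return zeros (choice made for this sketch)
        return np.zeros_like(image)
    return (image - lowest) / (highest - lowest)


sample = np.array([[50, 100], [150, 250]], dtype='uint8')
normalized = min_max_normalize(sample)
print(normalized.min(), normalized.max())
```

Because each image is scaled by its own minimum and maximum, the darkest pixel of every image maps to 0 and the brightest to 1, regardless of the overall exposure of the photo.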

I had some problems installing sklearn here, but these lines of code solved the problem.

In [0]:
!pip uninstall sklearn -y
!pip install Cython
!pip install https://github.com/Santosh-Gupta/scikit-learn/archive/master.zip

Using sklearn's train_test_split function, we split the original data into training and testing sets. Note that this function shuffles the data by default. Because the images were read one class at a time, the arrays are ordered by class, so this shuffling is crucial for obtaining high accuracy not only on the training data, but on the validation data as well.

In [0]:
from sklearn.model_selection import train_test_split

training_data_reader.training_data, training_data_reader.testing_data, \
training_data_reader.training_class_labels, training_data_reader.testing_class_labels = \
train_test_split(training_data_reader.training_data, training_data_reader.training_class_labels, test_size = SPLIT_RATIO)

print('Splitting has been completed.')
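
To see why the shuffling matters, the idea can be sketched with plain NumPy on toy data that is ordered by class, like the arrays filled by TrainingDataReader. This is an illustration of the concept only, not sklearn's implementation; `shuffled_split` and its `seed` parameter are hypothetical names.

```python
import numpy as np


def shuffled_split(data, labels, test_size, seed=0):
    """Shuffle samples and labels together, then hold out a fraction for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))      # one permutation keeps data and labels aligned
    n_test = int(round(test_size * len(data)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return data[train_idx], data[test_idx], labels[train_idx], labels[test_idx]


# Toy data ordered by class: ten samples of class 0, then class 1, then class 2
labels = np.repeat(np.arange(3), 10)
data = labels.astype('float32') * 10        # each sample encodes its own label

x_train, x_test, y_train, y_test = shuffled_split(data, labels, test_size=0.1)
print(len(x_train), len(x_test))
```

Without the permutation, taking the last 10% of an ordered array would put only the final class into the testing set.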

Run this next cell to save your data in a file called hra_training_data.h5.

In [0]:
import h5py

with h5py.File('hra_training_data.h5', 'w') as hf:
  hf.create_dataset("training_data", data=training_data_reader.training_data)
  hf.create_dataset("training_class_labels", data=training_data_reader.training_class_labels)
  hf.create_dataset("testing_data", data=training_data_reader.testing_data)
  hf.create_dataset("testing_class_labels", data=training_data_reader.testing_class_labels)
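
Before copying the file anywhere, it can be worth reading it back to confirm that the four datasets were written with the expected shapes. A minimal sketch, assuming h5py is installed; `summarize_h5` is a hypothetical helper name.

```python
import h5py


def summarize_h5(path):
    """Return {dataset name: (shape, dtype)} for every top-level dataset in the file."""
    summary = {}
    with h5py.File(path, 'r') as hf:
        for name, item in hf.items():
            # Only datasets have a shape and dtype; groups are skipped
            if isinstance(item, h5py.Dataset):
                summary[name] = (item.shape, str(item.dtype))
    return summary
```

Calling `summarize_h5('hra_training_data.h5')` after the cell above should list training_data, training_class_labels, testing_data and testing_class_labels.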

Run this last cell to copy your data to your Google Drive (first make sure that you have mounted your Drive).

Once the copy is done, you can download the file from your Google Drive to your local machine.

In [0]:
!cp hra_training_data.h5 "drive/My Drive/"