# Load Data

### Approach

##### Information

1. The training set has 30 alphabets with a total of 964 characters (letters).
2. Each alphabet is stored in a separate folder.
3. Each letter in an alphabet has its own dedicated folder.
4. There are 20 images for each letter.

##### Procedure

1. Loop through all alphabets.
2. Within each alphabet, loop through all letters.
3. Within each letter, loop through all images.
4. While iterating, we store the following:
    - a dictionary mapping each alphabet name to its dedicated range of letter labels (unique IDs), as (min label, max label)
    - a dictionary mapping each label (ID) to its (alphabet, letter) tuple
    - a counter of the number of letters traversed, incremented after each letter
    - the image data
    - the corresponding label of each image
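
The bookkeeping in step 4 can be sketched on a toy example before touching the file system. This is only an illustration: `toy_dataset` and its folder names are made up, and an in-memory dictionary stands in for the directory tree.

```python
# Hypothetical toy dataset: alphabet name -> list of letter folder names.
toy_dataset = {
    "AlphabetA": ["character01", "character02", "character03"],
    "AlphabetB": ["character01", "character02"],
}

alphabet_range = {}      # alphabet -> [first label, last label]
labels_map_letters = {}  # label -> (alphabet, letter)
current_id = 0           # counter of letters traversed so far

for alphabet, letters in toy_dataset.items():
    alphabet_range[alphabet] = [current_id, None]  # first label of this alphabet
    for letter in letters:
        labels_map_letters[current_id] = (alphabet, letter)
        alphabet_range[alphabet][1] = current_id   # last label seen so far
        current_id += 1

print(alphabet_range)         # {'AlphabetA': [0, 2], 'AlphabetB': [3, 4]}
print(labels_map_letters[3])  # ('AlphabetB', 'character01')
```

Each alphabet ends up owning a contiguous block of labels, which is what makes the (min label, max label) pair sufficient to describe it.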

### Import Relevant Libraries

In [1]:
import numpy as np 
import os
import cv2
import pickle

### Load data function

In [2]:
def loading(path, n=0):
    # containers:
    image_data = []
    label = []
    alphabet_range = {}
    labels_map_letters = {}
    current_id = n

    # for every alphabet:
    for alphabet in os.listdir(path):  # os.listdir returns the names of all entries in the directory
        print("Loading the alphabet: " + alphabet)
        alphabet_range[alphabet] = [current_id, None] # record the first label of this alphabet
        alphabet_path = os.path.join(path, alphabet)
        # builds the full path by joining 'path' (dataset directory) with 'alphabet' (folder name)

        # for every letter:
        for letter in os.listdir(alphabet_path):
            labels_map_letters[current_id] = (alphabet, letter)
            letter_path = os.path.join(alphabet_path,letter)
            category_images = [] # temporarily holds all images of this letter

            #  for every image:
            for file_name in os.listdir(letter_path):
                image_path = os.path.join(letter_path,file_name)
                image = cv2.imread(image_path)
                #image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
                category_images.append(image)
                label.append(current_id)
                # every image of a letter shares that letter's label; each alphabet owns a
                # contiguous range of labels, e.g. labels 0 to 19 for the first alphabet

            # error handling in case stacking the images fails
            # (e.g. when images have different dimensions and cannot be stacked)
            try: 
                image_data.append(np.stack(category_images))
            except ValueError as e:
                print(e)
                print("error - category_images: ", category_images)

            alphabet_range[alphabet][1] = current_id # finally we store the ending range of this alphabet
            current_id +=1 

    label = np.vstack(label)
    image_data = np.stack(image_data)

    return image_data, label, alphabet_range
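
Note that `loading` builds `labels_map_letters` but returns only `alphabet_range`, so the alphabet of a given label has to be recovered from the ranges. A minimal sketch of such a lookup follows; `alphabet_of` is a hypothetical helper and `example_range` holds made-up values, neither is part of the notebook.

```python
def alphabet_of(label_id, alphabet_range):
    """Return the alphabet whose [first, last] label range contains label_id."""
    for alphabet, (low, high) in alphabet_range.items():
        if low <= label_id <= high:
            return alphabet
    return None  # label is outside every known range


# Made-up ranges in the same format that loading() produces.
example_range = {"AlphabetA": [0, 19], "AlphabetB": [20, 48]}

print(alphabet_of(5, example_range))   # AlphabetA
print(alphabet_of(20, example_range))  # AlphabetB
print(alphabet_of(99, example_range))  # None
```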

#### Training Data

In [3]:
train_path = '../Data/images_background'
save_path = '../Data/'
train_data, train_label, train_range = loading(train_path)

# 'with' ensures the file is properly closed after the block of code executes
with open(os.path.join(save_path,"train_data.pickle"), 'wb') as f: # 'wb' opens the file for writing in binary mode
    pickle.dump( (train_data, train_range), f) # f is the opened pickle file

# we stored the image data and the alphabet range dictionary (the labels are not written to the pickle)

Loading the alphabet: Alphabet_of_the_Magi
Loading the alphabet: Anglo-Saxon_Futhorc
Loading the alphabet: Arcadian
Loading the alphabet: Armenian
Loading the alphabet: Asomtavruli_(Georgian)
Loading the alphabet: Balinese
Loading the alphabet: Bengali
Loading the alphabet: Blackfoot_(Canadian_Aboriginal_Syllabics)
Loading the alphabet: Braille
Loading the alphabet: Burmese_(Myanmar)
Loading the alphabet: Cyrillic
Loading the alphabet: Early_Aramaic
Loading the alphabet: Futurama
Loading the alphabet: Grantha
Loading the alphabet: Greek
Loading the alphabet: Gujarati
Loading the alphabet: Hebrew
Loading the alphabet: Inuktitut_(Canadian_Aboriginal_Syllabics)
Loading the alphabet: Japanese_(hiragana)
Loading the alphabet: Japanese_(katakana)
Loading the alphabet: Korean
Loading the alphabet: Latin
Loading the alphabet: Malay_(Jawi_-_Arabic)
Loading the alphabet: Mkhedruli_(Georgian)
Loading the alphabet: N_Ko
Loading the alphabet: Ojibwe_(Canadian_Aboriginal_Syllabics)
Loading the alpha

In [4]:
train_data.shape, train_label.shape

((964, 20, 105, 105, 3), (19280, 1))
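
The two shapes are consistent: 964 letters with 20 images each give 964 × 20 = 19280 labels, one per image. The sketch below checks this relationship on a small dummy array (4 letters instead of 964, to keep memory use low) and shows one possible way to flatten the data so that each image lines up with its label. The dummy arrays are assumptions for illustration, not the real dataset.

```python
import numpy as np

assert 964 * 20 == 19280  # letters * images per letter = number of labels

# Dummy stand-ins with the same layout as train_data and train_label.
n_letters, n_images = 4, 20
data = np.zeros((n_letters, n_images, 105, 105, 3), dtype=np.uint8)
# loading() appends one label per image, letter by letter, then stacks them as a column.
labels = np.repeat(np.arange(n_letters), n_images).reshape(-1, 1)

# Merge the (letter, image) axes so that row i of flat_data matches labels[i].
flat_data = data.reshape(-1, *data.shape[2:])

print(flat_data.shape, labels.shape)  # (80, 105, 105, 3) (80, 1)
```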

#### Validation Data

In [5]:
val_path = '../Data/images_evaluation'
val_data, val_label, val_range = loading(val_path)


# 'with' ensures the file is properly closed after the block of code executes
with open(os.path.join(save_path,"val_data.pickle"), 'wb') as f: # 'wb' opens the file for writing in binary mode
    pickle.dump( (val_data, val_range), f) # f is the opened pickle file

# we stored the image data and the alphabet range dictionary (the labels are not written to the pickle)

Loading the alphabet: Angelic
Loading the alphabet: Atemayar_Qelisayer
Loading the alphabet: Atlantean
Loading the alphabet: Aurek-Besh
Loading the alphabet: Avesta
Loading the alphabet: Ge_ez
Loading the alphabet: Glagolitic
Loading the alphabet: Gurmukhi
Loading the alphabet: Kannada
Loading the alphabet: Keble
Loading the alphabet: Malayalam
Loading the alphabet: Manipuri
Loading the alphabet: Mongolian
Loading the alphabet: Old_Church_Slavonic_(Cyrillic)
Loading the alphabet: Oriya
Loading the alphabet: Sylheti
Loading the alphabet: Syriac_(Serto)
Loading the alphabet: Tengwar
Loading the alphabet: Tibetan
Loading the alphabet: ULOG
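
Reading a pickle back mirrors the way it was written: open the file in `'rb'` mode and unpack the same `(data, range)` tuple. The sketch below demonstrates the round trip on a small dummy array in a temporary directory; the file name `toy_data.pickle` and the dummy values are placeholders, not the notebook's actual files.

```python
import os
import pickle
import tempfile

import numpy as np

# Dummy stand-ins with the same structure as (train_data, train_range).
data = np.zeros((2, 20, 105, 105, 3), dtype=np.uint8)
ranges = {"AlphabetA": [0, 1]}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "toy_data.pickle")  # placeholder file name

    with open(file_path, "wb") as f:  # binary write mode
        pickle.dump((data, ranges), f)

    with open(file_path, "rb") as f:  # binary read mode
        loaded_data, loaded_ranges = pickle.load(f)  # same order as in dump

print(loaded_data.shape)  # (2, 20, 105, 105, 3)
print(loaded_ranges)      # {'AlphabetA': [0, 1]}
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.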
