<a href="https://colab.research.google.com/github/cosmo3769/SSL-study/blob/classifier/image_classifier_iNaturalist_aves.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## References

[Image Classifier Keras](https://www.section.io/engineering-education/image-classifier-keras/)

[ImageDataGenerator Keras](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator)

## Setting up the Kaggle API to fetch the dataset

Go to your Kaggle account page and generate a new API token. A file named **kaggle.json** will be downloaded to your local system. Upload **kaggle.json** to the Colab session so that the Kaggle CLI can authenticate from Colab.

In [None]:
%%capture
# Install the kaggle library.
! pip install kaggle

In [None]:
! mkdir ~/.kaggle

In [None]:
! cp kaggle.json ~/.kaggle/

In [None]:
! chmod 600 ~/.kaggle/kaggle.json
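The three shell cells above can also be expressed in Python. This is a minimal sketch, assuming `kaggle.json` sits in the current working directory; `install_kaggle_token` is a hypothetical helper name, not part of the Kaggle library. The `0o600` mode mirrors `chmod 600`: only the owner can read or write the token.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path='kaggle.json', kaggle_dir=os.path.expanduser('~/.kaggle')):
    """Hypothetical helper: copy the API token into kaggle_dir and restrict its permissions."""
    os.makedirs(kaggle_dir, exist_ok=True)              # like `mkdir -p ~/.kaggle`
    destination = os.path.join(kaggle_dir, 'kaggle.json')
    shutil.copy(token_path, destination)                 # like `cp kaggle.json ~/.kaggle/`
    os.chmod(destination, stat.S_IRUSR | stat.S_IWUSR)   # like `chmod 600` (owner read/write only)
    return destination
```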

## Dataset

The dataset is taken from the Kaggle competition [Semi-Supervised Recognition Challenge - FGVC7](https://www.kaggle.com/competitions/semi-inat-2020/data). The competition's [GitHub page](https://github.com/cvl-umass/semi-inat-2020) explains the dataset in detail.

Key statistics of the dataset:

| Split	| Details	| Classes	| Images |
| ----- | ------- | ------- | ------ |
| Train	| Labeled	| 200	    | 3,959  |
| Train	| Unlabeled, in-class	| 200	| 26,640 |
| Train	| Unlabeled, out-of-class |	-	| 122,208 |
| Val	  | Labeled	| 200	| 2,000 |
| Test | Public	| 200	| 4,000 |
| Test | Private | 200 | 4,000 |
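A little arithmetic on the table helps put the splits in perspective: the labelled train and validation images together are the ones stored in `trainval_images`, and the unlabelled pool dwarfs the labelled training set. The counts below are copied from the table; the dictionary layout is just one convenient way to hold them.

```python
# Image counts copied from the table above
SPLITS = {
    ('train', 'labelled'): 3_959,
    ('train', 'unlabelled_in_class'): 26_640,
    ('train', 'unlabelled_out_of_class'): 122_208,
    ('val', 'labelled'): 2_000,
    ('test', 'public'): 4_000,
    ('test', 'private'): 4_000,
}

labelled_trainval = SPLITS[('train', 'labelled')] + SPLITS[('val', 'labelled')]
unlabelled_total = SPLITS[('train', 'unlabelled_in_class')] + SPLITS[('train', 'unlabelled_out_of_class')]
test_total = SPLITS[('test', 'public')] + SPLITS[('test', 'private')]

# Unlabelled images available per labelled training image
unlabelled_ratio = unlabelled_total / SPLITS[('train', 'labelled')]
in_class_ratio = SPLITS[('train', 'unlabelled_in_class')] / SPLITS[('train', 'labelled')]

print(labelled_trainval, unlabelled_total, test_total)      # 5959 148848 8000
print(round(unlabelled_ratio, 1), round(in_class_ratio, 1))  # 37.6 6.7
```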

In [None]:
! kaggle competitions download -c semi-inat-2020

Downloading semi-inat-2020.zip to /content
100% 14.3G/14.3G [04:39<00:00, 100MB/s] 
100% 14.3G/14.3G [04:39<00:00, 54.8MB/s]


In [None]:
%%capture
! unzip semi-inat-2020.zip

In [None]:
import os 

ANNOTATION_DIR = '/content/annotation/'
# os.listdir(ANNOTATION_DIR)

TRAINVAL_LABELLED_DIR = '/content/trainval_images/trainval_images/'
# os.listdir(TRAINVAL_LABELLED_DIR)

TRAIN_UNLABELLED_INCLASS_DIR = '/content/u_train_in/u_train_in/'
# os.listdir(TRAIN_UNLABELLED_INCLASS_DIR)

TRAIN_UNLABELLED_OUTCLASS_DIR = '/content/u_train_out/u_train_out/'
# os.listdir(TRAIN_UNLABELLED_OUTCLASS_DIR)

TEST_DIR = '/content/test/test/'
# os.listdir(TEST_DIR)

## Annotation Format

The dataset follows the COCO annotation format. The annotations are stored in [JSON format](https://www.json.org/json-en.html) and are organized as follows:

```
{
  "info" : info,
  "images" : [image],
  "annotations" : [annotation],
}

info{
  "year" : int,
  "version" : str,
  "description" : str,
  "contributor" : str,
  "url" : str,
  "date_created" : datetime,
}

image{
  "id" : int,
  "width" : int,
  "height" : int,
  "file_name" : str
}

annotation{
  "id" : int,
  "image_id" : int,
  "category_id" : int
}

```
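The `images` and `annotations` lists are linked through `image_id`: each annotation points at the image it labels. The minimal sketch below joins them on that key with the standard `json` module. The JSON string is a made-up two-image example that follows the schema above, not real competition data; the annotations are deliberately listed in a different order from the images to show why joining by key matters.

```python
import json

# Made-up two-image file following the schema above (values are illustrative only)
raw = """
{
  "info": {"year": 2020, "version": "1", "description": "toy example",
           "contributor": "", "url": "", "date_created": "2020-01-01"},
  "images": [
    {"id": 0, "width": 500, "height": 375, "file_name": "trainval_images/0/0.jpg"},
    {"id": 1, "width": 500, "height": 333, "file_name": "trainval_images/7/3.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 7},
    {"id": 11, "image_id": 0, "category_id": 0}
  ]
}
"""
toy = json.loads(raw)

# Index images by their id, then attach each annotation's category_id to its image's file name
images_by_id = {image['id']: image for image in toy['images']}
labels = {
    images_by_id[ann['image_id']]['file_name']: ann['category_id']
    for ann in toy['annotations']
}
print(labels)  # {'trainval_images/7/3.jpg': 7, 'trainval_images/0/0.jpg': 0}
```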



## Labelled training annotations

Showing the **annotations of labelled training images** from the annotation file [anno_l_train.json](/content/annotation/annotation/anno_l_train.json).

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_l_train.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
annotations_labelled_training = pd.json_normalize(data, record_path =['annotations'])

annotations_labelled_training

Unnamed: 0,image_id,id,category_id
0,0,0,0
1,1,1,0
2,2,2,0
3,3,3,0
4,4,4,0
...,...,...,...
3954,3954,3954,199
3955,3955,3955,199
3956,3956,3956,199
3957,3957,3957,199
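Every annotation file in this notebook is opened and flattened the same way; only the file name and the `record_path` (`'annotations'` or `'images'`) change. A small helper can remove that repetition. `load_records` is a hypothetical name, and the sketch assumes the same `ANNOTATION_DIR + 'annotation/<file>.json'` layout used in these cells.

```python
import json

import pandas as pd


def load_records(json_path, record_path):
    """Hypothetical helper: load a COCO-style file and flatten one of its lists into a DataFrame.

    record_path is 'annotations' or 'images'.
    """
    with open(json_path, 'r') as f:
        data = json.load(f)
    return pd.json_normalize(data, record_path=[record_path])


# Example usage with the notebook's paths:
# annotations_labelled_training = load_records(ANNOTATION_DIR + 'annotation/anno_l_train.json', 'annotations')
# images_annotations_labelled_training = load_records(ANNOTATION_DIR + 'annotation/anno_l_train.json', 'images')
```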


In [None]:
annotations_labelled_training.shape

(3959, 3)

In [None]:
annotations_labelled_training.columns

Index(['image_id', 'id', 'category_id'], dtype='object')

In [None]:
annotations_labelled_training.dtypes

image_id       int64
id             int64
category_id    int64
dtype: object

In [None]:
annotations_labelled_training['category_id'].value_counts()

23     43
13     42
5      37
73     36
26     36
       ..
197     7
181     7
193     6
199     6
185     5
Name: category_id, Length: 200, dtype: int64
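The counts above show a long tail: the most frequent class (23) has 43 labelled images and the rarest (185) only 5, a ratio of 8.6. A sketch of one common way to quantify this, and optionally counter it, is the "balanced" class-weight heuristic `n_samples / (n_classes * count_c)` (the formula scikit-learn uses for `class_weight='balanced'`). This is not something the notebook does. Note that the keys here are the original labels; to pass the result to Keras `model.fit(class_weight=...)` they would need to be remapped to output indices through `train_generator.class_indices`.

```python
import pandas as pd


def imbalance_summary(labels):
    """Return (max/min class-count ratio, balanced class weights keyed by label)."""
    counts = pd.Series(labels).value_counts()
    ratio = float(counts.max() / counts.min())
    n_samples, n_classes = len(labels), len(counts)
    # "balanced" heuristic: rare classes get weights > 1, frequent ones < 1
    weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}
    return ratio, weights


# Toy labels: class 0 has 4 images, class 1 has 1 image
ratio, weights = imbalance_summary([0, 0, 0, 0, 1])
print(ratio)    # 4.0
print(weights)  # {0: 0.625, 1: 2.5}
```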

In [None]:
annotations_labelled_training['category_id'].unique()

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 18

Showing the **image metadata of the labelled training images** from the annotation file [anno_l_train.json](/content/annotation/annotation/anno_l_train.json).

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_l_train.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
images_annotations_labelled_training = pd.json_normalize(data, record_path =['images'])

images_annotations_labelled_training

Unnamed: 0,file_name,width,height,id
0,trainval_images/0/0.jpg,500,388,0
1,trainval_images/0/1.jpg,500,375,1
2,trainval_images/0/2.jpg,500,375,2
3,trainval_images/0/3.jpg,500,331,3
4,trainval_images/0/4.jpg,500,387,4
...,...,...,...,...
3954,trainval_images/199/1.jpg,500,375,3954
3955,trainval_images/199/2.jpg,500,333,3955
3956,trainval_images/199/3.jpg,500,375,3956
3957,trainval_images/199/4.jpg,500,375,3957


In [None]:
images_annotations_labelled_training['width'].unique()

array([500, 357, 441, 342, 288, 375, 497, 455, 382, 406, 373, 243, 467,
       442, 400, 281, 378, 369, 410, 448, 474, 335, 450, 473, 469, 446,
       479, 328, 488, 468, 475, 463, 454, 333, 429, 415, 374, 389, 263,
       356, 440, 408, 477, 355, 364, 472, 340, 496, 399, 344, 379, 390,
       282, 358, 499, 456, 324, 396, 484, 493, 443, 334, 470, 331, 426,
       478, 338, 397, 304, 341, 498, 492, 433, 195, 423, 462, 485, 360,
       405, 380, 343, 452, 354, 361, 377, 392, 432, 313, 403, 418, 420,
       368, 421, 487, 349, 482, 476, 461, 486, 327, 325, 402, 308, 430,
       376, 447, 367, 424, 427, 381, 247, 301, 458, 419, 453, 362, 238,
       428, 444, 416, 438, 437, 348, 459, 279, 312, 414, 481, 351, 274,
       411, 391, 466, 425, 449, 395, 394, 436, 300,  86,  70, 315, 107,
       332, 384, 413, 407, 337, 175, 359, 321, 363, 280, 252, 320, 490,
       258, 401, 417])

In [None]:
images_annotations_labelled_training['width'].value_counts()

500    3361
375     220
281      48
333      27
342      11
       ... 
343       1
405       1
462       1
423       1
417       1
Name: width, Length: 159, dtype: int64

In [None]:
images_annotations_labelled_training['height'].value_counts()

375    1148
333     831
500     643
281      88
334      73
       ... 
293       1
320       1
265       1
268       1
444       1
Name: height, Length: 259, dtype: int64

Concatenating the annotation and image DataFrames column-wise (rows are paired by position):

In [None]:
training_labelled = pd.concat([annotations_labelled_training , images_annotations_labelled_training.drop(['id'], axis = 1)], axis = 1)
training_labelled.head(50)

Unnamed: 0,image_id,id,category_id,file_name,width,height
0,0,0,0,trainval_images/0/0.jpg,500,388
1,1,1,0,trainval_images/0/1.jpg,500,375
2,2,2,0,trainval_images/0/2.jpg,500,375
3,3,3,0,trainval_images/0/3.jpg,500,331
4,4,4,0,trainval_images/0/4.jpg,500,387
5,5,5,0,trainval_images/0/5.jpg,500,410
6,6,6,0,trainval_images/0/6.jpg,500,361
7,7,7,0,trainval_images/0/7.jpg,500,347
8,8,8,0,trainval_images/0/8.jpg,500,333
9,9,9,0,trainval_images/0/9.jpg,500,379
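`pd.concat(..., axis=1)` pairs rows by position, which is only correct when `annotations` and `images` are listed in the same order (the outputs above are consistent with that, since `image_id` follows the row index). A key-based join is a safer alternative. The sketch below assumes only what the COCO schema states: `annotations.image_id` refers to `images.id`. The toy frames are made up and deliberately out of order.

```python
import pandas as pd


def join_annotations(annotations, images):
    """Attach image metadata to each annotation by matching image_id to the image's id."""
    images = images.rename(columns={'id': 'image_id'})
    # validate='one_to_one' raises a MergeError if image_id is duplicated on either side
    return annotations.merge(images, on='image_id', how='inner', validate='one_to_one')


# Toy frames where the two lists are NOT in the same order
annotations = pd.DataFrame({'image_id': [1, 0], 'id': [10, 11], 'category_id': [7, 0]})
images = pd.DataFrame({'file_name': ['a.jpg', 'b.jpg'], 'width': [500, 375],
                       'height': [375, 500], 'id': [0, 1]})

joined = join_annotations(annotations, images)
print(joined[['image_id', 'category_id', 'file_name']])
```

In the notebook this would be called as `join_annotations(annotations_labelled_training, images_annotations_labelled_training)`.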


## Labelled validation annotations

Showing the **annotations of the labelled validation images** from the annotation file [anno_val.json](/content/annotation/annotation/anno_val.json).

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_val.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
annotations_labelled_validation = pd.json_normalize(data, record_path =['annotations'])

annotations_labelled_validation

Unnamed: 0,image_id,id,category_id
0,0,0,0
1,1,1,0
2,2,2,0
3,3,3,0
4,4,4,0
...,...,...,...
1995,1995,1995,199
1996,1996,1996,199
1997,1997,1997,199
1998,1998,1998,199


Showing the **image metadata of the labelled validation images** from the annotation file [anno_val.json](/content/annotation/annotation/anno_val.json).

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_val.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
images_annotations_labelled_validation = pd.json_normalize(data, record_path =['images'])

images_annotations_labelled_validation

Unnamed: 0,file_name,width,height,id
0,trainval_images/0/30.jpg,500,278,0
1,trainval_images/0/31.jpg,500,333,1
2,trainval_images/0/32.jpg,375,500,2
3,trainval_images/0/33.jpg,500,375,3
4,trainval_images/0/34.jpg,500,375,4
...,...,...,...,...
1995,trainval_images/199/11.jpg,500,375,1995
1996,trainval_images/199/12.jpg,500,333,1996
1997,trainval_images/199/13.jpg,500,333,1997
1998,trainval_images/199/14.jpg,500,333,1998


Concatenating the annotation and image DataFrames column-wise (rows are paired by position):

In [None]:
validation_labelled = pd.concat([annotations_labelled_validation , images_annotations_labelled_validation.drop(['id'], axis = 1)], axis = 1)
validation_labelled

Unnamed: 0,image_id,id,category_id,file_name,width,height
0,0,0,0,trainval_images/0/30.jpg,500,278
1,1,1,0,trainval_images/0/31.jpg,500,333
2,2,2,0,trainval_images/0/32.jpg,375,500
3,3,3,0,trainval_images/0/33.jpg,500,375
4,4,4,0,trainval_images/0/34.jpg,500,375
...,...,...,...,...,...,...
1995,1995,1995,199,trainval_images/199/11.jpg,500,375
1996,1996,1996,199,trainval_images/199/12.jpg,500,333
1997,1997,1997,199,trainval_images/199/13.jpg,500,333
1998,1998,1998,199,trainval_images/199/14.jpg,500,333


## Unlabelled training in-class annotations

Showing the **annotations of the unlabelled in-class images** from the annotation file [anno_u_train_in.json](/content/annotation/annotation/anno_u_train_in.json).

**NOTE - Since these images are unlabelled, every image has `category_id` set to -1.**

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_u_train_in.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
annotations_unlabelled_inclass_training = pd.json_normalize(data, record_path =['annotations'])

annotations_unlabelled_inclass_training

Unnamed: 0,image_id,id,category_id
0,0,0,-1
1,1,1,-1
2,2,2,-1
3,3,3,-1
4,4,4,-1
...,...,...,...
26635,26635,26635,-1
26636,26636,26636,-1
26637,26637,26637,-1
26638,26638,26638,-1


Showing the **image metadata of the unlabelled in-class images** from the annotation file [anno_u_train_in.json](/content/annotation/annotation/anno_u_train_in.json).

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_u_train_in.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
images_annotations_unlabelled_inclass_training = pd.json_normalize(data, record_path =['images'])

images_annotations_unlabelled_inclass_training

Unnamed: 0,file_name,width,height,id
0,u_train_in/0.jpg,375,500,0
1,u_train_in/1.jpg,375,500,1
2,u_train_in/2.jpg,375,500,2
3,u_train_in/3.jpg,380,245,3
4,u_train_in/4.jpg,500,333,4
...,...,...,...,...
26635,u_train_in/26635.jpg,500,375,26635
26636,u_train_in/26636.jpg,500,281,26636
26637,u_train_in/26637.jpg,500,394,26637
26638,u_train_in/26638.jpg,500,333,26638


Concatenating the annotation and image DataFrames column-wise (rows are paired by position):

In [None]:
training_unlabelled_inclass = pd.concat([annotations_unlabelled_inclass_training , images_annotations_unlabelled_inclass_training.drop(['id'], axis = 1)], axis = 1)
training_unlabelled_inclass

Unnamed: 0,image_id,id,category_id,file_name,width,height
0,0,0,-1,u_train_in/0.jpg,375,500
1,1,1,-1,u_train_in/1.jpg,375,500
2,2,2,-1,u_train_in/2.jpg,375,500
3,3,3,-1,u_train_in/3.jpg,380,245
4,4,4,-1,u_train_in/4.jpg,500,333
...,...,...,...,...,...,...
26635,26635,26635,-1,u_train_in/26635.jpg,500,375
26636,26636,26636,-1,u_train_in/26636.jpg,500,281
26637,26637,26637,-1,u_train_in/26637.jpg,500,394
26638,26638,26638,-1,u_train_in/26638.jpg,500,333


## Unlabelled training out-of-class annotations

Showing the **annotations of the unlabelled out-of-class images** from the annotation file [anno_u_train_out.json](/content/annotation/annotation/anno_u_train_out.json).

**NOTE - Since these images are unlabelled, every image has `category_id` set to -1.**

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_u_train_out.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
annotations_unlabelled_outclass_training = pd.json_normalize(data, record_path =['annotations'])

annotations_unlabelled_outclass_training

Unnamed: 0,image_id,id,category_id
0,0,0,-1
1,1,1,-1
2,2,2,-1
3,3,3,-1
4,4,4,-1
...,...,...,...
122203,122203,122203,-1
122204,122204,122204,-1
122205,122205,122205,-1
122206,122206,122206,-1


Showing the **image metadata of the unlabelled out-of-class images** from the annotation file [anno_u_train_out.json](/content/annotation/annotation/anno_u_train_out.json).

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_u_train_out.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
images_annotations_unlabelled_outclass_training = pd.json_normalize(data, record_path =['images'])

images_annotations_unlabelled_outclass_training

Unnamed: 0,file_name,width,height,id
0,u_train_out/0.jpg,500,377,0
1,u_train_out/1.jpg,500,333,1
2,u_train_out/2.jpg,500,331,2
3,u_train_out/3.jpg,500,333,3
4,u_train_out/4.jpg,375,500,4
...,...,...,...,...
122203,u_train_out/122203.jpg,333,500,122203
122204,u_train_out/122204.jpg,500,333,122204
122205,u_train_out/122205.jpg,500,337,122205
122206,u_train_out/122206.jpg,500,298,122206


Concatenating the annotation and image DataFrames column-wise (rows are paired by position):

In [None]:
training_unlabelled_outclass = pd.concat([annotations_unlabelled_outclass_training , images_annotations_unlabelled_outclass_training.drop(['id'], axis = 1)], axis = 1)
training_unlabelled_outclass

Unnamed: 0,image_id,id,category_id,file_name,width,height
0,0,0,-1,u_train_out/0.jpg,500,377
1,1,1,-1,u_train_out/1.jpg,500,333
2,2,2,-1,u_train_out/2.jpg,500,331
3,3,3,-1,u_train_out/3.jpg,500,333
4,4,4,-1,u_train_out/4.jpg,375,500
...,...,...,...,...,...,...
122203,122203,122203,-1,u_train_out/122203.jpg,333,500
122204,122204,122204,-1,u_train_out/122204.jpg,500,333
122205,122205,122205,-1,u_train_out/122205.jpg,500,337
122206,122206,122206,-1,u_train_out/122206.jpg,500,298


## Test annotations

Showing the **image metadata of the test images** from the annotation file [anno_test.json](/content/annotation/annotation/anno_test.json).

**NOTE - Since this is the test set, the annotation file gives no annotations (labels) for these images: they are what we have to predict.**

In [None]:
import json
import pandas as pd
from pandas import json_normalize

file = ANNOTATION_DIR + 'annotation/anno_test.json'

# load data using Python JSON module
with open(file,'r') as f:
    data = json.loads(f.read())
# Flatten data
images_annotations_test = pd.json_normalize(data, record_path =['images'])

images_annotations_test

Unnamed: 0,file_name,width,height,id
0,test/0.jpg,500,375,0
1,test/1.jpg,500,375,1
2,test/2.jpg,500,474,2
3,test/3.jpg,500,375,3
4,test/4.jpg,500,295,4
...,...,...,...,...
7995,test/7995.jpg,500,287,7995
7996,test/7996.jpg,500,333,7996
7997,test/7997.jpg,500,375,7997
7998,test/7998.jpg,500,333,7998


## Dataset Split into training and validation 

Splitting the [trainval_images](/content/trainval_images/trainval_images) directory (which contains both the training and the validation images) into separate training and validation directories, according to the **file_name** column of the concatenated **training_labelled** and **validation_labelled** DataFrames.
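Before copying, it is worth confirming that no image is listed in both splits; otherwise validation accuracy would be inflated. A minimal check, shown on toy lists; in the notebook one would pass `training_labelled['file_name']` and `validation_labelled['file_name']`.

```python
def split_overlap(train_files, val_files):
    """Return the sorted file names that appear in both splits (an empty list means disjoint)."""
    return sorted(set(train_files) & set(val_files))


train_files = ['trainval_images/0/0.jpg', 'trainval_images/0/1.jpg']
val_files = ['trainval_images/0/30.jpg', 'trainval_images/0/31.jpg']
print(split_overlap(train_files, val_files))  # []
```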

### Training Split

Creating a separate directory for the training images named **train**, then copying the training image files from [trainval_images](/content/trainval_images/trainval_images) into the [train](/content/train/train) folder.

In [None]:
!mkdir train
!mkdir train/train

In [None]:
import os

TRAIN_DIR = '/content/train/train/'

# Create one sub-directory per class id (0-199), e.g. /content/train/train/0/
for category_id in range(200):
    os.mkdir(os.path.join(TRAIN_DIR, str(category_id)))

In [None]:
source_path = '/content/trainval_images/trainval_images/'
destination_path = '/content/train/train/'

In [None]:
import os
import shutil

# Strip the 'trainval_images/' prefix so each path becomes '<category_id>/<n>.jpg'
training = training_labelled['file_name'].str.replace('trainval_images/', '', regex=False)

# Copy every training image into the matching class folder under /content/train/train/
for filename in training:
  source = os.path.join(source_path, filename)
  destination = os.path.join(destination_path, filename)
  shutil.copy(source, destination)

### Validation Split

Creating a separate directory for the validation images named **val**, then copying the validation image files from [trainval_images](/content/trainval_images/trainval_images) into the [val](/content/val/val) folder.

In [None]:
!mkdir val
!mkdir val/val

In [None]:
import os

VAL_DIR = '/content/val/val/'

# Create one sub-directory per class id (0-199), e.g. /content/val/val/0/
for category_id in range(200):
    os.mkdir(os.path.join(VAL_DIR, str(category_id)))

In [None]:
source_path = '/content/trainval_images/trainval_images/'
destination_path = '/content/val/val/'

In [None]:
import os
import shutil

# Strip the 'trainval_images/' prefix so each path becomes '<category_id>/<n>.jpg'
validation = validation_labelled['file_name'].str.replace('trainval_images/', '', regex=False)

# Copy every validation image into the matching class folder under /content/val/val/
for filename in validation:
  source = os.path.join(source_path, filename)
  destination = os.path.join(destination_path, filename)
  shutil.copy(source, destination)
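After the copy loops, a quick sanity check is to count the files that landed in each split directory and compare them with the DataFrame lengths (3,959 training and 2,000 validation rows according to the outputs above). `count_images` is a hypothetical helper that simply walks the directory tree.

```python
import os


def count_images(root, extensions=('.jpg', '.jpeg', '.png')):
    """Hypothetical helper: count image files under root, returning (total, {class_dir_name: count})."""
    per_class = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        n = sum(1 for name in filenames if name.lower().endswith(extensions))
        if n:
            per_class[os.path.basename(dirpath)] = n
    return sum(per_class.values()), per_class


# In the notebook (paths as defined above):
# total, per_class = count_images('/content/train/train/')
# assert total == len(training_labelled)
```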

## Data Preprocessing and Visualization

[Keras ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator)

In [None]:
# Import necessary libraries

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import applications
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.optimizers import Adam

In [None]:
# Prepare the training DataFrame for `flow_from_dataframe`: absolute image paths in `file_name`,
# and string labels in `category_id` (with class_mode='categorical', Keras expects y_col values to be strings)

training_labelled['file_name'] = '/content/train/train/' + training_labelled['file_name'].str.replace('trainval_images/', '', regex=False)
training_labelled['category_id'] = training_labelled['category_id'].astype(str)
training_dataframe = training_labelled.drop(['image_id', 'id', 'width', 'height'], axis = 1)
training_dataframe

Unnamed: 0,category_id,file_name
0,0,/content/train/train/0/0.jpg
1,0,/content/train/train/0/1.jpg
2,0,/content/train/train/0/2.jpg
3,0,/content/train/train/0/3.jpg
4,0,/content/train/train/0/4.jpg
...,...,...
3954,199,/content/train/train/199/1.jpg
3955,199,/content/train/train/199/2.jpg
3956,199,/content/train/train/199/3.jpg
3957,199,/content/train/train/199/4.jpg


In [None]:
# Prepare the validation DataFrame for `flow_from_dataframe`: absolute image paths in `file_name`,
# and string labels in `category_id`

validation_labelled['file_name'] = '/content/val/val/' + validation_labelled['file_name'].str.replace('trainval_images/', '', regex=False)
validation_labelled['category_id'] = validation_labelled['category_id'].astype(str)
validation_dataframe = validation_labelled.drop(['image_id', 'id', 'width', 'height'], axis = 1)
validation_dataframe

Unnamed: 0,category_id,file_name
0,0,/content/val/val/0/30.jpg
1,0,/content/val/val/0/31.jpg
2,0,/content/val/val/0/32.jpg
3,0,/content/val/val/0/33.jpg
4,0,/content/val/val/0/34.jpg
...,...,...
1995,199,/content/val/val/199/11.jpg
1996,199,/content/val/val/199/12.jpg
1997,199,/content/val/val/199/13.jpg
1998,199,/content/val/val/199/14.jpg


In [None]:
validation_dataframe.dtypes

category_id    object
file_name      object
dtype: object

In [None]:
train_path = '/content/train/train/'
val_path = '/content/val/val/'

In [None]:
# ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                zoom_range=0.2,
                                horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)
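`rescale=1./255` multiplies every pixel value by 1/255, mapping 8-bit intensities from [0, 255] into [0, 1]. `zoom_range` and `horizontal_flip` are random augmentations, so they are applied to the training generator only, while the validation generator just rescales. The NumPy lines below sketch the rescaling arithmetic on a made-up 2x2 grayscale patch.

```python
import numpy as np

# Made-up 2x2 single-channel patch of 8-bit pixel values
patch = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# The effect of rescale=1./255: multiply each value by 1/255 (as float32)
rescaled = patch.astype(np.float32) * (1. / 255)

print(rescaled.min(), rescaled.max())
```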

In [None]:
# flow_from_dataframe

TARGET_SIZE = (256, 256)
BATCH_SIZE = 32
CLASS_MODE = 'categorical'  # 'binary' for two classes; 'categorical' for more than two

# Connecting the ImageDataGenerator objects to our dataset
# train_generator = train_datagen.flow_from_directory(train_path,
#                                                     target_size=TARGET_SIZE,
#                                                     batch_size=BATCH_SIZE,
#                                                     class_mode=CLASS_MODE)
#
# validation_generator = validation_datagen.flow_from_directory(val_path,
#                                                               target_size=TARGET_SIZE,
#                                                               batch_size=BATCH_SIZE,
#                                                               class_mode=CLASS_MODE)

train_generator = train_datagen.flow_from_dataframe(
    training_dataframe,
    x_col = 'file_name',   # absolute paths, so no `directory` argument is needed
    y_col = 'category_id',
    target_size = TARGET_SIZE,
    class_mode = CLASS_MODE,
    batch_size = BATCH_SIZE,
    shuffle = True   # the DataFrame is sorted by class, so shuffle to mix classes within each batch
)

validation_generator = validation_datagen.flow_from_dataframe(
    validation_dataframe,
    x_col = 'file_name',
    y_col = 'category_id',
    target_size = TARGET_SIZE,
    class_mode = CLASS_MODE,
    batch_size = BATCH_SIZE,
    shuffle = False  # fixed order keeps predictions aligned with validation_dataframe rows
)

Found 3959 validated image filenames belonging to 200 classes.
Found 2000 validated image filenames belonging to 200 classes.


In [None]:
train_generator.class_indices

{'0': 0,
 '1': 1,
 '10': 2,
 '100': 3,
 '101': 4,
 '102': 5,
 '103': 6,
 '104': 7,
 '105': 8,
 '106': 9,
 '107': 10,
 '108': 11,
 '109': 12,
 '11': 13,
 '110': 14,
 '111': 15,
 '112': 16,
 '113': 17,
 '114': 18,
 '115': 19,
 '116': 20,
 '117': 21,
 '118': 22,
 '119': 23,
 '12': 24,
 '120': 25,
 '121': 26,
 '122': 27,
 '123': 28,
 '124': 29,
 '125': 30,
 '126': 31,
 '127': 32,
 '128': 33,
 '129': 34,
 '13': 35,
 '130': 36,
 '131': 37,
 '132': 38,
 '133': 39,
 '134': 40,
 '135': 41,
 '136': 42,
 '137': 43,
 '138': 44,
 '139': 45,
 '14': 46,
 '140': 47,
 '141': 48,
 '142': 49,
 '143': 50,
 '144': 51,
 '145': 52,
 '146': 53,
 '147': 54,
 '148': 55,
 '149': 56,
 '15': 57,
 '150': 58,
 '151': 59,
 '152': 60,
 '153': 61,
 '154': 62,
 '155': 63,
 '156': 64,
 '157': 65,
 '158': 66,
 '159': 67,
 '16': 68,
 '160': 69,
 '161': 70,
 '162': 71,
 '163': 72,
 '164': 73,
 '165': 74,
 '166': 75,
 '167': 76,
 '168': 77,
 '169': 78,
 '17': 79,
 '170': 80,
 '171': 81,
 '172': 82,
 '173': 83,
 '174': 84,
 '
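`class_indices` follows the lexicographic order of the string labels ('0', '1', '10', '100', ...), so a model output index is generally not the competition's `category_id` (index 2 is category '10', index 13 is category '11'). To turn a predicted index back into a `category_id`, invert the mapping. For illustration, the sketch below rebuilds the same ordering by sorting the string labels, which reproduces the output above; in the notebook one would invert `train_generator.class_indices` directly. The softmax vector is made up.

```python
import numpy as np

# Rebuild the mapping shown above: string labels sorted lexicographically
class_indices = {label: index for index, label in enumerate(sorted(str(c) for c in range(200)))}

# Invert it: model output index -> competition category_id (as int)
index_to_category = {index: int(label) for label, index in class_indices.items()}

# Made-up softmax output for one image, peaking at output index 2
probs = np.zeros(200)
probs[2] = 1.0
predicted_category = index_to_category[int(np.argmax(probs))]
print(predicted_category)  # 10
```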

In [None]:
train_generator.image_shape

(256, 256, 3)

In [None]:
validation_generator.image_shape

(256, 256, 3)

In [None]:
# import tensorflow
# from tensorflow.keras.utils import to_categorical

# y_train = training_labelled['category_id']
# y_train = to_categorical(y_train) 
# print(y_train)

# y_val = validation_labelled['category_id']
# y_val = to_categorical(y_val) 
# print(y_val)

## Build Model

In [None]:
# base_model = tf.keras.applications.InceptionResNetV2(
#                      include_top = False,
#                      weights = 'imagenet',
#                      input_shape=(256, 256, 3)
#                      )
  
# base_model.trainable = False

# model = tf.keras.Sequential([
#                              base_model,
#                              tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu'),
#                              tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#                              tf.keras.layers.Dropout(0.2),
#                              tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
#                              tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#                              tf.keras.layers.Dropout(0.2),
#                              tf.keras.layers.Flatten(),
#                              tf.keras.layers.Dense(128, activation='relu'),
#                              tf.keras.layers.Dropout(0.2),
#                              tf.keras.layers.Dense(200, activation='softmax')
# ])

model = Sequential()

model.add(Conv2D(32, (5,5), padding='same', activation='relu', input_shape=(256, 256, 3)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

model.add(Conv2D(64, (5,5), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(200, activation='softmax'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_205 (Conv2D)         (None, 256, 256, 32)      2432      
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 128, 128, 32)     0         
 2D)                                                             
                                                                 
 dropout_3 (Dropout)         (None, 128, 128, 32)      0         
                                                                 
 conv2d_206 (Conv2D)         (None, 128, 128, 64)      51264     
                                                                 
 max_pooling2d_7 (MaxPooling  (None, 64, 64, 64)       0         
 2D)                                                             
                                                                 
 dropout_4 (Dropout)         (None, 64, 64, 64)       
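The parameter counts in the summary follow from two formulas: a Conv2D layer has `k_h * k_w * C_in * C_out` weights plus `C_out` biases, and a Dense layer has `n_in * n_out` weights plus `n_out` biases (pooling, dropout and flatten layers have none). The summary output above is cut off, so the remaining layers are worked out by hand here. Both 2x2 poolings halve the spatial size (256 → 128 → 64), so `Flatten` produces 64 * 64 * 64 = 262,144 features, and the first Dense layer holds almost all of the model's roughly 67 million parameters.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one kernel per (input channel, filter) pair, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out


conv1 = conv2d_params(5, 5, 3, 32)    # 2,432 (matches the summary)
conv2 = conv2d_params(5, 5, 32, 64)   # 51,264 (matches the summary)
flat = 64 * 64 * 64                   # 256x256 input halved twice by MaxPooling2D, 64 channels
dense1 = dense_params(flat, 256)      # Dense(256)
dense2 = dense_params(256, 200)       # Dense(200) softmax output

total = conv1 + conv2 + dense1 + dense2
print(f'{dense1:,} of {total:,} parameters are in the first Dense layer')
```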

## Compile and Train the Model

In [None]:
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
history = model.fit(
    train_generator,
    epochs = 10,
    validation_data = validation_generator,
    callbacks = [
        # Stop training if val_accuracy doesn't improve for 5 consecutive epochs
        tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5),
    ]
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
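`EarlyStopping(monitor='val_accuracy', patience=5)` tracks the best `val_accuracy` seen so far and stops once 5 consecutive epochs pass without a strict improvement. The pure-Python function below is a simplified sketch of that rule (it ignores options such as `min_delta`, `baseline` and `restore_best_weights`), handy for reasoning about when a run will end. The score lists are made up.

```python
def early_stopping_epoch(scores, patience):
    """Return the 1-based epoch after which training stops (len(scores) if it never triggers)."""
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:       # a strict improvement resets the counter
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(scores)


# Made-up val_accuracy curves for a 10-epoch run
print(early_stopping_epoch([0.02, 0.01, 0.01, 0.02, 0.01, 0.01, 0.03, 0.03, 0.03, 0.03], patience=5))  # 6
print(early_stopping_epoch([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10], patience=5))  # 10
```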


## Evaluation

## Testing