### References:
1. [Downloading Datasets into Google Drive via Google Colab](https://towardsdatascience.com/downloading-datasets-into-google-drive-via-google-colab-bcb1b30b0166)
2. [Run Kaggle Kernel on Google Colab](https://medium.com/@erdemalpkaya/run-kaggle-kernel-on-google-colab-1a71803460a9)



# Step 1: Set Up the Prerequisites for Colab

## 1-1 Acquire Google Drive Authorization

In [1]:
from google.colab import drive
drive.mount( '/content/gdrive' )

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive


# Step 2: Download Kaggle Dataset 

## 2-1 Upload the Kaggle API Token ( kaggle.json, needed for downloading the Kaggle dataset )

In [2]:
from google.colab import files
uploaded = files.upload()

Saving kaggle.json to kaggle.json
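
Before moving on, it can help to confirm that the uploaded file really is a Kaggle API token. The sketch below is one possible check, not part of the original notebook: it assumes the token is a JSON object with `username` and `key` fields, and the helper name `check_kaggle_token` is hypothetical. The demo runs on a throwaway file with placeholder credentials.

```python
import json
import tempfile
from pathlib import Path


def check_kaggle_token(path):
    """Return the username stored in a Kaggle API token file, or raise ValueError."""
    with open(path, encoding="utf-8") as fh:
        token = json.load(fh)
    # Assumption: a valid token carries non-empty "username" and "key" fields.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return token["username"]


# Demo on a temporary file with placeholder credentials (not a real token).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "kaggle.json"
    demo_path.write_text(json.dumps({"username": "demo_user", "key": "0" * 32}))
    demo_username = check_kaggle_token(demo_path)

print(demo_username)
```

In Colab the same helper could be pointed at the uploaded `kaggle.json` in the current directory.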


## 2-2 Set Up the Kaggle Package ( so that it can find kaggle.json )

In [3]:
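!pip install -q kaggle
# the Kaggle CLI looks for the token in ~/.kaggle/kaggle.json
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
# restrict the token to owner read/write so other users cannot read the API key
!chmod 600 ~/.kaggle/kaggle.json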

kaggle.json
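
The shell commands above can also be written in plain Python, which makes each step explicit. This is a minimal sketch under stated assumptions: the function name `install_kaggle_token` and its `home_dir` parameter are hypothetical, and the demo uses a temporary directory as a stand-in for the real home directory so that nothing on the machine is touched.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_token(token_path, home_dir):
    """Copy the token to <home_dir>/.kaggle/kaggle.json and make it owner-only."""
    kaggle_dir = Path(home_dir) / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)   # mkdir -p ~/.kaggle
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_path, target)             # cp kaggle.json ~/.kaggle/
    os.chmod(target, 0o600)                         # chmod 600 ~/.kaggle/kaggle.json
    return target


# Demo against a temporary "home" directory with a placeholder token.
with tempfile.TemporaryDirectory() as fake_home:
    src = Path(fake_home) / "kaggle.json"
    src.write_text('{"username": "demo_user", "key": "placeholder"}')
    installed = install_kaggle_token(src, fake_home)
    installed_exists = installed.exists()
    installed_mode = stat.S_IMODE(installed.stat().st_mode)

print(installed_exists, oct(installed_mode))
```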


## 2-3 Download Kaggle Dataset 

In [0]:
# Download the data from Kaggle
# ( the API command can be copied from the competition's data page on Kaggle )

!kaggle competitions download -c ml-marathon-final

Downloading sample_submission.csv to /content/gdrive/My Drive/ML_100_Marathon_Final_Exam
  0% 0.00/9.12k [00:00<?, ?B/s]
100% 9.12k/9.12k [00:00<00:00, 1.23MB/s]
Downloading data.zip to /content/gdrive/My Drive/ML_100_Marathon_Final_Exam
 98% 77.0M/78.3M [00:00<00:00, 95.0MB/s]
100% 78.3M/78.3M [00:00<00:00, 101MB/s] 


In [0]:
# Unzip data.zip
'''
Syntax for unzipping an archive into a folder:
!unzip -q file_name.zip -d folder_name
'''

!unzip -q data.zip   # unzip data.zip into the current directory
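
For readers who prefer to stay in Python, the standard-library `zipfile` module can do the same job as `unzip -q ... -d ...`. The sketch below is illustrative only: the helper name `unzip_to` is hypothetical, and the demo builds a tiny archive on the fly as a stand-in for `data.zip`.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to(archive_path, dest_dir):
    """Extract every member of archive_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Demo with a small archive created in a temporary folder (stand-in for data.zip).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("train/cats/cat.0.jpg", b"placeholder bytes")
    extracted_names = unzip_to(demo_zip, Path(tmp) / "out")
    extracted_ok = (Path(tmp) / "out" / "train" / "cats" / "cat.0.jpg").is_file()

print(extracted_names)
```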

# Step 3: Prepare Training and Validation Data

In [0]:
import os
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/' )

In [0]:
# Copy all the images into a single folder, Train_dogs_and_cats
import os
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/kaggle_dogcat/train' )

# create a new folder in  ..../kaggle_dogcat/train
!mkdir -p Train_dogs_and_cats

# copy data from train/dogs and train/cats to Train_dogs_and_cats
# ( each ? matches exactly one character, so the five patterns cover ids of 1 to 5 characters )
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/kaggle_dogcat/train/dogs' )
!cp dog.?.jpg dog.??.jpg dog.???.jpg dog.????.jpg dog.?????.jpg ../Train_dogs_and_cats
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/kaggle_dogcat/train/cats' )
!cp cat.?.jpg cat.??.jpg cat.???.jpg cat.????.jpg cat.?????.jpg ../Train_dogs_and_cats

In [0]:
'''
Move the folder Train_dogs_and_cats
'''
import shutil

# Step 1: path of the original folder
oldpos_test = os.path.abspath( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/kaggle_dogcat/train/Train_dogs_and_cats' )

# Step 2: directory to move the folder into
newpos_test = os.path.abspath( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/' )

# Step 3: move the folder
shutil.move( oldpos_test, newpos_test )
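
The move above relies on a detail of `shutil.move`: when the destination is an existing directory, the source folder is placed inside it and keeps its name. The following sketch demonstrates that behaviour on temporary folders; the folder names mirror the notebook, but the paths themselves are made up for the demo.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-ins for .../kaggle_dogcat/train/Train_dogs_and_cats and the project folder.
    src_dir = os.path.join(root, "train", "Train_dogs_and_cats")
    dst_parent = os.path.join(root, "project")
    os.makedirs(src_dir)
    os.makedirs(dst_parent)
    open(os.path.join(src_dir, "cat.0.jpg"), "w").close()

    # The destination already exists, so the folder lands inside it.
    moved_to = shutil.move(src_dir, dst_parent)

    moved_name = os.path.basename(moved_to)
    file_moved = os.path.isfile(
        os.path.join(dst_parent, "Train_dogs_and_cats", "cat.0.jpg")
    )
    source_gone = not os.path.exists(src_dir)

print(moved_name, file_moved, source_gone)
```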

In [0]:
# Create the directory structure
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/' )

!mkdir -p Sample/Train/Class_1_Cats
!mkdir -p Sample/Train/Class_0_Dogs
!mkdir -p Sample/Valid/Class_1_Cats
!mkdir -p Sample/Valid/Class_0_Dogs

In [0]:
# Copy the images, splitting them into training and validation sets
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/Train_dogs_and_cats' )

# Training Data ( ids with 1 to 4 characters )
!cp cat.?.jpg cat.??.jpg cat.???.jpg cat.????.jpg ../Sample/Train/Class_1_Cats/
!cp dog.?.jpg dog.??.jpg dog.???.jpg dog.????.jpg ../Sample/Train/Class_0_Dogs/

# Validation Data ( ids with 5 characters )
!cp cat.?????.jpg ../Sample/Valid/Class_1_Cats/
!cp dog.?????.jpg ../Sample/Valid/Class_0_Dogs/
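
The split rule encoded in the `cp` patterns is: ids with one to four characters go to the training set, ids with five characters go to the validation set. A Python version of that rule might look like the sketch below. It is an illustration rather than the notebook's method: the function name `split_by_id_length` is hypothetical, it assumes the ids are purely numeric, and the demo runs on empty placeholder files in a temporary folder.

```python
import re
import shutil
import tempfile
from pathlib import Path

# cat.N.jpg / dog.N.jpg, where N has 1 to 5 digits (the range the shell patterns cover)
NAME_PATTERN = re.compile(r"^(cat|dog)\.(\d{1,5})\.jpg$")
CLASS_DIRS = {"cat": "Class_1_Cats", "dog": "Class_0_Dogs"}


def split_by_id_length(source_dir, sample_dir):
    """Copy images into Train (1-4 digit ids) or Valid (5 digit ids)."""
    counts = {"Train": 0, "Valid": 0}
    for path in sorted(Path(source_dir).iterdir()):
        match = NAME_PATTERN.match(path.name)
        if match is None:
            continue  # not one of the expected image names
        label, image_id = match.groups()
        subset = "Valid" if len(image_id) == 5 else "Train"
        target_dir = Path(sample_dir) / subset / CLASS_DIRS[label]
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, target_dir / path.name)
        counts[subset] += 1
    return counts


# Demo on empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "Train_dogs_and_cats"
    source.mkdir()
    for name in ["cat.7.jpg", "cat.1234.jpg", "cat.10000.jpg",
                 "dog.42.jpg", "dog.12000.jpg", "notes.txt"]:
        (source / name).touch()
    split_counts = split_by_id_length(source, Path(tmp) / "Sample")
    valid_cats = sorted(
        p.name for p in (Path(tmp) / "Sample" / "Valid" / "Class_1_Cats").iterdir()
    )

print(split_counts)
print(valid_cats)
```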

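# Step 4: Train the Model ( with Keras )

Reference:  
[Keras 以 ResNet-50 預訓練模型建立狗與貓判識程式](https://blog.gtwang.org/programming/keras-resnet-50-pre-trained-model-build-dogs-cats-image-classification-system/) ( in Chinese; roughly, "Building a dog and cat classifier in Keras with a pre-trained ResNet-50 model" )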

In [10]:
import sys
import pandas as pd
import numpy as np
import os

# import from the public tensorflow.keras namespace rather than the private tensorflow.python one
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.optimizers import Adam

from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

%pylab inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

Populating the interactive namespace from numpy and matplotlib


In [6]:
# data path (where your data is)
DATASET_PATH  = '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/Sample'

# model settings
IMAGE_SIZE = ( 224, 224 )
NUM_CLASSES = 2
BATCH_SIZE = 8
FREEZE_LAYERS = 2
NUM_EPOCHS = 40

# your output model
WEIGHTS_FINAL = 'model_resnet50_v2.h5'

# data augmentation 
train_datagen = ImageDataGenerator( rotation_range = 40,
                                    width_shift_range = 0.2,
                                    height_shift_range = 0.2,
                                    shear_range = 0.2,
                                    zoom_range = 0.2,
                                    channel_shift_range = 10,
                                    horizontal_flip = True,
                                    fill_mode = 'nearest'
                                    )

train_batches = train_datagen.flow_from_directory( DATASET_PATH + '/Train',
                                                   target_size = IMAGE_SIZE,
                                                   interpolation = 'bicubic',
                                                   class_mode = 'categorical',
                                                   shuffle = True,
                                                   batch_size = BATCH_SIZE
                                                   )

valid_datagen = ImageDataGenerator( )
valid_batches = valid_datagen.flow_from_directory( DATASET_PATH + '/Valid',
                                                   target_size = IMAGE_SIZE,
                                                   interpolation = 'bicubic',
                                                   class_mode = 'categorical',
                                                   shuffle = False,
                                                   batch_size = BATCH_SIZE
                                                   )

# print classes
for cls, idx in train_batches.class_indices.items( ):
    print( 'Class #{} = {}'.format( idx, cls ) )

# use the pre-trained ResNet50 as the base model
# and discard its fully connected top layers
net = ResNet50( include_top = False, 
                weights = 'imagenet', 
                input_tensor = None,
                input_shape = ( IMAGE_SIZE[0], IMAGE_SIZE[1], 3 ) 
                )

x = net.output
x = Flatten( )(x)

# add DropOut layer
x = Dropout( rate = 0.25 )(x)

# add Dense layer with softmax
output_layer = Dense( NUM_CLASSES, activation = 'softmax', name = 'softmax' )(x)


net_final = Model( inputs = net.input, outputs = output_layer )
# freeze the first FREEZE_LAYERS layers and leave all the others trainable
for layer in net_final.layers[ :FREEZE_LAYERS ]:
    layer.trainable = False
for layer in net_final.layers[ FREEZE_LAYERS: ]:
    layer.trainable = True


# Adam optimizer with a low learning rate for fine-tuning
net_final.compile( optimizer = Adam( lr = 1e-5 ),
                   loss = 'categorical_crossentropy', 
                   metrics=['accuracy']
                   )
    
  
# print the model ( summary( ) prints by itself and returns None, so no print( ) is needed )
net_final.summary( )


# training
net_final.fit_generator( train_batches,
                         steps_per_epoch = train_batches.samples // BATCH_SIZE,
                         validation_data = valid_batches,
                         validation_steps = valid_batches.samples // BATCH_SIZE,
                         epochs = NUM_EPOCHS
                         )

# save our model
net_final.save( WEIGHTS_FINAL )

Found 3629 images belonging to 2 classes.
Found 371 images belonging to 2 classes.
Class #0 = Class_0_Dogs
Class #1 = Class_1_Cats




Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5


W0816 05:06:23.835550 140225774573440 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 112, 112, 64) 256         conv1[0][0]                      
______________________________________________________________________________________________
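
The training cell computes `steps_per_epoch` and `validation_steps` with floor division, so the step counts do not include a final partial batch. The short calculation below works this out for the image counts reported in the output ( 3629 training and 371 validation images, batch size 8 ). The helper `steps_for` is a hypothetical name used only for this illustration; whether to round up instead is a judgment call that the notebook does not settle.

```python
import math

BATCH_SIZE = 8
TRAIN_SAMPLES = 3629   # "Found 3629 images" in the training output
VALID_SAMPLES = 371    # "Found 371 images" in the training output


def steps_for(samples, batch_size, include_partial_batch=False):
    """Number of batches per pass, with or without the final short batch."""
    if include_partial_batch:
        return math.ceil(samples / batch_size)
    return samples // batch_size


train_steps = steps_for(TRAIN_SAMPLES, BATCH_SIZE)
valid_steps = steps_for(VALID_SAMPLES, BATCH_SIZE)

# Images not covered by the floor-division step count
train_left_out = TRAIN_SAMPLES - train_steps * BATCH_SIZE
valid_left_out = VALID_SAMPLES - valid_steps * BATCH_SIZE

print(train_steps, train_left_out)
print(valid_steps, valid_left_out)
print(steps_for(TRAIN_SAMPLES, BATCH_SIZE, include_partial_batch=True))
```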

In [0]:
# from tensorflow.python.keras import backend as K
# from tensorflow.python.keras.models import Model
# from tensorflow.python.keras.layers import Flatten, Dense, Dropout
# from tensorflow.python.keras.applications.resnet50 import ResNet50
# from tensorflow.python.keras.optimizers import Adam
# from tensorflow.python.keras.preprocessing.image import ImageDataGenerator

# Change to the project folder
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam' )
!pwd

/content/gdrive/My Drive/ML_100_Marathon_Final_Exam


# Step 5: Predict ( using the model we have trained )

## Predict all the pictures ( loop through all the images in the folder )

In [11]:
# load the trained model
os.chdir( '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/Train_dogs_and_cats/' )
net = load_model( 'model_resnet50_v2.h5' )

# dogs = 0; cats = 1
cls_list = [ 'dogs', 'cats' ] 

# create a one-row dataframe for the submission; the placeholder 'ID' column is removed later
Submission = pd.DataFrame( )
Submission[ 'ID' ] = [ 'Prediction' ]

# predict all photos ( loop through the test folder )
TEST_DIR = '/content/gdrive/My Drive/ML_100_Marathon_Final_Exam/test/'
for f in os.listdir( TEST_DIR ):
    img = image.load_img( os.path.join( TEST_DIR, f ), target_size = ( 224, 224 ) )
    x = image.img_to_array( img )
    x = np.expand_dims( x, axis = 0 )
    pred = net.predict(x)[0]
    
    # collect the predicted cat probability, keyed by the 3-character image id
    Submission[ f[0:3] ] = [ pred[1] ]
    
    # print the classes from most to least probable
    top_inds = pred.argsort()[::-1]
    print( f )
    for i in top_inds:
        print('    {:.3f}  {}'.format( pred[i], cls_list[i] ) )

001.jpg
    1.000  dogs
    0.000  cats
000.jpg
    1.000  cats
    0.000  dogs
002.jpg
    1.000  cats
    0.000  dogs
220.jpg
    1.000  dogs
    0.000  cats
158.jpg
    1.000  dogs
    0.000  cats
205.jpg
    0.926  dogs
    0.074  cats
280.jpg
    1.000  cats
    0.000  dogs
290.jpg
    1.000  cats
    0.000  dogs
021.jpg
    1.000  cats
    0.000  dogs
335.jpg
    1.000  cats
    0.000  dogs
297.jpg
    1.000  cats
    0.000  dogs
145.jpg
    1.000  dogs
    0.000  cats
241.jpg
    0.762  dogs
    0.238  cats
256.jpg
    1.000  cats
    0.000  dogs
371.jpg
    1.000  dogs
    0.000  cats
224.jpg
    1.000  cats
    0.000  dogs
272.jpg
    1.000  dogs
    0.000  cats
242.jpg
    0.961  dogs
    0.039  cats
148.jpg
    1.000  dogs
    0.000  cats
084.jpg
    1.000  dogs
    0.000  cats
300.jpg
    1.000  cats
    0.000  dogs
155.jpg
    1.000  cats
    0.000  dogs
265.jpg
    1.000  dogs
    0.000  cats
332.jpg
    1.000  cats
    0.000  dogs
116.jpg
    1.000  cats
    0.000  dogs
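
The ranking step in the prediction loop sorts the class probabilities in descending order and prints each with its label. The sketch below reproduces that logic in plain Python, using the probabilities shown for `205.jpg` in the output. The helper `rank_classes` is a hypothetical name, and the plain list stands in for the NumPy array that `net.predict` returns.

```python
def rank_classes(pred, class_names):
    """Return (class name, probability) pairs, most probable first."""
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    return [(class_names[i], pred[i]) for i in order]


cls_list = ["dogs", "cats"]      # index 0 = dogs, index 1 = cats
pred_205 = [0.926, 0.074]        # probabilities printed for 205.jpg

ranked = rank_classes(pred_205, cls_list)
for name, prob in ranked:
    print("    {:.3f}  {}".format(prob, name))
```

The second entry of each prediction ( index 1, the cat probability ) is the value stored in the submission.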


In [0]:
# Turn the one-row wide dataframe ( one column per image ) into a two-column table
Submit_Data = Submission.loc[ 0, : ].to_frame( )
Submit_Data.reset_index( inplace = True )
Submit_Data.columns = [ 'ID', 'Predicted' ]
Submit_Data.sort_values( by = 'ID', inplace = True )

# the placeholder 'ID' row sorts after the numeric ids; delete that last row, and then reset the index
Submit_Data = Submit_Data[ : -1 ].reset_index( drop = True )

In [13]:
print( 'The first 5 predictions : ' '\n' )
print( Submit_Data.head( ) )

print( '\n' '======================' '\n' )

print( 'The last 5 predictions : ' '\n' )
print( Submit_Data.tail( ) )

The first 5 predictions : 

    ID    Predicted
0  000            1
1  001  2.95375e-07
2  002            1
3  003  1.87182e-11
4  004            1


The last 5 predictions : 

      ID   Predicted
395  395  7.0236e-11
396  396    0.311685
397  397           1
398  398           1
399  399           1


# Step 6: Download the Submission from Colab

In [0]:
from google.colab import files

Submit_Data.to_csv( 'Submission_v2.csv', header = True, index = False, encoding = 'utf-8' )
files.download( 'Submission_v2.csv' )
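
Before uploading the file to Kaggle, a quick sanity check of the rows can catch duplicated ids or values outside the probability range. The sketch below is one way to do that with the standard library only. The helper `validate_rows` is hypothetical, the sample rows are taken from the `head( )` output above, and the `ID` / `Predicted` header follows the column names used in this notebook.

```python
import csv
import io


def validate_rows(rows):
    """Check that ids are unique and probabilities lie in [0, 1]; return the row count."""
    seen = set()
    for image_id, prob in rows:
        if image_id in seen:
            raise ValueError("duplicate id: " + image_id)
        if not 0.0 <= float(prob) <= 1.0:
            raise ValueError("probability out of range for id " + image_id)
        seen.add(image_id)
    return len(seen)


# Sample rows copied from the first predictions shown above.
rows = [("000", 1.0), ("001", 2.95375e-07), ("002", 1.0)]
row_count = validate_rows(rows)

# Write the rows to an in-memory CSV with the same header as the submission file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ID", "Predicted"])
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```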