# Dogs vs. Cats Redux

In this tutorial, you will learn how to generate and submit predictions to a Kaggle competition:

[Dogs vs. Cats Redux: Kernels Edition](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition)
    
    

To start, you will need to download and unzip the competition data from Kaggle and ensure your directory structure looks like this:
```
utils/
    vgg16.py
    utils.py
lesson1/
    redux.ipynb
    data/
        redux/
            train/
                cat.437.jpg
                dog.9924.jpg
                cat.1029.jpg
                dog.4374.jpg
            test/
                231.jpg
                325.jpg
                1235.jpg
                9923.jpg
```
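
Before going further, it can help to confirm that the layout above is in place. The following is a minimal sketch, assuming Python 3 and that it is run from inside `lesson1/`; the helper name `missing_dirs` and the constant `EXPECTED_DIRS` are hypothetical and not part of the course code.

```python
import os

# Directories the tutorial expects, relative to lesson1/ (taken from the tree above).
EXPECTED_DIRS = ["data/redux/train", "data/redux/test"]

def missing_dirs(base_dir, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under base_dir."""
    return [d for d in expected if not os.path.isdir(os.path.join(base_dir, d))]
```

Calling `missing_dirs(".")` from `lesson1/` returns an empty list when both data directories exist; otherwise it lists the ones that still need to be created or unzipped.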

You can download the data files from the competition page [here](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/data) or you can download them from the command line using the [Kaggle CLI](https://github.com/floydwch/kaggle-cli).

Launch your notebook from inside the `lesson1` directory:
```
cd lesson1
jupyter notebook
```

In [None]:
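# Install the Kaggle CLI, configure it, and download the dataset
%pip install kaggle-cli
# kg is a shell command, not an IPython magic, so it needs the ! prefix
# username: your Kaggle account username (shown in your profile)
# password: your Kaggle login password
# competition: the competition name as it appears in the URL,
#              e.g. dogs-vs-cats-redux-kernels-edition
!kg config -g -u username -p password -c competition
# Change to the data directory first, then download the dataset
!kg download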

In [None]:
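# Unzip the archives and remove the zip files
# unzip is a shell command, so it needs the ! prefix; rm is a built-in IPython alias
!unzip -q train.zip
%rm train.zip
!unzip -q test.zip
%rm test.zip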
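
If `unzip` is not available on your system, the same step can be done with Python's standard `zipfile` module. This is a sketch under that assumption; the function name `extract_and_remove` is hypothetical.

```python
import os
import zipfile

def extract_and_remove(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir, then delete the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Only reached if extraction succeeded, so a failed unzip keeps the archive.
    os.remove(zip_path)
```

For example, `extract_and_remove("train.zip")` followed by `extract_and_remove("test.zip")` would mirror the cell above.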

In [19]:
# Change to the directory that contains the notebook (adjust this path to your own machine)
%cd /Users/Natsume/Downloads/fast-ai-pt1/deeplearning1/nbs/

/Users/Natsume/Downloads/fast-ai-pt1/deeplearning1/nbs


In [20]:
# Create references to the directories we will use over and over
import os, sys
current_dir = os.getcwd()
# Absolute path to the competition data; adjust this to your own machine
data_path = '/Users/Natsume/Downloads/data_for_all/dogscats'
LESSON_HOME_DIR = current_dir
DATA_HOME_DIR = data_path  # with the layout shown above, use current_dir + '/data/redux'
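
The cell above hard-codes an absolute path that only exists on one machine. One possible way to keep the notebook portable is to read the location from an environment variable and fall back to the `data/redux` layout otherwise. This is only a sketch: the variable name `REDUX_DATA_DIR` and the function `resolve_data_dir` are hypothetical.

```python
import os

def resolve_data_dir(env_var="REDUX_DATA_DIR", default_subdir=("data", "redux")):
    """Return the data directory from env_var, else <current dir>/data/redux."""
    override = os.environ.get(env_var)  # hypothetical variable name
    if override:
        return os.path.abspath(override)
    return os.path.join(os.getcwd(), *default_subdir)
```

With this helper, `DATA_HOME_DIR = resolve_data_dir()` would work unchanged on any machine that follows the directory structure from the start of the tutorial.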

In [21]:
#Allow relative imports to directories above lesson1/
sys.path.insert(1, os.path.join(sys.path[0], '..'))

#import modules
from utils import *
from vgg16 import Vgg16

#Instantiate plotting tool
#In Jupyter notebooks, you will need to run this command before doing any plotting
%matplotlib inline

## Action Plan
1. Create Validation and Sample sets
2. Rearrange image files into their respective directories 
3. Fine-tune and train the model
4. Generate predictions
5. Validate predictions
6. Submit predictions to Kaggle

## Create validation set and sample

In [22]:
#Create directories for sample experiment, validation, saving results, test/unknown/
%cd $DATA_HOME_DIR
%mkdir valid
%mkdir results
%mkdir -p sample/train
%mkdir -p sample/test
%mkdir -p sample/valid
%mkdir -p sample/results
%mkdir -p test/unknown  

/Users/Natsume/Downloads/data_for_all/dogscats
mkdir: valid: File exists
mkdir: results: File exists
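
The `File exists` messages above appear when the cell is run a second time; they are harmless. A pure-Python alternative that stays silent on re-runs is sketched below, assuming Python 3 (for `exist_ok`). The names `SUBDIRS` and `make_dirs` are hypothetical.

```python
import os

# Sub-directories created by the cell above, relative to DATA_HOME_DIR.
SUBDIRS = ["valid", "results", "sample/train", "sample/test",
           "sample/valid", "sample/results", "test/unknown"]

def make_dirs(base_dir, subdirs=SUBDIRS):
    """Create every sub-directory under base_dir; existing ones are left alone."""
    for sub in subdirs:
        os.makedirs(os.path.join(base_dir, sub), exist_ok=True)
```

Calling `make_dirs(DATA_HOME_DIR)` any number of times leaves the same directory tree in place.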


In [23]:
# Move 2000 randomly chosen images from train/ to valid/
# The $ prefix makes IPython expand the Python variable DATA_HOME_DIR inside the magic
%cd $DATA_HOME_DIR/train  
g = glob('*.jpg')
shuf = np.random.permutation(g)
for i in range(2000): os.rename(shuf[i], DATA_HOME_DIR+'/valid/' + shuf[i])

/Users/Natsume/Downloads/data_for_all/dogscats/train
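
The split above picks different files on every run. If you want the same validation set each time, a seeded variant can be sketched as follows, assuming Python 3. The function name `move_random_subset` and its `seed` parameter are hypothetical, and it uses the standard `random` module rather than `np.random.permutation`.

```python
import os
import random
import shutil

def move_random_subset(src_dir, dst_dir, n, seed=0):
    """Move n randomly chosen .jpg files from src_dir to dst_dir; return their names."""
    # Sort first so the selection depends only on the seed, not on listing order.
    names = sorted(f for f in os.listdir(src_dir) if f.endswith(".jpg"))
    if n > len(names):
        raise ValueError("asked for %d files but only %d available" % (n, len(names)))
    chosen = random.Random(seed).sample(names, n)
    os.makedirs(dst_dir, exist_ok=True)
    for name in chosen:
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen
```

For instance, `move_random_subset(DATA_HOME_DIR + '/train', DATA_HOME_DIR + '/valid', 2000)` would reproduce the step above with a fixed selection.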


In [24]:
# Count the entries in valid/ and train/
val_dir = DATA_HOME_DIR+'/valid/'
%cd $val_dir
%ls | wc -l   # number of entries (files and sub-directories)

train_dir = DATA_HOME_DIR+'/train/'
%cd $train_dir
%ls | wc -l   # number of entries (files and sub-directories)

/Users/Natsume/Downloads/data_for_all/dogscats/valid
    2000
/Users/Natsume/Downloads/data_for_all/dogscats/train
   21002
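
Note that `ls | wc -l` counts every directory entry, including sub-directories. To count only the image files, a small helper along these lines could be used; it is a sketch assuming Python 3.5 or later (for `os.scandir`), and the name `count_jpgs` is hypothetical.

```python
import os

def count_jpgs(directory):
    """Count .jpg files directly inside directory, ignoring sub-directories."""
    return sum(1 for entry in os.scandir(directory)
               if entry.is_file() and entry.name.lower().endswith(".jpg"))
```

`count_jpgs(train_dir)` and `count_jpgs(val_dir)` then report image counts that are not affected by any `cats/` or `dogs/` folders already present.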


In [25]:
from shutil import copyfile

In [26]:
# copy 200 images from train/ to sample/train
%cd $DATA_HOME_DIR/train  
g = glob('*.jpg')
shuf = np.random.permutation(g)
for i in range(200): copyfile(shuf[i], DATA_HOME_DIR+'/sample/train/' + shuf[i])

/Users/Natsume/Downloads/data_for_all/dogscats/train


In [27]:
%cd $DATA_HOME_DIR/valid

/Users/Natsume/Downloads/data_for_all/dogscats/valid


In [28]:
# copy 50 images from valid/ to sample/valid
g = glob('*.jpg')
shuf = np.random.permutation(g)
for i in range(50): copyfile(shuf[i], DATA_HOME_DIR+'/sample/valid/' + shuf[i])

In [29]:
# Count the entries in valid/ and sample/valid/
val_dir = DATA_HOME_DIR+'/valid/'
%cd $val_dir
%ls | wc -l   # number of entries (files and sub-directories)

sample_val_dir = DATA_HOME_DIR+'/sample/valid/'
%cd $sample_val_dir
%ls | wc -l   # number of entries (files and sub-directories)

/Users/Natsume/Downloads/data_for_all/dogscats/valid
    2000
/Users/Natsume/Downloads/data_for_all/dogscats/sample/valid
      50
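
With a sample as small as 50 images, a purely random draw can end up with noticeably more cats than dogs, or the reverse. As a variation that is not part of the original notebook, the sketch below copies the same number of images per class, using the `cat.` and `dog.` filename prefixes. It assumes Python 3, and the function name `copy_balanced_sample` is hypothetical.

```python
import os
import random
import shutil

def copy_balanced_sample(src_dir, dst_dir, per_class, classes=("cat", "dog"), seed=0):
    """Copy per_class random files for each filename prefix from src_dir to dst_dir."""
    rng = random.Random(seed)
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for cls in classes:
        # Files are named '<class>.<id>.jpg', e.g. cat.437.jpg
        names = sorted(f for f in os.listdir(src_dir)
                       if f.startswith(cls + ".") and f.endswith(".jpg"))
        for name in rng.sample(names, min(per_class, len(names))):
            shutil.copyfile(os.path.join(src_dir, name), os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

For example, `copy_balanced_sample(DATA_HOME_DIR + '/valid', DATA_HOME_DIR + '/sample/valid', 25)` would give 25 cats and 25 dogs, provided the files have not yet been moved into class folders.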


## Rearrange image files into their respective directories

In [30]:
#Divide cat/dog images into separate directories

%cd $DATA_HOME_DIR/sample/train
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

%cd $DATA_HOME_DIR/sample/valid
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

%cd $DATA_HOME_DIR/valid
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

%cd $DATA_HOME_DIR/train
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

/Users/Natsume/Downloads/data_for_all/dogscats/sample/train
/Users/Natsume/Downloads/data_for_all/dogscats/sample/valid
/Users/Natsume/Downloads/data_for_all/dogscats/valid
/Users/Natsume/Downloads/data_for_all/dogscats/train
mkdir: cats: File exists
mkdir: dogs: File exists
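
The cell above repeats the same four commands for each directory. The same rearrangement can be sketched as one Python function, assuming Python 3 and the `<class>.<id>.jpg` naming shown in the directory tree. The name `sort_into_class_dirs` is hypothetical.

```python
import os
import shutil

def sort_into_class_dirs(directory, classes=("cat", "dog")):
    """Move files named '<class>.<id>.jpg' into '<class>s/' sub-directories."""
    moved = {}
    for cls in classes:
        target = os.path.join(directory, cls + "s")  # cat -> cats/, dog -> dogs/
        os.makedirs(target, exist_ok=True)
        names = [f for f in os.listdir(directory)
                 if f.startswith(cls + ".") and f.endswith(".jpg")]
        for name in names:
            shutil.move(os.path.join(directory, name), os.path.join(target, name))
        moved[cls] = len(names)
    return moved
```

Looping over `['sample/train', 'sample/valid', 'valid', 'train']` and calling `sort_into_class_dirs(os.path.join(DATA_HOME_DIR, d))` covers all four directories, and running it again simply moves nothing.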


In [31]:
# move all images inside test dir into test/unknown/
%cd $DATA_HOME_DIR/test
%mv *.jpg unknown/

/Users/Natsume/Downloads/data_for_all/dogscats/test
mv: rename *.jpg to unknown/*.jpg: No such file or directory
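
The `mv` error above is what the shell reports when `*.jpg` matches nothing, for example when the cell is run again after the images have already been moved. A Python version that returns a count instead of failing is sketched below, assuming Python 3; the name `move_test_images` is hypothetical.

```python
import os
import shutil

def move_test_images(test_dir, subdir="unknown"):
    """Move loose .jpg files in test_dir into test_dir/subdir; return how many moved."""
    target = os.path.join(test_dir, subdir)
    os.makedirs(target, exist_ok=True)
    names = [f for f in os.listdir(test_dir) if f.endswith(".jpg")]
    for name in names:
        shutil.move(os.path.join(test_dir, name), os.path.join(target, name))
    return len(names)
```

`move_test_images(DATA_HOME_DIR + '/test')` returns the number of files moved, which is `0` on a repeat run.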
