# iWildCam 2019 - FGVC6
### Categorize animals in the wild

> **Work done by**: Nwachukwu Anthony  
> **Email**: nwachukwuanthony2015@gmail.com  
> **Inspired by**: *Fastai online courses on Deep Learning*  
> **Data from**: Kaggle competition (link below)

Camera Traps (or Wild Cams) enable the automatic collection of large quantities of image data. Biologists all over the world use camera traps to monitor biodiversity and population density of animal species. We have recently been making strides towards automating the species classification challenge in camera traps, but as we try to expand the scope of these models from specific regions where we have collected training data to nearby areas, we are faced with an interesting problem: how do you classify a species in a new region that you may not have seen in previous training data?

In order to tackle this problem, we have prepared a challenge where the training data and test data are from different regions, namely the American Southwest and the American Northwest. The species seen in each region overlap, but are not identical, and the challenge is to classify the test species correctly. To this end, we will allow training on our American Southwest data (from CaltechCameraTraps), on iNaturalist 2017/2018 data, and on simulated data generated from Microsoft AirSim. We have provided a taxonomy file mapping our classes into the iNat taxonomy.

This is an FGVCx competition as part of the FGVC6 workshop at CVPR 2019, and is sponsored by Microsoft AI for Earth. There is a GitHub page for the competition here.

You will find the dataset on this website: https://www.kaggle.com/c/iwildcam-2019-fgvc6/data

### Import Libraries

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np 
import pandas as pd
from fastai.vision import *
import os
print(os.listdir("../input"))

### Set the paths and Prepare the data

In [None]:
# Read the train csv.
df = pd.read_csv('../input/train.csv')

In [None]:
# Kaggle does not allow writing to the input directory, so we use a separate
# working directory where we can write freely and make it the path
path = Path("../working")
path

In [None]:
# Create a new csv file for the train dataset, saved in the working folder as "trainmodified.csv".
# It contains the id and target label of each training image
df = df[['id', 'category_id']]
sizeOfData = 3000  # The full dataset is too large for the RAM, so keep at most sizeOfData rows per category
clases = list(set(df['category_id'].tolist()))
df_row=df.loc[df['category_id'] == clases[0]][0:sizeOfData]
for i in clases[1:]:
    df1=df.loc[df['category_id'] == i][0:sizeOfData]
    df_row = pd.concat([df_row, df1])
df_row.to_csv(r'../working/trainmodified.csv', index = None, header=True)
print(os.listdir("../working/"))
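
The per-category cap above can also be written without an explicit loop. The following is a minimal sketch, assuming pandas is available; the helper name `cap_per_class` and the toy ids are hypothetical, and the toy frame only stands in for `train.csv`. Unlike the loop, `groupby(...).head(n)` keeps rows in their original order instead of grouping them by class.

```python
import pandas as pd

def cap_per_class(df, label_col="category_id", max_per_class=3000):
    """Keep at most `max_per_class` rows for each value of `label_col`."""
    # head(n) on a groupby returns the first n rows of every group,
    # preserving the original row order of the frame.
    return df.groupby(label_col, sort=False).head(max_per_class).reset_index(drop=True)

# Toy stand-in for train.csv (hypothetical ids and labels)
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "category_id": [0, 0, 0, 1, 1, 19],
})

capped = cap_per_class(toy, max_per_class=2)
print(capped)
```

In the notebook this would be called as `cap_per_class(df, max_per_class=sizeOfData)` before writing the csv file.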

In [None]:
# Set the parameters and create the data for the model
np.random.seed(42)  # fix the seed so the random train/validation split is the same on every run
src = (ImageList.from_csv('../', 'working/trainmodified.csv', folder='input/train_images', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))
tfms = get_transforms()
data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))

### Visualize the Data

In [None]:
print('Data Classes:', data.classes)
print('Length of Train set: '+str(len(data.train_ds))+', Length of Validation set: '+str(len(data.valid_ds)))
data.show_batch(rows=3, figsize=(7,8)) #View portion of dataset

### Train

In [None]:
# Set the metrics: thresholded accuracy and F-beta score, both with a 0.2 threshold
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
# Build a CNN (convolutional neural network) learner on a pretrained ResNet-50
learn = cnn_learner(data, models.resnet50, metrics=[acc_02,f_score])
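
To make the F-score metric more concrete, here is a rough NumPy sketch of what a thresholded F-beta score computes on one-hot targets. It is an illustration, not fastai's own code: the function name `fbeta_thresh` is hypothetical, `beta=2` is an assumption about the default, and the inputs are taken to be probabilities already (fastai, as far as I know, applies a sigmoid to raw outputs first).

```python
import numpy as np

def fbeta_thresh(probs, targets, thresh=0.2, beta=2.0, eps=1e-9):
    """Mean per-sample F-beta after thresholding probabilities."""
    preds = (probs > thresh).astype(float)       # 1 where a class is predicted
    targets = targets.astype(float)
    tp = (preds * targets).sum(axis=1)           # true positives per sample
    precision = tp / (preds.sum(axis=1) + eps)
    recall = tp / (targets.sum(axis=1) + eps)
    beta2 = beta ** 2
    score = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return float(score.mean())

# Toy example: 2 images, 3 classes (hypothetical values)
probs = np.array([[0.9, 0.1, 0.3],
                  [0.1, 0.8, 0.05]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0]])
score = fbeta_thresh(probs, targets)
print(round(score, 4))
```

With a low threshold such as 0.2, several classes can pass for one image, which lowers precision; a `beta` above 1 weights recall more heavily than precision.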

In [None]:
#Find and plot learning rate
learn.lr_find()
learn.recorder.plot()

In [None]:
#set learning rate
lr = 0.01

In [None]:
#Fit the model
learn.fit_one_cycle(5,slice(lr))

In [None]:
# Save it
learn.save('stage-1-rn50')

In [None]:
# learn.load('stage-1-rn50')  # uncomment to reload the saved stage-1 weights

### More training

In [None]:
# Unfreeze the model so that all layers are trained, not only the new head.
# The pretrained weights are kept as the starting point and fine-tuned.
learn.unfreeze()

In [None]:
# Find and plot the learning rate
learn.lr_find()
learn.recorder.plot()

In [None]:
# Fit the model
learn.fit_one_cycle(10, slice(1e-5, lr/5))
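
Passing a `slice` as the learning rate gives different layer groups different rates. The sketch below reflects my understanding of how fastai v1 spreads a slice over the layer groups; the helper `spread_learning_rates` is hypothetical, and the three layer groups are an assumption about what `cnn_learner` creates for a ResNet.

```python
def spread_learning_rates(lr, n_groups=3):
    """Return one learning rate per layer group (requires n_groups >= 2 for slices)."""
    if not isinstance(lr, slice):
        # A plain number is used for every group.
        return [lr] * n_groups
    if lr.start:
        # slice(start, stop): rates are evenly spaced on a log scale.
        step = (lr.stop / lr.start) ** (1 / (n_groups - 1))
        return [lr.start * step ** i for i in range(n_groups)]
    # slice(stop): last group gets `stop`, earlier groups get stop / 10.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]

lr = 0.01
stage1 = spread_learning_rates(slice(lr))            # frozen stage
stage2 = spread_learning_rates(slice(1e-5, lr / 5))  # unfrozen stage
print(stage1)
print(stage2)
```

Under these assumptions, the earliest layers in the unfrozen stage train at about 1e-5 and the head at `lr/5`, so the pretrained features change slowly while the head keeps adapting.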

In [None]:
# Save this latest trained model
learn.save('stage-2-rn50')

### Export the Model

In [None]:
learn.export()
print(os.listdir("../working"))
print(os.listdir("../input"))
print(os.listdir("../"))

### Test the Model

In [None]:
test = ImageList.from_folder('../input/test_images')
len(test)
learn = load_learner('../', test=test)

In [None]:
# Find the prediction
preds,_ = learn.get_preds(ds_type=DatasetType.Test)
labelled_preds = [learn.data.classes[(pred).tolist().index(max((pred).tolist()))] for pred in preds]
# Alternatively, you can replace the labelled_preds line above with the two lines below
#labels = preds.argmax(dim=1)
#labelled_preds = [learn.data.classes[int(x)] for x in labels]
print(labelled_preds)
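
The two ways of picking the most probable class give the same labels. Here is a small NumPy sketch with made-up probabilities and a hypothetical three-class list, only to show the equivalence; the notebook itself works on the tensor returned by `get_preds`.

```python
import numpy as np

classes = ["0", "1", "19"]  # hypothetical class names
preds = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])

# List-based lookup: position of the largest value in each row
via_index = [classes[row.tolist().index(max(row.tolist()))] for row in preds]

# Vectorised lookup: argmax along the class axis
via_argmax = [classes[i] for i in preds.argmax(axis=1)]

print(via_index)
print(via_argmax)
```

Both approaches return the first maximum when a row contains ties, so they agree in that case as well.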

In [None]:
# Save the predicted results to the working path as (submission.csv)
fnames = [f.name[:-4] for f in learn.data.test_ds.items]
tes = OrderedDict([('Id',fnames), ('Predicted', labelled_preds)] )
df = pd.DataFrame.from_dict(tes)
df.to_csv(path/'submission.csv', index=False)
print(os.listdir("../working"))

In [None]:
dfsubmit = pd.read_csv('../working/submission.csv')
dfsubmit.head()

Thank you