# First Kaggle Competition
> Explaining how I navigated a Kaggle competition

- toc: true 
- badges: false
- comments: true
- categories: [kaggle]

First, I wanted to find the most basic image recognition competition to get a feel for how Kaggle works. I landed on [Aerial Cactus Identification](https://www.kaggle.com/c/aerial-cactus-identification).

I downloaded all the data, which consisted of test and train folders holding the images and a train.csv holding the labels for the images in the train folder. They also included a sample_submission.csv, which shows the format they would like the submission to be in. Easy.

As I am using the pre-release of fastai2, my first step is to install it and import the vision library.

In [None]:
!pip install git+https://github.com/fastai/fastai2
from fastai2.vision.all import *

I create a folder and unpack all the files into it.

Then I create a path to the image files I want to train on.

In [None]:
!mkdir -p ../kaggle-cactus
path = Path('../kaggle-cactus/')
fnames = get_image_files(path/"train")
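
To make it clearer what collecting the image files involves, here is a minimal plain-Python sketch of the idea: walk a folder and keep only the files with an image extension. `list_image_files` and the extension set are hypothetical stand-ins for illustration, not fastai's implementation, and the files are created in a temporary folder.

```python
import tempfile
from pathlib import Path

# Assumed subset of image extensions, for illustration only
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def list_image_files(folder):
    """Recursively collect files under `folder` that look like images."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )

# Build a throwaway train folder with two images and one non-image file
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    train.mkdir()
    for name in ["a.jpg", "b.png", "notes.txt"]:
        (train / name).touch()
    found = [p.name for p in list_image_files(train)]

print(found)
```

The text file is skipped, so only the two image names are left in `found`.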

I start with a minimal DataBlock that only tells fastai how to find the items (the image files).

In [None]:
dblock = DataBlock(get_items = get_image_files)

I then point the DataBlock at my train folder.

As a sanity check, I look at the first item to see if it's pulling the files correctly.

In [None]:
dsets = dblock.datasets(path/"train")
dsets.train[0]

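Next I define the full DataBlock that the dataloaders will be built from. It sets all the preferences up: the input and label types, the split, where to read the images and labels, and the augmentations.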

In [None]:
data = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 # how to split the data into training and validation sets
                 splitter=RandomSplitter(),
                 # file names are in the first column, relative to the train folder
                 get_x=ColReader(0, pref=path/"train"),
                 # labels are in the second column
                 get_y=ColReader(1),
                 # add some image augmentation
                 batch_tfms=(Warp(), Zoom(), Rotate()))
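
To make the splitter step concrete, here is a minimal plain-Python sketch of what a random train/validation split does. It assumes a 20% validation share, which is the default `valid_pct` of `RandomSplitter`. `random_split` is a hypothetical helper written for this post, not fastai's code.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_idx, valid_idx) for a random split of range(n_items)."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)                # random order of all item indices
    cut = int(valid_pct * n_items)   # size of the validation set
    return idxs[cut:], idxs[:cut]

train_idx, valid_idx = random_split(10, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))
```

Every index ends up in exactly one of the two lists, so the validation images are never seen during training.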

Load the csv with all the labels and build the dataloaders from it.

In [None]:
import pandas as pd
df = pd.read_csv('../kaggle-cactus/train.csv')
dls = data.dataloaders(df)
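
As a rough picture of what the two column readers do with each row of the labels csv, here is a small plain-Python sketch. The column names `id` and `has_cactus`, the file names, and the helpers `read_x` and `read_y` are assumptions made up for illustration; fastai handles this internally.

```python
import csv
import io
from pathlib import Path

# Two made-up rows in the same shape as the labels csv:
# file name in the first column, label in the second.
sample_csv = io.StringIO("id,has_cactus\nimg_a.jpg,1\nimg_b.jpg,0\n")

train_dir = Path("../kaggle-cactus/train")

def read_x(row):
    # like get_x: prefix the first column with the train folder
    return train_dir / row[0]

def read_y(row):
    # like get_y: take the second column as the label
    return int(row[1])

reader = csv.reader(sample_csv)
next(reader)  # skip the header row
items = [(read_x(row), read_y(row)) for row in reader]
print(items)
```

Each item is an (image path, label) pair, which is what the image and category blocks then turn into tensors.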

Sanity check to see if the images are being labeled correctly.

In [None]:
dls.show_batch(nrows=1, ncols=5)

Put it all together into a learner.

In [None]:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
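
The metric used here, error rate, is simply the share of validation images the model gets wrong (1 minus accuracy). A minimal sketch with made-up labels, where `simple_error_rate` is a hypothetical helper and not fastai's `error_rate`:

```python
def simple_error_rate(pred_labels, true_labels):
    """Fraction of predictions that do not match the true label."""
    wrong = sum(p != t for p, t in zip(pred_labels, true_labels))
    return wrong / len(true_labels)

# Made-up predictions and labels: 2 of the 4 predictions are wrong
rate = simple_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
print(rate)
```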

In [None]:
learn.fine_tune(5)

In [None]:
learn.show_results()

Let's apply the model to the test set.

In [None]:
test_dl = dls.test_dl(get_image_files(path/"test"))

Pass it to the learner and get the results (preds).

In [None]:
preds,_ = learn.get_preds(dl=test_dl)
# columns follow the class order (0, 1), so the second column is the
# probability of has_cactus; grab it and turn it into a numpy array
preds.numpy()[:, 1]
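
Each row of `preds` holds one probability per class, in the order of the learner's vocab, so it is worth checking `dls.vocab` before picking a column. The sketch below shows the idea in plain Python with made-up scores, assuming the classes are ordered 0 (no cactus) then 1 (has cactus).

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = [0, 1]                           # assumed class order: 0 = no cactus, 1 = has cactus
raw_scores = [[-1.0, 2.0], [1.5, -0.5]]  # made-up model outputs for two images
probs = [softmax(row) for row in raw_scores]

has_cactus_col = vocab.index(1)          # column that holds P(has_cactus)
has_cactus = [row[has_cactus_col] for row in probs]
print(has_cactus)
```

Under that assumed ordering, the has_cactus probability sits in the second column of each row.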

Let's read the sample submission so we can fill in our predictions.

In [None]:
test_df = pd.read_csv('sample_submission.csv')

Fill in the has_cactus column.

In [None]:
# second column = probability that the image has a cactus
test_df.has_cactus = preds.numpy()[:, 1]

Export the submission file.

In [None]:
test_df.to_csv('submission.csv', index=False)
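
One thing worth double-checking before uploading: the predictions come back in the order of the files passed to the test dataloader, which may differ from the row order of sample_submission.csv. Here is a minimal sketch of matching predictions to rows by file name instead of by position. The file names and scores are made up, and the id column name `id` is an assumption.

```python
import csv
import io

# Made-up stand-ins: file names in the order the test dataloader saw them,
# and the matching has_cactus probabilities.
test_files = ["b.jpg", "c.jpg", "a.jpg"]
scores = [0.9, 0.2, 0.7]

# Row order of a hypothetical sample submission (differs from test_files)
submission_ids = ["a.jpg", "b.jpg", "c.jpg"]

# Look predictions up by file name instead of relying on position
score_by_id = dict(zip(test_files, scores))
rows = [(file_id, score_by_id[file_id]) for file_id in submission_ids]

# Write the rows in submission order to an in-memory csv
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "has_cactus"])
writer.writerows(rows)
print(out.getvalue())
```

With pandas the same lookup could be done by mapping the id column through `score_by_id`, again assuming that column name.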