<a href="https://colab.research.google.com/github/Muosvr/fastai_presentation/blob/master/Fastai_v1_download.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Classification Using fast.ai v1: Halloween Edition

Let's find out whether the house in the picture is haunted or not.

# Using Google Colab

You can create a code cell by pressing the Code button at the top. You can also add text cells as notes, which is why it is called a notebook, as you can see here. The shortcut for creating a new code cell is ctrl + m followed by a to create one above the current cell, or b to create one below.

To run a cell, either hover your mouse over the bracket at the beginning of the cell and click the triangular play button when it appears, or use the shortcut shift + enter (or ctrl + enter if you don't want focus to jump to the next cell).


# Getting Image data

Let's get some images! The JavaScript code below helps you scrape image URLs from Google Images into a file. Once you have searched for the images, copy the code into the browser console and run it; it should generate a file for download.


JavaScript for scraping from Google Images in the browser:
```
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
```
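
Before uploading the downloaded URL file, it can help to sanity-check it. The sketch below is only one way this might be done in Python; the helper name `clean_url_list` is hypothetical and not part of fastai. It keeps lines that look like http(s) URLs and drops duplicates while preserving their order.

```python
def clean_url_list(text):
    """Return unique http(s) URLs from newline-separated text, in original order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        # Skip blank lines and anything that is not an http(s) link
        if not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sample content standing in for a scraped URL file
sample = "https://a.example/1.jpg\n\nnot a url\nhttps://a.example/1.jpg\nhttp://b.example/2.png\n"
print(clean_url_list(sample))
```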



We will first install PyTorch and then the fastai library. Fastai is built on top of PyTorch, which is itself a deep learning library that allows matrix computation on the GPU. This is necessary to accelerate deep learning training.

In [0]:
!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html

In [0]:
!pip install fastai

Here we import the libraries the notebook needs.

In [0]:
from fastai import *
from fastai.vision import *
import os

The code below allows you to upload files from your computer to this notebook. It is copied from the sample code snippets in Google Colab. You can find it by searching "upload files" in the sidebar.

In [0]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Google Colab has some basic bash commands built in, such as ls, mkdir, and rm. For more advanced commands, you can access everything you can do in a terminal by prepending a "!" to the command. That was how we installed all the packages earlier.

In [0]:
ls

Now let's create a list of classes and use it to create two folders and download about 200 images into each category.

In [0]:
classes = ['haunted_house', 'house']

In [0]:
mkdir data
path = Path('data') #creating a path object. Path comes from Python's pathlib module (the fastai import makes it available here); it makes it easier to manipulate file paths.
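
A short sketch of what working with a path object looks like, using Python's standard `pathlib` module. The folder names are the ones used in this notebook; nothing is created on disk here.

```python
from pathlib import Path

path = Path("data")
# The "/" operator joins path components, so no manual string concatenation is needed
haunted = path / "haunted_house"

print(haunted)         # the joined path
print(haunted.name)    # last component of the path
print(haunted.parent)  # everything before the last component
```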

Whenever you see a function you don't understand, put ? in front of it to see a basic explanation, or ?? to see a more thorough explanation along with the source code. You can also call doc(something_you_dont_understand) to look at the documentation.

In [2]:
??Path

Object `Path` not found.


In [0]:
#create folders and download images from urls
for name in classes:
  url_file = name + '.txt'
  dest = 'data/' + name
  if not os.path.exists(dest):
    os.makedirs(dest)
  download_images(url_file, dest, max_pics=200)

Note that the "!" turns each of the next three cells into a bash command.

In [0]:
!ls data

In [0]:
!cd data/haunted_house; ls | wc -l #another bash command. This counts how many files there are in the directory
#note the ";" in the code above separates two commands so they can both be written on one line following the "!"

In [0]:
!cd data/house; ls | wc -l #same as above, for a different directory
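
The same count can also be done from Python. This is a minimal sketch with a hypothetical helper, `count_files`; unlike `ls | wc -l`, it counts only regular files and ignores subdirectories. The demo runs on a temporary directory so it does not depend on the downloaded images.

```python
import tempfile
from pathlib import Path


def count_files(directory):
    """Count regular files directly inside `directory` (not recursive)."""
    return sum(1 for p in Path(directory).iterdir() if p.is_file())


# Demo on a throwaway directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"img_{i}.jpg").touch()
    print(count_files(tmp))
```

In this notebook the call would be something like `count_files('data/house')`.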

In [0]:
#download test set
download_images('haunted_or_not_test.txt', 'test')

Next we need to verify each image to make sure none are corrupted, so they don't interrupt training.

In [0]:
#verify training image data
#loop over the class folders created earlier (file_names was never defined in this notebook)
for class_name in classes:
  print(class_name)
  verify_images(path/class_name, delete=True, max_workers=8)

In [0]:
#verify testing image data
verify_images('test', delete=True)

# View data

Now let's look at the images. The ImageDataBunch function below creates a data object for training. For now you just need to understand that you pass it a path to the data and tell it where the training data is (in this case, "." for the root of the path). The valid_pct argument tells the program what percentage of the data is used to validate the results during training. This allows you to see how well the model is being trained along the way. For a small set of data this is usually set to 20%.

In [0]:
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, num_workers=0).normalize(imagenet_stats)
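
To picture what `valid_pct=0.2` with a fixed random seed means, here is a rough sketch of a reproducible 80/20 split in plain Python. It is an illustration only, not fastai's implementation; `split_train_valid` and the file names are made up for the example.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of `items` and hold out `valid_pct` of them for validation."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    # First n_valid items become the validation set, the rest the training set
    return shuffled[n_valid:], shuffled[:n_valid]


files = [f"img_{i}.jpg" for i in range(400)]
train, valid = split_train_valid(files)
print(len(train), len(valid))
```

Because the seed is fixed, running the split again gives the same validation images, which keeps results comparable between runs.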

In [0]:
data.classes #the data object lets you see its properties, like the categories it contains

In [0]:
data.show_batch(rows=3, figsize=(7,8))

In [0]:
#more properties from the data object: 
data.c, len(data.train_ds), len(data.valid_ds) #number of classes, length of training set, and length of validation set

This is a good place to talk about batch size. It is the number of images passed to the GPU to be processed at a time. Because the GPU has so many cores, it can process more than one image at a time, and in this case we give it 64 images at once, which saves a lot of training time. This number depends on the memory size of the GPU; normally you can try powers of two such as 32, 64, 128, or 256.

In [0]:
bs = 64 #batch size
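
The batch size idea above can be sketched in a few lines of plain Python. This only illustrates how a dataset is cut into batches; `make_batches` is a hypothetical helper, and fastai's own data loader may handle a final partial batch differently.

```python
import math


def make_batches(items, bs):
    """Split `items` into consecutive chunks of at most `bs` elements."""
    return [items[i:i + bs] for i in range(0, len(items), bs)]


images = list(range(330))  # stand-ins for 330 training images
batches = make_batches(images, bs=64)

# Number of batches is the dataset size divided by bs, rounded up
print(len(batches), math.ceil(len(images) / 64))
print(len(batches[-1]))  # size of the last, partial batch
```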

# Training

Now onto the fun part! The two lines below create a neural net and train it for 4 epochs, where one epoch means one pass over the entire training set. Here "create_cnn" creates a type of model called a convolutional neural net (CNN), which is great for image recognition. First we pass it the data object we created previously, then the architecture for the model, which in this case is resnet34, and finally the metrics argument, which specifies how we want to monitor the model during training.

Here ResNet34 is a CNN architecture created by researchers at Microsoft. You can also try other architectures such as ResNet50, which is a deeper, more complex version. This is where the magic of fastai lies, as it lets you use models developed by cutting-edge research. If you are interested, you can read the original paper [here](https://arxiv.org/pdf/1512.03385.pdf).

In [0]:
learn = create_cnn(data, models.resnet34, metrics=error_rate)

In [0]:
learn.fit_one_cycle(4)
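
The `error_rate` metric passed to the learner reports the fraction of validation images classified incorrectly. A minimal plain-Python sketch of that idea follows; `simple_error_rate` is a hypothetical stand-in rather than fastai's function, and the labels are made up.

```python
def simple_error_rate(predictions, targets):
    """Fraction of predictions that differ from the true labels."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)


preds = ["haunted_house", "house", "house", "haunted_house"]
truth = ["haunted_house", "house", "haunted_house", "haunted_house"]
print(simple_error_rate(preds, truth))  # one of four is wrong
```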

In [0]:
learn.save('stage-1') #This will save our model to file for later use

# Examining the model

Now that we have trained a model, let's see how it is doing. The first two code cells below create an object for analyzing the results and then show us the predictions the model got most wrong, meaning cases where the model is confident but wrong. This helps us understand where the model is failing so we can potentially improve it. As you can see, because we did not spend much time cleaning the data we scraped from Google, sometimes the model is wrong simply because the image is really hard to tell or because it was labeled incorrectly in the first place.

In [0]:
interp = ClassificationInterpretation.from_learner(learn) #create an object to analyze the results from the learn object we created earlier

In [0]:
interp.plot_top_losses(9, figsize=(15,11)) #show photos that the neural net is most wrong about
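
Why does "confident but wrong" translate into a top loss? With a cross-entropy loss (assumed here for illustration), the loss for one image is the negative log of the probability the model gave to the true class, so a confident wrong answer is penalized heavily. The sketch below uses made-up probabilities and a hypothetical helper, `top_losses`.

```python
import math


def top_losses(probs, targets, k):
    """Return (loss, index) pairs for the k samples with the highest cross-entropy loss.

    probs[i][c] is the predicted probability of class c for sample i;
    targets[i] is the index of the true class.
    """
    losses = [(-math.log(p[t]), i) for i, (p, t) in enumerate(zip(probs, targets))]
    return sorted(losses, reverse=True)[:k]


probs = [
    [0.90, 0.10],  # confident and right
    [0.05, 0.95],  # confident and wrong (true class is 0)
    [0.55, 0.45],  # unsure but right
]
targets = [0, 0, 0]
for loss, idx in top_losses(probs, targets, k=2):
    print(idx, round(loss, 3))
```

The confidently wrong sample comes first, followed by the one the model was unsure about.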

In [0]:
data.classes

Now it's time to put our model to the test and see if it can indeed do its job of recognizing a haunted house. The code below picks an image at random from the test set we created earlier and tries to predict whether it is a haunted house or not.

In [0]:
import glob
filenames = glob.glob('test/*') #We use glob, a Python standard library module, to find the names of every image in the test folder and return them in a list

In [0]:
random_file = random.choice(filenames) #choose a random file from the filenames list
img = open_image(random_file) #opens the image
img

In [0]:
pre_class, pred_idx, outputs = learn.predict(img) #make prediction.
pre_class
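
Judging by the three names unpacked from `learn.predict`, the call returns a predicted class, its index, and one output value per class. Assuming the predicted class is simply the one with the highest output, the relationship can be sketched in plain Python as below; `predicted_class` and the numbers are made up for illustration.

```python
def predicted_class(outputs, classes):
    """Pick the class whose output value is highest, along with its index."""
    best_idx = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best_idx], best_idx


class_names = ["haunted_house", "house"]
example_outputs = [0.83, 0.17]  # made-up values, one per class
print(predicted_class(example_outputs, class_names))
```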