## Brontosaurus or T-Rex?

In [None]:
import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

# Installing the latest version of every package is NOT good practice (I will explain why, in detail, in a blog post),
# so we pin specific versions of these packages; this code should keep working even if a library later introduces breaking changes.
if iskaggle:
    !pip install -U fastai==2.7.12
    !pip install -U duckduckgo_search==3.8.4
    !pip install -U fastbook==0.0.29
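
To confirm that the pinned versions are the ones actually in use, a minimal sketch using only the standard library might look like the following. The `PINS` dictionary and the `check_pins` helper are hypothetical names introduced for illustration; the version numbers are copied from the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above.
PINS = {"fastai": "2.7.12", "duckduckgo_search": "3.8.4", "fastbook": "0.0.29"}

def check_pins(pins, get_version=version):
    """Return {package: (expected, installed)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

print(check_pins(PINS) or "all pins satisfied")
```

An empty result means every pinned package is installed at exactly the expected version.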

The idea is to train a model that can recognise whether a given picture shows a Brontosaurus or a T-Rex.

These are the steps we will take:

1. Use DuckDuckGo to search for images of "brontosaurus"
1. Use DuckDuckGo to search for images of "t-rex"
1. Fine-tune a pretrained neural network to recognise these two groups
1. Try running this model on a picture of a brontosaurus and a t-rex, separately, and see if it works.

## Step 1: Download images of brontosaurus and t-rex

In [None]:
from fastbook import *
from fastai.vision.widgets import *

def search_images(term, max_images=50):
    print(f"Searching for '{term}'")
    return L(search_images_ddg(term, max_images=max_images))

Let's start by searching for a brontosaurus photo and seeing what kind of result we get. First, we get the URLs from a search:

In [None]:
#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
#    If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('brontosaurus', max_images=1)
urls[0]
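
Rather than re-running the cell by hand when the search fails, the call could be wrapped in a small retry helper. This is only a sketch: `retry` is a hypothetical helper, not part of `fastbook` or `duckduckgo_search`, and it assumes the failure surfaces as an ordinary Python exception.

```python
import time

def retry(fn, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call fn() until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: surface the last error
            time.sleep(delay)  # brief pause before trying again

# Hypothetical usage in this notebook:
# urls = retry(lambda: search_images('brontosaurus', max_images=1))
```

Passing a narrower tuple to `exceptions` avoids retrying on errors that a second attempt cannot fix, such as a typo in the code.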

...and then download a URL and take a look at it:

In [None]:
from fastdownload import download_url
dest = 'bronthosaurus.jpg'
download_url(urls[0], dest, show_progress=False)

from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)

Now let's do the same with "t-rex":

In [None]:
download_url(search_images('t-rex', max_images=1)[0], 't-rex.jpg', show_progress=False)
Image.open('t-rex.jpg').to_thumb(256,256)

Our searches seem to be giving reasonable results, so let's grab a few examples of each of "brontosaurus" and "t-rex" photos, and save each group of photos to a different folder:

In [None]:
searches = ('brontosaurus','t-rex')
path = Path('bronto_or_trex')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)

## Step 2: Train our model

Some photos might not download correctly, which could cause our model training to fail, so we'll remove them:

In [None]:
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
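
After the cleanup it can be worth checking how many images are left in each folder, since a very lopsided dataset makes the validation numbers harder to interpret. The sketch below uses only `pathlib`; `count_per_class` and `IMAGE_SUFFIXES` are hypothetical names, and the suffix list is an illustrative subset rather than the set fastai uses.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}  # illustrative subset

def count_per_class(root):
    """Count image files under each immediate subfolder of `root`."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for f in folder.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

# Hypothetical usage in this notebook:
# count_per_class('bronto_or_trex')
```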

To train a model, we'll need `DataLoaders`, which is an object that contains a *training set* (the images used to create a model) and a *validation set* (the images used to check the accuracy of a model -- not used during training). In `fastai` we can create that easily using a `DataBlock`, and view sample images from it:

In [None]:
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path, bs=32)

dls.show_batch(max_n=6)

Here is what each of the `DataBlock` parameters means:

    blocks=(ImageBlock, CategoryBlock),

The inputs to our model are images, and the outputs are categories (in this case, brontosaurus or T-Rex).

    get_items=get_image_files, 

To find all the inputs to our model, run the `get_image_files` function (which returns a list of all image files in a path).

    splitter=RandomSplitter(valid_pct=0.2, seed=42),

Split the data into training and validation sets randomly, using 20% of the data for the validation set.
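
As a rough illustration of the idea (not fastai's actual implementation), a seeded random split can be sketched in plain Python. The function name `random_split` and the file names are hypothetical.

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices reproducibly and hold out a `valid_pct` share for validation."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)  # same seed gives the same split
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid

train, valid = random_split([f"img_{i}.jpg" for i in range(50)])
print(len(train), len(valid))  # 40 10
```

Fixing the seed keeps the validation set identical between runs, so changes in the error rate reflect changes to the model rather than a different split.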

    get_y=parent_label,

The labels (`y` values) are the names of the `parent` of each file (i.e. the name of the folder it is in, which will be one of the two names in `searches`).
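
The labelling rule itself is tiny. A sketch of the same idea with `pathlib` follows; `label_from_parent` is a hypothetical stand-in for illustration and the file path is made up.

```python
from pathlib import Path

def label_from_parent(file_path):
    """Use the name of the containing folder as the label."""
    return Path(file_path).parent.name

print(label_from_parent("bronto_or_trex/t-rex/0001.jpg"))  # t-rex
```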

    item_tfms=[Resize(192, method='squish')]

Before training, resize each image to 192x192 pixels by "squishing" it (as opposed to cropping it).
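
To make the difference between squishing and cropping concrete, the sketch below computes only the geometry for a 400x300 image; it does not touch any pixels. Both helpers are hypothetical, and the centred crop is a simplification used for illustration.

```python
def squish_scale(width, height, size=192):
    """Independent x and y scale factors when squishing to size x size."""
    return size / width, size / height

def center_crop_box(width, height):
    """(left, top, right, bottom) of the largest centred square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side

print(squish_scale(400, 300))     # (0.48, 0.64)
print(center_crop_box(400, 300))  # (50, 0, 350, 300)
```

Squishing keeps the whole picture but distorts its proportions, because the two scale factors differ; cropping keeps the proportions but discards 50 pixels on each side in this example.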

Now we're ready to train our model. `resnet18` is one of the fastest widely used computer vision models. You can train it in a few minutes, even on a CPU! (On a GPU, it generally takes under 10 seconds...)

`fastai` comes with a helpful `fine_tune()` method which automatically uses best practices for fine-tuning a pretrained model, so we'll use that.

In [None]:
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

Generally when I run this I see 100% accuracy on the validation set (although it might vary a bit from run to run).
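
The `error_rate` metric reported during training is simply the share of validation images that were misclassified, so 100% accuracy corresponds to an error rate of 0. A plain-Python sketch of that calculation, using a hypothetical `plain_error_rate` helper and made-up labels:

```python
def plain_error_rate(predictions, targets):
    """Fraction of predictions that differ from the targets (1 - accuracy)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Made-up labels: one mistake out of four.
preds = ["t-rex", "brontosaurus", "t-rex", "brontosaurus"]
truth = ["t-rex", "brontosaurus", "brontosaurus", "brontosaurus"]
print(plain_error_rate(preds, truth))  # 0.25
```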

"Fine-tuning" a model means that we're starting with a model someone else has trained using some other dataset (called the *pretrained model*), and adjusting the weights a little bit so that the model learns to recognise our particular dataset. In this case, the pretrained model was trained to recognise photos in *imagenet* (a widely used computer vision dataset with images covering 1000 categories). For details on fine-tuning and why it's important, check out the [free fast.ai course](https://course.fast.ai/).

## Step 3: Use our model (and build your own!)

Let's see what our model thinks about that brontosaurus we downloaded at the start:

In [None]:
# `predict` returns the predicted label, its index, and the per-class probabilities
pred,_,probs = learn.predict(PILImage.create('bronthosaurus.jpg'))  # file saved in Step 1
print(f"This is a: {pred}.")
print(f"Probability it's a brontosaurus: {probs[0]:.4f}")
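
Indexing with `probs[0]` relies on the brontosaurus class coming first in the model's class list, which as far as I know is sorted alphabetically and available as `learn.dls.vocab`. A more explicit way to read the result is to pair each class name with its probability. The sketch below uses made-up numbers rather than output from a real run, and `describe_prediction` is a hypothetical helper.

```python
def describe_prediction(vocab, probs):
    """Pair each class name with its probability and pick the most likely one."""
    by_class = dict(zip(vocab, probs))
    best = max(by_class, key=by_class.get)
    return best, by_class

# Hypothetical values, not output from a real run.
vocab = ["brontosaurus", "t-rex"]  # class names mirror the folder names
probs = [0.9987, 0.0013]

best, by_class = describe_prediction(vocab, probs)
print(best)                         # brontosaurus
print(f"{by_class['t-rex']:.4f}")   # 0.0013
```

The same helper works unchanged for the `t-rex.jpg` file downloaded earlier, which covers the second half of the final step listed at the top.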

Now it's your turn. Click "Copy & Edit" and try creating your own image classifier using your own image searches!

If you enjoyed this, please consider clicking the "upvote" button in the top-right -- it's very encouraging to us notebook authors to know when people appreciate our work.