# Create, Train and Deploy an image classification model for free - zero hardware

This notebook will take you through every step needed to create an image classification model. 
We will be using Kaggle, fast.ai, HuggingFace, Replit and GitHub to do everything we need for free! 

It'll be split into 3 main parts:
1. Getting started and importing a dataset
2. Creating and training a model
3. Deploying your model

If you enjoy this notebook or learn anything from it, please upvote and follow me on Twitter @jawdinmorris.

# 1.0 Kaggle, Data and Setup

# 1.1 Kaggle

**What is Kaggle?** 
Kaggle is one of the world's largest data science platforms. It lets you find, collate and manage datasets and machine learning models, and even enter competitions! You're reading this, most likely, on Kaggle right now. 

**Why Kaggle?**
Because it lets us use Jupyter notebooks in the cloud, for free. You can easily create and run code in small code cells. You can run any code cell in this notebook as you follow along and it will all work. Otherwise, fork your own version and edit whatever you want!


To get your data, I recommend using the datasets I've created: 
* https://www.kaggle.com/datasets/jawdinmorris/bird-images
* https://www.kaggle.com/datasets/jawdinmorris/cat-images
* https://www.kaggle.com/datasets/jawdinmorris/puppy-images

Otherwise, find and import any datasets of individual animals you'd like. Kaggle has countless other datasets you can use. Our model doesn't need a lot of data to work: I use about 250 images per animal, but meaningful, accurate results can be achieved with fewer than 50. 

You can use Python and BeautifulSoup, Bing API or even Google Chrome Extensions to quickly gather many images yourself. 

# 1.2 Setup
Now you need to set up your Kaggle notebook. First, install the latest version of fastbook. Notice the ! at the beginning of the command: it signifies a shell command, not Python code.

In [None]:
!pip install -Uqq fastbook

Import fastbook and its related widgets. 
Then run setup_book to get started.

Under that is boilerplate code provided by kaggle. This creates links to the data you imported before.

In [None]:
import fastbook
from fastbook import *
from fastai.vision.widgets import *
fastbook.setup_book()

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Define each of the categories of animal classifications whilst also linking those to the folders the images are nested under. 

In this case we are using animal_types as a list of all our folders containing different species images. 

In [None]:
animal_types = 'puppy-images','cat-images', 'bird-images'
path = Path('/kaggle/input')
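
Before going further, it can help to check how many images each folder actually contains. The sketch below is one way to do that with the standard library only. `count_images` and the extension list are names made up for this example, and the demo runs on a throwaway folder so it also works outside Kaggle. On Kaggle you would call it with `path` and `animal_types` from the cell above.

```python
import tempfile
from pathlib import Path

# Extensions treated as images here; an assumption for this sketch, adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def count_images(root, folders):
    """Return {folder name: number of image files found anywhere under root/folder}."""
    counts = {}
    for folder in folders:
        files = (Path(root) / folder).rglob("*")
        counts[folder] = sum(
            1 for f in files if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts

# Demo on a temporary directory that mimics the /kaggle/input layout.
with tempfile.TemporaryDirectory() as tmp:
    for folder, n in [("puppy-images", 3), ("cat-images", 2)]:
        (Path(tmp) / folder).mkdir()
        for i in range(n):
            (Path(tmp) / folder / f"{i}.jpg").touch()
    (Path(tmp) / "cat-images" / "notes.txt").touch()  # ignored: not an image
    demo_counts = count_images(tmp, ["puppy-images", "cat-images"])

print(demo_counts)
```

A folder with far fewer images than the others is worth topping up before training.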

Get all the images based on the path defined. Then make sure none of them are corrupted.

In [None]:
fns = get_image_files(path)
fns
failed = verify_images(fns)
failed
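
`verify_images` tries to open every file as an image and returns the ones that fail. As a rough illustration of the idea only, the sketch below checks the first bytes of a file against the JPEG and PNG signatures. This is a much weaker check than actually decoding the image, and `looks_like_image` is a hypothetical helper, not part of fastai.

```python
import tempfile
from pathlib import Path

# Leading bytes ("magic numbers") of two common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}

def looks_like_image(file_path):
    """Return True if the file starts with a known JPEG or PNG signature."""
    with open(file_path, "rb") as f:
        header = f.read(8)
    return any(header.startswith(sig) for sig in SIGNATURES.values())

# Demo: one file with a PNG header and one text file renamed to .jpg.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.png"
    bad = Path(tmp) / "bad.jpg"
    good.write_bytes(SIGNATURES["png"] + b"rest of file")
    bad.write_bytes(b"this is not an image")
    suspicious = [p.name for p in (good, bad) if not looks_like_image(p)]

print(suspicious)
```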

# 2.0 Create and Train a model

# 2.1 Create model

We now define a DataBlock. A DataBlock is how fastai defines data formats. There are multiple parameters that can be provided:

* blocks - What style is your input and output?
    * In this case we are using images as our input and category as our output.
* get_items - What data are we using?
    * In this case we use get_image_files, the same function we called above
* splitter - You must define a validation split for training data
    * In this case we are holding out 20% as validation data and setting the seed so that random 20% is the same every time.
* get_y - What labels will our data have?
    * In this case, we're just going to use the parent folder's name
* item_tfms - We can also transform our data as we put it into the datablock
    * In this case we're resizing every image to 128px

In [None]:
animals = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))
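
To make `get_y` and `splitter` less of a black box, here is a plain-Python sketch of the two ideas: labelling a file by its parent folder and holding out a seeded random 20%. It only mirrors the concept. `label_from_parent` and `random_split` are illustrative names, the file paths are made up, and fastai shuffles differently, so the same seed will not pick the same files.

```python
import random
from pathlib import Path

def label_from_parent(file_path):
    """Use the name of the folder that contains the file as its label."""
    return Path(file_path).parent.name

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out valid_pct of them for validation."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in indices[:cut]]
    train = [items[i] for i in indices[cut:]]
    return train, valid

# Hypothetical file names, 5 per class.
files = [f"/kaggle/input/{name}/{i}.jpg"
         for name in ("puppy-images", "cat-images") for i in range(5)]

train, valid = random_split(files)
print(len(train), len(valid))        # 8 2
print(label_from_parent(files[0]))   # puppy-images
```

Because the seed is fixed, running the split again returns exactly the same validation files, which keeps results comparable between runs.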

# 2.2 Get the most out of your data
Now that we have the vessel to hold our data and the data itself, we need to load that data in using a dataloader. 
We then show a batch of that data to confirm it's correct.

In [None]:
dls = animals.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)


The model processes images in batches, pixel by pixel, so it's important that all of our images are the same size. There are multiple methods of resizing, including stretching/squishing, adding borders, or in this case choosing a random area to crop. We do this at a size of 128, as our model really doesn't need large pictures.

In [None]:
animals = animals.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = animals.dataloaders(path)
dls.train.show_batch(max_n=4, nrows=1, unique=True)
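
To get a feel for what "choosing a random area to crop" means, the sketch below picks a random square region of an image given `min_scale`. It is a simplification written for this notebook, not fastai's implementation: the real transform can also vary the aspect ratio of the region. `random_crop_box` and the 640x480 image size are assumptions for the example.

```python
import random

def random_crop_box(width, height, min_scale=0.3, rng=None):
    """Pick a random square region of the image.

    The target area is a random fraction (min_scale to 1.0) of the image area,
    and the square is capped so that it always fits inside the image.
    Returns (left, top, right, bottom).
    """
    rng = rng or random.Random()
    scale = rng.uniform(min_scale, 1.0)
    # Side of a square with the chosen area, capped to the shorter image side.
    side = min(int((scale * width * height) ** 0.5), width, height)
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    return left, top, left + side, top + side

rng = random.Random(0)
boxes = [random_crop_box(640, 480, min_scale=0.3, rng=rng) for _ in range(3)]
for box in boxes:
    print(box)  # each region would then be resized to 128x128
```

A lower `min_scale` allows tighter crops, so the model sees more varied close-ups but may sometimes miss the animal entirely.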

You can also augment your data. As we have 250 images per animal it isn't strictly needed, but it's a low-cost process: each epoch sees slightly different versions of every image, which effectively gives us more training data and makes the model more tolerant of messy images supplied by users. 

In [None]:
animals = animals.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
dls = animals.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)

You can combine these, or set them individually. The code below doesn't do anything new; it just applies both kinds of transform at once, this time at a larger image size of 224.

In [None]:
animals = animals.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())
dls = animals.dataloaders(path)

# 2.3 Train a model

We now have our data set up. Training is going to take a surprisingly small amount of code thanks to fast.ai, since we already defined some of what we need in our DataBlock above. Creating a learner and letting it fine-tune is what actually updates the weights of our model and improves its accuracy.

You pass in at least three parameters:
* Training data through our dataloader
    * In this case we use dls as we defined above
* Which architecture to use
    * In this case we're using resnet50, which is probably a bit overkill for the task at hand. You can find more information about the different architectures, built on PyTorch, in the fast.ai docs (https://docs.fast.ai/)
* How we want to measure our success or failure
    * In this case we use error_rate as our metric

We then fine-tune for 5 epochs.

In [None]:
learn = vision_learner(dls, resnet50, metrics=error_rate)
learn.fine_tune(5)
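
The `error_rate` column printed during training is simply the fraction of validation images the model gets wrong, in other words 1 minus accuracy. A minimal sketch with made-up predictions (the function name and the labels are illustrative, not fastai's code):

```python
def simple_error_rate(predictions, targets):
    """Fraction of predictions that do not match their target label (1 - accuracy)."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Hypothetical validation results: 1 of 5 is misclassified.
predicted = ["cat", "puppy", "bird", "cat", "puppy"]
actual = ["cat", "puppy", "bird", "bird", "puppy"]

rate = simple_error_rate(predicted, actual)
print(rate)      # 0.2
print(1 - rate)  # 0.8 accuracy
```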

# 3.0 Deploy the model

In [None]:
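# Save to the writable /kaggle/working folder under the file name the Space expects
learn.export('/kaggle/working/ourModel.pkl')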


From here we move over to HuggingFace, where we upload our exported model to a Space. You will need app.py, requirements.txt, your model file (named ourModel.pkl) and a test image (named dog.jpg). 

# 3.1 Host the model with HuggingFace
[Example of my HuggingFace Space](https://huggingface.co/spaces/JawdinMorris/catdog)
HuggingFace allows us to host the exported model. In conjunction with a framework called Gradio, we can make it available to users instantly. Create a HuggingFace Space and add the following files:
* app.py (Our main code)
* requirements.txt (Import needed packages)
* ourModel.pkl (Our model exported from Kaggle)
* dog.jpg (A simple example image)

In [None]:
## THIS WON'T RUN. TAKE THIS TO HUGGINGFACE AS YOUR app.py
# import gradio as gr
# from fastai.vision.all import *
# import skimage

# learn = load_learner('ourModel.pkl')

# labels = learn.dls.vocab
# def predict(img):
#     img = PILImage.create(img)
#     img = img.resize((128,128))
#     pred,pred_idx,probs = learn.predict(img)
#     # Drop the '-images' folder suffix (rstrip would strip characters, not the suffix)
#     return {labels[i].replace('-images', ''): float(probs[i]) for i in range(len(labels))}

# title = "Animal Classifier"
# description = "A dog cat classifier test for deployment"
# article="<p style='text-align: center'>This is our test model</p>"
# examples = ['dog.jpg']
# interpretation='default'
# enable_queue=True

# gr.Interface(fn=predict,inputs=gr.inputs.Image(shape=(128, 128)),outputs=gr.outputs.Label(num_top_classes=2),title=title,description=description,article=article,examples=examples,interpretation=interpretation,enable_queue=enable_queue).launch()
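
The dictionary returned by `predict` is what the label output displays: each class name mapped to its probability. The sketch below shows that mapping on its own, with made-up values standing in for `learn.dls.vocab` and `probs`. `clean_label` and `to_confidences` are illustrative names. One thing to be aware of when trimming the folder suffix: `str.rstrip` removes a set of characters rather than a suffix, which happens to be harmless for `bird`, `cat` and `puppy` but not for every class name.

```python
def clean_label(label, suffix="-images"):
    """Remove the folder suffix only when it is really there."""
    return label[: -len(suffix)] if label.endswith(suffix) else label

def to_confidences(vocab, probs):
    """Map each cleaned class name to its probability as a plain float."""
    return {clean_label(name): float(p) for name, p in zip(vocab, probs)}

# Hypothetical vocab and probabilities, standing in for learn.dls.vocab and probs.
vocab = ["bird-images", "cat-images", "dog-images"]
probs = [0.05, 0.15, 0.80]

confidences = to_confidences(vocab, probs)
print(confidences)                      # {'bird': 0.05, 'cat': 0.15, 'dog': 0.8}
print("dog-images".rstrip("-images"))   # do  (characters stripped, not the suffix)
```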

In [None]:
# #THIS WON'T RUN. TAKE THIS TO HUGGINGFACE AS requirements.txt
# fastai
# scikit-image

Now you actually have a deployed model! You could share the HuggingFace link with friends, or use its easy API to add it wherever you want.

# 3.2 Code the Website (Replit)

[Example of Replit Repo](https://replit.com/@JordanMorris5/animal-compare)

Use Replit to code the website. Create an HTML, CSS, JS project. You need three files:
* index.html
* script.js
* style.css

Hit the API with https://hf.space/embed/UserName/ProjectName/+/api/predict/

Where UserName and ProjectName are replaced with your details.

You can console.log the response to see the output and then assign that as needed. Full template code below.
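
If you would like to try the API from Python before writing any JavaScript, the sketch below shows the shape of the request: a JSON body with a `data` list holding the image as a base64 data URL. The URL is the placeholder pattern from above, and `build_payload` and `predict` are names invented for this example. Whether your Space accepts exactly this body depends on the Gradio version it runs, so treat it as a starting point.

```python
import base64
import json
import urllib.request

# Placeholder URL pattern from above; replace UserName and ProjectName with your own.
API_URL = "https://hf.space/embed/UserName/ProjectName/+/api/predict/"

def build_payload(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes as a base64 data URL inside the {"data": [...]} body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"data": [f"data:{mime_type};base64,{encoded}"]})

def predict(image_bytes, url=API_URL):
    """POST the image to the Space and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(image_bytes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Building the payload needs no network access, so it is safe to inspect locally.
payload = build_payload(b"\xff\xd8\xff")
print(payload)
```

To send a real image you would read it with `open('dog.jpg', 'rb').read()` and pass the bytes to `predict`.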

In [None]:
# data Array(1)
#     confidences Array(3)
#         0: {label: 'bird', confidence: 0.999995231628418}
#         1: {label: 'cat', confidence: 0.000004623149834515061}
#         2: {label: 'puppy', confidence: 1.33642004129797e-7}
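
Reading the prediction out of that response is a matter of walking `data`, then `confidences`. The sketch below does so in Python on a hand-written response of the same shape, assuming (as in the sample) that the most likely class is listed first. `top_prediction` is an illustrative name, and the 95% cut-off mirrors the one used in the script.js template.

```python
def top_prediction(response, min_confidence=95.0):
    """Return (label, confidence in percent), or None if the model is not sure enough."""
    best = response["data"][0]["confidences"][0]  # assumed to be the most likely class
    percent = best["confidence"] * 100
    return (best["label"], percent) if percent >= min_confidence else None

# Same shape as the sample response shown above, with rounded values.
sample = {"data": [{"confidences": [
    {"label": "bird", "confidence": 0.99},
    {"label": "cat", "confidence": 0.009},
    {"label": "puppy", "confidence": 0.001},
]}]}

result = top_prediction(sample)
print(result)
```

Returning `None` below the threshold gives the page a simple way to say it is unsure instead of showing a low-confidence guess.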

In [None]:
# ##THIS WILL NOT RUN TAKE IT TO REPLIT AS index.html
# <!DOCTYPE html>
# <html>

# <head>
# 	<meta charset="utf-8">
# 	<meta name="viewport" content="width=device-width">
# 	<title>Cat or Puppy?</title>
# 	<link href="style.css" rel="stylesheet" type="text/css" />
# </head>

# <body>
# 	<h1>Cat or Puppy?</h1>
# 	<div class="container">
# 		<div class="input-column">
# 			<h2>Input Photo:</h2>
# 			<p>Predict whether an image is of a cat or dog by uploading an image below:</p>
# 			<input id="photo" type="file">
# 			<br>
# 			<button id="submit">Submit</button>
# 		</div>
# 		<div class="output-column">
# 			<div id="results"></div>
# 		</div>
# 	</div>
# 	<script src="script.js"></script>
# </body>

# </html>

In [None]:
#THIS WILL NOT RUN. Take it to replit as script.js
# 		async function loaded(reader) {
# 			const image = new Image();
# 			image.src = reader.result;
# 			// Wait until the image has loaded so its width and height are known
# 			await image.decode();

# 			const canvas = document.createElement('canvas');
# 			const ctx = canvas.getContext('2d');
# 			canvas.width = image.width;
# 			canvas.height = image.height;
# 			ctx.drawImage(image, 0, 0);

# 			// Resize the image to a smaller size
# 			const MAX_WIDTH = 128;
# 			const MAX_HEIGHT = 128;
# 			let width = image.width;
# 			let height = image.height;

# 			if (width > height) {
# 				if (width > MAX_WIDTH) {
# 					height *= MAX_WIDTH / width;
# 					width = MAX_WIDTH;
# 				}
# 			} else {
# 				if (height > MAX_HEIGHT) {
# 					width *= MAX_HEIGHT / height;
# 					height = MAX_HEIGHT;
# 				}
# 			}

# 			canvas.width = width;
# 			canvas.height = height;

# 			const resizeCtx = canvas.getContext('2d');
# 			resizeCtx.drawImage(image, 0, 0, width, height);

# 			const dataURL = canvas.toDataURL('image/jpeg', 0.9);

# 			const response = await fetch('https://hf.space/embed/JawdinMorris/catdog/+/api/predict/', {
# 				method: "POST", body: JSON.stringify({"data": [dataURL]}),
# 				headers: {"Content-Type": "application/json"}
# 			});
# 			const json = await response.json();
# 			console.log(json)
# 			const label = json['data'][0]['confidences'][0]['label'];
# 			const confidence = json['data'][0]['confidences'][0]['confidence'] * 100;
# 			if (confidence < 95) {
# 				results.innerHTML = `<h2>I think you're trying to trick me!</h2>`;
# 			} else {
# 				results.innerHTML = `<h2> We think this photo is of a...</h2> <p class="prediction">${label} with a ${confidence.toFixed(2)}% confidence</p><img src="${dataURL}" width="128">`
# 			}

# 		}

# 		function read() {
# 			if (photo.files && photo.files[0]) {
# 				const reader = new FileReader();

# 				reader.addEventListener('load', () => loaded(reader))
# 				reader.readAsDataURL(photo.files[0]);
# 			} else {
# 				// Handle the case where the file was not uploaded correctly
# 				console.error('No file uploaded');
# 			}
# 		}
# 		function submitForm() {
# 			const results = document.getElementById('results');
# 			results.innerHTML = `<h2>We're thinking...</h2>`;
# 			read();
# 		}
# 		photo.addEventListener('input', read);
# 		submit.addEventListener('click', submitForm);

In [None]:
#THIS WILL NOT RUN. Take it to replit as style.css
# html {
# 	height: 100%;
# 	width: 100%;
# 	padding: 0;
# 	margin: 0;
# }

# .container {
# 	display: flex;
# 	flex-direction: row;
# }

# .input-column {
# 	width: 40%;
# 	padding: 1em;
# 	display: flex;
# 	flex-direction: column;
# }

# .output-column {
# 	width: 60%;
# 	padding: 1em;
# 	display: flex;
# 	flex-direction: column;
# }

# .prediction {
# 	font-size: 2em;
# }

# 3.3 Host the Website (GitHub)
[Example of my Github Repo](https://github.com/JordanMorrisDev/animal-compare)

From there, upload it to GitHub and enable GitHub Pages for the repository. It should now be available at an address like this one:
https://jordanmorrisdev.github.io/animal-compare/