Duck or Cat

Duck or Cat is a binary classification model. It classifies pictures of ducks and cats.

Screenshot DuckOrCat

Table of Contents

  • Content of this project
  • Dataset
  • The model
  • Result
  • Project setup
  • Usage
  • Versions
  • Structure
  • References

Content of this project

This project is split into three different folders:

  • scrapy: a Scrapy project that helps with the creation of the dataset by downloading pictures of cats and ducks in bulk.
  • jupyter: a Jupyter project that trains and validates the model.
  • flask: a Flask project that allows the user to try the model.

Dataset

The balance of the dataset is the following:

  • train: 9000 cats and 9000 ducks
  • test: 2000 cats and 9000 ducks
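
For reference, here is a minimal sketch for checking those counts, assuming the images are organized as dataset/<split>/<class>/ (the actual layout used in this project may differ):

import os

# Hypothetical layout: dataset/<split>/<class>/*.jpg — adjust the paths to your setup.
for split in ("train", "test"):
    for label in ("cat", "duck"):
        folder = os.path.join("dataset", split, label)
        count = len(os.listdir(folder)) if os.path.isdir(folder) else 0
        print(f"{split}/{label}: {count} images")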

The model

The model is a CNN classifier with the following structure:

Structure of the model
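
The diagram above is the reference for the exact layers; as a rough, hedged sketch only, a small Keras CNN for this kind of binary classification could look like this (the layer sizes and the 128x128 input are assumptions, not the project's exact values):

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative only: the real architecture is the one shown in the diagram above.
model = keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),  # assumed input size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single output: cat (0) vs duck (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()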

Result

Here is the evolution of the accuracy over 25 epochs:

Accuracy and loss over epochs

Here are the predictions for cats and ducks:

Cats Ducks

And here is the confusion matrix:

Confusion matrix

The model has an accuracy of 98.21%, which turns out to be very different from what the deployed model achieves in practice (see the Jupyter section for a likely explanation).
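
For reference, a hedged sketch of how such a confusion matrix can be recomputed from the saved model (the dataset path and image size are assumptions; flask/model.h5 is the file stored in this repository):

import numpy as np
from tensorflow import keras
from sklearn.metrics import accuracy_score, confusion_matrix

# Assumed dataset path and image size; adjust to your setup.
model = keras.models.load_model("flask/model.h5")
test_ds = keras.utils.image_dataset_from_directory(
    "jupyter/dataset/test", image_size=(128, 128), shuffle=False)

y_true = np.concatenate([y.numpy() for _, y in test_ds])
y_prob = model.predict(test_ds).ravel()
y_pred = (y_prob > 0.5).astype(int)  # sigmoid output: 0 = cat, 1 = duck (alphabetical folder order)

print("accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))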

Project setup

Install CUDA drivers on Windows 11:

Here.

Install cuDNN on WSL2:

Find the correct version of cuDNN here then run:

wget https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.0/local_installers/12.x/cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.0.131/cudnn-local-D7522631-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8 libcudnn8-dev nvidia-cudnn

Install CUDA on WSL2:

Delete the old key:

sudo apt-key del 7fa2af80

Find the correct version of CUDA here then run:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda

Install TensorRT:

Get the correct version of TensorRT here and install it:

sudo dpkg -i nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-12.0_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-12.0/*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install tensorrt

Install Miniconda:

Get the correct version of Miniconda here and install it:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh -p /usr/bin/miniconda3

Then run:

conda init

Disable Miniconda on start-up (optional):

Get into the Miniconda environment if you aren't already:

source /usr/bin/miniconda3/bin/activate

Turn off conda on start-up:

conda config --set auto_activate_base false

Create a conda environment:

Get into the Miniconda environment if you aren't already:

source /usr/bin/miniconda3/bin/activate

Create a new conda environment:

conda create --name env python=3.9

Install libraries:

Get into the environment:

conda activate env

Install cudatoolkit:

conda install -c conda-forge cudatoolkit=11.8.0

Install the required libraries:

cd dev/DuckOrCat/ && pip install -r scrapy/requirements.txt

Set the path variables:

CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
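
After the environment variables are set, a quick sanity check (run inside the activated environment) to confirm that TensorFlow can see the GPU:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

An empty list means the CUDA/cuDNN setup above isn't being picked up yet.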

Usage

This project works in this order:

  1. Scrapy to download/generate a dataset
  2. Jupyter to create and train the model
  3. Flask to deploy the model in real conditions.

But since the model is already built and stored on GitHub at flask/model.h5, it's possible to skip steps 1 and 2.

Scrapy

Thought process

Scrapy will download pictures in bulk from istockphoto.com.

The problem is that if you search for duck or cat pictures on this site (or any free image bank), you often get wrong pictures (plastic ducks, cooked ducks, cats with dogs, cat plushies, cat drawings, etc.), and it's practically impossible to handpick several thousand pictures yourself.

Instead, you can search by image to get much better results and download all of them. I searched for several breeds of cats and ducks so that the dataset has some variety.

If you want to understand the difference, go to istockphoto.com and search for duck. Then compare the results to this, which is a search-by-image result for a mallard duck. It's also the first of the 7 URLs you can find in scrapy/downloader/spiders/duck.py.
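
The actual spiders live in scrapy/downloader/spiders/ (duck.py and its cat counterpart); the sketch below only illustrates the general shape of such a spider — the selectors, settings, and pipeline details are assumptions, not the project's code:

import scrapy

# Illustrative sketch only; the real spider and its 7 search-by-image URLs
# are in scrapy/downloader/spiders/duck.py.
class DuckSpider(scrapy.Spider):
    name = "duck"
    start_urls = [
        # the istockphoto.com search-by-image result pages go here
    ]

    def parse(self, response):
        # Collect the image URLs from the result grid and hand them to Scrapy's
        # ImagesPipeline (which must be enabled in settings.py with IMAGES_STORE set).
        for url in response.css("img::attr(src)").getall():
            yield {"image_urls": [url]}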

Get into the Miniconda environement if you aren't already:

source /usr/bin/miniconda3/bin/activate

To download pictures of ducks or cats, get into the scrapy/ folder and run:

scrapy crawl [duck|cat]

Some scripts are available in scrapy/tools/:

  • delete_duplicate.py: deletes duplicate pictures in the dataset folder.
  • rename.py: renames every picture in a folder following a pattern.
  • resize.py: resizes every picture to a maximum of 1024 pixels in width and height, to save storage and resources during training (a sketch is shown below).

To use any of them, get into the scrapy/tools/ folder and run:

python3 [script_name]
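
For example, a minimal sketch of what a resize script like resize.py might do with Pillow (the real script may differ; the folder argument is a hypothetical usage):

import os
import sys
from PIL import Image

# Hypothetical usage: python3 resize.py <folder>
folder = sys.argv[1] if len(sys.argv) > 1 else "."
for name in os.listdir(folder):
    path = os.path.join(folder, name)
    try:
        with Image.open(path) as img:
            img.thumbnail((1024, 1024))  # cap width and height at 1024 px, keeping aspect ratio
            img.save(path)
    except OSError:
        pass  # skip files that are not images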

Jupyter

Thought process

Jupyter will train the model on the dataset (given that you have manually moved the dataset folder from the scrapy folder to the jupyter folder) and save the model as a .h5 file, which will be useful later.

The training steps are heavily inspired by Keras CNN Dog or Cat Classification.

And although it has a 98% accuracy, the model performs far worse in real conditions. This is probably because Scrapy downloads the dataset by breed of cat and duck, which means that if you submit a breed that I didn't scrape, the model will be confused.
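
As a hedged sketch of the overall training flow (the notebook in jupyter/ is the reference; the paths, image size, and batch size here are assumptions):

from tensorflow import keras

# Assumed layout after moving the dataset: jupyter/dataset/{train,test}/{cat,duck}/
train_ds = keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=(128, 128), batch_size=32)
test_ds = keras.utils.image_dataset_from_directory(
    "dataset/test", image_size=(128, 128), batch_size=32)

# `model` is built as in the CNN sketch shown earlier in this README.
model.fit(train_ds, validation_data=test_ds, epochs=25)
model.save("model.h5")  # later copied to flask/model.h5 for deployment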

Get into the Miniconda environment if you aren't already:

source /usr/bin/miniconda3/bin/activate

To develop on the model, get into the jupyter/ folder and run:

jupyter-lab

Flask

Thought process

Flask will take the model.h5 and expose it on a simple web interface.
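
The real application lives in flask/main.py; as a minimal sketch only (the route, form field name, and input size are assumptions), serving the model could look like this:

import numpy as np
from flask import Flask, request, jsonify
from tensorflow import keras
from PIL import Image

app = Flask(__name__)
model = keras.models.load_model("model.h5")

@app.route("/predict", methods=["POST"])  # hypothetical endpoint; the real routes are in main.py
def predict():
    # Assumes any rescaling/preprocessing is handled inside the saved model.
    img = Image.open(request.files["image"]).convert("RGB").resize((128, 128))
    x = np.expand_dims(np.array(img), axis=0)
    prob = float(model.predict(x)[0][0])
    return jsonify({"label": "duck" if prob > 0.5 else "cat", "duck_probability": prob})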

I wanted to deploy this on Fly.io, but sadly it would cost me too much, as the model needs a lot of resources. So for now it can only be tested locally.

Get into the Miniconda environment if you aren't already:

source /usr/bin/miniconda3/bin/activate

To develop the web interface, get into the flask/ folder and run:

flask --app main.py --debug run

If you want to run the production version on Docker instead, you can run:

docker build -f Dockerfile -t flask . && docker run -p 8000:8000 -it flask

Then open http://localhost:8000/.

Versions

  • Windows 11 and WSL2 (Ubuntu 22.04.2 LTS)
  • Python 3.9
  • Flask 2.2.3
  • Scrapy 2.7.1
  • TensorFlow 2.13
  • cuDNN 8.9.4
  • CUDA 12
  • Docker version 20.10.24 (optional)

Structure

flask/
scrapy/
jupyter/
env/
.gitignore
README.md

References