Duck or Cat is a binary classification model. It classifies pictures of ducks and cats.
This project is split into 3 different folders:
- scrapy: Scrapy project that helps create the dataset. It bulk-downloads pictures of cats and ducks.
- jupyter: Jupyter project that trains and validates the model.
- flask: Flask project that allows the user to try the model.
The balance of the dataset is the following:
- train: 9000 cats and 9000 ducks
- test: 2000 cats and 9000 ducks
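The split sizes above can be double-checked with a short script. This is only a sketch: it assumes a `<split>/<class>/` folder layout (for example `dataset/train/cat/`), which the README does not confirm, and the function name `count_images` is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to what the dataset actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(dataset_root):
    """Count image files per (split, class).

    Assumes a dataset_root/<split>/<class>/<file> layout (hypothetical).
    """
    counts = Counter()
    for path in Path(dataset_root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            split, label = path.parts[-3], path.parts[-2]
            counts[(split, label)] += 1
    return counts
```

Calling `count_images("dataset")` returns a `Counter` keyed by `(split, class)`, which makes an unbalanced split easy to spot before training.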
The model is a CNN classifier with the following structure:
Here is the evolution of the accuracy over 25 epochs:
Here are the predictions for cats and ducks:
And here is the confusion matrix:
The model reaches an accuracy of 98.21%, although the deployed model seems to perform quite differently in practice.
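For reference, accuracy can be derived from the four cells of a binary confusion matrix. The sketch below uses made-up counts, not the project's real results, and also computes per-class recall, which is worth checking when the test split is unbalanced because overall accuracy can hide a weak class.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def per_class_recall(tp, tn, fp, fn):
    """Recall of the positive class, then of the negative class."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, chosen only to illustrate the arithmetic.
cells = dict(tp=980, tn=985, fp=15, fn=20)
print(f"accuracy: {accuracy(**cells):.4f}")
print("recall (positive, negative):", per_class_recall(**cells))
```

With these illustrative counts, 1965 of 2000 predictions are correct.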
Find the correct version of cuDNN here then run:
wget https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.0/local_installers/12.x/cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.0.131/cudnn-local-D7522631-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8 libcudnn8-dev nvidia-cudnn
Delete the old key:
sudo apt-key del 7fa2af80
Find the correct version of CUDA here then run:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
Get the correct version of TensorRT here and install it:
sudo dpkg -i nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-12.0_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-12.0/*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install tensorrt
Get the correct version of Miniconda here and install it:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh -p /usr/bin/miniconda3
Then run:
conda init
Get into the Miniconda environment if you aren't already:
source /usr/bin/miniconda3/bin/activate
Turn off conda on start-up:
conda config --set auto_activate_base false
Get into the Miniconda environment if you aren't already:
source /usr/bin/miniconda3/bin/activate
Create a new conda environment:
conda create --name env python=3.9
Activate the environment:
conda activate env
Install cudatoolkit:
conda install -c conda-forge cudatoolkit=11.8.0
Install the required libraries:
cd dev/DuckOrCat/ && pip install -r scrapy/requirements.txt
Set the path variables:
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
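Once the variables are set, a quick check from Python can confirm that the library path was picked up and that TensorFlow sees the GPU. This is a sketch, not part of the project: the helper names are hypothetical, and `visible_gpus` simply returns an empty list when TensorFlow is not installed.

```python
import os


def library_path_entries():
    """Split LD_LIBRARY_PATH into its non-empty entries."""
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print("LD_LIBRARY_PATH entries:", library_path_entries())
    print("GPUs seen by TensorFlow:", visible_gpus())
```

An empty GPU list after a successful install usually points to a missing or mismatched CUDA or cuDNN library on the path.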
The project works in this order:
- Scrapy to download/generate a dataset
- Jupyter to create and train the model
- Flask to deploy the model in real conditions.
But since the model is already built and stored on GitHub at flask/model.h5, it's possible to skip steps 1 and 2.
Thought process
Scrapy will bulk-download pictures from istockphoto.com.
The problem is that if you search for duck or cat pictures on this site (or any free image bank), you often get unsuitable pictures (plastic ducks, cooked ducks, cats with dogs, cat plushies, drawings of cats, etc.), and handpicking several thousand pictures yourself is hardly feasible.
Instead, you can search by image to get much better results and download all of them. I searched for several breeds of cats and ducks so that the dataset has some variety in it.
If you want to see the difference, go to istockphoto.com and search for duck. Then compare the results to this, which is an image search for a mallard duck. It's also the first of 7 URLs you can find in scrapy/downloader/spiders/duck.py.
Get into the Miniconda environment if you aren't already:
source /usr/bin/miniconda3/bin/activate
To download pictures of ducks or cats, go into the scrapy/ folder and run:
scrapy crawl [duck|cat]
Some scripts are available in scrapy/tools/:
- delete_duplicate.py: deletes every duplicate picture in the dataset folder.
- rename.py: renames every picture in a folder following a pattern.
- resize.py: resizes every picture to a maximum of 1024 pixels in width and height, to save storage and resources during training.
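As an illustration of how duplicate removal can work, the sketch below detects byte-identical files by hashing their content. It is not the code of delete_duplicate.py, whose implementation is not shown here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Return files whose content matches an earlier file (sorted by path)."""
    seen = {}
    duplicates = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        key = file_digest(path)
        if key in seen:
            duplicates.append(path)
        else:
            seen[key] = path
    return duplicates
```

Each returned path could then be removed with `path.unlink()`. Note that this approach only catches exact copies: the same picture saved at another size or compression level has a different hash.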
To use any of them, go into the scrapy/tools/ folder and run:
python3 [script_name]
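The 1024-pixel limit applied by resize.py comes down to a small piece of arithmetic: scale both sides by the same factor so the larger one fits, and never enlarge. The helper below is a hypothetical sketch of that calculation, not the script's actual code.

```python
def fit_within(width, height, max_side=1024):
    """Scale (width, height) down to fit in a max_side square.

    The aspect ratio is preserved and images that already fit are unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # landscape photo, limited by its width
print(fit_within(800, 600))    # already small enough, left as-is
```

When working with Pillow, `Image.thumbnail((1024, 1024))` applies the same kind of aspect-preserving downscale in place.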
Thought process
Jupyter will train the model on the dataset (provided that you have manually moved the dataset folder from the scrapy folder to the jupyter folder) and save the model as a .h5 file, which will be useful later.
The training steps are heavily inspired by Keras CNN Dog or Cat Classification.
Although it reaches 98% accuracy, the model is far from this result in real conditions. This is probably because Scrapy downloads the dataset by breed of cat and duck, so if you pass in a breed that I didn't collect with Scrapy, the model gets confused.
Get into the Miniconda environment if you aren't already:
source /usr/bin/miniconda3/bin/activate
To work on the model, go into the jupyter/ folder and run:
jupyter-lab
Thoughts process
Flask will take the model.h5 file and expose it through a simple web interface.
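To show a result, the web interface has to turn the model's raw output into a label. The sketch below assumes the model ends in a single sigmoid unit and that class 0 is "cat" and class 1 is "duck"; neither is confirmed here, and the function name is hypothetical.

```python
def label_from_score(score, threshold=0.5, labels=("cat", "duck")):
    """Map a sigmoid output in [0, 1] to a (label, confidence) pair.

    Assumes labels[0] is the class encoded as 0 and labels[1] the class
    encoded as 1 during training (an assumption, check the notebook).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    index = int(score >= threshold)
    confidence = score if index == 1 else 1.0 - score
    return labels[index], confidence
```

Returning the confidence alongside the label lets the interface flag uncertain predictions, which matters given that the model struggles with breeds it has not seen.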
I wanted to deploy this on Fly.io, but sadly it would cost me too much, as a model needs a lot of resources. So for now it can only be tested locally.
Get into the Miniconda environment if you aren't already:
source /usr/bin/miniconda3/bin/activate
To develop the web interface, go into the flask/ folder and run:
flask --app main.py --debug run
If you want to run the production version on Docker instead, you can run:
docker build -f Dockerfile -t flask . && docker run -p 8000:8000 -it flask
Then open http://localhost:8000/.
- Windows 11 and WSL2 (Ubuntu 22.04.2 LTS)
- Python 3.9
- Flask 2.2.3
- Scrapy 2.7.1
- TensorFlow 2.13
- cuDNN 8.9.4
- CUDA 12
- Docker version 20.10.24 (optional)
flask/
scrapy/
jupyter/
env/
.gitignore
README.md
- Scrapy
- Item Pipeline
- Pillow + scrapy = sometimes cannot identify image file
- Dst tensor is not initialized #38
- Install Tensorflow/Keras in WSL2 for Windows with NVIDIA GPU
- CUDA - Installation
- Image Recognition Guide
- Binary Classification
- 🐈🐕 Cat and Dog Classification
- Keras CNN Dog or Cat Classification
- TF2 - Tutorials - Keras - Save and Restore Models
- How To Make a Web Application Using Flask in Python 3
- How Can You Use TensorFlow with Docker?