Build, train, and deploy an ML model to classify penguin species on tabular data

TerboucheHacene/penguin_ml

Penguin Classifier

  • Build and train a machine learning classification model for penguin species
  • Deploy the trained model using FastAPI and Docker
  • Streamlit app to showcase the model and present data analysis results
  • You can test the app on Streamlit sharing


How to run

  • Clone this repo.
  • Install Poetry (a tool for dependency management and packaging in Python):
    curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
  • Install the dependencies:
    poetry install
    
  • Activate the environment:
    poetry shell
    

Data

The data were collected and made available by Dr. Kristen Gorman and Palmer Station, Antarctica LTER, and are also available on Kaggle. The dataset contains observations of 3 different penguin species, collected between 2007 and 2009, with the following features:

  • species: penguin species (Chinstrap, Adélie, or Gentoo). This is the target variable.
  • bill_length_mm: culmen length (mm)
  • bill_depth_mm: culmen depth (mm)
  • flipper_length_mm: flipper length (mm)
  • body_mass_g: body mass (g)
  • island: island name (Dream, Torgersen, or Biscoe) in the Palmer Archipelago (Antarctica)
  • sex: penguin sex

The data is a CSV file at /data/penguins.csv with 344 rows and 8 columns.
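To get a feel for the schema described above, here is a minimal pandas sketch. The sample rows are made up for illustration, not taken from the real dataset; in the repo you would load the full CSV instead.

```python
import pandas as pd

# A few made-up rows mirroring the dataset's schema
# (values are illustrative, not from the real data).
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "island": ["Torgersen", "Biscoe", "Dream"],
        "bill_length_mm": [39.1, 46.5, 49.2],
        "bill_depth_mm": [18.7, 15.0, 18.3],
        "flipper_length_mm": [181, 217, 195],
        "body_mass_g": [3750, 5200, 3900],
        "sex": ["male", "female", "male"],
    }
)

# In the repo itself, load the full file instead:
# df = pd.read_csv("data/penguins.csv")
print(sample.shape)                 # (3, 7)
print(sample["species"].unique())
```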


Modeling

A basic Random Forest classifier (see sklearn) is used to predict the penguin species. It is not the best possible model, but it serves as a reasonable baseline for further fine-tuning. The variables bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g are numeric and can be fed directly to the algorithm. The categorical variables island and sex, on the other hand, are one-hot encoded. This results in 9 model inputs.
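The preprocessing above can be sketched with a sklearn ColumnTransformer: the 4 numeric columns pass through unchanged, while island (3 categories) and sex (2 categories) are one-hot encoded, giving 4 + 3 + 2 = 9 inputs. The tiny synthetic DataFrame is only there to make the sketch self-contained; the repo's actual training code may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny synthetic sample just to illustrate the preprocessing;
# in the repo, data/penguins.csv would be used instead.
df = pd.DataFrame(
    {
        "bill_length_mm": [39.1, 46.5, 49.2, 38.8],
        "bill_depth_mm": [18.7, 15.0, 18.3, 17.2],
        "flipper_length_mm": [181, 217, 195, 180],
        "body_mass_g": [3750, 5200, 3900, 3600],
        "island": ["Torgersen", "Biscoe", "Dream", "Torgersen"],
        "sex": ["male", "female", "male", "female"],
        "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    }
)

numeric = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
categorical = ["island", "sex"]

# 4 numeric features pass through; island (3 values) and sex (2 values)
# are one-hot encoded, so the model sees 4 + 3 + 2 = 9 inputs.
preprocess = ColumnTransformer(
    [
        ("num", "passthrough", numeric),
        ("cat", OneHotEncoder(categories=[["Biscoe", "Dream", "Torgersen"],
                                          ["female", "male"]]), categorical),
    ]
)
model = Pipeline(
    [("prep", preprocess),
     ("rf", RandomForestClassifier(n_estimators=100, random_state=0))]
)

X, y = df[numeric + categorical], df["species"]
model.fit(X, y)
print(preprocess.transform(X).shape)  # (4, 9)
```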

Streamlit App

A simple Streamlit app is built to showcase the model:

  • You can test the app locally:
    streamlit run streamlit_app/penguin_streamlit.py
  • Or test the model online using this link

Deployment using FastAPI and Docker

We deploy the model as an HTTP endpoint using FastAPI, then package the code into a Docker image.

We use this Docker image as the base image (from the creator of FastAPI).

  • To build the image:
    docker image build -t penguin_app .
  • To run the container locally:
    docker container run -d -p 8080:80 --name myapp penguin_app
    
  • To run the app using docker-compose:
    docker-compose up 
    
  • To run the app using Docker Swarm:
    docker stack deploy -c docker-compose.yml MyApp