Build, train, and deploy an ML model to classify penguin species from tabular data

Penguin Classifier

  • Build and train a machine learning classification model that predicts penguin species
  • Deploy the trained model using FastAPI and Docker
  • Showcase the model and present data analysis results in a Streamlit app
  • You can test the app on Streamlit Sharing

How to run

  • Clone this repo.
  • Install poetry (a tool for dependency management and packaging in Python)
    curl -sSL | python -
  • Install the dependencies:
    poetry install
  • Execute the following command to activate the environment:
    poetry shell


The dataset was collected and made available by Dr. Kristen Gorman and Palmer Station, Antarctica LTER. It is also available on Kaggle. It contains observations of three penguin species, recorded between 2007 and 2009, with the following features:

  • species: penguin species (Chinstrap, Adélie, or Gentoo). This is the target variable.
  • bill_length_mm: culmen length (mm)
  • bill_depth_mm: culmen depth (mm)
  • flipper_length_mm: flipper length (mm)
  • body_mass_g: body mass (g)
  • island: island name (Dream, Torgersen, or Biscoe) in the Palmer Archipelago (Antarctica)
  • sex: penguin sex

The data is stored as a CSV file at /data/penguins.csv, with 344 rows and 8 columns.
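
The stated shape can be checked with a few lines of standard-library Python. This is a minimal sketch, assuming the file has a single header row; the helper name `csv_shape` is hypothetical and not part of the repository.

```python
import csv
from pathlib import Path


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file whose first line is a header."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first line holds the column names
        n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return n_rows, len(header)
```

Run from the repository root, `csv_shape("data/penguins.csv")` should return `(344, 8)` if the file matches the description above.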


A basic Random Forest classifier is used to predict the penguin species (see sklearn). It is not tuned for best performance, but it is a good baseline for further fine-tuning. The variables bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g are numeric and can be fed directly to the algorithm. The variables island and sex are categorical and are one-hot encoded. This results in 9 inputs: 4 numeric values, 3 island indicators, and 2 sex indicators.
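
To make the count of 9 inputs concrete, here is a minimal pure-Python sketch of the encoding step. It is not the repository's preprocessing code: the column order, the generated feature names, and the `female`/`male` labels for sex are assumptions, and the sample record is purely illustrative.

```python
# Numeric columns are passed through unchanged.
NUMERIC = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

# Categorical columns and their levels (the sex labels are an assumption).
CATEGORIES = {
    "island": ["Biscoe", "Dream", "Torgersen"],
    "sex": ["female", "male"],
}


def feature_names():
    """List the model inputs: 4 numeric columns plus one indicator per level."""
    names = list(NUMERIC)
    for column, levels in CATEGORIES.items():
        names += [f"{column}_{level}" for level in levels]
    return names


def encode(record):
    """Turn one penguin record (a dict) into a flat list of 9 floats."""
    vector = [float(record[name]) for name in NUMERIC]
    for column, levels in CATEGORIES.items():
        # One-hot: 1.0 for the matching level, 0.0 for every other level.
        vector += [1.0 if record[column] == level else 0.0 for level in levels]
    return vector


# Illustrative record, not taken from the project's code.
sample = {
    "bill_length_mm": 39.1,
    "bill_depth_mm": 18.7,
    "flipper_length_mm": 181,
    "body_mass_g": 3750,
    "island": "Torgersen",
    "sex": "male",
}
print(len(feature_names()))  # 9
print(encode(sample))
```

In this sketch, a category value outside the listed levels (for example a missing sex) encodes as all zeros for that column; the actual project may handle such rows differently.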

Streamlit App

A simple Streamlit app showcases the model, as shown in the following figure:

[Figure: screenshot of the Streamlit app]

  • You can test the app locally:
    streamlit run streamlit_app/
  • Or test the model online using this link

Deployment using FastAPI and Docker

The model is served as an HTTP endpoint using FastAPI, and the code is then packaged into a Docker image.

We use this Docker image as the base image (provided by the developer of the FastAPI package).

  • To build the image:
    docker image build -t penguin_app .
  • To run the container locally:
    docker container run -d -p 8080:80 --name myapp penguin_app
  • To run the app using Docker Compose:
    docker-compose up
  • To run the app using Docker Swarm:
    docker stack deploy -c docker-compose.yml MyApp
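
Once the container is running with the `-p 8080:80` mapping shown above, the API is reachable on port 8080 of the host. The following standard-library sketch shows how a client might call it. The route `/predict` and the JSON payload layout are assumptions made for illustration; the README does not state the actual endpoint or schema.

```python
import json
import urllib.request


def build_request(features, base_url="http://localhost:8080", route="/predict"):
    """Build a JSON POST request; the route and payload shape are hypothetical."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, **kwargs):
    """Send the request to a running container and decode the JSON reply."""
    request = build_request(features, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

FastAPI serves interactive documentation at `/docs` by default, so unless the project disables it, `http://localhost:8080/docs` is the place to confirm the real route and request schema before calling `predict`.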