See a demo in production!

Test out the API:

curl -d "data=Le commerce n'est pas un monstre et la publicité" -X GET

See the website for a tutorial.

This repo is an example of how to build a machine learning application from scratch. It is a simple web application that uses the Naive Bayes algorithm to classify a string of text as belonging to one of several languages.
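To make the classification idea concrete, here is a minimal sketch in plain Python. It is not the code in this repo: the character-bigram features, the `NaiveBayesLanguageClassifier` class, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Split lowercased, space-padded text into overlapping character n-grams."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesLanguageClassifier:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)         # label -> number of examples
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.ngram_counts.items():
            # Start from the log prior, then add one log likelihood per n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for gram in char_ngrams(text):
                # Unseen n-grams have count 0 and receive the smoothed floor.
                score += math.log((counts[gram] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (hypothetical; the repo trains on Wikipedia text).
texts = [
    "le commerce n'est pas un monstre et la publicité",
    "la maison est grande et le jardin est petit",
    "the house is large and the garden is small",
    "commerce is not a monster and neither is advertising",
]
labels = ["french", "french", "english", "english"]

classifier = NaiveBayesLanguageClassifier().fit(texts, labels)
print(classifier.predict("la maison est grande"))
print(classifier.predict("the garden is small"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen n-gram from zeroing out a language's score.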

This application has a front-end built in Flask, a model server created with flask_restful, and a database in SQLite. Data is downloaded from Wikipedia.


Install dependencies with conda:

conda create --name language_classifier python=3.8
conda activate language_classifier
pip install -r requirements.txt

Run the app locally

Run the web application:

flask run



View the application:

Run the model API:


Test out the API:

curl -d "data=Le commerce n'est pas un monstre et la publicité" -X GET
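The same request can be built from Python with the standard library. This is only a sketch: `build_classify_request` is a hypothetical helper, and the address passed to it is a placeholder, since the API's actual URL is not given here. Note that `urlencode` percent-encodes the text, whereas `curl -d` sends it as written.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_classify_request(api_url, text):
    """Build a GET request carrying `text` as a form-encoded `data` field."""
    body = urlencode({"data": text}).encode("utf-8")
    return Request(api_url, data=body, method="GET")


# Placeholder address (an assumption): substitute wherever the model API is served.
request = build_classify_request(
    "http://example.invalid/",
    "Le commerce n'est pas un monstre et la publicité",
)
print(request.get_method(), request.data)
# To actually send it: urllib.request.urlopen(request).read()
```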

The data pipeline

Download some data from Wikipedia:

cd scraper && python

Generate a SQLite database:

Explore the data:

$ sqlite3 language_data.db
> SELECT language, title FROM wiki_data LIMIT 25;
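The same exploration can be done from Python with the built-in `sqlite3` module. The sketch below assumes only the `wiki_data` table and its `language` and `title` columns from the query above; the helper names and the in-memory stand-in rows are made up so the example runs without the real `language_data.db`.

```python
import sqlite3


def sample_rows(connection, limit=25):
    """Return (language, title) pairs, mirroring the query above."""
    cursor = connection.execute(
        "SELECT language, title FROM wiki_data LIMIT ?", (limit,)
    )
    return cursor.fetchall()


def rows_per_language(connection):
    """Count rows per language, a quick check for class imbalance."""
    cursor = connection.execute(
        "SELECT language, COUNT(*) FROM wiki_data "
        "GROUP BY language ORDER BY language"
    )
    return dict(cursor.fetchall())


# Stand-in database with hypothetical rows; to use the real data, replace
# ":memory:" with "language_data.db" and skip the CREATE/INSERT statements.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE wiki_data (language TEXT, title TEXT)")
connection.executemany(
    "INSERT INTO wiki_data VALUES (?, ?)",
    [("fr", "Commerce"), ("fr", "Publicité"), ("en", "Trade")],
)

preview = sample_rows(connection, limit=2)
counts = rows_per_language(connection)
print(preview)
print(counts)
```

Checking the per-language counts before training is worthwhile because Naive Bayes priors come straight from class frequencies.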

Build and test the model (run all cells in this notebook):

cd ../modeling
jupyter lab create_model.ipynb

Deploy the model by editing the following lines in

MODEL_NAME = "NB_classif"
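One way a name constant like this can select which trained model gets served is sketched below. Everything beyond `MODEL_NAME` is an assumption for illustration: the pickle format, the `.pkl` extension, the `model_path`, `save_model`, and `load_model` helpers, and the dictionary standing in for a trained classifier are not taken from this repo.

```python
import pickle
import tempfile
from pathlib import Path

MODEL_NAME = "NB_classif"


def model_path(model_dir, model_name=MODEL_NAME):
    """Map a model name to a file path (the .pkl extension is an assumption)."""
    return Path(model_dir) / f"{model_name}.pkl"


def save_model(model, model_dir, model_name=MODEL_NAME):
    """Serialize a trained model so a separate server process can load it."""
    with open(model_path(model_dir, model_name), "wb") as handle:
        pickle.dump(model, handle)


def load_model(model_dir, model_name=MODEL_NAME):
    """Load the model that the name constant points to."""
    with open(model_path(model_dir, model_name), "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in object in place of a real trained classifier.
stand_in_model = {"classes": ["en", "fr"]}
with tempfile.TemporaryDirectory() as model_dir:
    save_model(stand_in_model, model_dir)
    restored_model = load_model(model_dir)
print(restored_model)
```

With this layout, switching to a newly trained model means changing one constant rather than touching the serving code.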

How to put this into production

Deploy the web application to a hosting platform such as Heroku. Then deploy to GCP or AWS. You will need to adjust the requirements for each of these apps, and configure DNS so the servers can talk to each other.


  • This application was designed with teaching in mind, so there are some simplifications.
  • This application has three distinct servers: a web app, a model API, and a SQL database. When run locally, the distinction doesn't matter, but once the app is deployed, each piece runs as its own server.
  • All the dependencies here are global. If this were deployed, the servers would not all need to share the same dependencies.