language-classifier

Test out the API:

curl https://language-identifier-app.herokuapp.com/identify -d "data=Le commerce n'est pas un monstre et la publicité" -X GET

See the website for a tutorial.

This repo is an example of how to build a machine learning application from scratch! This is a simple web application that uses the Naive Bayes algorithm to classify a string of text as belonging to one of several languages.
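To make the Naive Bayes idea concrete, here is a minimal, dependency-free sketch of a language classifier. It is not the code from this repo's notebook: the character-bigram features, the add-one smoothing, the class name `NaiveBayesLanguageClassifier`, and the tiny training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Split lowercased text into overlapping character n-grams."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesLanguageClassifier:
    """Multinomial Naive Bayes over character bigrams (illustrative only)."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)         # label -> number of texts
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # log P(label) + sum of log P(n-gram | label), add-one smoothed
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences; the real app trains on Wikipedia text.
train_texts = [
    "the quick brown fox jumps over the lazy dog",
    "this is a simple sentence written in english",
    "le commerce n'est pas un monstre et la publicité",
    "ceci est une phrase simple écrite en français",
]
train_labels = ["en", "en", "fr", "fr"]

model = NaiveBayesLanguageClassifier().fit(train_texts, train_labels)
print(model.predict("the lazy dog"))
print(model.predict("une phrase simple en français"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing keeps an unseen bigram from zeroing out a language's score.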

This application has a front-end built in Flask, a model server created with flask_restful, and a database in SQLite. Data is downloaded from Wikipedia.

Install

Install dependencies with conda:

conda create --name language_classifier python=3.8
conda activate language_classifier
pip install -r requirements.txt

Run the app locally

Run the web application:

export FLASK_APP=app.py
flask run

Or:

python app.py

View the application:

http://127.0.0.1:5000

Run the model API:

python api.py

Test out the API:

curl http://127.0.0.1:5001/identify -d "data=Le commerce n'est pas un monstre et la publicité" -X GET
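The same request can be sent from Python. This is a rough sketch using only the standard library; the helper names `build_identify_request` and `identify` are hypothetical, and the response is returned as raw text because its exact format is not described here.

```python
import urllib.parse
import urllib.request

API_URL = "http://127.0.0.1:5001/identify"  # local model API started by api.py


def build_identify_request(text, url=API_URL):
    """Mirror the curl call: a form field named `data`, sent with method GET."""
    body = urllib.parse.urlencode({"data": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="GET")


def identify(text, url=API_URL):
    """Send the request and return the raw response body as text."""
    request = build_identify_request(text, url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With `python api.py` running, calling `identify("Le commerce n'est pas un monstre et la publicité")` should return whatever the API responds with for that sentence.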

The data pipeline

Download some data from Wikipedia:

cd scraper && python get_data.py

Generate a SQLite database:

python create_database.py

Explore the data:

$ sqlite3 language_data.db
> SELECT language, title FROM wiki_data LIMIT 25;
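The same exploration can be done from Python with the built-in `sqlite3` module. The sketch below assumes only what the query above shows, namely a `wiki_data` table with `language` and `title` columns; the real table may have more columns. The helper names and the stand-in rows are made up so the example runs without the real database.

```python
import sqlite3


def preview_titles(conn, limit=25):
    """Return (language, title) rows from wiki_data, like the SELECT above."""
    cursor = conn.execute(
        "SELECT language, title FROM wiki_data LIMIT ?", (limit,)
    )
    return cursor.fetchall()


def count_by_language(conn):
    """Return {language: row count}, a quick check of class balance."""
    cursor = conn.execute(
        "SELECT language, COUNT(*) FROM wiki_data GROUP BY language"
    )
    return dict(cursor.fetchall())


# Stand-in database so the sketch is self-contained. With real data, use
# sqlite3.connect("language_data.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_data (language TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO wiki_data VALUES (?, ?)",
    [("fr", "Paris"), ("fr", "Lyon"), ("en", "London")],
)
print(preview_titles(conn, limit=2))
print(count_by_language(conn))
```

Checking the per-language counts before training is useful because a heavily unbalanced dataset skews the class priors that Naive Bayes learns.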

Make and test model (run all cells in this notebook):

cd ../modeling
jupyter lab create_model.ipynb

Deploy the model by editing the following lines in classify_language.py:

MODEL_NAME = "NB_classif"
MODEL_VERSION = "1"
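How classify_language.py turns these two values into a loaded model is not shown here. As a purely hypothetical illustration, a name and version pair could map to a pickled file like this; the `<name>_v<version>.pkl` naming scheme and the helper functions are assumptions, not the repo's actual layout.

```python
import pickle
import tempfile
from pathlib import Path

MODEL_NAME = "NB_classif"
MODEL_VERSION = "1"


def model_path(directory, name=MODEL_NAME, version=MODEL_VERSION):
    """Hypothetical naming scheme: <directory>/<name>_v<version>.pkl."""
    return Path(directory) / f"{name}_v{version}.pkl"


def save_model(model, directory):
    """Pickle the model under the versioned file name and return the path."""
    path = model_path(directory)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def load_model(directory):
    """Load the model selected by MODEL_NAME and MODEL_VERSION."""
    with open(model_path(directory), "rb") as f:
        return pickle.load(f)


# Round trip with a placeholder object standing in for a trained classifier.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_model({"labels": ["en", "fr"]}, tmp)
    print(saved.name)
    print(load_model(tmp))
```

Keeping the version in the file name means a new model can be trained and saved alongside the old one, and switching between them is a one-line change.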

How to put this into production

To put this into production, deploy the web application (app.py) to a hosting platform such as Heroku, and deploy the model API (api.py) to a cloud provider such as GCP or AWS. Each app will need its own trimmed-down requirements, and DNS must be configured so the two servers can reach each other.
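One common way to let the web app find the model API in each environment is to read the API's base URL from an environment variable and fall back to the local address during development. This is only a sketch: `MODEL_API_URL` is a hypothetical variable name that the repo does not define, and `models.example.com` is a placeholder host.

```python
import os
from urllib.parse import urljoin

DEFAULT_API_BASE = "http://127.0.0.1:5001"  # local model API


def identify_url(environ=os.environ):
    """Return the /identify endpoint of the model API.

    MODEL_API_URL is a hypothetical variable name, not one the repo defines.
    """
    base = environ.get("MODEL_API_URL", DEFAULT_API_BASE)
    # Normalize to exactly one trailing slash before joining the path.
    return urljoin(base.rstrip("/") + "/", "identify")


print(identify_url({}))
print(identify_url({"MODEL_API_URL": "https://models.example.com/"}))
```

With this pattern the same code runs locally and in production, and only the deployment configuration changes.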

Caveats

This application was designed with teaching in mind, so there are some simplifications.
This application has three distinct components: a web app, a model API, and a SQL database. When run locally, the distinction doesn't matter, but in a real deployment each piece would run as its own server.
All the dependencies here are global. If this were deployed, not every dependency would need to be shared between servers.
