Natural Language Processing Part 3

Date: 4 December 2018

Project Overview

Build a classification model that can distinguish between toxic and non-toxic comments and use the model in a real-life application.

The meetups serve as guidance. The goal is for all attendees to build a good machine learning model that can be used in a real-life application. We encourage all attendees to apply creativity to this project. There are no limits.

Installation Requirements

All code is written in Python. Please use this guide to get Python and Jupyter Notebook up and running.

Project Setup

This project contains a Flask Web App and Keras NLP model files trained to identify levels of toxicity in comments.

It is deployed on Heroku.

The Deployment instructions below will help you in deploying it as your own Web App on Heroku.

Dependencies

Python: 3.6
Flask: 1.0.2
Keras: 2.2.4
pandas: 0.23.4
numpy: 1.15.4
sklearn: 0.20.1

Training

For those who want to walk through the whole process from training to deployment, first download the data used to train the model.

To download the data, run:

python ml_model/download.py

This downloads the training data and the pre-trained embedding file to:

./assets/data/train.csv
./assets/embedding/fasttext-crawl-300d-2m/crawl-300d-2M.vec
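The `.vec` embedding file is plain text: a header line with the token count and the vector dimension, then one line per token followed by its space-separated components. Below is a minimal sketch of reading only the vectors a model needs. `load_vectors` and its `vocab` argument are illustrative names, not functions from this repo, and the training script may load the file differently.

```python
import io


def load_vectors(lines, vocab=None):
    """Parse fastText .vec text lines into a {token: [float, ...]} dict.

    `lines` is any iterable of strings (an open file works).
    `vocab` optionally restricts loading to the tokens of interest,
    which keeps memory use down for a 2M-token file.
    """
    lines = iter(lines)
    # Header line: "<number of tokens> <vector dimension>"
    _count, dim = (int(x) for x in next(lines).split())
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        token, values = parts[0], parts[1:]
        if vocab is not None and token not in vocab:
            continue
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[token] = [float(v) for v in values]
    return dim, vectors


# Tiny in-memory stand-in for the real file (3 tokens, 4 dimensions).
sample = io.StringIO(
    "3 4\n"
    "good 0.1 0.2 0.3 0.4\n"
    "boy 0.5 0.6 0.7 0.8\n"
    "stupid -0.1 -0.2 -0.3 -0.4\n"
)
dim, vectors = load_vectors(sample, vocab={"good", "stupid"})
print(dim, sorted(vectors))  # 4 ['good', 'stupid']
```

With the downloaded file, the same function could be given an open file handle instead of the `io.StringIO` sample.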

To train the model, run:

python ml_model/train_classifier.py

This trains a pooled GRU on FastText embeddings. The text preprocessor and the model are serialized and stored in:

./assets/model/preprocessor.pkl
./assets/model/model.h5

Note: Training took about 1 hour on an Intel Core i7-HQ CPU.
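The preprocessor is saved alongside the model because prediction must map words to the same integer ids that were used during training. The sketch below illustrates the idea, assuming the preprocessor state boils down to a vocabulary and a maximum sequence length. `fit_word_index`, `to_padded_ids`, and the `state` dict are hypothetical and are not the repo's actual preprocessor, which is likely a richer object.

```python
import pickle


def fit_word_index(texts):
    """Assign each distinct lowercase token an id starting at 1 (0 = padding)."""
    word_index = {}
    for text in texts:
        for token in text.lower().split():
            word_index.setdefault(token, len(word_index) + 1)
    return word_index


def to_padded_ids(text, state):
    """Map a comment to a fixed-length list of ids, left-padded with zeros."""
    index, max_len = state["word_index"], state["max_len"]
    # Tokens never seen during fitting are dropped.
    ids = [index[t] for t in text.lower().split() if t in index]
    ids = ids[-max_len:]  # keep at most the last max_len tokens
    return [0] * (max_len - len(ids)) + ids


state = {
    "max_len": 5,
    "word_index": fit_word_index(["good boy", "Corgi is stupid"]),
}

# Round-trip through pickle, as one would when writing and reading a .pkl file.
restored = pickle.loads(pickle.dumps(state))
print(to_padded_ids("good corgi", restored))  # [0, 0, 0, 1, 3]
```

Reusing the restored state at prediction time guarantees that "good" still maps to the id the model saw in training.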

Prediction

To try out predictions, run:

python ml_model/predict.py

# output:
# Corgi is stupid          - Toxicity: [0.99293655]
# good boy                 - Toxicity: [0.02075008]
# School of AI is awesome  - Toxicity: [0.01223523]
# F**K                     - Toxicity: [0.90747666]
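Each score is a probability-like value between 0 and 1, so an application still has to decide where to draw the line between toxic and non-toxic. A minimal sketch follows, assuming a cut-off of 0.5. Both the threshold and the `label_comment` helper are illustrative and are not part of `predict.py`.

```python
def label_comment(score, threshold=0.5):
    """Return 'toxic' if the predicted score reaches the threshold."""
    return "toxic" if score >= threshold else "non-toxic"


# Scores copied from the sample output above.
scores = {
    "Corgi is stupid": 0.99293655,
    "good boy": 0.02075008,
    "School of AI is awesome": 0.01223523,
}
for text, score in scores.items():
    print(f"{text:<25}- {label_comment(score)}")
```

A stricter application might raise the threshold to reduce false positives, at the cost of letting more borderline comments through.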

Deployment

Create a Heroku account if you don't already have one.
Install Heroku CLI.
Fork this GitHub repo.
Clone the forked repo to your local machine.
Navigate to your cloned repo directory.
Create your Heroku App via heroku create <app-name>
Deploy your App via git push heroku master
Run heroku open to open your newly deployed web app on your Web Browser.

Meetup Content

Part 3 Slides

Resources

The project uses data from Kaggle's Toxic Comment Classification Challenge. The data can be found here.

If you are struggling to implement some of the concepts discussed at the meetup, use the slides notebook as guidance. There are also many kernels specific to the toxic comment challenge that you can refer to for inspiration or help.

Alternatively, ask for assistance on Slack. That's what this community is all about :)

Other Resources:

Meetup Contributors


