SchoolofAI-Vancouver/NLP_Project_3

Natural Language Processing Part 3

organised by Vancouver School of AI

Date: 4 December 2018

Project Overview

Build a classification model that can distinguish between toxic and non-toxic comments and use the model in a real-life application.

The meetups serve as guidance. The goal is for all attendees to build a good machine learning model that can be used in a real-life application. We encourage all attendees to apply creativity to this project. There are no limits.

Installation Requirements

All code is written in Python. Please use this guide to get Python and Jupyter Notebook up and running.

Project Setup

This project contains a Flask Web App and Keras NLP model files trained to identify levels of toxicity in comments.

It is deployed on Heroku.

The Deployment instructions below will help you in deploying it as your own Web App on Heroku.

Dependencies

  • Python: 3.6
  • Flask: 1.0.2
  • Keras: 2.2.4
  • pandas: 0.23.4
  • numpy: 1.15.4
  • sklearn: 0.20.1
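
As a convenience, the pinned versions above can be turned into pip-style requirement lines. This is only a sketch: `PINNED`, `PIP_NAMES` and `requirement_lines` are hypothetical names introduced here, and the only assumption beyond the list is that `sklearn` is published on PyPI under the name `scikit-learn`.

```python
# Versions copied from the Dependencies list above.
PINNED = {
    "Flask": "1.0.2",
    "Keras": "2.2.4",
    "pandas": "0.23.4",
    "numpy": "1.15.4",
    "sklearn": "0.20.1",
}

# The import name `sklearn` is installed from the `scikit-learn` package.
PIP_NAMES = {"sklearn": "scikit-learn"}


def requirement_lines(pinned, pip_names=PIP_NAMES):
    """Return one 'package==version' line per pinned dependency."""
    return [
        "{}=={}".format(pip_names.get(name, name), version)
        for name, version in pinned.items()
    ]


lines = requirement_lines(PINNED)
print("\n".join(lines))
```

The printed lines can be saved to a file and passed to `pip install -r`, ideally inside a Python 3.6 virtual environment.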

Training

For those who want to walk through the whole process from training to deployment, you first need to download the data used to train the model.

To download the data, run:

python ml_model/download.py

This downloads the training data and the pre-trained embedding file to:

  • ./assets/data/train.csv
  • ./assets/embedding/fasttext-crawl-300d-2m/crawl-300d-2M.vec
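
To give an idea of what the embedding file contains, here is a minimal sketch of a reader for the fastText `.vec` text format: a header line with the word count and vector dimension, followed by one word and its vector per line. The function `load_vectors` and the tiny in-memory sample are hypothetical and are not taken from `ml_model/`; the training script may load the file differently.

```python
import io


def load_vectors(lines, vocab=None):
    """Parse fastText .vec text into (dimension, {word: vector}).

    `lines` is any iterable of text lines, e.g. an open file.
    If `vocab` is given, only words in that set are kept, which
    saves memory with a file as large as crawl-300d-2M.vec.
    """
    it = iter(lines)
    # Header line: "<number_of_words> <dimension>"
    _count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
    return dim, vectors


# Tiny stand-in for the real file (4 dimensions instead of 300).
sample = io.StringIO(
    "3 4\n"
    "corgi 0.1 0.2 0.3 0.4\n"
    "good 0.0 -0.1 0.5 0.2\n"
    "boy 0.3 0.3 0.3 0.3\n"
)
dim, vectors = load_vectors(sample, vocab={"corgi", "boy"})
```

With the real file, the same function could be called as `load_vectors(open(path, encoding="utf-8"), vocab=...)`, where `vocab` holds the words seen in the training comments.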

To train the model, run:

python ml_model/train_classifier.py

This trains a pooled GRU on top of FastText embeddings. The text preprocessor and the model are serialized and stored in:

  • ./assets/model/preprocessor.pkl
  • ./assets/model/model.h5

Note: Training took ~1 hour on an Intel Core i7-HQ CPU.
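
The "pooled" part of the model can be illustrated in plain Python. This sketch assumes that pooling here means the pattern common in kernels for this challenge: the GRU emits one hidden-state vector per token, and the average and the maximum over all time steps are concatenated into one fixed-size vector for the final classifier layer. The function `pool_over_time` is hypothetical and is not code from `train_classifier.py`.

```python
def pool_over_time(hidden_states):
    """Concatenate average and max pooling over the time axis.

    `hidden_states` is a list of time steps, each a list with one
    activation per GRU unit. The result has 2 * n_units entries,
    regardless of how many tokens the comment has.
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    n_steps = len(hidden_states)
    # Transpose so that each tuple holds one unit's values over time.
    columns = list(zip(*hidden_states))
    avg_pool = [sum(col) / n_steps for col in columns]
    max_pool = [max(col) for col in columns]
    return avg_pool + max_pool


# Three time steps, two GRU units (made-up numbers).
states = [[0.0, 1.0], [2.0, -1.0], [4.0, 3.0]]
pooled = pool_over_time(states)
print(pooled)  # [2.0, 1.0, 4.0, 3.0]
```

In Keras the same idea is usually expressed with `GlobalAveragePooling1D` and `GlobalMaxPooling1D` applied to a GRU layer created with `return_sequences=True`.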

Prediction

To get a feel for making predictions, run:

python ml_model/predict.py

# output:
# Corgi is stupid          - Toxicity: [0.99293655]
# good boy                 - Toxicity: [0.02075008]
# School of AI is awesome  - Toxicity: [0.01223523]
# F**K                     - Toxicity: [0.90747666]
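
An application usually needs a decision rather than a raw score. The sketch below turns the scores shown above into labels, assuming they are probabilities in [0, 1]. The function `label_comments` and the 0.5 threshold are assumptions for illustration; a real application should tune the threshold on validation data.

```python
def label_comments(scored, threshold=0.5):
    """Map (text, score) pairs to (text, label) pairs.

    A comment is labelled "toxic" when its score reaches the
    threshold. The default of 0.5 is an assumption, not a value
    taken from the project.
    """
    return [
        (text, "toxic" if score >= threshold else "non-toxic")
        for text, score in scored
    ]


# Scores copied from the sample output of predict.py above.
scored = [
    ("Corgi is stupid", 0.99293655),
    ("good boy", 0.02075008),
    ("School of AI is awesome", 0.01223523),
    ("F**K", 0.90747666),
]
labels = label_comments(scored)
for text, label in labels:
    print("{:<25} {}".format(text, label))
```

Raising the threshold makes the filter more lenient (fewer comments flagged), while lowering it flags more comments at the cost of more false positives.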

Deployment

  1. Create a Heroku account if you don't already have one.
  2. Install Heroku CLI.
  3. Fork this GitHub repo.
  4. Clone the forked repo to your local machine.
  5. Navigate to your cloned repo directory.
  6. Create your Heroku App via heroku create <app-name>
  7. Deploy your App via git push heroku master
  8. Run heroku open to open your newly deployed web app in your browser.

Meetup Content

Part 3 Slides

Resources

The project uses data from Kaggle's Toxic Comment Classification Challenge. The data can be found here.

If you are struggling to implement some of the concepts discussed at the meetup, check out the slides notebook for guidance. There are also many kernels specific to the toxic comment challenge that you can refer to for inspiration or help.

Alternatively, ask for assistance on Slack. That's what this community is all about :)

Other Resources:

Meetup Contributors

Akshi Chaudhary

Johannes Harmse

Xinbin Huang

Peter Lin

Johannes Giorgis

Guru Shiva