Toxic Comments

tl;dr: Surfacing toxic Wikipedia comments by training NLP deep learning models, using multi-task learning and a variety of deep learning architectures. Data comes from a Kaggle competition.
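Multi-task learning here usually means that one network predicts several labels at once from shared layers. The sketch below assumes the six labels of the Kaggle Toxic Comment Classification Challenge and a model that emits one sigmoid probability per label, trained on the mean binary cross-entropy across labels. It illustrates the general technique and is not necessarily the loss used in bin/main.py; `LABELS`, `binary_cross_entropy` and `multitask_loss` are hypothetical names.

```python
import math
from typing import Dict, List

# Assumption: the six target labels of the Kaggle Toxic Comment
# Classification Challenge, each treated as one binary task.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def binary_cross_entropy(y_true: float, y_prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one label; clips the probability to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def multitask_loss(targets: List[Dict[str, int]], probs: List[Dict[str, float]]) -> float:
    """Mean binary cross-entropy over every (comment, label) pair.

    Assuming one model emits all six probabilities from shared layers, every
    task contributes gradient to the same representation; that sharing is
    what makes this multi-task rather than six independent classifiers.
    """
    if len(targets) != len(probs) or not targets:
        raise ValueError("targets and probs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(targets, probs):
        for label in LABELS:
            total += binary_cross_entropy(y[label], p[label])
    return total / (len(targets) * len(LABELS))


# Usage: a single comment labelled toxic and insult only.
target = {label: 0 for label in LABELS}
target.update(toxic=1, insult=1)
uninformed = {label: 0.5 for label in LABELS}
print(multitask_loss([target], [uninformed]))  # about 0.693, i.e. ln(2), when every prediction is 0.5
```

Averaging over labels keeps the loss on the same scale regardless of how many tasks are added; weighting individual labels (for example, rare ones like threat) would be a separate design choice.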

Quick start

# Create the Anaconda environment
conda env create -f environment.yml

# Activate the environment (on older conda versions: source activate toxic)
conda activate toxic

# Run script
cd bin/
python main.py

Repo structure

  • bin/main.py: Code entry point
  • docs/writeup/writeup.md: Project summary
  • conf/confs.yaml: Configuration file, used to choose parameters
  • docs/modeling_notes.md: Notes / support for design decisions
  • data/schemas: Data set schemas

Python Environment

Python code in this repo uses packages that are not part of the standard library. To make sure you have all of the required packages, install Anaconda and create the environment described in environment.yml (see Quick start above).

Configuration file

This program utilizes a configuration file (conf/confs.yaml). It will run with the default parameters, but many parameters can be freely changed. Parameters include:

  • run_train: Whether to train a model
  • run_infer: Whether to use the trained model to predict classifications for the Kaggle test data set
  • test_run: Whether to run on a subset of observations. This is helpful for debugging
  • model_choice: Either the name of a method in bin/models, or serialized. If set to serialized, use serialized_model_choice_path to provide the path to a serialized model
  • num_epochs: The number of epochs to train for
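The parameters above interact in one notable way: choosing serialized for model_choice only makes sense together with serialized_model_choice_path. A minimal sketch of merging user overrides onto defaults and checking that rule follows. The default values and the names `DEFAULT_CONF` and `resolve_conf` are hypothetical; the real defaults are whatever conf/confs.yaml ships with, and this is not the repo's own loading code.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults for illustration only; the actual defaults are
# whatever conf/confs.yaml contains.
DEFAULT_CONF: Dict[str, Any] = {
    "run_train": True,
    "run_infer": False,
    "test_run": True,
    "model_choice": "example_model",  # hypothetical method name in bin/models
    "serialized_model_choice_path": None,
    "num_epochs": 1,
}


def resolve_conf(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge user overrides onto the defaults and sanity-check the result."""
    conf = {**DEFAULT_CONF, **(overrides or {})}

    # Reject typos such as "num_epoch" instead of silently ignoring them.
    unknown = set(conf) - set(DEFAULT_CONF)
    if unknown:
        raise KeyError(f"Unknown configuration keys: {sorted(unknown)}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    epochs = conf["num_epochs"]
    if isinstance(epochs, bool) or not isinstance(epochs, int) or epochs < 1:
        raise ValueError("num_epochs must be a positive integer")

    # A serialized model needs a path to load it from.
    if conf["model_choice"] == "serialized" and not conf["serialized_model_choice_path"]:
        raise ValueError("model_choice 'serialized' requires serialized_model_choice_path")
    return conf


# Usage: override two parameters and keep the rest at their defaults.
conf = resolve_conf({"num_epochs": 5, "test_run": False})
print(conf["num_epochs"], conf["test_run"])  # 5 False
```

In practice the overrides would come from parsing conf/confs.yaml, for example with PyYAML's `yaml.safe_load`; that loader is an assumption, since this README does not say how the file is read.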

Contact

Feel free to contact me at 13herger <at> gmail <dot> com
