Spam filter built for an introductory computational linguistics workshop given at GirlsCodeMonth 2018 in Palo Alto.

ENSCMA2/linghacks-girlscodemonth-workshop

LingHacks GirlsCodeMonth Workshop

Introductory computational linguistics workshop given at GirlsCodeMonth 2018. Topics covered:

  • Basic data preprocessing in Python
  • TF-IDF word vectorization
  • Support vector machine (SVM) classification
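
The three topics fit together as a pipeline: clean and tokenize each message, turn it into a TF-IDF vector, then train a linear SVM on those vectors. The sketch below is a library-free illustration of that pipeline, not the workshop's own code. The tokenizer regex, the function names (preprocess, tfidf_vectors, train_linear_svm, predict), the "__bias__" feature, and the hyperparameters are all hypothetical choices. The SVM is trained with Pegasos (stochastic subgradient descent on the hinge loss with an L2 penalty), which may differ from whatever SVM implementation the workshop scripts use.

```python
import math
import random
import re
from collections import Counter


def preprocess(text):
    """Lowercase a message and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector ({term: weight}) per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) = number of documents containing t
    """
    tokenized = [preprocess(doc) for doc in documents]
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(len(tokenized) / n) for term, n in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty messages
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def dot(w, x):
    """Dot product of two sparse {feature: value} vectors."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) with Pegasos SGD.

    X: list of sparse feature dicts; y: labels, +1 for spam and -1 for ham.
    A constant "__bias__" feature (hypothetical name) lets the model learn an offset.
    """
    rng = random.Random(seed)
    data = [({**x, "__bias__": 1.0}, label) for x, label in zip(X, y)]
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            inside_margin = label * dot(w, x) < 1.0
            # Gradient step on the L2 regularizer shrinks every weight.
            w = {f: v * (1.0 - eta * lam) for f, v in w.items()}
            # The hinge-loss subgradient is nonzero only for points inside the margin.
            if inside_margin:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * label * v
    return w


def predict(w, x):
    """Classify one sparse vector: +1 (spam) or -1 (ham)."""
    return 1 if dot(w, {**x, "__bias__": 1.0}) >= 0 else -1
```

Two caveats on this sketch: tfidf_vectors computes IDF over whatever documents it is given, so a real train/test setup should compute IDF on the training messages only and reuse it for test or user-typed messages; and it uses the plain log(N / df) IDF. In practice scikit-learn's TfidfVectorizer (fit on training data, then transform new text; smoothed IDF by default) and LinearSVC cover the same two steps.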

This workshop walks through a basic SVM classifier that detects whether a text message is spam or not spam ("ham"). We use the Kaggle SMS Spam Collection Dataset. File descriptions:

  1. bad_evaluate.py: trains and evaluates a classifier on a random train/test split of the entire Kaggle dataset.
  2. bad_runner.py: trains a classifier on a random split of the entire dataset and lets the user test it on their own text.
  3. good_evaluate.py: trains and evaluates a classifier on a balanced spam/ham dataset with a random train/test split.
  4. good_runner.py: trains a classifier on a random split of the balanced dataset and lets the user test it on their own text.
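
The "bad" and "good" scripts differ in whether the data is balanced: in the full dataset, ham messages greatly outnumber spam, so a classifier can score well by mostly predicting ham. The README does not say how the balanced dataset is built. One plausible approach, sketched below as an assumption rather than what good_evaluate.py actually does, is to randomly downsample the larger class to the size of the smaller one, then shuffle and split. The function names balance_classes and random_split and the default test_fraction of 0.2 are hypothetical.

```python
import random


def balance_classes(messages, labels, seed=0):
    """Downsample every class to the size of the smallest one (hypothetical helper).

    messages: list of texts; labels: parallel list such as "spam"/"ham".
    Returns a shuffled list of (message, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for msg, label in zip(messages, labels):
        by_label.setdefault(label, []).append(msg)
    n = min(len(group) for group in by_label.values())
    # Keep a random sample of n messages from each class.
    balanced = [(msg, label) for label, group in by_label.items()
                for msg in rng.sample(group, n)]
    rng.shuffle(balanced)
    return balanced


def random_split(pairs, test_fraction=0.2, seed=0):
    """Randomly split (message, label) pairs into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = pairs[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Downsampling throws away most of the ham messages. An alternative that keeps all the data is to weight the classes during training instead, for example with the class_weight="balanced" option of scikit-learn's SVC and LinearSVC.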

To run any of these files, run the following from the repository root ([your-computer]/linghacks-girlscodemonth-workshop):

python3 [the-file]
