LingHacks GirlsCodeMonth Workshop

Introductory computational linguistics workshop given at GirlsCodeMonth 2018. Topics covered:

Basic data preprocessing in Python
TF-IDF word vectorization
Support vector machine algorithms
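
As a rough illustration of how these three topics fit together, here is a minimal from-scratch sketch that uses only the Python standard library. It is not taken from the workshop scripts, which may well rely on a library instead. The function names, the toy messages, and the hyperparameters are all assumptions made for this example. The SVM is a simplified linear one, trained by sub-gradient descent on the hinge loss and without a bias term.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Basic preprocessing: lowercase the text and pull out word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no seen term gets weight 0.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Return a sparse {term: weight} TF-IDF vector scaled to unit length."""
    counts = Counter(t for t in tokens if t in idf)  # skip unseen terms
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, reg=0.01):
    """Fit a linear SVM (no bias term) by sub-gradient descent on hinge loss.

    labels must be +1 (spam) or -1 (ham). Hyperparameters are illustrative.
    """
    weights = {}
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            score = sum(weights.get(t, 0.0) * v for t, v in vec.items())
            # Gradient of the L2 penalty: shrink every weight slightly.
            for t in weights:
                weights[t] *= 1.0 - lr * reg
            # Hinge loss is active only when the example is inside the margin.
            if y * score < 1.0:
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * v
    return weights


def predict(vec, weights):
    """Return +1 (spam) for a positive score, otherwise -1 (ham)."""
    score = sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1 if score > 0 else -1


# Made-up toy messages standing in for the real dataset.
texts = [
    "WIN a FREE prize now",
    "free cash prize, call now",
    "are we still on for lunch",
    "see you at lunch tomorrow",
]
labels = [1, 1, -1, -1]  # +1 = spam, -1 = ham

docs = [tokenize(t) for t in texts]
idf = fit_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]
weights = train_linear_svm(vectors, labels)


def classify(message):
    """Run one message through the same preprocessing and the trained model."""
    vec = tfidf_vector(tokenize(message), idf)
    return "spam" if predict(vec, weights) == 1 else "ham"


print(classify("free prize now"))
print(classify("lunch tomorrow"))
```

Terms that appear in fewer messages get a larger IDF weight, so rare, telling words such as "win" count for more than words shared across many messages. A message made up entirely of unseen words produces an empty vector and falls back to "ham" in this sketch.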

This workshop walks through a basic SVM classifier that labels a piece of text as spam or not spam ("ham"). It uses the SMS Spam Collection Dataset from Kaggle, included in this repository as spam.csv. File descriptions:

bad_evaluate.py: trains and evaluates a classifier on a random train/test split of the entire Kaggle dataset, without balancing the classes.
bad_runner.py: trains a classifier on a random split of the entire dataset, then lets the user test it with their own text.
good_evaluate.py: trains and evaluates a classifier on a random train/test split of a balanced spam/ham dataset.
good_runner.py: trains a classifier on a random split of the balanced dataset, then lets the user test it with their own text.
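
The difference between the "bad" and "good" scripts is the data they train on: the full dataset versus a balanced one. How the workshop scripts build the balanced set is not described here, so the sketch below assumes one common approach, randomly downsampling the larger class, followed by a random train/test split. The function names, the (label, text) row format, and the toy rows are hypothetical.

```python
import random


def balance_by_downsampling(rows, seed=0):
    """Keep equally many spam and ham rows by sampling the larger class down.

    rows is a list of (label, text) pairs with label "spam" or "ham".
    """
    rng = random.Random(seed)
    spam = [r for r in rows if r[0] == "spam"]
    ham = [r for r in rows if r[0] == "ham"]
    n = min(len(spam), len(ham))
    balanced = rng.sample(spam, n) + rng.sample(ham, n)
    rng.shuffle(balanced)
    return balanced


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up, deliberately lopsided stand-in for the real data: 90 ham, 10 spam.
rows = [("ham", f"ham message {i}") for i in range(90)]
rows += [("spam", f"spam message {i}") for i in range(10)]

balanced = balance_by_downsampling(rows)
train, test = train_test_split(balanced)
print(len(balanced), len(train), len(test))
```

Downsampling discards most of the majority-class rows, which is the trade-off for a classifier that is less inclined to label everything as the majority class.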

To run any of these files from [your-computer]/linghacks-girlscodemonth-workshop:

python3 [the-file]

ENSCMA2/linghacks-girlscodemonth-workshop

Folders and files

Latest commit

History

Repository files navigation

LingHacks GirlsCodeMonth Workshop

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages