Spam API

Microservices API for a spam filtering system.

Abstract

One of the goals of this repository is to work out an approach to designing machine learning systems.

To run

$ python -m main

# Note that running the file directly will not work, because the imports will break
$ python main.py

Flows

1. Prepare text data
   - removal of stop words
   - lemmatization
2. Feature extraction process
3. Training the classifiers
4. Checking performance
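
The flow above can be sketched end to end as follows. This is a minimal illustration, not the repository's actual code: the toy messages, the `prepare` and `lemmatize` helpers, and the choice of `TfidfVectorizer` with `MultinomialNB` are all assumptions made for the example. Lemmatization is left as a no-op hook because it normally needs an external NLP library such as NLTK or spaCy.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy data invented for illustration (1 = spam, 0 = not spam).
train_texts = [
    "Win a free prize now",
    "Claim your free cash reward",
    "Are we meeting for lunch tomorrow",
    "Please review the attached report",
]
train_labels = [1, 1, 0, 0]
test_texts = ["Free cash prize", "Lunch meeting report"]
test_labels = [1, 0]


def lemmatize(token):
    """Hypothetical hook: plug a real lemmatizer in here."""
    return token


def prepare(text):
    """Step 1: lowercase, drop stop words, lemmatize each remaining token."""
    tokens = [t for t in text.lower().split() if t not in ENGLISH_STOP_WORDS]
    return " ".join(lemmatize(t) for t in tokens)


# Step 2: feature extraction. Fit on the training data only.
vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform([prepare(t) for t in train_texts])
test_features = vectorizer.transform([prepare(t) for t in test_texts])

# Step 3: train a classifier.
classifier = MultinomialNB()
classifier.fit(train_features, train_labels)

# Step 4: check performance on held-out messages.
predictions = classifier.predict(test_features)
accuracy = accuracy_score(test_labels, predictions)
print(f"accuracy: {accuracy:.2f}")
```

With only a handful of messages the accuracy figure means little; the point is the order of the steps and that the vectorizer is fitted on training data only.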

Pickled

To view the size of the pickled file:

$ du -h *.pkl
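
The same check can be done from Python. The sketch below uses only the standard library; the `model` dictionary is a stand-in for a trained model, and the `pickled_size` helper and the `model.pkl` file name are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object works the same way.
model = {"vocabulary": ["free", "prize", "meeting"], "weights": [0.9, 0.8, -0.7]}


def pickled_size(obj, path):
    """Pickle obj to path and return the size of the file in bytes."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    size = pickled_size(model, path)
    # Load it back to confirm the file round-trips.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(f"model.pkl: {size} bytes")
```

Note that `os.path.getsize` reports the file length in bytes, while `du -h` reports disk usage, so the two numbers can differ slightly for small files.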

Tips

At first it may be tempting to construct your pipeline to include the feature extractor:

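from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('gaussian_nb', GaussianNB())])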

But note that this is only convenient while training a single model. For prediction, you need to reuse the fitted feature extractor. Also, when training multiple classifiers, each pipeline re-runs the same feature extraction process, which is wasteful.
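
One way to avoid this is to fit the feature extractor once and share its output, as in the sketch below. The toy messages and the `predict` helper are assumptions made for the example. `TfidfVectorizer` is used as the scikit-learn equivalent of `CountVectorizer` followed by `TfidfTransformer`, and `MultinomialNB` and `LogisticRegression` are swapped in for `GaussianNB` because they accept the sparse matrices a vectorizer produces, whereas `GaussianNB` expects dense arrays.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy data invented for illustration (1 = spam, 0 = not spam).
texts = [
    "Win a free prize now",
    "Claim your free cash reward",
    "Are we meeting for lunch tomorrow",
    "Please review the attached report",
]
labels = [1, 1, 0, 0]

# Fit the feature extractor once ...
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(texts)

# ... then train every classifier on the same feature matrix.
classifiers = {
    "multinomial_nb": MultinomialNB(),
    "logistic_regression": LogisticRegression(),
}
for classifier in classifiers.values():
    classifier.fit(features, labels)


def predict(text):
    """Reuse the fitted vectorizer: transform only, never re-fit."""
    row = vectorizer.transform([text])
    return {name: int(clf.predict(row)[0]) for name, clf in classifiers.items()}


print(predict("claim your free prize"))
```

For serving, the fitted `vectorizer` would need to be saved alongside each classifier so that prediction uses the same vocabulary as training.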

alextanhongpin/spam-api
