SujayAmbekar/Big-Data-Project

Big-Data-Project

Machine Learning with Spark Streaming for Ham-Spam Detection (Enron email classification dataset)

Batch-wise processing with scikit-learn (sklearn), using incremental learning so that each incoming batch updates the model instead of retraining it from scratch.

Built Gaussian Naïve Bayes, Multinomial Naïve Bayes, SGD and Passive-Aggressive classifiers, plus a MiniBatchKMeans clustering model.
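
As a rough illustration of how these five scikit-learn estimators can sit side by side, here is a minimal sketch. The helper names build_models and update_models, the dictionary keys, and the 0/1 label encoding are assumptions made for this example and are not taken from the repository. It shows that the four classifiers need the full class list on their first incremental update, while MiniBatchKMeans is unsupervised and takes no labels.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.linear_model import PassiveAggressiveClassifier, SGDClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB


def build_models(random_state=0):
    """Create one instance of each estimator (hypothetical helper)."""
    return {
        "gaussian_nb": GaussianNB(),
        "multinomial_nb": MultinomialNB(),
        "sgd": SGDClassifier(random_state=random_state),
        "passive_aggressive": PassiveAggressiveClassifier(random_state=random_state),
        # Two clusters, on the assumption that they roughly map to ham and spam.
        "minibatch_kmeans": MiniBatchKMeans(
            n_clusters=2, n_init=3, random_state=random_state
        ),
    }


def update_models(models, X, y, classes=(0, 1)):
    """Feed one batch to every model (hypothetical helper).

    X must be a dense, non-negative array: GaussianNB does not accept
    sparse input and MultinomialNB rejects negative feature values.
    """
    for name, model in models.items():
        if name == "minibatch_kmeans":
            # Clustering is unsupervised, so the labels are not used.
            model.partial_fit(X)
        else:
            # Classifiers must be told every class on the first call.
            model.partial_fit(X, y, classes=np.array(classes))
    return models
```

Calling update_models once per batch keeps all five models in step with the stream; comparing the cluster assignments with the true labels is then a separate evaluation step.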

Used joblib to save the trained models to disk.
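
Saving and restoring a model with joblib might look like the sketch below. The function names save_model and load_model and the file name are placeholders chosen for this example; the repository's actual paths are not shown here.

```python
import joblib


def save_model(model, path):
    """Serialize a fitted estimator to disk (hypothetical helper)."""
    joblib.dump(model, path)
    return path


def load_model(path):
    """Load a previously saved estimator (hypothetical helper)."""
    return joblib.load(path)
```

For example, save_model(clf, "multinomial_nb.joblib") after the last batch, then load_model("multinomial_nb.joblib") in a later session to resume prediction or further incremental updates. Files written by joblib should only be loaded from trusted sources, because loading them can execute arbitrary code.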

Used partial_fit to perform online updates to the models as each new batch arrives.
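
The online-update loop can be sketched as follows. This is a minimal example under stated assumptions, not the repository's code: the train_on_batches helper, the "ham"/"spam" label strings, and the choice of HashingVectorizer are illustrative. A hashing vectorizer is used here because it is stateless, so every batch maps to the same feature space without a separate fitting pass.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Assumed label names; the full class list must be known up front.
CLASSES = np.array(["ham", "spam"])

# Stateless text features: no vocabulary has to be learned from the stream.
vectorizer = HashingVectorizer(n_features=2**12, alternate_sign=False)


def train_on_batches(batches, model=None):
    """Update a classifier one batch at a time (hypothetical helper).

    `batches` is any iterable of (texts, labels) pairs, for example the
    contents of successive streaming micro-batches.
    """
    if model is None:
        model = SGDClassifier(random_state=0)
    for texts, labels in batches:
        X = vectorizer.transform(texts)
        # partial_fit adjusts the existing weights instead of refitting.
        model.partial_fit(X, labels, classes=CLASSES)
    return model
```

Passing an already trained model back in as the model argument continues training from where it stopped, which is what makes the approach suitable for a stream that never ends.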
