A collection of small projects exploring PySpark features and functionality including packages and modules, algorithms, and general data science techniques.

Bcromas/pyspark_projects

PySpark Projects

A collection of small projects exploring PySpark features and functionality including:

Packages and modules

readStream/writeStream, Pipeline, OneHotEncoder, StringIndexer, StandardScaler, VectorAssembler

Algorithms

RandomForestClassifier, KMeans, LinearRegression, ridge and LASSO regression, LogisticRegression

Techniques

feature extraction, evaluating the collinearity of features, calculating AUC, extracting feature importances, pre-processing, and exploratory data analysis (EDA).
