ChitturiPadma/SparkforDataScienceCookbook

SparkforDataScienceCookbook

Samples for Packt Publishing's Spark for Data Science Cookbook

The samples in this project were written with JDK 1.8, Scala 2.10.4, Spark 1.6.2 or Spark 2.0.2, and SBT 0.13.11.
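
For orientation, the versions above could be pinned in a `build.sbt` along the following lines. This is only a sketch: the project name, the project version, and the choice of Spark modules are assumptions for illustration and are not taken from this repository's actual build files.

```scala
// Hypothetical build.sbt sketch; the name, version, and module list are assumptions.
name := "spark-cookbook-samples"

version := "0.1.0"

// Scala version stated in this README
scalaVersion := "2.10.4"

// Switch to "2.0.2" to build against the other Spark release mentioned above
val sparkVersion = "1.6.2"

// %% appends the Scala binary version (_2.10) to each artifact name
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion,
  "org.apache.spark" %% "spark-sql"       % sparkVersion,
  "org.apache.spark" %% "spark-mllib"     % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)
```

Individual chapters may need further dependencies (for example Sparkling Water for chapter 7), which are not covered by this sketch.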

  1. Big Data Analytics with Spark - RDD, Persisting RDD, Broadcast variables, DataFrames and Spark Streaming
  2. Tricky Statistics with Spark - Sampling, Descriptive Statistics, Frequency tables, Pandas
  3. Data Analysis with Spark - Univariate Analysis, Bivariate Analysis, Missing value treatment, Outlier detection
  4. Clustering, Classification and Regression - Linear regression, Logistic regression and k-means clustering
  5. Working with Spark MLlib - Spark MLlib (Naive Bayes, Decision trees, Recommendation system)
  6. NLP with Spark - NLTK, POS tagging, NER, Chunker, Sentence Detector, Lemmatization and Sentiment Analysis
  7. Working with Sparkling Water H2O - Deep Learning, H2O, K-means and SVM
  8. Data Visualization with Spark - Zeppelin, Lightning Server, Bokeh-scala
  9. Deep Learning on Spark - CaffeOnSpark, RBM, CNN, TensorFlow
  10. Working with SparkR - SparkR data frame, Machine learning with SparkR
