Samples for Packt Publishing's Spark for Data Science Cookbook
The samples in this project were written with JDK 1.8, Scala 2.10.4, Spark 1.6.2 or Spark 2.0.2, and SBT 0.13.11.
- Big Data Analytics with Spark - RDD, Persisting RDD, Broadcast variables, DataFrames and Spark Streaming
- Tricky Statistics with Spark - Sampling, Descriptive Statistics, Frequency tables, Pandas
- Data Analysis with Spark - Univariate Analysis, Bivariate Analysis, Missing value treatment, Outlier detection
- Clustering, Classification and Regression - Linear, Logistic regression and k-means clustering
- Working with Spark MLlib - Naive Bayes, Decision trees, Recommendation system
- NLP with Spark - NLTK, POS tagging, NER, Chunker, Sentence Detector, Lemmatization and Sentiment Analysis
- Working with Sparkling Water H2O - Deep Learning, H2O, K-means and SVM
- Data Visualization with Spark - Zeppelin, Lightning Server, Bokeh-scala
- Deep Learning on Spark - CaffeOnSpark, RBM, CNN, TensorFlow
- Working with SparkR - SparkR data frame, Machine learning with SparkR
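
For readers setting up a matching environment, the versions listed at the top could be pinned in an sbt build definition along these lines. This is only a sketch, not the project's actual build file: the project name and the choice of Spark modules are assumptions inferred from the chapter topics, and individual chapters may need extra dependencies that are not shown here.

```scala
// build.sbt (illustrative sketch; not taken from this repository)

// Hypothetical project name, used for illustration only
name := "spark-cookbook-samples"

// Scala version stated in this README
scalaVersion := "2.10.4"

// Spark version stated in this README; switch to "2.0.2" for the Spark 2.x samples
val sparkVersion = "1.6.2"

// Assumed module list, based on the topics above (RDDs, DataFrames, Streaming, MLlib)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion,
  "org.apache.spark" %% "spark-sql"       % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-mllib"     % sparkVersion
)
```

The `%%` operator appends the Scala binary version (`_2.10`) to each artifact name, so the dependencies stay aligned with `scalaVersion`. The SBT version itself is normally pinned separately in `project/build.properties` with the line `sbt.version=0.13.11`.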