"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia

Chapters

This directory contains the code for every chapter of "Data Algorithms with Spark".


Bonus Chapters

The following directories contain bonus chapters:

| Bonus Chapter | Description |
|---|---|
| Word Count | Multiple solutions to the word count problem using the reduceByKey() and groupByKey() reducers |
| Anagrams | Finding words that are anagrams of each other: multiple solutions using the reduceByKey(), groupByKey(), and combineByKey() reducers |
| Lambda Expressions | How to use lambda expressions in PySpark programs |
| TF-IDF | Term Frequency - Inverse Document Frequency |
| K-mers | K-mers for DNA sequences |
| Correlation | All vs. all correlation |
| mapPartitions() Transformation | A complete mapPartitions() example |
| UDF | User-defined function example |
| DataFrames Transformations | Examples of creating and transforming DataFrames |
| DataFrames Tutorials | DataFrames tutorials: creating DataFrames from collections and CSV text files |
| Join Operations | Examples of joining RDDs |
| PySpark Tutorial 101 | Examples of using PySpark RDDs and DataFrames |
| Physical Data Partitioning | Tutorial on physical data partitioning |
| Monoid: Design Principle | Monoid as a design principle |
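
To give a feel for the difference between the reduceByKey() and groupByKey() styles mentioned in the Word Count entry, here is a minimal plain-Python sketch. It is not code from the bonus chapters and it does not use Spark: `reduce_by_key` and `group_by_key` are hypothetical local stand-ins that only mimic the per-key behavior of the corresponding RDD transformations, and the input lines are made up for illustration.

```python
from collections import defaultdict

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey():
    merge the values of each key pairwise with func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

def group_by_key(pairs):
    """Hypothetical local stand-in for RDD.groupByKey():
    collect all values of each key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

# Made-up input; in PySpark this would typically come from sc.textFile(...)
lines = ["fox jumped", "fox ran", "dog ran"]

# Equivalent in spirit to: rdd.flatMap(lambda line: line.split()).map(lambda w: (w, 1))
pairs = [(word, 1) for line in lines for word in line.split()]

# Style 1: combine counts pairwise, as in pairs_rdd.reduceByKey(lambda a, b: a + b)
counts_reduce = reduce_by_key(pairs, lambda a, b: a + b)

# Style 2: gather all the 1s per word first, then sum them,
# as in pairs_rdd.groupByKey().mapValues(sum)
counts_group = {word: sum(ones) for word, ones in group_by_key(pairs).items()}

print(counts_reduce)
print(counts_group)
```

Both styles produce the same counts. On a real cluster, reduceByKey() can usually combine values within each partition before the shuffle, while groupByKey() ships every value across the network first, which is one reason addition (an associative operation with identity 0, that is, a monoid) works well as a reducer.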
