
CjTouzi/edx-Introduction-to-Big-Data-with-Apache-Spark


edx-Introduction-to-Big-Data-with-Apache-Spark

Lab 1: Word Count Example Using Spark

This exercise consists of 4 parts:

Part 1: Creating a base RDD and pair RDDs
Part 2: Counting with pair RDDs
Part 3: Finding unique words and a mean value
Part 4: Applying word count to a file
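The four steps of Lab 1 can be sketched in plain Python as below. This is a minimal sketch, not the lab's solution: ordinary lists and dicts stand in for RDDs, and the comments point to the PySpark operations (`map`, `reduceByKey`, `takeOrdered`) that play the same role. The punctuation-cleaning rule and the input lines are assumptions made up for illustration.

```python
import re
from collections import defaultdict

def remove_punctuation(text):
    # Assumed cleaning rule: lowercase, drop anything that is not a letter,
    # digit, or whitespace, then trim surrounding spaces.
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).strip()

def word_count(lines):
    # Part 1: build (word, 1) pairs, analogous to flatMap(...).map(lambda w: (w, 1)).
    pairs = [(word, 1) for line in lines for word in remove_punctuation(line).split()]
    # Part 2: sum the 1s per word, analogous to reduceByKey(lambda a, b: a + b).
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

# Hypothetical input standing in for the lines of a text file (Part 4).
lines = ["The cat sat.", "The dog sat!", "A cat ran"]
counts = word_count(lines)

# Part 3: number of unique words and the mean count per word.
unique_words = len(counts)
mean_count = sum(counts.values()) / unique_words

# Most frequent words, ties broken alphabetically (like takeOrdered with a key).
top_words = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(unique_words, mean_count)  # 6 1.5
print(top_words)  # [('cat', 2), ('sat', 2), ('the', 2)]
```

In PySpark the same pipeline would start from `sc.textFile(path)` and keep the data distributed until the final `takeOrdered` or `collect`.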

Lab 2: Web Server Log Analysis with Apache Spark

This exercise consists of 4 parts:

Part 1: Apache Web Server Log File Format
Part 2: Sample Analyses on the Web Server Log File
Part 3: Analyzing the Web Server Log File
Part 4: Exploring 404 Response Codes
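The log-analysis workflow (parse each line, count response codes, then drill into 404s) can be sketched in plain Python. The regular expression below is an assumed pattern for Apache Common Log Format lines and may differ from the one used in the lab; the sample lines and hostnames are hypothetical. In the lab the same `parse_line` logic would be mapped over an RDD of log lines.

```python
import re
from collections import Counter

# Assumed regex for Apache Common Log Format; the lab's own pattern may differ.
# Groups: host, identity, user, timestamp, method, endpoint, protocol, status, size.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    size = match.group(9)
    return {
        "host": match.group(1),
        "timestamp": match.group(4),
        "method": match.group(5),
        "endpoint": match.group(6),
        "status": int(match.group(8)),
        # A "-" content size means no body was returned; count it as 0 bytes.
        "size": 0 if size == "-" else int(size),
    }

# Hypothetical sample lines, including one malformed entry.
logs = [
    'host1.example.com - - [01/Aug/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1839',
    'host2.example.com - - [01/Aug/1995:00:00:07 -0400] "GET /missing.gif HTTP/1.0" 404 -',
    'host1.example.com - - [01/Aug/1995:00:00:09 -0400] "GET /missing.gif HTTP/1.0" 404 -',
    "malformed line",
]

parsed = [parse_line(line) for line in logs]
records = [r for r in parsed if r is not None]
failed = len(parsed) - len(records)

# Sample statistics: response-code counts, distinct hosts, mean content size.
status_counts = Counter(r["status"] for r in records)
unique_hosts = len({r["host"] for r in records})
avg_size = sum(r["size"] for r in records) / len(records)

# Endpoints that most often return 404.
not_found = Counter(r["endpoint"] for r in records if r["status"] == 404)
print(failed, dict(status_counts), unique_hosts, avg_size)
print(not_found.most_common(1))
```

Counting the lines that fail to parse is worth keeping as a separate step: a non-zero `failed` count is the usual signal that the pattern needs adjusting before any statistics are trusted.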

Lab 3: Text Analysis and Entity Resolution

This exercise consists of 5 parts and quiz questions:

Part 1: ER as Text Similarity - Bags of Words
Part 2: ER as Text Similarity - Weighted Bag-of-Words using Term Frequency/Inverse Document Frequency (TF-IDF)
Part 3: ER as Text Similarity - Cosine Similarity
Part 4: Scalable ER
Part 5: Analysis (this is the part where you will click through and view plots of your results from Part 4)
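The core of entity resolution (ER) as text similarity can be sketched in plain Python: tokenize records into bags of words, weight tokens by TF-IDF, compare records with cosine similarity, and, for scalability, only compare pairs that share at least one token. This is a sketch under stated assumptions, not the lab's code: the stopword list, the product titles, and the record ids are hypothetical, and IDF is computed here as N / n(t) (N documents, n(t) of them containing token t), whereas many formulations take a logarithm of that ratio.

```python
import math
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "and", "of", "for"}  # assumed, tiny stopword list

def tokenize(text):
    """Bag of words: lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.split(r"\W+", text.lower()) if t and t not in STOPWORDS]

def tf(tokens):
    """Term frequency: occurrences of each token divided by total tokens."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}

def idfs(corpus):
    """IDF as N / n(t): number of documents over documents containing t."""
    doc_freq = Counter(t for tokens in corpus for t in set(tokens))
    return {t: len(corpus) / df for t, df in doc_freq.items()}

def tfidf(tokens, idf_weights):
    """Weight each token's TF by its IDF."""
    return {t: w * idf_weights[t] for t, w in tf(tokens).items()}

def cosine_similarity(a, b):
    """Dot product of two sparse weight vectors divided by their norms."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def candidate_pairs(weights_a, weights_b):
    """Scalable idea: use an inverted index so only token-sharing pairs are compared."""
    index = defaultdict(set)
    for id_b, w in weights_b.items():
        for token in w:
            index[token].add(id_b)
    return {(id_a, id_b) for id_a, w in weights_a.items()
            for token in w for id_b in index.get(token, ())}

# Hypothetical product titles from two sources.
source_a = {"a1": "Acme Photo Editor 3 for Windows"}
source_b = {"b1": "acme photo editor 3 windows edition",
            "b2": "Zeta Office Suite home and student"}

tokens = {rid: tokenize(text) for rid, text in {**source_a, **source_b}.items()}
idf_weights = idfs(list(tokens.values()))
weights = {rid: tfidf(toks, idf_weights) for rid, toks in tokens.items()}
w_a = {rid: weights[rid] for rid in source_a}
w_b = {rid: weights[rid] for rid in source_b}

for id_a, id_b in sorted(candidate_pairs(w_a, w_b)):
    print(id_a, id_b, round(cosine_similarity(weights[id_a], weights[id_b]), 3))
```

Here `b2` is never scored against `a1` because the two titles share no token; on large datasets that pruning is what keeps the number of comparisons manageable.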

Lab 4: Movie Recommendations Using Apache Spark

This exercise consists of 3 parts and quiz questions:

Part 1: Basic Recommendations
Part 2: Collaborative Filtering
Part 3: Predictions for Yourself (this is the part where you will enter your own ratings and see which movies are recommended for you)
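Two pieces of Lab 4 are easy to illustrate without Spark: a basic recommender that ranks movies by average rating while requiring a minimum number of ratings, and the root mean squared error (RMSE) used to judge a collaborative-filtering model's predictions. The sketch below covers only those two pieces; it does not train a model (in PySpark's MLlib, collaborative filtering is available through `ALS.train`). The `(user, movie, rating)` tuples and predictions are hypothetical.

```python
import math
from collections import defaultdict

def top_movies(ratings, min_count, n):
    """Highest average rating among movies with at least min_count ratings."""
    totals = defaultdict(lambda: [0, 0.0])  # movie_id -> [count, rating sum]
    for _user, movie, rating in ratings:
        totals[movie][0] += 1
        totals[movie][1] += rating
    averages = [(total / count, movie)
                for movie, (count, total) in totals.items() if count >= min_count]
    # Sort by average descending, then by movie id for a stable tie-break.
    averages.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(movie, avg) for avg, movie in averages[:n]]

def rmse(predicted, actual):
    """Root mean squared error over (user, movie) keys present in both dicts."""
    common = predicted.keys() & actual.keys()
    squared = sum((predicted[k] - actual[k]) ** 2 for k in common)
    return math.sqrt(squared / len(common))

# Hypothetical (user_id, movie_id, rating) tuples.
ratings = [(1, 10, 5.0), (2, 10, 4.0), (3, 10, 4.5),
           (1, 20, 5.0), (2, 30, 2.0), (3, 30, 3.0)]

print(top_movies(ratings, min_count=2, n=2))  # [(10, 4.5), (30, 2.5)]

# Hypothetical model predictions versus held-out ratings.
predicted = {(1, 10): 4.0, (2, 30): 3.0}
actual = {(1, 10): 5.0, (2, 30): 2.0}
print(rmse(predicted, actual))  # 1.0
```

The minimum-count filter matters: movie 20 has the highest average (5.0) but only one rating, so it is excluded when `min_count=2`.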
