Apache-Spark

https://github.com/Nehal-Pawar/Apache-Spark/tree/master/SparkProject1/src/pawar/nehal/spark


Apache Spark (Scala)

• Demonstrated Apache Spark features such as broadcast variables, joins, persist and cache while analyzing movie ratings, friend counts by age in a social network, minimum and maximum temperatures, and the most popular movies

• Explored the architecture and processing model of Apache Spark through research with a faculty member, and deployed a Spark program on AWS EMR, built with the SBT build tool

• Achieved faster results when finding the most popular movie by broadcasting a small lookup table to the executors instead of using a Dataset

• Implemented breadth-first search (BFS) on Spark to find the degrees of separation between characters in a superhero social network
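The persist and cache features from the first bullet are easiest to see when one parsed RDD feeds two actions. The sketch below is a minimal illustration, not the repository's code: it assumes a hypothetical CSV layout `stationId,date,entryType,temperature` with `TMIN`/`TMAX` entry types, and the names `StationTemperatures` and `minMaxByStation` are made up for the example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object StationTemperatures {
  // Hypothetical CSV layout: stationId,date,entryType,temperature
  def parse(line: String): (String, String, Double) = {
    val f = line.split(",")
    (f(0), f(2), f(3).toDouble)
  }

  /** Returns (lowest TMIN per station, highest TMAX per station). */
  def minMaxByStation(sc: SparkContext,
                      lines: Seq[String]): (Map[String, Double], Map[String, Double]) = {
    // Parse once and keep the result in memory: both actions below reuse it
    // instead of re-parsing the input.
    val parsed = sc.parallelize(lines).map(parse).persist(StorageLevel.MEMORY_ONLY)

    val mins = parsed
      .filter { case (_, kind, _) => kind == "TMIN" }
      .map { case (station, _, t) => (station, t) }
      .reduceByKey((a, b) => math.min(a, b))
      .collectAsMap().toMap

    val maxs = parsed
      .filter { case (_, kind, _) => kind == "TMAX" }
      .map { case (station, _, t) => (station, t) }
      .reduceByKey((a, b) => math.max(a, b))
      .collectAsMap().toMap

    parsed.unpersist() // release the cached partitions once both results are in
    (mins, maxs)
  }
}
```

Without `persist`, each of the two `collectAsMap()` actions would recompute `parsed` from the input; `cache()` is shorthand for `persist(StorageLevel.MEMORY_ONLY)`.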
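For the EMR deployment, a minimal `build.sbt` along these lines is one way to package a Spark job with SBT. The project name and the Scala and Spark versions are assumptions; they should match whatever the target EMR release provides.

```scala
// build.sbt (illustrative values; align versions with the EMR cluster's Spark/Scala)
name := "SparkProject1"
version := "0.1"
scalaVersion := "2.12.15"

// "provided": the cluster already ships Spark, so it is not bundled into the jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0" % "provided"
```

`sbt package` writes the application jar under `target/scala-2.12/`; that jar can be copied to the cluster and launched with `spark-submit --class <main class> <path to jar>`.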
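A minimal sketch of the broadcast approach for the most-popular-movie job, assuming the movie-ID-to-title lookup is small enough to hold as a plain Scala `Map`. Spark refuses to broadcast an RDD directly, so the sketch broadcasts the map instead. The object name `PopularMovies`, the method `mostPopular` and the `(userId, movieId)` input shape are illustrative assumptions, not the repository's code.

```scala
import org.apache.spark.SparkContext

object PopularMovies {
  /** Returns (title, ratingCount) for the most-rated movie.
    * `ratings` holds hypothetical (userId, movieId) pairs. */
  def mostPopular(sc: SparkContext,
                  movieNames: Map[Int, String],
                  ratings: Seq[(Int, Int)]): (String, Int) = {
    // Read-only copy of the lookup table, sent to each executor once
    // rather than serialized into every task.
    val namesBc = sc.broadcast(movieNames)

    // Count ratings per movie ID.
    val counts = sc.parallelize(ratings)
      .map { case (_, movieId) => (movieId, 1) }
      .reduceByKey(_ + _)

    // Resolve titles on the executors through the broadcast value,
    // then keep the entry with the highest count.
    val titled = counts.map { case (movieId, n) =>
      (namesBc.value.getOrElse(movieId, s"unknown ($movieId)"), n)
    }
    titled.max()(Ordering.by[(String, Int), Int](_._2))
  }
}
```

Resolving titles inside `map` shows the executor-side use of the broadcast; when only the winner matters, looking up just the top ID on the driver would work equally well.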
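One way to express the BFS from the last bullet with RDD operations is a frontier-and-join loop: each iteration joins the current frontier against a cached adjacency list, keeps the neighbors not yet visited, and stops when the target appears. This is a hedged sketch with hypothetical names (`DegreesOfSeparation`, `degrees`) and an in-memory edge list; the repository's own implementation may encode nodes and distances differently.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object DegreesOfSeparation {
  /** Hops from `start` to `target` in an undirected graph,
    * or None if unreachable within `maxDepth` hops. */
  def degrees(sc: SparkContext, edges: Seq[(Int, Int)],
              start: Int, target: Int, maxDepth: Int = 10): Option[Int] = {
    // Undirected adjacency list, cached because every BFS level joins against it.
    val adjacency: RDD[(Int, Iterable[Int])] = sc.parallelize(edges)
      .flatMap { case (a, b) => Seq((a, b), (b, a)) }
      .groupByKey()
      .cache()

    var visited: RDD[Int] = sc.parallelize(Seq(start))
    var frontier: RDD[Int] = visited
    var depth = 0

    while (depth <= maxDepth) {
      // Done as soon as the target is on the current BFS level.
      if (!frontier.filter(_ == target).isEmpty()) return Some(depth)

      // Expand one level: neighbors of the frontier minus everything already visited.
      val next = frontier.map(id => (id, ()))
        .join(adjacency)
        .flatMap { case (_, (_, neighbors)) => neighbors }
        .distinct()
        .subtract(visited)
        .cache()

      if (next.isEmpty()) return None // component exhausted without reaching target
      visited = visited.union(next)
      frontier = next
      depth += 1
    }
    None
  }
}
```

Each level runs a few small Spark jobs, which is fine for shallow searches such as degrees of separation; for very deep graphs the growing `visited` lineage would benefit from checkpointing.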

Item-Based Collaborative Filtering (Scala)

• Recommended movies by pairing up movies rated by the same user and comparing their ratings, using Spark features such as a self-join, cache and persist (which keep the RDD in memory so it can be reused without recomputation)
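The self-join step can be sketched as follows. Joining the `(userId, (movieId, rating))` RDD with itself yields every pair of movies rated by the same user; the pairs are then grouped per movie pair and scored. The object and method names are hypothetical, and since the README does not name a similarity metric, cosine similarity is an assumption here.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object MovieSimilarities {
  /** Cosine similarity of two movies' rating vectors, plus the number of co-raters. */
  def cosineSimilarity(ratingPairs: Iterable[(Double, Double)]): (Double, Int) = {
    val dot = ratingPairs.map { case (x, y) => x * y }.sum
    val normX = math.sqrt(ratingPairs.map { case (x, _) => x * x }.sum)
    val normY = math.sqrt(ratingPairs.map { case (_, y) => y * y }.sum)
    val score = if (normX == 0.0 || normY == 0.0) 0.0 else dot / (normX * normY)
    (score, ratingPairs.size)
  }

  /** `ratings` holds hypothetical (userId, movieId, rating) triples.
    * Returns ((movieA, movieB), (similarity, coRaterCount)) with movieA < movieB. */
  def similarities(sc: SparkContext,
                   ratings: Seq[(Int, Int, Double)]): RDD[((Int, Int), (Double, Int))] = {
    val byUser = sc.parallelize(ratings)
      .map { case (user, movie, rating) => (user, (movie, rating)) }

    // Self-join on userId gives every pair of movies rated by the same user;
    // keeping movieA < movieB drops self-pairs and mirrored duplicates.
    val moviePairs = byUser.join(byUser)
      .filter { case (_, ((m1, _), (m2, _))) => m1 < m2 }
      .map { case (_, ((m1, r1), (m2, r2))) => ((m1, m2), (r1, r2)) }

    // Score each movie pair and cache the result for repeated queries.
    moviePairs.groupByKey().mapValues(cosineSimilarity).cache()
  }
}
```

Because the returned RDD is cached, looking up recommendations for several different movies reuses the computed scores instead of repeating the self-join.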
