https://github.com/Nehal-Pawar/Apache-Spark/tree/master/SparkProject1/src/pawar/nehal/spark
• Demonstrated Apache Spark features such as broadcast, join, persist, and cache through data analyses of movie ratings, a friends network by age, minimum and maximum average temperatures, and popular movies
• Explored the architecture and processing model of the Apache Spark framework through research with a faculty member, and deployed the Spark program on AWS EMR using the SBT build tool
• Achieved faster results when finding the most popular movie by using broadcasting with RDDs instead of a Dataset
• Implemented breadth-first search (BFS) on Spark to find the degrees of separation between characters in a superhero social network
• Recommended movies by pairing similar user ratings, using Spark features such as self-join, cache, and persist, which keep the RDD in memory for faster performance