Apache Spark is a fast, general-purpose cluster computing technology designed for large-scale data processing. It builds on the Hadoop MapReduce model and extends it to support more types of computations efficiently, including interactive queries and stream processing. Spark's main feature is in-memory cluster computing, which increases an application's processing speed.
- In-Class Instruction: 4 Hours
- In-Class Code-Along Dataset: customers
- Installation of Spark
- Hands-on exercise with datasets
- Spark is a fast and general engine for large-scale data processing.
- Hortonworks has an introductory tutorial.
- Understand what Spark is and where it fits in the Hadoop ecosystem
- Components of Spark
- Extending Spark where required to achieve different objectives
- Running and Executing a Spark script
- Why we need Spark
- Spark vs. MapReduce
- RDD Processing
- What are Transformations and Actions
- Apache Spark for beginners
- Research Paper on Spark