
APACHE SPARK

Apache Spark is a fast, general-purpose cluster computing engine. It builds on the Hadoop MapReduce model and extends it to support more types of computation efficiently, including interactive queries and stream processing. Spark's main feature is in-memory cluster computing, which substantially increases application processing speed compared with disk-based MapReduce.
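The MapReduce model that Spark extends can be sketched in plain Python. This is a toy illustration of the map, shuffle, and reduce phases applied to word counting, not Spark or Hadoop code; all function names below are invented for the example:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["spark extends mapreduce", "spark is fast"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'spark': 2, 'extends': 1, 'mapreduce': 1, 'is': 1, 'fast': 1}
```

In Hadoop MapReduce, each phase boundary writes intermediate results to disk; Spark's key difference is keeping such intermediate data in memory across the pipeline.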

At a glance

  • In Class Instruction: 4 Hours
  • In Class Code-along Dataset: customers

In Class Activity

  • Installation of Spark
  • Hands-on exercise with datasets

Pre Reads

  1. Spark is a fast and general engine for large-scale data processing.
  2. Hortonworks has an introductory tutorial.

Learning Objectives

  • Understand what Spark is and where it fits in the Hadoop ecosystem
  • Identify the components of Spark
  • Extend Spark where required to achieve different objectives
  • Run and execute a Spark script

Agenda

  • Why we need Spark
  • Spark vs. MapReduce
  • RDD processing
  • Transformations and actions

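The transformation/action distinction in the agenda can be illustrated without a Spark installation. Below is a toy, pure-Python sketch (the `ToyRDD` class is invented for this example and is not part of any Spark API): transformations such as `map` and `filter` only record the computation, while an action such as `collect` or `count` actually runs it.

```python
class ToyRDD:
    """Toy stand-in for a Spark RDD: transformations are lazy, actions evaluate."""

    def __init__(self, compute):
        # `compute` is a zero-argument function that produces the data when called
        self._compute = compute

    @classmethod
    def from_list(cls, data):
        return cls(lambda: list(data))

    # --- Transformations: return a new ToyRDD; nothing is computed yet ---
    def map(self, fn):
        return ToyRDD(lambda: [fn(x) for x in self._compute()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self._compute() if pred(x)])

    # --- Actions: force evaluation and return a plain value ---
    def collect(self):
        return self._compute()

    def count(self):
        return len(self._compute())

rdd = ToyRDD.from_list(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)  # lazy: no work done yet
result = evens_squared.collect()  # action: the whole pipeline executes now
print(result)  # [0, 4, 16, 36, 64]
```

In real PySpark the same pipeline would run against a SparkContext, e.g. `sc.parallelize(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()`, with the same lazy-until-action behavior, but distributed across the cluster.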
Slides

Spark

Post Reads

  1. Apache Spark for beginners
  2. Research Paper on Spark
