
APACHE SPARK

Apache Spark is a fast, general-purpose cluster computing engine. It builds on the Hadoop MapReduce model and extends it to support more types of computation efficiently, including interactive queries and stream processing. Spark's main feature is in-memory cluster computing, which substantially increases application processing speed compared with disk-based MapReduce.
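The MapReduce model that Spark extends can be sketched in plain Python. This is a toy illustration of the map, shuffle, and reduce phases applied to word counting, not Spark or Hadoop code; all function names below are invented for the example:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["spark extends mapreduce", "spark is fast"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'spark': 2, 'extends': 1, 'mapreduce': 1, 'is': 1, 'fast': 1}
```

In Hadoop MapReduce, each phase boundary writes intermediate results to disk; Spark's key difference is keeping such intermediate data in memory across the pipeline.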

At a glance

  • In Class Instruction: 4 Hours
  • In Class Code-along Dataset: customers

In Class Activity

  • Installation of Spark
  • Hands-on exercise with datasets

Pre Reads

  1. Spark is a fast and general engine for large-scale data processing.
  2. Hortonworks has an introductory tutorial.

Learning Objectives

  • Understand what Spark is and where it fits in the Hadoop ecosystem
  • Identify the components of Spark
  • Extend Spark where required to achieve different objectives
  • Run and execute a Spark script

Agenda

  • Why we need Spark
  • Spark vs. MapReduce
  • RDD processing
  • Transformations and actions

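The transformation/action distinction in the agenda can be illustrated without a Spark installation. Below is a toy, pure-Python sketch (the `ToyRDD` class is invented for this example and is not part of any Spark API): transformations such as `map` and `filter` only record the computation, while an action such as `collect` or `count` actually runs it.

```python
class ToyRDD:
    """Toy stand-in for a Spark RDD: transformations are lazy, actions evaluate."""

    def __init__(self, compute):
        # `compute` is a zero-argument function that produces the data when called
        self._compute = compute

    @classmethod
    def from_list(cls, data):
        return cls(lambda: list(data))

    # --- Transformations: return a new ToyRDD; nothing is computed yet ---
    def map(self, fn):
        return ToyRDD(lambda: [fn(x) for x in self._compute()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self._compute() if pred(x)])

    # --- Actions: force evaluation and return a plain value ---
    def collect(self):
        return self._compute()

    def count(self):
        return len(self._compute())

rdd = ToyRDD.from_list(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)  # lazy: no work done yet
result = evens_squared.collect()  # action: the whole pipeline executes now
print(result)  # [0, 4, 16, 36, 64]
```

In real PySpark the same pipeline would run against a SparkContext, e.g. `sc.parallelize(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()`, with the same lazy-until-action behavior, but distributed across the cluster.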
Slides

Spark

Post Reads

  1. Apache Spark for beginners
  2. Research Paper on Spark
