Develop Spark Applications with Python & Cloudera. Explore the RDD API, the original core abstraction of Spark. Use Spark SQL and DataFrames

ZUBOGU/LearnSpark

LearnSpark

Apache Spark 2

Speed

Logistic Regression: Hadoop vs. Spark

Apache Spark: 0.9s
MapReduce: 110s

Execution time: lower is better

Ease of Use

Word Count Examples

Many lines in Hadoop
A couple of lines in Spark
Easy to learn and full of features
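
To illustrate the "couple of lines in Spark" point, here is a minimal word count sketch using the RDD API. It is not taken from this repository: the helper names `tokenize`, `word_count`, and `main`, the application name, and the `input.txt` path are all hypothetical.

```python
from operator import add


def tokenize(line):
    """Split one line of text into lowercase, whitespace-separated words."""
    return line.lower().split()


def word_count(lines):
    """Turn an RDD of text lines into an RDD of (word, count) pairs."""
    return (
        lines.flatMap(tokenize)            # one record per word
        .map(lambda word: (word, 1))       # pair each word with a count of 1
        .reduceByKey(add)                  # sum the counts for each word
    )


def main(path="input.txt"):  # hypothetical input path
    # Imported here so tokenize() can be used without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")
    try:
        for word, count in word_count(sc.textFile(path)).collect():
            print(word, count)
    finally:
        sc.stop()
```

The core of the job is the three chained calls in `word_count`; calling `main()` on a machine with PySpark installed would run it against the given file.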

Unified Engine for Big Data

Supports multiple languages: Python, Scala, Java, R
Supports multiple cluster managers
Many libraries

Suited to iterative algorithms and interactive data mining tools.

Running environment

Use Python.
Use Spark on YARN in a Cloudera cluster.
Optional: Spark Standalone.
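
As a rough sketch of how the two options above differ in code, the master URL passed to Spark is what selects YARN or a standalone cluster. The helper names `master_url` and `make_session`, the default host, and the application name are hypothetical; `yarn` and `spark://HOST:PORT` (default port 7077) are the standard master URL forms.

```python
def master_url(mode="yarn", host="localhost", port=7077):
    """Return the Spark master URL for the chosen cluster manager."""
    if mode == "yarn":
        return "yarn"
    if mode == "standalone":
        return f"spark://{host}:{port}"
    raise ValueError(f"unsupported mode: {mode!r}")


def make_session(mode="yarn", app_name="LearnSpark"):
    """Create (or reuse) a SparkSession for the chosen cluster manager."""
    # Imported here so master_url() works without a Spark installation.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master_url(mode))
        .getOrCreate()
    )
```

When running on YARN, Spark locates the cluster through the Hadoop configuration directory (`HADOOP_CONF_DIR` or `YARN_CONF_DIR`), so no host name appears in the master URL.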

Why Cloudera?

CDH+Tools // On-prem & Cloud
Director  // Cloud
Altus // Platform-as-a-Service

CDH

Cloudera's Distribution including Hadoop (HDFS)
