Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. Topics covered include:
- An overview of the Apache Spark architecture
- RDD transformations and actions (see the first sketch after this list)
- Spark SQL (see the SQL sketch after this list)
- Develop Apache Spark 2.0 applications with PySpark
- Advanced techniques to optimize and tune Apache Spark jobs
- Spark on Amazon's Elastic MapReduce service
- Big data ecosystem overview
- Datasets and DataFrames
- Analyze structured and semi-structured data
- Broadcast variables and accumulators (see the shared-variables sketch after this list)
- Best practices for working with Apache Spark in the field
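
As a taste of the RDD material, here is a minimal sketch of transformations versus actions, assuming a local Spark installation; the app name and sample data are placeholders:

```python
# A minimal sketch: transformations are lazy, actions trigger execution.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-basics")  # placeholder app name

numbers = sc.parallelize(range(1, 11))

# Transformations only build a lineage graph; nothing runs yet.
squares = numbers.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Actions execute the whole lineage and return results to the driver.
print(evens.collect())  # [4, 16, 36, 64, 100]
print(evens.count())    # 5

sc.stop()
```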
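
For the Spark SQL, DataFrame, and semi-structured data topics, here is a small sketch of querying schema-inferred JSON; the input file `people.json` and its `name`/`age` columns are hypothetical:

```python
# A minimal sketch of Spark SQL over semi-structured JSON.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-basics").getOrCreate()

# The schema is inferred from the JSON records; "people.json" is a
# hypothetical input file with name and age fields.
people = spark.read.json("people.json")
people.printSchema()

# The DataFrame API and plain SQL are interchangeable views of the data.
people.filter(people.age > 21).select("name", "age").show()

people.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 21").show()

spark.stop()
```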
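
For the shared-variables topic, here is a sketch pairing a read-only broadcast lookup table with a write-only accumulator; the country-code data is invented for illustration:

```python
# A minimal sketch: broadcast a lookup table to executors and count
# lookup misses with an accumulator.
from pyspark import SparkContext

sc = SparkContext("local[*]", "shared-vars")

country_names = sc.broadcast({"US": "United States", "DE": "Germany"})
unknown = sc.accumulator(0)

def resolve(code):
    name = country_names.value.get(code)
    if name is None:
        unknown.add(1)  # updated on executors, read back on the driver
    return name

codes = sc.parallelize(["US", "DE", "FR", "US"])
print(codes.map(resolve).collect())  # ['United States', 'Germany', None, 'United States']
print(unknown.value)                 # 1

sc.stop()
```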