corneliouzbett/Master-Apache-Spark

Master-Apache-Spark

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Course Outline

  1. An overview of the architecture of Apache Spark
  2. RDD transformations and actions
  3. Spark SQL
  4. Develop Apache Spark 2.0 applications with PySpark
  5. Advanced techniques to optimize and tune Apache Spark jobs
  6. Spark on Amazon's Elastic MapReduce service
  7. Big data ecosystem overview
  8. Datasets and DataFrames
  9. Analyze structured and semi-structured data
  10. Broadcast variables and accumulators
  11. Best practices for working with Apache Spark in the field
