This series explores the basics of Apache Spark with the application of some practical elements of Spark, PySpark & SparkSQL

WazirRohiman/Apache_Spark_Basics

Apache Spark Basics

This series explores the basics of Apache Spark with the application of some practical elements of Spark and PySpark.

This repo contains code and explanations covering the following topics:

  • Apache Spark overview
  • SparkContext and SparkSession
  • RDDs
  • Transformations and actions on RDDs
  • The basics of DataFrames and SparkSQL
  • How to submit an Apache Spark application:
      Install a Spark Master and Worker using Docker Compose
      Create a Python script containing a Spark job
      Submit the job to the cluster directly from Python (note: submitting a job from the command line is covered in the Kubernetes Lab)
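
To make the topics above concrete, here is a minimal sketch of a Python script containing a Spark job. It is not taken from the repo's notebooks. The file name `spark_job.py`, the application name, the sample sentences, and the helper names `tokenize` and `run_job` are all illustrative assumptions. The script creates a SparkSession, runs a word count with RDD transformations and an action, then queries the result through a DataFrame and SparkSQL.

```python
"""Sketch of a PySpark job: RDD word count plus a DataFrame/SparkSQL query."""
import sys


def tokenize(line):
    """Split a line into lowercase words (pure Python, no Spark needed)."""
    return line.lower().split()


def run_job(master_url):
    # Imported here so tokenize() stays usable where PySpark is not installed.
    from pyspark.sql import SparkSession

    # SparkSession is the entry point for DataFrames and SparkSQL;
    # it wraps a SparkContext, which is the entry point for RDDs.
    spark = (
        SparkSession.builder
        .master(master_url)
        .appName("spark-basics-sketch")  # hypothetical application name
        .getOrCreate()
    )
    sc = spark.sparkContext
    try:
        # Illustrative sample data, not from the repo.
        lines = sc.parallelize(
            ["Spark makes RDDs", "RDDs are lazy", "Spark is fast"]
        )
        # Transformations (flatMap, map, reduceByKey) are lazy ...
        counts = (
            lines.flatMap(tokenize)
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
        )
        # ... and only an action such as collect() triggers execution.
        word_counts = dict(counts.collect())

        # Same data as a DataFrame, queried through a temporary SQL view.
        df = spark.createDataFrame(list(word_counts.items()), ["word", "n"])
        df.createOrReplaceTempView("word_counts")
        rows = spark.sql(
            "SELECT word, n FROM word_counts WHERE n > 1 ORDER BY word"
        ).collect()
        return word_counts, [(row["word"], row["n"]) for row in rows]
    finally:
        spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python spark_job.py <master-url>")
    else:
        print(run_job(sys.argv[1]))
```

Passing `local[*]` as the master URL runs the job in-process without a cluster. For a standalone master started with Docker Compose, the URL would look something like `spark://localhost:7077`, assuming the master's default port 7077 is published on the host. The Python version on the driver generally needs to match the one on the workers.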
    
