This series explores the basics of Apache Spark through practical, hands-on examples in Spark and PySpark.
This repo contains code and explanations covering the following topics:
- Apache Spark overview
- SparkContext and SparkSession
- RDDs
- Transformations and actions on RDDs
- The basics of DataFrames and Spark SQL
- How to submit an Apache Spark application:
  - Install a Spark master and worker using Docker Compose
  - Create a Python script containing a Spark job
  - Submit the job to the cluster directly from Python (note: you'll learn how to submit a job from the command line in the Kubernetes Lab)
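For the Docker Compose step, a master/worker pair might look like the sketch below. This is a hypothetical `docker-compose.yml`, not the repo's actual file; the `bitnami/spark` image, environment variables, and ports are assumptions you may need to adjust:

```yaml
# Hypothetical compose file for a one-master, one-worker Spark cluster.
version: "3"
services:
  spark-master:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # RPC port workers and spark-submit connect to
  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
```

With this running, a Python script submits work to the cluster simply by building its session with `.master("spark://localhost:7077")` instead of a local master.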