Week 5: Batch Processing

5.1 Introduction

5.2 Installation

Follow these instructions to install Spark:

Then follow these instructions to run PySpark in Jupyter.

5.3 Spark SQL and DataFrames

Script to prepare the dataset: download_data.sh

Note: Besides using pandas, another way to infer the schema of the CSV files is to set the inferSchema option to true when reading them with Spark.
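In PySpark the option is usually set together with the header option, e.g. spark.read.option("header", "true").option("inferSchema", "true").csv(path). Spark's documentation notes that this costs one extra pass over the data. To show roughly what that pass does, here is a toy, standard-library-only model of the idea: for each column, pick the narrowest type that fits every non-empty value. This is a simplified sketch, not Spark's algorithm. The function names infer_type and infer_schema, the three-step integer/double/string ladder and the sample CSV are all hypothetical. Spark's real inference also considers types such as long, boolean and timestamp.

```python
import csv
import io


def _fits(value: str, kind: str) -> bool:
    """Return True if the string can be parsed as the given kind (hypothetical helper)."""
    parser = {"integer": int, "double": float}[kind]
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_type(values: list[str]) -> str:
    """Toy inference: the narrowest of integer -> double -> string that fits all values."""
    non_empty = [v for v in values if v != ""]  # empty fields are treated as nulls
    if not non_empty:
        return "string"  # nothing to go on, so fall back to string
    for kind in ("integer", "double"):
        if all(_fits(v, kind) for v in non_empty):
            return kind
    return "string"


def infer_schema(csv_text: str) -> dict[str, str]:
    """Read CSV text with a header row and infer one type per column.

    This scans every data row, which is the "extra pass" that schema
    inference costs compared with supplying the schema up front.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    schema = {}
    for i, name in enumerate(header):
        # Short rows are padded with empty strings (treated as nulls).
        column = [row[i] if i < len(row) else "" for row in data]
        schema[name] = infer_type(column)
    return schema


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = "id,distance,flag\n1,2.5,N\n2,0.8,Y\n"
    print(infer_schema(sample))
```

Running this prints {'id': 'integer', 'distance': 'double', 'flag': 'string'}. The same trade-off applies in Spark: inference is convenient for exploration, but on large datasets an explicitly defined schema (for example one derived once with pandas, as the note mentions) avoids the extra scan.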

5.4 Spark Internals

5.5 (Optional) Resilient Distributed Datasets

5.6 Running Spark in the Cloud

Homework

Community notes

Did you take notes? You can share them here.