
Wenqiang Feng's notes for pySpark using real air quality data. Here I'm leaving practical, ready-to-use commands.


c-pzzo/pySpark_Notes


pySpark_Notes

Creating RDDs.ipynb

An RDD (Resilient Distributed Dataset) in Spark is simply an immutable, distributed collection of objects. Each RDD is split into multiple partitions (smaller subsets of its data), which may be computed on different nodes of the cluster.

  • Starting the Spark environment
  • Importing data from different sources and transforming it into RDDs
  • A list of commands for performing different actions on RDDs

Source: https://runawayhorse001.github.io/LearningApacheSpark/rdd.html
