
developershomes/SparkETL


Spark ETL

Extract, Transform, and Load (ETL) using Spark, or Extract, Load, and Transform (ELT) using Spark


Here, we will create Spark notebooks for each of the ETL processes below. Once we have learned all the ETL processes, we will start working on projects using Spark.

Please find the list of ETL pipelines below:

  1. Chapter0 -> Spark ETL with Files (CSV | JSON | Parquet)
  2. Chapter1 -> Spark ETL with SQL Databases (MySQL | PostgreSQL)
  3. Chapter2 -> Spark ETL with NoSQL Databases (MongoDB)
  4. Chapter3 -> Spark ETL with Azure (Blob | ADLS)
  5. Chapter4 -> Spark ETL with AWS (S3 bucket)
  6. Chapter5 -> Spark ETL with Hive tables
  7. Chapter6 -> Spark ETL with APIs
  8. Chapter7 -> Spark ETL with Lakehouse (Delta Lake)
  9. Chapter8 -> Spark ETL with Lakehouse (Apache Hudi)
  10. Chapter9 -> Spark ETL with Lakehouse (Apache Iceberg)
  11. Chapter10 -> Spark ETL with Lakehouse (Delta Lake vs Apache Iceberg vs Apache Hudi)
  12. Chapter11 -> Spark ETL with Lakehouse (Delta table optimization)
  13. Chapter12 -> Spark ETL with Apache Kafka
  14. Chapter13 -> Spark ETL with GCP (BigQuery)

Also see the blog below for explanations of all the data engineering ETL chapters:

https://developershome.blog/category/data-engineering/spark-etl

Also see the YouTube channel below for explanations of all the data engineering chapters and for learning new data engineering concepts:

https://www.youtube.com/@developershomeIn

