A simple boilerplate for data engineering and data analysis training in Databricks.


ac-gomes/data-engineering-with-databricks


Project Overview

This template was developed to help me learn when I started studying Databricks/Spark; I'm now making it available to give other absolute beginners a good starting experience. You can use it even if you don't yet know how to create a DataFrame, or how to create a new directory to write files, tables, and databases to.

What does this template do?

  • Creates 3 PySpark DataFrames for practicing relational data transformations
  • Creates 4 folders to write data to [current user directory, raw, structured, curated]
  • Creates 1 database in the structured zone
  • Lets you read the source code and learn from it
  • Resets the environment (includes functions to clean your environment)
  • Shows how to create tables (in the Hive database); see the 04-Table_Reference notebook
  • Demonstrates Python unit testing with unittest on Databricks
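The folder setup above can be sketched with plain Python. This is only an illustration of the idea, not the repo's actual code: the helper name and paths are assumptions, and on Databricks the Config-Directories notebook would more likely call `dbutils.fs.mkdirs` against DBFS paths, with the current user directory serving as the base.

```python
from pathlib import Path

def create_zones(base_dir: str, zones=("raw", "structured", "curated")) -> list:
    """Create the data-lake zone folders under a base directory.

    Hypothetical helper: on Databricks you would typically call
    dbutils.fs.mkdirs(f"{base_dir}/{zone}") instead of using pathlib.
    """
    created = []
    for zone in zones:
        path = Path(base_dir) / zone
        # exist_ok makes the setup idempotent, so rerunning the notebook is safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Making the setup idempotent matters in a training context, since beginners will rerun configuration notebooks many times.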

Notebooks

  1. Config-DataFrame
  2. Config-Directories
  3. Config-Database
  4. common
  5. Reset-Environment
  6. Helpers
  7. Test
  8. Test_Runner
  9. 01-Training_Python
  10. 02-Table-Reference
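Since `python -m unittest` cannot be invoked from a notebook cell, a Test/Test_Runner pair like the one above typically drives unittest programmatically. A minimal sketch of that pattern, assuming illustrative class and test names rather than the repo's actual tests:

```python
import unittest

class ExampleTransformationTest(unittest.TestCase):
    """Example test in the style a Databricks Test notebook might define."""

    def test_uppercase(self):
        # A trivial string transformation stands in for a real DataFrame helper.
        self.assertEqual("curated".upper(), "CURATED")

def run_tests():
    """Run the suite programmatically, as a Test_Runner notebook cell would."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTransformationTest)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()
```

Returning `result.wasSuccessful()` lets the runner notebook fail loudly (for example via an assertion) when any test breaks.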

How to use it?

Feel free to contribute 😃

Enjoy it!
