A simple boilerplate for data engineering and data analysis training in Databricks.


ac-gomes/data-engineering-with-databricks


Project Overview

This template was developed to help me learn when I started studying Databricks/Spark; I'm now making it available to give other absolute beginners a good starting experience. You can use it even if you don't yet know how to create a DataFrame, or how to create a new directory to write files, tables, and databases to.

What does this template do?

  • Creates 3 PySpark DataFrames for practicing relational data transformations
  • Creates 4 folders to write data to [current user directory, raw, structured, curated]
  • Creates 1 database in the structured zone
  • Lets you read the source code and learn from it
  • Resets the environment (includes functions to clean your environment)
  • Shows how to create tables (in the Hive database); see the 04-Table_Reference notebook
  • Demonstrates Python unit testing with unittest on Databricks
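The folder setup above can be sketched with plain Python. This is only an illustration of the idea, not the repo's actual code: the helper name and paths are assumptions, and on Databricks the Config-Directories notebook would more likely call `dbutils.fs.mkdirs` against DBFS paths, with the current user directory serving as the base.

```python
from pathlib import Path

def create_zones(base_dir: str, zones=("raw", "structured", "curated")) -> list:
    """Create the data-lake zone folders under a base directory.

    Hypothetical helper: on Databricks you would typically call
    dbutils.fs.mkdirs(f"{base_dir}/{zone}") instead of using pathlib.
    """
    created = []
    for zone in zones:
        path = Path(base_dir) / zone
        # exist_ok makes the setup idempotent, so rerunning the notebook is safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Making the setup idempotent matters in a training context, since beginners will rerun configuration notebooks many times.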

Notebooks

  1. Config-DataFrame
  2. Config-Directories
  3. Config-Database
  4. common
  5. Reset-Environment
  6. Helpers
  7. Test
  8. Test_Runner
  9. 01-Training_Python
  10. 02-Table-Reference
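Since `python -m unittest` cannot be invoked from a notebook cell, a Test/Test_Runner pair like the one above typically drives unittest programmatically. A minimal sketch of that pattern, assuming illustrative class and test names rather than the repo's actual tests:

```python
import unittest

class ExampleTransformationTest(unittest.TestCase):
    """Example test in the style a Databricks Test notebook might define."""

    def test_uppercase(self):
        # A trivial string transformation stands in for a real DataFrame helper.
        self.assertEqual("curated".upper(), "CURATED")

def run_tests():
    """Run the suite programmatically, as a Test_Runner notebook cell would."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTransformationTest)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()
```

Returning `result.wasSuccessful()` lets the runner notebook fail loudly (for example via an assertion) when any test breaks.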

How to use it?

Feel free to contribute 😃

Enjoy it!
