This repository provides a streamlined structure for ETL projects. It uses the Poetry package manager to manage the script directory, integrates Jupyter notebook management, and adds custom commands for creating a kernel and setting up new notebooks. The included query manager class keeps SQL queries in dedicated files in the sql directory and makes them accessible from Python code, which keeps query text out of your scripts and notebooks.
Thanks to GCBallesteros, you can use this template in conjunction with the Neovim plugins NotebookNavigator and Jupytext to create an all-in-one Python data management workspace. Also check out Dadbod-UI for querying databases from Neovim.
My Notebook Neovim Configuration
Define core/config variables with .env
db_user="Foo"
db_pass="Bar"
db_host="localhost"
db_name="testdb"
log_file="/var/logs/script.log"
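As a rough illustration of how these values could reach Python code, the sketch below parses a `.env` file with the standard library only. The names `read_env_file`, `Settings`, and `load_settings` are hypothetical and are not taken from the template; the actual core/config module may load the file differently (for example with a dotenv library).

```python
from dataclasses import dataclass, fields
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the keys shown above."""
    db_user: str
    db_pass: str
    db_host: str
    db_name: str
    log_file: str


def load_settings(path=".env"):
    """Build Settings from the file; raises KeyError if a key is missing."""
    env = read_env_file(path)
    return Settings(**{f.name: env[f.name] for f in fields(Settings)})
```

With the example file above, `load_settings().db_host` would be `"localhost"`. Failing loudly on a missing key is a deliberate choice here, so a misconfigured environment is caught before any database call.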
Create a kernel that uses the project's virtualenv
poetry run create-kernel <kernel-name>
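For readers curious what a command like this might do internally, here is a minimal sketch that registers the current interpreter as a Jupyter kernel through `python -m ipykernel install`. The function names are hypothetical, and this is an assumption about the approach, not the template's actual implementation. It relies on `ipykernel` being installed in the project's virtualenv.

```python
import subprocess
import sys


def kernel_install_command(kernel_name):
    """Build the ipykernel command that registers the running interpreter."""
    return [
        sys.executable, "-m", "ipykernel", "install",
        "--user",                        # install into the user's kernel directory
        "--name", kernel_name,           # internal kernel identifier
        "--display-name", kernel_name,   # label shown in the Jupyter UI
    ]


def create_kernel(kernel_name):
    """Run the install command; raises CalledProcessError on failure."""
    subprocess.run(kernel_install_command(kernel_name), check=True)
```

Because `poetry run` executes inside the project's virtualenv, `sys.executable` points at that environment's Python, so the registered kernel sees the project's dependencies.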
Create a new notebook in the notebook directory
poetry run create-book <notebook-name>
Create a SQL file in the sql directory
sql/insert_user.sql
insert into users(name, email)
values(:name, :email)
sql/stats.sql
select top(100) *
from q1
left join q2
on q1.clientId = q2.clientId
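The `:name` and `:email` markers in `insert_user.sql` are named bind parameters: values are passed separately from the SQL text instead of being formatted into it. The sketch below shows the idea using the standard library's `sqlite3` as a stand-in; the template's database manager may use a different driver, and placeholder syntax varies between drivers. The `users` table definition is assumed for the demonstration.

```python
import sqlite3

# Same statement as sql/insert_user.sql
INSERT_USER = """
insert into users(name, email)
values(:name, :email)
"""


def insert_and_fetch():
    """Insert one row with named parameters and return the table contents."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema, only for this demonstration
        conn.execute("create table users(name text, email text)")
        # The dict keys match the :name and :email placeholders
        conn.execute(INSERT_USER, {"name": "Foo", "email": "Bar"})
        return conn.execute("select name, email from users").fetchall()
    finally:
        conn.close()
```

Binding values this way leaves quoting and escaping to the driver, which is why the SQL file can stay free of Python string formatting.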
Reference the query via the query manager in a script or notebook
main.py
import pandas as pd
from managers import dm, qm

def main():
    dm.execute(qm.insert_user, name="Foo", email="Bar")
    df: pd.DataFrame = dm.select_dataframe(qm.stats)
notebook.ipynb
# %%
from script.managers.query import qm
from script.managers.database import dm
df = dm.select_dataframe(qm.stats)
df
# %%
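To make the `qm.insert_user` and `qm.stats` lookups less magical, here is one way a query manager could map attribute names to files in the sql directory. This is a sketch under assumptions, not the template's actual class: the name `QueryManager`, the `sql_dir` parameter, and the caching behavior are all hypothetical.

```python
from pathlib import Path


class QueryManager:
    """Expose each <sql_dir>/<name>.sql file as an attribute holding its text."""

    def __init__(self, sql_dir="sql"):
        self._sql_dir = Path(sql_dir)
        self._cache = {}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so _sql_dir and
        # _cache never reach this point once __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._cache:
            path = self._sql_dir / f"{name}.sql"
            if not path.is_file():
                raise AttributeError(f"no query file found at {path}")
            self._cache[name] = path.read_text()
        return self._cache[name]
```

With a file `sql/stats.sql` in place, `QueryManager().stats` returns that file's text, and a misspelled query name raises `AttributeError` instead of silently returning nothing.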
- Airflow DAG Template/Generator