DavidRR-F/python-etl-script-template

All in One Python ETL Template

This repository provides a streamlined structure for ETL projects. It uses the Poetry package manager to manage the script directory and its Jupyter notebooks, and adds custom commands for creating kernels and new notebooks. The included query manager class reads SQL queries from files in the sql directory and makes them accessible in Python code.

Thanks to GCBallesteros, you can use this template in conjunction with the Neovim plugins NotebookNavigator and Jupytext to create an all-in-one Python data management workspace. Also check out Dadbod-UI for querying databases from within Neovim.

image

My Notebook Neovim Configuration

Example Usage

Define core/config variables with .env

db_user="Foo"
db_pass="Bar"
db_host="localhost"
db_name="testdb"
log_file="/var/logs/script.log"
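
The template's own settings loader is not shown in this README. As a minimal sketch of how these values could be read using only the standard library, the hypothetical `parse_env` helper below turns `.env` text into a dictionary. It is an illustration, not the template's API; a real project would more typically rely on a library such as python-dotenv.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines into a dict (hypothetical helper, not the template's API)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one pair of matching quotes
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# The same variables as in the .env example above
EXAMPLE_ENV = '''
db_user="Foo"
db_pass="Bar"
db_host="localhost"
db_name="testdb"
log_file="/var/logs/script.log"
'''

config = parse_env(EXAMPLE_ENV)
print(config["db_host"])
```

Keeping the parsing in one function means the rest of the script only ever sees a plain dictionary, whatever the source of the settings.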

Notebook Management

Create a kernel that uses the project's virtualenv

poetry run create-kernel <kernel-name>

Create a new notebook in the notebook directory

poetry run create-book <notebook-name>
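
The implementation behind these two commands is not shown in this README. As a rough sketch of what such commands might do, the code below assumes that `create-kernel` wraps `python -m ipykernel install` and that `create-book` writes an empty Jupytext percent-format file. Both are assumptions, as are the function names, the `notebooks` directory, and the `.py` extension.

```python
import sys
import tempfile
from pathlib import Path


def kernel_install_command(kernel_name: str) -> list[str]:
    """Build the command that registers the current interpreter as a Jupyter kernel."""
    # sys.executable is the virtualenv's Python when invoked through `poetry run`
    return [
        sys.executable, "-m", "ipykernel", "install",
        "--user", "--name", kernel_name, "--display-name", kernel_name,
    ]


def create_notebook(notebook_dir: Path, name: str) -> Path:
    """Create an empty percent-format notebook script (hypothetical layout)."""
    notebook_dir.mkdir(parents=True, exist_ok=True)
    path = notebook_dir / f"{name}.py"
    if path.exists():
        raise FileExistsError(f"{path} already exists")
    # A single cell marker gives the editor an empty first cell to work in
    path.write_text("# %%\n", encoding="utf-8")
    return path


command = kernel_install_command("etl-kernel")
print(" ".join(command[1:]))

# Use a temporary directory so the demonstration leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    created = create_notebook(Path(tmp) / "notebooks", "exploration")
    print(created.name)
```

The command list is only built here, not run; passing it to `subprocess.run(command, check=True)` would perform the actual kernel registration.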

SQL Query Management

Create a SQL file in the sql directory

sql/insert_user.sql

insert into users(name, email) 
values(:name, :email)

sql/stats.sql

select top(100) *
from q1
left join q2
on q1.clientId = q2.clientId

Reference the query via the query manager in a script or notebook

main.py

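import pandas as pd

from managers import dm, qm

def main():
    # Run sql/insert_user.sql, binding :name and :email
    dm.execute(qm.insert_user, name="Foo", email="Bar")
    # Run sql/stats.sql and load the result into a DataFrame
    df: pd.DataFrame = dm.select_dataframe(qm.stats)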

notebook.ipynb

# %%
from script.managers.query import qm
from script.managers.database import dm
df = dm.select_dataframe(qm.stats)
df
# %%
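
How `dm` binds keyword arguments to the `:name` placeholders is not shown in this README. As an illustration of the idea only, the sketch below uses the standard library's `sqlite3`, which accepts the same `:name` placeholder style. The `DatabaseManager` class and its `select` method are hypothetical, and the template's `select_dataframe` presumably returns a pandas DataFrame instead of a list of dictionaries.

```python
import sqlite3


class DatabaseManager:
    """Run parameterized SQL against one connection (hypothetical sketch)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        # Rows that can be converted to dictionaries keyed by column name
        self._conn.row_factory = sqlite3.Row

    def execute(self, sql: str, **params: object) -> None:
        # Keyword arguments are bound to :name placeholders by the driver
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(sql, params)

    def select(self, sql: str, **params: object) -> list[dict]:
        rows = self._conn.execute(sql, params).fetchall()
        return [dict(row) for row in rows]


# An in-memory database stands in for the real connection
dm = DatabaseManager(sqlite3.connect(":memory:"))
dm.execute("create table users(name text, email text)")
dm.execute(
    "insert into users(name, email) values(:name, :email)",
    name="Foo",
    email="Bar",
)
rows = dm.select("select name, email from users")
print(rows)
```

Letting the driver bind the parameters, instead of formatting values into the SQL string, is what makes the `:name` style in the SQL files safe against injection.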

ToDo

  • Airflow DAG template/generator

File Structure

image