APie Simple ETL

This repo is a collection of Python classes that help extract data from a source (a database, a file, a data lake, or other) and apply the minimal transformations needed to load it into your data warehouse.

Dev Setup

Install ASDF to keep versions consistent

Install Python 3.11.4

asdf plugin add python
asdf install python 3.11.4

Install Poetry 1.5.1

asdf plugin add poetry https://github.com/asdf-community/asdf-poetry.git
asdf install poetry 1.5.1

In the source repository run:

poetry install
poetry run pre-commit install

Copy .env.example to .env in the 'tests' folder and set the correct value for each variable.
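
As a rough illustration of how the variables in the tests .env file could reach the test code, the sketch below parses KEY=VALUE lines using only the standard library. The helper names load_env_file and apply_env are hypothetical, and the project may use a different loader (for example a dotenv package), so read this as an assumption and not as the repo's actual mechanism.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines and return them as a dict.

    Hypothetical helper: blank lines, comment lines starting with '#',
    and lines without '=' are skipped; quotes around a value are removed.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_env(values):
    """Copy parsed values into os.environ without overwriting existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Calling apply_env(load_env_file("tests/.env")) before the tests run would make each variable available through os.environ, while leaving any value already set in the shell untouched.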

A class does nothing on its own. To see the code in action, run one of the pipelines in the folder ./src/pipelines

Folders and Arch definition

There are two main folders at the root of the project:

src
- Here we have all the code
tests
- Here, we have all the tests, configuration files, fixtures, and anything else necessary to run tests on the code.

src folder

Inside the src folder, we have four main folders:

extract
- Here, we have the definition of the extract classes and the declaration of the interface
load
transform
pipeline

tests folder

To simplify the import statements in your tests, every 'test_*.py' file is placed in the root of the tests folder. Any other definition needed to run the tests is placed in its respective subfolder.

Creating a Pipeline

The main objective when defining each of the base classes is that your pipeline can be defined as:

    # Placeholder module names; relative imports assume this file lives in a package under src
    from ..extract.choose_one_extract_class import ExtractClass
    from ..transform.choose_one_transform_class import TransformClass
    from ..load.choose_one_load_class import LoadClass

    # the **kwargs depend on the class you choose
    extract_obj = ExtractClass(**kwargs)
    transform_obj = TransformClass(**kwargs)
    load_obj = LoadClass(**kwargs)

    extracted_data = extract_obj.extract(**kwargs)
    transformed_data = transform_obj.transform(extracted_data, **kwargs)
    load_obj.load(transformed_data, **kwargs)

Not all pipelines need a transformation before the load; in that case, pass the extracted data directly to load_obj.load.
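
A minimal sketch of a pipeline runner where the transform step is optional might look like the following. The function run_pipeline and the three stand-in classes (ListExtract, UppercaseTransform, MemoryLoad) are hypothetical and exist only to make the example self-contained; the real classes in src take their own keyword arguments.

```python
def run_pipeline(extract_obj, load_obj, transform_obj=None):
    """Run extract -> (optional) transform -> load and return the loaded data."""
    data = extract_obj.extract()
    if transform_obj is not None:
        data = transform_obj.transform(data)
    load_obj.load(data)
    return data


# Stand-in classes (assumptions) that only mimic the extract/transform/load methods.
class ListExtract:
    def __init__(self, rows):
        self.rows = rows

    def extract(self):
        return list(self.rows)


class UppercaseTransform:
    def transform(self, data):
        return [row.upper() for row in data]


class MemoryLoad:
    def __init__(self):
        self.loaded = []

    def load(self, data):
        self.loaded.extend(data)


if __name__ == "__main__":
    sink = MemoryLoad()
    # No transform object is passed, so the extracted rows are loaded as-is.
    run_pipeline(ListExtract(["a", "b"]), sink)
    print(sink.loaded)
```

Making the transform an optional argument keeps the extract and load steps unchanged whether or not a transformation is needed.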

How to create a new class

When creating a new class, inherit from the interface for that step; for example, to extract data, inherit from the extract_interface
Implement all the methods defined in the interface, plus any other methods you need
Create unit tests for your class
Open a PR
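
The steps above can be sketched roughly as follows. ExtractInterface here is a hypothetical stand-in written with the standard abc module; the project's real extract_interface may define different method names and arguments. CsvTextExtract and its test are likewise invented for illustration.

```python
import csv
import io
from abc import ABC, abstractmethod


class ExtractInterface(ABC):
    """Hypothetical stand-in for the project's extract interface."""

    @abstractmethod
    def extract(self, **kwargs):
        """Return the extracted records."""


class CsvTextExtract(ExtractInterface):
    """Example extract class that reads rows from CSV text held in memory."""

    def __init__(self, csv_text):
        self.csv_text = csv_text

    def extract(self, **kwargs):
        # DictReader uses the first line as the header row.
        reader = csv.DictReader(io.StringIO(self.csv_text))
        return [dict(row) for row in reader]


def test_csv_text_extract_returns_rows():
    """Unit test in the style of a tests/test_*.py file."""
    rows = CsvTextExtract("id,name\n1,Ada\n").extract()
    assert rows == [{"id": "1", "name": "Ada"}]
```

Because the interface method is marked abstract, a subclass that forgets to implement extract cannot be instantiated, which surfaces the mistake before the pipeline runs.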
