Loader Base Class

spooq2.loader.loader

Create your own Loader

Have your loader class inherit from the loader base class. It then picks up the name, string representation, and logger attributes from the superclass.

The only mandatory requirement is a load() method that takes a PySpark DataFrame and returns nothing (or at least, the API does not expect a return value).

All configuration and parameterization should be done while initializing the class instance.

Here is a simple example of a loader that saves a DataFrame to Parquet files:

Exemplary Sample Code

create_loader/parquet.py
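
A minimal sketch of such a loader might look as follows. The module path spooq2.loader.loader comes from this page; the base class name Loader, the constructor parameters (path, partition_by, compression_codec, mode), and the stand-in base class in the except branch are assumptions made for illustration. The stand-in exists only so the sketch runs without Spooq2 installed.

```python
import logging

try:
    # Module path as documented above; the class name `Loader` is an assumption.
    from spooq2.loader.loader import Loader
except ImportError:

    class Loader(object):
        """Stand-in base class, used only when Spooq2 is not installed."""

        def __init__(self):
            self.name = type(self).__name__
            self.logger = logging.getLogger("spooq2")

        def __str__(self):
            return "Loader Object of Class {nm}".format(nm=self.name)

        def load(self, input_df):
            raise NotImplementedError("Subclasses have to implement load()")


class ParquetLoader(Loader):
    """Hypothetical loader that persists a DataFrame as Parquet files."""

    def __init__(self, path, partition_by=None, compression_codec=None, mode="overwrite"):
        # All configuration happens at initialization time.
        super(ParquetLoader, self).__init__()
        self.path = path
        self.partition_by = partition_by
        self.compression_codec = compression_codec
        self.mode = mode

    def load(self, input_df):
        # Takes a PySpark DataFrame and returns nothing.
        self.logger.info("Persisting DataFrame as Parquet files to %s", self.path)
        input_df.write.parquet(
            path=self.path,
            mode=self.mode,
            partitionBy=self.partition_by,
            compression=self.compression_codec,
        )
```

With a real DataFrame df, usage would be along the lines of ParquetLoader(path="output/table").load(df). The load() method only relies on the DataFrame's write.parquet() call, so all Spark-specific behavior stays in PySpark itself.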

References to include

This makes it possible to import the new loader class directly from spooq2.loader instead of spooq2.loader.parquet. It will also be imported if you use from spooq2.loader import *.

create_loader/init.diff

Tests

One of Spooq2's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.

create_loader/test_parquet.py
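
As a rough illustration of the kind of check such a test can make, the sketch below verifies that load() hands the DataFrame to the Parquet writer and returns nothing. It swaps the spark_session fixture for a unittest.mock.MagicMock so it runs without Spark; a test in the actual suite would more likely build a DataFrame via spark_session, call load(), and read the written files back. The condensed ParquetLoader class and the test name are hypothetical.

```python
from unittest import mock


class ParquetLoader(object):
    """Condensed, hypothetical stand-in for the loader under test."""

    def __init__(self, path):
        self.path = path

    def load(self, input_df):
        input_df.write.parquet(path=self.path)


def test_load_delegates_to_dataframe_writer():
    # MagicMock stands in for a PySpark DataFrame, so no SparkSession is needed.
    input_df = mock.MagicMock()
    loader = ParquetLoader(path="some/output/path")

    result = loader.load(input_df)

    # The API does not expect load() to return anything.
    assert result is None
    # The DataFrame writer should be called exactly once with the configured path.
    input_df.write.parquet.assert_called_once_with(path="some/output/path")
```

Run under pytest, the function is collected by its test_ prefix; it can also be called directly.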

Documentation

You need to create an rst file for your loader that contains at least the automodule or the autoclass directive.

create_loader/parquet.rst.code

To automatically include your new loader in the HTML / PDF documentation, add it to a toctree directive: refer to your newly created parquet.rst file within the loader overview page.

create_loader/overview.diff

That should be it!