spooq2.loader.loader
Let your loader class inherit from the loader base class. It then gets the name, string representation and logger attributes from the superclass.
The loader takes a PySpark DataFrame and returns nothing (or at least the API does not expect a return value).
All configuration and parameterization should be done while initializing the class instance.
Here is a simple example of a loader that saves a DataFrame to parquet files:
create_loader/parquet.py
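As a rough illustration of the points above, here is a minimal sketch of such a loader. It is not the project's own example file: the Loader class below is a stand-in for the base class in spooq2.loader.loader, and the method name load as well as the parameters path, partition_by and compression_codec are assumptions made for this sketch. Only DataFrameWriter.parquet (with its path, partitionBy and compression arguments) is an actual PySpark call.

```python
import logging


class Loader(object):
    """Stand-in for the base class in spooq2.loader.loader (assumed layout)."""

    def __init__(self):
        self.name = type(self).__name__
        self.logger = logging.getLogger("spooq2")

    def __str__(self):
        return "Loader object of class {}".format(self.name)

    def load(self, input_df):
        raise NotImplementedError("Subclasses have to implement load()")


class ParquetLoader(Loader):
    """Hypothetical loader that persists a PySpark DataFrame as parquet files."""

    def __init__(self, path, partition_by=None, compression_codec="snappy"):
        super(ParquetLoader, self).__init__()
        # All configuration happens here, while initializing the instance.
        self.path = path
        self.partition_by = partition_by
        self.compression_codec = compression_codec

    def load(self, input_df):
        self.logger.info("Persisting DataFrame as parquet files to %s", self.path)
        # input_df.write is a DataFrameWriter; parquet() writes the files.
        input_df.write.parquet(
            path=self.path,
            partitionBy=self.partition_by,
            compression=self.compression_codec,
        )
        # Nothing is returned on purpose.
```

With a real DataFrame df, usage would look like ParquetLoader(path="/some/output/dir").load(df). In an actual Spooq2 module you would import the base class instead of redefining it.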
Reference the new loader class in the __init__.py of the loader package. This makes it possible to import the new loader class directly from spooq2.loader instead of spooq2.loader.parquet. It will also be imported if you use from spooq2.loader import *.
create_loader/init.diff
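The effect of such a reference can be tried out in isolation. The sketch below builds a throwaway package called demo_loader (a hypothetical stand-in for spooq2.loader) in a temporary directory, re-exports a class from its __init__.py and lists it in __all__. The actual layout of Spooq2's __init__.py may differ; this only illustrates the Python import mechanics.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; "demo_loader" stands in for spooq2.loader.
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "demo_loader")
os.makedirs(pkg_dir)

# demo_loader/parquet.py: the module that holds the new loader class.
with open(os.path.join(pkg_dir, "parquet.py"), "w") as f:
    f.write("class ParquetLoader(object):\n    pass\n")

# demo_loader/__init__.py: re-export the class and list it in __all__.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "from .parquet import ParquetLoader\n"
        "\n"
        "__all__ = ['ParquetLoader']\n"
    )

sys.path.insert(0, pkg_root)
importlib.invalidate_caches()

# The class is now importable from the package itself ...
from demo_loader import ParquetLoader
from demo_loader.parquet import ParquetLoader as SameClass

# ... and a star import picks it up because it is listed in __all__.
namespace = {}
exec("from demo_loader import *", namespace)
```

Both import paths resolve to the same class object, so existing code that imports from the submodule keeps working.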
One of Spooq2's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.
create_loader/test_parquet.py
You need to create an rst file for your loader that contains at least the automodule or the autoclass directive.
create_loader/parquet.rst.code
To automatically include your new loader in the HTML / PDF documentation you need to add it to a toctree directive. Just refer to your newly created parquet.rst file within the loader overview page.
create_loader/overview.diff
That should be it!