spooq2.transformer.transformer
Let your transformer class inherit from the transformer base class. Your class then gets the name, string representation, and logger attributes from the superclass.
Your transformer takes a PySpark DataFrame and returns a PySpark DataFrame.
All configuration and parameterization should be done while initializing the class instance.
Here is a simple example of a transformer that drops records without an Id:
create_transformer/no_id_dropper.py
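The contents of the referenced file are not reproduced here, so the following is only a minimal sketch of what such a transformer could look like. The base class name `Transformer`, the method name `transform`, and the parameter `id_columns` are assumptions made for illustration. The fallback base class exists only so that the snippet also runs where Spooq2 is not installed.

```python
import logging

try:
    # Module path taken from this page; the class name `Transformer` is an assumption.
    from spooq2.transformer.transformer import Transformer
except ImportError:
    # Stand-in for illustration only: mimics the name, string representation
    # and logger attributes that the real superclass is described as providing.
    class Transformer(object):
        def __init__(self):
            self.name = type(self).__name__
            self.logger = logging.getLogger("spooq2")

        def __str__(self):
            return "Transformer Object of Class {nm}".format(nm=self.name)


class NoIdDropper(Transformer):
    """Drops records that have no Id.

    Parameters
    ----------
    id_columns : str or list of str
        Column(s) holding the Id (hypothetical parameter). Defaults to "id".
    """

    def __init__(self, id_columns="id"):
        super(NoIdDropper, self).__init__()
        # All configuration happens here, while initializing the instance.
        if isinstance(id_columns, str):
            id_columns = [id_columns]
        self.id_columns = list(id_columns)

    def transform(self, input_df):
        """Takes a PySpark DataFrame and returns a PySpark DataFrame."""
        self.logger.info("Dropping records without an Id (columns: %s)", self.id_columns)
        # how="all": a record is dropped only if every Id column is null.
        return input_df.dropna(how="all", subset=self.id_columns)
```

With a DataFrame `input_df` at hand, usage would then be `NoIdDropper(id_columns="id").transform(input_df)`. Since `dropna` is the only DataFrame call, the class itself needs no PySpark import.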
Referencing the new class in spooq2/transformer/__init__.py makes it possible to import it directly from spooq2.transformer instead of spooq2.transformer.no_id_dropper. It is then also imported when you use from spooq2.transformer import *.
create_transformer/init.diff
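The diff itself is not reproduced here. As a rough illustration of the mechanism, the sketch below builds a throwaway package on disk and re-exports a class from its `__init__.py`. The package name `demo_transformer` and the file contents are hypothetical and are not the actual Spooq2 sources.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Create a throwaway package directory (hypothetical name: demo_transformer).
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "demo_transformer")
os.mkdir(pkg_dir)

# The module that holds the new class.
with open(os.path.join(pkg_dir, "no_id_dropper.py"), "w") as f:
    f.write("class NoIdDropper(object):\n    pass\n")

# The package's __init__.py re-exports the class and lists it in __all__.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .no_id_dropper import NoIdDropper\n\n")
    f.write('__all__ = ["NoIdDropper"]\n')

sys.path.insert(0, pkg_root)
importlib.invalidate_caches()

# Both import paths now resolve to the same class object.
from demo_transformer import NoIdDropper as short_path
from demo_transformer.no_id_dropper import NoIdDropper as long_path
assert short_path is long_path

# A star import picks the class up as well, because it is listed in __all__.
namespace = {}
exec("from demo_transformer import *", namespace)
assert "NoIdDropper" in namespace

# Remove the temporary files again.
shutil.rmtree(pkg_root)
```

In the real package, the equivalent change would be the single import line (plus an `__all__` entry, if the package maintains one) in the existing `__init__.py`.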
One of Spooq2's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.
create_transformer/test_no_id_dropper.py
Create an rst file for your transformer. It must contain at least the automodule or the autoclass directive.
create_transformer/no_id_dropper.rst.code
To include your new transformer in the HTML / PDF documentation automatically, you need to add it to a toctree directive. Just refer to your newly created no_id_dropper.rst file within the transformer overview page.
create_transformer/overview.diff
That should be it!