transformer.rst

Transformer Base Class

spooq2.transformer.transformer

Create your own Transformer

Let your transformer class inherit from the transformer base class. It then gets the name, string representation, and logger attributes from the superclass.

The only requirement is to provide a transform() method which takes a PySpark DataFrame and returns a PySpark DataFrame.

All configuration and parameterization should be done while initializing the class instance.

Here is a simple example of a transformer that drops records without an Id:

Exemplary Sample Code

create_transformer/no_id_dropper.py
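
As a minimal sketch of what such a transformer might look like: the Transformer class below is a simplified stand-in for the base class in spooq2.transformer.transformer (an assumption, so that the snippet runs without Spooq2 installed), and the id_columns parameter is a hypothetical name chosen for illustration. The filtering relies on PySpark's DataFrame.dropna method.

```python
import logging


class Transformer(object):
    """Simplified stand-in for the Spooq2 base class (assumption).

    In a real contribution you would import the base class from
    spooq2.transformer.transformer instead of defining it here.
    """

    def __init__(self):
        self.name = type(self).__name__
        self.logger = logging.getLogger("spooq2")

    def transform(self, input_df):
        raise NotImplementedError("transform() has to be implemented by the subclass")

    def __str__(self):
        return "Transformer Object of Class {name}".format(name=self.name)


class NoIdDropper(Transformer):
    """Drops records that have no Id (illustrative sketch).

    id_columns is a hypothetical parameter: the columns that together
    make up the Id. All configuration happens here, in __init__.
    """

    def __init__(self, id_columns=None):
        super(NoIdDropper, self).__init__()
        self.id_columns = id_columns or ["id"]

    def transform(self, input_df):
        # Takes a PySpark DataFrame and returns a PySpark DataFrame.
        # how="all" drops a row only if every Id column is null.
        self.logger.info("Dropping records without an Id (columns: %s)", self.id_columns)
        return input_df.dropna(how="all", subset=self.id_columns)
```

With this sketch, NoIdDropper(id_columns=["user_id"]).transform(df) would return df without the rows whose user_id is null.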

References to include

Add the new class to the __init__.py of the transformer package. This makes it possible to import the new transformer class directly from spooq2.transformer instead of spooq2.transformer.no_id_dropper. It will also be imported if you use from spooq2.transformer import *.

create_transformer/init.diff
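
The effect of such a change can be tried out in isolation. The following sketch builds a throwaway package called demo_transformer (a hypothetical stand-in for spooq2.transformer) and assumes the package's __init__.py re-exports the class through an import plus an __all__ entry; the exact lines in Spooq2's own __init__.py may differ.

```python
import importlib
import os
import sys
import tempfile

# Throwaway package "demo_transformer": a hypothetical stand-in for spooq2.transformer.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_transformer")
os.mkdir(package_dir)

# The module that holds the new transformer class.
with open(os.path.join(package_dir, "no_id_dropper.py"), "w") as module_file:
    module_file.write("class NoIdDropper(object):\n    pass\n")

# The two additions to __init__.py: import the class and list it in __all__.
with open(os.path.join(package_dir, "__init__.py"), "w") as init_file:
    init_file.write(
        "from .no_id_dropper import NoIdDropper\n"
        "\n"
        "__all__ = ['NoIdDropper']\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()
package = importlib.import_module("demo_transformer")

# The class is now reachable from the package itself ...
assert package.NoIdDropper.__module__ == "demo_transformer.no_id_dropper"

# ... and it is picked up by a star import as well.
namespace = {}
exec("from demo_transformer import *", namespace)
assert "NoIdDropper" in namespace
```

Without the __all__ entry the star import would still work in this small demo, but listing the class explicitly keeps the set of exported names deliberate.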

Tests

One of Spooq2's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.

create_transformer/test_no_id_dropper.py

Documentation

You need to create an rst file for your transformer, which must contain at least the automodule or the autoclass directive.

create_transformer/no_id_dropper.rst.code

To automatically include your new transformer in the HTML / PDF documentation you need to add it to a toctree directive. Just refer to your newly created no_id_dropper.rst file within the transformer overview page.

create_transformer/overview.diff

That should be it!