Extractor Base Class

spooq2.extractor.extractor

Create your own Extractor

Let your extractor class inherit from the extractor base class. It then gets the name, string representation and logger attributes from the superclass.

The only mandatory thing is to provide an extract() method which takes no input parameters and returns a PySpark DataFrame.

All configuration and parameterization should be done while initializing the class instance.

Here is a simple example of a CSV extractor:

Exemplary Sample Code

create_extractor/csv_extractor.py
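As a rough sketch of the contract described above (not the contents of the referenced file), an extractor could look like the following. Several things here are assumptions made for illustration: the `Extractor` class is a local stand-in for the real base class in spooq2.extractor.extractor, whose details may differ; the SparkSession is passed to the constructor; and the parameter names `input_file`, `header` and `infer_schema` are hypothetical.

```python
import logging


class Extractor:
    """Stand-in for the Spooq2 base class (assumed to provide name, logger, __str__)."""

    def __init__(self):
        self.name = type(self).__name__
        self.logger = logging.getLogger("spooq2")

    def __str__(self):
        return "Extractor Object of Class {}".format(self.name)

    def extract(self):
        raise NotImplementedError("Subclasses must implement extract()")


class CSVExtractor(Extractor):
    """Hypothetical extractor that loads a CSV file into a PySpark DataFrame."""

    def __init__(self, spark_session, input_file, header=True, infer_schema=True):
        super().__init__()
        # All configuration happens here, so extract() needs no arguments.
        self.spark = spark_session
        self.input_file = input_file
        self.header = header
        self.infer_schema = infer_schema

    def extract(self):
        self.logger.info("Loading CSV file from: %s", self.input_file)
        # DataFrameReader.csv returns a PySpark DataFrame.
        return self.spark.read.csv(
            self.input_file, header=self.header, inferSchema=self.infer_schema
        )
```

With a real SparkSession named `spark`, usage would be along the lines of `CSVExtractor(spark, "path/to/file.csv").extract()`. Passing the session in, rather than building it inside the class, is a design choice of this sketch that keeps the extractor easy to test.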

References to include

create_extractor/init.diff

Tests

One of Spooq2's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.

create_extractor/test_csv.py

Documentation

You need to create an rst file for your extractor that contains at least the automodule or the autoclass directive.

create_extractor/csv.rst.code

To automatically include your new extractor in the HTML documentation, you need to add it to a toctree directive. Just refer to your newly created csv.rst file within the extractor overview page.

create_extractor/overview.diff

That should be all!