spooq2.extractor.extractor
Let your extractor class inherit from the extractor base class. It then inherits the name, string representation, and logger attributes from the superclass.
The only mandatory part is an extract() method that takes
=> no input parameters
and returns a
=> PySpark DataFrame!
All configuration and parameterization should be done while initializing the class instance.
Here is a simple example of a CSV extractor:
create_extractor/csv_extractor.py
create_extractor/init.diff
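As a rough illustration of the contract described above, here is a minimal sketch of such an extractor. It is not the content of the referenced files. The stand-in Extractor base class is a simplified assumption so that the snippet runs on its own (in a real project you would import the base class from spooq2.extractor.extractor instead), and the constructor parameters spark, input_path, header and infer_schema are hypothetical names chosen for this example.

```python
import logging


class Extractor(object):
    """Simplified stand-in for the Spooq2 extractor base class (assumption)."""

    def __init__(self):
        self.name = type(self).__name__
        self.logger = logging.getLogger("spooq2")

    def extract(self):
        raise NotImplementedError("Subclasses have to implement extract()")

    def __str__(self):
        # Illustrative format only; the real base class may differ.
        return "Extractor Object of Class {nm}".format(nm=self.name)


class CSVExtractor(Extractor):
    """Hypothetical extractor that loads a CSV file as a PySpark DataFrame."""

    def __init__(self, spark, input_path, header=True, infer_schema=True):
        super(CSVExtractor, self).__init__()
        # All configuration happens here, at initialization time.
        self.spark = spark  # an existing SparkSession
        self.input_path = input_path
        self.header = header
        self.infer_schema = infer_schema

    def extract(self):
        """Takes no input parameters and returns a PySpark DataFrame."""
        self.logger.info("Loading CSV file from: %s", self.input_path)
        return self.spark.read.csv(
            self.input_path, header=self.header, inferSchema=self.infer_schema
        )
```

With an existing SparkSession, usage would look like CSVExtractor(spark, "path/to/file.csv").extract(). Passing the session into the constructor is a design choice of this sketch: it keeps extract() free of parameters and makes the class easy to test.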
One of Spooq2's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.
create_extractor/test_csv.py
You need to create an rst file for your extractor that contains at least the automodule or the autoclass directive.
create_extractor/csv.rst.code
To automatically include your new extractor in the HTML documentation, you need to add it to a toctree directive. Just refer to your newly created csv.rst file within the extractor overview page.
create_extractor/overview.diff
That should be all!