.. automodule:: spooq.extractor.extractor :no-members: :noindex:
Let your extractor class inherit from the extractor base class. This includes the name, string representation and logger attributes from the superclass.
All configuration and parameterization should be done while initializing the class instance.
Here would be a simple example for a CSV Extractor:
.. literalinclude:: create_extractor/csv_extractor.py :caption: spooq/extractor/csv_extractor.py: :language: python
.. literalinclude:: create_extractor/init.diff :caption: spooq/extractor/__init__.py: :language: udiff
One of Spooq's features is to provide tested code for multiple data pipelines. Please take your time to write sufficient unit tests! You can reuse test data from tests/data or create a new schema / data set if needed. A SparkSession is provided as a global fixture called spark_session.
.. literalinclude:: create_extractor/test_csv.py :caption: tests/unit/extractor/test_csv.py: :language: python
You need to create a rst for your extractor which needs to contain at minimum the automodule or the autoclass directive.
.. literalinclude:: create_extractor/csv.rst.code :caption: docs/source/extractor/csv.rst: :language: RST
To automatically include your new extractor in the HTML documentation you need to add it to a toctree directive. Just refer to your newly created csv.rst file within the extractor overview page.
.. literalinclude:: create_extractor/overview.diff :caption: docs/source/extractor/overview.rst: :language: udiff
That should be all!