ExtractTracker - Python

Below is a deeper dive into the capabilities of the ExtractTracker submodule of the Python implementation. Note that ExtractTracker MUST be used in conjunction with ProcessTracker: every extract is registered against a process run.

Registering Extracts

Once the process run has been registered, an extract can be registered, provided the following variables are set:

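# Import paths assume the process_tracker package layout.
from process_tracker.process_tracker import ProcessTracker
from process_tracker.extract_tracker import ExtractTracker

# Register the process run first; the extract is attached to it.
process_run = ProcessTracker(
    process_name='Lahman Teams Load',
    process_type='Stage Load',
    actor_name='New ProcessTracker User',
    tool_name='Spark',
    source_name='Lahman Baseball Dataset',
)

# Register the extract file against that process run.
extract = ExtractTracker(
    process_run=process_run,
    filename='Teams.csv',
    location_name='Lahman Baseball Databank 2018',
    location_path='~/baseballdatabank-master_2018-03-28/baseballdatabank-master/core/',
)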

Those variables will be used to populate the data store backend as explained in the following table:

.. list-table:: ExtractTracker object initialization variables
   :header-rows: 1

   * - Variable Name
     - Variable Description
     - Reference Object
     - Object Created If Not Exist?
   * - process_run
     - An instance of ProcessTracker
     - :ref:`process_tracking`
     - No
   * - filename
     - The extract file's filename
     - :ref:`extract_tracking`
     - Yes
   * - location
     - An instance of Extract Location, optional if created already.
     - :ref:`location_lkup`
     - No
   * - location_name
     - The given name of the location. Optional.
     - :ref:`location_lkup`
     - Yes
   * - location_path
     - The filepath of the location. Required if location instance not provided.
     - :ref:`location_lkup`
     - Yes
   * - status
     - The extract file status. Optional.
     - :ref:`extract_status_lkup`
     - Yes
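
The "Object Created If Not Exist?" column describes a get-or-create pattern: a record is looked up first and only created when it is missing. The sketch below illustrates that idea with an in-memory dictionary. It is not the library's implementation. ``InMemoryStore`` and its methods are hypothetical names, the example path and location name are made up, and falling back to the path when no ``location_name`` is given is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the data store backend (not the library's API)."""

    locations: dict = field(default_factory=dict)
    extracts: dict = field(default_factory=dict)

    def get_or_create_location(self, location_path, location_name=None):
        # Look the location up by path; create it only if it is missing.
        if location_path not in self.locations:
            self.locations[location_path] = {
                # Assumption: use the path as the name when none is given.
                "name": location_name or location_path,
                "path": location_path,
            }
        return self.locations[location_path]

    def get_or_create_extract(self, filename, location):
        # An extract is identified here by its location path plus filename.
        key = (location["path"], filename)
        if key not in self.extracts:
            self.extracts[key] = {"filename": filename, "location": location["name"]}
        return self.extracts[key]


store = InMemoryStore()
location = store.get_or_create_location(
    location_path="~/data/core/", location_name="Example Location"
)
first = store.get_or_create_extract("Teams.csv", location)
second = store.get_or_create_extract("Teams.csv", location)  # reuses the record
```

Registering the same filename at the same location twice returns the existing record rather than creating a duplicate, which is the behavior the "Yes" rows in the table imply.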

Changing Extract Status

As extract files are used within a process run, their status will need to be modified:

extract.change_extract_status(status='loading')

Custom extract statuses can be entered, but ProcessTracker only knows what to do with files whose status is one of the default status types. As long as a file's status is eventually changed to one of the defaults, the process flow will continue.
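
The rule above can be pictured as a simple membership check on a file's most recent status. The following is a minimal sketch only. The set of default status names is an assumption (only 'loading' appears in this document), and ``is_default_status`` and ``next_action`` are hypothetical helpers, not part of ProcessTracker.

```python
# Assumed default status names, for illustration only.
# Only 'loading' is confirmed by this document.
DEFAULT_STATUSES = {
    "initializing", "ready", "loading", "loaded", "archived", "deleted", "error",
}


def is_default_status(status):
    """Return True when the status is one of the (assumed) default types."""
    return status.lower() in DEFAULT_STATUSES


def next_action(status_history):
    """Decide whether a file can move on, based on its most recent status."""
    current = status_history[-1]
    if is_default_status(current):
        return "continue"
    # Custom status: hold until a default status is eventually set.
    return "waiting"


history = ["loading", "quality checked"]  # 'quality checked' is a custom status
action_with_custom = next_action(history)

history.append("loaded")  # the file eventually reaches a default status
action_after_default = next_action(history)
```

In this sketch a custom status such as 'quality checked' does no harm on its own; the file simply stays in a holding state until its status is changed to one of the defaults.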