# Automation

## TLDR

- The automation feature lets you add further processing steps after the default update steps: downloading new zip files from the SEC, transforming them to Parquet format, and indexing them in the SQLite DB. This means that when a new zip file is detected at the SEC, not only are these three steps executed automatically, but also any additional steps you define yourself.
- There are two hook methods you can define and use to implement additional logic. Both of them are activated by defining them in the configuration file.
- The simpler of these two hook methods just receives the `Configuration` object and is called after all update steps have been executed, that is, the default update steps (downloading zip files, transforming to Parquet, indexing) and any additional user-defined update steps.
- The more complex one receives a `Configuration` object and has to return a list of instances derived from `AbstractProcess`. There are some basic implementations of `AbstractProcess` that can be used to filter, concat, and standardize the data.
- The library contains an example implementation of a hook function which filters the data and also directly applies the standardizer for balance sheet, income statement, and cash flow. See below on how it is implemented and how it can be directly used.

<span style="color: #FF8C00;">==========================================================</span>

**If you find this tool useful, a sponsorship would be greatly appreciated!**

**https://github.com/sponsors/HansjoergW**

How to get in touch

* Found a bug: https://github.com/HansjoergW/sec-fincancial-statement-data-set/issues
* Have a remark: https://github.com/HansjoergW/sec-fincancial-statement-data-set/discussions/categories/general
* Have an idea: https://github.com/HansjoergW/sec-fincancial-statement-data-set/discussions/categories/ideas
* Have a question: https://github.com/HansjoergW/sec-fincancial-statement-data-set/discussions/categories/q-a
* Have something to show: https://github.com/HansjoergW/sec-fincancial-statement-data-set/discussions/categories/show-and-tell

<span style="color: #FF8C00;">==========================================================</span>

## Defining a simple postupdatehook function

If you define a postupdatehook function in the configuration file, this function will be called after all update steps have been executed.

It is called regardless of whether the previous steps actually did anything. For instance, if no new zip file was detected for download, it is called anyway, but not more than once every 24 hours (the usual interval at which the framework checks for updates).

Since the hook method is called even if there were no updates, it is your responsibility to check whether anything actually changed. Otherwise, any time-consuming logic you implemented would run once every 24 hours.
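One possible way to do such a check is to remember what a data directory looked like on the previous run and compare it with the current listing. The following is only a minimal sketch using the standard library: the helper name `detect_new_entries`, the JSON snapshot file, and the idea of watching a directory listing are assumptions made for illustration and are not part of secfsdstools.

```python
import json
from pathlib import Path
from typing import List


def detect_new_entries(data_dir: Path, state_file: Path) -> List[str]:
    """Return the names in data_dir that were not present on the previous call.

    Hypothetical helper: the names seen so far are persisted as JSON in state_file.
    """
    # Sorted listing of everything currently inside the watched directory.
    current = sorted(entry.name for entry in data_dir.iterdir())

    # Load the listing stored by the previous call, if there was one.
    previous: List[str] = []
    if state_file.exists():
        previous = json.loads(state_file.read_text(encoding="utf-8"))

    # Persist the current listing so the next call compares against it.
    state_file.write_text(json.dumps(current), encoding="utf-8")

    seen = set(previous)
    return [name for name in current if name not in seen]
```

Inside a hook, you could call such a helper with the directory you want to watch and return early when the result is empty. Keep the snapshot file outside the watched directory, otherwise it would show up as a new entry itself.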

The postupdatehook function needs a `Configuration` parameter and does not return anything. It can have any name you like.

<pre>
# it is ok to import the Configuration class
from secfsdstools.a_config.configmodel import Configuration 
from secfsdstools.c_index.indexdataaccess import ParquetDBIndexingAccessor
from secfsdstools... import ...

def my_postupdatehook_function(configuration: Configuration):
    
    # you can use the configuration for instance to instantiate access to the SQLite db
    index_db = ParquetDBIndexingAccessor(db_dir=configuration.db_dir)
    ...
    
</pre>

To activate it, just define it in the DEFAULT section of the configuration file:

<pre>
[DEFAULT]
downloaddirectory = C:/data/sec/automated/dld
dbdirectory = C:/data/sec/automated/db
parquetdirectory = C:/data/sec/automated/parquet
useragentemail = your.email@goeshere.com
autoupdate = True
keepzipfiles = False
postupdatehook = mypackage.mymodule.my_postupdatehook_function
</pre>

## Defining a postupdateprocesses function

If you define a postupdateprocesses function, it has to return a list of instances of `AbstractProcess`. These instances are then executed after the default steps (download, transformation to Parquet, and indexing) have been executed.

Here too, every "process" is called once every 24 hours, so every process implementation has to check for itself whether something changed.

The postupdateprocesses function must take a `Configuration` parameter and return a list of `AbstractProcess` instances.

*Note: There are some basic implementations of the `AbstractProcess` class within the `secfsdstools.g_pipelines` package that provide implementations to filter, to concat bags, and to standardize joined bags. 
Please have a look at the following section, which shows an example of how these basic implementations can be used.*

<pre>
from typing import List

# it is ok to import the Configuration and AbstractProcess classes
from secfsdstools.a_config.configmodel import Configuration
from secfsdstools.c_automation.task_framework import AbstractProcess


def my_postupdateprocesses_function(configuration: Configuration) -> List[AbstractProcess]:
    # do your secfsdstools imports here
    from secfsdstools... import ...
    
    processes: List[AbstractProcess] = []
    ...
    
    return processes
    
</pre>

To activate it, add the appropriate configuration in the DEFAULT section of the configuration file:

<pre>
[DEFAULT]
downloaddirectory = C:/data/sec/automated/dld
dbdirectory = C:/data/sec/automated/db
parquetdirectory = C:/data/sec/automated/parquet
useragentemail = your.email@goeshere.com
autoupdate = True
keepzipfiles = False
postupdateprocesses = mypackage.mymodule.my_postupdateprocesses_function
</pre>
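Both hook entries are dotted paths: an importable module path followed by the function name. Before relying on the automatic update, it can help to verify that the configured values actually resolve to callables. The following sketch does this with the standard `configparser` and `importlib` modules; `resolve_dotted_path` and `check_hooks` are hypothetical helpers, and this is not necessarily how secfsdstools itself loads the hooks.

```python
import configparser
import importlib
from typing import Any, Callable, Dict

# The two configuration keys described in this document.
HOOK_KEYS = ("postupdatehook", "postupdateprocesses")


def resolve_dotted_path(dotted_path: str) -> Callable[..., Any]:
    """Import 'package.module.function' and return the function object."""
    module_name, _, function_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    if not callable(function):
        raise TypeError(f"{dotted_path!r} is not callable")
    return function


def check_hooks(config_text: str) -> Dict[str, Callable[..., Any]]:
    """Resolve every hook entry found in the DEFAULT section of config_text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    defaults = parser.defaults()
    return {
        key: resolve_dotted_path(defaults[key])
        for key in HOOK_KEYS
        if key in defaults
    }
```

Passing the text of your configuration file to `check_hooks` raises an `ImportError` or `AttributeError` if a module or function name is misspelled, which is easier to diagnose than a failure during the next automatic update.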