# A working example with a low memory footprint of the postupdateprocesses function (introduced in 2.2.0)

### What this pipeline creates

The package `secfsdstools.x_examples.automation` provides default implementations of the `postupdateprocesses` function.

The first of them is `secfsdstools.x_examples.automation.automation.define_extra_processes`.

It results in the creation of the following bags:

- a single joined bag per statement (BS, IS, CF, ...) that contains the data from all available quarters.
- standardized bags for BS, IS, CF that contain data from all the available quarters.
- a single joined bag containing all the data from all statements from all available quarters.

Moreover, all these bags are updated in an efficient way, as soon as new data becomes available at the SEC website.

Note: creating the final single joined bag in particular is quite memory intensive. Have a look at the notebook `08_02_automation_a_memory_optimized_example` for a version that needs less memory.


### How to use the example


You can use this function directly by adding it to your configuration file, together with the additional configuration parameters it uses:
<pre>
[DEFAULT]
...
postupdateprocesses=secfsdstools.x_examples.automation.automation.define_extra_processes

[Filter]
filtered_dir_by_stmt_joined = C:/data/sec/automated/_1_filtered_by_stmt_joined

[Concat]
concat_dir_by_stmt_joined = C:/data/sec/automated/_2_concat_by_stmt_joined

[Standardizer]
standardized_dir = C:/data/sec/automated/_3_standardized

; [SingleBag]
; singlebag_dir = C:/data/sec/automated/_4_single_bag
</pre>

The function adds three steps plus an optional fourth step. The optional step is only executed if the parameter `singlebag_dir` is defined.
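
To illustrate how the optional fourth step depends on the configuration, here is a minimal sketch that reads the sample configuration with Python's standard `configparser` and reports which steps would be enabled. The helper `planned_steps` and the step labels are hypothetical and only serve as an illustration; this is not how the framework itself evaluates the configuration. The `...` placeholder from the sample is left out, since `configparser` would reject it.

```python
import configparser

# Sample configuration mirroring the one shown above (without the "..." placeholder).
SAMPLE_CONFIG = """
[DEFAULT]
postupdateprocesses=secfsdstools.x_examples.automation.automation.define_extra_processes

[Filter]
filtered_dir_by_stmt_joined = C:/data/sec/automated/_1_filtered_by_stmt_joined

[Concat]
concat_dir_by_stmt_joined = C:/data/sec/automated/_2_concat_by_stmt_joined

[Standardizer]
standardized_dir = C:/data/sec/automated/_3_standardized

; [SingleBag]
; singlebag_dir = C:/data/sec/automated/_4_single_bag
"""


def planned_steps(config_text: str) -> list[str]:
    """Return illustrative labels for the steps the configuration enables.

    Hypothetical helper: the labels are not names used by secfsdstools.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)

    steps = ["filter", "concat", "standardize"]
    # fallback=None covers both a missing [SingleBag] section and a missing option.
    if config.get("SingleBag", "singlebag_dir", fallback=None):
        steps.append("single_bag")
    return steps


print(planned_steps(SAMPLE_CONFIG))
```

Since the `[SingleBag]` section is commented out with `;` in the sample, only the three mandatory steps are reported. Removing the `;` prefixes enables the fourth one.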

These steps add the following processing:

The first step creates a joined bag for every zip file. Each bag is filtered for 10-K and 10-Q reports only,
and the filters `ReportPeriodRawFilter`, `MainCoregRawFilter`, `USDOnlyRawFilter`, and `OfficialTagsOnlyRawFilter` are applied as well.
Furthermore, the data is split by statement (`stmt`).
The filtered joined bags are stored under the path that is defined as `filtered_dir_by_stmt_joined` in the configuration file.
The resulting directory structure will look like this:


    <filtered_dir_by_stmt_joined>
        quarter
            2009q2.zip
                BS
                CF
                CI
                CP
                EQ
                IS
            ...
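
A quick way to check the result of the first step is to walk this directory structure. The following is a minimal sketch using only `pathlib`; it assumes that the entries such as `2009q2.zip` are directories (as the layout above suggests), and the helper name `list_filtered_bags` is hypothetical.

```python
from pathlib import Path


def list_filtered_bags(filtered_dir: str) -> dict[str, list[str]]:
    """Map each quarter folder (e.g. "2009q2.zip") to its statement subfolders."""
    quarter_root = Path(filtered_dir) / "quarter"
    result: dict[str, list[str]] = {}
    # Only directories are considered; stray files are ignored.
    for quarter_dir in sorted(p for p in quarter_root.iterdir() if p.is_dir()):
        result[quarter_dir.name] = sorted(
            p.name for p in quarter_dir.iterdir() if p.is_dir()
        )
    return result
```

Calling `list_filtered_bags` with the value of `filtered_dir_by_stmt_joined` returns one entry per quarter, each listing the statement folders (BS, CF, CI, ...) found for it.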

The second step creates a single joined bag for every statement (balance sheet, income statement,
cash flow, cover page, ...) that contains the data from all zip files, that is, from all the
available quarters. These bags are stored under the path defined as `concat_dir_by_stmt_joined`.
The resulting directory structure will look like this:

    <concat_dir_by_stmt_joined>
        BS
        CF
        CI
        CP
        EQ
        IS    


The third step standardizes the data for balance sheet, income statement, and cash flow and stores
the standardized bags under the path that is defined as `standardized_dir`.
The resulting structure will look like this:

    <standardized_dir>
        BS
        CF
        IS    
    

The fourth step is optional and is only executed if the configuration file contains an entry
for `singlebag_dir`. If it does, the step creates a single joined bag by concatenating all the bags
created in the second step. The result is a single bag that contains all the filtered data from
all the available zip files, that is, from all available quarters.
In framework versions prior to version 2.1, this step needed quite a lot of memory. This was improved
in version 2.1, which does the concatenation directly on the filesystem without loading the data into
memory and hence has a very low memory footprint.
The resulting directory structure will look like this:

    <singlebag_dir>
        all


Hint: the data can be loaded directly with `JoinedDataBag.load` or, for the standardized bags, with `StandardizedBag.load`.
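
As a rough sketch of what loading could look like: the helper names `bag_path` and `load_bags` are hypothetical, and the import paths `secfsdstools.d_container.databagmodel` and `secfsdstools.f_standardize.standardizing` are assumptions that should be checked against the installed version of the library. The imports are done lazily inside `load_bags`, so the path helper also works without the library installed.

```python
from pathlib import Path


def bag_path(base_dir: str, stmt: str) -> str:
    """Build the folder of a per-statement bag, e.g. <concat_dir_by_stmt_joined>/BS."""
    return str(Path(base_dir) / stmt)


def load_bags(concat_dir: str, standardized_dir: str, stmt: str = "BS"):
    """Load the joined and the standardized bag for one statement.

    Assumption: the module paths below match the installed secfsdstools version.
    """
    from secfsdstools.d_container.databagmodel import JoinedDataBag
    from secfsdstools.f_standardize.standardizing import StandardizedBag

    joined_bag = JoinedDataBag.load(bag_path(concat_dir, stmt))
    # Standardized bags exist only for BS, IS, and CF (see the third step).
    standardized_bag = StandardizedBag.load(bag_path(standardized_dir, stmt))
    return joined_bag, standardized_bag
```

The single bag from the optional fourth step would be loaded the same way with `JoinedDataBag.load`, using the `all` folder below `singlebag_dir`.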


### How the example is implemented

Let us have a look at the implementation of the function `define_extra_processes`:
