In [None]:
import pandas as pd
# ensure that all columns are shown and that column content is not cut
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width',1000)
pd.set_option('display.max_rows', 500) # ensure that all rows are shown

# Customize Standardizer
This notebook gives some ideas on how you can customize the standardizer classes.

All three standardizer classes, `BalanceSheetStandardizer`, `IncomeStatementStandardizer`, and `CashFlowStandardizer`, are derived from the same base class `Standardizer` and share the same constructor parameters. In fact, the whole behavior of a standardizer is defined by these parameters. The three classes are just containers that define values for the constructor parameters; they neither add methods nor override existing ones. Each of them is simply a configuration of the base class.

Since every constructor parameter can be overwritten when instantiating one of the three standardizer classes, you can customize the standardizer in three ways:

1. Simply adapt the constructor parameters when you instantiate `BalanceSheetStandardizer`, `IncomeStatementStandardizer`, or `CashFlowStandardizer`. This is an easy way, for instance, to adapt the list of tags/columns that should appear in the final result.
2. Create a subclass of `BalanceSheetStandardizer`, `IncomeStatementStandardizer`, or `CashFlowStandardizer` and redefine certain, more complex rules. For instance, you might want to define additional `Validation` rules, or change the `Post` rules so that NaN values are not set to zero but stay undefined.
3. Create a subclass directly from `Standardizer` and define everything yourself.
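
The three ways listed above can be illustrated with a small, self-contained sketch. The `Toy...` classes below are hypothetical stand-ins written only for this illustration; they are not the library's classes and their parameters are deliberately simplified. They only show the pattern of a configurable base class whose subclasses supply default parameter values.

```python
class ToyStandardizer:
    """Hypothetical stand-in for a configurable base class (not the library's Standardizer)."""

    def __init__(self, final_tags, post_rules, additional_final_tags=None):
        # the whole behavior is determined by the constructor parameters
        self.final_tags = list(final_tags) + list(additional_final_tags or [])
        self.post_rules = list(post_rules)


class ToyBalanceSheetStandardizer(ToyStandardizer):
    """Pure configuration: supplies default values, adds no methods of its own."""

    default_final_tags = ["Assets", "Liabilities", "Equity"]
    default_post_rules = ["set_missing_to_zero"]

    def __init__(self, final_tags=None, post_rules=None, additional_final_tags=None):
        super().__init__(
            final_tags=self.default_final_tags if final_tags is None else final_tags,
            post_rules=self.default_post_rules if post_rules is None else post_rules,
            additional_final_tags=additional_final_tags,
        )


# Way 1: adapt a constructor parameter at instantiation time.
way1 = ToyBalanceSheetStandardizer(additional_final_tags=["LongTermDebt"])


# Way 2: subclass the configured class and redefine a more complex default.
class ToyNoZeroBalanceSheetStandardizer(ToyBalanceSheetStandardizer):
    default_post_rules = []  # keep missing values undefined


way2 = ToyNoZeroBalanceSheetStandardizer()

# Way 3: configure the base class directly and define everything yourself.
way3 = ToyStandardizer(final_tags=["Revenues"], post_rules=[])
```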

## Basic Constructor Parameters

The following basic constructor parameters are available to change some details of the behavior.

### `filter_for_main_statement`

A quarterly or annual report usually contains many different tables with data. Besides the tables with the primary financial information (balance sheet, income statement, or cash flow statement), there are tables that often repeat part of the information from the primary financial statements. Usually, however, you are only interested in the tables that contain the primary financial information.

If this flag is set to `True` (the default), only the table that contains the most data points generally belonging to the appropriate statement is returned in the result set.
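
To make the selection idea more concrete, here is a rough sketch in plain pandas. It is not the library's implementation. The simplified column layout (`adsh`, `report`, `tag`), the helper `keep_main_table`, and the list of tags considered typical for a balance sheet are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical long-format data: one row per data point;
# 'report' numbers the tables within a filing ('adsh').
df = pd.DataFrame({
    "adsh":   ["A", "A", "A", "A", "A", "B", "B", "B"],
    "report": [2, 2, 2, 5, 5, 3, 4, 4],
    "tag":    ["Assets", "Liabilities", "Equity", "Assets", "Goodwill",
               "Assets", "Assets", "Liabilities"],
})

# Tags assumed to generally belong to a balance sheet (illustrative only).
statement_tags = {"Assets", "Liabilities", "Equity"}


def keep_main_table(data: pd.DataFrame, tags: set) -> pd.DataFrame:
    """Keep, per filing, only the table with the most data points from `tags`."""
    relevant = data[data["tag"].isin(tags)]
    counts = relevant.groupby(["adsh", "report"]).size().reset_index(name="n")
    # idxmax returns, per filing, the index label of the first maximum
    best = counts.loc[counts.groupby("adsh")["n"].idxmax(), ["adsh", "report"]]
    return data.merge(best, on=["adsh", "report"], how="inner")


main_df = keep_main_table(df, statement_tags)
```

In this toy data, filing `A` keeps table 2 and filing `B` keeps table 4, because those tables hold the most data points with tags from the illustrative list.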

### `additional_final_sub_fields`

When you call the `process` method of a standardizer, you receive a resulting data frame that contains only the `adsh` column as an identifier. In contrast, when you use the `present` method, the resulting data frame is enriched with additional information from the sub_df. By default, these are the columns `cik`, `name` (the last registered name of the company), `form` (either 10-K or 10-Q), `fye` (the financial year end as MMDD), `fy` (the financial year to which the report belongs), `fp` (the financial period Q1, Q2, Q3, or FY), `filed` (the date when the report was filed with the SEC, as an integer value in the format YYYYMMDD), and `date` (same as `filed`, but as a real date type).

However, there are many more columns in the sub_df available (like contact information). So if you would like to have the zip code of the town where the company is based, you can define this with the `additional_final_sub_fields` parameter:

    bs_standardizer = BalanceSheetStandardizer(additional_final_sub_fields=['zipba'])

    result_df = bs_standardizer.present(joined_bag)
    
    # or via the get_standardize_bag
    bs_standardizer.get_standardize_bag().result_df
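
Conceptually, this enrichment resembles a join on `adsh` between the standardized result and selected columns of the sub_df. The following sketch uses plain pandas and invented sample data. The variable names, the shortened default field list, and the join itself are assumptions for illustration, not the library's actual code.

```python
import pandas as pd

# Hypothetical standardized result, identified only by 'adsh'.
result = pd.DataFrame({"adsh": ["A", "B"], "Assets": [100.0, 250.0]})

# Hypothetical excerpt of sub_df with one row per report.
sub_df = pd.DataFrame({
    "adsh": ["A", "B"],
    "cik": [1, 2],
    "name": ["ALPHA INC", "BETA CORP"],
    "zipba": ["10001", "94105"],
    "filed": [20230215, 20230301],
})

default_sub_fields = ["cik", "name", "filed"]  # shortened for the example
additional_final_sub_fields = ["zipba"]

# Select the identifying column plus all requested sub fields, then join.
sub_cols = ["adsh"] + default_sub_fields + additional_final_sub_fields
enriched = sub_df[sub_cols].merge(result, on="adsh", how="inner")
```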


### `additional_final_tags`

Every standardizer defines an internal list `final_tags` which defines the tags (i.e., the columns) that are contained in the returned data frame. These columns are only a subset of the fields that are actually available, and some of them are aggregated from several fields. As the name standardizer suggests, the goal is to provide only information that is available in most of the reports.

There may be situations in which you would like to have additional tags returned as well. For instance, instead of just having `LiabilitiesNoncurrent`, you might also be interested in `LongTermDebt`. This is possible by defining the `additional_final_tags` parameter:


    bs_standardizer = BalanceSheetStandardizer(additional_final_tags=['LongTermDebt'])

    result_df = bs_standardizer.present(joined_bag)
    
    # or via the get_standardize_bag
    bs_standardizer.get_standardize_bag().result_df

### `final_tags`

Instead of just adding final tags with the `additional_final_tags` parameter, you can redefine the whole list directly with the `final_tags` parameter. This is useful, for instance, if you want to remove certain tags from the final result, or if you want them to appear in a certain order.

    # The default list is
    #     ['Assets', 'AssetsCurrent', 'Cash', 'AssetsNoncurrent',
    #      'Liabilities', 'LiabilitiesCurrent', 'LiabilitiesNoncurrent',
    #      'Equity',
    #      'HolderEquity',
    #      'RetainedEarnings',
    #      'AdditionalPaidInCapital',
    #      'TreasuryStockValue',
    #      'TemporaryEquity',
    #      'RedeemableEquity',
    #      'LiabilitiesAndEquity'] 
    # However, we are only interested in a subset of it and in a different order, so we adapt final_tags
    bs_standardizer = BalanceSheetStandardizer(final_tags=['LiabilitiesCurrent', 'LiabilitiesNoncurrent', 'Liabilities', 'AssetsCurrent', 'AssetsNoncurrent', 'Assets'])

    result_df = bs_standardizer.present(joined_bag)
    
    # or via the get_standardize_bag
    bs_standardizer.get_standardize_bag().result_df
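
The effect of `final_tags` and `additional_final_tags` on the resulting columns can be pictured as a column selection. The sketch below uses plain pandas; the helper `select_final_columns` is hypothetical, and appending the additional tags after the final tags is an assumption about the ordering, not confirmed library behavior.

```python
import pandas as pd

# Hypothetical wide result before the final column selection.
wide = pd.DataFrame({
    "adsh": ["A"],
    "Assets": [100.0],
    "AssetsCurrent": [60.0],
    "AssetsNoncurrent": [40.0],
    "Liabilities": [70.0],
    "LongTermDebt": [25.0],
})


def select_final_columns(df, final_tags, additional_final_tags=None):
    # Assumption: additional tags are appended after the final tags.
    wanted = list(final_tags) + list(additional_final_tags or [])
    # reindex keeps the requested order; tags that never occurred become NaN columns
    return df.reindex(columns=["adsh"] + wanted)


# Redefine the whole list: a subset in a custom order.
reordered = select_final_columns(
    wide, ["Liabilities", "AssetsCurrent", "AssetsNoncurrent", "Assets"]
)

# Keep a default-like list and add one extra tag.
extended = select_final_columns(wide, ["Assets", "Liabilities"], ["LongTermDebt"])
```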

## Subclassing

Subclassing makes sense when you want to change the more complex parameters, for instance the definition of rules. Of course, you could also do that directly by changing the parameter, but it might make sense to put such definitions into a dedicated class.

The following example shows how we could change the definition of the `Post` rules so that values are not set to zero in the `BalanceSheetStandardizer`.

    # assumes BalanceSheetStandardizer, RuleGroup, and PostCopyToFirstSummand are already imported
    class NoSetToZeroBalanceSheetStandardizer(BalanceSheetStandardizer):

        # redefined post_rule_tree without any PostSetToZero rules
        post_rule_tree = RuleGroup(prefix="BS_POST",
                                   rules=[
                                       # if only Assets is set, set AssetsCurrent to the value
                                       # of Assets and AssetsNoncurrent to 0
                                       PostCopyToFirstSummand(sum_tag='Assets',
                                                              first_summand='AssetsCurrent',
                                                              other_summands=[
                                                                  'AssetsNoncurrent']),
                                       # if only Liabilities is set, set LiabilitiesCurrent to the
                                       # value of Liabilities and LiabilitiesNoncurrent to 0
                                       PostCopyToFirstSummand(sum_tag='Liabilities',
                                                              first_summand='LiabilitiesCurrent',
                                                              other_summands=[
                                                                  'LiabilitiesNoncurrent']),
                                   ])

        def __init__(self):
            # pass the class-level rule tree on to the parent constructor
            super().__init__(
                post_rule_tree=self.post_rule_tree
            )
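
To see what removing the set-to-zero rules means for the data, the following sketch imitates both kinds of `Post` rules in plain pandas. The functions `copy_to_first_summand` and `set_to_zero` are hypothetical re-implementations based only on the comments in the example above. The exact conditions used by the library's rules may differ, for example in how partially filled summands are treated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Assets":           [100.0, 80.0, np.nan],
    "AssetsCurrent":    [np.nan, 50.0, np.nan],
    "AssetsNoncurrent": [np.nan, 30.0, np.nan],
    "Cash":             [np.nan, 10.0, np.nan],
})


def copy_to_first_summand(data, sum_tag, first_summand, other_summands):
    """If only the sum is set, copy it to the first summand and set the others to 0."""
    out = data.copy()
    summands = [first_summand] + list(other_summands)
    # rows where the sum is present but none of the summands is
    mask = out[sum_tag].notna() & out[summands].isna().all(axis=1)
    out.loc[mask, first_summand] = out.loc[mask, sum_tag]
    out.loc[mask, other_summands] = 0.0
    return out


def set_to_zero(data, tags):
    """Replace missing values of the given tags with 0."""
    out = data.copy()
    out[tags] = out[tags].fillna(0.0)
    return out


# Only the copy rule: 'Cash' stays NaN where it was never reported.
without_zeroing = copy_to_first_summand(df, "Assets", "AssetsCurrent", ["AssetsNoncurrent"])

# With an additional set-to-zero step, missing 'Cash' values become 0.
with_zeroing = set_to_zero(without_zeroing, ["Cash"])
```

Keeping NaN values makes it possible to distinguish "not reported" from "reported as zero", which is the motivation for a subclass without set-to-zero rules.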
   
   
   


# How to find Tags