<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

In [6]:
#| include: false
#skip
! [ -e /content ] && pip install -Uqq gingado nbdev # install or upgrade gingado on colab

In [8]:
#| include: false
from nbdev.showdoc import show_doc

## Support for model documentation

:::{.callout-important}

Up until v0.0.1-11, `utils` included a function called `get_username`. It depended on `pwd`, which is only available on Unix-like systems, so it prevented Windows users from importing `gingado.utils` (see this [issue](https://github.com/dkgaraujo/gingado/issues/3)). Since the function was not essential, it was removed until a suitable alternative that works on all major systems can be found.

:::
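For reference, Python's standard library provides `getpass.getuser()`, which reads the `LOGNAME`, `USER`, `LNAME` and `USERNAME` environment variables before falling back to `pwd`. The sketch below wraps it in a hypothetical `get_username_portable` helper (not part of `gingado`) that returns `None` instead of raising. Whether this is a suitable replacement for `get_username` is an open question; `gingado` has not adopted it.

```python
import getpass
from typing import Optional


def get_username_portable() -> Optional[str]:
    """Hypothetical helper, not part of gingado: best-effort login name lookup."""
    try:
        # Environment variables are checked first; the Unix-only `pwd`
        # module is only used when none of them is set.
        return getpass.getuser()
    except (ImportError, KeyError, OSError):
        # No variable set and no usable `pwd` fallback (e.g. on Windows),
        # or the current user id has no passwd entry.
        return None


print(get_username_portable())
```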

In [1]:
#|output: asis
#| echo: false
show_doc(get_datetime)

---

[source](https://github.com/dkgaraujo/gingado/tree/main/blob/main/gingado/utils.py#L13){target="_blank" style="float:right; font-size:smaller"}

### get_datetime

>      get_datetime ()

Returns the current date and time as a string.

In [12]:
d = get_datetime()
assert isinstance(d, str)
assert len(d) > 0
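Since it sits under model documentation support, `get_datetime` is presumably meant for time-stamping documentation. A minimal standard-library sketch of such a function is shown below; `get_datetime_sketch` is a hypothetical name, and its format string is an assumption that may differ from what `gingado` actually returns.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, not gingado's


def get_datetime_sketch() -> str:
    # Hypothetical stand-in for get_datetime: current local time as a string
    return datetime.now().strftime(TIMESTAMP_FORMAT)


stamp = get_datetime_sketch()
print(stamp)

# Using a fixed format means the string can be parsed back later
parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT)
```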

## Support for time series

Objects of the class [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) are similar to `scikit-learn`'s transformers.

In [2]:
#|output: asis
#| echo: false
show_doc(Lag)


---

[source](https://github.com/dkgaraujo/gingado/tree/main/blob/main/gingado/utils.py#L24){target="_blank" style="float:right; font-size:smaller"}

### Lag

>      Lag (lags=1, jump=0, keep_contemporaneous_X=False)

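Transformer that returns lagged values of the input features. The first `lags + jump` observations are dropped because they lack a complete set of lags; set `keep_contemporaneous_X=True` to also keep the current (unlagged) values of the features.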

The code below demonstrates how [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) works in practice. Note in particular that, because [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) is a transformer, it can be used as part of a `scikit-learn` `Pipeline`.

In [15]:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from gingado.utils import Lag

randomX = np.random.rand(15, 2)
randomY = np.random.rand(15)

lags = 3
jump = 2

# Standardise the features, then replace them with their lagged values
X_lagged = Pipeline([
    ('scaler', StandardScaler()),
    ('lagger', Lag(lags=lags, jump=jump, keep_contemporaneous_X=False))
]).fit_transform(randomX, randomY)

Below we confirm that the lagger drops the correct number of rows, namely the first `lags + jump` observations, which lack a complete set of lagged values:

In [16]:
assert randomX.shape[0] - lags - jump == X_lagged.shape[0]

And because [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) is a transformer, its parameters (`lags` and `jump`) can be calibrated using hyperparameter tuning to achieve the best performance for a model.
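To make the row arithmetic and the tuning argument concrete, here is a minimal, self-contained sketch of a lag transformer. `LagSketch` is a hypothetical re-implementation for illustration, not `gingado`'s code; it assumes that `jump` skips that many of the most recent periods before the first lag, so output row `t` receives `X[t - lag - jump]`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class LagSketch(BaseEstimator, TransformerMixin):
    """Hypothetical lag transformer for illustration only; not gingado's Lag."""

    def __init__(self, lags=1, jump=0, keep_contemporaneous_X=False):
        self.lags = lags
        self.jump = jump
        self.keep_contemporaneous_X = keep_contemporaneous_X

    def fit(self, X, y=None):
        # Nothing to learn: lagging is a fixed reshaping of the data
        return self

    def transform(self, X):
        X = np.asarray(X)
        n = X.shape[0]
        start = self.lags + self.jump  # first row with a complete set of lags
        blocks = [X[start:]] if self.keep_contemporaneous_X else []
        for lag in range(1, self.lags + 1):
            shift = lag + self.jump
            # Output row t receives X[t - lag - jump]
            blocks.append(X[start - shift:n - shift])
        return np.hstack(blocks)


rng = np.random.default_rng(0)
X = rng.random((15, 2))
pipe = Pipeline([("scaler", StandardScaler()), ("lagger", LagSketch(lags=3, jump=2))])
X_lagged = pipe.fit_transform(X)
print(X_lagged.shape)  # (10, 6): 15 - 3 - 2 rows, 3 lags x 2 features

# Pipeline exposes step parameters as "<step>__<param>"; these names are
# what hyperparameter search tools such as GridSearchCV vary.
print(pipe.get_params()["lagger__lags"], pipe.get_params()["lagger__jump"])
```

Note that a `scikit-learn` transformer only returns the transformed `X`; under this sketch's assumptions, aligning the target with the remaining rows (e.g. `y[lags + jump:]`) is left to the caller.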

## Support for data augmentation with SDMX

:::{.callout-note}

Please note that working with SDMX may take several minutes, depending on the amount of data being downloaded.

:::

In [3]:
#|output: asis
#| echo: false
show_doc(list_SDMX_sources)

---

[source](https://github.com/dkgaraujo/gingado/tree/main/blob/main/gingado/utils.py#L72){target="_blank" style="float:right; font-size:smaller"}

### list_SDMX_sources

>      list_SDMX_sources ()

Returns the list of codes representing the SDMX sources available for data download

In [19]:
sources = list_SDMX_sources()
print(sources)

assert len(sources) > 0
# all elements are of type 'str'
assert sum([isinstance(src, str) for src in sources]) == len(sources)

['ABS', 'ABS_XML', 'BBK', 'BIS', 'CD2030', 'ECB', 'ESTAT', 'ILO', 'IMF', 'INEGI', 'INSEE', 'ISTAT', 'LSD', 'NB', 'NBB', 'OECD', 'SGR', 'SPC', 'STAT_EE', 'UNICEF', 'UNSD', 'WB', 'WB_WDI']


In [4]:
#|output: asis
#| echo: false
show_doc(list_all_dataflows)

---

[source](https://github.com/dkgaraujo/gingado/tree/main/blob/main/gingado/utils.py#L81){target="_blank" style="float:right; font-size:smaller"}

### list_all_dataflows

>      list_all_dataflows (codes_only=False, return_pandas=True)

Returns all available dataflows for all sources: by default as a pandas Series, or as a dictionary if `return_pandas=False`. When passing the output as a parameter to an `AugmentSDMX` object or to the `load_SDMX_data` function, set `codes_only=True`.

In [22]:
dflows = list_all_dataflows(return_pandas=False)

assert isinstance(dflows, dict)
all_sources = list_SDMX_sources()
assert len([s for s in dflows.keys() if s in all_sources]) == len(dflows.keys())


By default, [`list_all_dataflows`](https://dkgaraujo.github.io/gingado/utils.html#list_all_dataflows) returns a pandas Series, which makes data discovery easier:

In [25]:
import pandas as pd

dflows = list_all_dataflows(return_pandas=True)
assert isinstance(dflows, pd.Series)

dflows


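With `codes_only=True`, only the dataflow codes are returned. In both cases the Series is indexed by source, which makes it easy to look up the dataflows of a given source: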

In [52]:
list_all_dataflows(codes_only=True, return_pandas=True)


ABS_XML  0                 ABORIGINAL_POP_PROJ
         1          ABORIGINAL_POP_PROJ_REMOTE
         2    ABS_ABORIGINAL_POPPROJ_INDREGION
         3                   ABS_ACLD_LFSTATUS
         4                     ABS_ACLD_TENURE
                            ...               
UNSD     5                     DF_UNData_UNFCC
WB       0               DF_WITS_Tariff_TRAINS
         1      DF_WITS_TradeStats_Development
         2           DF_WITS_TradeStats_Tariff
         3            DF_WITS_TradeStats_Trade
Name: dataflow, Length: 9114, dtype: object

In [None]:
dflows['BIS']

WS_CBPOL_D                            Policy rates daily
WS_CBPOL_M                          Policy rates monthly
WS_CBS_PUB                      BIS consolidated banking
WS_CREDIT_GAP                     BIS credit-to-GDP gaps
WS_DEBT_SEC2_PUB                     BIS debt securities
WS_DER_OTC_TOV                  OTC derivatives turnover
WS_DSR                            BIS debt service ratio
WS_EER_D              BIS effective exchange rates daily
WS_EER_M            BIS effective exchange rates monthly
WS_GLI                       Global liquidity indicators
WS_LBS_D_PUB                      BIS locational banking
WS_LONG_CPI                     BIS long consumer prices
WS_OTC_DERIV2                OTC derivatives outstanding
WS_SPP              BIS property prices: selected series
WS_TC                    BIS long series on total credit
WS_XRU                   US dollar exchange rates, m,q,a
WS_XRU_D                 US dollar exchange rates, daily
WS_XTD_DERIV                 Ex
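Selecting `dflows['BIS']` works because the Series has a two-level index: source first, then dataflow code. The offline sketch below builds a small Series with the same shape by calling `pd.concat` on a dict, using a handful of codes and names that appear on this page. The nested-dict layout (`source -> {code: name}`) is an assumption for illustration; this is not how `gingado` builds the object internally, and nothing is downloaded.

```python
import pandas as pd

# Toy subset of dataflows shown on this page; layout is assumed
raw = {
    "BIS": {"WS_CBPOL_D": "Policy rates daily",
            "WS_CBPOL_M": "Policy rates monthly",
            "WS_CREDIT_GAP": "BIS credit-to-GDP gaps"},
    "ECB": {"CISS": "Composite Indicator of Systemic Stress",
            "RIR": "Retail Interest Rates"},
    "IMF": {"INR": "Interest rates"},
}

# pd.concat on a dict of Series uses the dict keys as the outer index level,
# giving a Series indexed by (source, dataflow code)
dflows_toy = pd.concat({src: pd.Series(flows) for src, flows in raw.items()})

print(dflows_toy["BIS"])                   # all dataflows from one source
counts = dflows_toy.groupby(level=0).size()  # number of dataflows per source
print(counts)
```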

Alternatively, users can search dataflows by their human-readable names instead of their codes. For example, this is one way to check whether any dataflow has information on interest rates:

In [51]:
dflows[dflows.str.contains('Interest rates', case=False)]

ECB    RIR                                         Retail Interest Rates
ESTAT  cpc_ecexint     Candidate countries and potential candidates: ...
       ei_mfir_m                           Interest rates - monthly data
       enpe_irt_st                           Money market interest rates
       enpr_ecexint     ENP countries: exchange rates and interest rates
       irt_st_a                Money market interest rates - annual data
       irt_st_m               Money market interest rates - monthly data
       irt_st_q             Money market interest rates - quarterly data
       tec00034        Short-term interest rates: Day-to-day money rates
       tec00035        Short-term interest rates: three-month interba...
       teimf100                   Day-to-day money market interest rates
IMF    6SR             M&B: Interest Rates and Share Prices (6SR) for...
       INR                                                Interest rates
       INR_NSTD                              Intere
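The search above relies on pandas' `Series.str.contains`. The following offline sketch, on a tiny hand-built Series whose entries are taken from outputs on this page, shows two modes that are handy for this kind of discovery: a literal, case-insensitive substring match, and a regular-expression match on several terms at once.

```python
import pandas as pd

# Toy (source, dataflow) -> name Series, built from entries shown on this page
dflows_toy = pd.Series(
    ["Policy rates daily", "Retail Interest Rates", "Interest rates"],
    index=pd.MultiIndex.from_tuples(
        [("BIS", "WS_CBPOL_D"), ("ECB", "RIR"), ("IMF", "INR")],
        names=["source", "dataflow"],
    ),
)

# regex=False treats the pattern as a literal substring; case=False ignores case
hits = dflows_toy[dflows_toy.str.contains("interest rate", case=False, regex=False)]
print(hits)

# Which sources have at least one match?
print(hits.index.get_level_values("source").unique().tolist())

# With the default regex=True, "|" matches any of several terms
either = dflows_toy[dflows_toy.str.contains("interest|policy", case=False)]
print(either)
```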

The function [`load_SDMX_data`](https://dkgaraujo.github.io/gingado/utils.html#load_sdmx_data) is a convenience function that downloads, from the SDMX sources (and any specific dataflows) passed as arguments, the data that match the keys and parameters set by the user.

In [5]:
#|output: asis
#| echo: false
show_doc(load_SDMX_data)

---

[source](https://github.com/dkgaraujo/gingado/tree/main/blob/main/gingado/utils.py#L102){target="_blank" style="float:right; font-size:smaller"}

### load_SDMX_data

>      load_SDMX_data (sources, keys, params, verbose=True)

Loads datasets from SDMX.

In [41]:
df = load_SDMX_data(sources={'ECB': 'CISS', 'BIS': 'WS_CBPOL_D'}, keys={'FREQ': 'D'}, params={'startPeriod': 2003})

assert type(df) == pd.DataFrame
assert df.shape[0] > 0
assert df.shape[1] > 0

Querying data from ECB's dataflow 'CISS' - Composite Indicator of Systemic Stress...


Querying data from BIS's dataflow 'WS_CBPOL_D' - Policy rates daily...


### To be deprecated

The function [`load_EURFX_data`](https://dkgaraujo.github.io/gingado/utils.html#load_eurfx_data) is a helper function to download a test dataset containing real-life data. This dataset was chosen on the assumption that most users have at least an intuitive understanding of what a foreign exchange rate is: the price of one currency in terms of another. The choice of this example dataset does not imply that it is more or less relevant than other data; it is used only for pedagogical purposes.

:::{.callout-note}

This function will be deprecated in `gingado` version 0.0.2.

:::

In [6]:
#|output: asis
#| echo: false
show_doc(load_EURFX_data)

---

[source](https://github.com/dkgaraujo/gingado/tree/main/blob/main/gingado/utils.py#L134){target="_blank" style="float:right; font-size:smaller"}

### load_EURFX_data

>      load_EURFX_data (startYear=2003, lags=1, jump=0,
>                       keep_contemporaneous_X=True)

Loads a real-life dataset for testing use cases.

In [44]:
EUR_FX = load_EURFX_data()

assert type(EUR_FX) == pd.DataFrame
assert EUR_FX.shape[0] > 0
assert EUR_FX.shape[1] > 0

EUR_FX


| TIME_PERIOD | AUD | BRL | CAD | CHF | GBP | JPY | SGD | USD | AUD_lag_1 | BRL_lag_1 | CAD_lag_1 | CHF_lag_1 | GBP_lag_1 | JPY_lag_1 | SGD_lag_1 | USD_lag_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003-01-03 | 1.8440 | 3.6112 | 1.6264 | 1.4555 | 0.65000 | 124.56 | 1.8132 | 1.0392 | 1.8554 | 3.6770 | 1.6422 | 1.4528 | 0.65200 | 124.40 | 1.8188 | 1.0446 |
| 2003-01-06 | 1.8281 | 3.5145 | 1.6383 | 1.4563 | 0.64950 | 124.40 | 1.8210 | 1.0488 | 1.8440 | 3.6112 | 1.6264 | 1.4555 | 0.65000 | 124.56 | 1.8132 | 1.0392 |
| 2003-01-07 | 1.8160 | 3.5139 | 1.6257 | 1.4565 | 0.64960 | 124.82 | 1.8155 | 1.0425 | 1.8281 | 3.5145 | 1.6383 | 1.4563 | 0.64950 | 124.40 | 1.8210 | 1.0488 |
| 2003-01-08 | 1.8132 | 3.4405 | 1.6231 | 1.4586 | 0.64950 | 124.90 | 1.8102 | 1.0377 | 1.8160 | 3.5139 | 1.6257 | 1.4565 | 0.64960 | 124.82 | 1.8155 | 1.0425 |
| 2003-01-09 | 1.8172 | 3.4915 | 1.6371 | 1.4597 | 0.65300 | 125.16 | 1.8244 | 1.0507 | 1.8132 | 3.4405 | 1.6231 | 1.4586 | 0.64950 | 124.90 | 1.8102 | 1.0377 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-05-25 | 1.5126 | 5.1736 | 1.3720 | 1.0269 | 0.85295 | 135.34 | 1.4676 | 1.0656 | 1.5152 | 5.1793 | 1.3714 | 1.0334 | 0.85750 | 136.49 | 1.4722 | 1.0720 |
| 2022-05-26 | 1.5110 | 5.1741 | 1.3715 | 1.0283 | 0.85073 | 135.95 | 1.4709 | 1.0697 | 1.5126 | 5.1736 | 1.3720 | 1.0269 | 0.85295 | 135.34 | 1.4676 | 1.0656 |
| 2022-05-27 | 1.4995 | 5.0959 | 1.3661 | 1.0258 | 0.84875 | 136.05 | 1.4679 | 1.0722 | 1.5110 | 5.1741 | 1.3715 | 1.0283 | 0.85073 | 135.95 | 1.4709 | 1.0697 |
| 2022-05-30 | 1.4982 | 5.0629 | 1.3647 | 1.0327 | 0.85150 | 137.25 | 1.4719 | 1.0764 | 1.4995 | 5.0959 | 1.3661 | 1.0258 | 0.84875 | 136.05 | 1.4679 | 1.0722 |
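In the output above, each `<currency>_lag_1` column holds that currency's value from the previous observation (e.g. `AUD_lag_1` on 2003-01-07 equals `AUD` on 2003-01-06). A minimal pandas sketch that reproduces this layout with `DataFrame.shift` is shown below, using the first four AUD and USD observations from the table. The helper `add_lags` is hypothetical and ignores `jump`; `gingado` may build these columns differently.

```python
import pandas as pd


def add_lags(df: pd.DataFrame, lags: int = 1) -> pd.DataFrame:
    # Hypothetical helper: append "<col>_lag_<k>" columns for k = 1..lags
    lagged = [df.shift(k).add_suffix(f"_lag_{k}") for k in range(1, lags + 1)]
    # The first `lags` rows have missing lagged values and are dropped
    return pd.concat([df, *lagged], axis=1).dropna()


# First four observations of two columns from the table above
fx = pd.DataFrame(
    {"AUD": [1.8440, 1.8281, 1.8160, 1.8132],
     "USD": [1.0392, 1.0488, 1.0425, 1.0377]},
    index=pd.to_datetime(["2003-01-03", "2003-01-06", "2003-01-07", "2003-01-08"]),
)
fx.index.name = "TIME_PERIOD"

out = add_lags(fx, lags=1)
print(out)
```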
