<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

In [None]:
#| include: false
#skip
! [ -e /content ] && pip install -Uqq gingado nbdev # install or upgrade gingado on colab

In [None]:
#| include: false
#| echo: false
# Code below included to ensure compatibility with scikit-learn v1.1.x
from sklearn import set_config
set_config(display='text')

In [None]:
#| include: false
from nbdev.showdoc import show_doc

## Support for model documentation

:::{.callout-important}

Up to v0.0.1-11, `utils` included a function `get_username`. It depended on `pwd`, which is only available on Unix-like systems, so this [issue](https://github.com/dkgaraujo/gingado/issues/3) prevented Windows users from importing `gingado.utils`. Since the function was not essential, it was removed until a suitable alternative that works on all major systems can be found.

:::

In [None]:
#| echo: false
#| output: asis
show_doc(get_datetime)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L13){target="_blank" style="float:right; font-size:smaller"}

### get_datetime

>      get_datetime ()

Returns the time now

In [None]:
d = get_datetime()
assert isinstance(d, str)
assert len(d) > 0
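
The exact string format returned by `get_datetime` is not shown here. As a rough illustration only, a timestamp helper for model documentation could look like the sketch below. The name `timestamp_now` and the choice of a UTC ISO 8601 string are assumptions made for this example, not gingado's implementation.

```python
from datetime import datetime, timezone


def timestamp_now() -> str:
    """Hypothetical helper: current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


stamp = timestamp_now()
assert isinstance(stamp, str) and len(stamp) > 0

# The string can be parsed back into a timezone-aware datetime.
parsed = datetime.fromisoformat(stamp)
print(stamp)
```

A timezone-aware string keeps timestamps comparable when models are documented on machines in different time zones.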

## Support for time series

Objects of the class [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) are similar to `scikit-learn`'s transformers.

In [None]:
#| echo: false
#| output: asis
show_doc(Lag)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L24){target="_blank" style="float:right; font-size:smaller"}

### Lag

>      Lag (lags=1, jump=0, keep_contemporaneous_X=False)

A transformer that lags variables

In [None]:
#| echo: false
#| output: asis
show_doc(Lag.fit)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L31){target="_blank" style="float:right; font-size:smaller"}

### Lag.fit

>      Lag.fit (X:numpy.ndarray, y=None)

Fit the [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) transformer

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | ndarray |  | Array-like data of shape (n_samples, n_features) |
| y | NoneType | None | Array-like data of shape (n_samples,) or (n_samples, n_targets) or None |

In [None]:
#| echo: false
#| output: asis
show_doc(Lag.transform)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L48){target="_blank" style="float:right; font-size:smaller"}

### Lag.transform

>      Lag.transform (X:numpy.ndarray)

Lag the dataset `X`

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| X | ndarray | Array-like data of shape (n_samples, n_features) |

In [None]:
#| echo: false
#| output: asis
show_doc(Lag.fit_transform)

---

### TransformerMixin.fit_transform

>      TransformerMixin.fit_transform (X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to `X` and `y` with optional parameters `fit_params`
and returns a transformed version of `X`.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | array-like of shape (n_samples, n_features) |  | Input samples. |
| y | NoneType | None | Target values (None for unsupervised transformations). |
| fit_params |  |  |  |
| **Returns** | **ndarray array of shape (n_samples, n_features_new)** |  | **Transformed array.** |

The code below demonstrates how [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) works in practice. Note in particular that, because [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) is a transformer, it can be used as a step in a `scikit-learn` `Pipeline`.

In [None]:
import numpy as np  # needed for the random data generated below
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
randomX = np.random.rand(15, 2)
randomY = np.random.rand(15)

lags = 3
jump = 2

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('lagger', Lag(lags=lags, jump=jump, keep_contemporaneous_X=False))
]).fit_transform(randomX, randomY)

Below we confirm that the lagger drops `lags + jump` rows, which are the observations at the start of the sample that lack a complete set of lags:

In [None]:
assert randomX.shape[0] - lags - jump == pipe.shape[0]

And because [`Lag`](https://dkgaraujo.github.io/gingado/utils.html#lag) is a transformer, its parameters (`lags` and `jump`) can be calibrated using hyperparameter tuning to achieve the best performance for a model.
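
To make the row arithmetic and the tuning idea concrete, here is a minimal NumPy-only sketch. It does not use gingado's `Lag`: `make_lagged` and `holdout_mse` are hypothetical helpers written for this illustration, the data are synthetic, and the target is aligned by hand by dropping the same `lags + jump` leading rows. It assumes that `jump` skips the most recent `jump` lags, which is consistent with the row count checked above but is not confirmed by the source.

```python
import itertools

import numpy as np


def make_lagged(X, lags=1, jump=0):
    """Hypothetical helper: stack lags jump+1, ..., jump+lags of X side by side."""
    X = np.asarray(X, dtype=float)
    n_drop = lags + jump  # leading rows without a complete set of lags
    if n_drop >= X.shape[0]:
        raise ValueError("lags + jump must be smaller than the number of rows")
    # For lag k, row i of the block holds X[n_drop + i - k].
    blocks = [X[n_drop - k : X.shape[0] - k] for k in range(jump + 1, jump + lags + 1)]
    return np.hstack(blocks)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))

# Synthetic target that depends only on the second lag of X.
y = np.zeros(200)
y[2:] = 2.0 * X[:-2, 0]

# Same arithmetic as the assertion above: lags + jump rows are dropped.
assert make_lagged(X, lags=3, jump=2).shape == (200 - 3 - 2, 3)


def holdout_mse(lags, jump, n_train=150):
    """Fit least squares on the first n_train rows, score on the remainder."""
    X_lagged = make_lagged(X, lags=lags, jump=jump)
    y_aligned = y[lags + jump :]  # drop the same leading rows from the target
    coef, *_ = np.linalg.lstsq(X_lagged[:n_train], y_aligned[:n_train], rcond=None)
    resid = y_aligned[n_train:] - X_lagged[n_train:] @ coef
    return float(np.mean(resid**2))


# Candidate (lags, jump) pairs, scored on held-out data.
grid = list(itertools.product([1, 2], [0, 1, 2]))
scores = {(lags, jump): holdout_mse(lags, jump) for lags, jump in grid}
best_lags, best_jump = min(scores, key=scores.get)
print(best_lags, best_jump, scores[(best_lags, best_jump)])
```

Only combinations whose lag window covers lag 2 can reproduce the target, so the search selects one of them. In a real workflow the same search would typically go through `scikit-learn`'s tuning utilities, with care taken to keep `X` and `y` aligned after rows are dropped.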

## Support for data augmentation with SDMX

:::{.callout-note}

Please note that working with SDMX may take a few minutes, depending on the amount of information you are downloading.

:::

In [None]:
#| echo: false
#| output: asis
show_doc(list_SDMX_sources)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L82){target="_blank" style="float:right; font-size:smaller"}

### list_SDMX_sources

>      list_SDMX_sources ()

Fetch the list of SDMX sources

In [None]:
sources = list_SDMX_sources()
print(sources)

assert len(sources) > 0
# all elements are of type 'str'
assert all(isinstance(src, str) for src in sources)

['ABS', 'ABS_XML', 'BBK', 'BIS', 'CD2030', 'ECB', 'ESTAT', 'ILO', 'IMF', 'INEGI', 'INSEE', 'ISTAT', 'LSD', 'NB', 'NBB', 'OECD', 'SGR', 'SPC', 'STAT_EE', 'UNICEF', 'UNSD', 'WB', 'WB_WDI']


In [None]:
#| echo: false
#| output: asis
show_doc(list_all_dataflows)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L91){target="_blank" style="float:right; font-size:smaller"}

### list_all_dataflows

>      list_all_dataflows (codes_only:bool=False, return_pandas:bool=True)

List all SDMX dataflows. Note: When using as a parameter to an [`AugmentSDMX`](https://dkgaraujo.github.io/gingado/augmentation.html#augmentsdmx) object or to the [`load_SDMX_data`](https://dkgaraujo.github.io/gingado/utils.html#load_sdmx_data) function, set `codes_only=True`

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| codes_only | bool | False | Whether to return only the dataflow codes |
| return_pandas | bool | True | Whether to return the result in a pandas DataFrame format |

In [None]:
dflows = list_all_dataflows(return_pandas=False)

assert isinstance(dflows, dict)
all_sources = list_SDMX_sources()
# every key of the dictionary is a known SDMX source
assert all(s in all_sources for s in dflows.keys())


By default, [`list_all_dataflows`](https://dkgaraujo.github.io/gingado/utils.html#list_all_dataflows) returns a pandas Series, which makes it easier to explore the available dataflows:

In [None]:
dflows = list_all_dataflows(return_pandas=True)
assert isinstance(dflows, pd.Series)

dflows


ABS_XML  ABORIGINAL_POP_PROJ                 Projected population, Aboriginal and Torres St...
         ABORIGINAL_POP_PROJ_REMOTE          Projected population, Aboriginal and Torres St...
         ABS_ABORIGINAL_POPPROJ_INDREGION    Projected population, Aboriginal and Torres St...
         ABS_ACLD_LFSTATUS                   Australian Census Longitudinal Dataset (ACLD):...
         ABS_ACLD_TENURE                     Australian Census Longitudinal Dataset (ACLD):...
                                                                   ...                        
UNSD     DF_UNData_UNFCC                                                       SDMX_GHG_UNDATA
WB       DF_WITS_Tariff_TRAINS                                WITS - UNCTAD TRAINS Tariff Data
         DF_WITS_TradeStats_Development                             WITS TradeStats Devlopment
         DF_WITS_TradeStats_Tariff                                      WITS TradeStats Tariff
         DF_WITS_TradeStats_Trade                 

This format makes it easier to search `dflows` by source:

In [None]:
list_all_dataflows(codes_only=True, return_pandas=True)


ABS_XML  0                 ABORIGINAL_POP_PROJ
         1          ABORIGINAL_POP_PROJ_REMOTE
         2    ABS_ABORIGINAL_POPPROJ_INDREGION
         3                   ABS_ACLD_LFSTATUS
         4                     ABS_ACLD_TENURE
                            ...               
UNSD     5                     DF_UNData_UNFCC
WB       0               DF_WITS_Tariff_TRAINS
         1      DF_WITS_TradeStats_Development
         2           DF_WITS_TradeStats_Tariff
         3            DF_WITS_TradeStats_Trade
Name: dataflow, Length: 9548, dtype: object

In [None]:
dflows['BIS']

WS_CBPOL_D                            Policy rates daily
WS_CBPOL_M                          Policy rates monthly
WS_CBS_PUB                      BIS consolidated banking
WS_CREDIT_GAP                     BIS credit-to-GDP gaps
WS_DEBT_SEC2_PUB                     BIS debt securities
WS_DER_OTC_TOV                  OTC derivatives turnover
WS_DSR                            BIS debt service ratio
WS_EER_D              BIS effective exchange rates daily
WS_EER_M            BIS effective exchange rates monthly
WS_GLI                       Global liquidity indicators
WS_LBS_D_PUB                      BIS locational banking
WS_LONG_CPI                     BIS long consumer prices
WS_OTC_DERIV2                OTC derivatives outstanding
WS_SPP              BIS property prices: selected series
WS_TC                    BIS long series on total credit
WS_XRU                   US dollar exchange rates, m,q,a
WS_XRU_D                 US dollar exchange rates, daily
WS_XTD_DERIV                 Ex

Users can also search dataflows by their human-readable name instead of their code. For example, this is one way to check whether any dataflow has information on interest rates:

In [None]:
dflows[dflows.str.contains('Interest rates', case=False)]

ECB    RIR                                         Retail Interest Rates
ESTAT  cpc_ecexint     Candidate countries and potential candidates: ...
       ei_mfir_m                           Interest rates - monthly data
       enpe_irt_st                           Money market interest rates
       enpr_ecexint     ENP countries: exchange rates and interest rates
       irt_st_a                Money market interest rates - annual data
       irt_st_m               Money market interest rates - monthly data
       irt_st_q             Money market interest rates - quarterly data
       tec00034        Short-term interest rates: Day-to-day money rates
       tec00035        Short-term interest rates: three-month interba...
       teimf100                   Day-to-day money market interest rates
IMF    6SR             M&B: Interest Rates and Share Prices (6SR) for...
       INR                                                Interest rates
       INR_NSTD                              Intere
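
The two lookups above can be tried offline on a small stand-in Series. The sketch below builds a toy Series by hand from four entries copied from the output shown earlier. The index level names `source` and `dataflow` are assumptions for illustration, and nothing is downloaded from SDMX.

```python
import pandas as pd

# Toy stand-in for the Series returned by list_all_dataflows():
# first index level is the source, second level is the dataflow code.
index = pd.MultiIndex.from_tuples(
    [
        ("BIS", "WS_CBPOL_D"),
        ("BIS", "WS_CBPOL_M"),
        ("ECB", "RIR"),
        ("ESTAT", "irt_st_a"),
    ],
    names=["source", "dataflow"],
)
toy = pd.Series(
    [
        "Policy rates daily",
        "Policy rates monthly",
        "Retail Interest Rates",
        "Money market interest rates - annual data",
    ],
    index=index,
)

# Search by source: selecting on the first level leaves the dataflow codes.
bis = toy.loc["BIS"]

# Search by name: case-insensitive substring match on the descriptions.
matches = toy[toy.str.contains("interest rates", case=False)]

print(bis)
print(matches)
```

Because the source is the outer index level, selecting a source returns a Series keyed by dataflow code, while the string match keeps both levels so the source of each hit stays visible.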

The function [`load_SDMX_data`](https://dkgaraujo.github.io/gingado/utils.html#load_sdmx_data) is a convenience function that downloads data from SDMX sources (and any specific dataflows passed as arguments) if they match the key and parameters set by the user.

In [None]:
#| echo: false
#| output: asis
show_doc(load_SDMX_data)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/utils.py#L115){target="_blank" style="float:right; font-size:smaller"}

### load_SDMX_data

>      load_SDMX_data (sources:dict, keys:dict, params:dict, verbose:bool=True)

Loads datasets from SDMX.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| sources | dict |  | A dictionary with the sources and dataflows per source |
| keys | dict |  | The keys to be used in the SDMX query |
| params | dict |  | The parameters to be used in the SDMX query |
| verbose | bool | True | Whether to communicate download steps to the user |

In [None]:
df = load_SDMX_data(sources={'ECB': 'CISS', 'BIS': 'WS_CBPOL_D'}, keys={'FREQ': 'D'}, params={'startPeriod': 2003})

assert isinstance(df, pd.DataFrame)
assert df.shape[0] > 0
assert df.shape[1] > 0

Querying data from ECB's dataflow 'CISS' - Composite Indicator of Systemic Stress...




Querying data from BIS's dataflow 'WS_CBPOL_D' - Policy rates daily...


In [None]:
#| echo: false
import nbdev; nbdev.nbdev_export()