# Data Properties

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import my_nb_path  # isort: skip
import os

from IPython.display import Markdown

import a2rl as wi
from a2rl.nbtools import pprint, print  # Enable color outputs when rich is installed.
from a2rl.utils import (
    NotMDPDataError,
    assert_mdp,
    data_generator_gym,
    data_generator_simple,
    plot_information,
)


For many sequential decision-making problems, we look for a few key patterns in the data:

* Markov property

* A consistent reward or cost

* Actions that effectively contribute to the reward or affect the environment

* A consistent way in which actions are picked
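One way to make the Markov-property check concrete (an illustration only, not how a2rl implements it): if the data is first-order Markov, conditioning on one more past state should not reduce the uncertainty about the next state. The sketch below estimates conditional entropies with simple plug-in counts on a made-up second-order chain. `conditional_entropy` is a hypothetical helper, and the chain is invented for the example.

```python
import math
import random
from collections import Counter


def conditional_entropy(target, *conditions):
    """Plug-in estimate of H(target | conditions), in bits, for discrete sequences."""
    n = len(target)
    joint = Counter(zip(target, *conditions))  # counts of (target, cond_1, ..., cond_k)
    marginal = Counter(zip(*conditions))  # counts of (cond_1, ..., cond_k)
    h = 0.0
    for key, count in joint.items():
        p_joint = count / n
        p_cond = marginal[key[1:]] / n
        h -= p_joint * math.log2(p_joint / p_cond)
    return h


# Toy second-order chain: next state is (s_t + s_{t-1}) mod 5, with 10% random noise.
random.seed(0)
s = [0, 1]
for _ in range(20_000):
    new = (s[-1] + s[-2]) % 5 if random.random() < 0.9 else random.randrange(5)
    s.append(new)

nxt, cur, prev = s[2:], s[1:-1], s[:-2]
h1 = conditional_entropy(nxt, cur)
h2 = conditional_entropy(nxt, cur, prev)
print(f"H(s_t+1 | s_t)        = {h1:.3f} bits")
print(f"H(s_t+1 | s_t, s_t-1) = {h2:.3f} bits")
```

Here the extra lag removes most of the remaining uncertainty, which signals that a first-order model of this toy chain would be insufficient.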


We have a few helper visualisations to help with these checks: `markovian_matrix` and `normalized_markovian_matrix`.
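The signatures of `markovian_matrix` and `normalized_markovian_matrix` are not shown in this notebook, so the sketch below does not call them. It only illustrates one generic, assumed interpretation of such a matrix: an empirical count of transitions s_t -> s_{t+1} and its row-normalised version, where row i estimates P(s_{t+1} | s_t = i). `transition_matrix` is a hypothetical helper.

```python
import numpy as np


def transition_matrix(states, n_states, normalize=True):
    """Count transitions s_t -> s_{t+1} in a sequence of integer states."""
    counts = np.zeros((n_states, n_states))
    # np.add.at accumulates correctly even when the same (i, j) pair repeats.
    np.add.at(counts, (states[:-1], states[1:]), 1)
    if not normalize:
        return counts
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that never occur stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


states = np.array([0, 1, 2, 0, 1, 1, 2, 0])
print(transition_matrix(states, 3, normalize=False))
print(transition_matrix(states, 3))
```

A near-deterministic row (one large entry) means the current state alone predicts the next one well; flat rows mean it does not.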



## Data Inspection

In the offline setting, we are restricted to the data alone. `whatif` offers three ways to obtain data:

1. The load-and-discretize workflow (the main one; see `discretized_sample_dataset()`).

2. `data_generator_gym`, which records interactions between a trained agent and a Gym environment (for testing and research).

3. `data_generator_simple`, which generates sample data with chosen properties (also for testing and research).


## Helper Functions

In [None]:
def discretized_sample_dataset(dataset_name: str, n_bins: int = 50) -> wi.WiDataFrame:
    """Discretize a sample dataset.

    Args:
        dataset_name: name of the sample dataset.
        n_bins: number of bins passed to ``DiscreteTokenizer``.

    Returns:
        Whatif dataframe.

    See Also
    --------
    list_sample_datasets
    """
    dirname = wi.sample_dataset_path(dataset_name)
    tokeniser = wi.DiscreteTokenizer(n_bins=n_bins)
    df = tokeniser.fit_transform(wi.read_csv_dataset(dirname))
    return df
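`DiscreteTokenizer(n_bins=...)` turns continuous columns into a fixed number of discrete levels. This notebook does not show which binning rule it uses, so, as an assumption for illustration only, the sketch below uses plain equal-width binning with pandas `pd.cut` to show what an `n_bins` setting controls. `equal_width_bins` and the `temperature` column are hypothetical.

```python
import numpy as np
import pandas as pd


def equal_width_bins(series: pd.Series, n_bins: int) -> pd.Series:
    """Map a continuous column to integer bin ids 0..n_bins-1 using equal-width bins."""
    codes = pd.cut(series, bins=n_bins, labels=False)
    return pd.Series(codes, index=series.index, name=series.name).astype(int)


rng = np.random.default_rng(0)
temps = pd.Series(rng.uniform(5.0, 15.0, size=1000), name="temperature")
tokens = equal_width_bins(temps, n_bins=10)
print(tokens.value_counts().sort_index())
```

Fewer bins (as with `n_bins=10` for the chiller data below) give coarser tokens and more samples per token, which makes count-based statistics less noisy at the cost of resolution.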

In [None]:
lags = 10  # Same as assert_mdp()'s default.

################################################################################
# To run in fast mode, set env var NOTEBOOK_FAST_RUN=1 prior to starting Jupyter
################################################################################
if os.environ.get("NOTEBOOK_FAST_RUN", "0") != "0":
    lags = 5
    display(
        Markdown(
            '<p style="color:firebrick; background-color:yellow; font-weight:bold">'
            "NOTE: notebook runs in fast mode. Use only 5 lags. Results may differ.</p>"
        )
    )
################################################################################
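`lags` sets how far back in time the checks look. As a rough, assumed illustration of a lag-k dependence measure (we have not verified that this is what `assert_mdp` or `plot_information` compute), the sketch estimates the mutual information I(x_t; x_{t-k}) for k = 1..lags on a made-up "sticky" chain. `mutual_information` and `lagged_information` are hypothetical helpers.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def lagged_information(series, lags):
    """I(x_t; x_{t-k}) for k = 1..lags."""
    return [mutual_information(series[k:], series[:-k]) for k in range(1, lags + 1)]


# Sticky first-order chain over 4 states: keep the state with prob 0.8, else resample.
random.seed(1)
x = [0]
for _ in range(10_000):
    x.append(x[-1] if random.random() < 0.8 else random.randrange(4))

info = lagged_information(x, lags=5)
for k, value in enumerate(info, start=1):
    print(f"lag {k}: {value:.3f} bits")
```

The dependence decays with the lag; using fewer lags (as in fast mode) simply truncates this curve earlier.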

### Synthetic Data


Create data with the Markov property, then add random actions (a random policy) that affect the states.

In [None]:
offline_data = data_generator_simple(
    markov_order=1,
    reward_function=False,
    action_effect=True,
    policy=False,
)

try:
    assert_mdp(offline_data, lags=lags)
except NotMDPDataError as e:
    print("Continue this example despite MDP check errors:\n", e)

plot_information(offline_data, lags=lags);
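The internals of `data_generator_simple` are not shown here. As a hypothetical stand-in for the setup described above (first-order dynamics, random actions that affect the state, no reward), the sketch simulates s_{t+1} = 0.8 s_t + a_t + noise and recovers the weights with a least-squares fit. The function name, coefficients, and column names are all invented for illustration.

```python
import numpy as np
import pandas as pd


def simulate_first_order(n_steps: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Toy first-order Markov data: s_{t+1} depends only on s_t and a random action a_t."""
    rng = np.random.default_rng(seed)
    states = np.zeros(n_steps + 1)
    # Random policy: actions are drawn independently of the state.
    actions = rng.choice([-1.0, 1.0], size=n_steps)
    for t in range(n_steps):
        states[t + 1] = 0.8 * states[t] + actions[t] + rng.normal(scale=0.1)
    return pd.DataFrame({"state": states[:-1], "action": actions, "next_state": states[1:]})


df = simulate_first_order()
# Least-squares fit of next_state on [state, action, 1]; expect roughly (0.8, 1.0, 0.0).
X = np.column_stack([df["state"], df["action"], np.ones(len(df))])
coef, *_ = np.linalg.lstsq(X, df["next_state"].to_numpy(), rcond=None)
print(coef.round(2))
```

A clearly non-zero action weight is what "actions affect the states" means here; the random policy shows up as actions that are uncorrelated with the current state.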

Use a higher-order Markov property and effective actions, and add a reward function that depends on the state and action.

In [None]:
offline_data = data_generator_simple(
    markov_order=2,
    reward_function=True,
    action_effect=True,
    policy=False,
)

try:
    assert_mdp(offline_data, lags=lags)
except NotMDPDataError as e:
    print("Continue this example despite MDP check errors:\n", e)

plot_information(offline_data, lags=lags);
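To see why a higher-order process matters, here is a hypothetical toy version of the second setup (not the generator's actual code): the next state depends on both s_t and s_{t-1}, actions still come from a random policy, and an invented reward depends on both the state and the action. Regressing s_{t+1} on the last two states exposes the extra lag.

```python
import numpy as np


def simulate_second_order(n_steps: int = 2000, seed: int = 0):
    """Toy data: s_{t+1} depends on s_t and s_{t-1}; the reward depends on state and action."""
    rng = np.random.default_rng(seed)
    s = np.zeros(n_steps + 2)
    a = rng.choice([-1.0, 1.0], size=n_steps + 2)  # random policy, independent of the state
    for t in range(1, n_steps + 1):
        s[t + 1] = 0.5 * s[t] + 0.3 * s[t - 1] + a[t] + rng.normal(scale=0.1)
    # Hypothetical reward: stay close to 0, with a small bonus for action +1.
    r = -np.abs(s) + 0.5 * a
    return s, a, r


s, a, r = simulate_second_order()
n = len(s) - 2
# Regress s_{t+1} on [s_t, s_{t-1}, a_t, 1] for t = 1..n.
X = np.column_stack([s[1:n + 1], s[0:n], a[1:n + 1], np.ones(n)])
coef, *_ = np.linalg.lstsq(X, s[2:n + 2], rcond=None)
print(coef.round(2))  # weights on s_t, s_{t-1}, a_t and the intercept
```

A clearly non-zero weight on s_{t-1} is the signature of a second-order process; a check that only looks one step back would miss it.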

### OpenAI Gym environment with a known MDP

Use an agent that has only been trained briefly on the Taxi environment, and see what the resulting data looks like.

In [None]:
%%time
from stable_baselines3 import DQN

gym_data = data_generator_gym(
    env_name="Taxi-v3",
    trainer=DQN,
    training_steps=10000,
    capture_steps=100,
)

try:
    assert_mdp(gym_data, lags=lags)
except NotMDPDataError as e:
    print("Continue this example despite MDP check errors:\n", e)

plot_information(gym_data, lags=lags);
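Taxi is a useful test case because its MDP is fully known: each observation is a single integer that, according to the Gym Taxi documentation, packs the taxi row and column (a 5 x 5 grid), the passenger location (4 depots or in the taxi) and the destination (4 depots) into 500 discrete states. The sketch below re-implements that packing in plain Python, assuming the documented encoding; it does not call Gym, and `encode`/`decode` here are our own helpers.

```python
from itertools import product


def encode(taxi_row: int, taxi_col: int, pass_loc: int, dest_idx: int) -> int:
    """Pack the four Taxi state components into one integer in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(state: int) -> tuple:
    """Inverse of encode(): recover (taxi_row, taxi_col, pass_loc, dest_idx)."""
    state, dest_idx = divmod(state, 4)
    state, pass_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


all_states = {encode(*c) for c in product(range(5), range(5), range(5), range(4))}
print(len(all_states), min(all_states), max(all_states))  # 500 0 499
```

Because the integer observation captures the whole state, the next observation depends only on the current one and the action, which is why Taxi data should pass the Markov checks.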

### Chiller Data

In [None]:
%%time

df_chiller = discretized_sample_dataset("chiller", n_bins=10)
try:
    assert_mdp(df_chiller, lags=lags)
except NotMDPDataError as e:
    print("Continue this example despite MDP check errors:\n", e)

plot_information(df_chiller, lags=lags);