# Tabula Rasa

---

### Overview

This notebook trains a mixed monotonic model, with sub-models to generate arbitrary quantile predictions and estimate epistemic uncertainty, using `TabulaRasaRegressor()`.

It is designed to work with pandas DataFrames and uses column data types and feature names to cut down on code.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tabularasa.TabulaRasa import TabulaRasaRegressor

---

### Load example data

If you haven't already, please generate the example dataset using the [example_data](example_data.ipynb) notebook.

In [2]:
df = pd.read_pickle('./data/simple_train.pkl')

In [3]:
df.dtypes

y     float32
x1    float32
x2    float32
x3      int64
x4    float32
dtype: object

Let's convert `x3` to a `category` data type to generate embeddings for it (`TabulaRasaRegressor()` automatically handles this for all columns with `object` or `category` data types).

In [4]:
df['x3'] = df['x3'].astype('category')
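
To make the conversion concrete, here is a minimal sketch on a toy frame (the values are made up for illustration and are not taken from `simple_train.pkl`). A `category` column stores each distinct level once and represents every row by an integer code, which is the kind of index an embedding layer typically consumes. Whether `TabulaRasaRegressor()` uses these exact codes internally is an assumption.

```python
import pandas as pd

# Toy data standing in for the real training set (illustrative values only).
toy = pd.DataFrame({"x3": [3, 1, 3, 2, 1]})
toy["x3"] = toy["x3"].astype("category")

# Distinct levels (sorted) and the integer code assigned to each row.
levels = list(toy["x3"].cat.categories)
codes = toy["x3"].cat.codes.tolist()

print(toy["x3"].dtype)
print(levels)
print(codes)
```

Levels that never appear in the data passed at initialization have no code, which is why a sample used for setup should cover every category.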

---

### Initialize model

When initializing the model, we typically pass three arguments:
- `df`: A `pandas.DataFrame` containing the training data, or a sample of it. No training happens on initialization; the model only categorizes features and sets up categorical feature mappings and scalers for numeric features. If you pass a sample, it should therefore be representative of the full dataset: it should contain every unique value of each categorical feature and reflect the distributions of the continuous features.
- `targets`: A `list` of column names in `df` to use as the regressand(s). All other columns in `df` with a `number`, `category`, or `object` data type are assumed to be features and are included in the models.
- `monotonic_constraints`: A `dict` whose keys are features (column names in `df`) that should have a monotonic relationship with the `targets`, and whose values are 1 or -1 to indicate the direction of that relationship: increasing or decreasing, respectively.
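
The setup described above can be sketched roughly as follows. This is not tabularasa's implementation: `summarize_features` is a hypothetical helper, and the choice of mean and standard deviation as the scaler statistics is an assumption made only for illustration.

```python
import pandas as pd

def summarize_features(df, targets, monotonic_constraints):
    """Hypothetical helper (NOT part of tabularasa): split features by dtype,
    build categorical mappings, and collect scaling statistics."""
    features = df.drop(columns=targets)
    numeric = features.select_dtypes(include="number").columns.tolist()
    categorical = features.select_dtypes(include=["category", "object"]).columns.tolist()

    # Constraints must name known features and use 1 or -1 only.
    for name, direction in monotonic_constraints.items():
        if name not in features.columns:
            raise KeyError(f"unknown feature: {name}")
        if direction not in (1, -1):
            raise ValueError(f"direction for {name} must be 1 or -1")

    # Map each categorical level to an integer index.
    mappings = {
        col: {level: i for i, level in
              enumerate(features[col].astype("category").cat.categories)}
        for col in categorical
    }
    # Assumed scaler: per-feature mean and standard deviation.
    scalers = {col: (float(features[col].mean()), float(features[col].std()))
               for col in numeric}
    return numeric, categorical, mappings, scalers

# Toy frame with one target, one numeric and one categorical feature.
toy = pd.DataFrame({
    "y": [0.1, 0.4, 0.9, 1.6],
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x3": pd.Categorical(["a", "b", "a", "c"]),
})
numeric, categorical, mappings, scalers = summarize_features(toy, ["y"], {"x1": 1})
print(numeric, categorical, mappings)
```

Because the mappings and statistics come only from the frame passed in, a sample missing a category or covering a narrow numeric range would produce a setup that does not match the full dataset.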

In [5]:
trr = TabulaRasaRegressor(df,
                          targets=['y'],
                          monotonic_constraints={'x1': 1, 'x2': 1})
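
A constraint such as `'x1': 1` can be checked empirically once a model is fitted: sweep the constrained feature over a grid while holding the others fixed, and confirm that predictions never move in the wrong direction. The sketch below uses a stand-in `toy_predict` function on plain NumPy arrays. Both `is_monotonic_in` and `toy_predict` are hypothetical names, and adapting the check to `trr.predict()`, which takes a DataFrame, is left as an exercise.

```python
import numpy as np

def is_monotonic_in(predict, base_row, feature_index, grid, direction):
    """Sweep one feature over `grid`, holding the others at `base_row`,
    and check that predictions move only in `direction` (1 or -1)."""
    rows = np.tile(np.asarray(base_row, dtype=float), (len(grid), 1))
    rows[:, feature_index] = grid
    preds = np.asarray(predict(rows)).ravel()
    steps = np.diff(preds) * direction
    # Small tolerance for floating-point noise.
    return bool(np.all(steps >= -1e-6))

# Stand-in for a fitted model: increasing in column 0, not monotonic in column 1.
def toy_predict(X):
    return 2.0 * X[:, 0] + np.sin(X[:, 1])

grid = np.linspace(-3.0, 3.0, 50)
ok_first = is_monotonic_in(toy_predict, [0.0, 0.0], 0, grid, direction=1)
ok_second = is_monotonic_in(toy_predict, [0.0, 0.0], 1, grid, direction=1)
print(ok_first, ok_second)  # True False
```

A grid sweep only tests the points it visits, so it can reveal a violation but cannot prove that a constraint holds everywhere.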

In [6]:
trr.fit(df)

*** Training expectation model ***
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1      511.3724        1.2401  1.6692
      2        2.2265        2.4904  1.5025
      3        1.8191        1.4249  1.5021
      4        1.2452        0.9940  1.4847
      5        1.0025        0.9934  1.4985
      6        1.0071        1.0043  1.5175
      7        1.0012        0.9929  1.5700
      8        0.9996        1.0122  1.4878
      9        1.0146        0.9947  1.5187
     10        1.0077        1.0039  1.5119
     11        1.0051        0.9939  1.5087
     12        1.0035        0.9955  1.4988
     13        1.0028        0.9946  1.4965
     14        1.0021        1.0005  1.4991
     15        1.0029        0.9928  1.4858
     16        1.0033        0.9959  1.4932
     17        1.0023        0.9995  1.4942
     18        1.0053

    111        1.6334        1.6326  0.0386
    112        1.6254        1.6225  0.0492
    113        1.6026        1.5834  0.0469
    114        1.5619        1.5536  0.0440
    115        1.5368        1.5460  0.0458
    116        1.5225        1.5262  0.0427
    117        1.4981        1.4897  0.0445
    118        1.4536        1.4543  0.0448
    119        1.4196        1.4736  0.0514
    120        1.4253        1.4154  0.0487
    121        1.3837        1.3886  0.0471
    122        1.3614        1.3686  0.0477
    123        1.3373        1.3503  0.0446
    124        1.3351        1.3375  0.0436
    125        1.3081        1.3062  0.0483
    126        1.2778        1.2903  0.0447
    127        1.2585

In [7]:
trr.predict(df)

array([[-0.02938868],
       [-0.02938868],
       [-0.02938868],
       ...,
       [-0.02938868],
       [-0.02938868],
       [-0.02938868]], dtype=float32)

# TODO: Something is obviously going wrong (every prediction is identical), but at least the notebook runs end to end.
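
One way to quantify the symptom in the predictions above is to count distinct values and measure their spread. The sketch below is a generic NumPy check, not a tabularasa feature; `describe_predictions` is a hypothetical helper, and the three-row array only mimics the shape and value of the real output.

```python
import numpy as np

def describe_predictions(preds, tol=1e-6):
    """Hypothetical helper: report distinct values and spread of predictions."""
    flat = np.asarray(preds, dtype=float).ravel()
    spread = float(flat.max() - flat.min())
    return {
        "n_unique": int(np.unique(flat).size),
        "spread": spread,
        "constant": spread <= tol,
    }

# Same shape and value as the output above, shortened to three rows.
preds = np.array([[-0.02938868], [-0.02938868], [-0.02938868]], dtype=np.float32)
report = describe_predictions(preds)
print(report)
```

Running the same check on `trr.predict(df)` would confirm whether the model returns a single value for every row, which narrows the search to the training or preprocessing steps rather than the prediction call itself.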