# Models Recipes

On this page, we will show you how to customize your own models. In `carefree-learn`, it is fairly easy to define various kinds of models with three APIs: `register_ml_module` (for [ML models](#ML-Models)), `register_module` (for [Other Models](#Other-Models)) and `register_custom_module` (for [Complex Models](#Complex-Models)).

> You might notice that if you run a block with `register_*` calls more than once, `carefree-learn` will throw a warning which says " '...' has already been registered ", and your changes will have no effect. This is intentional, because normally we **DO NOT** want to register anything more than once.
> 
> However, if you are using interactive development tools (e.g. Jupyter Notebook), it is very common to modify an implementation more than once. In this case, we can set `allow_duplicate=True` in the `register_*` functions to bypass this check. And of course, this should **NEVER** happen in production, for safety!

# Table of Contents

- [One-Stage Models](#One-Stage-Models)
  - [ML Models](#ML-Models)
    - [Custom Configurations](#Custom-Configurations)
  - [Other Models](#Other-Models)
- [Complex Models](#Complex-Models)
- [Appendix](#Appendix)
  - [ML Encodings](#ML-Encodings)
    - [Optimizations](#Optimizations)

> You might also notice that the class names defined below match their registered names. This is not required: `carefree-learn` only cares about the name that you pass to the `register_*` function, and does not check the actual class name.

# Preparations

```python
import torch
import cflearn

import numpy as np
import torch.nn as nn

from torch import Tensor
from typing import Dict

try:
    from sklearn import metrics
except ImportError:
    metrics = None
```

# One-Stage Models

We will first jump into the typical situation where we need to define one-stage models. 'One-stage' here means that each training step contains only one optimizer step, so we can focus on defining the forward pass of our models and leave everything else to `carefree-learn`.

> The opposite of 'one-stage' models are [Complex Models](#Complex-Models). A typical 'complex' model is a `GAN`, which in general should perform a generator optimizing step **AND** a discriminator optimizing step in one **SINGLE** training step.

## ML Models

In `carefree-learn`, Machine Learning Models are slightly different from other models because:
- We have integrated some common data-preprocessing methods into the ML pipeline (e.g. `one_hot` encoding, `embedding`).
- There are some shared arguments that should be used by all ML models: input dimension, output dimension and number of history steps (this is used in timeseries tasks).

Therefore, `carefree-learn` has:
- Wrapped the registered `nn.Module` internally to make it suitable for the ML pipeline.
- Introduced three (optional) pre-defined arguments for all ML Models: `input_dim`, `output_dim` and `num_history`.
  - We call it the 'dimension system' of `carefree-learn`.

We will dive into these details in the following sections step by step.

```python
@cflearn.register_ml_module("my_linear0", allow_duplicate=False)
class MyLinear0(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.net = nn.Linear(input_dim, output_dim)

    def forward(self, net: Tensor) -> Tensor:
        return self.net(net)
```

And that's it! We can now plug it into our ML pipeline with the `fit_ml` API:

```python
n       = 100
in_dim  = 5
out_dim = 2

x = np.random.random([n, in_dim])
y = np.random.randint(0, 2, [n, 1])

m = cflearn.api.fit_ml(
    x,
    y,
    core_name="my_linear0",
    output_dim=out_dim,
    is_classification=True,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel
  _
    MyLinear0                                [-1, 5]                                  [-1, 2]                   12
      Linear                                 [-1, 5]                                  [-1, 2]                   12
Total params: 12
Trainable params: 12
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
------
```

- The `output_dim` passed to `fit_ml` is also passed into your model.
- Since `input_dim` is not provided, `carefree-learn` uses `x.shape[1]` as `input_dim`.
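
These dimension rules, together with the parameter counts in the summary, can be reproduced with a plain-Python sketch. `resolve_input_dim` and `linear_params` are hypothetical helpers for illustration, not `carefree-learn` functions: the first encodes "explicit `input_dim` wins, otherwise `x.shape[1]`", and the second counts the `in * out` weights plus `out` biases of a linear layer.

```python
from typing import Optional, Sequence


def resolve_input_dim(x_shape: Sequence[int], input_dim: Optional[int] = None) -> int:
    # an explicitly specified `input_dim` wins; otherwise fall back to `x.shape[1]`
    return x_shape[1] if input_dim is None else input_dim


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    # a linear layer holds an (out x in) weight matrix plus `out` bias terms
    return in_features * out_features + (out_features if bias else 0)


default_dim = resolve_input_dim((100, 5))                    # 5, taken from the data
explicit_dim = resolve_input_dim((100, 5), input_dim=5 * 2)  # 10, as specified
print(linear_params(default_dim, 2), linear_params(explicit_dim, 2))  # 12 22
```

The second call corresponds to the `my_linear1` example, whose input is duplicated before the `Linear` layer.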

If `input_dim` is specified, `carefree-learn` will use it regardless of `x.shape[1]`:

```python
@cflearn.register_ml_module("my_linear1", allow_duplicate=False)
class MyLinear1(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.net = nn.Linear(input_dim, output_dim)

    def forward(self, net: Tensor) -> Tensor:
        # duplicate the input
        return self.net(torch.cat([net, net], dim=1))

m = cflearn.api.fit_ml(
    x,
    y,
    core_name="my_linear1",
    # the input is duplicated, so we need to specify the `input_dim`
    input_dim=5 * 2,
    output_dim=out_dim,
    is_classification=True,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel
  _
    MyLinear1                                [-1, 5]                                  [-1, 2]                   22
      Linear                                [-1, 10]                                  [-1, 2]                   22
Total params: 22
Trainable params: 22
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
------
```

You might notice that in the 'summary' panel shown above, the `MyLinear1` module is 'wrapped' by `MLModel` and `_`. This is what `carefree-learn` does internally to make your model compatible with the ML pipeline.

Here's an example that takes `one_hot` encoding and `embedding` into account, to show why this kind of 'wrapping' is both useful and powerful:

```python
from cflearn import MERGED_KEY
from cflearn import ONE_HOT_KEY
from cflearn import EMBEDDING_KEY
from cflearn import NUMERICAL_KEY

@cflearn.register_ml_module("my_linear2", allow_duplicate=False)
class MyLinear2(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        print("> input_dim", input_dim)
        self.net = nn.Linear(input_dim, output_dim)

    # notice that we use `batch` here, and the naming is important!
    def forward(self, batch: Dict[str, Tensor]) -> Tensor:
        merged = batch[MERGED_KEY]
        one_hot = batch[ONE_HOT_KEY]
        embedding = batch[EMBEDDING_KEY]
        numerical = batch[NUMERICAL_KEY]
        print()
        print(">>> merged", merged.shape)
        if one_hot is not None:
            print(">>> one_hot", one_hot.shape)
        if embedding is not None:
            print(">>> embedding", embedding.shape)
        if numerical is not None:
            print(">>> numerical", numerical.shape)
        print()
        return self.net(merged)
```

> As the comment says, the name of the `forward` argument, `batch`, is important: it tells `carefree-learn` that you require the full batch instead of a single `Tensor`.
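
The argument name acts as a signal. The exact mechanism inside `carefree-learn` is not shown here, so the following is only a minimal sketch of how a framework *could* act on it, assuming it inspects the signature with Python's `inspect.signature`. `call_core`, `MERGED` and the two toy classes are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict

MERGED = "merged"  # hypothetical stand-in for `cflearn.MERGED_KEY`


def call_core(forward: Callable[..., Any], batch: Dict[str, Any]) -> Any:
    # look at the name of the first argument of `forward` (bound methods drop `self`)
    first_arg = next(iter(inspect.signature(forward).parameters))
    if first_arg == "batch":
        return forward(batch)  # the module asked for the full batch
    return forward(batch[MERGED])  # otherwise only hand over the merged input


class WantsTensor:
    def forward(self, net):
        return ("tensor", net)


class WantsBatch:
    def forward(self, batch):
        return ("batch", sorted(batch))


batch = {MERGED: [0.1, 0.2], "numerical": [0.1, 0.2]}
print(call_core(WantsTensor().forward, batch))  # ('tensor', [0.1, 0.2])
print(call_core(WantsBatch().forward, batch))   # ('batch', ['merged', 'numerical'])
```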

The newly defined `my_linear2` model can be used as usual:

```python
m = cflearn.api.fit_ml(
    x,
    y,
    core_name="my_linear2",
    output_dim=out_dim,
    is_classification=True,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
> input_dim 5

>>> merged torch.Size([1, 5])
>>> numerical torch.Size([1, 5])

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel
  _
    MyLinear2
      Linear                                 [-1, 5]                                  [-1, 2]                   12
Total params: 12
Trainable params: 12
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pas
```

The powerful part, however, is that it can now utilize the encoding methods (`one_hot` / `embedding`) provided by `carefree-learn`:

> We will use some encoding settings in a few following blocks. Please refer to the [Appendix](#ML-Encodings) section for more details.

```python
n                 = 100
in_dim            = 5
one_hot_dim       = 13  # number of unique values in the one-hot column
embedding_dim     = 7   # number of unique values in each embedding column (not the embedding size)
out_dim           = 2
# some encoding settings. Please refer to the `ML Encodings` section in the `Appendix` section for more details.
one_hot_setting   = dict(dim=one_hot_dim, methods="one_hot")
embedding_setting = dict(dim=embedding_dim, methods="embedding")
encoding_settings = {
    # one hot columns   : [6]
    6: one_hot_setting,
    # embedding columns : [5, 7]
    5: embedding_setting,
    7: embedding_setting,
}

x = np.hstack([
    # columns 0-4: numerical features
    np.random.random([n, in_dim]),
    # column 5: categorical, embedded
    np.random.randint(0, embedding_dim, [n, 1]),
    # column 6: categorical, one-hot encoded
    np.random.randint(0, one_hot_dim, [n, 1]),
    # column 7: categorical, embedded
    np.random.randint(0, embedding_dim, [n, 1]),
])
y = np.random.randint(0, 2, [n, 1])

m = cflearn.api.fit_ml(
    x,
    y,
    core_name="my_linear2",
    output_dim=out_dim,
    is_classification=True,
    # encoding settings
    encoding_settings=encoding_settings,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
> input_dim 26

>>> merged torch.Size([1, 26])
>>> one_hot torch.Size([1, 13])
>>> embedding torch.Size([1, 8])
>>> numerical torch.Size([1, 5])

Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel
  Encoder                                    [-1, 8]                      [[-1, 13], [-1, 8]]                   56
    ModuleDict-1
      OneHot                                    [-1]                                 [-1, 13]                    0
        Lambda                                  [-1]                                 [-1, 13]                    0
    ModuleDict-0
```

- We don't need to specify `input_dim`. In this case, `carefree-learn` will use `merged_dim` as `input_dim`.
- `merged_dim` = `one_hot_dim` + `embedding_dim` + `numerical_dim` (`26 = 13 + 8 + 5`), because `carefree-learn` simply concatenates every kind of input to create the `merged` input.
- In the 'summary' panel, the `Encoder` has `56` trainable params, all of which belong to the embedding (the `OneHot` encoder has none), and the `embedding` input has shape `[1, 8]`: each of the `2` embedding columns is mapped to a `4`-dimensional vector. Since each column has `7` different values, the embedding holds `56 = 2 * 7 * 4` params.
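
The same bookkeeping can be written out as a few lines of arithmetic. The embedding width of `4` is read off the printed shapes (`[1, 8]` for two columns) rather than set anywhere in this example, so treat it as the default observed here, not a documented constant.

```python
# sizes taken from the encoding settings in the example above
numerical_cols = 5        # columns 0-4
one_hot_sizes = [13]      # column 6: 13 unique values
embedding_sizes = [7, 7]  # columns 5 and 7: 7 unique values each
embedding_width = 4       # observed width of each embedded column

one_hot_part = sum(one_hot_sizes)                                     # 13
embedding_part = embedding_width * len(embedding_sizes)               # 8
expected_merged_dim = one_hot_part + embedding_part + numerical_cols  # 26
embedding_params = sum(embedding_sizes) * embedding_width             # 56 == 2 * 7 * 4
```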

Having access to every part of the inputs is already powerful, but it is still hard to make use of them, because so far our `__init__` method only receives `merged_dim` (as `input_dim`). `carefree-learn` therefore provides a `dimensions` argument that exposes the dimension of every part.

For example, let's implement a model inspired by the famous [Wide & Deep](https://arxiv.org/pdf/1606.07792.pdf) model, which feeds the `one_hot` part to a `Linear` model, and feeds the `numerical` and `embedding` parts to an `MLP` model:

```python
@cflearn.register_ml_module("my_wide_and_deep", allow_duplicate=True)
class MyWideAndDeep(nn.Module):
    # notice that we use `dimensions` here, and the naming is important!
    def __init__(self, dimensions, output_dim):
        super().__init__()
        print(">", dimensions)
        self.wide = nn.Linear(dimensions.one_hot_dim, output_dim)
        self.deep = nn.Sequential(
            nn.Linear(dimensions.embedding_dim + dimensions.numerical_dim, 128),
            nn.ReLU(),
            nn.Linear(128, output_dim),
        )

    def forward(self, batch: Dict[str, Tensor]) -> Tensor:
        one_hot = batch[ONE_HOT_KEY]
        embedding = batch[EMBEDDING_KEY]
        numerical = batch[NUMERICAL_KEY]
        wide_output = self.wide(one_hot)
        deep_output = self.deep(torch.cat([embedding, numerical], dim=1))
        return wide_output + deep_output
```

> As the comment says, the name of the `__init__` argument, `dimensions`, is important: it tells `carefree-learn` that you require the full `dimensions` information instead of a single `input_dim`.
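
As with `batch`, the idea is that only the arguments the constructor asks for (judged by name) get injected. The following is a sketch under that assumption, not the library's code: `Dims` is a hypothetical stand-in for the `Dimensions` object printed below, and `build_kwargs` is a hypothetical helper. The last toy class also shows why a module that avoids these names is left untouched.

```python
import inspect
from typing import Any, Dict, NamedTuple


class Dims(NamedTuple):
    # hypothetical stand-in for the `Dimensions` object
    merged_dim: int
    one_hot_dim: int
    embedding_dim: int
    numerical_dim: int


def build_kwargs(cls: type, dims: Dims, output_dim: int) -> Dict[str, Any]:
    # only inject what the constructor actually asks for, judged by argument names
    wanted = inspect.signature(cls.__init__).parameters
    available = {"dimensions": dims, "input_dim": dims.merged_dim, "output_dim": output_dim}
    return {k: v for k, v in available.items() if k in wanted}


class WantsInputDim:
    def __init__(self, input_dim, output_dim): ...


class WantsDimensions:
    def __init__(self, dimensions, output_dim): ...


class WantsNothing:
    def __init__(self, hidden=16): ...


dims = Dims(merged_dim=26, one_hot_dim=13, embedding_dim=8, numerical_dim=5)
model = WantsDimensions(**build_kwargs(WantsDimensions, dims, 2))
```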

We can run it and see if it works as expected:

```python
m = cflearn.api.fit_ml(
    x,
    y,
    core_name="my_wide_and_deep",
    output_dim=out_dim,
    is_classification=True,
    # encoding settings
    encoding_settings=encoding_settings,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
> Dimensions(
    merged_dim    = 26
    one_hot_dim   = 13
    embedding_dim = 8
    numerical_dim = 5
)
Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel
  Encoder                                    [-1, 8]                      [[-1, 13], [-1, 8]]                   56
    ModuleDict-1
      OneHot                                    [-1]                                 [-1, 13]                    0
        Lambda                                  [-1]                                 [-1, 13]                    0
    ModuleDict-0
```

Bravo! Everything works like a charm! 🥳

### Custom Configurations

So far we've introduced the 'dimension system' of ML Models in `carefree-learn`, but you might also want to use custom hyper-parameters in your own models. For example, to make `use_bias` configurable in a linear model:

```python
@cflearn.register_ml_module("my_linear3", allow_duplicate=False)
class MyLinear3(nn.Module):
    def __init__(self, input_dim, output_dim, *, use_bias: bool):
        super().__init__()
        self.net = nn.Linear(input_dim, output_dim, bias=use_bias)

    def forward(self, net):
        return self.net(net)
```

If you use `my_linear3` directly without specifying configurations, `carefree-learn` will throw an error:

```python
try:
    m = cflearn.api.fit_ml(
        x,
        y,
        core_name="my_linear3",
        output_dim=out_dim,
        is_classification=True,
        # debug setting, indicating that we only train for one step
        fixed_steps=1,
    )
except TypeError as err:
    print(err)
```

```
__init__() missing 1 required keyword-only argument: 'use_bias'
```

As the error indicates, `use_bias` is missing when initializing the `MyLinear3` module. To fix it, we can pass a `core_config` to the `fit_ml` API:

```python
m = cflearn.api.fit_ml(
    x,
    y,
    core_name="my_linear3",
    # Add This!
    core_config=dict(use_bias=False),
    output_dim=out_dim,
    is_classification=True,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
MLModel
  _
    MyLinear3                                [-1, 8]                                  [-1, 2]                   16
      Linear                                 [-1, 8]                                  [-1, 2]                   16
Total params: 16
Trainable params: 16
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
------
```

As shown above, the `Linear` layer has only `16` trainable parameters (`8 * 2` weights and no bias terms), which means `bias` has indeed been disabled.

> We put `use_bias` after a `*` to make it a keyword-only argument. This is not required, but it is generally recommended because:
> - It makes your module easier to understand when others use it.
> - It separates `carefree-learn`'s 'dimension system' from your own custom configurations.
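
A sketch of the idea, assuming `core_config` is simply unpacked as keyword arguments next to the dimension-system arguments. `build_core` and `LinearSketch` are hypothetical; `LinearSketch` mirrors `MyLinear3` in plain Python so the sketch runs without torch. Because of the `*`, `use_bias` can only arrive by name, so it can never be confused with a positional dimension argument.

```python
from typing import Any, Dict, Optional


class LinearSketch:
    """Plain-Python stand-in for `MyLinear3`."""

    def __init__(self, input_dim: int, output_dim: int, *, use_bias: bool):
        # same parameter count as `nn.Linear(input_dim, output_dim, bias=use_bias)`
        self.num_params = input_dim * output_dim + (output_dim if use_bias else 0)


def build_core(
    cls: type,
    input_dim: int,
    output_dim: int,
    core_config: Optional[Dict[str, Any]] = None,
) -> Any:
    # hypothetical: dimension-system arguments go in positionally,
    # everything in `core_config` goes in as keyword arguments
    return cls(input_dim, output_dim, **(core_config or {}))


missing_error = ""
try:
    build_core(LinearSketch, 8, 2)
except TypeError as err:
    missing_error = str(err)  # names the missing keyword-only argument `use_bias`

core = build_core(LinearSketch, 8, 2, core_config=dict(use_bias=False))
print(core.num_params)  # 16
```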

## Other Models

Besides ML Models, customizing other models (e.g. CV models) with the `register_module` API is almost the same as writing a custom `nn.Module`. For example, let's build a simple image classification model from scratch:

```python
@cflearn.register_module("my_image_classifier", allow_duplicate=True)
class MyImageClassifier(nn.Module):
    def __init__(self, in_channels, img_size, output_dim):
        super().__init__()
        flat_dim = in_channels * img_size ** 2
        self.net = nn.Sequential(
            nn.Linear(flat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, output_dim),
        )

    def forward(self, net):
        # flatten the input first
        net = net.view(net.shape[0], -1)
        return self.net(net)
```

Unlike ML Models, for any other models, `carefree-learn` will not 'inject' any pre-defined arguments to your `nn.Module`, so it will be safe & clean! 😉

> In fact, even for ML Models, you can ignore the 'dimension system' completely! Just avoid using names like `in_dim` / `input_dim` / `out_dim` / `output_dim` and everything will be fine.

We can play around with the `my_image_classifier` model on the famous `MNIST` dataset with the `fit_cv` API:

```python
# get MNIST data with `cflearn`'s predefined API
data = cflearn.cv.MNISTData(batch_size=2, transform="to_tensor")
# use `fit_cv` API for training
cflearn.api.fit_cv(
    # This first argument passed to `fit_cv` is complicated and hard to explain briefly
    # So we will cover its details in another article (the `Data Recipes`)
    data,
    # this is the name of your model
    model_name="my_image_classifier",
    # these (and only these) settings will go into your model's __init__ method
    model_config={"in_channels": 1, "img_size": 28, "output_dim": 10},
    # these are some training settings
    loss_name="cross_entropy",
    metric_names="acc",
    # debug setting, indicating that we only use a small portion of data to do validation
    valid_portion=1.0e-5,
    # debug setting, indicating that we only train for one step
    fixed_steps=1,
)
```

```
Layer (type)                             Input Shape                             Output Shape    Trainable Param #
------------------------------------------------------------------------------------------------------------------------
_
  MyImageClassifier                  [-1, 1, 28, 28]                                 [-1, 10]              101,770
    Sequential                             [-1, 784]                                 [-1, 10]              101,770
      Linear-0                             [-1, 784]                                [-1, 128]              100,480
      ReLU                                 [-1, 128]                                [-1, 128]                    0
      Linear-1                             [-1, 128]                                 [-1, 10]                1,290
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
--

<cflearn.api.cv.pipeline.CarefreePipeline at 0x21a6fcc2940>
```
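
The parameter counts in this summary follow directly from the architecture: the single-channel `28 x 28` image is flattened into `784` features before the two `Linear` layers, each of which holds `in * out` weights plus `out` biases. A quick arithmetic check:

```python
in_channels, img_size, hidden, num_classes = 1, 28, 128, 10

flat_dim = in_channels * img_size ** 2       # 784, the flattened MNIST image
first = flat_dim * hidden + hidden           # 100,480 (weights + bias)
second = hidden * num_classes + num_classes  # 1,290
total = first + second                       # 101,770
```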

You might notice that the naming is slightly different from ML Models:
- `core_name` -> `model_name`
- `core_config` -> `model_config`

This is because ML Models will 'wrap' customized models under the `MLModel`, which means they serve as the `core` of `MLModel`. That's why we use `core_name` & `core_config`. But for other situations, the customized models will be left as-is, so we use `model_name` & `model_config`.

# Complex Models

# Appendix

## ML Encodings

What makes Machine Learning tasks different from other tasks is that some of the data-preprocessing methods are 'trainable' (e.g. embedding). In this case, we need to integrate these methods into our models rather than keep them in a separate preprocessing step.

It is OK to implement these methods in our custom models every time we need them, but that leads to a **LOT** of boilerplate code. `carefree-learn` therefore extracts them into an `Encoder` module, and exposes its settings in the APIs so that you can utilize it easily.

The definitions related to the `Encoder` are pretty simple:

```python
from typing import Any, Dict, List, NamedTuple, Optional, Union

import torch.nn as nn

class EncodingSettings(NamedTuple):
    dim: int
    methods: Union[str, List[str]] = "embedding"
    method_configs: Optional[Dict[str, Any]] = None

class Encoder(nn.Module):
    def __init__(
        self,
        settings: Dict[int, EncodingSettings],
        *,
        # there are a few more kwargs here, which will be covered in the `Optimizations` section
        ...
    ):
        ...
```

The `settings` argument of the `Encoder` is what we should mainly pay attention to: it maps each column index to its corresponding `EncodingSettings`.

For example, if:
- Our input features, `x`, have 10 columns, so the column indices are [0, 1, 2, ..., 9].
- The `0`th, `5`th & `7`th columns are categorical, with `5`, `7` & `11` unique values respectively.
- We want to apply `one_hot` to the `5`th & `7`th columns.
- We want to apply `embedding` to the `0`th & `5`th columns.

Then the corresponding setup should be:

```python
from cflearn.models.ml.encoders import Encoder
from cflearn.models.ml.encoders import EncodingSettings

Encoder(
    {
        0: EncodingSettings(5),
        5: EncodingSettings(7, ["one_hot", "embedding"]),
        7: EncodingSettings(11, "one_hot"),
    },
)
```

```
Encoder(
  (embeddings): ModuleDict(
    (-1): Embedding(
      (core): Lambda(embedding: 12 -> 4)
    )
  )
  (one_hot_encoders): ModuleDict(
    (5): OneHot(
      (core): Lambda(one_hot_7)
    )
    (7): OneHot(
      (core): Lambda(one_hot_11)
    )
  )
  (embedding_dropout): Dropout(p=0.2, inplace=False)
)
```
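
To double-check which columns end up in each part of the printed `Encoder`, here is a small plain-Python sketch that groups the settings by method. `Settings` is a hypothetical mirror of `EncodingSettings` (only `dim` and `methods`, with `"embedding"` as the default, as in the definition above), and `split_columns` is a hypothetical helper, not part of the library.

```python
from typing import Dict, List, NamedTuple, Union


class Settings(NamedTuple):
    # hypothetical mirror of `EncodingSettings`, restated so the sketch is self-contained
    dim: int
    methods: Union[str, List[str]] = "embedding"


def split_columns(settings: Dict[int, Settings]) -> Dict[str, Dict[int, int]]:
    # group column indices by encoding method, keeping each column's number of unique values
    groups: Dict[str, Dict[int, int]] = {"one_hot": {}, "embedding": {}}
    for col, setting in settings.items():
        methods = [setting.methods] if isinstance(setting.methods, str) else setting.methods
        for method in methods:
            groups[method][col] = setting.dim
    return groups


groups = split_columns({
    0: Settings(5),
    5: Settings(7, ["one_hot", "embedding"]),
    7: Settings(11, "one_hot"),
})
print(groups["one_hot"])                  # {5: 7, 7: 11}
print(groups["embedding"])                # {0: 5, 5: 7}
print(sum(groups["embedding"].values()))  # 12
```

The one-hot group matches the two `OneHot` encoders, and the embedding group adds up to the `12` rows of the single `Embedding` module.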

You might notice that the `one_hot_encoders` part is pretty straightforward: we initialized a `ModuleDict` which maps each column index to its corresponding `OneHot` encoder, and each dimension matches the number of unique values exactly. The `embeddings` part, however, is a little weird: we only initialized one `Embedding` module, and its dimension is `12`, which is exactly `5 + 7`, the sum of the numbers of unique values.

This is due to a special mechanism in `carefree-learn`: the `fast_embedding` mechanism. We will cover the details in the [next section](#Optimizations); for now, let's see how to disable this mechanism and make our `Encoder` look 'normal':

```python
Encoder(
    {
        0: EncodingSettings(5),
        5: EncodingSettings(7, ["one_hot", "embedding"]),
        7: EncodingSettings(11, "one_hot"),
    },
    config={
        "use_fast_embedding": False,
    }
)
```

```
Encoder(
  (embeddings): ModuleDict(
    (0): Embedding(
      (core): Lambda(embedding: 5 -> 4)
    )
    (5): Embedding(
      (core): Lambda(embedding: 7 -> 4)
    )
  )
  (one_hot_encoders): ModuleDict(
    (5): OneHot(
      (core): Lambda(one_hot_7)
    )
    (7): OneHot(
      (core): Lambda(one_hot_11)
    )
  )
  (embedding_dropout): Dropout(p=0.2, inplace=False)
)
```

Great! Now the `embeddings` part looks just like the `one_hot_encoders` part: we initialized a `ModuleDict` which maps each column index to its corresponding `Embedding` encoder, and each dimension matches the number of unique values exactly.
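
How can one `12`-row table replace the two separate `5`-row and `7`-row tables? The actual details of `fast_embedding` are not shown on this page, so the following is only an assumption about the underlying idea. A common trick is to shift each column's category indices by an offset, so that all columns share one table without colliding. In PyTorch terms, the shared table would be a single `nn.Embedding(12, 4)`; the sketch below only computes the shifted indices in plain Python, and `embedding_columns` and `shared_indices` are hypothetical names.

```python
from itertools import accumulate
from typing import Dict, List

# columns that use `embedding`, mapped to their number of unique values (from the example)
embedding_columns: Dict[int, int] = {0: 5, 5: 7}

# each column gets an offset into one shared table of 5 + 7 = 12 rows
sizes = list(embedding_columns.values())
offsets = [0] + list(accumulate(sizes))[:-1]  # [0, 5]
table_rows = sum(sizes)                       # 12


def shared_indices(row: Dict[int, int]) -> List[int]:
    # shift every column's local category index by that column's offset
    return [row[col] + off for col, off in zip(embedding_columns, offsets)]


print(shared_indices({0: 4, 5: 0}))  # [4, 5]: the two columns never collide
print(shared_indices({0: 0, 5: 6}))  # [0, 11]
```

With this layout, a single lookup serves every embedding column at once, which is presumably where the 'fast' comes from.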

### Optimizations