# Config File Structure

To enable debug features, create a `config.yaml` file that specifies the desired behavior, for example which GEMMs (General Matrix Multiply operations) should run in higher precision instead of FP8 and which statistics should be logged. This page describes how to structure that file.

## General Format

A config file can have one or more sections, each containing settings for specific layers and features:

```yaml
section_name_1:
  enabled: ...
  layers:
    # Specify layers here...
  transformer_engine:
    Feature1Name:
      enabled: ...
      # Feature details...
    Feature2Name:
      enabled: ...
      # Feature details...

section_name_2:
  enabled: ...
  layers:
    # Specify layers here...
  transformer_engine:
    Feature1Name: # If feature has no namespace, then it is in default namespace.
      enabled: ...
      # Feature details...

section_name_3:
  enabled: ...
  layers:
    # Specify layers here...
  transformer_engine:
    Feature1Name:
      enabled: ...
      # Feature details...
    Feature2Name:
      enabled: ...
      # Feature details...
```

Each section can have any name and must contain:

1. An `enabled` field that specifies whether the features in that section will be active.
2. A `layers` field specifying which layers the section applies to. Each layer can belong to only one section.
3. Additional fields describing features for those layers.
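
The requirements above can be checked mechanically. The following is a minimal sketch, not part of Transformer Engine: `check_sections` is a hypothetical helper that takes the config as an already-parsed dictionary (for example, the result of loading `config.yaml` with a YAML parser) and reports sections that lack `enabled` or `layers`.

```python
REQUIRED_FIELDS = ("enabled", "layers")  # fields every section must contain


def check_sections(config: dict) -> dict:
    """Return {section_name: [missing fields]} for incomplete sections.

    Hypothetical helper for illustration; it is not a Transformer Engine API.
    """
    problems = {}
    for name, section in config.items():
        missing = [field for field in REQUIRED_FIELDS if field not in section]
        if missing:
            problems[name] = missing
    return problems


# Parsed form of a small config; feature bodies are left empty on purpose.
config = {
    "section_name_1": {
        "enabled": True,
        "layers": {"layer_types": ["fc1"]},
        "transformer_engine": {},
    },
    "section_name_2": {
        "enabled": True,
        "transformer_engine": {},  # no `layers` field
    },
}

print(check_sections(config))  # {'section_name_2': ['layers']}
```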

## Layer Specification

Debug layers can be identified by a `debug_name` parameter:

```python
linear = transformer_engine.debug.pytorch.Linear(in_features, out_features, debug_name="linear1")
```

This name is used in the config file to identify the layer. To specify the `layers` field, you can use one of the following methods:

1. **`layer_name_regex_pattern`**: Use a regex to match layer names.
2. **`layer_types`**: Provide a list of strings, where a layer will be selected if any string matches part of its name.

Examples:

```yaml
# Example 1: Using regex to select layers
my_section:
  enabled: ...
  layers:
    layer_name_regex_pattern: 'self_attn.*'
  transformer_engine:
    (...)

# Example 2: Using layer type to select layers
another_section:
  enabled: ...
  layers:
    layer_types: ['fc1', 'layernorm_linear']
  transformer_engine:
    (...)
```
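
To make the two selection methods concrete, here is a rough sketch of how a `layers` entry could be evaluated against a layer name. It is an illustration only: `layer_selected` is a hypothetical function, the use of `re.search` (a match anywhere in the name) is an assumption, and the layer names are examples. The actual matching logic in the library may differ.

```python
import re


def layer_selected(layer_name: str, layers_spec: dict) -> bool:
    """Decide whether a `layers` entry selects `layer_name`.

    Assumptions for this sketch: the regex is applied with `re.search`,
    and `layer_types` entries are treated as plain substrings of the name.
    """
    pattern = layers_spec.get("layer_name_regex_pattern")
    if pattern is not None and re.search(pattern, layer_name):
        return True
    types = layers_spec.get("layer_types", [])
    return any(t in layer_name for t in types)


# Example layer names (illustrative).
names = [
    "transformer_layer.self_attn.layernorm_linear_qkv",
    "transformer_layer.self_attn.proj",
    "transformer_layer.layernorm_mlp.fc1",
    "transformer_layer.layernorm_mlp.fc2",
]

by_regex = {"layer_name_regex_pattern": "self_attn.*"}
by_type = {"layer_types": ["fc1", "layernorm_linear"]}

regex_hits = [n for n in names if layer_selected(n, by_regex)]
type_hits = [n for n in names if layer_selected(n, by_type)]
print(regex_hits)  # the two self_attn layers
print(type_hits)   # layernorm_linear_qkv and fc1
```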

## Names in Transformer Layers

The `TransformerLayer` in Transformer Engine is a composition of multiple sub-layers. The precision debug tools can modify those sub-layers that contain exactly one linear layer. To see the names of all such layers, inspect the log files. For instance, a `TransformerLayer` named `transformer_layer` might consist of:

- `transformer_layer.self_attn.layernorm_linear_qkv` / `transformer_layer.self_attn.linear_qkv` / `transformer_layer.self_attn.layernorm_linear_q` / `transformer_layer.self_attn.linear_q` / `transformer_layer.self_attn.linear_kv`,
- `transformer_layer.self_attn.proj`,
- `transformer_layer.inter_attn.*` for `layer_type="decoder"`,
- `transformer_layer.layernorm_mlp.fc1`,
- `transformer_layer.layernorm_mlp.fc2`,

depending on the configuration. Some layers, like `LayerNormLinear`, are fusions of two layers: `LayerNorm` and `Linear`. When referring to such layers in precision debug tools, only the `Linear` part is affected.

Below is an example `TransformerLayer` with four linear layers that can be influenced by the precision debug tools.

<figure align="center">
<img src="./img/names.svg" style="width:50%">
<figcaption> Fig 1: Names of layers in an example configuration of TransformerLayer. The most nested blocks represent the most basic layers, each containing one linear layer. Layers that do not contain linear layers, such as <code>DotProductAttention</code>, are omitted. </figcaption>
</figure>

**Configuration File Example**

```yaml
# Disables FP8 for the wgrad GEMM in all 4 linear layers
section1:
  enabled: True
  layers:
    layer_types: [transformer_layer]
  transformer_engine:
    DisableFp8Gemm:
      enabled: True
      gemms: [wgrad]

# Disables FP8 for all GEMMs in the layernorm_mlp layer
section2:
  enabled: True
  layers:
    layer_types: [layernorm_mlp]
  transformer_engine:
    DisableFp8Layer:
      enabled: True
  
# Logs wgrad stats in fc1
section3:
  enabled: True
  layers:
    layer_types: [fc1]
  transformer_engine:
    LogTensorStats:
      enabled: True
      stats: [min]
      tensors: [wgrad]
      freq: 1
      start_step: 0
      end_step: 50
```


## `gemms_struct` and `tensors_struct`

Some features are parameterized by a list of tensors, a list of GEMMs, or both.
There are several ways to describe this parameterization.

The simplest is to pass plain lists:
```yaml
Feature:
  enabled: ...
  gemms: [gemm1, gemm2]
  tensors: [tensor1, tensor2]
  ...
```

Alternatively, `tensors_struct` lets each tensor carry its own parameters:
```yaml
Feature:
  gemms: [gemm1, gemm2]
  tensors_struct:
  - tensor: tensor1
    feature_param1: value
  - tensor: tensor2
    feature_param1: value
  gemm_feature_param1: value
```

Similarly, `gemms_struct` lets each GEMM carry its own parameters.

<div class="alert alert-info">

<b>Warning</b>

If structs are used for both tensors and GEMMs,
`tensors_struct` must be nested inside `gemms_struct`.

</div>


```yaml
Feature:
  enabled: ...
  gemms_struct:
    - gemm: gemm1
      tensors: [tensor1, tensor2]
      tensor_feature_param1: value
      gemm_feature_param1: value
    - gemm: gemm2
      tensors_struct:
      - tensor: tensor1
        tensor_feature_param1: value
      - tensor: tensor2
        tensor_feature_param2: value
      gemm_feature_param1: value
```
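
One way to read these forms is to flatten them into `(gemm, tensor, params)` triples. The sketch below does that for features parameterized by both GEMMs and tensors. It is not the library's parser: `expand_feature` is a hypothetical helper, and it assumes that parameters written next to `gemm:` apply to every tensor of that GEMM while parameters written next to `tensor:` apply only to that tensor.

```python
RESERVED = {"enabled", "gemms", "gemms_struct", "tensors", "tensors_struct", "gemm", "tensor"}


def own_params(entry: dict) -> dict:
    """Keep only the feature parameters of an entry, dropping structural keys."""
    return {k: v for k, v in entry.items() if k not in RESERVED}


def tensor_entries(entry: dict) -> list:
    """Return tensors as a list of dicts, whichever form was used."""
    if "tensors_struct" in entry:
        return entry["tensors_struct"]
    return [{"tensor": t} for t in entry.get("tensors", [])]


def expand_feature(feature: dict) -> list:
    """Flatten a feature config into (gemm, tensor, params) triples (assumed semantics)."""
    if "gemms_struct" in feature:
        gemm_entries = feature["gemms_struct"]
    else:
        # List form: every GEMM shares the feature-level tensors and parameters.
        shared = {k: v for k, v in feature.items() if k not in ("gemms", "enabled")}
        gemm_entries = [{"gemm": g, **shared} for g in feature.get("gemms", [])]

    triples = []
    for gemm_entry in gemm_entries:
        for tensor_entry in tensor_entries(gemm_entry):
            params = {**own_params(gemm_entry), **own_params(tensor_entry)}
            triples.append((gemm_entry["gemm"], tensor_entry["tensor"], params))
    return triples


# Parsed form of the gemms_struct example above, with placeholder values.
feature = {
    "enabled": True,
    "gemms_struct": [
        {"gemm": "gemm1", "tensors": ["tensor1", "tensor2"],
         "tensor_feature_param1": "value", "gemm_feature_param1": "value"},
        {"gemm": "gemm2",
         "tensors_struct": [
             {"tensor": "tensor1", "tensor_feature_param1": "value"},
             {"tensor": "tensor2", "tensor_feature_param2": "value"},
         ],
         "gemm_feature_param1": "value"},
    ],
}

triples = expand_feature(feature)
for gemm, tensor, params in triples:
    print(gemm, tensor, sorted(params))
```

Under these assumptions the example expands to four triples, one per GEMM and tensor pair, which makes it easy to see which parameters end up attached to which tensor.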

## Enabling or Disabling Sections and Features

Debug features can be enabled or disabled with the `enabled` keyword:

```yaml
section1:
  enabled: True
  layers:
    layer_types: [self_attn]
  transformer_engine:
    LogTensorStats:
      enabled: False # Disables the LogTensorStats feature
      stats: [max, min, mean, std, l1_norm]

section2:
  enabled: False # Disables entire section2
  transformer_engine:
    LogFp8TensorStats:
      enabled: True
      stats: [underflows, overflows]
```
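
The interaction between the two levels of `enabled` can be summarized in a few lines. This is a sketch under the assumption that a feature is active only when both its section and the feature itself are enabled. `active_features` is a hypothetical helper that works on the parsed config, and the `layers` fields are omitted for brevity.

```python
def active_features(config: dict) -> list:
    """List (section, feature) pairs that are switched on (assumed semantics)."""
    active = []
    for section_name, section in config.items():
        if not section.get("enabled", False):
            continue  # a disabled section switches off everything inside it
        for feature_name, feature in section.get("transformer_engine", {}).items():
            if feature.get("enabled", False):
                active.append((section_name, feature_name))
    return active


# Parsed form of the example above (`layers` left out for brevity).
config = {
    "section1": {
        "enabled": True,
        "transformer_engine": {
            "LogTensorStats": {"enabled": False, "stats": ["max", "min"]},
        },
    },
    "section2": {
        "enabled": False,
        "transformer_engine": {
            "LogFp8TensorStats": {"enabled": True, "stats": ["underflows", "overflows"]},
        },
    },
}

before = active_features(config)
print(before)  # []

# Re-enabling the feature in the enabled section makes it active.
config["section1"]["transformer_engine"]["LogTensorStats"]["enabled"] = True
after = active_features(config)
print(after)  # [('section1', 'LogTensorStats')]
```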

Keeping each section scoped to a clear set of layers, and toggling sections and features through `enabled`, lets you switch debug features on and off without rewriting the rest of `config.yaml`.

