# Methods for Studying the Characteristic Properties of Neural Networks Using a Game-Theoretic Approach

- **Theory**: И.В.Томилов ivan-tomilov3@yandex.ru
- **Implementation**: М.А.Зайцева maria@tail-call.ru
- **CUDA support**: А.Е.Григорьева admin@linkennt.ru
- **Revision**: 9

- **Other revisions**:
  - 1 through 7: [Yandex Disk](https://disk.yandex.ru/d/aZozDpBlzh_z1A)
  - 8 and later: [GitHub releases page](https://github.com/LISA-ITMO/CGT4NN/releases)
<!-- please do not append text into this block -->

## TODO

> Gradient dropout: not every player receives a payoff every time

- [ ] More hidden layers!
    - [ ] 100-200 neurons per hidden layer

To this end, we introduce the class `AugmentedReLUNetworkMultilayer`, which extends
the `AugmentedReLUNetwork` interface with a `hidden_layers_count` parameter.
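
To make the effect of `hidden_layers_count` concrete, here is a minimal sketch of how the layer shapes could be derived from the constructor arguments. It assumes that every hidden layer has width `inner_layer_size` and that a single output layer follows; the helpers `layer_shapes` and `parameter_count` are hypothetical and not part of `cgtnnlib`, and the input width of 20 is an arbitrary illustration.

```python
def layer_shapes(inputs_count, outputs_count, inner_layer_size, hidden_layers_count):
    """Return (in_features, out_features) for each linear layer.

    Assumption: `hidden_layers_count` hidden layers, all of width
    `inner_layer_size`, followed by one output layer.
    """
    if hidden_layers_count < 1:
        raise ValueError("hidden_layers_count must be at least 1")
    widths = [inputs_count] + [inner_layer_size] * hidden_layers_count + [outputs_count]
    return list(zip(widths[:-1], widths[1:]))


def parameter_count(shapes):
    """Total number of weights and biases over all linear layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# Illustrative values only: 20 input features, 1 output, 3 hidden layers of 150.
shapes = layer_shapes(
    inputs_count=20,
    outputs_count=1,
    inner_layer_size=150,
    hidden_layers_count=3,
)
print(shapes)
print(parameter_count(shapes))
```

Under these assumptions, three hidden layers give four linear layers, and almost all parameters sit in the two 150-by-150 hidden-to-hidden layers.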

- [ ] Pick a dataset with poor performance
- [ ] Apply gradient dropout to half of the network and to the whole network
- [ ] Vary the noise from 0 to 1 (rather than 2)
- [x] $\alpha \in \{ 1, 1.12, 2 \}$
- [ ] For classification tasks, replace class $q$ with class $r$ with a given probability, and use real-valued noise only for regression.
- [ ] Look up which architectures were used to solve MNIST character recognition with fully connected networks
- [ ] Use sMAPE instead of $R^2$
- [ ] Use *He* initialization instead of *Xavier*
  - [ ] Compare
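
The label-noise item in the list above (replace class $q$ with class $r$ with a given probability) could be sketched as follows. This is an illustration only: the function `flip_labels` is hypothetical, and the choice of $r$ uniformly among the other classes is an assumption, since the list does not say how $r$ is picked.

```python
import random


def flip_labels(labels, classes, flip_probability, rng):
    """Replace each label q with a different class r with the given probability.

    Assumption: r is drawn uniformly from the classes other than q.
    """
    if not 0.0 <= flip_probability <= 1.0:
        raise ValueError("flip_probability must be in [0, 1]")
    noisy = []
    for q in labels:
        others = [c for c in classes if c != q]
        if others and rng.random() < flip_probability:
            noisy.append(rng.choice(others))
        else:
            noisy.append(q)
    return noisy


rng = random.Random(0)
labels = [0, 1, 2, 1, 0, 2] * 1000
noisy = flip_labels(labels, classes=[0, 1, 2], flip_probability=0.3, rng=rng)
changed = sum(a != b for a, b in zip(labels, noisy)) / len(labels)
print(f"fraction of flipped labels: {changed:.3f}")
```

Because a flipped label is always different from the original, the fraction of changed labels should be close to `flip_probability`.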

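For the sMAPE item in the list above, a minimal sketch of the metric is given here. Several definitions of sMAPE are in use; this one assumes the variant $\frac{100}{n}\sum \frac{2\,|F_t - A_t|}{|A_t| + |F_t|}$, which ranges from 0 to 200, and treats a term with a zero denominator as 0. The function name `smape` is an illustrative choice, not a `cgtnnlib` API.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumption: terms where both values are zero contribute 0.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    total = 0.0
    for a, f in zip(actual, predicted):
        denominator = abs(a) + abs(f)
        if denominator > 0:
            total += 2.0 * abs(f - a) / denominator
    return 100.0 * total / len(actual)


print(smape([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```

Unlike $R^2$, this metric is bounded and does not depend on the variance of the targets.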

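For the *He* versus *Xavier* item in the list above, the two schemes differ in the standard deviation of the initial weights. The sketch below compares the normal-distribution variants, assuming the usual formulas $\sqrt{2 / \text{fan\_in}}$ for He with ReLU and $\sqrt{2 / (\text{fan\_in} + \text{fan\_out})}$ for Xavier with gain 1; the helper names are hypothetical.

```python
import math


def he_normal_std(fan_in):
    """Standard deviation of He (Kaiming) normal initialization for ReLU."""
    return math.sqrt(2.0 / fan_in)


def xavier_normal_std(fan_in, fan_out):
    """Standard deviation of Xavier (Glorot) normal initialization, gain 1."""
    return math.sqrt(2.0 / (fan_in + fan_out))


# A square hidden layer of the size used in this notebook.
fan_in = fan_out = 150
print(f"He:     {he_normal_std(fan_in):.4f}")
print(f"Xavier: {xavier_normal_std(fan_in, fan_out):.4f}")
```

For a square layer the He standard deviation is larger by a factor of $\sqrt{2}$, which compensates for ReLU zeroing roughly half of the activations.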

## Setup

In [1]:
from cgtnnlib.constants import LEARNING_RATE, RANDOM_STATE
import cgtnnlib.training as tr
import cgtnnlib.datasets as ds
from cgtnnlib.NoiseGenerator import target_dispersion_scaled_noise, stable_noise, no_noise_generator

iterations = 10
epochs = 10
pp = [0.0, 0.5, 0.9]
inner_layer_size = 150

# datasets = ds.datasets

datasets = [
    # ds.datasets[0], # 1
    ds.datasets['StudentPerformanceFactors'], # 3
    # ds.datasets['allhyper'], # 4
    # ds.datasets['wine_quality'], # 6
]

ng_makers = [
    lambda _: no_noise_generator,
    lambda dataset: target_dispersion_scaled_noise(
        dataset=dataset,
        factor=0.03,
        random_seed=RANDOM_STATE + 1,
    ),
    lambda dataset: stable_noise(
        dataset=dataset,
        factor=0.03,
        alpha=1,
        beta=0,
    ),
    lambda dataset: stable_noise(
        dataset=dataset,
        factor=0.03,
        alpha=1.12,
        beta=0,
    ),
    lambda dataset: stable_noise(
        dataset=dataset,
        factor=0.03,
        alpha=2.0,
        beta=1,  # at alpha=2 the stable law is Gaussian, so beta has no effect
    ),
]

TORCH_DEVICE is cpu
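
The stable noise generators above are parameterized by `alpha` and `beta`. As a rough illustration of what such noise looks like, the sketch below draws standard symmetric stable variates (`beta = 0`) with the Chambers, Mallows and Stuck (1976) method. It is not the `cgtnnlib` implementation: the function name is hypothetical, the skewed case is omitted, and the scaling by `factor` is not modeled.

```python
import math
import random
import statistics


def symmetric_stable_sample(alpha, rng):
    """Draw one standard symmetric alpha-stable variate (beta = 0)."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must be in (0, 2]")
    # V is uniform on (-pi/2, pi/2), W is exponential with mean 1.
    v = math.pi * (rng.random() - 0.5)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(v)  # Cauchy case
    return (
        math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
        * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha)
    )


rng = random.Random(0)
for alpha in (1.0, 1.12, 2.0):
    samples = [symmetric_stable_sample(alpha, rng) for _ in range(10000)]
    median_abs = statistics.median(abs(x) for x in samples)
    print(f"alpha={alpha}: median |x| = {median_abs:.3f}, max |x| = {max(map(abs, samples)):.1f}")
```

With `alpha = 2` the samples are Gaussian with variance 2, while smaller `alpha` gives heavier tails, which shows up as much larger extreme values.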


## Training

### Model B

- both datasets below together take ~5m 06s to train 10 iterations
- both datasets with both noise generators: ~10m

<hr>

- `ds.datasets['StudentPerformanceFactors']` takes ~2m 30s to train 10 iterations
- `ds.datasets['wine_quality']` takes ~2m 36s to train 10 iterations

<hr>

- `ds.datasets['allhyper']` takes ~36m to train on all noise generators

### Model B*

~20m for dataset #3 with 5 `ng_makers` and 3 values of `p`.
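
As a rough check of the timing above, the size of the run grid can be enumerated directly. The sketch takes the counts from the Setup cell (5 noise generator makers, 3 values of `p`, 10 iterations) and treats the ~20 minutes as an approximate total.

```python
from itertools import product

noise_generators_count = 5
pp = [0.0, 0.5, 0.9]
iterations = 10
total_minutes = 20  # approximate wall-clock time reported above

# One training run per (noise generator, p, iteration) combination.
runs = list(product(range(noise_generators_count), pp, range(iterations)))
seconds_per_run = total_minutes * 60 / len(runs)
print(f"{len(runs)} runs, about {seconds_per_run:.0f} s per run")
```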

In [2]:
import os

from cgtnnlib.Report import Report
from cgtnnlib.nn.AugmentedReLUNetworkMultilayer import AugmentedReLUNetworkMultilayer

for dataset in datasets:
    for ng_maker in ng_makers:
        for p in pp:
            noise_generator = ng_maker(dataset)
            # One directory per (dataset, p, noise generator) combination.
            run_name = f'dataset{dataset.number}_p{p}_noise{noise_generator.name}'
            os.makedirs(f'rev9/{run_name}/', exist_ok=True)
            for iteration in range(iterations):
                report = Report(
                    dir='rev9',
                    filename=f'{run_name}/report.json'
                )
                tr.super_train_model(
                    make_model=lambda: AugmentedReLUNetworkMultilayer(
                        inputs_count=dataset.features_count,
                        outputs_count=dataset.classes_count,
                        p=p,
                        inner_layer_size=inner_layer_size,
                        hidden_layers_count=3,
                    ),
                    model_path=f'rev9/{run_name}/{iteration}.pth',
                    dataset=dataset,
                    report=report,
                    epochs=epochs,
                    learning_rate=LEARNING_RATE,
                    dry_run=False,
                    iteration=iteration,
                    noise_generator=noise_generator,
                )

N=9 #3 gStable3A2.0B1F0.03 p=0.9 E9/10 S436 Loss=0.5415@AugmentedReLUNetworkMultilayer
create_and_train_model(): saved model to rev9/dataset3_p0.9_noiseStable3A2.0B1F0.03/9.pth
Report saved to rev9/dataset3_p0.9_noiseStable3A2.0B1F0.03/report.json.


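Error encountered during training. The message comes from the bounds check in PyTorch's NLL loss (source linked below), which requires every target to satisfy `0 <= target < n_classes`:

```text
IndexError: Target 106 is out of bounds.

        TORCH_CHECK_INDEX(
            cur_target >= 0 && cur_target < n_classes,
            "Target ",
            cur_target,
            " is out of bounds.");
```

https://github.com/pytorch/pytorch/blob/4106aa33eb2946bf67f4ffd2ad9f2dcb52ed2384/aten/src/ATen/native/LossNLL.cpp#L192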
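
An error of this kind can be caught before training by checking the targets against the number of model outputs. The sketch below is a generic illustration with hypothetical helper names, not a description of what went wrong in this particular run; remapping labels to contiguous indices is one possible remedy, under the assumption that the labels are arbitrary integer codes.

```python
def out_of_bounds_targets(targets, n_classes):
    """Return the sorted distinct target labels outside [0, n_classes)."""
    return sorted({t for t in targets if not 0 <= t < n_classes})


def remap_to_contiguous(targets):
    """Map arbitrary labels to 0..K-1 and return (new_targets, mapping)."""
    mapping = {label: index for index, label in enumerate(sorted(set(targets)))}
    return [mapping[t] for t in targets], mapping


targets = [0, 3, 106, 7, 3]
print(out_of_bounds_targets(targets, n_classes=10))

remapped, mapping = remap_to_contiguous(targets)
print(remapped, mapping)
print(out_of_bounds_targets(remapped, n_classes=len(mapping)))
```

After remapping, the number of model outputs has to equal the number of distinct labels, and the same mapping has to be applied at evaluation time.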

## Evaluation & Analysis

Run this cell for eval, or the next one for loss.
For everything to render, it needs to be run twice.

In [None]:
## Analysis

import json

import pandas as pd

import matplotlib.pyplot as plt

from cgtnnlib.LearningTask import is_classification_task
from cgtnnlib.analyze import plot_deviant_curves_on_ax_or_plt
from cgtnnlib.constants import NOISE_FACTORS
from cgtnnlib.evaluate import eval_report_at_path
from cgtnnlib.nn.AugmentedReLUNetwork import AugmentedReLUNetwork



def read_json(path: str) -> dict:
    with open(path) as file:
        return json.load(file)

def summarize_series_list(series_list: list[pd.Series]):
    df = pd.DataFrame(series_list).T

    summary_df = pd.DataFrame({
        0.25: df.quantile(0.25, axis=1),
        0.75: df.quantile(0.75, axis=1),
        'mean': df.mean(axis=1),
    })

    return summary_df


def make_ax_drawer(
    read_json,
    dataset,
    ng_maker,
    p,
):
    noise_generator = ng_maker(dataset)
    prefix = (
        f'cgtnn-{dataset.number}Y-AugmentedReLUNetwork'
        +f'-g{noise_generator.name}-P{p}_'
    )

    def report_path(n):
        return f'pth/{prefix}N{n}.json'

    def model_path(n):
        return f'pth/{prefix}N{n}.pth'

    def read_eval_from_iteration(n) -> pd.DataFrame:
        path = report_path(n)
        eval_report_at_path(
                    report_path=path,
                    model_path=model_path(n),
                    constructor=AugmentedReLUNetwork,
                    dataset=dataset,
                    p=p,
                )
        print('read_eval_from_iteration', path, n)
        return pd.DataFrame(read_json(path)['eval'])
    
    def read_loss_from_iteration(n) -> pd.DataFrame:
        path = report_path(n)
        report = read_json(path)  # named `report` to avoid shadowing the `json` module
        return pd.DataFrame(report['loss'])

    if is_classification_task(dataset.learning_task):
        metric = 'roc_auc'
    else:
        metric = 'mse'

    files = [
        read_eval_from_iteration(n)
        for n in range(iterations)
    ]
            
    print(report_path(0))

    curve = summarize_series_list([file[metric] for file in files])

    def draw_ax(ax):
        plot_deviant_curves_on_ax_or_plt(
            ax_or_plt=ax,
            models=[{
                'curve': curve,
                'color': 'purple',
                'label': 'Mean',
                'quantiles_color': 'pink',
                'quantiles_label': 'Quartiles 0.25, 0.75',
            }],
            X=NOISE_FACTORS,
            title='\n'.join([
                f'{noise_generator.name}, p = {p}',
            ]),
            xlabel='Input noise',
            ylabel=metric,
            quantiles_alpha=0.5,
        )
    
    return draw_ax

ax_drawers = [
    [
        [make_ax_drawer(read_json, dataset, ng_maker, p) for p in pp]
        for ng_maker in ng_makers
    ]
    for dataset in datasets
]


In [None]:
## Analysis

import json

import pandas as pd

import matplotlib.pyplot as plt

from cgtnnlib.LearningTask import is_classification_task
from cgtnnlib.analyze import plot_deviant_curves_on_ax_or_plt
from cgtnnlib.constants import NOISE_FACTORS
from cgtnnlib.evaluate import eval_report_at_path
from cgtnnlib.nn.AugmentedReLUNetwork import AugmentedReLUNetwork



def read_json(path: str) -> dict:
    with open(path) as file:
        return json.load(file)

def summarize_series_list(series_list: list[pd.Series]):
    df = pd.DataFrame(series_list).T

    summary_df = pd.DataFrame({
        0.25: df.quantile(0.25, axis=1),
        0.75: df.quantile(0.75, axis=1),
        'mean': df.mean(axis=1),
    })

    return summary_df


def make_ax_drawer(
    read_json,
    dataset,
    ng_maker,
    p,
):
    noise_generator = ng_maker(dataset)
    prefix = (
        f'cgtnn-{dataset.number}Y-AugmentedReLUNetwork'
        +f'-g{noise_generator.name}-P{p}_'
    )

    def report_path(n):
        return f'pth/{prefix}N{n}.json'

    def model_path(n):
        return f'pth/{prefix}N{n}.pth'

    def read_eval_from_iteration(n) -> pd.DataFrame:
        path = report_path(n)
        eval_report_at_path(
                    report_path=path,
                    model_path=model_path(n),
                    constructor=AugmentedReLUNetwork,
                    dataset=dataset,
                    p=p,
                )
        print('read_eval_from_iteration', path, n)
        return pd.DataFrame(read_json(path)['eval'])
    
    def read_loss_from_iteration(n) -> pd.DataFrame:
        path = report_path(n)
        report = read_json(path)  # named `report` to avoid shadowing the `json` module
        return pd.DataFrame({ 'loss': report['loss'] })

    metric = 'loss'

    files = [
        read_loss_from_iteration(n)
        for n in range(iterations)
    ]
            
    print(f'Processing {report_path(0)}...')

    curve = summarize_series_list([file[metric] for file in files])

    def draw_ax(ax):
        plot_deviant_curves_on_ax_or_plt(
            ax_or_plt=ax,
            models=[{
                'curve': curve,
                'color': 'purple',
                'label': 'Mean',
                'quantiles_color': 'pink',
                'quantiles_label': 'Quartiles 0.25, 0.75',
            }],
            X=curve.index,
            title='\n'.join([
                f'{noise_generator.name}, p = {p}',
            ]),
            xlabel='Iteration',
            ylabel=metric,
            quantiles_alpha=0.5,
        )
    
    return draw_ax

ax_drawers = [
    [
        [make_ax_drawer(read_json, dataset, ng_maker, p) for p in pp]
        for ng_maker in ng_makers
    ]
    for dataset in datasets
]

print(ax_drawers)

In [None]:
(nrows, ncols) = (
    len(ax_drawers[0]),
    len(ax_drawers[0][0]),
)

(nrows, ncols)

In [None]:
for i, dataset in enumerate(datasets):
    fig, axes = plt.subplots(nrows, ncols, figsize=(15, 40))

    for j in range(nrows):
        for k in range(ncols):
            ax_drawers[i][j][k](axes[j, k])
    
    fig.suptitle(f'Dataset #{dataset.number}: {dataset.name}\nAugmentedReLUNetwork', fontsize=16)
    plt.tight_layout(rect=[0, 0.01, 1, 0.95]) # rect adjusts space for suptitle
    plt.show()

## References

1. Chambers, J. M., Mallows, C. L., & Stuck, B. W. (1976). A method for simulating stable random variables. *Journal of the American Statistical Association*, *71*(354), 340-344.
2. Firouzi, M., & Mohammadpour, A. A survey on simulating stable random variables. URL: https://www.semanticscholar.org/reader/11a1e93642dc0a5c94e6906bcca5e4d25d4e9d46