# Основы глубинного обучения, майнор ИАД

## Домашнее задание 1. Введение в PyTorch. Полносвязные нейронные сети.

### Общая информация

Дата выдачи: 06.10.2021

Мягкий дедлайн: 23:59MSK 25.10.2021

Жесткий дедлайн: 23:59MSK 28.10.2021

### Оценивание и штрафы
Максимально допустимая оценка за работу — 10 баллов. За каждый день просрочки снимается 1 балл. Сдавать задание после жёсткого дедлайна сдачи нельзя.

Задание выполняется самостоятельно. «Похожие» решения считаются плагиатом и все задействованные студенты (в том числе те, у кого списали) не могут получить за него больше 0 баллов. Если вы нашли решение какого-то из заданий (или его часть) в открытом источнике, необходимо указать ссылку на этот источник в отдельном блоке в конце вашей работы (скорее всего вы будете не единственным, кто это нашел, поэтому чтобы исключить подозрение в плагиате, необходима ссылка на источник).

Неэффективная реализация кода может негативно отразиться на оценке.
Также оценка может быть снижена за плохо читаемый код и плохо оформленные графики. Все ответы должны сопровождаться кодом или комментариями о том, как они были получены.

### О задании

В этом задании вам предстоит предсказывать год выпуска песни по некоторым звуковым признакам: [данные](https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd). В ячейках ниже находится код для загрузки данных. Обратите внимание, что обучающая и тестовая выборки располагаются в одном файле, поэтому НЕ меняйте ячейку, в которой производится деление данных.

# Инициализация, загрузка данных 👾

In [170]:
# %%capture
!pip install wandb

!wget -O data.txt.zip https://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip

import torch
from torch import nn
import torch.nn.functional as F
import torchvision
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
import wandb
from tqdm.auto import tqdm
from typing import Tuple, Dict
device = "cuda" if torch.cuda.is_available() else "cpu"
wandb.login()

def set_random_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

set_random_seed(42)

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
--2022-07-23 11:51:52--  https://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 211011981 (201M) [application/x-httpd-php]
Saving to: ‘data.txt.zip’


2022-07-23 11:51:55 (93.2 MB/s) - ‘data.txt.zip’ saved [211011981/211011981]



## Класс набора данных

In [171]:
TRAINSIZE = 463715

class YearPredictionDataSet(torch.utils.data.Dataset):
    def __init__(self, csv_name: str, train: bool):
        """

        :param csv_name:
        :param train:
        """
        # TODO: если потребуется то добавить во-сть трансформации данных
        
        if train:
            self.sample_target = pd.read_csv(csv_name, header = None).to_numpy()[:TRAINSIZE]
        else:
            self.sample_target = pd.read_csv(csv_name, header=None).to_numpy()[TRAINSIZE:]

        self.data = np.copy(self.sample_target[:, list(np.arange(1, 91))])
        self.label = np.copy(self.sample_target[:, 0])

    def __len__(self):
        return self.sample_target.shape[0]

    def __getitem__(self, item):
        sample = {'sample': torch.tensor(self.data[item], dtype=torch.float),
              'target': torch.tensor(self.label[item], dtype=torch.float)}

        return sample

# Шаг 1️⃣. Определение `sweep-config`

__sweep-config__ -  это словарь словарей, содержащий информации о подборе гиперпараметров для модели.

## Использование __random__. 

В первой части мы будем использовать случайный метод генерации параметров из нужного диапазона и распределения.

In [172]:
sweep_config_first_test = {
    'method': 'random',

    'metric': {
        'name': 'val_loss',
        'goal': 'minimize'
    },

    'parameters': {
        'architecture': {
            'values': ['linear', 'lbad']
        },

        # кол-во lbad слоев
        'lbad_count': {
            'values': [1, 2, 3, 4, 5]
        },

        'layer_count': {
            'values': [2, 3, 4]
        },

        'optimizer': {
            'values': ['sgd', 'adam']
        },

        'nesterov': {
            'values': [True, False]
        },

        'amsgrad': {
            'values': [True, False]
        },

        'bias': {
            'values': [True, False]
        },

        'dropout': {
            'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
        },

        'fc_size': {
            'values': [90, 128, 256]
        },

        'epochs': {
            'values': [6, 7, 8, 9]
        },

        'learning_rate': {
            'distribution': 'uniform',
            'min': 1e-3,
            'max': 0.1
        },

        'momentum': {
            'distribution': 'uniform',
            'min': 0.25,
            'max': 0.75
        },

        'weight_decay': {
            'distribution': 'uniform',
            'min': 0.01,
            'max': 0.1
        },

        'batch_size': {
            'distribution': 'q_log_uniform_values',
            'q': 2,
            'min': 2,
            'max': 64,
        }
    }
}

import pprint

pprint.pprint(sweep_config_first_test)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'val_loss'},
 'parameters': {'amsgrad': {'values': [True, False]},
                'architecture': {'values': ['linear', 'lbad']},
                'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 64,
                               'min': 2,
                               'q': 2},
                'bias': {'values': [True, False]},
                'dropout': {'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
                'epochs': {'values': [6, 7, 8, 9]},
                'fc_size': {'values': [90, 128, 256]},
                'layer_count': {'values': [2, 3, 4]},
                'lbad_count': {'values': [1, 2, 3, 4, 5]},
                'learning_rate': {'distribution': 'uniform',
                                  'max': 0.1,
                                  'min': 0.001},
                'momentum': {'distribution': 'uniform',
                             'max': 0.75,
                  

# Шаг 2️⃣. Инициализируем `sweep-id`

In [146]:
sweep_id_first_test = wandb.sweep(sweep_config_first_test, project="hw1")

Create sweep with ID: 66bhu7ka
Sweep URL: https://wandb.ai/shuffle-krakens/hw1/sweeps/66bhu7ka


# Шаг 3️⃣. Пишем интерфейс для построения сеток по `sweep-config`

### Подгрузка данных

In [173]:
def build_dataset(_batch_size: int):
  train_set = YearPredictionDataSet("data.txt.zip", train = True)
  train_loader = torch.utils.data.DataLoader(train_set, batch_size = _batch_size, shuffle=True)

  test_set = YearPredictionDataSet("data.txt.zip", train = False)
  test_loader = torch.utils.data.DataLoader(test_set, batch_size = _batch_size)
  return train_loader, test_loader

### Построение сети

#### Определение классов сетей

In [174]:
class LinearModel(nn.Module):
  def __init__(self, layer_count: int, dropout: float, fc_size: int, _bias: bool):
    super(LinearModel, self).__init__()
    self.linear_stack = nn.Sequential()

    for i in range(layer_count):

      if i == 0:
        self.linear_stack.append(nn.Linear(90, fc_size, bias = _bias))
      else:
        self.linear_stack.append(nn.Linear(fc_size, fc_size, bias = _bias))
      self.linear_stack.append(nn.ReLU())

      if i % 3 == 0:
        self.linear_stack.append(nn.Dropout(dropout))

    self.last_hidden_layers = nn.Sequential(
        nn.Linear(fc_size, 10, bias = _bias), nn.ReLU(),
        nn.Linear(10, 1, bias = _bias)
    )
  
  def forward(self, x):
    x = self.linear_stack(x)
    logits = self.last_hidden_layers(x)
    return logits

In [175]:
class LBADModel(nn.Module):
  def __init__(self, lbad_count: int, dropout: float, fc_size: int, _bias: bool):
    super(LBADModel, self).__init__()
    self.lbad_stack = nn.Sequential()

    for i in range(lbad_count):
      if i == 0:
        self.lbad_stack.append(nn.Linear(90, fc_size, bias = _bias))
      else:
        self.lbad_stack.append(nn.Linear(fc_size, fc_size, bias = _bias))
      self.lbad_stack.append(nn.BatchNorm1d(fc_size))
      self.lbad_stack.append(nn.LeakyReLU())
      self.lbad_stack.append(nn.Dropout(dropout))

    self.last_hidden_layers = nn.Sequential(
        nn.Linear(fc_size, 10, bias = _bias), nn.LeakyReLU(),
        nn.Linear(10, 1, bias = _bias)
    )

  def forward(self, x):
    x = self.lbad_stack(x)
    logits = self.last_hidden_layers(x)
    return logits

#### Получение объектов нужных классов нейросетей

In [176]:
def build_network(architecture: str, 
                  layer_count: int, 
                  lbad_count: int,
                  dropout: float, 
                  fc_size: int,
                  bias: bool,
                  activation_type: str, 
                  activation_param: Dict):
  
  if architecture == 'linear':
    neuralnet = LinearModel(layer_count, dropout, fc_size, bias)
    return neuralnet
  elif architecture == 'lbad':
    neuralnet = LBADModel(lbad_count, dropout, fc_size, bias)
    return neuralnet
  elif architecture == 'linear_with_activate':
    neuralnet = LinearModelActivate(layer_count, dropout, fc_size, bias, activation_type, activation_param) 
    return neuralnet

### Построение оптимизатора

In [177]:
def build_optimizer(neuralnet: nn.Module, _optimizer: str, _lr: float, _momentum: float, _weight_decay: float, _nesterov: bool, _amsgrad: bool):
  if _optimizer == 'sgd': 
    return torch.optim.SGD(neuralnet.parameters(), lr = _lr, momentum = _momentum, weight_decay = _weight_decay, nesterov = _nesterov)
  if _optimizer == 'adam':
    return torch.optim.Adam(neuralnet.parameters(), lr = _lr, weight_decay = _weight_decay, amsgrad = _amsgrad)

### Функция обучения

In [178]:
def train(config = None):
  with wandb.init(config = config):
    config = wandb.config

    activation_param_dict = build_activate(config)

    train_loader, val_loader = build_dataset(config.batch_size)

    net = build_network(config.architecture, 
                        config.layer_count,
                        config.lbad_count, 
                        config.dropout,
                        config.fc_size, 
                        config.bias,
                        config.activate_type, 
                        activation_param_dict)
    
    optimizer = build_optimizer(net,
                                config.optimizer, 
                                config.learning_rate,
                                config.momentum, 
                                config.weight_decay, 
                                config.nesterov, 
                                config.amsgrad)
    
    criterion = nn.MSELoss(reduction = 'mean')

    # print(train_loader, val_loader, net, optimizer, criterion, sep = '\n')

    for epoch in range(config.epochs):
      cumu_train_loss = 0
      train_batch_num = 0
      net.train()
      for batch in tqdm(train_loader,
                        desc=f"Train in process..., epoch {epoch + 1}",
                        leave=False):

        data, label = batch['sample'].to(device), batch['target'].to(device)
        y_pred = net(data)
        loss = criterion(y_pred.squeeze(1), label)
        loss = torch.sqrt(loss)

        if torch.isnan(loss) or torch.isinf(loss) or torch.isneginf(loss):
          return

        cumu_train_loss += loss.item()

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if train_batch_num % 50 == 0:
          wandb.log(
              {
                  "TRAIN batch loss": loss.item(),
              }
          )
        train_batch_num += 1

        

      train_loss = cumu_train_loss / len(train_loader)

      net.eval()
      cumu_val_loss = 0
      val_batch_num = 0

      with torch.no_grad():
        for batch in tqdm(val_loader,
                          desc=f"Evaluate in process..., epoch {epoch + 1}",
                          leave=False):
          data, label = batch['sample'].to(device), batch['target'].to(device)
          y_pred = net(data)
          loss = criterion(y_pred.squeeze(1), label)
          loss = torch.sqrt(loss)
          cumu_val_loss += loss.item()

          if val_batch_num % 50 == 0:
            wandb.log(
                {
                    "VAL batch loss": loss.item(),
                }
            )
          val_batch_num += 1
      
      val_loss = cumu_val_loss / len(val_loader)

      wandb.log(
          {
              'train_loss': train_loss,
              'val_loss': val_loss
          }
      )

# Старт экспериментов

In [None]:
wandb.agent(sweep_id_first_test, train, count=15)

# Добавление инициализации слоев. 



## Определения классов сетей

In [179]:
class LinearModelActivate(nn.Module):
  def __init__(self, layer_count: int, dropout: float, fc_size: int, _bias: bool, activation_type: str, activation_param: Dict):
    super(LinearModelActivate, self).__init__()
    self.activation_type = activation_type
    self.activation_param = activation_param
    self.linear_stack = nn.Sequential()
    
    for i in range(layer_count):

      if i == 0:
        self.linear_stack.append(nn.Linear(90, fc_size, bias = _bias))
      else:
        self.linear_stack.append(nn.Linear(fc_size, fc_size, bias = _bias))
      self.linear_stack.append(nn.ReLU())

      if i % 3 == 0:
        self.linear_stack.append(nn.Dropout(dropout))

    self.last_hidden_layers = nn.Sequential(
        nn.Linear(fc_size, 10, bias = _bias), nn.ReLU(),
        nn.Linear(10, 1, bias = _bias)
    )
  

  def __activate_layer(self, layer):
    if self.activation_type == 'uniform':
      torch.nn.init.uniform_(next(iter(layer.parameters())), **self.activation_param)
    elif self.activation_type == 'normal':
      torch.nn.init.normal_(next(iter(layer.parameters())), **self.activation_param)
    elif self.activation_type == 'constant':
      torch.nn.init.constant_(next(iter(layer.parameters())), **self.activation_param)
    elif self.activation_type == 'ones':
      torch.nn.init.ones_(next(iter(layer.parameters())), **self.activation_param)
    elif self.activation_type == 'zeros':
      torch.nn.init.zeros_(next(iter(layer.parameters())), **self.activation_param)
    elif self.activation_type == 'eye':
      torch.nn.init.eye_(next(iter(layer.parameters())), **self.activation_param)

  def activate(self):
    for layer in self.linear_stack:

      if isinstance(layer, torch.nn.modules.linear.Linear):
        self.__activate_layer(layer)
      
      
    for layer in self.last_hidden_layers:

      if isinstance(layer, torch.nn.modules.linear.Linear):
        self.__activate_layer(layer)


  def forward(self, x):
    x = self.linear_stack(x)
    logits = self.last_hidden_layers(x)
    return logits

## Построение словаря с параметрами активации

In [180]:
def build_activate(config):
  activate_type = config.activate_type
  
  if activate_type == 'uniform':
    return {'a': config.a, 'b': config.b}

  if activate_type == 'normal':
    return {'mean': config.mean, 'std': config.std}
  
  if activate_type == 'constant':
    return {'constant': config.constant}

  if activate_type == 'ones':
    return {}
  
  if activate_type == 'zeros':
    return {}
  
  if activate_type == 'eye':
    return {}

## Добавляем некоторые параметры в конфиг обучения

### Обновляем параметры

In [181]:
sweep_config_first_test['parameters'].update({
        'a': {
            'distribution': 'uniform',
            'min': -1.0,
            'max': 1.0
        },

        'b': {
            'distribution': 'uniform',
            'min': 1.0,
            'max': 3.0
        },

        'mean': {
            'distribution': 'uniform',
            'min': -1.0,
            'max': 1.0
        },

        'std': {
            'distribution': 'uniform',
            'min': 0.001,
            'max': 1.42
        },

        'constant': {
            'distribution': 'uniform',
            'min': -2.0,
            'max': 2.0
        }
})

pprint.pprint(sweep_config_first_test)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'val_loss'},
 'parameters': {'a': {'distribution': 'uniform', 'max': 1.0, 'min': -1.0},
                'amsgrad': {'values': [True, False]},
                'architecture': {'values': ['linear', 'lbad']},
                'b': {'distribution': 'uniform', 'max': 3.0, 'min': 1.0},
                'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 64,
                               'min': 2,
                               'q': 2},
                'bias': {'values': [True, False]},
                'constant': {'distribution': 'uniform',
                             'max': 2.0,
                             'min': -2.0},
                'dropout': {'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
                'epochs': {'values': [6, 7, 8, 9]},
                'fc_size': {'values': [90, 128, 256]},
                'layer_count': {'values': [2, 3, 4]},
                'lbad_count': {'values': [

### Добавляем и убираем некоторые категориальные параметры обучения

#### Меняем архитектуру

In [182]:
sweep_config_first_test['parameters'].update({
    'architecture': {
        'values': ['linear_with_activate']
    }
})
pprint.pprint(sweep_config_first_test)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'val_loss'},
 'parameters': {'a': {'distribution': 'uniform', 'max': 1.0, 'min': -1.0},
                'amsgrad': {'values': [True, False]},
                'architecture': {'values': ['linear_with_activate']},
                'b': {'distribution': 'uniform', 'max': 3.0, 'min': 1.0},
                'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 64,
                               'min': 2,
                               'q': 2},
                'bias': {'values': [True, False]},
                'constant': {'distribution': 'uniform',
                             'max': 2.0,
                             'min': -2.0},
                'dropout': {'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
                'epochs': {'values': [6, 7, 8, 9]},
                'fc_size': {'values': [90, 128, 256]},
                'layer_count': {'values': [2, 3, 4]},
                'lbad_count': {'valu

#### Добавляем тип активации

In [183]:
sweep_config_first_test['parameters'].update({
    'activate_type': {
        'values': ['uniform', 'normal', 'zeros', 'eye', 'constant', 'ones']
    }
})
pprint.pprint(sweep_config_first_test)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'val_loss'},
 'parameters': {'a': {'distribution': 'uniform', 'max': 1.0, 'min': -1.0},
                'activate_type': {'values': ['uniform',
                                             'normal',
                                             'zeros',
                                             'eye',
                                             'constant',
                                             'ones']},
                'amsgrad': {'values': [True, False]},
                'architecture': {'values': ['linear_with_activate']},
                'b': {'distribution': 'uniform', 'max': 3.0, 'min': 1.0},
                'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 64,
                               'min': 2,
                               'q': 2},
                'bias': {'values': [True, False]},
                'constant': {'distribution': 'uniform',
                       

# Старт экспериментов #2

In [None]:
sweep_id_first_test = wandb.sweep(sweep_config_first_test, project="hw1")
wandb.agent(sweep_id_first_test, train, count=15)

Create sweep with ID: vh85wn6k
Sweep URL: https://wandb.ai/shuffle-krakens/hw1/sweeps/vh85wn6k


[34m[1mwandb[0m: Agent Starting Run: qofublth with config:
[34m[1mwandb[0m: 	a: -0.10515352658685127
[34m[1mwandb[0m: 	activate_type: normal
[34m[1mwandb[0m: 	amsgrad: True
[34m[1mwandb[0m: 	architecture: linear_with_activate
[34m[1mwandb[0m: 	b: 2.577977839588173
[34m[1mwandb[0m: 	batch_size: 20
[34m[1mwandb[0m: 	bias: False
[34m[1mwandb[0m: 	constant: 1.1225225539268346
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 7
[34m[1mwandb[0m: 	fc_size: 128
[34m[1mwandb[0m: 	layer_count: 2
[34m[1mwandb[0m: 	lbad_count: 3
[34m[1mwandb[0m: 	learning_rate: 0.05566979570907099
[34m[1mwandb[0m: 	mean: 0.12607033793375888
[34m[1mwandb[0m: 	momentum: 0.603390456481208
[34m[1mwandb[0m: 	nesterov: False
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	std: 1.2096070042771354
[34m[1mwandb[0m: 	weight_decay: 0.06471277563396993


Train in process..., epoch 1:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 1:   0%|          | 0/2582 [00:00<?, ?it/s]

Train in process..., epoch 2:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 2:   0%|          | 0/2582 [00:00<?, ?it/s]

Train in process..., epoch 3:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 3:   0%|          | 0/2582 [00:00<?, ?it/s]

Train in process..., epoch 4:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 4:   0%|          | 0/2582 [00:00<?, ?it/s]

Train in process..., epoch 5:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 5:   0%|          | 0/2582 [00:00<?, ?it/s]

Train in process..., epoch 6:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 6:   0%|          | 0/2582 [00:00<?, ?it/s]

Train in process..., epoch 7:   0%|          | 0/23186 [00:00<?, ?it/s]

Evaluate in process..., epoch 7:   0%|          | 0/2582 [00:00<?, ?it/s]

VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
TRAIN batch loss,▇▁▂▅▅▂▁▇▃▆▇▂▆▂▄▁▅█▅▄▄▃▇▇▇▂▅▃▅█▄▁▇▅▅▅▃▄▂▃
VAL batch loss,▁▂▂▅▂▄▅▄▅▆▄▅▇▅█▆▆▆▅▅▇▆▆▆▄▄▅▄▅▃▂▃▁▃▅▆▆▄▆▆
train_loss,▂▇▇▆█▁▂
val_loss,▁▆██▅▁▇

0,1
TRAIN batch loss,549.30621
VAL batch loss,1033.50879
train_loss,655.63579
val_loss,967.26633


[34m[1mwandb[0m: Agent Starting Run: wj7kenum with config:
[34m[1mwandb[0m: 	a: 0.6023008697279166
[34m[1mwandb[0m: 	activate_type: normal
[34m[1mwandb[0m: 	amsgrad: True
[34m[1mwandb[0m: 	architecture: linear_with_activate
[34m[1mwandb[0m: 	b: 1.264052019588859
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	bias: False
[34m[1mwandb[0m: 	constant: -1.0518754719548769
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	epochs: 6
[34m[1mwandb[0m: 	fc_size: 128
[34m[1mwandb[0m: 	layer_count: 3
[34m[1mwandb[0m: 	lbad_count: 1
[34m[1mwandb[0m: 	learning_rate: 0.04331563943388528
[34m[1mwandb[0m: 	mean: 0.863203941653536
[34m[1mwandb[0m: 	momentum: 0.7281773889944797
[34m[1mwandb[0m: 	nesterov: True
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: 	std: 0.9629720721810444
[34m[1mwandb[0m: 	weight_decay: 0.08321940333687894


Train in process..., epoch 1:   0%|          | 0/231858 [00:00<?, ?it/s]

VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
TRAIN batch loss,▁

0,1
TRAIN batch loss,1997.19446


[34m[1mwandb[0m: Agent Starting Run: xat8fky4 with config:
[34m[1mwandb[0m: 	a: -0.030922303872923385
[34m[1mwandb[0m: 	activate_type: constant
[34m[1mwandb[0m: 	amsgrad: True
[34m[1mwandb[0m: 	architecture: linear_with_activate
[34m[1mwandb[0m: 	b: 1.120854019729755
[34m[1mwandb[0m: 	batch_size: 6
[34m[1mwandb[0m: 	bias: False
[34m[1mwandb[0m: 	constant: -0.5678601255235227
[34m[1mwandb[0m: 	dropout: 0.1
[34m[1mwandb[0m: 	epochs: 6
[34m[1mwandb[0m: 	fc_size: 90
[34m[1mwandb[0m: 	layer_count: 3
[34m[1mwandb[0m: 	lbad_count: 1
[34m[1mwandb[0m: 	learning_rate: 0.07433866416207868
[34m[1mwandb[0m: 	mean: -0.13446991613754689
[34m[1mwandb[0m: 	momentum: 0.3887537744027552
[34m[1mwandb[0m: 	nesterov: True
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	std: 0.6844641340913028
[34m[1mwandb[0m: 	weight_decay: 0.08360857207358766


Train in process..., epoch 1:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 1:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 2:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 2:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 3:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 3:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 4:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 4:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 5:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 5:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 6:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 6:   0%|          | 0/8605 [00:00<?, ?it/s]

VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
TRAIN batch loss,▆▇▅▆▆▅▇█▇▇▆█▆▅▃▆▅▁▃▆▇█▇▅▃▇▇▇█▅▇▇▆▅▆▆▇▄▅▅
VAL batch loss,▄▇▇▂▃▆▆▄█▆▆▇▇▆▇▇██▃▅▇▅▅▆▇▂█▅▂▁▅▃▃▃▇▇▆██▆
train_loss,▁█████
val_loss,▁▁▁▁▁▁

0,1
TRAIN batch loss,1998.51831
VAL batch loss,1996.33936
train_loss,1998.41106
val_loss,1998.50312


[34m[1mwandb[0m: Agent Starting Run: ghwl4ec7 with config:
[34m[1mwandb[0m: 	a: -0.664609972101059
[34m[1mwandb[0m: 	activate_type: eye
[34m[1mwandb[0m: 	amsgrad: True
[34m[1mwandb[0m: 	architecture: linear_with_activate
[34m[1mwandb[0m: 	b: 2.1915251766852677
[34m[1mwandb[0m: 	batch_size: 6
[34m[1mwandb[0m: 	bias: False
[34m[1mwandb[0m: 	constant: -1.0309775621722386
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 8
[34m[1mwandb[0m: 	fc_size: 90
[34m[1mwandb[0m: 	layer_count: 4
[34m[1mwandb[0m: 	lbad_count: 4
[34m[1mwandb[0m: 	learning_rate: 0.04587594825644093
[34m[1mwandb[0m: 	mean: 0.8370948384250783
[34m[1mwandb[0m: 	momentum: 0.5040177667391108
[34m[1mwandb[0m: 	nesterov: False
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	std: 0.8929386725344878
[34m[1mwandb[0m: 	weight_decay: 0.08235378463834217


Train in process..., epoch 1:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 1:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 2:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 2:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 3:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 3:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 4:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 4:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 5:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 5:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 6:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 6:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 7:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 7:   0%|          | 0/8605 [00:00<?, ?it/s]

Train in process..., epoch 8:   0%|          | 0/77286 [00:00<?, ?it/s]

Evaluate in process..., epoch 8:   0%|          | 0/8605 [00:00<?, ?it/s]

VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
TRAIN batch loss,▄▇▅█▅▆▆██▄▆▅▅▄▇▆▆▃▆▆▇▅▆▁▆▇▇█▇▇▅▄▅▅▆█▇█▅▄
VAL batch loss,▅▂▂▄▆▃█▄▇▇▆▅▆▅▅▇▆▆▅▇▆▃▇▄▁▆▇▆▇▆▆▇▇▇▆▅█▆▇▄
train_loss,█▁▁▁▁▁▁▁
val_loss,▁▁▁▁▁▁▁▁

0,1
TRAIN batch loss,1997.5188
VAL batch loss,1996.33936
train_loss,1998.41108
val_loss,1998.50312


[34m[1mwandb[0m: Agent Starting Run: c6wlu0us with config:
[34m[1mwandb[0m: 	a: -0.46989893288798834
[34m[1mwandb[0m: 	activate_type: ones
[34m[1mwandb[0m: 	amsgrad: False
[34m[1mwandb[0m: 	architecture: linear_with_activate
[34m[1mwandb[0m: 	b: 1.6546892617914073
[34m[1mwandb[0m: 	batch_size: 46
[34m[1mwandb[0m: 	bias: False
[34m[1mwandb[0m: 	constant: 0.614361283230707
[34m[1mwandb[0m: 	dropout: 0.6
[34m[1mwandb[0m: 	epochs: 6
[34m[1mwandb[0m: 	fc_size: 90
[34m[1mwandb[0m: 	layer_count: 3
[34m[1mwandb[0m: 	lbad_count: 4
[34m[1mwandb[0m: 	learning_rate: 0.05463158212595565
[34m[1mwandb[0m: 	mean: -0.15275847875457993
[34m[1mwandb[0m: 	momentum: 0.6541413243070073
[34m[1mwandb[0m: 	nesterov: True
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: 	std: 0.5282031877986306
[34m[1mwandb[0m: 	weight_decay: 0.01636970853246225


Train in process..., epoch 1:   0%|          | 0/10081 [00:00<?, ?it/s]

VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
TRAIN batch loss,▁

0,1
TRAIN batch loss,1999.91284


[34m[1mwandb[0m: Agent Starting Run: ban0m5rn with config:
[34m[1mwandb[0m: 	a: 0.33101121135819533
[34m[1mwandb[0m: 	activate_type: zeros
[34m[1mwandb[0m: 	amsgrad: True
[34m[1mwandb[0m: 	architecture: linear_with_activate
[34m[1mwandb[0m: 	b: 1.602761018191115
[34m[1mwandb[0m: 	batch_size: 18
[34m[1mwandb[0m: 	bias: True
[34m[1mwandb[0m: 	constant: 0.3133115585776358
[34m[1mwandb[0m: 	dropout: 0.6
[34m[1mwandb[0m: 	epochs: 9
[34m[1mwandb[0m: 	fc_size: 128
[34m[1mwandb[0m: 	layer_count: 3
[34m[1mwandb[0m: 	lbad_count: 4
[34m[1mwandb[0m: 	learning_rate: 0.08456549318846927
[34m[1mwandb[0m: 	mean: 0.10380328179842536
[34m[1mwandb[0m: 	momentum: 0.4307078096999638
[34m[1mwandb[0m: 	nesterov: True
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: 	std: 1.3470570461260452
[34m[1mwandb[0m: 	weight_decay: 0.0868241748888123


Train in process..., epoch 1:   0%|          | 0/25762 [00:00<?, ?it/s]

Evaluate in process..., epoch 1:   0%|          | 0/2869 [00:00<?, ?it/s]

Train in process..., epoch 2:   0%|          | 0/25762 [00:00<?, ?it/s]