Будем практиковаться на датасете: https://www.kaggle.com/c/avito-demand-prediction  
  
Ваша задача:  
  
Создать Dataset для загрузки данных (используем только числовые данные)
Обернуть его в Dataloader  
  
Написать архитектуру сети, которая предсказывает число показов на основании числовых данных (вы всегда можете нагенерить дополнительных факторов). Сеть должна включать BatchNorm слои и Dropout (или НЕ включать, но нужно обосновать)  
  
Учить будем на функцию потерь с кагла (log RMSE) - нужно её реализовать
Сравните сходимость Adam, RMSProp и SGD, сделайте вывод по качеству работы модели  
  
train-test разделение нужно сделать с помощью sklearn random_state=13, test_size = 0.25  

In [105]:
import math
import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.utils.data as data_utils
from torch.utils.data import TensorDataset, DataLoader

from sklearn.model_selection import train_test_split

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [61]:
class RangeDataset(torch.utils.data.Dataset):
    def __init__(self, start, end, step=1):
        self.start = start
        self.end = end
        self.step = step

    def __len__(self):
        return math.ceil((self.end - self.start) / self.step)

    def __getitem__(self, index):
        if index > len(self):
          raise RuntimeError
        value = self.start + index * self.step
        return np.array([value, value])

In [93]:
RANDOM_STATE=13

In [63]:
dataset = RangeDataset(0, 29)
data_loader = torch.utils.data.DataLoader( dataset, batch_size=6, shuffle=True, num_workers=0, drop_last=True)

for i, batch in enumerate(data_loader):
    print(i, batch)

0 tensor([[ 6,  6],
        [ 0,  0],
        [27, 27],
        [17, 17],
        [21, 21],
        [ 5,  5]], dtype=torch.int32)
1 tensor([[14, 14],
        [15, 15],
        [25, 25],
        [18, 18],
        [ 4,  4],
        [12, 12]], dtype=torch.int32)
2 tensor([[23, 23],
        [ 7,  7],
        [19, 19],
        [ 9,  9],
        [24, 24],
        [11, 11]], dtype=torch.int32)
3 tensor([[26, 26],
        [10, 10],
        [ 2,  2],
        [16, 16],
        [13, 13],
        [20, 20]], dtype=torch.int32)


item_id - Идентификатор объявления.  
user_id - Идентификатор пользователя.  
region - Рекламный регион.  
city - Рекламный город.  
parent_category_name - Категория объявлений верхнего уровня, классифицированная по рекламной модели Avito.  
category_name - Категория мелкозернистых объявлений, классифицированная по рекламной модели Avito.  
param_1 - Необязательный параметр из рекламной модели Avito.  
param_2 - Необязательный параметр из рекламной модели Avito.  
param_3 - Необязательный параметр из рекламной модели Avito.  
title - Название объявления.  
description - Описание объявления.  
price - Цена объявления.  
item_seq_number - Порядковый номер объявления для пользователя.  
activation_date- Дата размещения объявления.  
user_type - Тип пользователя.  
image - Идентификационный код изображения. Привязка к файлу jpg в train_jpg. Не в каждой рекламе есть изображение.  
image_top_1 - Классификационный код Авито для изображения.  
deal_probability - Целевая переменная. Это вероятность того, что объявление действительно что-то продало. Невозможно с уверенностью проверить каждую транзакцию, поэтому значение этого столбца может быть любым с плавающей точкой от нуля до единицы.  

In [64]:
data = pd.read_csv("train.csv")

In [65]:
data = data.head(300000)

In [66]:
data.shape

(300000, 18)

In [67]:
data.head()

Unnamed: 0,item_id,user_id,region,city,parent_category_name,category_name,param_1,param_2,param_3,title,description,price,item_seq_number,activation_date,user_type,image,image_top_1,deal_probability
0,b912c3c6a6ad,e00f8ff2eaf9,Свердловская область,Екатеринбург,Личные вещи,Товары для детей и игрушки,Постельные принадлежности,,,Кокоби(кокон для сна),"Кокон для сна малыша,пользовались меньше месяц...",400.0,2,2017-03-28,Private,d10c7e016e03247a3bf2d13348fe959fe6f436c1caf64c...,1008.0,0.12789
1,2dac0150717d,39aeb48f0017,Самарская область,Самара,Для дома и дачи,Мебель и интерьер,Другое,,,Стойка для Одежды,"Стойка для одежды, под вешалки. С бутика.",3000.0,19,2017-03-26,Private,79c9392cc51a9c81c6eb91eceb8e552171db39d7142700...,692.0,0.0
2,ba83aefab5dc,91e2f88dd6e3,Ростовская область,Ростов-на-Дону,Бытовая электроника,Аудио и видео,"Видео, DVD и Blu-ray плееры",,,Philips bluray,"В хорошем состоянии, домашний кинотеатр с blu ...",4000.0,9,2017-03-20,Private,b7f250ee3f39e1fedd77c141f273703f4a9be59db4b48a...,3032.0,0.43177
3,02996f1dd2ea,bf5cccea572d,Татарстан,Набережные Челны,Личные вещи,Товары для детей и игрушки,Автомобильные кресла,,,Автокресло,Продам кресло от0-25кг,2200.0,286,2017-03-25,Company,e6ef97e0725637ea84e3d203e82dadb43ed3cc0a1c8413...,796.0,0.80323
4,7c90be56d2ab,ef50846afc0b,Волгоградская область,Волгоград,Транспорт,Автомобили,С пробегом,ВАЗ (LADA),2110.0,"ВАЗ 2110, 2003",Все вопросы по телефону.,40000.0,3,2017-03-16,Private,54a687a3a0fc1d68aed99bdaaf551c5c70b761b16fd0a2...,2264.0,0.20797


In [68]:
data.isna().sum()

item_id                      0
user_id                      0
region                       0
city                         0
parent_category_name         0
category_name                0
param_1                  11940
param_2                 130893
param_3                 172370
title                        0
description              23084
price                    17048
item_seq_number              0
activation_date              0
user_type                    0
image                    22544
image_top_1              22544
deal_probability             0
dtype: int64

In [69]:
data.drop(['param_1', 'param_2', 'param_3', 'description', 'image', 'image_top_1'], 1, inplace= True)

  """Entry point for launching an IPython kernel.


In [70]:
data.drop(['item_id', 'user_id'], 1, inplace= True)

  """Entry point for launching an IPython kernel.


In [71]:
data['price'] = data['price'].fillna(data['price'].median())

In [72]:
columns = data.columns

In [73]:
for i in columns:
    print(i,   len(data[i].unique()), data[i].dtype)
# data[[columns]].unique()

region 28 object
city 1511 object
parent_category_name 9 object
category_name 47 object
title 186560 object
price 6570 float64
item_seq_number 10670 int64
activation_date 18 object
user_type 3 object
deal_probability 9599 float64


In [74]:
data.drop('title', 1, inplace= True)

  """Entry point for launching an IPython kernel.


In [75]:
data = pd.get_dummies(data)

In [76]:
data.shape

(300000, 1619)

In [77]:
X = data.drop('deal_probability', 1)

  """Entry point for launching an IPython kernel.


In [78]:
y = data['deal_probability']

In [80]:
from sklearn.preprocessing import StandardScaler

In [81]:
scaler = StandardScaler()
X_features = scaler.fit_transform(X)

In [82]:
from sklearn.decomposition import PCA

In [83]:
pca = PCA(n_components=15)
pca.fit(X)

PCA(n_components=15)

In [84]:
x_pca = pca.transform(X)

In [85]:
x_pca.shape

(300000, 15)

In [87]:
xx = pd.DataFrame(x_pca)

In [88]:
xx[:5]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,-242238.663885,-720.835603,0.603805,-0.016663,0.015591,-0.027468,0.058606,-0.231576,-0.004248,1.342951,0.36055,0.681332,-0.177958,-0.169296,-0.276237
1,-239638.6621,-704.110856,-0.307752,-0.727026,-0.038502,0.921677,0.311521,-0.086605,-0.234771,-0.1378,-0.130984,0.015779,0.02858,0.102972,-0.047039
2,-238638.663165,-714.216716,-0.311985,-0.637023,-0.024123,-0.664669,0.485251,-0.11525,-0.167258,-0.24386,0.029639,0.602146,1.07831,-0.156643,0.051157
3,-240438.633829,-437.026155,-0.232047,1.107202,0.047632,0.066343,0.11109,-0.09974,0.180636,-0.061306,0.170635,0.815387,-0.776894,-0.529646,-0.181332
4,-202638.664001,-724.028028,-0.219064,-0.624202,-0.011004,-0.08654,-0.101278,-0.089342,0.937362,0.037055,-0.912984,-0.050167,-0.064724,-0.194204,0.128335


In [89]:
xx.to_csv('X_new.csv')

In [112]:
X = pd.read_csv('X_new.csv', index_col=0)

In [113]:
X.shape

(300000, 15)

In [114]:
X[:5]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,-242238.663885,-720.835603,0.603805,-0.016663,0.015591,-0.027468,0.058606,-0.231576,-0.004248,1.342951,0.36055,0.681332,-0.177958,-0.169296,-0.276237
1,-239638.6621,-704.110856,-0.307752,-0.727026,-0.038502,0.921677,0.311521,-0.086605,-0.234771,-0.1378,-0.130984,0.015779,0.02858,0.102972,-0.047039
2,-238638.663165,-714.216716,-0.311985,-0.637023,-0.024123,-0.664669,0.485251,-0.11525,-0.167258,-0.24386,0.029639,0.602146,1.07831,-0.156643,0.051157
3,-240438.633829,-437.026155,-0.232047,1.107202,0.047632,0.066343,0.11109,-0.09974,0.180636,-0.061306,0.170635,0.815387,-0.776894,-0.529646,-0.181332
4,-202638.664001,-724.028028,-0.219064,-0.624202,-0.011004,-0.08654,-0.101278,-0.089342,0.937362,0.037055,-0.912984,-0.050167,-0.064724,-0.194204,0.128335


In [115]:
dim = X.shape[1]
dim

15

In [155]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state=RANDOM_STATE)

In [163]:
class Perceptron(nn.Module):
    def __init__(self, input_dim, output_dim, activation="relu"):
        super(Perceptron, self).__init__()
        self.fc = nn.Linear(input_dim, output_dim)
        self.activation = activation
        
    def forward(self, x):
        x = self.fc(x)
        if self.activation=="relu":
            return F.relu(x)
        if self.activation=="sigmoid":
            return F.sigmoid(x)
        raise RuntimeError


class FeedForward(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(FeedForward, self).__init__()
        self.bn1 = nn.BatchNorm1d(input_dim)
        self.fc1 = Perceptron(input_dim, hidden_dim)
        self.bn2 = nn.BatchNorm1d(hidden_dim)
        self.dp = nn.Dropout(0.25)
        self.fc2 = Perceptron(hidden_dim, 10, "relu")
        
    def forward(self, x):
        x = self.bn1(x)
        x = self.fc1(x)
        x = self.dp(x)
        x = self.fc2(x)
        return x

In [164]:
net = FeedForward(dim, 8)

optimizer = torch.optim.Adam(net.parameters(), lr = 0.01)
criterion =  nn.MSELoss()

In [158]:
# torch_tensor = torch.tensor(data['deal_probability'].values)

In [159]:
# X_train, X_test, y_train, y_test

X_train = torch.DoubleTensor(X_train.values)
X_test = torch.DoubleTensor(X_test.values)
y_train = torch.DoubleTensor(y_train.values)
y_test = torch.DoubleTensor(y_test.values)

In [167]:
# torch.set_default_dtype(torch.float64)
num_epochs = 10
# target = pd.DataFrame(y)
train = data_utils.TensorDataset(X_train, y_train)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True, drop_last=True)

In [168]:
from tqdm import tqdm

In [169]:
for epoch in tqdm(range(num_epochs)):  
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data[0], data[1]

        # обнуляем градиент
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # выводим статистику о процессе обучения
        running_loss += loss.item()
        if i % 300 == 0:    # печатаем каждые 300 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 300))
            running_loss = 0.0

print('Training is finished!')

  0%|                                                                                           | 0/10 [00:00<?, ?it/s]

[1,     1] loss: 0.000
[1,   301] loss: 0.073
[1,   601] loss: 0.075
[1,   901] loss: 0.070
[1,  1201] loss: 0.072
[1,  1501] loss: 0.077
[1,  1801] loss: 0.074
[1,  2101] loss: 0.073
[1,  2401] loss: 0.080
[1,  2701] loss: 0.077
[1,  3001] loss: 0.083
[1,  3301] loss: 0.078
[1,  3601] loss: 0.076
[1,  3901] loss: 0.078
[1,  4201] loss: 0.075
[1,  4501] loss: 0.076
[1,  4801] loss: 0.079
[1,  5101] loss: 0.079
[1,  5401] loss: 0.075
[1,  5701] loss: 0.069
[1,  6001] loss: 0.076
[1,  6301] loss: 0.074
[1,  6601] loss: 0.077
[1,  6901] loss: 0.078
[1,  7201] loss: 0.077
[1,  7501] loss: 0.075
[1,  7801] loss: 0.075
[1,  8101] loss: 0.075
[1,  8401] loss: 0.075
[1,  8701] loss: 0.077
[1,  9001] loss: 0.078
[1,  9301] loss: 0.076
[1,  9601] loss: 0.075
[1,  9901] loss: 0.079
[1, 10201] loss: 0.082
[1, 10501] loss: 0.075
[1, 10801] loss: 0.079
[1, 11101] loss: 0.077
[1, 11401] loss: 0.079
[1, 11701] loss: 0.075
[1, 12001] loss: 0.079
[1, 12301] loss: 0.076
[1, 12601] loss: 0.078
[1, 12901] 

 10%|████████▎                                                                          | 1/10 [00:19<02:58, 19.81s/it]

[2,     1] loss: 0.000
[2,   301] loss: 0.076
[2,   601] loss: 0.078
[2,   901] loss: 0.078
[2,  1201] loss: 0.076
[2,  1501] loss: 0.082
[2,  1801] loss: 0.072
[2,  2101] loss: 0.072
[2,  2401] loss: 0.083
[2,  2701] loss: 0.075
[2,  3001] loss: 0.076
[2,  3301] loss: 0.073
[2,  3601] loss: 0.079
[2,  3901] loss: 0.078
[2,  4201] loss: 0.076
[2,  4501] loss: 0.073
[2,  4801] loss: 0.072
[2,  5101] loss: 0.078
[2,  5401] loss: 0.075
[2,  5701] loss: 0.076
[2,  6001] loss: 0.074
[2,  6301] loss: 0.072
[2,  6601] loss: 0.070
[2,  6901] loss: 0.080
[2,  7201] loss: 0.078
[2,  7501] loss: 0.081
[2,  7801] loss: 0.069
[2,  8101] loss: 0.073
[2,  8401] loss: 0.075
[2,  8701] loss: 0.077
[2,  9001] loss: 0.069
[2,  9301] loss: 0.073
[2,  9601] loss: 0.079
[2,  9901] loss: 0.075
[2, 10201] loss: 0.076
[2, 10501] loss: 0.076
[2, 10801] loss: 0.078
[2, 11101] loss: 0.077
[2, 11401] loss: 0.080
[2, 11701] loss: 0.078
[2, 12001] loss: 0.075
[2, 12301] loss: 0.078
[2, 12601] loss: 0.080
[2, 12901] 

 20%|████████████████▌                                                                  | 2/10 [00:36<02:23, 17.92s/it]

[3,     1] loss: 0.000
[3,   301] loss: 0.077
[3,   601] loss: 0.077
[3,   901] loss: 0.074
[3,  1201] loss: 0.075
[3,  1501] loss: 0.074
[3,  1801] loss: 0.079
[3,  2101] loss: 0.080
[3,  2401] loss: 0.073
[3,  2701] loss: 0.077
[3,  3001] loss: 0.076
[3,  3301] loss: 0.075
[3,  3601] loss: 0.074
[3,  3901] loss: 0.074
[3,  4201] loss: 0.071
[3,  4501] loss: 0.073
[3,  4801] loss: 0.077
[3,  5101] loss: 0.075
[3,  5401] loss: 0.081
[3,  5701] loss: 0.074
[3,  6001] loss: 0.072
[3,  6301] loss: 0.074
[3,  6601] loss: 0.077
[3,  6901] loss: 0.076
[3,  7201] loss: 0.082
[3,  7501] loss: 0.072
[3,  7801] loss: 0.076
[3,  8101] loss: 0.072
[3,  8401] loss: 0.080
[3,  8701] loss: 0.076
[3,  9001] loss: 0.076
[3,  9301] loss: 0.080
[3,  9601] loss: 0.073
[3,  9901] loss: 0.073
[3, 10201] loss: 0.077
[3, 10501] loss: 0.076
[3, 10801] loss: 0.073
[3, 11101] loss: 0.077
[3, 11401] loss: 0.083
[3, 11701] loss: 0.073
[3, 12001] loss: 0.074
[3, 12301] loss: 0.077
[3, 12601] loss: 0.075
[3, 12901] 

 30%|████████████████████████▉                                                          | 3/10 [00:53<02:01, 17.37s/it]

[4,     1] loss: 0.000
[4,   301] loss: 0.075
[4,   601] loss: 0.071
[4,   901] loss: 0.075
[4,  1201] loss: 0.074
[4,  1501] loss: 0.072
[4,  1801] loss: 0.075
[4,  2101] loss: 0.075
[4,  2401] loss: 0.073
[4,  2701] loss: 0.077
[4,  3001] loss: 0.073
[4,  3301] loss: 0.076
[4,  3601] loss: 0.077
[4,  3901] loss: 0.078
[4,  4201] loss: 0.080
[4,  4501] loss: 0.081
[4,  4801] loss: 0.079
[4,  5101] loss: 0.076
[4,  5401] loss: 0.076
[4,  5701] loss: 0.080
[4,  6001] loss: 0.071
[4,  6301] loss: 0.072
[4,  6601] loss: 0.072
[4,  6901] loss: 0.075
[4,  7201] loss: 0.074
[4,  7501] loss: 0.080
[4,  7801] loss: 0.073
[4,  8101] loss: 0.081
[4,  8401] loss: 0.073
[4,  8701] loss: 0.071
[4,  9001] loss: 0.079
[4,  9301] loss: 0.076
[4,  9601] loss: 0.075
[4,  9901] loss: 0.080
[4, 10201] loss: 0.078
[4, 10501] loss: 0.074
[4, 10801] loss: 0.076
[4, 11101] loss: 0.073
[4, 11401] loss: 0.078
[4, 11701] loss: 0.078
[4, 12001] loss: 0.074
[4, 12301] loss: 0.072
[4, 12601] loss: 0.074
[4, 12901] 

 40%|█████████████████████████████████▏                                                 | 4/10 [01:10<01:43, 17.25s/it]

[5,     1] loss: 0.000
[5,   301] loss: 0.076
[5,   601] loss: 0.078
[5,   901] loss: 0.082
[5,  1201] loss: 0.074
[5,  1501] loss: 0.077
[5,  1801] loss: 0.078
[5,  2101] loss: 0.076
[5,  2401] loss: 0.075
[5,  2701] loss: 0.080
[5,  3001] loss: 0.069
[5,  3301] loss: 0.073
[5,  3601] loss: 0.075
[5,  3901] loss: 0.076
[5,  4201] loss: 0.082
[5,  4501] loss: 0.076
[5,  4801] loss: 0.073
[5,  5101] loss: 0.080
[5,  5401] loss: 0.068
[5,  5701] loss: 0.075
[5,  6001] loss: 0.075
[5,  6301] loss: 0.075
[5,  6601] loss: 0.074
[5,  6901] loss: 0.078
[5,  7201] loss: 0.077
[5,  7501] loss: 0.074
[5,  7801] loss: 0.070
[5,  8101] loss: 0.080
[5,  8401] loss: 0.077
[5,  8701] loss: 0.072
[5,  9001] loss: 0.079
[5,  9301] loss: 0.074
[5,  9601] loss: 0.076
[5,  9901] loss: 0.075
[5, 10201] loss: 0.071
[5, 10501] loss: 0.079
[5, 10801] loss: 0.081
[5, 11101] loss: 0.075
[5, 11401] loss: 0.076
[5, 11701] loss: 0.077
[5, 12001] loss: 0.072
[5, 12301] loss: 0.077
[5, 12601] loss: 0.078
[5, 12901] 

 50%|█████████████████████████████████████████▌                                         | 5/10 [01:27<01:26, 17.26s/it]

[6,     1] loss: 0.000
[6,   301] loss: 0.075
[6,   601] loss: 0.077
[6,   901] loss: 0.074
[6,  1201] loss: 0.074
[6,  1501] loss: 0.076
[6,  1801] loss: 0.079
[6,  2101] loss: 0.076
[6,  2401] loss: 0.075
[6,  2701] loss: 0.084
[6,  3001] loss: 0.077
[6,  3301] loss: 0.074
[6,  3601] loss: 0.074
[6,  3901] loss: 0.077
[6,  4201] loss: 0.077
[6,  4501] loss: 0.074
[6,  4801] loss: 0.074
[6,  5101] loss: 0.079
[6,  5401] loss: 0.074
[6,  5701] loss: 0.078
[6,  6001] loss: 0.078
[6,  6301] loss: 0.073
[6,  6601] loss: 0.079
[6,  6901] loss: 0.079
[6,  7201] loss: 0.071
[6,  7501] loss: 0.071
[6,  7801] loss: 0.075
[6,  8101] loss: 0.076
[6,  8401] loss: 0.078
[6,  8701] loss: 0.073
[6,  9001] loss: 0.075
[6,  9301] loss: 0.070
[6,  9601] loss: 0.077
[6,  9901] loss: 0.074
[6, 10201] loss: 0.073
[6, 10501] loss: 0.076
[6, 10801] loss: 0.075
[6, 11101] loss: 0.080
[6, 11401] loss: 0.079
[6, 11701] loss: 0.073
[6, 12001] loss: 0.074
[6, 12301] loss: 0.077
[6, 12601] loss: 0.081
[6, 12901] 

 60%|█████████████████████████████████████████████████▊                                 | 6/10 [01:44<01:09, 17.28s/it]

[7,     1] loss: 0.000
[7,   301] loss: 0.074
[7,   601] loss: 0.077
[7,   901] loss: 0.072
[7,  1201] loss: 0.071
[7,  1501] loss: 0.075
[7,  1801] loss: 0.074
[7,  2101] loss: 0.079
[7,  2401] loss: 0.076
[7,  2701] loss: 0.076
[7,  3001] loss: 0.075
[7,  3301] loss: 0.078
[7,  3601] loss: 0.075
[7,  3901] loss: 0.077
[7,  4201] loss: 0.076
[7,  4501] loss: 0.074
[7,  4801] loss: 0.080
[7,  5101] loss: 0.077
[7,  5401] loss: 0.076
[7,  5701] loss: 0.077
[7,  6001] loss: 0.082
[7,  6301] loss: 0.073
[7,  6601] loss: 0.074
[7,  6901] loss: 0.076
[7,  7201] loss: 0.073
[7,  7501] loss: 0.069
[7,  7801] loss: 0.079
[7,  8101] loss: 0.078
[7,  8401] loss: 0.079
[7,  8701] loss: 0.078
[7,  9001] loss: 0.074
[7,  9301] loss: 0.074
[7,  9601] loss: 0.071
[7,  9901] loss: 0.072
[7, 10201] loss: 0.077
[7, 10501] loss: 0.073
[7, 10801] loss: 0.076
[7, 11101] loss: 0.071
[7, 11401] loss: 0.078
[7, 11701] loss: 0.075
[7, 12001] loss: 0.073
[7, 12301] loss: 0.075
[7, 12601] loss: 0.080
[7, 12901] 

 70%|██████████████████████████████████████████████████████████                         | 7/10 [02:02<00:51, 17.30s/it]

[8,     1] loss: 0.000
[8,   301] loss: 0.078
[8,   601] loss: 0.083
[8,   901] loss: 0.072
[8,  1201] loss: 0.077
[8,  1501] loss: 0.072
[8,  1801] loss: 0.072
[8,  2101] loss: 0.072
[8,  2401] loss: 0.074
[8,  2701] loss: 0.077
[8,  3001] loss: 0.076
[8,  3301] loss: 0.075
[8,  3601] loss: 0.076
[8,  3901] loss: 0.072
[8,  4201] loss: 0.070
[8,  4501] loss: 0.072
[8,  4801] loss: 0.077
[8,  5101] loss: 0.077
[8,  5401] loss: 0.081
[8,  5701] loss: 0.077
[8,  6001] loss: 0.073
[8,  6301] loss: 0.080
[8,  6601] loss: 0.074
[8,  6901] loss: 0.075
[8,  7201] loss: 0.072
[8,  7501] loss: 0.084
[8,  7801] loss: 0.078
[8,  8101] loss: 0.072
[8,  8401] loss: 0.082
[8,  8701] loss: 0.075
[8,  9001] loss: 0.076
[8,  9301] loss: 0.077
[8,  9601] loss: 0.079
[8,  9901] loss: 0.078
[8, 10201] loss: 0.073
[8, 10501] loss: 0.078
[8, 10801] loss: 0.078
[8, 11101] loss: 0.077
[8, 11401] loss: 0.077
[8, 11701] loss: 0.075
[8, 12001] loss: 0.074
[8, 12301] loss: 0.074
[8, 12601] loss: 0.078
[8, 12901] 

 80%|██████████████████████████████████████████████████████████████████▍                | 8/10 [02:19<00:34, 17.32s/it]

[9,     1] loss: 0.001
[9,   301] loss: 0.077
[9,   601] loss: 0.076
[9,   901] loss: 0.075
[9,  1201] loss: 0.075
[9,  1501] loss: 0.075
[9,  1801] loss: 0.073
[9,  2101] loss: 0.078
[9,  2401] loss: 0.080
[9,  2701] loss: 0.078
[9,  3001] loss: 0.075
[9,  3301] loss: 0.078
[9,  3601] loss: 0.078
[9,  3901] loss: 0.077
[9,  4201] loss: 0.075
[9,  4501] loss: 0.082
[9,  4801] loss: 0.080
[9,  5101] loss: 0.078
[9,  5401] loss: 0.074
[9,  5701] loss: 0.076
[9,  6001] loss: 0.081
[9,  6301] loss: 0.073
[9,  6601] loss: 0.072
[9,  6901] loss: 0.075
[9,  7201] loss: 0.078
[9,  7501] loss: 0.079
[9,  7801] loss: 0.075
[9,  8101] loss: 0.073
[9,  8401] loss: 0.081
[9,  8701] loss: 0.074
[9,  9001] loss: 0.078
[9,  9301] loss: 0.074
[9,  9601] loss: 0.078
[9,  9901] loss: 0.077
[9, 10201] loss: 0.071
[9, 10501] loss: 0.070
[9, 10801] loss: 0.073
[9, 11101] loss: 0.075
[9, 11401] loss: 0.077
[9, 11701] loss: 0.077
[9, 12001] loss: 0.077
[9, 12301] loss: 0.079
[9, 12601] loss: 0.076
[9, 12901] 

 90%|██████████████████████████████████████████████████████████████████████████▋        | 9/10 [02:36<00:17, 17.30s/it]

[10,     1] loss: 0.000
[10,   301] loss: 0.075
[10,   601] loss: 0.074
[10,   901] loss: 0.079
[10,  1201] loss: 0.071
[10,  1501] loss: 0.078
[10,  1801] loss: 0.074
[10,  2101] loss: 0.080
[10,  2401] loss: 0.081
[10,  2701] loss: 0.075
[10,  3001] loss: 0.074
[10,  3301] loss: 0.073
[10,  3601] loss: 0.082
[10,  3901] loss: 0.082
[10,  4201] loss: 0.075
[10,  4501] loss: 0.075
[10,  4801] loss: 0.070
[10,  5101] loss: 0.074
[10,  5401] loss: 0.075
[10,  5701] loss: 0.076
[10,  6001] loss: 0.079
[10,  6301] loss: 0.076
[10,  6601] loss: 0.076
[10,  6901] loss: 0.073
[10,  7201] loss: 0.073
[10,  7501] loss: 0.073
[10,  7801] loss: 0.083
[10,  8101] loss: 0.071
[10,  8401] loss: 0.082
[10,  8701] loss: 0.078
[10,  9001] loss: 0.079
[10,  9301] loss: 0.081
[10,  9601] loss: 0.078
[10,  9901] loss: 0.075
[10, 10201] loss: 0.075
[10, 10501] loss: 0.075
[10, 10801] loss: 0.077
[10, 11101] loss: 0.073
[10, 11401] loss: 0.072
[10, 11701] loss: 0.076
[10, 12001] loss: 0.076
[10, 12301] loss

100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [02:54<00:00, 17.40s/it]

Training is finished!



