Реализовать с помощью `Numpy` класс `MyMLP`, моделирующий работу полносвязной нейронной сети.

Реализуемый класс должен

1. Поддерживать создание любого числа слоев с любым числом нейронов. Тип инициализации весов не регламентируется.
2. Обеспечивать выбор следующих функции активации в рамках каждого слоя: `ReLU`, `sigmoid`, `linear`.
3. Поддерживать решение задачи классификации и регрессии (выбор соответствующего лосса, в том числе для задачи многоклассовой классификации).
4. В процессе обучения использовать самостоятельно реализованный механизм обратного распространения (вывод формул в формате markdown) для применения градиентного и стохастического градиентного спусков (с выбором размера батча)
5. Поддерживать использование `l1`, `l2` и `l1l2` регуляризаций.

Самостоятельно выбрать наборы данных (классификация и регрессия). Провести эксперименты (различные конфигурации сети: количество слоев, нейронов, функции активации, скорость обучения и тп. — минимум 5 различных конфигураций) и сравнить результаты работы (оценка качества модели + время обучения и инференса) реализованного класса `MyMLP` со следующими моделям (в одинаковых конфигурациях):

*   MLPClassifier/MLPRegressor из sklearn
*   TensorFlow
*   Keras
*   PyTorch

Результат представить в виде .ipynb блокнота, содержащего весь необходимый код и визуализации сравнения реализаций для рассмотренных конфигураций.


# Формулы обратного распространения ошибки для GD и SGD

## 1. Градиентный спуск

#### Прямой проход:
Для каждого слоя нейронной сети вычисляется линейная комбинация входных данных:
$$
z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
$$
где:
- $W^{[l]}$ — веса слоя $l$,
- $a^{[l-1]}$ — активации предыдущего слоя,
- $b^{[l]}$ — смещения слоя $l$.

#### Активизация:
Применение функции активации $f$:
$$
a^{[l]} = f(z^{[l]})
$$
где $f$ — функция активации (например, ReLU, sigmoid).

#### Обратный проход:
Для выходного слоя ($L$-го слоя) вычисляется ошибка:
$$
\delta^{[L]} = \frac{\partial \mathcal{L}}{\partial z^{[L]}} = \frac{\partial \mathcal{L}}{\partial a^{[L]}} \cdot f'(z^{[L]})
$$
где $\mathcal{L}$ — функция потерь (например, MSE или кросс-энтропия), $f'(z^{[L]})$ — производная функции активации.

Для скрытых слоев ($l = L-1, L-2, \dots, 1$):
$$
\delta^{[l]} = \left(W^{[l+1]}\right)^T \delta^{[l+1]} \cdot f'(z^{[l]})
$$
где $\delta^{[l]}$ — ошибка на слое $l$.

#### Градиенты для весов и смещений:
Градиенты для весов:
$$
\frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} a^{[l-1]} \delta^{[l]^T}
$$
где $m$ — количество примеров в обучающей выборке.
Градиенты для смещений:
$$
\frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{[l]}
$$

#### Обновление параметров:
Обновление параметров с использованием градиентного спуска:
$$
W^{[l]} = W^{[l]} - \alpha \cdot \frac{\partial \mathcal{L}}{\partial W^{[l]}}
$$
$$
b^{[l]} = b^{[l]} - \alpha \cdot \frac{\partial \mathcal{L}}{\partial b^{[l]}}
$$
где $\alpha$ — это скорость обучения.

### Итоговая формула для обновления:
$$
W^{[l]} = W^{[l]} - \frac{\alpha}{m} \sum_{i=1}^{m} \frac{\partial \mathcal{L}}{\partial W^{[l]}_i}
$$
$$
b^{[l]} = b^{[l]} - \frac{\alpha}{m} \sum_{i=1}^{m} \frac{\partial \mathcal{L}}{\partial b^{[l]}_i}
$$

## 2. Стохастический градиентный спуск

#### Прямой проход: как и в случае с обычным градиентным спуском.

#### Обратный проход: аналогично обычному градиентному спуску, но градиенты вычисляются только по одному примеру (или батчу).

#### Градиенты для весов и смещений:
Для одного примера или батча $i$ (в случае батча градиенты усредняются по батчу):
- Градиенты для весов:
$$
\frac{\partial \mathcal{L}}{\partial W^{[l]}_i} = a^{[l-1]} \delta^{[l]^T}
$$
- Градиенты для смещений:
$$
\frac{\partial \mathcal{L}}{\partial b^{[l]}_i} = \delta^{[l]}
$$

#### Обновление параметров:
Для каждого примера $i$ параметры обновляются как:
$$
W^{[l]} = W^{[l]} - \alpha \cdot \frac{\partial \mathcal{L}}{\partial W^{[l]}_i}
$$
$$
b^{[l]} = b^{[l]} - \alpha \cdot \frac{\partial \mathcal{L}}{\partial b^{[l]}_i}
$$

### Итоговая формула для обновления (SGD):
$$
W^{[l]} = W^{[l]} - \alpha \cdot \frac{\partial \mathcal{L}}{\partial W^{[l]}}
$$
$$
b^{[l]} = b^{[l]} - \alpha \cdot \frac{\partial \mathcal{L}}{\partial b^{[l]}}
$$
где градиенты вычисляются для одного примера (или батча).

# Класс MyMLP

In [6]:
import numpy as np

class MyMLP:
    def __init__(self, layers, activations, loss='mse', lr=0.01, reg=None, reg_lambda=0.01):
        self.layers = layers
        self.activations = activations
        self.loss = loss
        self.lr = lr
        self.reg = reg
        self.reg_lambda = reg_lambda
        self.weights = []
        self.biases = []
        for i in range(len(layers) - 1):
            # Инициализация весов методом He для ReLU и Xavier для сигмоиды
            if activations[i] == 'relu':
                self.weights.append(np.random.randn(layers[i], layers[i + 1]) * np.sqrt(2. / layers[i]))
            else:
                self.weights.append(np.random.randn(layers[i], layers[i + 1]) * np.sqrt(1. / layers[i]))
            self.biases.append(np.zeros((1, layers[i + 1])))

    def _activation(self, x, func):
        if func == 'relu':
            return np.maximum(0, x)
        elif func == 'sigmoid':
            return 1 / (1 + np.exp(-x))
        elif func == 'linear':
            return x
        elif func == 'softmax':
            exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
            return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def _activation_derivative(self, z, func):
        if func == 'relu':
            return (z > 0).astype(float)
        elif func == 'sigmoid':
            sig = 1 / (1 + np.exp(-z))
            return sig * (1 - sig)
        elif func == 'linear':
            return np.ones_like(z)
        elif func == 'softmax':
            # Производная softmax учитывается отдельно при использовании crossentropy
            return np.ones_like(z)

    def _loss(self, y_true, y_pred):
        if self.loss == 'mse':
            return np.mean((y_true - y_pred) ** 2)
        elif self.loss == 'crossentropy':
            return -np.mean(y_true * np.log(y_pred + 1e-9))

    def _loss_derivative(self, y_true, y_pred):
        if self.loss == 'mse':
            return (y_pred - y_true) * 2 / y_true.shape[0]
        elif self.loss == 'crossentropy':
            return (y_pred - y_true) / y_true.shape[0]

    def forward(self, X):
        activations = [X]
        pre_activations = []
        for i in range(len(self.weights)):
            z = np.dot(activations[-1], self.weights[i]) + self.biases[i]
            pre_activations.append(z)
            activations.append(self._activation(z, self.activations[i]))
        return activations, pre_activations

    def predict(self, X):
        activations, _ = self.forward(X)
        return activations[-1]

    def backward(self, X, y):
        m = X.shape[0]
        activations, pre_activations = self.forward(X)
        grads_w = []
        grads_b = []

        # Обработка выходного слоя
        if self.activations[-1] == 'softmax' and self.loss == 'crossentropy':
            dz = (activations[-1] - y) / m
        else:
            loss_deriv = self._loss_derivative(y, activations[-1])
            act_deriv = self._activation_derivative(pre_activations[-1], self.activations[-1])
            dz = loss_deriv * act_deriv

        # Обратное распространение
        for i in reversed(range(len(self.weights))):
            grad_w = np.dot(activations[i].T, dz)
            grad_b = np.sum(dz, axis=0, keepdims=True)
            grads_w.insert(0, grad_w)
            grads_b.insert(0, grad_b)

            if i > 0:
                dz = np.dot(dz, self.weights[i].T)
                dz *= self._activation_derivative(pre_activations[i-1], self.activations[i-1])

        return grads_w, grads_b

    def update_params(self, grads_w, grads_b):
        for i in range(len(self.weights)):
            grad_w = grads_w[i]
            grad_b = grads_b[i]

            # Gradient clipping
            clip_value = 1.0  # Adjust this value as needed
            grad_w = np.clip(grad_w, -clip_value, clip_value)
            grad_b = np.clip(grad_b, -clip_value, clip_value)

            # Apply regularization
            if self.reg == 'l1':
                grad_w += self.reg_lambda * np.sign(self.weights[i])
            elif self.reg == 'l2':
                grad_w += self.reg_lambda * self.weights[i]
            elif self.reg == 'l1l2':
                grad_w += self.reg_lambda * (np.sign(self.weights[i]) + self.weights[i])

            # Update parameters
            self.weights[i] -= self.lr * grad_w
            self.biases[i] -= self.lr * grad_b

    def train(self, X, y, epochs=1000, batch_size=None):
        m = X.shape[0]
        for epoch in range(epochs):
            if batch_size:
                indices = np.random.permutation(m)
                X_shuffled = X[indices]
                y_shuffled = y[indices]
                for i in range(0, m, batch_size):
                    X_batch = X_shuffled[i:i+batch_size]
                    y_batch = y_shuffled[i:i+batch_size]
                    grads_w, grads_b = self.backward(X_batch, y_batch)
                    self.update_params(grads_w, grads_b)
            else:
                grads_w, grads_b = self.backward(X, y)
                self.update_params(grads_w, grads_b)

            if epoch % 100 == 0 or epoch == epochs - 1:
                y_pred = self.predict(X)
                loss = self._loss(y, y_pred)
                print(f'Epoch {epoch}, Loss: {loss:.6f}')


# Классификация

In [21]:
configurations_classification = [
    {'layers': [4, 10, 3], 'activations': ['relu', 'softmax'], 'lr': 0.01},  # Конфигурация 1
    {'layers': [4, 20, 10, 3], 'activations': ['relu', 'relu', 'softmax'], 'lr': 0.01},  # Конфигурация 2
    {'layers': [4, 50, 50, 50, 3], 'activations': ['relu', 'relu', 'relu', 'softmax'], 'lr': 0.001},  # Конфигурация 3
    {'layers': [4, 5, 3], 'activations': ['relu', 'softmax'], 'lr': 0.1},  # Конфигурация 4
    {'layers': [4, 20, 3], 'activations': ['sigmoid', 'softmax'], 'lr': 0.01},  # Конфигурация 5
]


In [19]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import accuracy_score

# Загрузка данных
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

# Масштабирование
scaler = StandardScaler()
X = scaler.fit_transform(X)

# One-hot encoding
encoder = OneHotEncoder(sparse_output=False)
y_encoded = encoder.fit_transform(y)

# Разделение
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)


## MLP_классификатор

In [25]:
for idx, config in enumerate(configurations_classification, 1):
    print(f"\nTraining configuration {idx}: {config}")
    model = MyMLP(layers=config['layers'], activations=config['activations'], lr=config['lr'])
    model.train(X_train, y_train, epochs=100, batch_size=32)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1))
    print(f"Accuracy: {accuracy:.4f}")


Training configuration 1: {'layers': [4, 10, 3], 'activations': ['relu', 'softmax'], 'lr': 0.01}
Epoch 0, Loss: 0.309919
Epoch 99, Loss: 0.061677
Accuracy: 0.9333

Training configuration 2: {'layers': [4, 20, 10, 3], 'activations': ['relu', 'relu', 'softmax'], 'lr': 0.01}
Epoch 0, Loss: 0.248451
Epoch 99, Loss: 0.037798
Accuracy: 1.0000

Training configuration 3: {'layers': [4, 50, 50, 50, 3], 'activations': ['relu', 'relu', 'relu', 'softmax'], 'lr': 0.001}
Epoch 0, Loss: 0.244415
Epoch 99, Loss: 0.065695
Accuracy: 0.9000

Training configuration 4: {'layers': [4, 5, 3], 'activations': ['relu', 'softmax'], 'lr': 0.1}
Epoch 0, Loss: 0.151068
Epoch 99, Loss: 0.013250
Accuracy: 0.9667

Training configuration 5: {'layers': [4, 20, 3], 'activations': ['sigmoid', 'softmax'], 'lr': 0.01}
Epoch 0, Loss: 0.224271
Epoch 99, Loss: 0.093477
Accuracy: 0.9333


In [26]:
from sklearn.neural_network import MLPClassifier, MLPRegressor
for idx, config in enumerate(configurations_classification, 1):
    hidden_layers = tuple(config['layers'][1:-1])  # исключаем входной и выходной слои
    clf = MLPClassifier(hidden_layer_sizes=hidden_layers,
                        max_iter=1000,
                        learning_rate_init=config['lr'],
                        activation='relu',
                        solver='adam',
                        random_state=42)

    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Configuration {idx} | Hidden layers: {hidden_layers} | Accuracy: {accuracy:.4f}")


Configuration 1 | Hidden layers: (10,) | Accuracy: 1.0000
Configuration 2 | Hidden layers: (20, 10) | Accuracy: 1.0000
Configuration 3 | Hidden layers: (50, 50, 50) | Accuracy: 0.9333
Configuration 4 | Hidden layers: (5,) | Accuracy: 1.0000
Configuration 5 | Hidden layers: (20,) | Accuracy: 1.0000


## Tensorflow_классификатор

In [28]:
for idx, config in enumerate(configurations_classification, 1):
    print(f"\nTraining configuration {idx}: {config}")

    model = Sequential()
    for i in range(1, len(config['layers'])):
        input_dim = config['layers'][0] if i == 1 else None
        model.add(Dense(config['layers'][i],
                        activation=config['activations'][i - 1],
                        input_dim=input_dim if input_dim else None))

    model.compile(optimizer=Adam(learning_rate=config['lr']),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    start_time = time.time()
    model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
    training_time = time.time() - start_time

    y_pred = model.predict(X_test)
    accuracy = np.mean(np.argmax(y_test, axis=1) == np.argmax(y_pred, axis=1))

    print(f"TensorFlow/Keras accuracy: {accuracy:.4f}")



Training configuration 1: {'layers': [4, 10, 3], 'activations': ['relu', 'softmax'], 'lr': 0.01}
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 234ms/step
TensorFlow/Keras accuracy: 1.0000

Training configuration 2: {'layers': [4, 20, 10, 3], 'activations': ['relu', 'relu', 'softmax'], 'lr': 0.01}
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 248ms/step
TensorFlow/Keras accuracy: 1.0000

Training configuration 3: {'layers': [4, 50, 50, 50, 3], 'activations': ['relu', 'relu', 'relu', 'softmax'], 'lr': 0.001}
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 311ms/step
TensorFlow/Keras accuracy: 1.0000

Training configuration 4: {'layers': [4, 5, 3], 'activations': ['relu', 'softmax'], 'lr': 0.1}
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 224ms/step
TensorFlow/Keras accuracy: 1.0000

Training configuration 5: {'layers': [4, 20, 3], 'activations': ['sigmoid', 'softmax'], 'lr': 0.01}




[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 190ms/step
TensorFlow/Keras accuracy: 1.0000


## PyTorch_классификатор

In [30]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)


# Функция активации
def get_activation(name):
    if name == 'relu':
        return nn.ReLU()
    elif name == 'sigmoid':
        return nn.Sigmoid()
    elif name == 'softmax':
        return nn.Softmax(dim=1)
    else:
        raise ValueError(f"Unknown activation: {name}")


# Цикл обучения по конфигурациям
for idx, config in enumerate(configurations_classification, 1):
    print(f"\nTraining configuration {idx}: {config}")

    class DynamicMLP(nn.Module):
        def __init__(self, layers, activations):
            super(DynamicMLP, self).__init__()
            self.layers = nn.ModuleList()
            for i in range(1, len(layers)):
                self.layers.append(nn.Linear(layers[i-1], layers[i]))
                if i <= len(activations):
                    self.layers.append(get_activation(activations[i-1]))

        def forward(self, x):
            for layer in self.layers:
                x = layer(x)
            return x

    model = DynamicMLP(config['layers'], config['activations'])
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=config['lr'])

    # Преобразуем one-hot y_train в индексы классов
    y_train_classes = torch.argmax(y_train, dim=1)
    y_test_classes = torch.argmax(y_test, dim=1)

    # Обучение
    model.train()
    start_time = time.time()
    for epoch in range(100):
        optimizer.zero_grad()
        outputs = model(X_train)
        loss = criterion(outputs, y_train_classes)
        loss.backward()
        optimizer.step()
    training_time = time.time() - start_time

    # Предсказание
    model.eval()
    with torch.no_grad():
        outputs = model(X_test)
        predicted = torch.argmax(outputs, dim=1)
        accuracy = (predicted == y_test_classes).float().mean().item()

    print(f"PyTorch accuracy: {accuracy:.4f}")
    print(f"Training time: {training_time:.2f}s")



Training configuration 1: {'layers': [4, 10, 3], 'activations': ['relu', 'softmax'], 'lr': 0.01}
PyTorch accuracy: 1.0000
Training time: 0.18s

Training configuration 2: {'layers': [4, 20, 10, 3], 'activations': ['relu', 'relu', 'softmax'], 'lr': 0.01}
PyTorch accuracy: 1.0000
Training time: 0.12s

Training configuration 3: {'layers': [4, 50, 50, 50, 3], 'activations': ['relu', 'relu', 'relu', 'softmax'], 'lr': 0.001}
PyTorch accuracy: 1.0000
Training time: 0.15s

Training configuration 4: {'layers': [4, 5, 3], 'activations': ['relu', 'softmax'], 'lr': 0.1}
PyTorch accuracy: 1.0000
Training time: 0.09s

Training configuration 5: {'layers': [4, 20, 3], 'activations': ['sigmoid', 'softmax'], 'lr': 0.01}
PyTorch accuracy: 1.0000
Training time: 0.10s


## Итог_классификация

In [33]:
#Резы по PyTorch - на 300 эпохах ( на 10 было несравнимо много с keras), по логам обучения можно посмотреть, что где-то даже можно больше эпох накинуть
import pandas as pd
results_df = pd.DataFrame(columns=['Layers', 'Activations', 'LR', 'Epochs', 'Acc_torch', 'Acc_keras', 'Acc_MLP'])

new_data = [
    {'Layers': [8, 10, 1], 'Activations': ['relu', 'linear'], 'LR': 0.01, 'Epochs': 100, 'Acc_torch': 1, 'Acc_keras': 1, 'Acc_MLP': 1, 'Acc_MyMLP': 0.9333},
    {'Layers': [8, 20, 10, 1], 'Activations': ['relu', 'relu', 'linear'], 'LR': 0.001, 'Epochs': 100, 'Acc_torch': 1, 'Acc_keras': 1, 'Acc_MLP': 0.9333, 'Acc_MyMLP': 1},
    {'Layers': [8, 50, 50, 50, 1], 'Activations': ['relu', 'relu', 'relu', 'linear'], 'LR': 0.001, 'Epochs': 100, 'Acc_torch': 1,'Acc_keras': 1, 'Acc_MLP': 1, 'Acc_MyMLP': 0.900},
    {'Layers': [8, 5, 1], 'Activations': ['relu', 'linear'], 'LR': 0.1, 'Epochs': 100, 'Acc_torch': 1, 'Acc_keras': 1, 'Acc_MLP': 1, 'Acc_MyMLP': 0.9667},
    {'Layers': [8, 15, 1], 'Activations': ['relu', 'linear'], 'LR': 0.01, 'Epochs': 100, 'Acc_torch': 1, 'Acc_keras': 1, 'Acc_MLP': 1, 'Acc_MyMLP': 0.9333},
]

results_df = pd.concat([results_df, pd.DataFrame(new_data)], ignore_index=True)
results_df.head(4)

  results_df = pd.concat([results_df, pd.DataFrame(new_data)], ignore_index=True)


Unnamed: 0,Layers,Activations,LR,Epochs,Acc_torch,Acc_keras,Acc_MLP,Acc_MyMLP
0,"[8, 10, 1]","[relu, linear]",0.01,100,1,1,1.0,0.9333
1,"[8, 20, 10, 1]","[relu, relu, linear]",0.001,100,1,1,0.9333,1.0
2,"[8, 50, 50, 50, 1]","[relu, relu, relu, linear]",0.001,100,1,1,1.0,0.9
3,"[8, 5, 1]","[relu, linear]",0.1,100,1,1,1.0,0.9667


# Регрессия

In [7]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import time

data = fetch_california_housing()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)


In [8]:
configurations_regression = [
    {'layers': [8, 10, 1], 'activations': ['relu', 'linear'], 'lr': 0.01},  # Конфигурация 1
    {'layers': [8, 20, 10, 1], 'activations': ['relu', 'relu', 'linear'], 'lr': 0.001},  # Конфигурация 2
    {'layers': [8, 50, 50, 50, 1], 'activations': ['relu', 'relu', 'relu', 'linear'], 'lr': 0.001},  # Конфигурация 3
    {'layers': [8, 5, 1], 'activations': ['relu', 'linear'], 'lr': 0.1},  # Конфигурация 4
    {'layers': [8, 15, 1], 'activations': ['relu', 'linear'], 'lr': 0.01},  # Конфигурация 5
]


## My_MLP_регрессия

In [9]:
# Использование конфигураций для регрессии
for config in configurations_regression:
    print(f"Training configuration: {config}")
    model = MyMLP(layers=config['layers'], activations=config['activations'], lr=config['lr'])
    model.train(X_train, y_train, epochs=1000, batch_size=32)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"MSE: {mse}")


Training configuration: {'layers': [8, 10, 1], 'activations': ['relu', 'linear'], 'lr': 0.01}
Epoch 0, Loss: 130.492173
Epoch 100, Loss: 277.488368
Epoch 200, Loss: 275.954520
Epoch 300, Loss: 277.616920
Epoch 400, Loss: 281.818455
Epoch 500, Loss: 276.396736
Epoch 600, Loss: 278.244826
Epoch 700, Loss: 278.584287
Epoch 800, Loss: 261.692521
Epoch 900, Loss: 340.881378
Epoch 999, Loss: 344.727979
MSE: 333.812538564317
Training configuration: {'layers': [8, 20, 10, 1], 'activations': ['relu', 'relu', 'linear'], 'lr': 0.001}
Epoch 0, Loss: 20.601574
Epoch 100, Loss: 10.091526
Epoch 200, Loss: 12.805505
Epoch 300, Loss: 12.736185
Epoch 400, Loss: 12.611076
Epoch 500, Loss: 12.440230
Epoch 600, Loss: 12.543719
Epoch 700, Loss: 12.559076
Epoch 800, Loss: 10.288020
Epoch 900, Loss: 10.610666
Epoch 999, Loss: 10.285530
MSE: 9.946234540504909
Training configuration: {'layers': [8, 50, 50, 50, 1], 'activations': ['relu', 'relu', 'relu', 'linear'], 'lr': 0.001}
Epoch 0, Loss: 3468.881168
Epoch 1

In [13]:
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

for i, config in enumerate(configurations_regression, 1):
    # Extract hidden layers (excluding input and output)
    hidden_layers = tuple(config['layers'][1:-1])  # [8, 10, 1] -> (10,)

    activation = config['activations'][0]  # Only one activation can be set
    activation_map = {'relu': 'relu', 'tanh': 'tanh', 'sigmoid': 'logistic', 'linear': 'identity'}
    activation_sklearn = activation_map.get(activation, 'relu')  # Default to relu if unknown

    reg = MLPRegressor(
        hidden_layer_sizes=hidden_layers,
        activation=activation_sklearn,
        learning_rate_init=config['lr'],
        max_iter=1000,
        random_state=42
    )

    reg.fit(X_train, y_train)
    y_pred = reg.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Config {i} MSE: {mse:.4f}")


  y = column_or_1d(y, warn=True)


Config 1 MSE: 1.0401


  y = column_or_1d(y, warn=True)


Config 2 MSE: 0.5855


  y = column_or_1d(y, warn=True)


Config 3 MSE: 0.8006


  y = column_or_1d(y, warn=True)


Config 4 MSE: 1.3127


  y = column_or_1d(y, warn=True)


Config 5 MSE: 0.8812


## Keras_регрессия

In [None]:
import plotly.graph_objects as go
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from sklearn.metrics import mean_squared_error
import numpy as np

configurations_regression = [
    {'layers': [8, 10, 1], 'activations': ['relu', 'linear'], 'lr': 0.01},
    {'layers': [8, 20, 10, 1], 'activations': ['relu', 'relu', 'linear'], 'lr': 0.001},
    {'layers': [8, 50, 50, 50, 1], 'activations': ['relu', 'relu', 'relu', 'linear'], 'lr': 0.001},
    {'layers': [8, 5, 1], 'activations': ['relu', 'linear'], 'lr': 0.1},
    {'layers': [8, 15, 1], 'activations': ['relu', 'linear'], 'lr': 0.01},
]

# Для хранения потерь и MSE на тестовом наборе
history_dict = {}
mse_dict = {}

for i, config in enumerate(configurations_regression):
    print(f"\nTraining Keras model for configuration {i + 1}: {config}")

    # Создание модели
    model = Sequential()
    for j in range(1, len(config['layers'])):
        if j == 1:
            # Первый слой с input_dim
            model.add(Dense(config['layers'][j], activation=config['activations'][j - 1],
                            input_dim=config['layers'][j - 1]))
        else:
            model.add(Dense(config['layers'][j], activation=config['activations'][j - 1]))

    # Компиляция модели
    model.compile(optimizer=Adam(learning_rate=config['lr']), loss='mean_squared_error')

    # Обучение модели с сохранением истории
    history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test), verbose=0)

    # Сохраняем историю
    history_dict[i] = history.history

    # Прогнозирование и вывод MSE
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_dict[i] = mse
    print(f"MSE_keras: {mse:.4f}")


Training Keras model for configuration 1: {'layers': [8, 10, 1], 'activations': ['relu', 'linear'], 'lr': 0.01}



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m194/194[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
MSE_keras: 0.5229

Training Keras model for configuration 2: {'layers': [8, 20, 10, 1], 'activations': ['relu', 'relu', 'linear'], 'lr': 0.001}



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m194/194[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
MSE_keras: 0.7026

Training Keras model for configuration 3: {'layers': [8, 50, 50, 50, 1], 'activations': ['relu', 'relu', 'relu', 'linear'], 'lr': 0.001}



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m194/194[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
MSE_keras: 0.4863

Training Keras model for configuration 4: {'layers': [8, 5, 1], 'activations': ['relu', 'linear'], 'lr': 0.1}



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m194/194[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
MSE_keras: 1.3648

Training Keras model for configuration 5: {'layers': [8, 15, 1], 'activations': ['relu', 'linear'], 'lr': 0.01}



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m194/194[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
MSE_keras: 0.5220


In [None]:
for i, history in history_dict.items():
    # Проверим, что в истории есть нужные ключи
    if 'loss' in history and 'val_loss' in history:
        fig = go.Figure()

        # График для Training Loss
        fig.add_trace(go.Scatter(x=np.arange(1, 101), y=history['loss'], mode='lines', name=f"Train Loss Config {i+1}"))

        # График для Validation Loss
        fig.add_trace(go.Scatter(x=np.arange(1, 101), y=history['val_loss'], mode='lines', name=f"Validation Loss Config {i+1}"))

        # Обновляем макет
        fig.update_layout(
            title=f"Training and Validation Loss - Configuration {i+1}",
            xaxis_title="Epochs",
            yaxis_title="Loss",
            showlegend=True,
            yaxis=dict(
                range=[0, 5]  # Ограничиваем ось Y в диапазоне от 0 до 1 (вы можете настроить значения)
            )
        )

        fig.show()
    else:
        print(f"Warning: Missing loss or val_loss for configuration {i+1}")


## PyTorch_регрессия

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import mean_squared_error
import numpy as np
import plotly.graph_objects as go

# Преобразуем данные
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_np = y_test  # Для метрики

# Построение модели из конфигурации
def build_model(layer_sizes, activations):
    layers = []
    for i in range(len(layer_sizes) - 1):
        layers.append(nn.Linear(layer_sizes[i], layer_sizes[i + 1]))
        act = activations[i]
        if act == 'relu':
            layers.append(nn.ReLU())
        elif act == 'sigmoid':
            layers.append(nn.Sigmoid())
        elif act == 'tanh':
            layers.append(nn.Tanh())
        elif act == 'linear':
            pass  # ничего не добавляем
    return nn.Sequential(*layers)

# Для хранения потерь для каждой конфигурации
history_dict = {}

# Обучение по конфигурациям
for i, config in enumerate(configurations_regression):
    print(f"\nTraining PyTorch model for configuration {i + 1}: {config}")

    model = build_model(config['layers'], config['activations'])
    optimizer = optim.Adam(model.parameters(), lr=config['lr'])
    loss_fn = nn.MSELoss()

    # Список для сохранения потерь
    epoch_losses = []

    # Обучение
    for epoch in range(300):
        model.train()
        optimizer.zero_grad()
        y_pred = model(X_train_t)
        loss = loss_fn(y_pred, y_train_t)
        loss.backward()
        optimizer.step()

        epoch_losses.append(loss.item())

        if epoch % 20 == 0:
            print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

    # Прогноз
    model.eval()
    with torch.no_grad():
        y_pred_test = model(X_test_t).numpy()
        mse = mean_squared_error(y_test_np, y_pred_test)
        print(f"Test MSE: {mse:.4f}")

    # Сохраняем потери в history_dict
    history_dict[i] = epoch_losses



Training PyTorch model for configuration 1: {'layers': [8, 10, 1], 'activations': ['relu', 'linear'], 'lr': 0.01}
Epoch 0, Loss: 12115.7939
Epoch 20, Loss: 317.1199
Epoch 40, Loss: 18.9272
Epoch 60, Loss: 11.0098
Epoch 80, Loss: 5.8865
Epoch 100, Loss: 5.3141
Epoch 120, Loss: 4.7932
Epoch 140, Loss: 4.3627
Epoch 160, Loss: 3.9887
Epoch 180, Loss: 3.6698
Epoch 200, Loss: 3.3956
Epoch 220, Loss: 3.1577
Epoch 240, Loss: 2.9492
Epoch 260, Loss: 2.7638
Epoch 280, Loss: 2.5972
Test MSE: 2.3095

Training PyTorch model for configuration 2: {'layers': [8, 20, 10, 1], 'activations': ['relu', 'relu', 'linear'], 'lr': 0.001}
Epoch 0, Loss: 392.1939
Epoch 20, Loss: 6.6562
Epoch 40, Loss: 4.2861
Epoch 60, Loss: 2.3199
Epoch 80, Loss: 1.8164
Epoch 100, Loss: 1.7009
Epoch 120, Loss: 1.6549
Epoch 140, Loss: 1.6204
Epoch 160, Loss: 1.5903
Epoch 180, Loss: 1.5617
Epoch 200, Loss: 1.5337
Epoch 220, Loss: 1.5057
Epoch 240, Loss: 1.4775
Epoch 260, Loss: 1.4488
Epoch 280, Loss: 1.4192
Test MSE: 1.3483

Trai

In [None]:
#На 300 эпохах
# Теперь строим графики для каждой конфигурации
for i, epoch_losses in history_dict.items():
    # График потерь для каждой конфигурации
    fig = go.Figure()

    # График для потерь
    fig.add_trace(go.Scatter(x=np.arange(1, len(epoch_losses) + 1), y=epoch_losses, mode='lines', name=f"Loss Config {i + 1}"))

    fig.update_layout(
        title=f"Training Loss - Configuration {i + 1}",
        xaxis_title="Epochs",
        yaxis_title="Loss",
        yaxis=dict(
        range=[0, 10]
            ),
        showlegend=True
    )

    fig.show()

## Итог_регрессия

In [15]:
#Резы по PyTorch - на 300 эпохах ( на 10 было несравнимо много с keras), по логам обучения можно посмотреть, что где-то даже можно больше эпох накинуть
import pandas as pd
results_df = pd.DataFrame(columns=['Layers', 'Activations', 'LR', 'Epochs', 'MSE_torch', 'MSE_keras', 'MSE_MLP'])

new_data = [
    {'Layers': [8, 10, 1], 'Activations': ['relu', 'linear'], 'LR': 0.01, 'Epochs': 100, 'MSE_torch': 1.1269, 'MSE_keras': 0.5494, 'MSE_MyMLP': 333.8125, 'MSE_MLP': 1.0401},
    {'Layers': [8, 20, 10, 1], 'Activations': ['relu', 'relu', 'linear'], 'LR': 0.001, 'Epochs': 100, 'MSE_torch': 1.2756, 'MSE_keras': 0.5897, 'MSE_MyMLP': 9.9462, 'MSE_MLP': 0.5855},
    {'Layers': [8, 50, 50, 50, 1], 'Activations': ['relu', 'relu', 'relu', 'linear'], 'LR': 0.001, 'Epochs': 100, 'MSE_torch': 0.6986, 'MSE_keras': 0.4967, 'MSE_MyMLP': 1.5595, 'MSE_MLP': 0.8006},
    {'Layers': [8, 5, 1], 'Activations': ['relu', 'linear'], 'LR': 0.1, 'Epochs': 100, 'MSE_torch': 1.3125, 'MSE_keras': 1.3253, 'MSE_MyMLP': 22505.8188, 'MSE_MLP': 1.3127},
    {'Layers': [8, 15, 1], 'Activations': ['relu', 'linear'], 'LR': 0.01, 'Epochs': 100, 'MSE_torch': 3.0764, 'MSE_keras': 1.3126, 'MSE_MyMLP': 2097.4168, 'MSE_MLP': 0.8812},
]

results_df = pd.concat([results_df, pd.DataFrame(new_data)], ignore_index=True)
results_df.head(4)

  results_df = pd.concat([results_df, pd.DataFrame(new_data)], ignore_index=True)


Unnamed: 0,Layers,Activations,LR,Epochs,MSE_torch,MSE_keras,MSE_MLP,MSE_MyMLP
0,"[8, 10, 1]","[relu, linear]",0.01,100,1.1269,0.5494,1.0401,333.8125
1,"[8, 20, 10, 1]","[relu, relu, linear]",0.001,100,1.2756,0.5897,0.5855,9.9462
2,"[8, 50, 50, 50, 1]","[relu, relu, relu, linear]",0.001,100,0.6986,0.4967,0.8006,1.5595
3,"[8, 5, 1]","[relu, linear]",0.1,100,1.3125,1.3253,1.3127,22505.8188
