# Linear Regression, Regularization урок

### Завдання: 

Для датасету:<br>
https://archive.ics.uci.edu/ml/datasets/wine+quality<br>

побудувати модель лінійної регресії<br>

Обов'язкові кроки:<br>

* первинний аналіз даних (відстуність пропусків, наявність категоріальних фіч, ...)<br>
* фича інжиніринг (побудувати  1-2 нові фічі)<br>
* масштабування фіч<br>
* поділ датасету на тренувальну, валідаційну та тестову частини<br>
* тренування базової моделі із дефолтними гіперпараметрами<br>
* підбір гіперпараметрів <br>
* оцінка результатів<br>

---
<a name="0"/>

### Зміст: <br>
* 1. [Імпорт данних](#1)
* 2. [Первинний аналіз данних](#2)
* 3. [Фіча інжинірінг](#3)
* 4. [Масштабування фіч](#4)
* 5. [Поділ датасету на train, valid, test](#5)
* 6. [Тренування базової моделі(гіперпараметри=дефолтні)](#6)
* 7. [Подбір гіперпараметрів](#7)
* 8. [Оцінка результатів](#8)

Імпорт бібліотек

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen

from sklearn.preprocessing import (MaxAbsScaler,
                                    MinMaxScaler,
                                    StandardScaler,
                                    PowerTransformer,
                                    RobustScaler)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

Налаштування

In [2]:
sns.set()

# Графіка у форматі SVG більш чітка та розбірлива
%config InlineBackend.figure_format = 'svg'

---
<a name="1"/>

### 1. Імпорт данних
([зміст](#0))

In [47]:
# Посилання на архів
url = 'https://archive.ics.uci.edu/static/public/186/wine+quality.zip'

In [5]:
resp = urlopen(url)
myzip = ZipFile(BytesIO(resp.read()))
with myzip.open('winequality-red.csv') as myfile:
    #print(myfile.readline())
    red_wine = pd.read_csv(myfile, delimiter=';')

with myzip.open('winequality-white.csv') as myfile:
    #print(myfile.readline())
    white_wine = pd.read_csv(myfile, delimiter=';')

---
<a name="2"/>

### 2. Первинний аналіз данних
([зміст](#0))

In [6]:
red_wine.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [7]:
white_wine.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


Фічі (на основі фізико-хімічних тестів):<br>

| Фіча | Опис |
|---  |---
|fixed acidity |фіксована кислотність
|volatile acidity |летка кислотність
|citric acid |лимонна кислота
|residual sugar |залишковий цукор
|chlorides |хлориди
|free sulfur dioxide |вільний діоксид сірки
|total sulfur dioxide|загальний діоксид сірки
|density |щільність
|pH |pH
|sulphates|сульфати
|alcohol |спирт Вихідна змінна (на основі сенсорних даних)
|quality |якість (оцінка від 0 до 10)


In [8]:
red_wine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


In [9]:
white_wine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898 entries, 0 to 4897
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         4898 non-null   float64
 1   volatile acidity      4898 non-null   float64
 2   citric acid           4898 non-null   float64
 3   residual sugar        4898 non-null   float64
 4   chlorides             4898 non-null   float64
 5   free sulfur dioxide   4898 non-null   float64
 6   total sulfur dioxide  4898 non-null   float64
 7   density               4898 non-null   float64
 8   pH                    4898 non-null   float64
 9   sulphates             4898 non-null   float64
 10  alcohol               4898 non-null   float64
 11  quality               4898 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 459.3 KB


In [10]:
red_wine.describe()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0
mean,8.319637,0.527821,0.270976,2.538806,0.087467,15.874922,46.467792,0.996747,3.311113,0.658149,10.422983,5.636023
std,1.741096,0.17906,0.194801,1.409928,0.047065,10.460157,32.895324,0.001887,0.154386,0.169507,1.065668,0.807569
min,4.6,0.12,0.0,0.9,0.012,1.0,6.0,0.99007,2.74,0.33,8.4,3.0
25%,7.1,0.39,0.09,1.9,0.07,7.0,22.0,0.9956,3.21,0.55,9.5,5.0
50%,7.9,0.52,0.26,2.2,0.079,14.0,38.0,0.99675,3.31,0.62,10.2,6.0
75%,9.2,0.64,0.42,2.6,0.09,21.0,62.0,0.997835,3.4,0.73,11.1,6.0
max,15.9,1.58,1.0,15.5,0.611,72.0,289.0,1.00369,4.01,2.0,14.9,8.0


In [11]:
white_wine.describe()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0,4898.0
mean,6.854788,0.278241,0.334192,6.391415,0.045772,35.308085,138.360657,0.994027,3.188267,0.489847,10.514267,5.877909
std,0.843868,0.100795,0.12102,5.072058,0.021848,17.007137,42.498065,0.002991,0.151001,0.114126,1.230621,0.885639
min,3.8,0.08,0.0,0.6,0.009,2.0,9.0,0.98711,2.72,0.22,8.0,3.0
25%,6.3,0.21,0.27,1.7,0.036,23.0,108.0,0.991723,3.09,0.41,9.5,5.0
50%,6.8,0.26,0.32,5.2,0.043,34.0,134.0,0.99374,3.18,0.47,10.4,6.0
75%,7.3,0.32,0.39,9.9,0.05,46.0,167.0,0.9961,3.28,0.55,11.4,6.0
max,14.2,1.1,1.66,65.8,0.346,289.0,440.0,1.03898,3.82,1.08,14.2,9.0


Виходячи з отриманної інформації можна побачити що пропусків немає. Всі фічі кількістні, `quality` - `target`. Єдине що, на мою думку, треба зробити, це поміняти тип данних фіч на `float32`, а `quality` на `int16` так буде трохи швидше працювати.

In [13]:
features = ['fixed acidity', 'volatile acidity', 'citric acid',
            'residual sugar', 'chlorides', 'free sulfur dioxide',
            'total sulfur dioxide', 'density', 'pH', 'sulphates',
            'alcohol']

def change_type(feature, data, datatype):
    data[feature] = data[feature].astype(datatype)

for feature in features:
    change_type(feature, red_wine, 'float32')
    change_type(feature, white_wine, 'float32')

change_type('quality', red_wine, 'int16')
change_type('quality', white_wine, 'int16')

---
<a name="3"/>

### 3. Фіча інжинірінг
([зміст](#0))

Створимо нові фічі. Наприклад: <br>

| Фіча | Опис | Формула
|---  |--- |---
|**other sulfur dioxide**| Інші діоксиди сульфату | **other sulfur dioxide = total sulfur dioxide - free sulfur dioxide**
| **pH/density**| Кислотність на щільність |**pH/density =  pH / density**

In [15]:
# Для червоного вина
red_wine['other sulfur dioxide'] = red_wine['total sulfur dioxide'] - red_wine['free sulfur dioxide']
red_wine['pH/density'] = red_wine['pH'] - red_wine['density']

# Для білого вина
white_wine['other sulfur dioxide'] = white_wine['total sulfur dioxide'] - white_wine['free sulfur dioxide']
white_wine['pH/density'] = white_wine['pH'] - white_wine['density']

In [16]:
red_wine.head(2)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,other sulfur dioxide,pH/density
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,23.0,2.5122
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,42.0,2.2032


In [17]:
white_wine.head(2)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,other sulfur dioxide,pH/density
0,7.0,0.27,0.36,20.700001,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6,125.0,1.999
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6,118.0,2.306


Перед закінченням, зробимо ще один об'єднанний датасет: **rw_wine**.

In [18]:
rw_wine = pd.concat([red_wine, white_wine], ignore_index=True)
rw_wine.shape

(6497, 14)

---
<a name="4"/>

### 4. Масштабування фіч
([зміст](#0))

Для виявлення кращого Scaler'a протестуємо декілька: <br>
* MaxAbsScaler      (MAS);
* MinMaxScaler      (MMS);
* StandardScaler    (SS);
* RobustScaler      (RS).

Функція спліту датасету на фічі та таргет

In [20]:
def get_split_dataset(dataset):
    X_value = dataset.iloc[:, [1,2,3,4,5,6,7,8,9,10,12,13]].values
    y_value = dataset.iloc[:, [-3]].values.ravel()
    return X_value, y_value

Функція маштабуваня фічів датасету

In [21]:
def get_scalered_dataset(dataset, scaler):
    data = scaler.fit_transform(dataset)
    return data

---
<a name="5"/>

### 5. Поділ датасету на train, valid, test
([зміст](#0))

Поділимо кожен наш датасет на 3 частини `train`, `valid`, `test`

In [23]:
# Функція розділення датасету на 3 частини: train, valid, test
def get_TVT_parts(X_value, y_value, test_size=0.2, valid_size=0.25):
    X_traine_valid, X_test, y_traine_valid, y_test = train_test_split(X_value, y_value, test_size=test_size)
    X_train, X_valid, y_train, y_valid = train_test_split(X_traine_valid, y_traine_valid, test_size=valid_size)
    return X_train, X_valid, X_test, y_train, y_valid, y_test

---
<a name="6"/>

### 6. Тренування базової моделі(гіперпараметри=дефолтні)
([зміст](#0))

Побудуємо декілька моделей:
* LinearRegression(аналітичне рішення);
* SGDRegressor;
* Ridge;

In [25]:
# Функція отримання помилки
def get_error(y_test, y_predict, error):
    error_value = error(y_test, y_predict)

    return error_value

---

### Лінійна регресія (Linear Regression)

In [27]:
# Функція отримання предикту Linear Regression
def get_predict_LR(X_train, y_train, X_test):
    LR = LinearRegression()
    LR = LR.fit(X_train, y_train)
    y_predict = LR.predict(X_test)

    return y_predict

---

### Стохастичний градієнтний спуск (SGDRegressor)

In [29]:
# Функція отримання предикту для SGDRegressor
def get_predict_SGD(X_train, y_train, X_test, loss='squared_error', penalty='l2', alpha=0.0001):
    SGD = SGDRegressor(loss=loss, penalty=penalty, alpha=alpha)
    SGD = SGD.fit(X_train, y_train)

    y_predict = SGD.predict(X_test)

    return y_predict

---

### Ridge

In [30]:
# Функція отримання Ridge предікту
def get_predict_Ridge(X_train, y_train, X_test, solver='auto', tol=1e-4, alpha=1.0):
    RDG = Ridge(solver=solver, tol=tol, alpha=alpha)
    RDG = RDG.fit(X_train, y_train)

    y_predict = RDG.predict(X_test)

    return y_predict

---
<a name="7"/>

### 7. Подбір гіперпараметрів
([зміст](#0))

### Стохастичний градієнтний спуск (SGDRegressor)

In [32]:
# Функція підбору гіперпараметрів для SGDRegressor
def tuneParams_SGDRegressor(X_traine, y_traine, X_valid, y_valid, error_func):
    loss_functions = ['squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive']
    penaltis = ['l2', 'l1']
    alphas = [0.00001, 0.0001,0.001,0.01,0.1]
    params = {}
    for lf in loss_functions:
        for p in penaltis:
            for a in alphas:
                y_predict = get_predict_SGD(X_train, y_train, X_test, loss=lf, penalty=p, alpha=a)
                error_value = get_error(y_valid, y_predict, error_func)
                params[error_value] = [lf, p, a]
    all_params = dict(sorted(params.items()))
    best_SGD_error = list(all_params.keys())[0]
    best_SGD_params = all_params[best_SGD_error]
    return best_SGD_error, best_SGD_params[0], best_SGD_params[1], best_SGD_params[2]

---

### Ridge

In [34]:
def tuneParam_Ridge(X_traine, y_traine, X_valid, y_valid, error_func):
    alphas = [0.01,0.1, 1, 2, 5, 10]
    tols = [0.0001, 0.001,0.01,0.01]
    solverses = ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
    params = {}
    for a in alphas:
        for t in tols:
            for s in solverses:
                y_predict = get_predict_Ridge(X_traine, y_traine, X_valid, solver=s, tol=t, alpha=a)
                error_value = get_error(y_valid, y_predict, error_func)
                params[error_value] = [s, t, a]

    all_params = dict(sorted(params.items()))
    best_Ridgeerror = list(all_params.keys())[0]
    best_Ridge_params = all_params[best_Ridgeerror]
    return best_Ridgeerror, best_Ridge_params[0], best_Ridge_params[1], best_Ridge_params[2]

---
<a name="8"/>

### 8. Оцінка результатів
([зміст](#0))

### Комбайн
Зробимо досліницьку роботу. Переберемо по датасетах різні скалєри, моделі, підберемо оптимальні гіперпараметри для моделей.

In [35]:
result = pd.DataFrame(columns = ["Wine", "Scaler", "Model", "MSE", "MAE", "R2"])
wine_datasets = [red_wine, white_wine, rw_wine]
scaler_list = [ MaxAbsScaler(), MinMaxScaler(), StandardScaler(), RobustScaler()]

# Для кожного типу вина:
for wine in wine_datasets:

    if wine.equals(red_wine):
        wine_name = 'Red wine'
    elif wine.equals(white_wine):
        wine_name = 'White wine'
    elif wine.equals(rw_wine):
        wine_name = 'Red and white wine'
    # Спліт датасету на фічі та таргет
    X_value, y_value = get_split_dataset(wine)
    # Для кожного скалєра:
    for scal in scaler_list:
        # Маштабуємо фічі
        scalered_dataset = get_scalered_dataset(X_value, scal)

        # Сплітимо датасет на 3 частини
        X_train, X_valid, X_test, y_train, y_valid, y_test = get_TVT_parts(scalered_dataset, y_value)

        # Тринуэмо базові моделі
        # Linear Regression
        LR_y_predict = get_predict_LR(X_train, y_train, X_test)
        # SGDRegressor
        SGD_y_predict_default = get_predict_SGD(X_train, y_train, X_test)
        # Ridge
        Ridge_y_predict_default = get_predict_Ridge(X_train, y_train, X_test)

        # Підбераємо параметри
        # SGDRegressor
        SGD_best_error, SGD_best_loss, SGD_best_penalty, SGD_best_alpha = tuneParams_SGDRegressor(X_train, y_train, X_valid, y_valid, mean_squared_error)
        # Ridge
        Ridge_best_error, Ridge_best_solver, Ridge_best_tol, Ridge_best_alpha = tuneParam_Ridge(X_train, y_train, X_valid, y_valid, mean_squared_error)

        # Вчимо моделі на найкращіх параметрах
        # SGDRegressor
        SGD_y_predict = get_predict_SGD(X_train, y_train, X_test, loss=SGD_best_loss, penalty=SGD_best_penalty, alpha=SGD_best_alpha)
        # Ridge
        Ridge_y_predict = get_predict_Ridge(X_train, y_train, X_test, solver=Ridge_best_solver, tol=Ridge_best_tol, alpha=Ridge_best_alpha)

        predicts = [LR_y_predict, SGD_y_predict_default, Ridge_y_predict_default, SGD_y_predict, Ridge_y_predict]

        # Для предіктів кожної моделі
        for y_pred in predicts:
            if np.array_equal(y_pred, LR_y_predict):
                model_name = "Linear Regression"
            elif np.array_equal(y_pred, SGD_y_predict_default):
                model_name = "SGD(default)"
            elif np.array_equal(y_pred, Ridge_y_predict_default):
                model_name = "Ridge(default)"
            elif np.array_equal(y_pred, SGD_y_predict):
                model_name = f"SGD(loss={SGD_best_loss}, penalty={SGD_best_penalty}, alpha={SGD_best_alpha})"
            elif np.array_equal(y_pred, Ridge_y_predict):
                model_name = f"Ridge(solvers={Ridge_best_solver}, tol={Ridge_best_tol}, alpha={Ridge_best_alpha})"
            # Вираховуємо різні функції мінімізації
            MSE = get_error(y_test, y_pred, mean_squared_error)
            MAE = get_error(y_test, y_pred, mean_absolute_error)
            R2 = get_error(y_test, y_pred, r2_score)

            #print(f"Wine dataset: {wine_name}, scaler={scal}, model={model_name}, MSE: {MSE}, MAE: {MAE}, R2: {R2}")
            result.loc[len(result.index)] = [wine_name, scal, model_name, MSE, MAE, R2]

In [36]:
result.head()

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
0,Red wine,MaxAbsScaler(),Linear Regression,0.470785,0.532016,0.314745
1,Red wine,MaxAbsScaler(),SGD(default),0.575528,0.613785,0.162286
2,Red wine,MaxAbsScaler(),Ridge(default),0.477019,0.537691,0.305672
3,Red wine,MaxAbsScaler(),"SGD(loss=squared_error, penalty=l1, alpha=0.1)",0.67494,0.680545,0.017585
4,Red wine,MaxAbsScaler(),"Ridge(solvers=lsqr, tol=0.01, alpha=0.01)",0.472824,0.531539,0.311777


### Результати

In [38]:
# Найкращій показник MSE для червоного вина
result[(result['Wine'] == 'Red wine') & (result[result['Wine'] == 'Red wine']['MSE'] == result[result['Wine'] == 'Red wine']['MSE'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
9,Red wine,MinMaxScaler(),"Ridge(solvers=sag, tol=0.01, alpha=0.01)",0.38269,0.492338,0.41823


In [39]:
# Найкращій показник MAE для червоного вина
result[(result['Wine'] == 'Red wine') & (result[result['Wine'] == 'Red wine']['MAE'] == result[result['Wine'] == 'Red wine']['MAE'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
19,Red wine,RobustScaler(),"Ridge(solvers=sparse_cg, tol=0.01, alpha=0.1)",0.385372,0.482514,0.320509


In [40]:
# Найкращій показник R2 для червоного вина
result[(result['Wine'] == 'Red wine') & (result[result['Wine'] == 'Red wine']['R2'] == result[result['Wine'] == 'Red wine']['R2'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
3,Red wine,MaxAbsScaler(),"SGD(loss=squared_error, penalty=l1, alpha=0.1)",0.67494,0.680545,0.017585


In [41]:
# Найкращій показник MSE для білого вина
result[(result['Wine'] == 'White wine') & (result[result['Wine'] == 'White wine']['MSE'] == result[result['Wine'] == 'White wine']['MSE'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
29,White wine,MinMaxScaler(),"Ridge(solvers=sag, tol=0.01, alpha=0.01)",0.525642,0.569465,0.279251


In [42]:
# Найкращій показник MAE для білого вина
result[(result['Wine'] == 'White wine') & (result[result['Wine'] == 'White wine']['MAE'] == result[result['Wine'] == 'White wine']['MAE'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
29,White wine,MinMaxScaler(),"Ridge(solvers=sag, tol=0.01, alpha=0.01)",0.525642,0.569465,0.279251


In [43]:
# Найкращій показник R2 для білого вина
result[(result['Wine'] == 'White wine') & (result[result['Wine'] == 'White wine']['R2'] == result[result['Wine'] == 'White wine']['R2'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
23,White wine,MaxAbsScaler(),"SGD(loss=huber, penalty=l1, alpha=0.1)",0.767721,0.628173,-0.009617


In [44]:
# Найкращій показник MSE для червоного та білого вина
result[(result['Wine'] == 'Red and white wine') & (result[result['Wine'] == 'Red and white wine']['MSE'] == result[result['Wine'] == 'Red and white wine']['MSE'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
49,Red and white wine,MinMaxScaler(),"Ridge(solvers=lsqr, tol=0.01, alpha=2)",0.53715,0.568143,0.272653


In [45]:
# Найкращій показник MAE для червоного та білого вина
result[(result['Wine'] == 'Red and white wine') & (result[result['Wine'] == 'Red and white wine']['MAE'] == result[result['Wine'] == 'Red and white wine']['MAE'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
57,Red and white wine,RobustScaler(),Ridge(default),0.540903,0.563921,0.302904


In [46]:
# Найкращій показник R2 для червоного та білого вина
result[(result['Wine'] == 'Red and white wine') & (result[result['Wine'] == 'Red and white wine']['R2'] == result[result['Wine'] == 'Red and white wine']['R2'].min())]

Unnamed: 0,Wine,Scaler,Model,MSE,MAE,R2
53,Red and white wine,StandardScaler(),"SGD(loss=huber, penalty=l1, alpha=0.1)",0.805948,0.664907,-0.031651


---

Розглянемо MSE моделей.<br>
Для датасету червогоно вина MSE=0.381022, а для червоного та білого вина MSE=0.538056. Тому краще робити передбаченя для червоного вина окремо. *Скалєр: 	StandardScaler(), модель: SGD(loss=squared_error, penalty=l1, alpha=0.1)*<br>
Для датасету білого вина MSE=0.548435, а для червоного та білого вина MSE=0.538056. Тут кращє біло вино передбачає загальна модель, для червоного та білого. *Скалєр: MinMaxScaler(), модель: Linear Regression*.

### Дякую за увагу =)