
In [16]:
import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
from sklearn.preprocessing import PolynomialFeatures

# ---------------------------------------------------
# Load the input data
# ---------------------------------------------------
input_file = 'data_multivar_regr.txt'
data = np.loadtxt(input_file, delimiter=',')

# Separate the features (X) from the target variable (y)
X, y = data[:, :-1], data[:, -1]

# ---------------------------------------------------
# Split into training and test sets (first 80% / last 20%, no shuffling)
# ---------------------------------------------------
num_training = int(0.8 * len(X))
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]

# ---------------------------------------------------
# Create and train the linear regressor
# ---------------------------------------------------
linear_regressor = linear_model.LinearRegression()
linear_regressor.fit(X_train, y_train)

# Predict on the test set
y_test_pred = linear_regressor.predict(X_test)

# ---------------------------------------------------
# Evaluate the linear regressor's performance
# ---------------------------------------------------
print("Linear regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

# ---------------------------------------------------
# Polynomial regression (degree 10)
# ---------------------------------------------------
polynomial = PolynomialFeatures(degree=10)
X_train_transformed = polynomial.fit_transform(X_train)
X_test_transformed = polynomial.transform(X_test)

poly_regressor = linear_model.LinearRegression()
poly_regressor.fit(X_train_transformed, y_train)

# Predict on the test set
y_test_pred_poly = poly_regressor.predict(X_test_transformed)

# ---------------------------------------------------
# Evaluate the polynomial regressor's performance
# ---------------------------------------------------
print("\nPolynomial regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred_poly), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred_poly), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred_poly), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred_poly), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred_poly), 2))

# ---------------------------------------------------
# Predict for a sample data point
# ---------------------------------------------------
datapoint = np.array([[7.75, 6.35, 5.56]])
poly_datapoint = polynomial.transform(datapoint)

linear_pred = linear_regressor.predict(datapoint)
poly_pred = poly_regressor.predict(poly_datapoint)

print("\nPrediction for datapoint", datapoint)
print("Linear regressor prediction:", round(float(linear_pred[0]), 2))
print("Polynomial regressor prediction:", round(float(poly_pred[0]), 2))


Linear regressor performance:
Mean absolute error = 3.58
Mean squared error = 20.31
Median absolute error = 2.99
Explained variance score = 0.86
R2 score = 0.86

Polynomial regressor performance:
Mean absolute error = 67.99
Mean squared error = 88448.6
Median absolute error = 5.78
Explained variance score = -584.54
R2 score = -587.72

Prediction for datapoint [[7.75 6.35 5.56]]
Linear regressor prediction: 36.05
Polynomial regressor prediction: 41.08
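
The polynomial regressor's R2 score of -587.72 on the test set means it does far worse there than simply predicting the mean of `y_test`: R2 is defined as 1 - SS_res / SS_tot, so it goes negative whenever the residual sum of squares exceeds the total sum of squares. A likely cause is overfitting, since a degree-10 expansion of three features produces 286 polynomial terms. One way to check this is to compare several degrees on held-out data. The sketch below is only an illustration under stated assumptions: it uses synthetic data rather than `data_multivar_regr.txt`, and the helper `compare_degrees` and the list of degrees are hypothetical choices, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption, NOT data_multivar_regr.txt):
# three features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 1.0, size=200)

# Same sequential 80/20 split as in the cell above.
num_training = int(0.8 * len(X))
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]


def compare_degrees(degrees):
    """Fit one polynomial model per degree and return {degree: test R2}."""
    scores = {}
    for degree in degrees:
        # The pipeline expands the features, then fits ordinary least squares.
        model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
        model.fit(X_train, y_train)
        scores[degree] = r2_score(y_test, model.predict(X_test))
    return scores


scores = compare_degrees([1, 2, 3, 10])
for degree, score in scores.items():
    print(f"degree={degree:2d}  test R2 = {score:.2f}")
```

On data with an essentially linear relationship, the low degrees should score close to each other, and a sharp drop at a high degree suggests that the extra terms are fitting noise. Running the same comparison on the real dataset would show which degree, if any, improves on the linear baseline of 0.86.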
