# Regression Model

Since the target variable `target_xg` is continuous in the range [0, 1], the problem is a **regression task**.  
We begin with a **Linear Regression** model to establish a baseline, evaluating results with:

- **Mean Absolute Error (MAE)**  
- **Root Mean Squared Error (RMSE)**  
- **R² score**  
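
To make the three metrics concrete, the sketch below computes them by hand with NumPy on a tiny made-up example. The helper name `regression_metrics` and the toy values are illustrative assumptions, not part of the original notebook; in practice the `sklearn.metrics` functions give the same quantities.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for two equal-length sequences (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))            # average absolute deviation
    rmse = np.sqrt(np.mean(errors ** 2))     # penalises large errors more strongly
    ss_res = np.sum(errors ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variance around the mean
    r2 = 1.0 - ss_res / ss_tot               # share of variance explained
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy values chosen only for illustration
toy_true = [0.10, 0.30, 0.50, 0.70]
toy_pred = [0.20, 0.30, 0.40, 0.70]
toy_metrics = regression_metrics(toy_true, toy_pred)
print(toy_metrics)
```

MAE and RMSE are in the same units as `target_xg`, so lower is better; R² compares the model against always predicting the mean, so a value near 0 means the model adds little over that trivial predictor.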

This baseline provides a first benchmark. More advanced models can then be tried to improve predictive performance: regularized linear models such as **Ridge** and **Lasso** to control overfitting, and **tree-based regressors** to capture non-linear relationships in the data.
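
As a rough sketch of how such a comparison could look, the snippet below cross-validates several regressors on synthetic data. Everything here is an assumption for illustration: the data (`X_demo`, `y_demo`) is randomly generated with a deliberately non-linear target in [0, 1], and the hyperparameters (`alpha`, `n_estimators`) are arbitrary starting values, not tuned settings from this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset (assumption): the target depends on
# an interaction between two features, which a linear model cannot represent.
rng = np.random.default_rng(42)
X_demo = rng.uniform(-1.0, 1.0, size=(300, 3))
y_demo = 1.0 / (1.0 + np.exp(-4.0 * X_demo[:, 0] * X_demo[:, 1]))
y_demo = np.clip(y_demo + rng.normal(0.0, 0.02, size=300), 0.0, 1.0)

# Candidate models; hyperparameters are illustrative, not tuned
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.001),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validated MAE (scikit-learn reports it negated, so flip the sign)
cv_mae = {}
for name, model in candidates.items():
    scores = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="neg_mean_absolute_error"
    )
    cv_mae[name] = -scores.mean()

for name, value in cv_mae.items():
    print(f"{name:>14}: MAE = {value:.4f}")
```

On data like this, Ridge and Lasso behave much like plain linear regression because they are still linear in the features; only the tree-based model can pick up the interaction. Whether the same holds for the real dataset has to be checked empirically.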

### DS0 - Linear Regression

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

# Features and target
X = ds0.drop(columns=["target_xg"])
y = ds0["target_xg"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Linear Regression
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Predictions
y_pred = lin_reg.predict(X_test)

# Evaluation
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

# Report metrics rounded to four decimals for readability
print("Linear Regression Baseline")
print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²:   {r2:.4f}")
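
Two checks are often useful alongside a linear baseline on a bounded target. A plain linear model can predict values outside [0, 1], and its scores are easier to interpret next to a trivial mean predictor. The sketch below illustrates both on synthetic data; the arrays `X_toy` and `y_toy` are made up for the example and stand in for the real features and `target_xg`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data with a target bounded in [0, 1] (assumption, for illustration)
rng = np.random.default_rng(0)
X_toy = rng.uniform(0.0, 1.0, size=(200, 2))
noise = rng.normal(0.0, 0.05, size=200)
y_toy = np.clip(0.9 * X_toy[:, 0] + 0.3 * X_toy[:, 1] - 0.1 + noise, 0.0, 1.0)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# Trivial reference model: always predicts the training-set mean
mean_baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
linear = LinearRegression().fit(X_tr, y_tr)

raw_pred = linear.predict(X_te)
# Clip predictions to the valid range of the target
clipped_pred = np.clip(raw_pred, 0.0, 1.0)

mae_mean = mean_absolute_error(y_te, mean_baseline.predict(X_te))
mae_raw = mean_absolute_error(y_te, raw_pred)
mae_clipped = mean_absolute_error(y_te, clipped_pred)

print(f"Mean predictor MAE:   {mae_mean:.4f}")
print(f"Linear (raw) MAE:     {mae_raw:.4f}")
print(f"Linear (clipped) MAE: {mae_clipped:.4f}")
```

Because the true values lie in [0, 1], clipping can only move an out-of-range prediction closer to the truth, so it never increases the absolute error. The same two lines (`DummyRegressor` and `np.clip`) could be applied to `X_train`, `X_test` and `y_pred` from the cell above.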
