
## Structural risk minimization ⏰
Structural risk minimization (SRM) balances how well the model fits the training data, measured by the mean squared error (MSE), against the complexity of the model, measured here by the degree of the polynomial (its number of parameters).

To summarize, the code performs the following steps:

1. Loop over the candidate polynomial degrees.
2. For each degree, build the polynomial features, fit a linear regression, and compute the mean squared error (MSE).
3. Compute the MDL penalty term and add it to the MSE to get the total MDL score.
4. Update the best model if the total MDL score is lower than the current best.
5. Return the weights, the degree, and the total MDL score (MSE plus penalty) of the best model.
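
Written out, the score that the code computes for each degree $d$ and then minimizes is the following, where $m$ is the number of training samples and $\delta$ is the confidence parameter passed as `delta`:

$$
\text{score}(d) = \mathrm{MSE}(d) + \sqrt{d + \frac{\ln(2/\delta)}{2m}}
$$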


In [15]:
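import numpy as np


def MDL(X, y, degrees, delta):
    """Pick the polynomial degree with the lowest MSE-plus-penalty score."""
    m = len(y)  # number of training samples
    best_degree = None
    best_w = None
    best_score = float('inf')

    for degree in degrees:
        # Least-squares fit via the normal equations: (Z^T Z) w = Z^T y
        Z = poly_features(degree, X)
        A = Z.T @ Z
        b = Z.T @ y
        A_inverse = np.linalg.inv(A)
        w = A_inverse @ b

        # Training error of this fit
        mse = MSE(Z, y, w)

        # Complexity penalty; as written, only the log term is divided by 2m
        mdl_term = degree + np.log(2 / delta) / (2 * m)
        total_mdl = mse + np.sqrt(mdl_term)

        # Keep this model if its score is the lowest so far
        if total_mdl < best_score:
            best_score = total_mdl
            best_degree = degree
            best_w = w

    # The third value is the score (MSE + penalty), not the raw MSE
    return best_w, best_degree, best_score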
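
The penalty is the part of the score that grows with the degree, so its exact form decides how strongly complex models are discouraged. As a minimal sketch, the snippet below evaluates the penalty as it is written in `MDL` next to the form in which the MDL bound is commonly stated in learning-theory texts, where the whole sum is divided by `2m`. The helper names `penalty_as_written` and `penalty_bound_form` are hypothetical and not part of the notebook.

```python
import numpy as np

def penalty_as_written(degree, m, delta):
    # Same expression as in MDL: only the log term is divided by 2m
    return np.sqrt(degree + np.log(2 / delta) / (2 * m))

def penalty_bound_form(degree, m, delta):
    # Assumed alternative: (degree + ln(2/delta)) / (2m) under the square root
    return np.sqrt((degree + np.log(2 / delta)) / (2 * m))

m, delta = 8, 0.05
for degree in range(1, 10):
    print(degree,
          round(penalty_as_written(degree, m, delta), 3),
          round(penalty_bound_form(degree, m, delta), 3))
```

For any positive degree the first form is larger than the second, so the choice between them can change which degree is selected.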

In [16]:
def poly_features(degree, X):
    # Create polynomial features X**1, ..., X**degree (no constant/intercept column)
    return np.column_stack([X ** i for i in range(1, degree + 1)])
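
To make the feature layout concrete, here is a small sketch that applies the same function to a three-element input. The array `X_demo` is illustrative only and is not part of the notebook.

```python
import numpy as np

def poly_features(degree, X):
    # One column per power of X, from X**1 up to X**degree
    return np.column_stack([X ** i for i in range(1, degree + 1)])

X_demo = np.array([1, 2, 3])      # illustrative input
Z_demo = poly_features(2, X_demo)
print(Z_demo)
# [[1 1]
#  [2 4]
#  [3 9]]
```

Each row holds the powers of one sample, so the fitted model has no bias term and always passes through the origin.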


In [17]:
def MSE(X, y, w):
    # Calculate Mean Squared Error
    m = len(y)
    predictions = X @ w
    squared_error = np.sum((predictions - y) ** 2)
    return squared_error / m
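
As a quick sanity check, the sketch below evaluates the same computation on two hand-picked samples where the result is easy to verify by hand. The values in `Z_demo`, `y_demo`, and `w_demo` are illustrative assumptions.

```python
import numpy as np

def MSE(X, y, w):
    # Mean of the squared differences between predictions and targets
    predictions = X @ w
    return np.sum((predictions - y) ** 2) / len(y)

Z_demo = np.array([[1.0], [2.0]])   # two samples, one feature
y_demo = np.array([2.0, 5.0])
w_demo = np.array([2.0])

# Predictions are [2, 4]; errors are [0, -1]; the mean squared error is 1 / 2
print(MSE(Z_demo, y_demo, w_demo))  # 0.5
```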


In [18]:
import numpy as np

# poly_features, MSE, and MDL are defined in the cells above

# Example data
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
X = np.array(my_list)
y = np.array([2, 4, 5, 4, 5, 8, 9, 12])  # Replace this with your target values

# List of degrees to consider
degrees = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Delta parameter
delta = 0.05

# Call the MDL function
best_w, best_degree, best_mse = MDL(X, y, degrees, delta)

# Print the results
print("Best weights:", best_w)
print("Best degree:", best_degree)
# Note: best_mse holds the selection score (MSE plus penalty), not the raw MSE
print("Best MSE:", best_mse)


Best weights: [ 2.77944996 -0.61233766  0.05695187]
Best degree: 3
Best MSE: 2.204087604369658
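
Because the value printed as "Best MSE" is the MSE plus the penalty, it can help to separate the two parts. The following sketch refits the degree-3 model on the same data and prints each component. It uses `np.linalg.lstsq` instead of the explicit inverse, which is a substitution made here for the check and not the notebook's method.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 4, 5, 4, 5, 8, 9, 12])
m, delta, degree = len(y), 0.05, 3

# Degree-3 features without an intercept column, as in poly_features
Z = np.column_stack([X ** i for i in range(1, degree + 1)])

# Least-squares fit; lstsq replaces the explicit inverse for this check
w, *_ = np.linalg.lstsq(Z, y, rcond=None)

raw_mse = np.mean((Z @ w - y) ** 2)
penalty = np.sqrt(degree + np.log(2 / delta) / (2 * m))

print("weights:", w)
print("raw MSE:", raw_mse)
print("penalty:", penalty)
print("score:", raw_mse + penalty)
```

The penalty accounts for most of the reported score, so the printed value should not be read as the training error of the degree-3 fit.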
