## 1. Setup

In [None]:
# Install required libraries (run this once if needed)
%pip install numpy pandas matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt


## 2. Dataset and Notation

- M: stellar mass (in units of solar mass, M⊙)
- T: effective stellar temperature (Kelvin, K)
- L: stellar luminosity (in units of solar luminosity, L⊙)

M = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

L = [0.15, 0.35, 1.00, 2.30, 4.10, 7.00, 11.2, 17.5, 25.0, 35.0]

In [None]:
M = np.array([0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4])
L = np.array([0.15, 0.35, 1.00, 2.30, 4.10, 7.00, 11.2, 17.5, 25.0, 35.0])

m = len(M)
l = len(L)

### 2.1 Dataset Visualization: Plot M vs L



In [None]:
plt.figure()
plt.scatter(M, L)
plt.xlabel("Stellar Mass (M☉)")
plt.ylabel("Luminosity (L☉)")
plt.title("Stellar Luminosity vs Mass")
plt.grid(True)
plt.show()


The plot shows a strong positive relationship between stellar mass and luminosity, which is physically plausible.

However, the relationship is clearly nonlinear:
- luminosity increases slowly at low masses and much more rapidly at higher masses.


This suggests that a linear regression model will provide only a rough approximation. A straight line fitted to this upward-curving trend will underpredict luminosity for the highest-mass stars and overpredict it at intermediate masses.
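The expected bias of a straight-line fit can be checked directly on the data. The following is a minimal sketch, assuming an ordinary least-squares line obtained with `np.polyfit`; this is not necessarily how the notebook fits the model later, and the names `w_fit`, `b_fit`, and `residuals` are introduced here only for illustration.

```python
import numpy as np

M = np.array([0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4])
L = np.array([0.15, 0.35, 1.00, 2.30, 4.10, 7.00, 11.2, 17.5, 25.0, 35.0])

# Ordinary least-squares line; for deg=1 np.polyfit returns [slope, intercept]
w_fit, b_fit = np.polyfit(M, L, deg=1)

# Positive residual: the line underpredicts; negative: it overpredicts
residuals = L - (w_fit * M + b_fit)

for mass, r in zip(M, residuals):
    print(f"M = {mass:.1f}  residual = {r:+.2f}")
```

On this dataset the residuals should be positive at both ends of the mass range and negative in the middle, which is the typical signature of fitting a line to an upward-curving trend.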

## 3. Model and loss

The hypothesis models stellar luminosity as a linear function of mass with an explicit bias term.

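The loss function is the mean squared error, which measures the average squared difference between predicted and observed luminosities. The implementation below scales it by 1/2, a common convention that simplifies the gradient and does not change the location of the minimum.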
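Written out, the model and cost take the following form. This is a sketch in the notation of Section 2, with m the number of stars and the 1/(2m) factor chosen to match the `mse` function in this notebook.

```latex
\hat{L}_i = w\,M_i + b,
\qquad
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{L}_i - L_i \right)^2
```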


Prediction

In [None]:
def predict(X, w, b):
    """Compute predictions f_{w,b}(x) for all examples.
    
    Linear regression model:
    L_hat = w * M + b
    
    """
    return X * w + b  # vectorized: element-wise multiply, then add the scalar bias

MSE: mean squared error

In [None]:
def mse(M, L, w, b):
    """
    Mean squared error cost function, scaled by 1/2:
    J(w, b) = (1 / (2m)) * sum((L_hat - L)^2)
    """
    m = len(M)
    L_hat = predict(M, w, b)
    return (1 / (2 * m)) * np.sum((L_hat - L) ** 2)

## 4. Cost surface

Visualizing the cost surface before applying gradient descent helps us understand how the model behaves.
For linear regression with a squared-error loss, the cost is convex in (w, b), so the surface has a single global minimum and gradient descent with a suitable learning rate converges to it.

The surface also shows how sensitive the cost is to each parameter, that is, how small changes in w or b affect J.
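Sensitivity can also be quantified with the partial derivatives of the cost. The sketch below assumes the 1/(2m)-scaled squared-error cost used in this notebook; the helper names `cost` and `gradients` are hypothetical and exist only for this example. It compares the analytic gradient with a central finite-difference estimate at the arbitrary point (w, b) = (0, 0).

```python
import numpy as np

M = np.array([0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4])
L = np.array([0.15, 0.35, 1.00, 2.30, 4.10, 7.00, 11.2, 17.5, 25.0, 35.0])

def cost(w, b):
    # Half the mean squared error, same convention as the notebook's mse()
    return np.sum((w * M + b - L) ** 2) / (2 * len(M))

def gradients(w, b):
    # dJ/dw = mean(error * M), dJ/db = mean(error)
    err = w * M + b - L
    return np.mean(err * M), np.mean(err)

w0, b0 = 0.0, 0.0
dJ_dw, dJ_db = gradients(w0, b0)

# Central finite differences as an independent check
eps = 1e-6
num_dw = (cost(w0 + eps, b0) - cost(w0 - eps, b0)) / (2 * eps)
num_db = (cost(w0, b0 + eps) - cost(w0, b0 - eps)) / (2 * eps)

print(f"analytic:  dJ/dw = {dJ_dw:.4f}, dJ/db = {dJ_db:.4f}")
print(f"numerical: dJ/dw = {num_dw:.4f}, dJ/db = {num_db:.4f}")
```

A larger gradient magnitude along one axis means the cost reacts more strongly to changes in that parameter at the chosen point.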

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # needed to register the 3D projection
from matplotlib import cm
 
# Ranges chosen to bracket the least-squares optimum (w ≈ 18.1, b ≈ -16.8 for this dataset)
w_vals = np.linspace(0.0, 36.0, 60)
b_vals = np.linspace(-40.0, 10.0, 60)

# J[i, j] holds the cost at (w_vals[i], b_vals[j])
J = np.zeros((len(w_vals), len(b_vals)))

for i, w in enumerate(w_vals):
    for j, b in enumerate(b_vals):
        J[i, j] = mse(M, L, w, b)

# indexing="ij" makes W[i, j] = w_vals[i] and B[i, j] = b_vals[j], matching J
W, B = np.meshgrid(w_vals, b_vals, indexing="ij")
 
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(W, B, J, cmap=cm.viridis, linewidth=0, antialiased=True)
ax.set_xlabel("w")
ax.set_ylabel("b")
ax.set_zlabel("J(w,b)")
ax.set_title("Cost surface J(w,b)")
plt.show()


The 3D plot shows how the cost J(w, b) varies with the slope (w) and the bias (b).
The minimum of the cost function corresponds to the parameters that best fit the data in the least-squares sense.
The bowl-like, convex shape is consistent with a single global minimum, which is why gradient descent can reliably find the optimal w and b.
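The location of the minimum can be cross-checked without gradient descent. The following is a minimal sketch, assuming the closed-form least-squares solution from `np.linalg.lstsq` as a reference; `cost`, `A`, `w_opt`, and `b_opt` are names introduced here for illustration. It also confirms that nearby points have a higher cost.

```python
import numpy as np

M = np.array([0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4])
L = np.array([0.15, 0.35, 1.00, 2.30, 4.10, 7.00, 11.2, 17.5, 25.0, 35.0])

def cost(w, b):
    # Half the mean squared error, same convention as the notebook's mse()
    return np.sum((w * M + b - L) ** 2) / (2 * len(M))

# Design matrix: one column for the slope, one column of ones for the bias
A = np.column_stack([M, np.ones_like(M)])
(w_opt, b_opt), *_ = np.linalg.lstsq(A, L, rcond=None)

J_opt = cost(w_opt, b_opt)

# Evaluate the cost on a small ring of points around the optimum
neighbors = [
    cost(w_opt + dw, b_opt + db)
    for dw in (-0.5, 0.0, 0.5)
    for db in (-0.5, 0.0, 0.5)
    if (dw, db) != (0.0, 0.0)
]

print(f"w* = {w_opt:.3f}, b* = {b_opt:.3f}, J* = {J_opt:.3f}")
print("all neighbors cost more:", all(c > J_opt for c in neighbors))
```

If gradient descent is run on the same cost, its final w and b should approach these reference values.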