**Author:** Shahab Fatemi

**Email:** shahab.fatemi@umu.se   ;   shahab.fatemi@amitiscode.com

**Created:** 2024-04-08

**Last update:** 2025-08-17

**MIT License** — Shahab Fatemi (2025); For use in the *Machine Learning in Physics* course, Umeå University, Sweden; See the full license text in the parent folder.

**GitHub Copilot** was a valuable assistant in preparing Jupyter Notebooks for this course. It helped me generate code comments and organize the code more clearly and consistently throughout the course materials.

<hr>

📢 <span style="color:red"><strong> Note for Students:</strong></span>

* Before working on the labs, review your lecture notes.

* Please read all sections, code blocks, and comments **carefully** to fully understand the material. Throughout the labs, my instructions are provided to you in written form, guiding you through the materials step-by-step.

* All concepts covered in this lab are part of the course and may be included in the final exam.

* I strongly encourage you to work in pairs and discuss your findings, observations, and reasoning with each other.

* If something is unclear, don't hesitate to ask.

* Exercise submission is not required; these tasks are designed to help you practice, explore the concepts, and learn by doing.

* I have done my best to make the lab files as bug-free (and error-free) as possible, but remember: *there is no such thing as bug-free code.* If you find any bugs, errors, typos, or other issues, I would greatly appreciate it if you reported them to me by email. Verbal notifications do not work, as I will likely forget 🙂

ENJOY WORKING ON THIS LAB.
***

# Before you begin

### ⚠️ Note 1:
As you begin your programming journey in this course, it is important to recognize that there are many different ways to write code. Many programming languages offer multiple functions or methods to achieve the same task, allowing you to choose the most suitable approach for your needs. Additionally, a variety of libraries and frameworks exist, each with unique strengths (and limitations) that can enhance your coding efficiency.

### ⚠️ Note 2:
Individual coding styles also play an important role in how programmers approach their tasks. Each programmer develops a unique style influenced by personal preference and experience, which may prioritize conciseness or clarity. In team environments, the situation is quite different: teams adopt coding standards that help maintain consistency and readability. Remember that the examples provided in this course are just one of many ways to solve programming challenges.

### ⚠️ Note 3:
What truly matters in programming is the accuracy and efficiency of your code, as well as its readability and maintainability. Well-written code should produce correct results while being optimized for performance, and it should be easy for others (or yourself in the future) to understand, modify, and build upon. Striking a balance between these elements is essential for delivering high-quality software.

### ⚠️ Note 4:
I have tested all of this code using Python >=3.12 in VS Code, and it should run without any errors. If you encounter an error, please try to identify the root cause and fix it on your own first. If the issue persists (e.g., you have spent 30 minutes or more on it and it has not gone away), do not hesitate to ask me for help.

***

# 🛠️ Your Tasks:

* During all lab sessions, you are not required to write code from scratch; instead, your main responsibility is to carefully read through the provided instructions, examine and understand the sample codes I have prepared, and connect them to the concepts being introduced in the class.

* You should read the text provided above the code lines, run the given code, observe how it behaves, and analyze the output in relation to the theoretical background discussed in class.

* However, only running the code is not enough; you should also read through the code line by line to understand how it is written.

* The goal is for you to **actively engage** with the material by experimenting with the code, asking yourself how the code is written, why it produces certain results, and drawing conclusions that will strengthen your understanding of both the practical and conceptual aspects of the topic.

***

# Linear Regression

## Overview

Regression is a supervised technique for estimating the relationships between a continuous dependent variable (output or target) and one or more independent variables (features or regressors).

Due to its simplicity, linear regression has been widely used to solve linear problems.

- **Goal:** Find the best-fitting straight line through the data points.

- **Equation:**  
    $y = mx + b$
  where:
  - $y$ is the predicted output (target),
  - $x$ is the input feature,
  - $m$ is the line slope (or $w_1$),
  - $b$ is the y-intercept (or $w_0$).

- **How does it work?** The model minimizes the difference between the actual data points and the predicted values on the line (this difference is called the error or residual).

Linear regression can also be extended to multiple features, which is called **Multiple Linear Regression**. More advanced topics are covered in the upcoming notebooks.
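To make the idea of a residual concrete, here is a minimal sketch on a made-up five-point dataset. The numbers and the candidate values `m` and `b` are illustrative assumptions, not course data. It evaluates one candidate line and reports the residuals and their sum of squares.

```python
import numpy as np

# Hypothetical toy data (not from the course datasets)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# One candidate line y = m*x + b (an arbitrary guess)
m, b = 2.0, 1.0
y_line = m * x + b

# Residual = actual value - value predicted by the line
residuals = y - y_line
ssr = np.sum(residuals**2)

print("Residuals:", residuals)
print(f"Sum of squared residuals: {ssr:.3f}")
```

A different choice of `m` and `b` gives a different sum of squared residuals; linear regression looks for the pair that makes this sum as small as possible.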

## Simple Linear Regression

Let's begin simple. Very simple! 

In this code, we simulate uniform linear motion of an object with added noise to model measurement errors. Two datasets with different sample sizes are generated using the same equation of motion: $y=v_0 t+y_0$. For each dataset, the code computes the sum of squared residuals (SSR) and the root mean squared error (RMSE) to quantify the deviation between the true trajectory and the noisy observations. This comparison shows how sample size affects error metrics.
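As a rough, self-contained illustration of why the two metrics react differently to sample size, the sketch below draws Gaussian "measurement errors" directly. The seed, the value of `sigma`, and the sample sizes are arbitrary assumptions made for this sketch. SSR is a plain sum, so it tends to grow with the number of samples, whereas RMSE divides by the number of samples before taking the square root and therefore tends to stay close to the noise standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
sigma = 5.0                      # noise standard deviation (same role as noise_level)

results = {}
for n in (10, 100, 1000):
    noise = rng.normal(0.0, sigma, n)   # differences between "true" and "noisy" values
    ssr = np.sum(noise**2)              # plain sum: grows with the number of samples
    rmse = np.sqrt(ssr / n)             # normalized by n: stays near sigma
    results[n] = (ssr, rmse)
    print(f"n={n:5d}  SSR={ssr:10.2f}  RMSE={rmse:.2f}")
```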

In [None]:
import numpy as np
from sklearn.metrics import root_mean_squared_error

# Compute the true y values for the uniform motion of an object
# y = v0*t + y0
# and return both the true and noisy data
#
# -------------------------------------------------------------------------------------------------
# NOTE: The commenting format you see below (in the function) is a standard Python docstring format
#       used for properly documenting the code. I highly encourage you to use this format in your
#       own code, but for simplicity, I keep the comments (except for this one and a few more) brief
#       in our lab notebooks.
#
#       As I said earlier in the class, this course is not "just" about Machine Learning. I try
#       to incorporate concepts from physics, programming, data analysis and engineering as well.
# -------------------------------------------------------------------------------------------------
#
def uniform_motion(v0, y0, time_range=(0, 5), num_samples=100, noise_level=5, seed=42):
    """ 
    Simulate uniform motion with added Gaussian noise.
    
    v0: initial velocity (or the true slope of the line)
    y0: initial position (or the y-intercept of the line)
    time_range: the range of time over which we simulate the motion of the object in units of seconds
    num_samples: the number of data samples to be generated
    noise_level: the standard deviation of the Gaussian noise to be added to the data to simulate measurement noise
    seed: the random seed for reproducibility
    """
    
    np.random.seed(seed)  # Set random seed for reproducibility

    # time for the uniform motion in units of seconds
    t = np.linspace(time_range[0], time_range[1], num_samples)

    # The equation of motion is y = v0*t + y0
    y_true  = v0*t + y0 # true trajectory
    y_noisy = y_true + np.random.normal(0, noise_level, num_samples) # Add noise to the trajectory
    return t, y_true, y_noisy

# ======== MAIN ========
# Parameters for the uniform motion
v0 = 2.0   # initial velocity (or the true slope of the line)
y0 = 1.0   # initial position (or the y-intercept of the line)

# Defining sample sizes for two datasets
n_samples1, n_samples2 = 10, 100  # Two datasets of different sizes

# Generate data for experiment 1
t1, y1_true, y1_noisy = uniform_motion(v0, y0, num_samples=n_samples1)

# Generate data for experiment 2
t2, y2_true, y2_noisy = uniform_motion(v0, y0, num_samples=n_samples2)

# Calculating the Sum of Squared Residuals (SSR)
ssr1 = np.sum((y1_true - y1_noisy) ** 2)
ssr2 = np.sum((y2_true - y2_noisy) ** 2)

# Calculating Root Mean Squared Error (RMSE)
rmse1 = root_mean_squared_error(y1_true, y1_noisy)
rmse2 = root_mean_squared_error(y2_true, y2_noisy)

print(f"\nDataset 1 ({n_samples1} samples):")
print(f" - Sum of Squared Residuals (SSR): {ssr1:.2f}")
print(f" - Root Mean Squared Error (RMSE): {rmse1:.2f}")
print(f"\nDataset 2 ({n_samples2} samples):")
print(f" - Sum of Squared Residuals (SSR): {ssr2:.2f}")
print(f" - Root Mean Squared Error (RMSE): {rmse2:.2f}")

Now, we want to simulate the uniform motion of an object in one dimension and visualize the motion over time with and without added noise. Uniform motion is characterized by a constant velocity $v_0$ and an initial position $y_0$.

We generate 100 time values evenly spaced between 0 and 5 seconds, representing the moments at which the object's position is recorded. Additionally, random noise is introduced to simulate measurement errors or environmental variability. The level of noise is controlled by `noise_level`, the standard deviation of the zero-mean Gaussian distribution from which the noise is drawn.

The positions of the object at each time instance are calculated using the equation $y = v_0t + y_0$, where $y$ represents the position, $v_0$ is the constant velocity, and $t$ is time. The calculated positions are then perturbed by the generated noise to create realistic measurements.
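The role of the random seed can be illustrated with a small stand-alone sketch. Note the assumptions: `noisy_line` is a hypothetical helper written only for this illustration, and it uses NumPy's `default_rng` generator rather than the `np.random.seed` call used in `uniform_motion`, so the numbers it produces will differ from those in the notebook.

```python
import numpy as np

def noisy_line(v0, y0, t, noise_level, seed):
    """Hypothetical helper: y = v0*t + y0 plus Gaussian noise from a local generator."""
    rng = np.random.default_rng(seed)   # local generator, global random state untouched
    return v0 * t + y0 + rng.normal(0.0, noise_level, t.size)

t_demo = np.linspace(0, 5, 100)
a = noisy_line(2.1, -1.2, t_demo, noise_level=3.0, seed=42)
b = noisy_line(2.1, -1.2, t_demo, noise_level=3.0, seed=42)
c = noisy_line(2.1, -1.2, t_demo, noise_level=3.0, seed=7)

print("Same seed, identical data:     ", np.array_equal(a, b))
print("Different seed, identical data:", np.array_equal(a, c))
```

Using the same seed reproduces exactly the same "measurements", which is what makes the results in a lab like this one repeatable.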

In [None]:
import matplotlib.pyplot as plt

# Parameters for uniform motion
v0 = +2.1  # constant velocity
y0 = -1.2  # initial position y0
n  = 100   # Number of samples
noise_level = 3.0   # noise level

# Generate data
t, y_true, y_noisy = uniform_motion(v0, y0, num_samples=n, noise_level=noise_level)

# Plot the uniform motion
plt.figure(figsize=(6, 4), dpi=200)

# Scatter plot of noisy data
plt.scatter(t, y_noisy, color="k", s=70, alpha=0.3, label=f"Noisy Data")

# Plot the true trajectory
plt.plot(t, y_true, color="royalblue", linewidth=2, label=f"True y = f(t) = {v0:+}t {y0:+}")

# Adding titles and labels
plt.title("Uniform Motion in One Dimension")
plt.xlabel("Time (s)")
plt.ylabel("Position (m)")
plt.grid(True, linestyle="--", alpha=0.5)
plt.legend()
plt.show()

Assume we do not have knowledge about the true function governing the motion of our object. This means we don't know $f(t) = 2.1t - 1.2$. Instead, we make an educated guess for the parameters: the slope $v_0$ (denoted as $w_1$ in the lecture notes) and the intercept $y_0$ (or $w_0$). In the code below, we define guessed values $v0_{pred}$ and $y0_{pred}$, construct a hypothetical function $h(t) = v0_{pred} t + y0_{pred}$, and evaluate how well this guess fits the data by calculating regression metrics (SSR, MSE, and RMSE).

In the next code section, we only calculate the metrics; in the code section that comes after, we visualize the predicted values from our $h(t)$.

**Note:** $h(t)$ is $h(x)$ in the lecture slides. 
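
For reference, with $n$ observations $(t_i, y_i)$, the three metrics are related as follows. These are the standard definitions, written here with the notebook's $h(t)$; the index $i$ is introduced only for this summary.

$$
\mathrm{SSR} = \sum_{i=1}^{n} \left(h(t_i) - y_i\right)^2, \qquad
\mathrm{MSE} = \frac{\mathrm{SSR}}{n}, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$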

In [None]:
# Hypothetical predicted parameters
v0_pred = +3.0  # My guess for the slope
y0_pred = -3.0  # My guess for the y-intercept

# Predicted y using the guessed values. 
# This is one possible hypothesis, h(t), but not necessarily the most accurate one.
# This h(t) is a member of the hypothesis space H that we discussed in class.
y_pred = v0_pred * t + y0_pred

# Compute metrics for the noisy data
ssr_pred  = np.sum ((y_pred - y_noisy)**2)              # Sum of Squared Residuals
mse_pred  = np.mean((y_pred - y_noisy)**2)              # Mean Squared Error
rmse_pred = root_mean_squared_error(y_pred, y_noisy)    # Root Mean Squared Error

print(f"\nModel prediction using v0_pred={v0_pred}, y0_pred={y0_pred}:")
print(f" - Sum of Squared Residuals (SSR): {ssr_pred:.2f}")
print(f" - Mean Squared Error (MSE)      : {mse_pred:.2f}")
print(f" - Root Mean Squared Error (RMSE): {rmse_pred:.2f}")

In [None]:
# =========================================================
# NOTE:
#       If you are not familiar with, or have little experience with, Python and its libraries,
#       you should carefully read the documentation.
# =========================================================
#
# Plot the uniform motion
#   figsize: the size of the figure in inches (width, height)
#   dpi: the resolution of the figure in dots per inch
plt.figure(figsize=(6, 4), dpi=200)

# Scatter plot of noisy data
#   color: the color of the points
#   s: the size of the points
#   alpha: the transparency level of the points
#   label: the label for the points to be shown in the legend
plt.scatter(t, y_noisy, color="k", s=70, alpha=0.3, label=f"Noisy Data")

# Plot the true trajectory
#   color: the color of the line
#   linewidth: the width of the line
#   label: the label for the line to be shown in the legend
plt.plot(t, y_true, color="royalblue", linewidth=2, label=f"True y = f(t) = {v0:+}t {y0:+}")

# Scatter plot of predicted data
plt.scatter(t, y_pred, marker="s", color="forestgreen", s=15, alpha=0.5, label=f"y_pred = h(t) = {v0_pred:+}t {y0_pred:+}")

# Adding titles and labels
plt.title("Uniform Motion in One Dimension")     # Title of the plot
plt.xlabel("Time (s)")                           # X-axis label
plt.ylabel("Position (m)")                       # Y-axis label
plt.grid(True, linestyle="--", alpha=0.5)        # Add gridlines with dashed lines and low opacity (0.5)
plt.legend()                                     # Add legend
plt.show()                                       # Display the plot

***
### ⛷️ Exercise

I mentioned earlier that you are not required to write much code during the lab sessions. That is true! However, for beginners in Python, regular practice is important. That is why (very) occasional code-writing tasks are included. They are meant to help you gradually work toward **mastering** Python.

Write a Python script that performs a brute-force search to find the best-fitting parameters $v_0$ and $y_0$ for a linear model $h(t) = v_0 t + y_0$, based on our generated data. Without assuming knowledge of the true function, define a range of candidate values using `np.linspace` for both $v_0$ and $y_0$, compute the predicted values $h(t)$ for each pair, and evaluate the model using a suitable regression metric (MSE or RMSE). The goal is to find the combination of $v_0$ and $y_0$ that gives the minimum error with respect to the noisy observations.

E.g.,
```python
v0_pred = np.linspace(-3, 3, 13)
y0_pred = np.linspace(-3, 3, 13)
```

Do the rest yourself. 

**If you truly want to learn, do not use AI tools (e.g., ChatGPT) to produce code for you.**

***

## Predicting Position: Analytical Approach
Now, we minimize the squared error to analytically solve for the slope and intercept of our simple linear function.
Our linear function has the form $y=w_0 + w_1x$. We have $n$ data points from an observation, where
$(x_1, x_2, ..., x_n)$ are mapped to $(y_1, y_2, ..., y_n)$. By summing the squared residuals, $\sum(y_i - (w_0 + w_1x_i))^2$, and minimizing this sum with respect to $w_0$ and $w_1$, we find the slope ($w_1$) and the intercept ($w_0$):

$$
w_1 = \frac{\sum(xy) - \bar{y} \cdot \sum x}{\sum x^2 - \bar{x} \cdot \sum x}
$$

and 

$$
w_0 = \frac{\bar{y} \cdot \sum x^2 - \bar{x} \cdot \sum(xy)}{\sum x^2 - \bar{x} \cdot \sum x}
$$
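
As a brief sketch of where these expressions come from (this is the standard least-squares derivation; the symbol $S$ for the sum of squared residuals is introduced here only for convenience), set both partial derivatives of $S$ to zero:

$$
S(w_0, w_1) = \sum_{i=1}^{n} \left(y_i - w_0 - w_1 x_i\right)^2
$$

$$
\frac{\partial S}{\partial w_0} = -2 \sum_{i=1}^{n} \left(y_i - w_0 - w_1 x_i\right) = 0
\quad \Rightarrow \quad w_0 = \bar{y} - w_1 \bar{x}
$$

$$
\frac{\partial S}{\partial w_1} = -2 \sum_{i=1}^{n} x_i \left(y_i - w_0 - w_1 x_i\right) = 0
\quad \Rightarrow \quad \sum xy = w_0 \sum x + w_1 \sum x^2
$$

Substituting $w_0 = \bar{y} - w_1 \bar{x}$ into the last relation gives $w_1 \left(\sum x^2 - \bar{x} \sum x\right) = \sum(xy) - \bar{y} \sum x$, which is the slope formula, and inserting that slope back into $w_0 = \bar{y} - w_1 \bar{x}$ yields the intercept formula.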

The code below performs linear regression to fit a line to our dataset using the analytical approach. We need to calculate every element that appears in the analytical solution, so we do that one by one in the code below. Here, time $t$ plays the role of $x$ and $y_{noisy}$ plays the role of $y$.

In [None]:
# Calculate the mean of t
mean_t = np.mean(t)

# Calculate the mean of y_noisy
mean_y = np.mean(y_noisy)

# Calculate the sum of all elements in t
sum_t = np.sum(t)

# Calculate the sum of all elements in y_noisy
sum_y = np.sum(y_noisy)

# Calculate the sum of squares of all elements in t
sum_t_square = np.sum(t**2)

# Calculate the sum of squares of all elements in y_noisy
sum_y_square = np.sum(y_noisy**2)

# Calculate the dot product of t and y_noisy
dot_t_y = np.dot(t.T, y_noisy)

# Calculate the slope (w1) of the linear regression line using the analytic method
# Using the formula: w1 = (Σ(xy) - mean(y) * Σ(x)) / (Σ(x^2) - mean(x) * Σ(x))
# The expression above is mathematically the same as the one you have in your lecture notes.
analytic_w1 = (dot_t_y - mean_y * sum_t) / (sum_t_square - mean_t * sum_t)

# Convert analytic_w1 to a scalar
analytic_w1 = analytic_w1.item()

# Calculate the y-intercept (w0) of the linear regression line using the analytic method
# Using the formula: w0 = (mean(y) * Σ(x^2) - mean(x) * Σ(xy)) / (Σ(x^2) - mean(x) * Σ(x))
# The expression above is mathematically the same as the one you have in your lecture notes.
analytic_w0 = (mean_y * sum_t_square - mean_t * dot_t_y) / (sum_t_square - mean_t * sum_t)

# Convert analytic_w0 to a scalar
analytic_w0 = analytic_w0.item()

# Print the calculated slope and y-intercept
print("Slope     (v0) or (w1): {:.2f}".format(analytic_w1) )
print("Intercept (y0) or (w0): {:.2f}".format(analytic_w0) )


Now that we found the slope and intercept, we can calculate the predicted y values ($y^*$ in the lecture notes) using the analytic solution, and visualize the results.
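As an optional sanity check, the closed-form expressions can be compared with NumPy's built-in least-squares polynomial fit. The sketch below is self-contained and uses its own synthetic data (the seed, sample count, and variable names are assumptions made for this illustration), so it does not touch the notebook's `t` and `y_noisy`.

```python
import numpy as np

# Hypothetical synthetic data, independent of the notebook's variables
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 50)
y = 2.1 * x - 1.2 + rng.normal(0.0, 3.0, x.size)

# Closed-form slope and intercept (same expressions as in the analytical approach)
denom = np.sum(x**2) - np.mean(x) * np.sum(x)
w1 = (np.sum(x * y) - np.mean(y) * np.sum(x)) / denom
w0 = (np.mean(y) * np.sum(x**2) - np.mean(x) * np.sum(x * y)) / denom

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x, y, deg=1)

print(f"closed form: w1={w1:.4f}, w0={w0:.4f}")
print(f"np.polyfit : w1={slope:.4f}, w0={intercept:.4f}")
```

Both routes minimize the same sum of squared residuals, so the two lines of output should agree up to floating-point rounding.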

In [None]:
# Calculate the predicted y values using the analytic solution
y_analytic = analytic_w1 * t + analytic_w0

# Plot the uniform motion
plt.figure(dpi=200, figsize=(6, 4))

# Scatter plot of noisy data
plt.scatter(t, y_noisy, color="k", s=70, alpha=0.3, label=f"Noisy Data")

# Plot the true trajectory
plt.plot(t, y_true, color="royalblue", linewidth=2, label=f"True y = f(t) = {v0:+}t {y0:+}")

# Scatter plot of analytic data
plt.scatter(t, y_analytic, marker="s", color="forestgreen", s=15, alpha=0.5, label=f"y_analytic = h(t) = {analytic_w1:+.2f}t {analytic_w0:+.2f}")

# Adding titles and labels
plt.title("Uniform Motion in One Dimension")
plt.xlabel("Time (s)")
plt.ylabel("Position (m)")
plt.grid(True, linestyle="--", alpha=0.5)
plt.legend()
plt.show()

***
### 💡 Reflect and Run
- Calculate the RMSE for the analytical solution and compare it with the smallest RMSE you got from the **exercise** section (the brute-force search). Which one is smaller? Explain your observations.

### ✅ Check your understanding

- The green squares shown in the figure represent the best-fit hypothetical function $h(t)$. However, this predicted solution does not perfectly match the actual true function $f(t)$, shown by the blue solid line. Explain the reasons for these discrepancies and discuss why the predicted model cannot fully recover the true function for the equation of motion.
***

## Predicting Position: ML Approach using Scikit Learn

We have not covered `Scikit-Learn` in detail yet and you may not be familiar with it at this point. However, this library forms the foundation of our course and will be used frequently (actually, more than frequently), as it provides implementations for nearly all standard ML methods. As a first step, take some time to explore the Scikit-Learn website (https://scikit-learn.org) to get an overview of its capabilities and structure.

In the code section below, we introduce a basic ML approach to model the relationship between time $t$ and the noisy position measurements $y_{noisy}$ using Scikit-Learn. Although this example is simplified and not a proper implementation, it gives you an initial understanding of how to use ML models from this great library. In a complete workflow, the data would typically be split into training and test (or validation) sets to evaluate the model's performance properly. Since we haven't discussed data splitting yet, we will skip that step here for **simplicity**.

1- We begin by creating a linear regression model using the `LinearRegression()` function from Scikit-Learn. This is our **hop**, where we initialize the ML algorithm we want to use. 

2- Next comes the **step**: we train the model on the available data $(t, y_{\text{noisy}})$ using `model.fit()`, allowing the algorithm to learn the best-fitting parameters from the data. 

3- Finally, we make the **jump** by using the trained model to generate predictions $y_{\text{pred}}$ with `model.predict()`. 

These three stages (*hop, step, and jump*) form the essential workflow of applying an ML method in practice. The code below shows you how Scikit-Learn models are used to learn from data and make predictions.
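
One detail in the code below deserves a short note: the calls `t.reshape(-1, 1)`. Scikit-Learn estimators expect the feature array to be two-dimensional, with shape `(n_samples, n_features)`, whereas our time array is one-dimensional. A minimal sketch of what the reshape does (the names `t_demo` and `X_demo` are used only for this illustration):

```python
import numpy as np

t_demo = np.linspace(0, 5, 5)   # 1D array of 5 time values
X_demo = t_demo.reshape(-1, 1)  # 2D array: 5 rows (samples), 1 column (feature)

print("Original shape:", t_demo.shape)
print("Reshaped shape:", X_demo.shape)
```

The `-1` tells NumPy to infer the number of rows from the array length, so the same line works for any number of samples.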

**🎉Welcome to applying ML methods!🎉**

In [None]:
# Required libraries from scikit-learn 
from sklearn.linear_model import LinearRegression

##############################################
# This cell generates data for uniform motion
# It is redundant as we generated the data earlier, but 
# we do it again, because I want you to change them later.

# Parameters for uniform motion
v0 = +2.1  # constant velocity
y0 = -1.2  # initial position y0
n  = 100   # Number of samples
noise_level = 3.0   # noise level

# Generate data
t, y_true, y_noisy = uniform_motion(v0, y0, num_samples=n, noise_level=noise_level)
##############################################

# Step 1) [** hop **] 
# Create a linear regression model
model = LinearRegression()  # Linear regression model from scikit-learn

# Step 2) [** step **] 
# Train (or Fit) the model
model.fit(t.reshape(-1, 1), y_noisy)

# Step 3) [** jump **] 
# Make predictions using scikit learn
y_sk_pred = model.predict(t.reshape(-1, 1))  # SciKit Predicted y values

# Print the coefficients calculated by scikit-learn
print("Predicted Slope     (v0) or (w1): {:.2f}".format(model.coef_[0]) )
print("Predicted Intercept (y0) or (w0): {:.2f}".format(model.intercept_) )

# Calculate root mean squared error
rmse = root_mean_squared_error(y_noisy, y_sk_pred)
print(f"RMSE: {rmse:.4f}")

# Plot the uniform motion
plt.figure(figsize=(6, 4), dpi=200)

# Scatter plot of noisy data
plt.scatter(t, y_noisy, color="k", s=70, alpha=0.3, label=f"Noisy Data")

# Plot the true trajectory
plt.plot(t, y_true, color="royalblue", linewidth=2, label=f"True y = f(t) = {v0:+}t {y0:+}")

# Scatter plot of sklearn predicted data
plt.scatter(t, y_sk_pred, marker=".", color="red", s=10, alpha=0.7, 
            label=f'y_sk_pred = {model.coef_[0]:+.2f}t {model.intercept_:+.2f}')

# Adding titles and labels
plt.title("Uniform Motion in One Dimension")
plt.xlabel("Time (s)")
plt.ylabel("Position (m)")
plt.grid(True, linestyle="--", alpha=0.5)
plt.legend()
plt.show()


***
### ✅ Check your understanding

Compare the results you got from the ML model in Scikit-Learn with those you got from the analytical approach. You will see that the results are practically identical. Why is that? Isn't Scikit-Learn supposed to provide highly accurate, "state-of-the-art" ML models? The reason for this agreement is that the underlying relationship between $t$ and $y$ is linear, and `LinearRegression` is designed to model exactly that: it performs an ordinary least-squares fit, minimizing the same sum of squared residuals as the analytical solution. Therefore, both the analytic approach and the ML model are solving the same simple problem (essentially in the same way).

***
### 💡 Reflect and Run

- Earlier, the noise level we applied to generate the data was relatively high. Reduce the noise level to 0.2 and rerun the code above. Compare the updated values of $w_0$ and $w_1$ with those obtained at the higher noise level. What changes do you observe, and what conclusions can you draw?

- Restore the noise level to its original value (3). Instead of reducing noise, increase the number of samples (the sample size) once by a factor of 10 and once by a factor of 100. Re-run the code and recalculate $w_0$ and $w_1$. What changes do you observe, and what conclusions do you draw?

- ⚠️⚠️⚠️ Jupyter Notebook is a powerful and convenient tool. There is no doubt! **But** it requires careful attention to how cells are executed. For example, since in the previous task I asked you to change the number of samples to 10,000 and re-run the code cell, the variables `t`, `y_true`, and `y_noisy` were updated with the new values. As a result, if you now scroll up and re-run the cell where we applied the analytical approach (without making any further changes), it will operate on the **latest** values of `t`, `y_true`, and `y_noisy`. Go ahead and try it: re-run the analytical section and compare the results to those obtained from the Scikit-Learn model. Do you notice any differences?

***

## Cost Function

Before proceeding, we want to make sure to reset the data to its original configuration, so:

In [None]:
##############################################
# Parameters for uniform motion
v0 = +2.1  # constant velocity
y0 = -1.2  # initial position y0
n  = 100   # Number of samples
noise_level = 3.0   # noise level

# Generate data
t, y_true, y_noisy = uniform_motion(v0, y0, num_samples=n, noise_level=noise_level)
##############################################

We now perform a grid search to compute and visualize the cost function for our simple linear regression model. This is a more systematic version of the brute-force parameter search you did in the earlier exercise.

Below, we compute the MSE for each combination of $(w_0, w_1)$ over a predefined grid. We define a range for $w_0$ and $w_1$ values, and for every pair of values, we use the current data $t$ and $y_{noisy}$ to evaluate the cost and store the resulting scalar values in a cost matrix, $J_{vals}$. After evaluating all combinations, we identify the minimum of the cost function along with the corresponding values of $(w_0, w_1)$, and visualize the full cost "landscape".
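The code cell that follows uses explicit loops on purpose. For readers already comfortable with NumPy broadcasting, the same grid of MSE values can be sketched without loops. This is only one possible vectorized formulation under stated assumptions: the stand-in data, the seed, and the names `W0`, `W1`, and `J` are introduced here so that the sketch runs on its own and does not overwrite the notebook's variables.

```python
import numpy as np

# Hypothetical stand-in data so the sketch runs on its own
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 100)
y = 2.1 * x - 1.2 + rng.normal(0.0, 3.0, x.size)

w0_range = np.linspace(-10, 10, 201)
w1_range = np.linspace(-10, 10, 201)
W0, W1 = np.meshgrid(w0_range, w1_range)        # each of shape (201, 201)

# Broadcasting: add a trailing axis to the grids so that every (w0, w1) pair
# is combined with every data point -> predictions of shape (201, 201, 100)
predictions = W1[..., None] * x + W0[..., None]
J = np.mean((y - predictions) ** 2, axis=-1)    # MSE for each grid point

# Locate the minimum with argmin + unravel_index
i, j = np.unravel_index(np.argmin(J), J.shape)
print(f"min MSE = {J[i, j]:.2f} at w0 = {W0[i, j]:.2f}, w1 = {W1[i, j]:.2f}")
```

The trade-off is memory: the intermediate `predictions` array holds one value per grid point and data point, which is fine for this grid but grows quickly for finer grids or larger datasets.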

In [None]:
# -------------------------------------------------
# NOTE: The code below is not efficient AT ALL!
# One should vectorize the operations (instead of using for-loops).
# However, for simplicity, we use loops here.
# This is intentional, for those not familiar with vectorization.
# -------------------------------------------------

# Cost function J(w) using the mean of squared errors (MSE)
def cost_function(w0, w1, x, y):
    n = len(y)  # number of data points
    predictions = w1*x + w0
    return (1/n) * np.sum((y - predictions)**2)

# Generate a dense range of w0 and w1 values
# We assume w0=[-10, 10], w1=[-10, 10]
w0_range = np.linspace(-10, 10, 201)
w1_range = np.linspace(-10, 10, 201)

# Create meshgrid for computing and plotting
w0, w1 = np.meshgrid(w0_range, w1_range)

J_vals = np.zeros(w0.shape) # pre-allocate memory for cost function values

# Explicit for-loops over NumPy arrays are slow in Python and are usually avoided, but are used here for simplicity
# Calculate the cost function for each combination of w0 and w1
for i in range(w0.shape[0]):
    for j in range(w0.shape[1]):
        J_vals[i, j] = cost_function(w0[i, j], w1[i, j], t, y_noisy)

# Find the minimum of J(w)
J_min = J_vals.min()    # find the minimum value of J(w)
min_index_i, min_index_j = np.where(J_vals == J_min) # find the indices of the minimum value

w0_min = w0[min_index_i[0], min_index_j[0]]      # find w0 where J(w) is minimum
w1_min = w1[min_index_i[0], min_index_j[0]]      # find w1 where J(w) is minimum

print(f"The minimum cost is J(w) = {J_min:.2f}")
print(f"The optimal parameters are w0 = {w0_min:.2f}, w1 = {w1_min:.2f}")
print(f"The best predicted line is h(t) = {w1_min:+.2f}t{w0_min:+.2f}")

Let's plot the solution space in 3D. The two horizontal axes show $w_0$ and $w_1$, and the vertical axis shows the associated cost function values (on a $\log_{10}$ scale).

In [None]:
# Plot the cost function as a 3D surface
fig = plt.figure(figsize=(6, 6), dpi=200)

# Create 3D axis subplot.
#   Since we defined a subplot, and assigned it to ax, we can use ax to plot.
#   Therefore, in the following code, we use "ax.blablaa" and not "plt.blablaa"
ax = fig.add_subplot(111, projection='3d')

# Surface plot
#   x: w0 (Intercept)
#   y: w1 (Slope)
#   z: J(w)
#   cmap: colormap (e.g., "magma"). 
#   For more color options, see: https://matplotlib.org/stable/tutorials/colors/colormaps.html
#
ax.plot_surface(w0, w1, np.log10(J_vals), cmap="magma")

# Mark the minimum point on the surface
ax.scatter(w0_min, w1_min, np.log10(J_min), 
           color="green",   # Minimum point color
           s=100,           # Marker size
           label=f"Min J(w) at (w1, w0)=({w1_min:.2f}, {w0_min:.2f})")

ax.set_xlabel("w0 (Intercept)")
ax.set_ylabel("w1 (Slope)")
ax.set_zlabel("log10(J(w))")
ax.set_title("3D view of the cost function")
plt.legend()

# Rotate the figure for better viewing angle
ax.view_init(elev=20, azim=-30)
plt.show()

3D visualizations are cool, but sometimes not easy to digest. Let's show a 2D presentation of the cost function in the code section below.

In [None]:
# Plot J_vals using imshow in 2D (a heatmap)
plt.figure(figsize=(5, 4),dpi=200)

# Show a cyan circle at the minimum point. It should have been green to be 
# consistent with the previous 3D code, but for better visualization, I changed its color.
plt.scatter(w0_min, w1_min, s=50, color="cyan", edgecolor="black", label="Minimum J(w)")

# Show the cost function as a heatmap
# We have imshow in MATLAB as well. It is quite similar to the one in Python.
#   extent: indicates the extent of the axes [xmin, xmax, ymin, ymax]
#   origin: determines the [0,0] point of the image (default is "upper")
#   cmap: colormap (e.g., "magma"). For more color options, see: https://matplotlib.org/stable/tutorials/colors/colormaps.html
im = plt.imshow(np.log10(J_vals), 
           extent=[w0_range.min(), w0_range.max(), w1_range.min(), w1_range.max()], 
           origin="lower", cmap="magma")
plt.colorbar(im, label=r"$log_{10}$(J(w))") # Add colorbar

# Add contour lines for the log10(J(w))
#    levels: number of contour levels
plt.contour(w0, w1, np.log10(J_vals), levels=20, linewidths=0.5, colors="w", alpha=0.7)

plt.xlabel("w0 (Intercept)")
plt.ylabel("w1 (Slope)")
plt.title("Heatmap of the Cost Function J(w)")
plt.grid(True, linestyle="--", color="w", alpha=0.4)
plt.axis("equal")   # Equal aspect ratio for the figure
plt.tight_layout()  # Ensure everything fits into the figure without clipping
plt.show()

***
### ✅ Check your understanding

- Do you understand what is shown in the figures above? Can you explain your observations?

- What does the cyan circle show? 

- As you can see, the cost function is a convex function. What does it tell us and why is it important?

- Can we get our predicted ($w_0$, $w_1$) closer to the actual ($w_0$, $w_1$)? Think about it and discuss with your classmates.
 
***

# ⛷️ Exercise

I'm not providing step-by-step instructions for this exercise.

Use the dataset generated from a physical model of force:

$$
F = ma + \mu mg
$$

The dataset is located in the "datasets" folder and its name is "force_data.csv".

* Use Pandas to load the dataset (also see "Python_Jumpstart/05-Data_Analysis.ipynb"):

```python
import pandas as pd
df = pd.read_csv("../datasets/force_data.csv", comment="#")
```

* Explore the data by visualizing the relationships between the input features and the target variable `F`. Note that the dataset consists of multiple features. To list all features (i.e. column names) in a dataset using pandas, use:

```python
df.columns
```

* Assume you are not a physics student and you do not know anything about the relation between different features. This dataset is given to you and you assume the target (`F`) depends linearly on all input features as:

$$
F = w_1 m + w_2 a + w_3 \mu + w_0
$$

Use the **closed-form solution** for linear regression to solve for the optimal weights $w_0, w_1, w_2, w_3$. We discussed the general form in the class:

$$
\mathbf{w^*} = (X^T X)^{-1} X^T y
$$

to solve for the weights.

You need to construct the matrix $X$ with a column of ones (for $w_0$) and columns for `mass`, `acceleration`, and `mu`. The $y$ matrix (vector) is the target, `F`. Once you have computed the weights, use them to predict $F$ and compare your predictions to the actual values using metrics such as MSE or the $R^2$ score. To calculate the inverse, you can use the `np.linalg.inv()` function from NumPy, or any smarter way you come up with. Also review the notebooks in "Python_Jumpstart", particularly "01-Numpy.ipynb".

* This exercise is fun to work on, but it also highlights an important lesson: when applying an ML model, you should always be aware of the data and the true relationships between features. In reality, the applied force is governed by the physical law $F = ma + \mu mg$, not by the simplified linear model we assumed above.

* Let's have more fun (or challenge): Imagine I now tell you that the dataset you worked on was generated in a simulator, mimicking an object moving on the surface of a solar system body other than Earth. Based on the physics in the data, can you figure out which solar system object it was?

***
END
***