# 🍎 Analyzing **Linear Relationships** in Science with Computational Notebooks

---

## 🎯 PURPOSE: Why Use Computational Notebooks for Data Analysis?

Computational notebooks, like **Jupyter Notebooks**, offer a unique blend of **narrative text**, **executable code**, and **visualizations** all in one place. For analyzing experimental science data, this is incredibly powerful:

* **Reproducibility:** A notebook captures the entire analysis process: the data, the cleaning steps, the calculations (like linear regression), and the final graph. Anyone can re-run the exact analysis and get the same results. In science, this is crucial for verifying findings.

* **Live Documentation:** Unlike static lab reports or spreadsheets, the code itself is right next to the explanation. You can explain *why* you are performing a linear regression and then execute the regression in the next cell. This makes the **methodology transparent**.

* **Iterative Exploration:** It's easy to change a single line of code (e.g., to exclude an outlier, switch the regression type) and immediately see the effect on the graph and results. This encourages **scientific curiosity and testing hypotheses**.

---

## ⚖️ COMPARISON: Notebooks vs. Spreadsheets

| Feature | Computational Notebooks (e.g., Jupyter) | Spreadsheet Software (e.g., Excel, Sheets) |
| :--- | :--- | :--- |
| **Methodology** | **Transparent:** Code explicitly defines every step (cleaning, calculation, plotting). | **Opaque:** Steps are often hidden in cell formulas or menus; harder to see the full process. |
| **Documentation** | **Integrated:** Text/Explanations surround the live code and outputs. | **Separate:** Explanations are usually added in a separate document. |
| **Advanced Analysis** | **Powerful:** Easily handles complex tasks like error analysis, simulation, and advanced statistics. | **Limited:** Often requires add-ons or complex functions for advanced statistical modeling. |
| **Pedagogical Benefit** | Great for teaching **computational thinking** and **data literacy**; learners see the *how* and *why*. | Great for data entry and simple calculations; can quickly create basic graphs. |

---
## 🧪 CONTENT: Exploring **Newton's Second Law**

Today, we are analyzing data from a classic high school physics experiment: pulling a cart on a horizontal plane. The goal is to verify **Newton's Second Law** of Motion:

$$\mathbf{F}_{net} = m \cdot \mathbf{a}$$

Where:
* $\mathbf{F}_{net}$ is the **Net Force** (Newtons, N) applied to the system.
* $m$ is the **Mass** (kilograms, kg) of the system (cart + weights).
* $\mathbf{a}$ is the **Acceleration** ($m/s^2$) of the system.

### The Effect of Friction

In a real-world experiment, there is always a resistive force, **friction** ($\mathbf{F}_{friction}$), that opposes the motion. This force must be overcome before the cart can accelerate. The actual equation governing the experiment is:

$$\mathbf{F}_{applied} = m \cdot \mathbf{a} + \mathbf{F}_{friction}$$

When we plot **Applied Force** ($\mathbf{F}$) on the y-axis against **Acceleration** ($\mathbf{a}$) on the x-axis, we expect a **linear relationship** ($y = mx + b$):

* **Slope ($m$):** Represents the **Mass** of the system.
* **Y-Intercept ($b$):** Represents the force needed to overcome the **Friction**.
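
The model above can be sketched in a few lines of Python. This is a minimal illustration, assuming the accepted mass (0.850 kg) and expected friction (0.350 N) quoted later in this notebook; the function name `applied_force` is made up for this example.

```python
# Sketch of the friction-corrected model: F_applied = m * a + F_friction.
# The defaults are the accepted mass and expected friction quoted later in this notebook.
def applied_force(acceleration, mass=0.850, friction=0.350):
    """Return the applied force (N) needed for a given acceleration (m/s^2)."""
    return mass * acceleration + friction

accelerations = [0.0, 0.5, 1.0, 1.5]
forces = [applied_force(a) for a in accelerations]

# The slope between any two points recovers the mass,
# and the force at a = 0 is the friction term (the y-intercept).
slope = (forces[-1] - forces[0]) / (accelerations[-1] - accelerations[0])
intercept = forces[0]
print(f"slope = {slope:.3f} kg, intercept = {intercept:.3f} N")
```

Real measurements scatter around this ideal line, which is why the rest of the notebook uses regression instead of a two-point slope.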

---

# 📊 PART 1: Using Linear Regression to Verify F=ma

## Learning Goal: Understand and apply linear regression

In this first part, we'll learn how to use **linear regression** to analyze experimental data and verify Newton's Second Law. We'll work with all our data points to understand the fundamentals.


---

## 🧱 CODE BLOCK 1: Loading and Inspecting the Data

Before we start, we need to load our experimental data. This data was collected by measuring the acceleration of a cart system when different pulling forces were applied.

In [3]:
# Code in this cell sets up the environment and loads the data for analysis.
# We will use two core tools: one for data handling and one for plotting.

# [Language-A: Python]
import pandas as pd # Tool for data manipulation and analysis
import numpy as np # Tool for numerical operations, like calculating the line of best fit
import matplotlib.pyplot as plt # Tool for creating static, interactive, and animated visualizations
from scipy.stats import linregress # Specific tool for linear regression

# [Language-B: R]
# library(tidyverse) # Collection of R packages designed for data science
# library(mosaic) # Helpful package for statistics and data manipulation

# Experimental data: [Acceleration (a) in m/s^2, Applied Force (F) in N]
data = {
    'Acceleration': [0.00, 0.15, 0.23, 0.31, 0.45, 0.52, 0.59, 0.76, 0.90, 0.98, 1.05, 1.20, 1.40, 1.55, 1.70],
    'Force': [0.60, 0.48, 0.54, 0.60, 0.72, 0.78, 0.85, 0.98, 1.40, 1.18, 1.25, 1.37, 1.50, 1.63, 1.76]
}
# Note: The first point (0.00, 0.60) is the static-friction point: 0.60 N of applied force
# produced only negligible acceleration (approx. 0.00).

# Create a data structure (often called a 'DataFrame') to hold the data
# [Language-A: Python]
df = pd.DataFrame(data)

# [Language-B: R]
# df <- data.frame(data)

# Display the first few rows to confirm the data loaded correctly.
# [Language-A: Python]
print(df.head())

# [Language-B: R]
# head(df)


---

## 📋 Theoretical (Accepted) Values

Before we analyze our experimental data, let's note the **theoretical (accepted) values** for comparison:

* **Theoretical Mass** ($m_{accepted}$): **0.850 kg**
  - This is the known mass of the cart system (cart + weights)

* **Expected Kinetic Friction** ($F_{expected}$): **0.350 N**
  - This is a reasonable estimate for the kinetic friction force
  - Kinetic friction is typically lower than static friction

**Goal:** Use linear regression to find the **experimental** values of mass and friction, then compare them to these accepted values.


In [None]:
# Store theoretical (accepted) values for later comparison
m_accepted = 0.850  # kg - Known mass of the cart system
F_expected = 0.350  # N - Expected kinetic friction force

print('=' * 60)
print('THEORETICAL (ACCEPTED) VALUES')
print('=' * 60)
print(f'Theoretical Mass: {m_accepted:.3f} kg')
print(f'Expected Kinetic Friction: {F_expected:.3f} N')
print('\nWe will compare our experimental results to these values.')


**Note:** All of the experimental data is now loaded. It includes one point at acceleration = 0 (the static-friction point) and possibly some outliers. We'll keep every point in Part 1 to understand linear regression.


---

## 📈 Step 1: Visualize the Data

Let's start by creating a scatter plot of all our data points. This helps us see the overall relationship between Force and Acceleration.


In [4]:
# Code to create a scatter plot of Force vs. Acceleration for ALL data.

# Create the plot
# [Language-A: Python]
plt.figure(figsize=(8, 5)) # Sets the size of the visualization
plt.scatter(df['Acceleration'], df['Force'], color='darkblue', s=100, label='All Experimental Data', alpha=0.7)

# [Language-B: R]
# ggplot(df, aes(x=Acceleration, y=Force)) + geom_point(color='darkblue', size=3) +
# labs(title='Force vs. Acceleration: All Data', x='Acceleration (a in m/s²)', y='Force (F in N)') + theme_minimal()

# Add helpful labels and a title
plt.title('Force vs. Acceleration: All Data Points', fontsize=14, fontweight='bold')
plt.xlabel('Acceleration (a in $m/s^2$)', fontsize=12)
plt.ylabel('Applied Force (F in N)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()

print('\nObservation: We can see all data points. Notice the point at a=0 (static friction) and potential outliers in the middle range.')


---

## 📚 Understanding Linear Regression: Finding the "Best Fit" Line

Before we run our first regression, let's understand what **linear regression** actually does.

### What is Linear Regression?

**Linear regression** is a statistical method that finds the **"line of best fit"** through a set of data points. Imagine you have a scatter plot of points, and you want to draw a straight line that comes as close as possible to all of them. Linear regression does this mathematically.

### The Goal: Minimize Error

The regression algorithm finds the line that **minimizes the total squared distance** between the line and all data points. Think of it like this:
- For each data point, measure how far it is from the line (vertically)
- Square that distance (to make all distances positive and emphasize larger errors)
- Add up all these squared distances
- The "best fit" line is the one that makes this total as small as possible
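
The steps above can be sketched directly in plain Python. The data points and both candidate lines below are made up for illustration; the helper name `sum_squared_residuals` is not part of any library.

```python
# Sketch of the quantity that least-squares regression minimizes.
# The data and both candidate lines are made up for illustration.
def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up the squared vertical distances between each point and the line."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = slope * x + intercept
        total += (y - predicted) ** 2
    return total

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]  # roughly y = 2x + 1 with some scatter

good_line = sum_squared_residuals(xs, ys, slope=2.0, intercept=1.0)
poor_line = sum_squared_residuals(xs, ys, slope=1.0, intercept=2.0)
print(f"Sum of squared residuals for y = 2x + 1: {good_line:.2f}")
print(f"Sum of squared residuals for y = 1x + 2: {poor_line:.2f}")
```

The line with the smaller total is the better fit; regression finds the slope and intercept that make this total as small as it can be.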

### What We Get from Regression

When we perform linear regression, we get:

1. **Slope ($m$):** The steepness of the line. In our experiment, this represents the **mass** of the cart system.
2. **Y-Intercept ($b$):** Where the line crosses the y-axis. In our experiment, this represents the **friction force**.
3. **R-squared ($R^2$):** A measure of how well the line fits the data (we'll explain this in detail below).

### The Equation

The line of best fit follows the familiar equation: $y = mx + b$, where:
- $y$ is the dependent variable (Force, in our case)
- $x$ is the independent variable (Acceleration, in our case)
- $m$ is the slope (Mass)
- $b$ is the y-intercept (Friction)

In our physics context, this becomes: $\mathbf{F} = m \cdot \mathbf{a} + \mathbf{F}_{friction}$
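
For a straight line, the least-squares slope and intercept have a closed-form solution. The sketch below uses the standard textbook formula on noise-free, hypothetical data generated from $F = 0.85a + 0.35$; it is meant to show the idea, not to reproduce the internals of any library routine.

```python
# Sketch of the closed-form least-squares solution for a straight line.
# This is the textbook formula, not a copy of any library's internals.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical noise-free data generated from F = 0.85 * a + 0.35
accel = [0.2, 0.6, 1.0, 1.4]
force = [0.85 * a + 0.35 for a in accel]

mass_estimate, friction_estimate = fit_line(accel, force)
print(f"mass = {mass_estimate:.3f} kg, friction = {friction_estimate:.3f} N")
```

With perfect data the fit returns the mass and friction that generated it; with real data it returns the best compromise across all points.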


---

## 🔬 Step 2: Run Linear Regression

Now let's apply linear regression to find the line of best fit through our data points.


In [None]:
# Regression on ALL data (including outliers) - This is our baseline
# [Language-A: Python]
X_all = df['Acceleration']
Y_all = df['Force']

# Perform linear regression on ALL data
# The linregress function finds the line that minimizes the sum of squared distances
slope_all, intercept_all, r_value_all, p_value_all, std_err_all = linregress(X_all, Y_all)
r_squared_all = r_value_all**2  # R-squared is the square of the correlation coefficient

# [Language-B: R]
# model_all <- lm(Force ~ Acceleration, data=df)
# slope_all <- coef(model_all)['Acceleration']
# intercept_all <- coef(model_all)['(Intercept)']
# r_squared_all <- summary(model_all)$r.squared
# std_err_all <- summary(model_all)$coefficients['Acceleration', 'Std. Error']

# Print baseline results
print('=' * 60)
print('BASELINE REGRESSION: ALL DATA')
print('=' * 60)
print(f"Mass (Slope): {slope_all:.3f} kg")
print(f"Friction Force (Y-Intercept): {intercept_all:.3f} N")
print(f"R-squared: {r_squared_all:.4f} ({r_squared_all*100:.1f}% of variation explained)")
print(f"Standard Error of Mass: {std_err_all:.3f} kg")
print(f"\nEquation: F = {slope_all:.3f}a + {intercept_all:.3f}")

# Interpret R-squared
print('\n' + '-' * 60)
print('INTERPRETING THE R-SQUARED VALUE:')
print('-' * 60)
if r_squared_all >= 0.95:
    interpretation = "Excellent fit! The line explains almost all the variation."
elif r_squared_all >= 0.80:
    interpretation = "Good fit. The line explains most of the variation, but there's some scatter."
elif r_squared_all >= 0.50:
    interpretation = "Moderate fit. The line explains about half the variation; significant scatter is present."
else:
    interpretation = "Poor fit. The line explains little of the variation, which suggests a weak linear relationship."

print(f"R² = {r_squared_all:.4f} means: {interpretation}")
print(f"      As we clean the data, we expect R² to improve (increase).")

---

## 📊 Understanding R-squared ($R^2$): How Well Does Our Line Fit?

After running the regression, you saw an **R-squared** value. Let's understand what this important statistic means.

### What is R-squared?

**R-squared** (also written $R^2$) is a number between 0 and 1 that tells us **what fraction of the variation in the data is explained by our linear model**. It is often quoted as a percentage.

### Interpreting R-squared Values

- **$R^2 = 1.0$ (or 100%):** Perfect fit! All data points lie exactly on the line. This is extremely rare in real experiments.
- **$R^2 = 0.95$ (or 95%):** Excellent fit! The line explains 95% of the variation. Most data points are very close to the line.
- **$R^2 = 0.80$ (or 80%):** Good fit. The line explains 80% of the variation. Some scatter, but a clear linear trend.
- **$R^2 = 0.50$ (or 50%):** Moderate fit. The line explains only half the variation. There's significant scatter.
- **$R^2 = 0.0$ (or 0%):** No linear relationship. The line doesn't explain any of the variation; the data is completely random with respect to the line.
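
One common way to compute $R^2$ is $1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of $y$ from its mean. The sketch below applies that definition to two made-up data sets; in both cases $y = 2x + 1$ is the least-squares line.

```python
# Sketch of R^2 = 1 - SS_res / SS_tot for a given line.
def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in ys that the line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [0.0, 1.0, 2.0, 3.0]
exact = [1.0, 3.0, 5.0, 7.0]      # lies exactly on y = 2x + 1
scattered = [1.5, 2.0, 5.5, 7.0]  # same best-fit line, but with scatter

r2_exact = r_squared(xs, exact, slope=2.0, intercept=1.0)
r2_scattered = r_squared(xs, scattered, slope=2.0, intercept=1.0)
print(f"R^2 (exact data):     {r2_exact:.4f}")
print(f"R^2 (scattered data): {r2_scattered:.4f}")
```

The scattered data still has a clear linear trend, but its $R^2$ drops below 1 because some of the variation is left unexplained by the line.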

### In Our Experiment

For our Force vs. Acceleration data:
- A high $R^2$ (close to 1.0) means the data follows Newton's Second Law very well: the relationship is strongly linear.
- A low $R^2$ (far from 1.0) suggests either:
  - Experimental errors (outliers, measurement mistakes)
  - The relationship isn't truly linear (maybe friction changes with speed)
  - There are other forces we haven't accounted for

### Why We Care About R²

As we clean our data (remove outliers), we expect $R^2$ to **increase**. This tells us that:
1. Our cleaned data has a stronger linear relationship
2. We've successfully removed problematic data points
3. Our final model is more reliable

**Watch the R² values as we progress:** Baseline → No Static Friction → Clean Data. You should see them improve!


---

## 📐 Step 3: Interpret the Results

Let's understand what our regression tells us:

* **Slope** = **Mass** of the cart system
* **Y-Intercept** = **Friction Force** needed to overcome resistance
* **R²** = How well our line fits the data (closer to 1.0 is better)

Our equation: $\mathbf{F} = m \cdot \mathbf{a} + \mathbf{F}_{friction}$


---

## 📊 Step 4: Visualize the Best Fit Line

Let's plot our data points along with the regression line to see how well it fits.


In [None]:
# Plot data and regression line
# Use variables from previous regression
X_all = df['Acceleration']
Y_all = df['Force']

# Calculate line of best fit
Y_fit = slope_all * X_all + intercept_all

# Create plot
plt.figure(figsize=(8, 5))
plt.scatter(X_all, Y_all, color='darkblue', s=100, alpha=0.7, label='Experimental Data', zorder=2)
plt.plot(X_all, Y_fit, color='red', linestyle='-', linewidth=2, label='Line of Best Fit', zorder=1)

plt.title('Force vs. Acceleration with Linear Regression', fontsize=14, fontweight='bold')
plt.xlabel('Acceleration (a in $m/s^2$)', fontsize=12)
plt.ylabel('Applied Force (F in N)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.annotate(f'F = {slope_all:.3f}a + {intercept_all:.3f}\nR² = {r_squared_all:.4f}', 
             xy=(0.05, 0.95), xycoords='axes fraction', fontsize=10, 
             bbox=dict(boxstyle="round,pad=0.5", fc="yellow", alpha=0.5))
plt.tight_layout()
plt.show()

print('\nSummary:')
print(f'  • Mass (from slope): {slope_all:.3f} kg')
print(f'  • Friction Force (from intercept): {intercept_all:.3f} N')
print(f'  • R² = {r_squared_all:.4f} ({r_squared_all*100:.1f}% of variation explained)')


---

## 🔍 Comparing Our Results to Theoretical Values

Now let's compare our experimental results from Part 1 to the theoretical (accepted) values we defined earlier.

**Percent Error** tells us how far off our experimental value is from the accepted value:

$$\text{Percent Error} = \left| \frac{\text{Experimental Value} - \text{Accepted Value}}{\text{Accepted Value}} \right| \times 100\%$$

A smaller percent error means our experimental result is closer to the theoretical value.
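
As a quick worked example of the formula, the sketch below computes the percent error for a hypothetical measured mass of 0.900 kg against the accepted 0.850 kg. The function name `percent_error` is illustrative.

```python
# Sketch of the percent error formula shown above.
def percent_error(experimental, accepted):
    """Absolute percent error of an experimental value relative to an accepted value."""
    return abs((experimental - accepted) / accepted) * 100.0

# Hypothetical measurement: a slope of 0.900 kg compared with the accepted 0.850 kg
error = percent_error(0.900, 0.850)
print(f"Percent error: {error:.2f}%")
```

The absolute value means the result does not say whether the measurement was too high or too low, only how far off it was.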


In [None]:
# Compare Part 1 experimental results to theoretical values
# Use the theoretical values defined earlier: m_accepted and F_expected

# Experimental values from Part 1 regression
experimental_mass_part1 = slope_all  # Mass from slope
experimental_friction_part1 = intercept_all  # Friction from intercept

# Calculate percent errors
mass_error_part1 = abs((experimental_mass_part1 - m_accepted) / m_accepted) * 100
friction_error_part1 = abs((experimental_friction_part1 - F_expected) / F_expected) * 100

# Print comparison
print('=' * 60)
print('PART 1: EXPERIMENTAL vs THEORETICAL VALUES')
print('=' * 60)
print(f"\n{'Quantity':<25} {'Theoretical':<15} {'Experimental':<15} {'Percent Error'}")
print('-' * 60)
print(f"{'Mass (kg)':<25} {m_accepted:<15.3f} {experimental_mass_part1:<15.3f} {mass_error_part1:>15.2f}%")
print(f"{'Friction (N)':<25} {F_expected:<15.3f} {experimental_friction_part1:<15.3f} {friction_error_part1:>15.2f}%")

print('\n' + '-' * 60)
print('INTERPRETATION:')
print('-' * 60)
print(f"• Mass Error: {mass_error_part1:.2f}% - ", end="")
if mass_error_part1 < 5:
    print("Excellent! Very close to theoretical value.")
elif mass_error_part1 < 10:
    print("Good! Reasonably close to theoretical value.")
elif mass_error_part1 < 20:
    print("Moderate. Some discrepancy from theoretical value.")
else:
    print("Large error. May need data cleaning (see Part 2).")

print(f"• Friction Error: {friction_error_part1:.2f}% - ", end="")
if friction_error_part1 < 10:
    print("Good agreement with expected value.")
elif friction_error_part1 < 20:
    print("Reasonable agreement.")
else:
    print("Significant difference. May improve with data cleaning.")

print('\n' + '=' * 60)
print('NOTE: In Part 2, we will clean the data and see if we can improve these errors.')
print('=' * 60)


---

## ✅ Part 1 Summary

**Congratulations!** You've successfully:

1. ✅ Loaded and visualized experimental data
2. ✅ Applied linear regression to find the line of best fit
3. ✅ Interpreted the slope (mass) and intercept (friction)
4. ✅ Understood R² as a measure of fit quality
5. ✅ Compared experimental results to theoretical values using percent error

**Key Takeaway:** Linear regression allows us to extract physical quantities (mass and friction) from experimental data by finding the best-fit line. We can then compare these experimental values to theoretical values to assess the quality of our measurements.

**Next:** In Part 2, we'll learn how to improve our results by identifying and removing problematic data points.

---


# 🔍 PART 2: Improving Our Analysis Through Data Cleaning

## Learning Goal: Learn how to identify and remove problematic data points

**Remember from Part 1:** We performed linear regression on all data points and found our initial results.

**Question:** Can we improve our results by removing problematic data points?

In real experiments, some data points might be problematic due to:
* Experimental errors (measurement mistakes)
* Different physical regimes (like static vs. kinetic friction)
* Equipment issues (sticky wheels, etc.)

By carefully examining our data and removing problematic points, we can get a better fit and more accurate results.


---

## 🤔 Reflection: Can We Improve Our Results?

Look back at our scatter plot from Part 1. Do you notice:

* A point at acceleration = 0? (This represents static friction)
* Any points that seem far from where a straight line would be?

These might be candidates for removal. Let's examine them systematically.


---

## ⚖️ Step 1: Remove the Static Friction Point (a=0)

**Physical Justification:** The point at acceleration = 0 represents **static friction**, the force needed to *start* the cart moving. Once the cart is moving, **kinetic friction** takes over, and it is typically lower than static friction.

For analyzing Newton's Second Law ($F = ma$), we want to focus on **accelerating** data points. Let's remove the a=0 point and see how it affects our results.
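
The next cell drops the static-friction point by position (it is the first row). A condition-based filter is an alternative that does not depend on row order. The sketch below uses the first four rows of this notebook's data; the threshold `MIN_ACCELERATION` is an assumed value chosen for illustration.

```python
# Sketch: keep only measurements where the cart was actually accelerating.
# The threshold is an illustrative assumption, not a value from the experiment.
MIN_ACCELERATION = 0.01  # m/s^2; anything at or below this is treated as "not moving"

# (acceleration in m/s^2, applied force in N)
measurements = [
    (0.00, 0.60),  # cart not yet moving: static-friction regime
    (0.15, 0.48),
    (0.23, 0.54),
    (0.31, 0.60),
]

moving = [(a, f) for a, f in measurements if a > MIN_ACCELERATION]
print(f"Kept {len(moving)} of {len(measurements)} points")
```

With a pandas DataFrame the same idea is a boolean mask, for example `df[df['Acceleration'] > MIN_ACCELERATION]`.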


In [None]:
# Remove the a=0 point (static friction point) - this is index 0
# [Language-A: Python]
df_no_static = df.iloc[1:].copy()  # Remove first row (a=0 point)
X_no_static = df_no_static['Acceleration']
Y_no_static = df_no_static['Force']

# Perform linear regression on data without static friction point
slope_no_static, intercept_no_static, r_value_no_static, p_value_no_static, std_err_no_static = linregress(X_no_static, Y_no_static)
r_squared_no_static = r_value_no_static**2

# [Language-B: R]
# df_no_static <- slice(df, 2:n())
# model_no_static <- lm(Force ~ Acceleration, data=df_no_static)
# slope_no_static <- coef(model_no_static)['Acceleration']
# intercept_no_static <- coef(model_no_static)['(Intercept)']
# r_squared_no_static <- summary(model_no_static)$r.squared
# std_err_no_static <- summary(model_no_static)$coefficients['Acceleration', 'Std. Error']

# Print results
print('=' * 60)
print('REGRESSION: Data WITHOUT Static Friction Point (a=0)')
print('=' * 60)
print(f"Mass (Slope): {slope_no_static:.3f} kg")
print(f"Friction Force (Y-Intercept): {intercept_no_static:.3f} N")
print(f"R-squared: {r_squared_no_static:.4f} ({r_squared_no_static*100:.1f}% of variation explained)")
print(f"Standard Error of Mass: {std_err_no_static:.3f} kg")
print(f"\nEquation: F = {slope_no_static:.3f}a + {intercept_no_static:.3f}")

# Compare R² improvement
r2_improvement = r_squared_no_static - r_squared_all
print(f"\n" + '-' * 60)
print('R-SQUARED IMPROVEMENT:')
print('-' * 60)
print(f"Baseline (all data):     R² = {r_squared_all:.4f} ({r_squared_all*100:.1f}%)")
print(f"Without static friction: R² = {r_squared_no_static:.4f} ({r_squared_no_static*100:.1f}%)")
print(f"Change:                  ΔR² = {r2_improvement:+.4f} ({r2_improvement*100:+.1f} percentage points)")

if r2_improvement > 0:
    print(f"\n✓ Good! Removing the static friction point improved the fit.")
    print(f"  The line now explains {r2_improvement*100:.1f} percentage points more of the variation in the data.")
elif r2_improvement < 0:
    print(f"\n⚠ Unexpected: R² decreased. This might indicate the a=0 point was actually useful.")
else:
    print(f"\n→ No change in R². The a=0 point didn't significantly affect the fit.")

---

## 📉 Step 2: Examine Residuals to Find Outliers

**Residuals** are the vertical distances between each data point and the regression line. They show us which points don't fit well.

By plotting residuals, we can identify potential outliers that might need to be removed.
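
Before running this on the experimental data, here is a minimal sketch of the idea on made-up numbers. The line $y = 2x + 1$ is taken as given, and the point at index 3 is deliberately shifted upward so that it stands out.

```python
import statistics

# Sketch: compute residuals against a known line and flag points beyond 2 sigma.
# The data are made up; the point at index 3 is deliberately shifted upward.
slope, intercept = 2.0, 1.0
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 2.9, 5.1, 9.0, 8.9, 11.1, 12.9, 15.1]

# Residual = observed y - predicted y
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sigma = statistics.stdev(residuals)  # sample standard deviation (n - 1 in the denominator)

flagged = [i for i, r in enumerate(residuals) if abs(r) / sigma > 2]
print(f"sigma = {sigma:.3f}")
print(f"Indices flagged as possible outliers: {flagged}")
```

A flag like this is only a prompt to look more closely. Whether a point should be removed still depends on having a physical or experimental reason.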


In [None]:
# Calculate residuals for data without static friction point
# Residual = Observed Y - Predicted Y
# [Language-A: Python]
Y_fit_no_static = slope_no_static * X_no_static + intercept_no_static
residuals_no_static = Y_no_static - Y_fit_no_static

# [Language-B: R]
# Y_fit_no_static <- predict(model_no_static, newdata = data.frame(Acceleration = X_no_static))
# residuals_no_static <- residuals(model_no_static)

# Calculate statistics for outlier detection
residual_mean = np.mean(residuals_no_static)
residual_std = np.std(residuals_no_static, ddof=1)  # Sample standard deviation

# Create a table showing all data points with their residuals
print('=' * 80)
print('DATA POINTS WITH RESIDUALS (after removing a=0 point)')
print('=' * 80)
print(f"{'Index':<8} {'Acceleration':<15} {'Force':<10} {'Residual':<12} {'|Residual|/σ':<15} {'Status'}")
print('-' * 80)

# Create a list to store point information for user selection
point_info = []
for i, idx in enumerate(df_no_static.index):
    accel = df_no_static.loc[idx, 'Acceleration']
    force = df_no_static.loc[idx, 'Force']
    residual = residuals_no_static.iloc[i] if hasattr(residuals_no_static, 'iloc') else residuals_no_static[i]
    abs_residual_sigma = abs(residual) / residual_std
    
    # Determine status
    if abs_residual_sigma > 3:
        status = ">3σ (likely outlier)"
    elif abs_residual_sigma > 2:
        status = ">2σ (possible outlier)"
    else:
        status = "Normal"
    
    point_info.append({
        'index': idx,
        'accel': accel,
        'force': force,
        'residual': residual,
        'abs_residual_sigma': abs_residual_sigma
    })
    
    print(f"{idx:<8} {accel:<15.2f} {force:<10.2f} {residual:<12.4f} {abs_residual_sigma:<15.2f} {status}")

print('-' * 80)
print(f"\nMean Residual: {residual_mean:.4f} N")
print(f"Standard Deviation (σ): {residual_std:.4f} N")
print(f"\nGuidelines:")
print(f"  • Points with |Residual|/σ > 2 may be outliers")
print(f"  • Points with |Residual|/σ > 3 are likely outliers")
print(f"\nUse the table above to identify which points you want to remove.")

# Create residual plot with point labels
plt.figure(figsize=(12, 6))

# Plot residuals
scatter = plt.scatter(X_no_static, residuals_no_static, color='green', s=100, alpha=0.7, label='Residuals', zorder=3)
plt.hlines(0, min(X_no_static), max(X_no_static), color='red', linestyle='--', linewidth=2, label='Zero Line', zorder=1)

# Add ±2σ and ±3σ bands for outlier detection
plt.axhspan(-2*residual_std, 2*residual_std, alpha=0.1, color='yellow', label='±2σ band', zorder=0)
plt.axhspan(-3*residual_std, 3*residual_std, alpha=0.05, color='orange', label='±3σ band', zorder=0)

# Label points with their index for easy identification
for i, idx in enumerate(df_no_static.index):
    accel = X_no_static.iloc[i] if hasattr(X_no_static, 'iloc') else X_no_static[i]
    residual = residuals_no_static.iloc[i] if hasattr(residuals_no_static, 'iloc') else residuals_no_static[i]
    plt.annotate(f'{idx}', (accel, residual), xytext=(5, 5), textcoords='offset points', 
                fontsize=8, alpha=0.7)

# Add labels and title
plt.title('Residual Plot: Identifying Outliers (Numbers = Data Point Index)', fontsize=14, fontweight='bold')
plt.xlabel('Acceleration (a in $m/s^2$)', fontsize=12)
plt.ylabel('Residual (N)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.tight_layout()
plt.show()

---

## 🎯 Step 3: Choose Which Points to Remove

Based on the residual plot and table above, you can now choose which data points to remove. Look for points with:

* Large residuals (far from zero)
* |Residual|/σ > 2 (more than two standard deviations from the line)
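
A complementary check, which is not part of the original analysis, is to refit the line with each point left out in turn and see how far the slope moves. The sketch below does this on made-up data where the point at index 4 is shifted upward; the helper `fit_line` is the standard closed-form least-squares fit.

```python
# Sketch: leave each point out in turn, refit, and record how the slope changes.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

# Made-up data on y = 2x + 1, except index 4, which is shifted up by 3
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 12.0]

full_slope, _ = fit_line(xs, ys)
slope_shift = {}
for i in range(len(xs)):
    kept_x = xs[:i] + xs[i + 1:]
    kept_y = ys[:i] + ys[i + 1:]
    slope_without_i, _ = fit_line(kept_x, kept_y)
    slope_shift[i] = slope_without_i - full_slope

most_influential = max(slope_shift, key=lambda i: abs(slope_shift[i]))
print(f"Slope with all points: {full_slope:.3f}")
print(f"Removing index {most_influential} shifts the slope by {slope_shift[most_influential]:+.3f}")
```

A point whose removal changes the slope far more than any other deserves a second look, since the slope is the mass estimate in this experiment.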

Edit the `USER INPUT` section at the top of the code cell below to specify which points to remove.


In [None]:
# ============================================================================
# USER INPUT: Choose which points to remove
# ============================================================================
# Option 1: Remove points by INDEX (recommended - use the index numbers from the plot)
#           Example: points_to_remove_by_index = [2, 5]  # Remove points with indices 2 and 5
#           Example: points_to_remove_by_index = []     # Don't remove any points

points_to_remove_by_index = []  # <-- EDIT THIS: List of indices to remove (e.g., [2, 5])

# Option 2: Remove points by ACCELERATION VALUE
#           Example: points_to_remove_by_accel = [0.23, 0.52]  # Remove points with these acceleration values
#           Example: points_to_remove_by_accel = []            # Don't remove any points

points_to_remove_by_accel = []  # <-- EDIT THIS: List of acceleration values to remove (e.g., [0.23, 0.52])

# Option 3: Automatically remove points using the 2σ rule
#           When True, this removes all points with |Residual|/σ > 2
use_auto_2sigma = False  # <-- EDIT THIS: Set to True to use the automatic 2σ rule

# ============================================================================
# Process user selections
# ============================================================================

# Start with all data points (excluding the a=0 point that was already removed)
indices_to_remove = set()

# Add indices specified by user
if len(points_to_remove_by_index) > 0:
    for idx in points_to_remove_by_index:
        if idx in df_no_static.index:
            indices_to_remove.add(idx)
        else:
            print(f"Warning: Index {idx} not found in dataset (already removed or doesn't exist)")

# Add indices based on acceleration values
if len(points_to_remove_by_accel) > 0:
    for accel_val in points_to_remove_by_accel:
        # Compare with a tolerance: exact float equality can miss values that print identically
        matching = df_no_static[np.isclose(df_no_static['Acceleration'], accel_val)]
        if len(matching) > 0:
            indices_to_remove.update(matching.index.tolist())
        else:
            print(f"Warning: No point found with acceleration = {accel_val} m/s²")

# Option: Use automatic 2σ rule (if enabled)
try:
    if use_auto_2sigma:
        outlier_threshold = 2 * residual_std
        outlier_mask = np.abs(residuals_no_static) > outlier_threshold
        auto_outlier_indices = df_no_static.index[outlier_mask].tolist()
        indices_to_remove.update(auto_outlier_indices)
        print(f"Auto-detected {len(auto_outlier_indices)} outlier(s) using 2σ rule")
except NameError:
    pass  # use_auto_2sigma not defined, skip automatic detection

# Convert to list and sort
indices_to_remove = sorted(list(indices_to_remove))

# Print what will be removed
print('=' * 60)
print('POINTS SELECTED FOR REMOVAL')
print('=' * 60)
if len(indices_to_remove) > 0:
    print(f"Removing {len(indices_to_remove)} point(s):")
    for idx in indices_to_remove:
        accel = df_no_static.loc[idx, 'Acceleration']
        force = df_no_static.loc[idx, 'Force']
        # Find residual for this point
        residual_idx = df_no_static.index.get_loc(idx)
        residual = residuals_no_static.iloc[residual_idx] if hasattr(residuals_no_static, 'iloc') else residuals_no_static[residual_idx]
        abs_residual_sigma = abs(residual) / residual_std
        print(f"  Index {idx}: a={accel:.2f} m/s², F={force:.2f} N, |Residual|/σ={abs_residual_sigma:.2f}")
else:
    print("No points selected for removal - using all data points (except a=0)")

# Create clean dataset
if len(indices_to_remove) > 0:
    df_clean = df_no_static[~df_no_static.index.isin(indices_to_remove)].copy()
else:
    df_clean = df_no_static.copy()

X_clean = df_clean['Acceleration']
Y_clean = df_clean['Force']

print(f"\nClean dataset: {len(df_clean)} points (removed {len(indices_to_remove)} point(s))")

# Perform FINAL regression on clean data
slope_clean, intercept_clean, r_value_clean, p_value_clean, std_err_clean = linregress(X_clean, Y_clean)
r_squared_clean = r_value_clean**2

# [Language-B: R]
# df_clean <- df_no_static[!df_no_static$index %in% indices_to_remove, ]
# model_clean <- lm(Force ~ Acceleration, data=df_clean)
# slope_clean <- coef(model_clean)['Acceleration']
# intercept_clean <- coef(model_clean)['(Intercept)']
# r_squared_clean <- summary(model_clean)$r.squared
# std_err_clean <- summary(model_clean)$coefficients['Acceleration', 'Std. Error']

# Store outlier_indices for use in visualization (convert to list for compatibility)
outlier_indices = indices_to_remove

# Print final results
print('\n' + '=' * 60)
print('FINAL REGRESSION: CLEAN DATA')
print('=' * 60)
print(f"Mass (Slope): {slope_clean:.3f} kg")
print(f"Friction Force (Y-Intercept): {intercept_clean:.3f} N")
print(f"R-squared: {r_squared_clean:.4f} ({r_squared_clean*100:.1f}% of variation explained)")
print(f"Standard Error of Mass: {std_err_clean:.3f} kg")
print(f"\nFinal Equation: F = {slope_clean:.3f}a + {intercept_clean:.3f}")

# Interpret final R²
print('\n' + '-' * 60)
print('FINAL R-SQUARED INTERPRETATION:')
print('-' * 60)
if r_squared_clean >= 0.95:
    interpretation = "Excellent fit! Our cleaned data shows a very strong linear relationship."
elif r_squared_clean >= 0.80:
    interpretation = "Good fit. The linear relationship is strong, with some expected experimental scatter."
elif r_squared_clean >= 0.50:
    interpretation = "Moderate fit. There's a linear trend, but significant scatter remains."
else:
    interpretation = "Poor fit. The linear relationship is weak; consider investigating experimental errors."

print(f"R² = {r_squared_clean:.4f} means: {interpretation}")
print(f"\nThis is our best estimate of the true relationship between Force and Acceleration.")

# Compare all three regressions
print('\n' + '=' * 75)
print('COMPARISON: Baseline (Part 1) vs No Static vs Clean Data (Part 2)')
print('=' * 75)
print(f"{'Metric':<25} {'Baseline':<12} {'No Static':<12} {'Clean Data':<12} {'Change'}")
print('-' * 75)
print(f"{'R-squared':<25} {r_squared_all:<12.4f} {r_squared_no_static:<12.4f} {r_squared_clean:<12.4f} {r_squared_clean-r_squared_all:+.4f}")
# Format each percentage together with its % sign before padding so the columns stay aligned
print(f"{'R² (% explained)':<25} {f'{r_squared_all*100:.1f}%':<12} {f'{r_squared_no_static*100:.1f}%':<12} {f'{r_squared_clean*100:.1f}%':<12} {(r_squared_clean-r_squared_all)*100:+.1f} pts")
print(f"{'Mass (kg)':<25} {slope_all:<12.3f} {slope_no_static:<12.3f} {slope_clean:<12.3f} {slope_clean-slope_all:+.3f}")
print(f"{'Friction (N)':<25} {intercept_all:<12.3f} {intercept_no_static:<12.3f} {intercept_clean:<12.3f} {intercept_clean-intercept_all:+.3f}")

print('\n' + '-' * 60)
print('KEY INSIGHT:')
print('-' * 60)
total_improvement = r_squared_clean - r_squared_all
if total_improvement > 0:
    # Report the gain in percentage points of explained variation, not as a relative percentage
    print(f"By cleaning the data, we raised R² by {total_improvement:.4f}, so the final model explains")
    print(f"{total_improvement*100:.1f} percentage points more of the variation than the baseline (Part 1).")
else:
    print(f"R² changed by {total_improvement:+.4f} relative to the baseline, so cleaning did not improve the fit.")

---

## 📊 Step 4: Compare Results

Let's see how our cleaned data compares to our original analysis from Part 1.


In [None]:
# ============================================================================
# USER INPUT: Choose which points to remove
# ============================================================================
# Option 1: Remove points by INDEX (recommended - use the index numbers from the plot)
#           Example: points_to_remove_by_index = [2, 5]  # Remove points with indices 2 and 5
#           Example: points_to_remove_by_index = []     # Don't remove any points

points_to_remove_by_index = []  # <-- EDIT THIS: List of indices to remove (e.g., [2, 5])

# Option 2: Remove points by ACCELERATION VALUE
#           Example: points_to_remove_by_accel = [0.23, 0.52]  # Remove points with these acceleration values
#           Example: points_to_remove_by_accel = []            # Don't remove any points

points_to_remove_by_accel = []  # <-- EDIT THIS: List of acceleration values to remove (e.g., [0.23, 0.52])

# Option 3: Automatically remove points using 2σ rule (uncomment to use)
#           This will automatically remove all points with |Residual|/σ > 2
# use_auto_2sigma = True  # <-- Uncomment this line to use automatic 2σ rule

# ============================================================================
# Process user selections
# ============================================================================

# Start with all data points (excluding the a=0 point that was already removed)
indices_to_remove = set()

# Add indices specified by user
if len(points_to_remove_by_index) > 0:
    for idx in points_to_remove_by_index:
        if idx in df_no_static.index:
            indices_to_remove.add(idx)
        else:
            print(f"Warning: Index {idx} not found in dataset (already removed or doesn't exist)")

# Add indices based on acceleration values
# (np.isclose avoids missing a point because of floating-point round-off in an exact == test)
if len(points_to_remove_by_accel) > 0:
    for accel_val in points_to_remove_by_accel:
        matching = df_no_static[np.isclose(df_no_static['Acceleration'], accel_val)]
        if len(matching) > 0:
            indices_to_remove.update(matching.index.tolist())
        else:
            print(f"Warning: No point found with acceleration = {accel_val} m/s²")

# Option: Use automatic 2σ rule (if enabled)
try:
    if use_auto_2sigma:
        outlier_threshold = 2 * residual_std
        outlier_mask = np.abs(residuals_no_static) > outlier_threshold
        auto_outlier_indices = df_no_static.index[outlier_mask].tolist()
        indices_to_remove.update(auto_outlier_indices)
        print(f"Auto-detected {len(auto_outlier_indices)} outlier(s) using 2σ rule")
except NameError:
    pass  # use_auto_2sigma not defined, skip automatic detection

# Convert to list and sort
indices_to_remove = sorted(list(indices_to_remove))

# Print what will be removed
print('=' * 60)
print('POINTS SELECTED FOR REMOVAL')
print('=' * 60)
if len(indices_to_remove) > 0:
    print(f"Removing {len(indices_to_remove)} point(s):")
    for idx in indices_to_remove:
        accel = df_no_static.loc[idx, 'Acceleration']
        force = df_no_static.loc[idx, 'Force']
        # Find residual for this point
        residual_idx = df_no_static.index.get_loc(idx)
        residual = residuals_no_static.iloc[residual_idx] if hasattr(residuals_no_static, 'iloc') else residuals_no_static[residual_idx]
        abs_residual_sigma = abs(residual) / residual_std
        print(f"  Index {idx}: a={accel:.2f} m/s², F={force:.2f} N, |Residual|/σ={abs_residual_sigma:.2f}")
else:
    print("No points selected for removal - using all data points (except a=0)")

# Create clean dataset
if len(indices_to_remove) > 0:
    df_clean = df_no_static[~df_no_static.index.isin(indices_to_remove)].copy()
else:
    df_clean = df_no_static.copy()

X_clean = df_clean['Acceleration']
Y_clean = df_clean['Force']

print(f"\nClean dataset: {len(df_clean)} points (removed {len(indices_to_remove)} point(s))")

# Perform FINAL regression on clean data
slope_clean, intercept_clean, r_value_clean, p_value_clean, std_err_clean = linregress(X_clean, Y_clean)
r_squared_clean = r_value_clean**2

# [Language-B: R]
# df_clean <- df_no_static[!df_no_static$index %in% indices_to_remove, ]
# model_clean <- lm(Force ~ Acceleration, data=df_clean)
# slope_clean <- coef(model_clean)['Acceleration']
# intercept_clean <- coef(model_clean)['(Intercept)']
# r_squared_clean <- summary(model_clean)$r.squared
# std_err_clean <- summary(model_clean)$coefficients['Acceleration', 'Std. Error']

# Store outlier_indices for use in visualization (convert to list for compatibility)
outlier_indices = indices_to_remove

# Print final results
print('\n' + '=' * 60)
print('FINAL REGRESSION: CLEAN DATA')
print('=' * 60)
print(f"Mass (Slope): {slope_clean:.3f} kg")
print(f"Friction Force (Y-Intercept): {intercept_clean:.3f} N")
print(f"R-squared: {r_squared_clean:.4f} ({r_squared_clean*100:.1f}% of variation explained)")
print(f"Standard Error of Mass: {std_err_clean:.3f} kg")
print(f"\nFinal Equation: F = {slope_clean:.3f}a + {intercept_clean:.3f}")

# Interpret final R²
print('\n' + '-' * 60)
print('FINAL R-SQUARED INTERPRETATION:')
print('-' * 60)
if r_squared_clean >= 0.95:
    interpretation = "Excellent fit! Our cleaned data shows a very strong linear relationship."
elif r_squared_clean >= 0.80:
    interpretation = "Good fit. The linear relationship is strong, with some expected experimental scatter."
elif r_squared_clean >= 0.50:
    interpretation = "Moderate fit. There's a linear trend, but significant scatter remains."
else:
    interpretation = "Poor fit. The linear relationship is weak; consider investigating experimental errors."

print(f"R² = {r_squared_clean:.4f} means: {interpretation}")
print(f"\nThis is our best estimate of the true relationship between Force and Acceleration.")

# Compare all three regressions
print('\n' + '=' * 75)
print('COMPARISON: Baseline (Part 1) vs No Static vs Clean Data (Part 2)')
print('=' * 75)
print(f"{'Metric':<25} {'Baseline':<12} {'No Static':<12} {'Clean Data':<12} {'Change'}")
print('-' * 75)
print(f"{'R-squared':<25} {r_squared_all:<12.4f} {r_squared_no_static:<12.4f} {r_squared_clean:<12.4f} {r_squared_clean-r_squared_all:+.4f}")
# Format each percentage together with its % sign before padding so the columns stay aligned
print(f"{'R² (% explained)':<25} {f'{r_squared_all*100:.1f}%':<12} {f'{r_squared_no_static*100:.1f}%':<12} {f'{r_squared_clean*100:.1f}%':<12} {(r_squared_clean-r_squared_all)*100:+.1f} pts")
print(f"{'Mass (kg)':<25} {slope_all:<12.3f} {slope_no_static:<12.3f} {slope_clean:<12.3f} {slope_clean-slope_all:+.3f}")
print(f"{'Friction (N)':<25} {intercept_all:<12.3f} {intercept_no_static:<12.3f} {intercept_clean:<12.3f} {intercept_clean-intercept_all:+.3f}")

print('\n' + '-' * 60)
print('KEY INSIGHT:')
print('-' * 60)
total_improvement = r_squared_clean - r_squared_all
if total_improvement > 0:
    # Report the gain in percentage points of explained variation, not as a relative percentage
    print(f"By cleaning the data, we raised R² by {total_improvement:.4f}, so the final model explains")
    print(f"{total_improvement*100:.1f} percentage points more of the variation than the baseline (Part 1).")
else:
    print(f"R² changed by {total_improvement:+.4f} relative to the baseline, so cleaning did not improve the fit.")

---

## 📈 Final Visualization

Let's create a comprehensive plot showing all data, removed points, and our final best-fit line.


In [None]:
# Create comprehensive visualization showing the data cleaning process
plt.figure(figsize=(10, 6))

# Plot all original data (gray, small)
plt.scatter(df['Acceleration'], df['Force'], color='lightgray', s=50, alpha=0.5, label='All Original Data', zorder=1)

# Highlight the a=0 point that was removed (static friction)
plt.scatter(df.iloc[0]['Acceleration'], df.iloc[0]['Force'], 
           color='orange', s=200, marker='x', linewidths=3, label='Removed: Static Friction (a=0)', zorder=3)

# Highlight any outliers that were removed
# (a single scatter call gives one legend entry no matter how many points were removed)
if len(outlier_indices) > 0:
    removed_points = df_no_static.loc[outlier_indices]
    plt.scatter(removed_points['Acceleration'], removed_points['Force'],
               color='red', s=200, marker='x', linewidths=3,
               label=f'Removed: Outlier(s) (n={len(outlier_indices)})', zorder=3)

# Plot clean data used in final regression (blue, large)
plt.scatter(X_clean, Y_clean, color='darkblue', s=150, alpha=0.8, 
           label=f'Clean Data (n={len(df_clean)})', zorder=2, edgecolors='black', linewidths=1)

# Plot final best-fit line
Y_fit_clean = slope_clean * X_clean + intercept_clean
plt.plot(X_clean, Y_fit_clean, color='red', linestyle='-', linewidth=2.5, 
         label=f'Final Best Fit (R² = {r_squared_clean:.4f})', zorder=2)

# Add labels and title
plt.title('Final Analysis: Clean Data with Best Fit Line', fontsize=14, fontweight='bold')
plt.xlabel('Acceleration (a in $m/s^2$)', fontsize=12)
plt.ylabel('Applied Force (F in N)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(loc='lower right', fontsize=10)
plt.tight_layout()
plt.show()

print(f'\nFinal Analysis Summary:')
print(f'  • Started with {len(df)} data points')
print(f'  • Removed 1 static friction point (a=0)')
if len(outlier_indices) > 0:
    print(f'  • Removed {len(outlier_indices)} outlier(s) using 2σ rule')
print(f'  • Final clean dataset: {len(df_clean)} points')
print(f'  • Final R²: {r_squared_clean:.4f}')
print(f'  • Final Mass: {slope_clean:.3f} kg')
print(f'  • Final Friction: {intercept_clean:.3f} N')


---

## ✅ Part 2 Summary

**Great work!** You've learned how to:

1. ✅ Identify problematic data points (static friction, outliers)
2. ✅ Use residuals to find statistical outliers
3. ✅ Make informed decisions about data cleaning
4. ✅ See how data cleaning improves R² and accuracy

**Key Takeaway:** Careful data analysis and cleaning can significantly improve the quality of your results. This is an essential skill in scientific data analysis!
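
The residual-based outlier check summarized above can be sketched in plain Python so that every arithmetic step is visible. This is only an illustration under stated assumptions: the helper names `fit_line` and `flag_outliers` and the data are made up, and the sample standard deviation used here may differ from how `residual_std` is computed in the notebook's own cells, which rely on `linregress`.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, n_sigma=2.0):
    """Return indices whose |residual| exceeds n_sigma times the residual standard deviation."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    # Residuals of a least-squares fit with an intercept average to zero.
    sigma = math.sqrt(sum(r ** 2 for r in residuals) / (len(residuals) - 1))
    return [i for i, r in enumerate(residuals) if abs(r) > n_sigma * sigma]

# Made-up illustrative data: points near F = 0.5*a + 0.2, with index 4 pushed off the line
accel = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
force = [0.25, 0.30, 0.35, 0.40, 0.95, 0.50, 0.55, 0.60]

print(flag_outliers(accel, force))  # → [4]
```

Note that the outlier itself pulls the fitted line toward it and inflates the standard deviation, which is one reason to inspect flagged points by eye before removing them.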


---

## 🔍 Comparing Final Results to Theoretical Values

Now let's compare our **final cleaned data results** from Part 2 to the theoretical values we defined at the beginning.

**Percent Error** tells us how far off our experimental value is from the accepted value:

$$\text{Percent Error} = \left| \frac{\text{Experimental Value} - \text{Accepted Value}}{\text{Accepted Value}} \right| \times 100\%$$

We'll compare both mass and friction to see how well our cleaned data matches the theoretical values.
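
As a quick check of the formula before applying it to the regression results, here is a minimal sketch with hypothetical numbers. The function name `percent_error` and the values 0.48 and 0.50 are assumptions for illustration, not results from this notebook.

```python
def percent_error(experimental, accepted):
    """Percent error of an experimental value relative to a nonzero accepted value."""
    if accepted == 0:
        # The formula divides by the accepted value, so it is undefined at zero
        raise ValueError("Percent error is undefined when the accepted value is zero.")
    return abs((experimental - accepted) / accepted) * 100

# Hypothetical example: a measured mass of 0.48 kg against an accepted 0.50 kg
print(f"{percent_error(0.48, 0.50):.1f}%")  # → 4.0%
```

The guard matters for quantities whose accepted value can be zero, such as an idealized frictionless track; in that case the absolute difference is the more meaningful comparison.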


In [None]:
# Compare final clean data results to theoretical values
# Use the theoretical values defined earlier: m_accepted and F_expected

# Experimental values from final clean data regression
experimental_mass_final = slope_clean  # Mass from slope
experimental_friction_final = intercept_clean  # Friction from intercept

# Calculate percent errors for final clean data
mass_error_final = abs((experimental_mass_final - m_accepted) / m_accepted) * 100
friction_error_final = abs((experimental_friction_final - F_expected) / F_expected) * 100

# Also calculate errors for Part 1 and intermediate steps for comparison
mass_error_part1 = abs((slope_all - m_accepted) / m_accepted) * 100
mass_error_no_static = abs((slope_no_static - m_accepted) / m_accepted) * 100

print('=' * 60)
print('PART 2: FINAL CLEAN DATA vs THEORETICAL VALUES')
print('=' * 60)
print(f"\n{'Quantity':<25} {'Theoretical':<15} {'Experimental':<15} {'Percent Error'}")
print('-' * 60)
print(f"{'Mass (kg)':<25} {m_accepted:<15.3f} {experimental_mass_final:<15.3f} {mass_error_final:>15.2f}%")
print(f"{'Friction (N)':<25} {F_expected:<15.3f} {experimental_friction_final:<15.3f} {friction_error_final:>15.2f}%")

print('\n' + '-' * 60)
print('INTERPRETATION:')
print('-' * 60)
print(f"• Mass Error: {mass_error_final:.2f}% - ", end="")
if mass_error_final < 5:
    print("Excellent! Very close to theoretical value.")
elif mass_error_final < 10:
    print("Good! Reasonably close to theoretical value.")
elif mass_error_final < 20:
    print("Moderate. Some discrepancy from theoretical value.")
else:
    print("Large error. May need further investigation.")

print(f"• Friction Error: {friction_error_final:.2f}% - ", end="")
if friction_error_final < 10:
    print("Good agreement with expected value.")
elif friction_error_final < 20:
    print("Reasonable agreement.")
else:
    print("Significant difference.")

# Compare improvement from Part 1 to Part 2
print('\n' + '=' * 60)
print('IMPROVEMENT: Part 1 → Part 2')
print('=' * 60)
print(f"{'Metric':<25} {'Part 1 Error':<15} {'Part 2 Error':<15} {'Improvement'}")
print('-' * 60)
mass_improvement = mass_error_part1 - mass_error_final
# Format each value together with its unit before padding so the columns stay aligned
print(f"{'Mass Error (%)':<25} {f'{mass_error_part1:.2f}%':<15} {f'{mass_error_final:.2f}%':<15} {f'{mass_improvement:+.2f} pts'}")

if mass_improvement > 0:
    print(f"\n✓ Success! Data cleaning reduced mass error by {mass_improvement:.2f} percentage points.")
elif mass_improvement < 0:
    print(f"\n⚠ Note: Mass error increased by {abs(mass_improvement):.2f} percentage points.")
    print("   This might indicate that some removed points were actually valid.")
else:
    print(f"\n→ No change in mass error.")

print('=' * 60)


---

## 📚 PLANNED USE: Integrating into Middle/High School Curriculum

This notebook covers three main learning goals, making it highly adaptable:

* **Statistical Goal:** Understanding **linear regression** (slope, y-intercept), **model fit** ($R^2$), and **quantifying uncertainty** (error analysis, residuals).

* **Code Goal:** **Loading data**, **generating scatter plots**, and using functions to **perform calculations** and **visualize results**.

* **Content Goal:** Applying **Newton's Second Law** and understanding how real-world effects (friction) modify theoretical models.

### Integration Ideas:

* **9th Grade Physics/Physical Science:** Students complete the experiment, plot the raw data in a notebook, and use the regression to find their *own* measured mass and compare it to the known mass of the cart.

* **11th Grade Algebra II/Pre-Calculus:** Students can focus on the regression mathematics, using different data sets to practice identifying independent/dependent variables and interpreting the meaning of the slope and intercept in different contexts.

---

## ✨ CONTENT EXTENSIONS: Beyond $\mathbf{F}=m\mathbf{a}$

The computational skills and regression logic used here can be applied to many other linear laws in science and mathematics:

* **Hooke's Law (Springs):** Plotting Force (F) vs. Displacement (x) to find the **Spring Constant (k)**, since $\mathbf{F} = k \cdot \mathbf{x}$. The slope is $k$.

* **Ohm's Law (Circuits):** Plotting Voltage (V) vs. Current (I) to find the **Resistance (R)**, since $\mathbf{V} = \mathbf{I} \cdot \mathbf{R}$. The slope is $R$.

* **Growth Models (Biology/Finance):** Using linear regression on **Time-Series Data** (e.g., population growth vs. time or stock price vs. time) to find the **linear rate of growth** (the slope).
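
To show how the same regression logic transfers, here is a minimal sketch for the Hooke's Law case. The spring data are invented for illustration, and `linear_fit` is a hypothetical dependency-free helper; in the notebook itself, `linregress` would do the same job.

```python
def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, R² equals the squared correlation coefficient
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Made-up spring data: displacement in meters, force in newtons (illustrative only)
displacement = [0.01, 0.02, 0.03, 0.04, 0.05]
force = [0.26, 0.49, 0.76, 1.01, 1.24]

k, offset, r2 = linear_fit(displacement, force)
print(f"Spring constant k = {k:.1f} N/m, intercept = {offset:.3f} N, R² = {r2:.4f}")
```

Swapping in current and voltage readings would give the resistance as the slope in the Ohm's Law case; only the column names and units change.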

---

## 💡 DISPOSITIONS: Developing a Productive Mindset

As you explore computational notebooks, remember these key ideas:

* **It's Okay to Be Uncertain:** Coding and data analysis involve trial and error. If a cell produces an error, read the message and try to fix it. This is how scientists and programmers work!

* **Modify and Test:** The best way to learn is to change the code. For example, change the color of the scatter plot, or change the value of `m_accepted` where it is defined and re-run the percent-error cell. Test your understanding!

* **Document Your Thinking:** Use the Markdown cells (like this one!) to write down your observations and conclusions. This practice helps both you and anyone else who reviews your work.