(activity2_live)=

# Activity 2 Live

**2026-01-29**

# Part 1: NumPy and coding environment

Jupyter notebooks allow us to interleave code, text, math equations, and plots all in a single document. The content is organized into **cells**, each of which is either a Python code cell or a Markdown cell. Markdown is a lightweight markup language for writing text with formatting instructions.

:::{warning}

Unlike typical Python scripts, Jupyter notebooks execute code on a per-cell basis. This means that:

- you can run a cell multiple times
- a cell can use variables defined in any cell that has already been run, whether that cell is above or below it

Please be careful about this! A good practice is to run the cells in order from top to bottom, so that the notebook's state matches the order in which it reads.
:::

You can **double-click** a cell to edit it. You can also **run** a cell by clicking the **play** button in the toolbar above, or by pressing `Ctrl+Enter`.


```python
print("hello, world!")
```

```
hello, world!
```


```python
# standard way to import numpy
import numpy as np

# create a numpy array from a list
a = np.array([1, 2, 3, 4])
print(a)
type(a)
```

```
[1 2 3 4]
numpy.ndarray
```
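Beyond its type, every NumPy array records a few attributes that are useful to check while debugging. The sketch below (reusing the same `a`) prints them and contrasts element-wise array arithmetic with Python list behavior; the exact integer dtype is platform dependent, so treat that line's comment as an assumption.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# every ndarray records its element type, number of dimensions, and shape
print(a.dtype)   # an integer dtype; typically int64 on Linux/macOS
print(a.ndim)    # 1
print(a.shape)   # (4,)

# arithmetic on arrays applies element by element...
print(a * 2)     # [2 4 6 8]

# ...whereas multiplying a Python list repeats it
print([1, 2, 3, 4] * 2)  # [1, 2, 3, 4, 1, 2, 3, 4]
```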

Our housing data is shown in the table below; the cell after it stores the same data as a 2-dimensional NumPy array:

| $x_1$: sq ft | $x_2$: # bedrooms | $x_3$: yr built | $y$: price |
|---------------|-------------------|------------------|-------------|
| 2500          | 3                 | 1999             | $500,000    |
| 4000          | 5                 | 1950             | $1,000,000  |
| 1000          | 2                 | 1980             | $250,000    |
| 900           | 2                 | 2010             | $300,000    |

```python
# create a 2-dimensional numpy array representing our housing data
data = np.array([
    #  x1,  x2,   x3,  y
    [2500,   3, 1999, 500000],
    [4000,   5, 1950, 1000000],
    [1000,   2, 1980, 250000],
    [900,    2, 2010, 300000]
])
```

```python
# The "shape" of the array is a tuple (number of rows, number of columns);
# index 0 gives the number of rows
data.shape[0]
```

```
4
```
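Since `shape` is a tuple, it can also be printed whole or unpacked into both dimensions at once. A short sketch, rebuilding the same `data` array so the block runs on its own:

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900, 2, 2010, 300000],
])

# shape is a tuple: (number of rows, number of columns)
print(data.shape)        # (4, 4)

# unpack the tuple to name each dimension
n_rows, n_cols = data.shape
print(n_rows, n_cols)    # 4 4
```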

```python
# To access a specific element, we use square bracket indexing for each dimension: row, column
# Example: access the cell that indicates year built: 1950
data[1, 2]
```

```
np.int64(1950)
```

```python
# We use the `:` operator to access a range of elements
# Example 1: access the first row
data[0, :]
```

```
array([  2500,      3,   1999, 500000])
```

```python
# Example 2: access the x2 column
data[:, 1]
```

```
array([3, 5, 2, 2])
```
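Selecting a single row or column with an integer index returns a 1-dimensional array, because that dimension is dropped. The sketch below checks both shapes on the same `data` and shows that `data[0]` is shorthand for `data[0, :]`:

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900, 2, 2010, 300000],
])

row = data[0, :]   # first row
col = data[:, 1]   # x2 column

# an integer index drops that dimension, so both results are 1D arrays of length 4
print(row.shape, col.shape)          # (4,) (4,)

# leaving off trailing dimensions is the same as writing `:` for them
print(np.array_equal(data[0], row))  # True
```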

```python
# We can also access a range of rows and columns by specifying a "slice" index
# Syntax: start_index:end_index (start_index is included, end_index is excluded)

# Example: Access the "X" portion of the data by selecting all rows and the first three columns (0, 1, 2)
X = data[:, 0:3]
X
```

```
array([[2500,    3, 1999],
       [4000,    5, 1950],
       [1000,    2, 1980],
       [ 900,    2, 2010]])
```
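Because the end index of a slice is excluded, `0:3` selects exactly three columns. This sketch also shows that an omitted start defaults to the beginning, and that a length-1 slice keeps the column dimension while an integer index drops it:

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900, 2, 2010, 300000],
])

# 0:3 selects columns 0, 1, and 2 (index 3 is excluded)
X = data[:, 0:3]
print(X.shape)                          # (4, 3)

# omitting the start index means "from the beginning"
print(np.array_equal(data[:, :3], X))   # True

# a slice keeps the dimension; an integer index drops it
print(data[:, 0:1].shape)               # (4, 1)
print(data[:, 0].shape)                 # (4,)
```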

Checkpoint: What line of code would we write to access the `y` column?

**Your response**: https://pollev.com/tliu

```python
y = data[:, 3]
y
```

```
array([ 500000, 1000000,  250000,  300000])
```
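An equivalent answer uses a negative index, which counts from the end, so `-1` is always the last column no matter how many features the data has. A sketch on the same `data`:

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900, 2, 2010, 300000],
])

# -1 refers to the last column, which holds the prices
y = data[:, -1]
print(y.tolist())   # [500000, 1000000, 250000, 300000]

# :-1 means "every column except the last", i.e. the same X as data[:, 0:3]
X = data[:, :-1]
print(X.shape)      # (4, 3)
```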

---

# Part 2: Derivatives and LaTeX

Jupyter notebooks and Markdown cells also support LaTeX, a powerful language for writing math equations. Inline equations are delimited by single dollar signs, as in $f(x) = x^2$.

We can also use LaTeX to write display equations using double dollar signs `$$`, which are centered and on a new line:

$$
f(x_1, x_2) = 4x_1^2 + 2x_2 + 1
$$

:::{tip}

Double click a markdown cell in any of the assignments to see the LaTeX code that generates it.

:::

Superscripts are generated using `^`: $$x^2$$

Subscripts are generated using `_`: $$x_1$$

The command to generate the curly d for partial derivatives is `\partial`: $$\partial$$

The command for fractions is `\frac{<numerator>}{<denominator>}`: $$\frac{1}{2}$$
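As a worked example combining these commands, the source `\frac{\partial}{\partial x_2} f(x_1, x_2) = 2` uses `\frac`, `\partial`, and subscripts to write the partial derivative of the $f$ above with respect to $x_2$. The $4x_1^2$ and $1$ terms do not depend on $x_2$, and the derivative of $2x_2$ is $2$:

$$
\frac{\partial}{\partial x_2} f(x_1, x_2) = 2
$$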

Checkpoint: replace the TODO in the LaTeX cell below with the partial derivative of $f$ with respect to $x_1$:

$$
\frac{\partial}{\partial x_1} f(x_1, x_2) = 8x_1
$$
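One way to sanity-check a hand-derived partial derivative is a central finite difference: nudge $x_1$ up and down by a small $h$ while holding $x_2$ fixed, and divide the change in $f$ by $2h$. The helper `partial_x1` and the evaluation point below are illustrative choices, not part of the activity:

```python
def f(x1, x2):
    """f(x1, x2) = 4*x1^2 + 2*x2 + 1, as defined above."""
    return 4 * x1**2 + 2 * x2 + 1


def partial_x1(func, x1, x2, h=1e-5):
    """Hypothetical helper: central finite-difference estimate of d(func)/d(x1)."""
    return (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)


x1, x2 = 3.0, -1.0
approx = partial_x1(f, x1, x2)
exact = 8 * x1          # the analytic answer, 8 * x1
print(approx, exact)    # approx should agree with 24.0 up to floating-point rounding
```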

---

# Part 3: scikit-learn

We'll implement the "mean" regression model from scratch as a subclass of scikit-learn's `BaseEstimator` class.

```python
import numpy as np
from sklearn.base import BaseEstimator

# In Python, we inherit from a class by specifying the parent class in parentheses
class MeanRegressor(BaseEstimator):
    """Simple model that predicts the mean of the training data."""

    # constructors in Python are defined using the `__init__` method
    # A quirk of Python OOP: the first argument of every method is `self`, which refers to the object itself
    def __init__(self):
        pass


    # the fit method trains the model on the given data, and always takes X and y as arguments
    def fit(self, X, y):
        """Fits the mean regressor to the training data.

        Args:
            X: the data examples of shape (n, p)
            y: the answers vector of shape (n,)

        Returns:
            self: the fitted model
        """

        # store the mean prediction
        # fitted model parameters are stored on `self` as instance variables and suffixed with `_`
        self.mean_ = np.mean(y)

        # by convention, sklearn fit() methods return self
        return self

    # the predict method makes predictions on new data, and always takes X as an argument
    def predict(self, X):
        """Predicts the values for new points X.

        This model will only predict the mean value of the fitted data for all new points.

        Args:
            X: the new points of shape (n_new, p)

        Returns:
            the predicted values of shape (n_new,)
        """

        predictions = []

        # loop over the rows of X, predicting the training mean for each one
        for x in X:
            predictions.append(self.mean_)

        # return one mean prediction per new point
        return np.array(predictions)
```
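The loop in `predict` makes the logic explicit, but NumPy can build the same output in a single call with `np.full`. The class `VectorizedMeanRegressor` below is a hypothetical variant for illustration, not part of the activity; it should produce the same predictions as `MeanRegressor`:

```python
import numpy as np
from sklearn.base import BaseEstimator


class VectorizedMeanRegressor(BaseEstimator):
    """Hypothetical variant of MeanRegressor whose predict() avoids the Python loop."""

    def fit(self, X, y):
        # store the training mean as a fitted attribute (trailing underscore)
        self.mean_ = np.mean(y)
        return self

    def predict(self, X):
        # np.full creates an array of len(X) copies of the stored mean
        return np.full(len(X), self.mean_)


X = np.array([[2500, 3, 1999], [4000, 5, 1950], [1000, 2, 1980], [900, 2, 2010]])
y = np.array([500000, 1000000, 250000, 300000])

model = VectorizedMeanRegressor().fit(X, y)
print(model.predict(X))   # four copies of 512500.0
```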


```python
# Create a new model
model = MeanRegressor()

# Fit the model to the data
model.fit(X, y)

# Predict the value of a new point
X_new = np.array([
    [2000, 1, 2015]
])

model.predict(X_new)
```

```
array([512500.])
```
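For comparison, scikit-learn ships a built-in baseline, `sklearn.dummy.DummyRegressor`, which with `strategy="mean"` predicts the mean of the training targets for every input. A sketch on the same housing data, which should give the same prediction as our `MeanRegressor`:

```python
import numpy as np
from sklearn.dummy import DummyRegressor

X = np.array([[2500, 3, 1999], [4000, 5, 1950], [1000, 2, 1980], [900, 2, 2010]])
y = np.array([500000, 1000000, 250000, 300000])

# built-in baseline that predicts the training mean, ignoring the features
baseline = DummyRegressor(strategy="mean")
baseline.fit(X, y)

X_new = np.array([[2000, 1, 2015]])
pred = baseline.predict(X_new)
print(pred)   # one prediction, equal to the training mean of 512500
```

Baselines like this are useful as a sanity check: a real model that cannot beat "always predict the mean" is not learning anything from the features.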