## HW 1: Python Basics and Visualization

This notebook has two parts:

1. **Python Fundamentals + NumPy Math** (core programming + numeric computing)
2. **Data Visualization with Matplotlib** (making clear plots)

You’ll see a mix of short explanations, examples you can run, and practice problems.

### How to use this notebook
- Run cells top-to-bottom.
- When you see a code cell containing **`...`**, that is where you should write your code.
- Don't be afraid to use ChatGPT or other AI tools to help you along the way. If there are any bits of code you don't understand, it can be good practice to copy and paste them into your favorite AI assistant and ask it to explain the code to you.




## Setup (run this first)

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Optional: make plots show up a bit larger
plt.rcParams["figure.figsize"] = (7, 4)

print("NumPy version:", np.__version__)

# Part 1: Python Fundamentals + NumPy

In data science you’ll constantly do three things:
1. **Store data** (variables, lists, dictionaries)
2. **Repeat work** (loops, functions)
3. **Compute with arrays** (NumPy)




## 1.1 Variables and basic operations

In [None]:
# Variables store values
x = 10
y = 3

# Basic arithmetic
print("x + y =", x + y)
print("x / y =", x / y)
print("x**2 =", x**2)

# Types matter (int vs float vs str)
print(type(x), type(x / y))
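Two more operators come up constantly when working with integers: floor division and the remainder. The sketch below reuses the values of `x` and `y` from the cell above (redefined so it runs on its own) and also shows explicit type conversion:

```python
# Same values as the cell above
x = 10
y = 3

print("x // y =", x // y)  # floor division: 3 (stays an int)
print("x % y =", x % y)    # remainder after division: 1

# Converting between types explicitly
print(float(x))            # 10.0
print(int(3.9))            # 3 (int() truncates toward zero, it does not round)
print(str(x) + " apples")  # 10 apples (string concatenation)
```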

## 1.2 Lists and indexing

In [None]:
names = ["Ada", "Grace", "Katherine", "Alan"]
print(names)
print("First name:", names[0])
print("Last name:", names[-1])

# Slicing: start:end (end not included)
print("Middle two:", names[1:3])
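Lists can also grow and change after they are created. A short sketch (the names `"Margaret"` and `"Alan T."` are just example values):

```python
names = ["Ada", "Grace", "Katherine", "Alan"]

names.append("Margaret")   # add an element to the end
names[3] = "Alan T."       # replace an element by index
print(len(names))          # 5
print(names[::2])          # every other element: ['Ada', 'Katherine', 'Margaret']
print("Grace" in names)    # True (membership test)
```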

### Problem 1
Create a list called `temps_c` with these temperatures in Celsius:

`[18.0, 19.5, 21.0, 20.0, 22.5]`

Then:
1. Print the 3rd value (index 2)
2. Print the last two values using slicing


In [None]:
# YOUR CODE HERE
...

## 1.3 Dictionaries (key → value)

In [None]:
student = {
    "name": "Jordan",
    "year": 1,
    "major": "Data Science",
    "units": 16
}

print(student["name"])
print("Keys:", list(student.keys()))
print("Values:", list(student.values()))
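Dictionaries can be updated in place, and `.get()` lets you look up a key that might be missing without raising a `KeyError`. A sketch building on the `student` example (the `"minor"` and `"gpa"` keys are illustrative additions):

```python
student = {
    "name": "Jordan",
    "year": 1,
    "major": "Data Science",
    "units": 16,
}

student["units"] = 18       # update an existing key
student["minor"] = "Math"   # add a new key

# .get returns the default value instead of raising KeyError for a missing key
print(student.get("gpa", "not recorded"))

# Loop over key/value pairs together
for key, value in student.items():
    print(key, "->", value)
```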

### Problem 2
Create a dictionary named `movie` with keys:
- `"title"` (a string)
- `"year"` (an integer)
- `"rating"` (a float from 0 to 10)

Then print a sentence like:
`The movie ____ came out in ____ and has rating ____.`


In [None]:
# YOUR CODE HERE
...

## 1.4 Conditionals: if / elif / else

In [None]:
score = 84

if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
elif score >= 70:
    grade = "C"
else:
    grade = "D or F"

print("Grade:", grade)
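Conditions can be combined with `and`, `or`, and `not`. Python checks the branches from top to bottom and runs only the first one whose condition is true. A small sketch with made-up variables:

```python
temp_c = 22
raining = False

if raining:
    plan = "museum"
elif temp_c > 20 and not raining:
    plan = "picnic"
else:
    plan = "walk"

print("Plan:", plan)  # Plan: picnic
```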

## 1.5 Loops

In [None]:
# Loop over a list
nums = [2, 4, 6, 8]
total = 0
for n in nums:
    total += n
print("Sum:", total)

# Loop over indices
for i in range(len(nums)):
    print(i, nums[i])
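Two other loop patterns are worth knowing: `enumerate`, which gives the index and value together (a cleaner alternative to `range(len(...))`), and the `while` loop, which repeats as long as a condition stays true. A sketch on the same `nums` list:

```python
nums = [2, 4, 6, 8]

# enumerate yields (index, value) pairs
for i, n in enumerate(nums):
    print(i, n)

# A while loop: keep adding values until the running sum reaches at least 10
count = 0
running = 0
while running < 10:
    running += nums[count]
    count += 1
print("Needed", count, "values to reach", running)  # Needed 3 values to reach 12
```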

### Problem 3
Write a loop that computes the **mean** of `nums` without using `np.mean`.

Hints:
- Start with `total = 0`
- Add each value to `total`
- Mean = total / number of values


In [None]:
nums = [2, 4, 6, 8]

# YOUR CODE HERE
...

## 1.6 Functions

In [None]:
def inches_to_cm(inches):
    """Convert inches to centimeters."""
    return 2.54 * inches

print(inches_to_cm(10))
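Functions can take optional arguments with default values, and they can return several values at once as a tuple. The `ndigits` parameter and the `min_max` helper below are illustrative names, not part of the original notebook:

```python
def inches_to_cm(inches, ndigits=None):
    """Convert inches to centimeters, optionally rounding to `ndigits` decimals."""
    cm = 2.54 * inches
    if ndigits is None:
        return cm
    return round(cm, ndigits)

def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)

print(inches_to_cm(3.3))        # unrounded result
print(inches_to_cm(3.3, 1))     # 8.4
low, high = min_max([4, 1, 7])  # tuple unpacking
print(low, high)                # 1 7
```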

### Problem 4
Write a function `fahrenheit_to_celsius(F)` that converts Fahrenheit to Celsius:

$$
C = (F - 32) \times \frac{5}{9}
$$

Test it on `F = 68` (should be 20°C).


In [None]:
# YOUR CODE HERE
def fahrenheit_to_celsius(F):
    ...

print(fahrenheit_to_celsius(68))

## 1.7 NumPy Arrays

NumPy arrays are efficient containers for numbers. They support **vectorized** math, meaning you can do math on the whole array at once.


In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print("a:", a)
print("a + b:", a + b)     # elementwise addition
print("a * 2:", a * 2)     # scalar multiplication
print("a**2:", a**2)       # elementwise power
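The difference between a Python list and a NumPy array shows up as soon as you do arithmetic: `*` on a list repeats it, while `*` on an array multiplies each element. The sketch below also compares an explicit loop with the vectorized version of the same computation:

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

print(lst * 2)   # [1, 2, 3, 1, 2, 3]  list "*" repeats the list
print(arr * 2)   # [2 4 6]             array "*" multiplies each element

# The same computation as an explicit loop vs. vectorized
squares_loop = [v**2 for v in lst]
squares_vec = arr**2
print(squares_loop, squares_vec)  # [1, 4, 9] [1 4 9]

# Arrays also come with built-in reductions
print(arr.sum(), arr.mean(), arr.max())  # 6 2.0 3
```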

## 1.8 Shapes and 2D arrays

In [None]:
M = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(M)
print("shape:", M.shape)   # (rows, cols)
print("First row:", M[0, :])
print("Second column:", M[:, 1])
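Many NumPy functions take an `axis` argument that controls which dimension gets collapsed. With `axis=0` you get one result per column, and with `axis=1` one result per row. A sketch on the same `M`, plus the transpose `M.T`:

```python
import numpy as np

M = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(M.sum(axis=0))  # [5 7 9]   collapse the rows: one sum per column
print(M.sum(axis=1))  # [ 6 15]   collapse the columns: one sum per row
print(M.T)            # transpose: rows become columns
print(M.T.shape)      # (3, 2)
print(M[1, 2])        # 6  (row 1, column 2)
```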

### Problem 5
Create a 1D NumPy array `x` containing integers from 0 to 19.

Then create a 2D array `X` with shape `(4, 5)` by reshaping `x`.

Print:
- `x`
- `X`
- the shape of `X`


In [None]:
# YOUR CODE HERE
...

## 1.9 Broadcasting (NumPy arrays of different shapes)

In [None]:
# Create a column vector (shape (5,1)) and a row vector (shape (1,4))
col = np.arange(5).reshape(-1, 1)
row = np.arange(4).reshape(1, -1)

grid = col + row  # broadcasting makes a 5x4 grid

print("col shape:", col.shape)
print("row shape:", row.shape)
print("grid shape:", grid.shape)
print(grid)
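NumPy compares shapes starting from the last dimension: two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1. That is why `(5, 1) + (1, 4)` gives `(5, 4)` above. A common practical use is subtracting each column's mean from a data table; the small `data` array here is made-up example input:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # shape (2, 3)

col_means = data.mean(axis=0)        # shape (3,): [2.5 3.5 4.5]
centered = data - col_means          # (3,) is treated as (1, 3) and stretched over both rows
print(centered)

# Shapes (2, 3) and (2,) are NOT compatible: the last dimensions are 3 and 2
try:
    data + np.array([1.0, 2.0])
except ValueError as err:
    print("Broadcasting error:", err)
```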

## 1.10 Boolean masks (filtering data)

In [None]:
rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=10)  # 10 random numbers

mask = data > 0
print("data:", data)
print("mask:", mask)
print("positive values:", data[mask])
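Masks can be combined elementwise with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `<=`. Since `True` counts as 1, a mask's `.sum()` counts the matches and its `.mean()` gives the fraction of matches. A sketch on a small made-up array:

```python
import numpy as np

values = np.array([3, -1, 7, 0, 5, -4])

in_range = (values >= 0) & (values <= 5)   # both conditions must hold
print(values[in_range])    # [3 0 5]
print(in_range.sum())      # 3    number of True entries
print(in_range.mean())     # 0.5  fraction of True entries

outside = ~in_range        # flip every True/False
print(values[outside])     # [-1  7 -4]
```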

### Problem 6
Generate 1000 random numbers from a normal distribution with mean 50 and standard deviation 10.

Then:
1. Compute the fraction of values that are **greater than 60**
2. Compute the mean of values **between 40 and 60** (inclusive)

Use boolean masks to filter the array.


In [None]:
# YOUR CODE HERE
...

## 1.11 A mini data-science task (z-scores)

A common normalization is the **z-score**:

$$
z = \frac{x - \mu}{\sigma}
$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

We’ll write a function that converts an array to z-scores.


In [None]:
# Example
vals = np.array([10, 12, 13, 9, 11], dtype=float)
z_example = (vals - vals.mean()) / vals.std()
print("vals:", vals)
print("z-scores:", z_example)
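Working through the example by hand shows where these numbers come from. Note that `vals.std()` uses NumPy's default `ddof=0`, so it divides by $n = 5$ rather than $n - 1$:

$$
\mu = \frac{10 + 12 + 13 + 9 + 11}{5} = \frac{55}{5} = 11
$$

$$
\sigma = \sqrt{\frac{(-1)^2 + 1^2 + 2^2 + (-2)^2 + 0^2}{5}} = \sqrt{\frac{10}{5}} = \sqrt{2} \approx 1.414
$$

$$
z = \frac{x - 11}{\sqrt{2}} \approx [-0.707,\ 0.707,\ 1.414,\ -1.414,\ 0]
$$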

### Problem 7
Complete the function `zscore(x)`.

Requirements:
- Input: a 1D NumPy array of floats
- Output: a NumPy array of z-scores (same shape)

Test your function on the `vals` example above and check that it matches the example output.


In [None]:
# YOUR CODE HERE
def zscore(x):
    ...

vals = np.array([10, 12, 13, 9, 11], dtype=float)
print(zscore(vals))

# Part 2: Data Visualization with Matplotlib

Plots are how we visualize data and extract meaningful patterns from it. Choosing the right plot type for your data takes experience. Your job is to make plots that are:
- correct
- readable
- labeled

In this part you’ll build up from simple plots to subplots and styling.


## 2.1 Your first line plot

In [None]:
x = np.linspace(0, 2*np.pi, 200)
y = np.sin(x)

plt.plot(x, y)
plt.title("Sine wave")
plt.xlabel("x (radians)")
plt.ylabel("sin(x)")
plt.show()
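Matplotlib also has an object-oriented interface in which you call methods on an `Axes` object instead of `plt` functions; section 2.4 uses this style. The sketch below redraws the same sine wave that way and adds a few styling options (`color`, `linestyle`, `linewidth`, `set_xlim`); the specific style values are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color="tab:blue", linestyle="--", linewidth=2)
ax.set_title("Sine wave (object-oriented API)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.set_xlim(0, 2*np.pi)  # remove the default padding at both ends of the x-axis
plt.show()
```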

### Problem 8
Plot **both** `sin(x)` and `cos(x)` on the same axes.

Requirements:
- Use `plt.plot(...)` twice (once for sine, once for cosine)
- Add a legend with labels `"sin"` and `"cos"`
- Add axis labels and a title


In [None]:
x = np.linspace(0, 2*np.pi, 200)

# YOUR CODE HERE
...

## 2.2 Scatter plots

In [None]:
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2*x + rng.normal(scale=0.5, size=n)

plt.scatter(x, y)
plt.title("Scatter plot example")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
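Because the data above were generated as `2*x` plus noise, overlaying that true line helps a reader judge the scatter. The sketch below also shrinks the markers (`s`) and makes them semi-transparent (`alpha`) so overlapping points stay visible; the specific values are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same simulated data as the cell above (same seed, so same points)
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2*x + rng.normal(scale=0.5, size=n)

fig, ax = plt.subplots()
ax.scatter(x, y, s=15, alpha=0.5, label="data")

# Two endpoints are enough to draw a straight line
xs = np.array([x.min(), x.max()])
ax.plot(xs, 2*xs, color="black", label="true trend: y = 2x")

ax.set_title("Scatter plot with the true trend")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()
```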

### Problem 9
Make a scatter plot where the points are colored by their distance from the origin:


$$ r = \sqrt{x^2 + y^2} $$

Requirements:
- Compute `r`
- Use `plt.scatter(x, y, c=r)` to color by `r`
- Add a colorbar with `plt.colorbar()`


In [None]:
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
y = rng.normal(size=n)

# YOUR CODE HERE
...

## 2.3 Histograms (distributions)

In [None]:
rng = np.random.default_rng(3)
samples = rng.normal(loc=0, scale=1, size=5000)

plt.hist(samples, bins=40)
plt.title("Histogram of samples")
plt.xlabel("value")
plt.ylabel("count")
plt.show()
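Passing `density=True` rescales the bar heights so the total area of the histogram is 1, which lets you compare it directly with a probability density. The sketch below overlays the standard normal density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, which is the distribution the samples were drawn from:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
samples = rng.normal(loc=0, scale=1, size=5000)

fig, ax = plt.subplots()
# ax.hist returns the bar heights, the bin edges, and the drawn bar objects
heights, edges, _ = ax.hist(samples, bins=40, density=True, label="samples")

# Standard normal density evaluated on a fine grid spanning the bins
grid = np.linspace(edges[0], edges[-1], 200)
pdf = np.exp(-grid**2 / 2) / np.sqrt(2*np.pi)
ax.plot(grid, pdf, color="black", label="Normal(0, 1) density")

ax.set_title("Histogram normalized to a density")
ax.set_xlabel("value")
ax.set_ylabel("density")
ax.legend()
plt.show()
```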

### Problem 10
Make **two histograms** in the same plot for two distributions:
- `A ~ Normal(0, 1)` (mean 0, std 1)
- `B ~ Normal(1, 2)` (mean 1, std 2)

Requirements:
- Use the same bins for both
- Use transparency with `alpha=0.6` so both can be seen
- Add a legend


In [None]:
# YOUR CODE HERE
...

## 2.4 Subplots (multiple plots in one figure)

In [None]:
x = np.linspace(0, 2*np.pi, 300)
fig, ax = plt.subplots(2, 2)

ax[0, 0].plot(x, np.sin(x))
ax[0, 0].set_title("sin")

ax[0, 1].plot(x, np.cos(x))
ax[0, 1].set_title("cos")

ax[1, 0].plot(x, np.sin(2*x))
ax[1, 0].set_title("sin(2x)")

ax[1, 1].plot(x, np.cos(2*x))
ax[1, 1].set_title("cos(2x)")

fig.suptitle("A 2x2 grid of functions")
fig.tight_layout()
plt.show()
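When all panels follow the same pattern, you can loop over the axes instead of writing each one by hand. `ax.flat` iterates over a 2D grid of axes in row order, and `sharex=True, sharey=True` give every panel the same axis limits so they are easier to compare. A sketch that rebuilds the grid above this way:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 300)

# Panel title -> y-values, in the order the panels should appear
curves = {
    "sin": np.sin(x),
    "cos": np.cos(x),
    "sin(2x)": np.sin(2*x),
    "cos(2x)": np.cos(2*x),
}

fig, ax = plt.subplots(2, 2, sharex=True, sharey=True)
for panel, (title, values) in zip(ax.flat, curves.items()):
    panel.plot(x, values)
    panel.set_title(title)

fig.suptitle("A 2x2 grid of functions (built with a loop)")
fig.tight_layout()
plt.show()
```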

### Problem 11
Create a 1x3 row of subplots showing:
1. A line plot of `sin(x)`
2. A scatter plot of `(x, sin(x))` using every 10th point
3. A histogram of `sin(x)` values

Requirements:
- Use `fig, ax = plt.subplots(1, 3, figsize=(12, 3.5))`
- Label each subplot with a title
- Use `fig.tight_layout()` at the end


In [None]:
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x)

# YOUR CODE HERE
...

## 2.5 Plot annotations and saving figures

In [None]:
x = np.linspace(0, 10, 200)
y = np.exp(-0.2*x) * np.cos(2*x)

plt.plot(x, y)
plt.title("Damped oscillation")

# Annotate a point
idx = np.argmax(y)
plt.scatter([x[idx]], [y[idx]])
plt.annotate("max", xy=(x[idx], y[idx]), xytext=(x[idx]+1, y[idx]+0.2),
             arrowprops=dict(arrowstyle="->"))

plt.xlabel("x")
plt.ylabel("y")
plt.show()
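The cell above annotates the plot but does not save it. A minimal sketch of saving, assuming you want a PNG named `damped_oscillation.png` in the current working directory (the filename, `dpi`, and `bbox_inches` values are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y = np.exp(-0.2*x) * np.cos(2*x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Damped oscillation")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Save before calling plt.show(): afterward, pyplot's current figure may already
# be closed, so a later plt.savefig() could write an empty image
fig.savefig("damped_oscillation.png", dpi=150, bbox_inches="tight")
plt.show()
```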

### Problem 12 (mini-project)
Simulate a simple measurement with noise and plot it with error bars.

Steps:
1. Create `x` from 0 to 10 with 30 points
2. Define a true model: `y_true = 3*x + 5`
3. Add noise: `y_obs = y_true + Normal(0, 4)` (noise with mean 0, std 4)
4. Assume a constant measurement uncertainty `sigma = 4` for each point
5. Plot:
   - The true line (`y_true`) as a line plot
   - The observations (`y_obs`) with error bars using `plt.errorbar(..., yerr=sigma)`

Add a title, axis labels, and a legend.


In [None]:
# YOUR CODE HERE
...

## Submission Reminder
Don't forget to export the notebook as a PDF and upload it to CatCourses!