# Session 2 — Working with Jupyter, Python, and ML Thinking

**Focus of this notebook:** how day-to-day ML work is done in a Jupyter notebook through interactive exploration. We do not train a model yet.

**What this session is _not_:**
- Not “production Python”
- Not the full end-to-end ML workflow
- Not a deep dive into libraries

**What this session _is_:**
- Understanding how code cells work
- Understanding notebook state (powerful *and* dangerous)
- Practicing the core habit: **inspect → clean → sanity-check**


## 1) What is a Jupyter Notebook?

A Jupyter notebook is a document made of **cells**:
- **Markdown cells**: explanations, notes, instructions
- **Code cells**: executable Python

Key idea (many beginners miss this):

> **Execution order is the order you run cells, not the order they appear on the page.**

A notebook runs on a **kernel**: a live Python process that keeps variables in memory until you restart it.
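
To make the execution-order idea concrete, here is a minimal sketch that imitates a kernel in plain Python. It is an illustration, not how Jupyter is implemented: the `cells` dictionary, the `run_cells` helper, and the cell names `A`, `B`, `C` are all hypothetical, and an ordinary dictionary stands in for the kernel's memory.

```python
# Three pretend "cells", stored as source text (hypothetical example).
cells = {
    "A": "total = 0",
    "B": "total = total + 5",
    "C": "result = total * 2",
}

def run_cells(order):
    """Run the named cells in the given order inside one shared namespace."""
    namespace = {}  # stand-in for the kernel's memory
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]

# Running each cell once, top to bottom.
top_to_bottom = run_cells(["A", "B", "C"])

# Same cells on the page, but cell B was run twice before C.
b_run_twice = run_cells(["A", "B", "B", "C"])

print(top_to_bottom, b_run_twice)
```

The page looks identical in both cases, yet `result` differs, because only the order in which cells were run determines what is in memory.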


## 2) The “Hidden State” Problem (and Feature)

A notebook remembers variables you created earlier. This is convenient for exploration, but it can also produce confusing bugs.

We’ll intentionally demonstrate this.


In [None]:
# Run this cell
x = 10

In [None]:
# Run this cell after the one above
print(x)

### Now break it on purpose

1. Click **Kernel → Restart Kernel** (or **Restart** in the UI)
2. Run only the code cell below (do not re-run the `x = 10` cell first)

You should get `NameError: name 'x' is not defined`. That's the point: after a restart, the kernel's memory is empty.

**Teaching point:** ML results can be wrong simply because your notebook state is not what you think it is.


In [None]:
# After restarting the kernel, run ONLY this cell
print(x)
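
The same failure can be reproduced without restarting anything. The sketch below is an illustration under one assumption: an empty dictionary (called `fresh_kernel` here, a hypothetical name) stands in for a freshly restarted kernel, and `exec` runs the cell's code inside it.

```python
# An empty namespace plays the role of a kernel right after a restart.
fresh_kernel = {}

try:
    # Same code as the cell above, run where x was never defined.
    exec("print(x)", fresh_kernel)
except NameError as err:
    error_message = str(err)

print(error_message)
```

A practical habit that follows from this: before trusting a result, use **Restart** and then run every cell from top to bottom.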

## 3) Python’s role in ML (practical view)

In real ML work, Python is mostly used for:
- loading data
- cleaning and transforming it
- running quick sanity checks
- iterating fast

It is much less about fancy syntax and much more about **small, correct transformations**.
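
As a rough illustration of a "small, correct transformation", the sketch below converts raw text fields to numbers and checks the result immediately. The function name `parse_reading` and the `raw_rows` values are hypothetical and not part of this session's dataset.

```python
def parse_reading(raw):
    """Convert one raw text field to a float, or None if it is blank."""
    text = raw.strip()
    if text == "":
        return None
    return float(text)

# Hypothetical raw input, e.g. one column read from a text file.
raw_rows = ["10", " 12 ", "", "15"]

parsed = [parse_reading(r) for r in raw_rows]

# Check the transformation right away instead of assuming it worked.
assert len(parsed) == len(raw_rows)
print(parsed)
```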


## 4) A tiny “dataset” we can reason about

We'll use a deliberately small dataset so you can see what's happening.

Real datasets are larger, but the same thinking applies.


In [None]:
data = [10, 12, None, 15, 1000, 14, None, 13]
data

## 5) First ML habit: inspect before acting

Before we “do something”:
- How many values do we have?
- Are there missing values?
- Are there suspicious values?

(Do not rush to “model training”.)


In [None]:
len(data)

In [None]:
# A quick look at the raw data
data
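
The three questions above can also be answered with a few lines of code rather than by eye. This is a minimal sketch using only built-ins; the variable names (`n_missing`, `present`, and so on) are chosen for illustration.

```python
data = [10, 12, None, 15, 1000, 14, None, 13]

# How many values do we have?
n_total = len(data)

# Are there missing values?
n_missing = sum(1 for v in data if v is None)

# Are there suspicious values? The range of the non-missing values is a first hint.
present = [v for v in data if v is not None]
smallest, largest = min(present), max(present)

print(n_total, n_missing, smallest, largest)
```

A maximum that is roughly 100 times the minimum does not prove anything is wrong, but it is worth a closer look before going further.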

## 6) Cleaning data (boring on purpose)

Most ML work is *this*.

We'll remove missing values (`None`). Other approaches exist, such as imputation (filling each gap with an estimated value), but dropping is enough to demonstrate the workflow.


In [None]:
cleaned = [x for x in data if x is not None]
cleaned

In [None]:
len(data), len(cleaned)
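
For comparison, here is a minimal sketch of the imputation alternative. It assumes median imputation, which is only one of several possible choices and is not the method used in the rest of this notebook. It relies on `statistics.median` from the standard library.

```python
from statistics import median

data = [10, 12, None, 15, 1000, 14, None, 13]

# Compute the fill value from the values that are present.
present = [v for v in data if v is not None]
fill_value = median(present)

# Replace each missing entry instead of dropping it.
imputed = [fill_value if v is None else v for v in data]

print(fill_value)
print(len(data), len(imputed))
```

Dropping shortens the list and imputing keeps its length. Either way you are making an assumption about the missing values, so it is worth writing down which one you chose.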

## 7) Sanity-checking with a simple plot

We are not “making charts”. We are **asking a question**:

> “Does anything look obviously wrong?”


In [None]:
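import matplotlib.pyplot as plt

# Plot each value against its position in the list; markers keep single points visible
plt.plot(cleaned, marker="o")
plt.xlabel("index")
plt.ylabel("value")
plt.show()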
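
A plot asks the question visually. The same question can be asked numerically, which helps when there are too many points to eyeball. The sketch below flags values that sit far from the median, measured in units of the median absolute deviation (MAD). The cutoff of 5 is an arbitrary assumption for illustration, not a standard rule.

```python
from statistics import median

cleaned = [10, 12, 15, 1000, 14, 13]

center = median(cleaned)

# Median absolute deviation: the typical distance of a value from the median.
mad = median(abs(v - center) for v in cleaned)

# Assumed cutoff: flag anything more than 5 MADs away from the median.
cutoff = 5 * mad
suspicious = [v for v in cleaned if abs(v - center) > cutoff]

print(center, mad, suspicious)
```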

### Interpretation questions (write your answers)

- Does the scale look reasonable?
- Is there an outlier?
- What would a model do with a value like `1000`?
- If you remove it, what assumption are you making?
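
One way to start answering the last two questions is to compare simple summaries with and without the suspicious value. This is a minimal sketch using the standard-library `statistics` module. Note that it removes `1000` purely for comparison, and whether that value is an error or a real observation is exactly the assumption you would need to justify.

```python
from statistics import mean, median

cleaned = [10, 12, 15, 1000, 14, 13]
without_outlier = [v for v in cleaned if v != 1000]

mean_with = mean(cleaned)
mean_without = mean(without_outlier)
median_with = median(cleaned)
median_without = median(without_outlier)

print(f"mean:   {mean_with:.1f} vs {mean_without:.1f}")
print(f"median: {median_with} vs {median_without}")
```

The mean moves a lot while the median barely changes. Anything that depends on averages, which includes many models, will be pulled strongly by a single value like this.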


## 8) Notebooks vs. “real world” code

Notebooks are excellent for:
- exploration
- learning
- prototyping
- debugging assumptions

Notebooks are risky for:
- production pipelines
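- reproducibility, unless you are disciplined about restarting the kernel and running cells top to bottom
- collaboration (`.ipynb` files are JSON with embedded outputs, so diffs can be messy)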

This is why ML workflows often start in notebooks and later move to scripts/pipelines.
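
As a rough sketch of what "moving to a script" can look like, the cleaning step from this notebook is wrapped in a function below and run from a single entry point. The function names `drop_missing` and `main` are hypothetical choices for illustration.

```python
def drop_missing(values):
    """Return a new list without None entries."""
    return [v for v in values if v is not None]

def main():
    # Everything the script needs is defined here, so there is no hidden state.
    data = [10, 12, None, 15, 1000, 14, None, 13]
    cleaned = drop_missing(data)
    print(f"kept {len(cleaned)} of {len(data)} values")
    return cleaned

if __name__ == "__main__":
    main()
```

A script always starts from an empty memory and runs top to bottom, so the `NameError` surprise from earlier cannot occur silently.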


## 9) How today maps to the ML process

| What we did | ML meaning |
|---|---|
| Created a small dataset | Data acquisition (toy example) |
| Inspected length and values | Early data understanding |
| Cleaned missing values | Data preparation |
| Plotted the cleaned data | Exploratory sanity check |
| Experienced `NameError` after restart | Reproducibility / hidden-state risk |

You are building the habit: **inspect → clean → sanity-check**.


## 10) What’s next

Next session we move from exploration to a **repeatable workflow**:
- train/test split
- evaluation metrics (and why accuracy can mislead)
- avoiding “accidental cheating” (data leakage)
- first end-to-end model training (carefully)

> Next time, we stop trusting ourselves and start validating.
