In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

### 🧭 CONTEXT TRACKER
* **TOPIC:** Reproducible Random Pandas Series Generation
* **COMPLEXITY:** Low
* **KEY TAKEAWAY:** Generates a fixed set of random integers mapped to a custom string index using modern NumPy practices.
---

### Layer 1: The Executive Summary (TL;DR)
This code generates a Pandas Series representing 10 random "test scores" (between 70 and 100) assigned to a specific sequence of school months (September to June). It uses a fixed "seed" so the random numbers are identical every time you run the cell.

### Layer 2: The Mental Model (Analogy)
Imagine a teacher who needs to create a dummy gradebook for a presentation.
1.  **`default_rng(0)`**: She buys a specific "magic dice set" (Seed 0) that always rolls the exact same sequence of numbers when thrown.
2.  **`months`**: She writes the names of the months on the chalkboard.
3.  **`g.integers(...)`**: She rolls her magic dice 10 times to get scores between 70 and 100.
4.  **`Series(...)`**: She glues each score next to a month on the board, creating a single lookup table.

### Layer 3: The Autopsy (Line-by-Line)
*   `g = np.random.default_rng(0)`:
    *   **`default_rng`**: The recommended way to construct a NumPy `Generator` object for random numbers (preferred over seeding the legacy global state with `np.random.seed`).
    *   **`0`**: The seed. This ensures your "random" numbers are predictable and reproducible for others sharing your notebook.
*   `months = 'Sep ... Jun'.split()`:
    *   **`.split()`**: A Python trick. Called with no arguments, it breaks a single long string at every run of whitespace and returns a list: `['Sep', 'Oct', ...]`. It saves you from typing quotes and commas 10 times.
*   `s = Series(...)`:
    *   **`g.integers(70, 101, 10)`**: Generates **10** integers.
        *   **`70`**: The minimum value (inclusive).
        *   **`101`**: The maximum value (exclusive). This is why it stops at 100.
    *   **`index=months`**: Instead of default numbers (0, 1, 2...), we use our custom month names as the row labels.
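
The two building blocks above can be checked on their own. The following is a minimal sketch, assuming only NumPy is installed; the variable name `scores` is introduced here for illustration and does not appear in the notebook.

```python
import numpy as np

# str.split() with no arguments splits on any run of whitespace.
months = 'Sep Oct Nov Dec Jan Feb Mar Apr May Jun'.split()
print(len(months))    # 10
print(months[:3])     # ['Sep', 'Oct', 'Nov']

# integers(low, high, size): low is inclusive, high is exclusive.
g = np.random.default_rng(0)
scores = g.integers(70, 101, 10)   # hypothetical name, for illustration
print(scores.min() >= 70, scores.max() <= 100)   # True True
```

Whatever the seed, every drawn value stays within 70 to 100 because the upper bound `101` is never produced.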

### Layer 4: The Architect's View (Context)
*   **Modern vs. Legacy**: The author uses `default_rng()` instead of the global `np.random.seed()`. This is the "new" (NumPy 1.17+) standard. It is safer because it creates an isolated random generator instance (`g`) rather than affecting the global state of the entire notebook.
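*   **The "High" Trap**: A common bug when drawing random integers is an off-by-one error on the upper bound. Because `high` is exclusive, `101` is needed for the value `100` to appear. With `100`, the maximum score would be `99`. (`integers` also accepts `endpoint=True`, which makes the upper bound inclusive.)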
*   **Data Alignment**: The code implicitly relies on the fact that the number of months (10) exactly matches the number of integers generated (10). If these counts mismatched, Pandas would throw a `ValueError`.
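
The three points above can be sketched in a few lines. This is an illustrative check rather than part of the original notebook; the names `a`, `b`, `c`, and `mismatch_error` are assumptions made for the example.

```python
import numpy as np
from pandas import Series

months = 'Sep Oct Nov Dec Jan Feb Mar Apr May Jun'.split()

# Two independent generators built from the same seed yield the same sequence.
a = np.random.default_rng(0).integers(70, 101, 10)
b = np.random.default_rng(0).integers(70, 101, 10)
print((a == b).all())   # True

# endpoint=True makes the upper bound inclusive, so 100 can be written directly.
c = np.random.default_rng(0).integers(70, 100, 10, endpoint=True)
print(c.max() <= 100)   # True

# 9 values against 10 index labels: pandas refuses to build the Series.
mismatch_error = None
try:
    Series(a[:9], index=months)
except ValueError as exc:
    mismatch_error = exc
    print(f'ValueError: {exc}')
```

Because each generator is its own object, seeding one does not disturb any other random code running in the same notebook.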

In [2]:

# Seeded generator, so the "random" scores are reproducible
g = np.random.default_rng(0)

# split() breaks the string on whitespace into a list: ['Sep', 'Oct', ...]
months = 'Sep Oct Nov Dec Jan Feb Mar Apr May Jun'.split()

# 10 integers from 70 (inclusive) to 101 (exclusive), labeled by month
s = Series(g.integers(70, 101, 10),
           index=months)
s

Sep    96
Oct    89
Nov    85
Dec    78
Jan    79
Feb    71
Mar    72
Apr    70
May    75
Jun    95
dtype: int64

### 🧭 CONTEXT TRACKER
* **TOPIC:** Basic Series Aggregation (Mean)
* **COMPLEXITY:** Low
* **KEY TAKEAWAY:** Calculates the arithmetic average of a Series and embeds it directly into a printable string.
---

### Layer 1: The Executive Summary (TL;DR)
This line calculates the average (mean) of all the values in the Series `s` and prints it out in a readable sentence. It combines a mathematical operation with text formatting in a single step.

### Layer 2: The Mental Model (Analogy)
Think of a calculator tape.
1.  **`s.mean()`**: You hit the "Average" button on the calculator, which instantly sums all the numbers and divides by the count.
2.  **`f'...'`**: You take a sticky note and write "Entire year average: " on it.
3.  **`{...}`**: You write the number from the calculator screen onto the sticky note right where the brackets are.

### Layer 3: The Autopsy (Line-by-Line)
*   **`s.mean()`**:
    *   This is a built-in Pandas method. It runs in compiled code and is much faster than writing a Python loop to sum and divide.
    *   By default (`skipna=True`) it ignores missing data (`NaN`). In plain Python arithmetic such as `sum(values) / len(values)`, a single `NaN` would propagate into the result.
*   **`f'...'` (f-string)**:
    *   The `f` before the quote tells Python: "Evaluate the expression inside each pair of curly braces `{}`."
    *   It evaluates `s.mean()` first, converts the resulting number to a string, and inserts it into the text.
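
To see the `NaN` handling and the f-string substitution in isolation, here is a small sketch. The Series `t` and its three values are hypothetical, made up for illustration only.

```python
import numpy as np
from pandas import Series

# Hypothetical scores with one missing value (not data from this notebook).
t = Series([80, np.nan, 90], index=['Sep', 'Oct', 'Nov'])

print(t.mean())                     # 85.0 (the NaN is skipped by default)
print(t.mean(skipna=False))         # nan  (the NaN propagates)

# A format spec after the colon controls how the number is rendered.
print(f'Average: {t.mean():.1f}')   # Average: 85.0
```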

### Layer 4: The Architect's View (Context)
*   **Type Conversion**: Even though the original Series `s` contained integers (70-100), the `.mean()` operation almost always returns a **float** (decimal number).
*   **Vectorization**: This operation is "vectorized." Instead of looping over items one by one in Python, Pandas hands the whole underlying array to compiled NumPy code, which processes it in a single pass. This is the main source of Pandas/NumPy speed.
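
The type conversion is easy to confirm. The sketch below rebuilds the Series from the values shown in the output of cell 2 rather than from the random generator, and adds a hypothetical `manual` variable for the plain-Python equivalent.

```python
from pandas import Series

months = 'Sep Oct Nov Dec Jan Feb Mar Apr May Jun'.split()
scores = [96, 89, 85, 78, 79, 71, 72, 70, 75, 95]   # copied from the output of cell 2
s = Series(scores, index=months)

print(s.dtype.kind)                   # i    (integer data)
print(isinstance(s.mean(), float))    # True (the mean is a float)

# Plain-Python equivalent of what mean() computes for this data.
manual = sum(scores) / len(scores)
print(manual, s.mean())               # 81.0 81.0
```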

In [3]:
print(f'Entire year average: {s.mean()}')

Entire year average: 81.0


### 🧭 CONTEXT TRACKER
* **TOPIC:** Label-Based Slicing & Comparative Analysis
* **COMPLEXITY:** Low
* **KEY TAKEAWAY:** Uses label-based slicing (inclusive) to subset data and calculate performance metrics.
---

### Layer 1: The Executive Summary (TL;DR)
This block splits the school year into two semesters ("Sep-Jan" and "Feb-Jun") by slicing the data based on the month names. It calculates the average score for each semester separately and then subtracts the first from the second to quantify the student's "improvement."

### Layer 2: The Mental Model (Analogy)
Think of taking a physical report card and cutting it in half with scissors.
1.  **`s['Sep':'Jan']`**: You cut the list right after "Jan" and take the top strip.
2.  **`.mean()`**: You calculate the GPA for just that top strip.
3.  **`second - first`**: You compare the GPA of the bottom strip to the top strip to see if the trend is going up or down.

### Layer 3: The Autopsy (Line-by-Line)
*   `first_half_average = s['Sep':'Jan'].mean()`:
    *   **`['Sep':'Jan']`**: This is **Label Slicing**. Unlike standard Python lists (where the end is excluded), Pandas label slicing is **inclusive**. It includes 'Jan'.
    *   **`.mean()`**: Collapses those 5 months into a single number.
*   `second_half_average = s['Feb':'Jun'].mean()`:
    *   Same logic. It grabs the range starting at 'Feb' and ending at (and including) 'Jun'.
*   `print(f'Improvement: {second_half_average - first_half_average}')`:
    *   Calculates the delta. A positive number means the student got better; a negative number means they slacked off.
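
One way to convince yourself that the label slice includes its endpoint is to compare it with the equivalent positional slice. This sketch rebuilds `s` from the values shown in the output of cell 2; the names `first` and `same` are introduced for illustration.

```python
from pandas import Series

months = 'Sep Oct Nov Dec Jan Feb Mar Apr May Jun'.split()
s = Series([96, 89, 85, 78, 79, 71, 72, 70, 75, 95], index=months)

first = s['Sep':'Jan']       # label slice: 'Jan' is included
print(list(first.index))     # ['Sep', 'Oct', 'Nov', 'Dec', 'Jan']

same = s.iloc[0:5]           # positional slice: position 5 is excluded
print(first.equals(same))    # True
```

When the intent is "the first five rows" rather than "everything up to January", `.iloc` states that more directly.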

### Layer 4: The Architect's View (Context)
*   **The "Inclusive" Gotcha**: This is a common stumbling block for Pandas beginners. In standard Python `list[0:5]`, index 5 is *excluded*. In Pandas `series['a':'f']`, label 'f' is *included*. If you treated this like a Python list, you might be surprised that 'Jan' is part of the first average.
*   **Order Matters**: This slicing relies entirely on the **positional order** of the index. If the index were sorted alphabetically (Apr, Dec, Feb...), 'Sep' would come last and 'Jan' fourth, so slicing from 'Sep' to 'Jan' would return an empty Series instead of the first semester.
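
The effect of index order can be demonstrated by sorting the index before slicing. This is a sketch built on the values shown in the output of cell 2; `s_sorted` is a name introduced for the example.

```python
from pandas import Series

months = 'Sep Oct Nov Dec Jan Feb Mar Apr May Jun'.split()
s = Series([96, 89, 85, 78, 79, 71, 72, 70, 75, 95], index=months)

# Alphabetical order puts 'Sep' last and 'Jan' fourth.
s_sorted = s.sort_index()
print(list(s_sorted.index))
# ['Apr', 'Dec', 'Feb', 'Jan', 'Jun', 'Mar', 'May', 'Nov', 'Oct', 'Sep']

# The start label now sits after the end label, so nothing is selected.
print(len(s_sorted['Sep':'Jan']))   # 0

# The original, calendar-ordered index still yields five months.
print(len(s['Sep':'Jan']))          # 5
```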

In [None]:
first_half_average = s['Sep':'Jan'].mean()
second_half_average = s['Feb':'Jun'].mean()

print(f'Yearly average: {s.mean()}')
print(f'First half average: {first_half_average}')
print(f'Second half average: {second_half_average}')
print(f'Improvement: {second_half_average - first_half_average}')

Yearly average: 81.0
First half average: 85.4
Second half average: 76.6
Improvement: -8.800000000000011
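
The trailing digits in the improvement figure are ordinary binary floating-point rounding, not a Pandas bug. If a tidier display is wanted, a format spec in the f-string is one option. The sketch below hard-codes the two averages printed above instead of recomputing them, and `diff` is a name introduced for illustration.

```python
first_half_average = 85.4    # value printed above
second_half_average = 76.6   # value printed above

diff = second_half_average - first_half_average

# :.1f rounds the displayed value to one decimal place.
print(f'Improvement: {diff:.1f}')   # Improvement: -8.8
```

This changes only how the number is displayed; the stored value keeps its full precision.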
