# Introduction to Testing in Python

In professional software development, testing is a fundamental practice. Especially when using Test-Driven Development (TDD). In TDD, developers often start by writing tests that define how the code should behave before the actual implementation begins. This approach not only helps ensure the code functions correctly but also promotes thoughtful design and helps catch edge cases early.


As a new developer, it's common to be handed a set of tests and be asked to write code that passes them. This reverses the typical approach of "write code, then test it" and reinforces writing reliable, maintainable code from the start.

In this section, we’ll explore three key approaches to testing in Python:

- Testing with pandas: Since many data workflows use pandas, we’ll cover how to test dataframes and common patterns for validating data outputs.

- unittest: Python’s built-in testing framework, widely used for creating structured, class-based test cases.

- pytest: A powerful and more flexible testing tool that supports simpler syntax, fixtures, and detailed failure reporting.


By understanding these tools and techniques, you'll be better prepared to work in real-world codebases, where automated testing is essential for ensuring software quality and reliability.

---


### **Table of Contents**
* [Tests using Pandas Tests](#tests-using-pandas-tests)
  * [What This Code Does Overall](#what-this-code-does-overall)
  * [Function-by-Function](#function-by-function)
* [Summary](#summary)
* [Testing Time Using Pandas Testing](#testing-time-using-pandas-testing)
  * [Defining the test data generator function](#defining-the-test-data-generator-function)
  * [Demonstrating `prepare_dataframe_for_melt`](#demonstrating-prepare_dataframe_for_melt)
  * [Demonstrating `unpivot_classes_to_rows`](#demonstrating-unpivot_classes_to_rows)
  * [Demonstrating `_explode_grades_list`](#demonstrating-_explode_grades_list)
  * [Demonstrating `add_assignment_numbers`](#demonstrating-add_assignment_numbers)
* [A look at different tests](#a-look-at-different-tests)
* [Unit testing](#unit-testing)
  * [What Are Unit Tests?](#what-are-unit-tests)
  * [This first test will fail](#this-first-test-will-fail)
  * [Why the Tests Failed (Short + Clear)](#why-the-tests-failed-short--clear)
  * [Why This Matters in Development](#why-this-matters-in-development)
  * [We have 3 things to fix and now have to edit the function to no longer break.](#we-have-3-things-to-fix-and-now-have-to-edit-the-function-to-no-longer-break)
  * [**Fix 1:** dtype mismatch in test_basic_explosion](#fix-1-dtype-mismatch-in-test_basic_explosion)
  * [Fix 2: Error message mismatch in test_missing_grades_column_raises_error](#fix-2-error-message-mismatch-in-test_missing_grades_column_raises_error)
  * [Fix 3: test_with_empty_list unexpected row](#fix-3-test_with_empty_list-unexpected-row)
* [corrected function example](#corrected-function-example)
* [working Unit test](#working-unit-test)
* [pytest](#pytest)
  * [How pytest Differs from Other Testing Approaches](#how-pytest-differs-from-other-testing-approaches)
  * [Why pytest Needs a Separate File](#why-pytest-needs-a-separate-file)
  * [How We Got pytest Working in a Notebook](#how-we-got-pytest-working-in-a-notebook)

# Tests using Pandas Tests

We're going to dive into unit testing. Unit tests are small, focused tests that verify individual parts of your code (often functions or methods) to make sure they behave as expected.

[Documentation For Functions](https://pandas.pydata.org/docs/reference/testing.html)

In [20]:
# ruff: noqa
import random
import pandas as pd
import pandas.testing as pd_testing


#### Making our Data
In this example, we're setting up some fake test data. We're using the `random` module to generate different grades each time the function runs. This simulates a dynamic data environment. Similar to what you'd see in production, rather than relying on a fixed dataset. Testing with changing data helps ensure our code is flexible, not just tuned to pass one static scenario.

In [21]:
def generate_gradebook_dataset():
    """
    Generates a dataset of student grades for multiple classes.

    The dataset includes 6 classes, 10 students per class, and 10
    randomly generated grades (between 60 and 100) for each student
    in each class.

    Returns:
        dict: A nested dictionary representing the gradebook.
              Format: {className: {studentName: [grades]}}
    """
    # Set Parameters for the data
    NUM_STUDENTS = 10
    NUM_CLASSES = 6
    NUM_ASSIGNMENTS = 10
    MIN_GRADE = 60
    MAX_GRADE = 100

    # Set The student Names
    student_names = [
        "Liam Smith", "Olivia Johnson", "Noah Williams", "Emma Brown",
        "Oliver Jones", "Ava Garcia", "Elijah Miller", "Sophia Davis",
        "James Rodriguez", "Isabella Martinez"]

    # List of class subjects
    class_names = [
        "Algebra II", "American Literature", "Biology",
        "World History", "Chemistry", "Art History"]

    # Define deict
    school_grades = {}

    print("Generating gradebook dataset...")

    # Iterate through each class
    for class_name in class_names:
        class_grades = {}
        for student_name in student_names:
            assignment_grades = [random.randint(MIN_GRADE, MAX_GRADE) for _ in range(NUM_ASSIGNMENTS)]
            class_grades[student_name] = assignment_grades

        school_grades[class_name] = class_grades

    print("Dataset generation complete.")
    return school_grades


In [22]:
gradebook = generate_gradebook_dataset()
grades = pd.DataFrame(gradebook)
grades

Generating gradebook dataset...
Dataset generation complete.


Unnamed: 0,Algebra II,American Literature,Biology,World History,Chemistry,Art History
Liam Smith,"[97, 72, 74, 92, 72, 70, 81, 82, 66, 96]","[95, 66, 95, 94, 62, 66, 100, 73, 80, 73]","[82, 95, 62, 65, 62, 96, 91, 90, 81, 61]","[87, 92, 91, 63, 68, 75, 81, 80, 77, 82]","[78, 68, 86, 78, 81, 66, 99, 66, 65, 60]","[67, 97, 80, 90, 68, 72, 63, 81, 98, 79]"
Olivia Johnson,"[94, 95, 72, 84, 70, 100, 71, 72, 92, 88]","[100, 63, 68, 72, 95, 89, 69, 98, 63, 87]","[78, 71, 74, 92, 77, 68, 69, 65, 97, 72]","[90, 74, 73, 77, 80, 73, 100, 94, 63, 70]","[67, 95, 81, 96, 94, 79, 87, 79, 64, 75]","[79, 79, 89, 92, 70, 100, 78, 79, 75, 65]"
Noah Williams,"[92, 100, 67, 61, 64, 93, 65, 99, 78, 66]","[73, 66, 88, 92, 74, 66, 87, 96, 77, 71]","[96, 79, 97, 65, 61, 64, 66, 64, 100, 98]","[71, 83, 84, 76, 76, 74, 76, 74, 73, 63]","[75, 71, 84, 86, 84, 100, 60, 80, 82, 96]","[75, 65, 98, 69, 68, 88, 86, 61, 78, 89]"
Emma Brown,"[62, 95, 80, 70, 63, 70, 72, 97, 91, 80]","[73, 98, 87, 96, 86, 83, 92, 91, 68, 65]","[93, 68, 83, 92, 70, 99, 71, 95, 67, 93]","[73, 93, 72, 61, 88, 81, 93, 60, 85, 70]","[86, 94, 85, 76, 80, 86, 88, 72, 61, 89]","[99, 83, 66, 83, 90, 79, 82, 63, 60, 66]"
Oliver Jones,"[93, 68, 61, 76, 85, 96, 71, 78, 80, 89]","[72, 71, 72, 99, 63, 63, 63, 91, 61, 76]","[68, 88, 87, 63, 95, 87, 91, 77, 76, 77]","[94, 89, 78, 63, 92, 70, 60, 98, 85, 63]","[83, 92, 60, 85, 61, 67, 97, 71, 74, 100]","[69, 93, 68, 74, 98, 81, 77, 92, 83, 91]"
Ava Garcia,"[91, 60, 67, 68, 72, 64, 90, 86, 61, 95]","[90, 78, 81, 95, 80, 92, 79, 80, 93, 70]","[75, 78, 77, 93, 75, 78, 67, 60, 88, 69]","[77, 79, 97, 75, 98, 90, 91, 76, 95, 81]","[69, 63, 71, 61, 75, 93, 83, 89, 91, 75]","[72, 72, 99, 78, 89, 88, 90, 93, 84, 84]"
Elijah Miller,"[63, 80, 82, 72, 66, 70, 75, 96, 77, 100]","[99, 63, 84, 71, 99, 79, 68, 60, 95, 87]","[78, 69, 81, 99, 78, 69, 68, 96, 90, 98]","[99, 84, 89, 88, 66, 73, 63, 65, 85, 99]","[89, 72, 71, 66, 70, 96, 97, 64, 68, 61]","[75, 85, 100, 90, 76, 91, 92, 96, 78, 92]"
Sophia Davis,"[69, 99, 70, 66, 95, 64, 62, 84, 89, 88]","[88, 87, 69, 78, 93, 76, 69, 95, 100, 90]","[78, 77, 64, 65, 86, 98, 99, 85, 80, 61]","[62, 71, 92, 97, 75, 81, 93, 61, 87, 62]","[60, 63, 99, 95, 74, 98, 98, 89, 87, 96]","[87, 71, 72, 79, 86, 81, 76, 90, 60, 93]"
James Rodriguez,"[78, 99, 89, 69, 65, 62, 65, 98, 97, 78]","[65, 86, 69, 87, 93, 92, 98, 90, 89, 83]","[74, 87, 82, 99, 62, 74, 68, 60, 82, 71]","[83, 77, 81, 75, 89, 68, 64, 71, 79, 87]","[64, 65, 63, 100, 80, 81, 69, 69, 97, 88]","[72, 62, 81, 60, 78, 88, 66, 89, 71, 65]"
Isabella Martinez,"[81, 65, 87, 85, 69, 71, 98, 60, 74, 63]","[77, 60, 91, 71, 90, 68, 79, 78, 68, 60]","[75, 93, 86, 69, 82, 98, 87, 60, 96, 88]","[81, 93, 83, 91, 62, 66, 93, 60, 94, 60]","[60, 82, 61, 93, 75, 100, 77, 69, 96, 81]","[87, 70, 88, 62, 83, 97, 63, 67, 90, 79]"


## What This Code Does Overall

This code transforms a wide-format(list grades) DataFrame to a more usable dataframe. Where student names are in the index, columns are class names, and each cell contains a list of grades—into a long-format(tidy) DataFrame. In the final output, each row represents a single grade with associated metadata: student name, class name, and assignment number.


## Function-by-Function

- **prepare_dataframe_for_melt(df)**: Resets the index and turns student names into a column called "Student Name" to prepare for reshaping.

- **unpivot_classes_to_rows(df)**: Converts class columns into rows so that each row represents a student/class pairing, with all grades still bundled in a list.

- **explode_grades_list(df)**: Breaks apart the list of grades so that each individual grade becomes its own row.

- **add_assignment_numbers(df)**: Adds a sequential "Assignment" label (e.g., "Assignment 1", "Assignment 2") for each grade within a student/class group.

- **finalize_columns(df)**: Selects and orders the final columns: "Student Name", "Class Name", "Assignment", and "Grade".

- **transform_df_to_long_format(grades_df)**: Orchestrates the entire transformation by calling the steps above in sequence and returning the final tidy DataFrame.

In [23]:
def prepare_dataframe_for_melt(df: pd.DataFrame) -> pd.DataFrame:
    """sets the index to student names column."""
    return df.reset_index().rename(columns={'index': 'Student Name'})

def unpivot_classes_to_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Unpivots the DataFrame, turning class columns into rows."""
    return df.melt(
        id_vars=['Student Name'],
        var_name='Class Name',
        value_name='Grades'
    )

def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame:
    """Explodes the list in the 'Grades' column into separate rows for each grade."""
    return df.explode('Grades').rename(columns={'Grades': 'Grade'})

def add_assignment_numbers(df: pd.DataFrame) -> pd.DataFrame:
    """Adds a numbered 'Assignment' column based on the grade's order."""
    df_with_assignments = df.assign(
        Assignment='Assignment ' + (df.groupby(['Student Name', 'Class Name']).cumcount() + 1).astype(str)
    )
    return df_with_assignments

def finalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Selects and reorders columns to the final desired format."""
    return df[['Student Name', 'Class Name', 'Assignment', 'Grade']].reset_index(drop=True)


def transform_df_to_long_format(grades_df: pd.DataFrame) -> pd.DataFrame:
    """
    Transforms a "wide" format DataFrame into a "long" or "tidy" format
    by executing a pipeline of data cleaning steps.

    Args:
        grades_df: A DataFrame where the index contains student names,
                   columns are class names, and values are lists of grades.

    Returns:
        A tidy DataFrame with a row for each individual grade.
    """
    # This is our data transformation pipeline. Each function is a distinct step.
    prepared_df = prepare_dataframe_for_melt(grades_df)
    unpivoted_df = unpivot_classes_to_rows(prepared_df)
    exploded_df = explode_grades_list(unpivoted_df)
    df_with_assignments = add_assignment_numbers(exploded_df)
    final_df = finalize_columns(df_with_assignments)

    return final_df

In [24]:
gradebook = transform_df_to_long_format(grades)
gradebook

Unnamed: 0,Student Name,Class Name,Assignment,Grade
0,Liam Smith,Algebra II,Assignment 1,97
1,Liam Smith,Algebra II,Assignment 2,72
2,Liam Smith,Algebra II,Assignment 3,74
3,Liam Smith,Algebra II,Assignment 4,92
4,Liam Smith,Algebra II,Assignment 5,72
...,...,...,...,...
595,Isabella Martinez,Art History,Assignment 6,97
596,Isabella Martinez,Art History,Assignment 7,63
597,Isabella Martinez,Art History,Assignment 8,67
598,Isabella Martinez,Art History,Assignment 9,90


# Summary

This setup is similar to many of the capstone or personal projects you've likely worked on. While we can see the code running and producing the expected output, it's important to go a step further. Writing tests helps ensure our code behaves as intended—and can catch bugs or edge cases we might overlook.

In the sections below, we'll write some unit tests and walk through what they do and how they work.



---



---


---



# Testing Time Using Pandas Testing

## Defining the test data generator function
- we make a small set of data to test the functions with.



In [25]:
print("--- Defining the test data generator function ---")

def initial_wide_df() -> pd.DataFrame:  
    """Provides the initial 'wide' DataFrame for testing."""
    return pd.DataFrame({
        "Biology": {"Liam Smith": [85, 92], "Olivia Johnson": [78, 65]},
        "Algebra II": {"Liam Smith": [95, 88], "Olivia Johnson": [72, 81]}
    })

print("Test data function has been defined.")
print("\n" + "="*50 + "\n")

--- Defining the test data generator function ---
Test data function has been defined.




## Demonstrating `prepare_dataframe_for_melt`
- This demonstration checks if the student names in the index are correctly moved into a 'Student Name' column.


In [26]:
print("Cell 3: Demonstrating `prepare_dataframe_for_melt`  \n")

def demonstrate_prepare_for_melt():
    # ARRANGE 
    # We get our starting data from the generator function.
    initial_df = initial_wide_df()

    # We define the state of the data we expect after the function runs.
    expected_df = pd.DataFrame({
        'Student Name': ['Liam Smith', 'Olivia Johnson'],
        'Biology': [[85, 92], [78, 65]],
        'Algebra II': [[95, 88], [72, 81]]
    })

    # ACT 
    # We run the function we're testing.
    result_df = prepare_dataframe_for_melt(initial_df)

    #  ASSERT (Visual & Actual) 
    print("Demonstration Goal: Check if the index is converted to a 'Student Name' column.")
    print("\nInitial DataFrame:")
    print(initial_df)
    print("\nExpected DataFrame:")
    print(expected_df)
    print("\nResult DataFrame:")
    print(result_df)

    # -------------------------------------------
    # -------- look here \/ ---------------------
    # The actual assertion to confirm correctness
    pd_testing.assert_frame_equal(result_df, expected_df)
    print("\n✅ Demonstration Passed!")

# calling our self-contained demonstration function.
demonstrate_prepare_for_melt()
print("\n" + "="*60 + "\n")

Cell 3: Demonstrating `prepare_dataframe_for_melt`  

Demonstration Goal: Check if the index is converted to a 'Student Name' column.

Initial DataFrame:
                 Biology Algebra II
Liam Smith      [85, 92]   [95, 88]
Olivia Johnson  [78, 65]   [72, 81]

Expected DataFrame:
     Student Name   Biology Algebra II
0      Liam Smith  [85, 92]   [95, 88]
1  Olivia Johnson  [78, 65]   [72, 81]

Result DataFrame:
     Student Name   Biology Algebra II
0      Liam Smith  [85, 92]   [95, 88]
1  Olivia Johnson  [78, 65]   [72, 81]

✅ Demonstration Passed!




## Demonstrating `unpivot_classes_to_rows`
- This demonstration checks if the class columns ("Biology", "Algebra II") are correctly "melted" into rows.

In [27]:
print("--- Cell 4: Demonstrating `unpivot_classes_to_rows` ---\n")

def demonstrate_unpivot_classes_to_rows():
    # ARRANGE 
    # The input for this function is the output of the previous step.
    start_df = prepare_dataframe_for_melt(initial_wide_df())

    expected_df = pd.DataFrame({
        'Student Name': ['Liam Smith', 'Olivia Johnson', 'Liam Smith', 'Olivia Johnson'],
        'Class Name': ['Biology', 'Biology', 'Algebra II', 'Algebra II'],
        'Grades': [[85, 92], [78, 65], [95, 88], [72, 81]]
    })

    # ACT 
    result_df = unpivot_classes_to_rows(start_df)

    # ASSERT (Visual & Actual) 
    print("Demonstration Goal: Check if class columns are turned into rows.")
    print("\nInitial DataFrame:")
    print(start_df)
    print("\nExpected DataFrame:")
    print(expected_df)
    print("\nResult DataFrame:")
    print(result_df)

    # -------------------------------------------
    # --------- look here \/ --------------------
    pd_testing.assert_frame_equal(result_df, expected_df)
    print("\n✅ Demonstration Passed!")

# We call our self-contained demonstration function.
demonstrate_unpivot_classes_to_rows()
print("\n" + "="*60 + "\n")

--- Cell 4: Demonstrating `unpivot_classes_to_rows` ---

Demonstration Goal: Check if class columns are turned into rows.

Initial DataFrame:
     Student Name   Biology Algebra II
0      Liam Smith  [85, 92]   [95, 88]
1  Olivia Johnson  [78, 65]   [72, 81]

Expected DataFrame:
     Student Name  Class Name    Grades
0      Liam Smith     Biology  [85, 92]
1  Olivia Johnson     Biology  [78, 65]
2      Liam Smith  Algebra II  [95, 88]
3  Olivia Johnson  Algebra II  [72, 81]

Result DataFrame:
     Student Name  Class Name    Grades
0      Liam Smith     Biology  [85, 92]
1  Olivia Johnson     Biology  [78, 65]
2      Liam Smith  Algebra II  [95, 88]
3  Olivia Johnson  Algebra II  [72, 81]

✅ Demonstration Passed!





## Demonstrating `_explode_grades_list`
- This demonstration checks if the lists in the 'Grades' column are correctly
"exploded" into separate rows.


In [28]:

print("--- Cell 5: Demonstrating `explode_grades_list` ---\n")

def demonstrate_explode_grades_list():
    # ARRANGE 
    # For this test, it's easier to define the input directly.
    start_df = pd.DataFrame({
        'Student Name': ['Liam Smith', 'Olivia Johnson'],
        'Class Name': ['Biology', 'Biology'],
        'Grades': [[85, 92], [78, 65]]
    })
    expected_df = pd.DataFrame({
        'Student Name': ['Liam Smith', 'Liam Smith', 'Olivia Johnson', 'Olivia Johnson'],
        'Class Name': ['Biology', 'Biology', 'Biology', 'Biology'],
        'Grade': [85, 92, 78, 65]
    })

    # ACT 
    result_df = explode_grades_list(start_df)
    result_df['Grade'] = pd.to_numeric(result_df['Grade'])

    # ASSERT (Visual & Actual) 
    print("Demonstration Goal: Check if lists of grades are converted into one row per grade.")
    print("\nInitial DataFrame:")
    print(start_df)
    print("\nExpected DataFrame:")
    print(expected_df)
    print("\nResult DataFrame (index reset for comparison):")
    print(result_df.reset_index(drop=True))
    
    # -------------------------------------------
    # ---------- look here \/ -------------------
    pd_testing.assert_frame_equal(result_df.reset_index(drop=True), expected_df)
    print("\n✅ Demonstration Passed!")

demonstrate_explode_grades_list()
print("\n" + "="*80 + "\n")

--- Cell 5: Demonstrating `explode_grades_list` ---

Demonstration Goal: Check if lists of grades are converted into one row per grade.

Initial DataFrame:
     Student Name Class Name    Grades
0      Liam Smith    Biology  [85, 92]
1  Olivia Johnson    Biology  [78, 65]

Expected DataFrame:
     Student Name Class Name  Grade
0      Liam Smith    Biology     85
1      Liam Smith    Biology     92
2  Olivia Johnson    Biology     78
3  Olivia Johnson    Biology     65

Result DataFrame (index reset for comparison):
     Student Name Class Name  Grade
0      Liam Smith    Biology     85
1      Liam Smith    Biology     92
2  Olivia Johnson    Biology     78
3  Olivia Johnson    Biology     65

✅ Demonstration Passed!




## Demonstrating `add_assignment_numbers`
- This demonstration checks that a unique assignment number is correctly calculated for each grade, grouped by student and class.


In [29]:
print("--- Cell 6: Demonstrating `_add_assignment_numbers` ---\n")

def demonstrate_add_assignment_numbers():
    # ARRANGE
    start_df = pd.DataFrame({
        'Student Name': ['Liam Smith', 'Liam Smith', 'Olivia Johnson'],
        'Class Name': ['Biology', 'Biology', 'Biology'],
        'Grade': [85, 92, 78]
    })
    expected_df = start_df.copy()
    expected_df['Assignment'] = ['Assignment 1', 'Assignment 2', 'Assignment 1']

    # ACT
    result_df = add_assignment_numbers(start_df)

    # ASSERT (Visual & Actual)
    print("Demonstration Goal: Check if an 'Assignment' column is added with correct numbering.")
    print("\nInitial DataFrame:")
    print(start_df)
    print("\nExpected DataFrame:")
    print(expected_df)
    print("\nResult DataFrame:")
    print(result_df)
    pd_testing.assert_frame_equal(result_df, expected_df)
    print("\n✅ Demonstration Passed!")

demonstrate_add_assignment_numbers()


--- Cell 6: Demonstrating `_add_assignment_numbers` ---

Demonstration Goal: Check if an 'Assignment' column is added with correct numbering.

Initial DataFrame:
     Student Name Class Name  Grade
0      Liam Smith    Biology     85
1      Liam Smith    Biology     92
2  Olivia Johnson    Biology     78

Expected DataFrame:
     Student Name Class Name  Grade    Assignment
0      Liam Smith    Biology     85  Assignment 1
1      Liam Smith    Biology     92  Assignment 2
2  Olivia Johnson    Biology     78  Assignment 1

Result DataFrame:
     Student Name Class Name  Grade    Assignment
0      Liam Smith    Biology     85  Assignment 1
1      Liam Smith    Biology     92  Assignment 2
2  Olivia Johnson    Biology     78  Assignment 1

✅ Demonstration Passed!


# A look at different tests

First we will look at our function again for reference. We are going to look at the explode_grades_list and go over a few things.

In [30]:
def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    """Explodes the list in the 'Grades' column into separate rows for each grade."""
    return df.explode('Grades').rename(columns={'Grades': 'Grade'})

In this function, we’ve added a simple safeguard to make sure the DataFrame contains the "Grades" column before we attempt to explode it. If the column is missing, the function raises a KeyError with a clear message.

<br>

This kind of check isn't a formal test using a testing framework like pytest, but it's still a valuable defensive programming technique. It helps make our code more robust in production by catching errors early and providing helpful context when something goes wrong—making debugging faster and easier.

In [31]:
def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    """Explodes the list in the 'Grades' column into separate rows for each grade."""
    # Added a check to ensure the column exists for more robust error handling
    if 'Grades' not in df.columns:
        raise KeyError("Input DataFrame must contain a 'Grades' column.")
    return df.explode('Grades').rename(columns={'Grades': 'Grade'})

# Unit testing

### What Are Unit Tests?

Unit tests are small, focused tests that verify the behavior of individual functions or components in your code. They're designed to check that your code does what it's supposed to do—especially under different inputs or edge cases—without relying on the rest of the system.

<br>

## This first test will fail

This first test is expected to fail and that's a good thing.

Let’s take a closer look and analyze what’s happening. While the function technically runs and returns a result, the test reveals hidden issues we didn’t account for, such as data type mismatches or unexpected behavior with edge cases.

<br>

### Why the Tests Failed (Short + Clear)

test_basic_explosion:

- The Grade column was expected to be int64, but the actual output had a dtype of object.

- Pandas treats exploded lists as generic objects unless explicitly cast.

test_with_empty_list:

- The function didn’t drop students with empty grade lists, resulting in an extra row with NaN.

test_missing_grades_column_raises_error:

- The test compared the full error string directly, including Python’s automatic quotes.

- This caused a mismatch, even though the raised error was correct.

<br>

### Why This Matters in Development

- Prevents silent bugs: Unit tests catch mismatches, assumptions, and edge cases you might not notice during manual testing.

- Improves reliability: Tests give you confidence that your code behaves correctly as you refactor or scale.

- Speeds up debugging: When a test fails, it points you directly to the failing function and scenario.

- Enforces consistency: Helps ensure things like data types, structure, and error handling remain predictable.

In [32]:
import unittest
from pandas.testing import assert_frame_equal

# The function being tested
def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    if 'Grades' not in df.columns:
        raise KeyError("Input DataFrame must contain a 'Grades' column.")
    return df.explode('Grades').rename(columns={'Grades': 'Grade'})


class TestExplodeGradesList(unittest.TestCase):

    def test_basic_explosion(self):
        """Tests a standard case where the list of grades is correctly exploded."""
        input_data = {
            'Student': ['Alice', 'Bob'],
            'Grades': [[90, 85], [78]]
        }
        input_df = pd.DataFrame(input_data)

        result_df = explode_grades_list(input_df).reset_index(drop=True)

        expected_data = {
            'Student': ['Alice', 'Alice', 'Bob'],
            'Grade': [90, 85, 78]
        }
        expected_df = pd.DataFrame(expected_data).reset_index(drop=True)

        assert_frame_equal(result_df, expected_df)

    def test_with_empty_list(self):
        """Tests that a student with an empty list of grades is dropped."""
        input_data = {
            'Student': ['Alice', 'Charlie'],
            'Grades': [[90, 85], []]
        }
        input_df = pd.DataFrame(input_data)

        result_df = explode_grades_list(input_df).reset_index(drop=True)

        expected_data = {
            'Student': ['Alice', 'Alice'],
            'Grade': [90, 85]
        }
        expected_df = pd.DataFrame(expected_data).reset_index(drop=True)

        assert_frame_equal(result_df, expected_df)

    def test_missing_grades_column_raises_error(self):
        """Tests that a missing 'Grades' column raises a KeyError."""
        input_df = pd.DataFrame({'Student': ['David']})

        with self.assertRaises(KeyError) as cm:
            explode_grades_list(input_df)

        self.assertEqual(
            str(cm.exception),
            "Input DataFrame must contain a 'Grades' column."
        )


if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)


FFF
FAIL: test_basic_explosion (__main__.TestExplodeGradesList.test_basic_explosion)
Tests a standard case where the list of grades is correctly exploded.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/folders/2d/yt4_w6zn5pbfjg_jx5sdmm180000gn/T/ipykernel_13123/2472581863.py", line 29, in test_basic_explosion
    assert_frame_equal(result_df, expected_df)
  File "/Users/dannymorton/Desktop/postGrad/CY_post_grad_data/venv/lib/python3.12/site-packages/pandas/_testing/asserters.py", line 1303, in assert_frame_equal
    assert_series_equal(
  File "/Users/dannymorton/Desktop/postGrad/CY_post_grad_data/venv/lib/python3.12/site-packages/pandas/_testing/asserters.py", line 999, in assert_series_equal
    assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File "/Users/dannymorton/Desktop/postGrad/CY_post_grad_data/venv/lib/python3.12/site-packages/pandas/_testing/asserters.py", line 421, in assert_attr_e

## We have 3 things to fix and now have to edit the function to no longer break.

### **Fix 1:** dtype mismatch in test_basic_explosion

**Problem:**
<br>
The explode() method with lists of integers results in a column of object dtype, but your expected DataFrame uses native int64. Pandas is strict about dtype matching in tests.

**Solution:**
<br>
Explicitly cast the Grade column to int in both result and expected DataFrames.

### Fix 2: Error message mismatch in test_missing_grades_column_raises_error

**Problem:**
<br>
The actual error string includes quotes automatically when raised, so comparing it as a raw string fails.

**Solution:**
<br>
**Either:**

Use .args[0] on the exception to get the raw message, OR

Use assertIn() instead of assertEqual().

We'll use .args[0] here for precision.

###Fix 3: test_with_empty_list unexpected row

**Problem:**
<br>
By default, explode() does not drop empty rows when the list is empty—it will produce a NaN.

**Solution:**
<br>
Manually drop rows with NaN in the "Grade" column after exploding.



# corrected function example

In [33]:
############################################################################
########                       Old function                         ########
############################################################################

def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    if 'Grades' not in df.columns:
        raise KeyError("Input DataFrame must contain a 'Grades' column.")
    return df.explode('Grades').rename(columns={'Grades': 'Grade'})


############################################################################
########                       New function                         ########
############################################################################

def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    if 'Grades' not in df.columns:
        raise KeyError("Input DataFrame must contain a 'Grades' column.")

    exploded = df.explode('Grades').rename(columns={'Grades': 'Grade'})

    # Drop rows where the exploded value is NaN (e.g., from empty lists)
    return exploded.dropna(subset=['Grade'])

# working Unit test

In [34]:
import unittest
import pandas as pd

# The new function being tested
def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    if 'Grades' not in df.columns:
        raise KeyError("Input DataFrame must contain a 'Grades' column.")
    exploded = df.explode('Grades').rename(columns={'Grades': 'Grade'})
    return exploded.dropna(subset=['Grade'])


class TestExplodeGradesList(unittest.TestCase):

    def test_basic_explosion(self):
        """Tests a standard case where the list of grades is correctly exploded."""
        input_data = {
            'Student': ['Alice', 'Bob'],
            'Grades': [[90, 85], [78]]
        }
        input_df = pd.DataFrame(input_data)

        result_df = explode_grades_list(input_df).reset_index(drop=True)
        result_df['Grade'] = result_df['Grade'].astype(int)

        expected_data = {
            'Student': ['Alice', 'Alice', 'Bob'],
            'Grade': [90, 85, 78]
        }
        expected_df = pd.DataFrame(expected_data).reset_index(drop=True)

        assert_frame_equal(result_df, expected_df)

    def test_with_empty_list(self):
        """Tests that a student with an empty list of grades is dropped."""
        input_data = {
            'Student': ['Alice', 'Charlie'],
            'Grades': [[90, 85], []]
        }
        input_df = pd.DataFrame(input_data)

        result_df = explode_grades_list(input_df).reset_index(drop=True)
        result_df['Grade'] = result_df['Grade'].astype(int)

        expected_data = {
            'Student': ['Alice', 'Alice'],
            'Grade': [90, 85]
        }
        expected_df = pd.DataFrame(expected_data).reset_index(drop=True)

        assert_frame_equal(result_df, expected_df)

    def test_missing_grades_column_raises_error(self):
        """Tests that a missing 'Grades' column raises a KeyError."""
        input_df = pd.DataFrame({'Student': ['David']})

        with self.assertRaises(KeyError) as cm:
            explode_grades_list(input_df)

        self.assertEqual(
            cm.exception.args[0],
            "Input DataFrame must contain a 'Grades' column."
        )

# Run tests
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)


...
----------------------------------------------------------------------
Ran 3 tests in 0.008s

OK


# pytest

pytest is a powerful and flexible Python testing framework used to write simple, readable tests and scale to complex test suites. It’s widely used in professional development and supports features like fixtures, parameterized tests, and detailed failure reports out of the box.

<br>

## How pytest Differs from Other Testing Approaches

**Compared to unittest:**

- pytest is simpler and less verbose—no need for classes or self.assertEqual().

- Tests are just plain Python functions that start with test_, and assertions use the built-in assert keyword.

- pytest gives better failure output and is easier to scale for large projects.

<br>

**Compared to pandas.testing:**

- pandas.testing provides low-level utilities like assert_frame_equal() for comparing DataFrames, but it doesn't manage full test workflows or output.

- We often use pandas.testing inside pytest for precise DataFrame validation.

<br>

## Why pytest Needs a Separate File

- Unlike unittest, which can be run inline in a notebook or script, pytest relies on test discovery

- it scans files starting with test_ or ending in _test.py to find and run test functions.

<br>

That means:

- You usually need to save your tests to a .py file (like test_explode.py).

- You run them with a command like !pytest test_explode.py in the terminal or Colab.

<br>

## How We Got pytest Working in a Notebook

- Since Colab/Jupyter doesn’t automatically support pytest inline:

    - We used the %%writefile magic command to write our test code into a .py file.

    - Then, we ran it using !pytest test_explode.py, which allowed pytest to discover and execute the tests properly.

    - Alternatively, we manually called test functions for quick feedback without using the full pytest CLI (Command Line Interface).

In [35]:
%%writefile test_explode.py
import pytest


def explode_grades_list(df: pd.DataFrame) -> pd.DataFrame: # noqa: F811
    if 'Grades' not in df.columns:
        raise KeyError("Input DataFrame must contain a 'Grades' column.")
    exploded = df.explode('Grades').rename(columns={'Grades': 'Grade'})
    return exploded.dropna(subset=['Grade'])


@pytest.fixture
def sample_dataframe():
    data = {
        'Student': ['Alice', 'Bob'],
        'Class': ['Math', 'History'],
        'Grades': [[90, 85], [78]]
    }
    return pd.DataFrame(data)


def test_basic_explosion_pytest(sample_dataframe):
    result_df = explode_grades_list(sample_dataframe).reset_index(drop=True)
    result_df["Grade"] = result_df["Grade"].astype(int)

    expected_df = pd.DataFrame({
        'Student': ['Alice', 'Alice', 'Bob'],
        'Class': ['Math', 'Math', 'History'],
        'Grade': [90, 85, 78]
    }).reset_index(drop=True)
    expected_df["Grade"] = expected_df["Grade"].astype(int)

    assert_frame_equal(result_df, expected_df)


def test_with_empty_list_pytest():
    input_df = pd.DataFrame({
        'Student': ['Alice', 'Charlie'],
        'Grades': [[90, 85], []]
    })
    result_df = explode_grades_list(input_df).reset_index(drop=True)
    result_df["Grade"] = result_df["Grade"].astype(int)

    expected_df = pd.DataFrame({
        'Student': ['Alice', 'Alice'],
        'Grade': [90, 85]
    }).reset_index(drop=True)
    expected_df["Grade"] = expected_df["Grade"].astype(int)

    assert_frame_equal(result_df, expected_df)


def test_missing_grades_column_raises_error_pytest():
    df = pd.DataFrame({'Student': ['David']})
    with pytest.raises(KeyError, match="Input DataFrame must contain a 'Grades' column."):
        explode_grades_list(df)


Overwriting test_explode.py


This will:
- Use pytest to run the test file test_explode.py
- Suppress warnings with --disable-warnings
- Run in quiet mode (-q) so the output is cleaner

In [36]:
!pytest test_explode.py --disable-warnings -q


[31m[1m_______________________ ERROR collecting test_explode.py _______________________[0m
[1m[31mtest_explode.py[0m:4: in <module>
    [0m[94mdef[39;49;00m[90m [39;49;00m[92mexplode_grades_list[39;49;00m(df: pd.DataFrame) -> pd.DataFrame: [90m# noqa: F811[39;49;00m[90m[39;49;00m
                                ^^[90m[39;49;00m
[1m[31mE   NameError: name 'pd' is not defined[0m
[31mERROR[0m test_explode.py - NameError: name 'pd' is not defined
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
[31m[31m[1m1 error[0m[31m in 0.06s[0m[0m


## To run your pytest test file from a regular terminal

- Navigate to `3_tests` folder

Run this command:

```bash
pytest test_explode.py
```

---
---

FIN

---
---
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
---

# For use in colab only

In [37]:
# from google.colab import drive
# drive.mount('/content/drive')

# Used to make the table of contents. 

In [38]:
# import json

# # Path to the notebook file in the Colab environment
# notebook_path = '/content/drive/MyDrive/path/to/your/notebook.ipynb' #<-- CHANGE THIS to your notebook's path

# # Or, if you're working with a notebook that's not saved to Drive yet,
# # you can try to find it in the local Colab file system.
# # This requires you to know the name it's currently running as.
# # from google.colab import _instance_id
# # notebook_path = f'/content/sample_data/colab_notebooks/{_instance_id}' # This is an advanced trick and may not always work

# def generate_toc_from_notebook(notebook_path):
#     """Parses a notebook and generates Markdown for a Table of Contents."""
#     try:
#         with open(notebook_path, 'r', encoding='utf-8') as f:
#             notebook = json.load(f)
#     except FileNotFoundError:
#         print(f"Error: Notebook file not found at '{notebook_path}'")
#         print("Please make sure you have mounted your Google Drive and updated the path.")
#         return

#     toc_markdown = "### **Table of Contents**\n"
#     for cell in notebook['cells']:
#         if cell['cell_type'] == 'markdown':
#             # Check each line of the markdown cell for a heading
#             for line in cell['source']:
#                 if line.startswith('#'):
#                     # Found a heading
#                     level = line.count('#')
#                     title = line.strip('#').strip()
#                     link = title.lower().replace(' ', '-').strip('-.()') # Basic cleaning

#                     # Create indentation based on heading level
#                     indent = '  ' * (level - 1)
#                     toc_markdown += f"{indent}* [{title}](#{link})\n"

#     print("--- Copy the Markdown below and paste it into a new Text Cell ---")
#     print(toc_markdown)


# # IMPORTANT: You need to know the path to your notebook for this to work.
# # 1. Paste the path below.
# notebook_file_path = '/content/drive/MyDrive/Colab Notebooks/post_grad_ideas.ipynb'  # <-- PASTE YOUR COPIED PATH HERE

# generate_toc_from_notebook(notebook_file_path)