# Python Course - Tutorial 5

### Exercise 1: Working with Dates
Perform operations on dates using Python's [datetime](https://docs.python.org/3/library/datetime.html) module.  
This exercise will help you learn how to manipulate dates and perform date arithmetic, which is useful for analyzing timelines and planning.

1. Import the `datetime` module and create an object representing today's date. This will introduce you to creating and using date objects.
2. Calculate the date 100 days from today using a [timedelta](https://docs.python.org/3/library/datetime.html#datetime.timedelta) of 100 days.
3. Calculate the number of days between today and December 31, 2025. This will give you practice in determining the difference between two dates.
4. Determine the day of the week for your next birthday. This will help you practice extracting specific information from a date object. You can call [strftime](https://docs.python.org/3/library/datetime.html#datetime.datetime.strftime) on a `datetime` object with the `%A` format code to get the weekday name.


In [None]:
from datetime import datetime, timedelta

# (i) Create an object representing today's date
today = datetime.today()
print(f"Today's Date: {today}")

# Optionally, you can format the date using strftime
print(f"Today's Date: {today.strftime('%Y-%m-%d')}")

# (ii) Calculate the date 100 days from today
future_date = today + timedelta(days=100)
print(f"Date 100 Days from Today: {future_date.strftime('%Y-%m-%d')}")

# (iii) Calculate the number of days between today and December 31, 2025
end_of_year = datetime(2025, 12, 31)
days_until_end_of_year = (end_of_year - today).days
print(f"Days until December 31, 2025: {days_until_end_of_year}")

# (iv) Determine the day of the week for your next birthday
next_birthday = datetime(2026, 5, 15)  # Example: May 15, 2026
day_of_week = next_birthday.strftime("%A")
print(f"Day of the Week for Next Birthday: {day_of_week}")
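
Because `datetime.today()` includes the time of day, `.days` on a `datetime` difference rounds down to whole 24-hour periods. The following is a minimal sketch of the same calculations with `date` objects, which count whole calendar days. The helper names `days_until` and `next_birthday` and the fixed reference date are assumptions made for illustration; they are not part of the exercise.

```python
from datetime import date


def days_until(target: date, today: date) -> int:
    """Return the number of whole calendar days from today to target."""
    return (target - today).days


def next_birthday(month: int, day: int, today: date) -> date:
    """Return the next occurrence of month/day on or after today."""
    candidate = date(today.year, month, day)
    if candidate < today:
        # This year's birthday has already passed, so use next year's.
        candidate = date(today.year + 1, month, day)
    return candidate


# Fixed reference date so the example is reproducible; use date.today() in practice.
reference = date(2025, 9, 1)
print(days_until(date(2025, 12, 31), reference))  # 121

birthday = next_birthday(5, 15, reference)
print(birthday, birthday.strftime("%A"))
```

Note that this sketch does not handle a February 29 birthday: `date(year, 2, 29)` raises `ValueError` in non-leap years.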

### Exercise 2: Managing CSV Files
Use Python's `csv` module to work with CSV data.  
This exercise will introduce you to reading from and writing to CSV files, which is a common format for storing and sharing data.

1. Use the [csv.writer](https://docs.python.org/3/library/csv.html#csv.writer) function to create a CSV file named `data.csv` with columns `Name`, `Age`, and `Income`.  
   This will help you understand how to create and structure CSV files.
2. Write data for 5 individuals into the CSV file. This step will show you how to add data to a CSV file.
3. Read the CSV file using the [csv.reader](https://docs.python.org/3/library/csv.html#csv.reader) function and calculate the average income.

In [None]:
import csv

# Task 1: Create a CSV file named 'data.csv' with columns Name, Age, and Income
with open('data.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "Income"])

    # Task 2: Write data for 5 individuals
    writer.writerows([
    ["Alice", 28, 50000],
    ["Bob", 34, 60000],
    ["Charlie", 25, 45000],
    ["David", 40, 70000],
    ["Eve", 30, 52000]
    ])

# Task 3: Read the CSV file and calculate the average income
with open('data.csv', mode='r') as file:
    reader = csv.DictReader(file)
    incomes = [int(row["Income"]) for row in reader]
    average_income = sum(incomes) / len(incomes)
    print("Average Income:", average_income)
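
The solution above divides by `len(incomes)`, which fails when the file has a header but no data rows. As a hedged sketch, the variant below returns `None` in that case. The function name `average_income` and the in-memory sample data (via `io.StringIO`, so no file is needed) are assumptions for illustration only.

```python
import csv
import io
from typing import Optional


def average_income(lines) -> Optional[float]:
    """Average the Income column; return None when there are no data rows."""
    reader = csv.DictReader(lines)
    # DictReader yields strings, so each value is converted explicitly.
    incomes = [int(row["Income"]) for row in reader]
    if not incomes:
        return None
    return sum(incomes) / len(incomes)


sample = io.StringIO("Name,Age,Income\nAlice,28,50000\nBob,34,60000\n")
print(average_income(sample))       # 55000.0

header_only = io.StringIO("Name,Age,Income\n")
print(average_income(header_only))  # None
```

Because `csv.DictReader` accepts any iterable of lines, the same function also works with an open file object.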

### Exercise 3: Navigating File Systems with the OS Module

In this exercise, you will utilize Python's [os](https://docs.python.org/3/library/os.html) module to perform basic file system operations. You'll practice creating directories, handling files, and manipulating file paths.

#### Tasks:

1. Import the `os` module. Use `os.getcwd()` to print the current working directory.
2. Create a new directory named `workspace` using `os.mkdir()`.
3. Within the `workspace` directory, create a text file called `sample.txt` and write "Exploring OS module in Python." into it.
4. Verify if `sample.txt` exists in the `workspace` directory using `os.path.exists()` and print a confirmation message.
5. Open `sample.txt`, read its contents, and display them on the console.
6. Rename `sample.txt` to `sample_updated.txt` using `os.rename()`.
7. Delete `sample_updated.txt` using `os.remove()`, and then remove the `workspace` directory using `os.rmdir()`.


In [None]:
import os

# Task 1: Print the current working directory
current_dir = os.getcwd()
print(f"Current working directory: {current_dir}")

# Task 2: Create a new directory named 'workspace'
os.mkdir('workspace')

# Task 3: Create 'sample.txt' and write content to it
file_path = os.path.join('workspace', 'sample.txt')
with open(file_path, 'w') as file:
    file.write("Exploring OS module in Python.")

# Task 4: Check if 'sample.txt' exists
if os.path.exists(file_path):
    print("'sample.txt' exists in the 'workspace' directory.")

# Task 5: Read and display the contents of 'sample.txt'
with open(file_path, 'r') as file:
    content = file.read()
    print(f"Content of 'sample.txt': {content}")

# Task 6: Rename 'sample.txt' to 'sample_updated.txt'
new_file_path = os.path.join('workspace', 'sample_updated.txt')
os.rename(file_path, new_file_path)

# Task 7: Delete 'sample_updated.txt' and remove 'workspace' directory
os.remove(new_file_path)
os.rmdir('workspace')
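
If an earlier run stopped before the cleanup step, `os.mkdir('workspace')` raises `FileExistsError` on the next run. One possible way to make the steps repeatable is sketched below: it works inside a temporary directory and uses `os.makedirs(..., exist_ok=True)`. The use of `tempfile` and the variable names are assumptions for illustration and are not required by the exercise.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    workspace = os.path.join(base_dir, "workspace")
    os.makedirs(workspace, exist_ok=True)
    os.makedirs(workspace, exist_ok=True)  # second call is a no-op, no exception

    file_path = os.path.join(workspace, "sample.txt")
    with open(file_path, "w") as file:
        file.write("Exploring OS module in Python.")

    new_file_path = os.path.join(workspace, "sample_updated.txt")
    os.rename(file_path, new_file_path)
    old_exists = os.path.exists(file_path)      # False after the rename
    new_exists = os.path.exists(new_file_path)  # True after the rename

    os.remove(new_file_path)
    os.rmdir(workspace)  # succeeds because the directory is empty again
    workspace_exists = os.path.exists(workspace)

print(old_exists, new_exists, workspace_exists)  # False True False
```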

### Exercise 4: Debugging a Program

The following Python program is intended to read sales data from a text file called `sales_data.txt` (in the `tutorials/data` directory), process the data to compute the total sales, average sales per day, and identify the day with the highest total sales. However, the program contains several bugs that prevent it from working correctly, particularly in handling datetime values when grouping sales by date. Your task is to identify and fix these bugs using a debugger.

In [None]:
def process_sales_data(path):
    total_sales = 0.0
    sales_per_day = {}
    highest_sales = 0.0
    highest_day = ''
    sales_count = 0

    with open(path, 'r') as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            date, sales = line.split(',')
            sales = float(sales)
            total_sales += sales
            sales_count += 1
            if date not in sales_per_day:
                sales_per_day[date] = sales
            else:
                sales_per_day[date] += sales
            if sales > highest_sales:
                highest_sales = sales
                highest_day = date

    average_sales_per_day = total_sales / len(sales_per_day)
    return total_sales, average_sales_per_day, highest_day

# Sample usage
total, average_per_day, best_day = process_sales_data('data/sales_data.txt')
print(f"Total Sales: ${total}")
print(f"Average Daily Sales: ${average_per_day}")
print(f"Highest Sales Day: {best_day}")

Total Sales: $22385.000000000004
Average Daily Sales: $1119.2500000000002
Highest Sales Day: 2025-01-03 15:55:00


**Identifying and Fixing Bugs**

When addressing logical errors, set breakpoints at critical points where key values change or computations occur. This allows you to observe how the data evolves and pinpoint potential issues. Remember, the debugger won't activate unless an exception is thrown or you manually define breakpoints. By stepping through the code, you can systematically trace the program's flow and identify discrepancies.

(i) Bugs in the Function:
- **Bug 1**: The first field of each line is used directly as `date`, although it is a full timestamp that includes the time component (e.g. `2025-01-03 15:55:00`). This prevents correct grouping by date.
- **Bug 2**: When checking for the highest sales day, the code compares individual `sales` amounts instead of the total sales per day. This means it identifies the transaction with the highest amount, not the day with the highest total sales.
- **Bug 3**: The average sales per day is calculated as `total_sales / len(sales_per_day)`, but if the date extraction is incorrect, the number of days may be wrong or inflated due to time components.
- **Bug 4**: The program does not handle exceptions that may arise from incorrect date formats or parsing errors.
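
To make Bugs 1 and 2 concrete, here is a small sketch on made-up records. The three sample lines are hypothetical; they only follow the `timestamp,amount` layout that the exercise describes and are not taken from `sales_data.txt`.

```python
from collections import defaultdict

# Hypothetical records in the "date time,amount" layout.
lines = [
    "2025-01-02 09:00:00,400.0",
    "2025-01-02 17:30:00,450.0",
    "2025-01-03 15:55:00,500.0",
]

by_timestamp = defaultdict(float)  # key keeps the time component (Bug 1)
by_date = defaultdict(float)       # key is the date part only
largest_single = ("", 0.0)         # largest individual transaction (Bug 2)

for line in lines:
    stamp, amount_str = line.split(",")
    amount = float(amount_str)
    by_timestamp[stamp] += amount
    by_date[stamp.split(" ")[0]] += amount
    if amount > largest_single[1]:
        largest_single = (stamp, amount)

best_day = max(by_date, key=by_date.get)

print(len(by_timestamp), len(by_date))  # 3 2
print(largest_single[0])                # 2025-01-03 15:55:00
print(best_day)                         # 2025-01-02
```

Grouping by the full timestamp yields three "days" instead of two, and the largest single transaction falls on a different day than the largest daily total.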

In [None]:
#(ii) Fixed Function:

from datetime import datetime

def process_sales_data(filename):
    # Initialize variables to track total sales, daily sales, and highest sales
    total_sales = 0.0
    sales_per_day = {}  # Dictionary to store sales grouped by date
    highest_sales = 0.0  # Variable to track the highest sales value for a single day
    highest_day = ''  # Variable to track the date with the highest sales
    sales_count = 0  # Counter for the total number of valid sales records processed

    # Open the file containing sales data
    with open(filename, 'r') as file:
        for line in file:
            line = line.strip()  # Remove any leading/trailing whitespace from the line
            if not line:
                continue  # Skip empty lines
            try:
                # Split the line into date-time and sales values
                date_time_str, sales = line.split(',')
                sales = float(sales)  # Convert the sales value to a float
                # Parse the date-time string into a datetime object
                date_time = datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S')
                date = date_time.date()  # Extract the date (ignoring time)
            except ValueError:
                # Print a warning for lines that can't be processed and skip them
                print(f"Skipping invalid line: {line}")
                continue

            # Accumulate total sales and count the record
            total_sales += sales
            sales_count += 1
            # Add the sales value to the corresponding date in the dictionary
            if date not in sales_per_day:
                sales_per_day[date] = sales
            else:
                sales_per_day[date] += sales

    # Find the day with the highest total sales
    for date, daily_total in sales_per_day.items():
        if daily_total > highest_sales:
            highest_sales = daily_total
            highest_day = date

    # Calculate average sales per day, guarding against division by zero
    if sales_per_day:
        average_sales_per_day = total_sales / len(sales_per_day)
    else:
        average_sales_per_day = 0.0

    # Return a tuple with total sales, average sales per day, and the day with the highest sales
    return total_sales, average_sales_per_day, highest_day


# Sample usage
total, average_per_day, best_day = process_sales_data('data/sales_data.txt')
print(f"Total Sales: ${total}")
print(f"Average Daily Sales: ${average_per_day}")
print(f"Highest Sales Day: {best_day}")

### Exercise 5: Debugging an Experimental Data Analysis Pipeline

This exercise introduces you to a small experimental data analysis workflow.  
The corresponding code and data are provided in the `tutorial5_debugging` directory.

Two CSV files contain information about study participants and their pre- and post-experiment scores.  
The code loads, cleans, merges, and aggregates the data.

A key quantity in this exercise is the **improvement index**, intended to measure relative improvement from pre- to post-score:

$$
\text{improvement index} = \frac{\text{post\_score} - \text{pre\_score}}{\text{pre\_score}},
$$

where a positive value indicates improvement and a negative value indicates a decline.
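
The formula can be sketched as a small function. The name `improvement_index` and the choice to raise `ValueError` for a zero pre-score are assumptions for illustration; the code in `tutorial5_debugging` may handle this differently.

```python
def improvement_index(pre_score: float, post_score: float) -> float:
    """Relative change from pre_score to post_score."""
    if pre_score == 0:
        # Assumption: a zero baseline makes the relative change undefined.
        raise ValueError("pre_score must be non-zero")
    return (post_score - pre_score) / pre_score


print(improvement_index(50, 60))  # 0.2   (improvement)
print(improvement_index(80, 60))  # -0.25 (decline)
```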

After running the program, you will notice that the summary statistics appear **inconsistent** with the raw data.  
Your task is to **debug the analysis**, identify the source of the inconsistency, and fix the underlying issue.


**Solution**

After running the code, the summary statistics show positive mean changes in raw scores but **negative mean improvement indices** for both groups. Because the program runs without errors, the issue must come from the **logic inside one of the functions**. Using the debugger helps you trace where the incorrect values are produced.

1. **Set a breakpoint in `main.py`**

   Place a breakpoint **before** the index is created:

   ```python
   index = index_measurements_by_participant(measurements)
   ```

   Start debugging (PyCharm or VS Code). You can now:

   * Step through `index_measurements_by_participant`.
   * Step over to the line

     ```python
     improvements = compute_improvements(participants, index)
     ```

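   You may also set a **conditional breakpoint** that pauses only for suspicious cases, e.g. with the condition:

   * `post_score > pre_score and improvement < 0`

   This immediately highlights participants whose score rose but whose computed improvement is negative.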

2. **Step into `compute_improvements` (in `stats.py`)**

   Use *Step Into* when the debugger reaches the function call.
   Inspect the values for each participant:

   * `pre_score`
   * `post_score`
   * calculated `improvement`

   You will find cases where the post-score is larger than the pre-score, yet the computed improvement is **negative**.
   This reveals that the formula in the function is incorrect.

3. **Locate and correct the bug**

   In `stats.py` the function currently computes:

   ```python
   improvement = pre_score - post_score
   ```

   This produces negative values whenever scores increase.
   The correct direction of change is:

   ```python
   improvement = post_score - pre_score
   ```

4. **Verify the fix**

   Run the debugger again:

   * Step into `compute_improvements` and check a few participants.
   * Confirm that whenever `post_score > pre_score`, the improvement is now positive.
   * Let the program finish and review the updated summary statistics.
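
The manual check in step 4 could also be expressed as a small sanity check. This is only a sketch: the helper `sign_is_consistent` and the sample values are hypothetical and do not appear in `stats.py`.

```python
def sign_is_consistent(pre_score, post_score, improvement):
    """True when improvement has the same sign as the raw change post - pre."""
    change = post_score - pre_score
    return (change > 0) == (improvement > 0) and (change < 0) == (improvement < 0)


# Hypothetical values mimicking the buggy and the corrected computation.
print(sign_is_consistent(50, 60, 50 - 60))  # False (buggy direction)
print(sign_is_consistent(50, 60, 60 - 50))  # True  (corrected direction)
```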

### Exercise 6: Git and GitHub

1. In our `python_course` repository, check out a new branch on your computer called `feature/<some_creative_name>`.
2. Create a new file called `git_exercise.py` which contains a small function that prints "Hello Git!" to the console.
3. Commit and push the changes to the remote repository.
4. On GitHub, create a pull request to merge your branch into the `main` branch (you should only create the pull request, but not actually merge the branches).
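
For step 2, `git_exercise.py` might look like the following minimal sketch. The function names `greeting` and `hello_git` are placeholders; any small function that prints the message satisfies the task.

```python
def greeting() -> str:
    """Return the message so it can be checked without capturing output."""
    return "Hello Git!"


def hello_git() -> None:
    """Print the greeting to the console."""
    print(greeting())


if __name__ == "__main__":
    hello_git()
```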


### Exercise 7: GitHub Copilot Installation

1. Sign up for the free student plan of GitHub Copilot, e.g., on [GitHub Education](https://github.com/education/students).
2. Install the GitHub Copilot extension in Visual Studio Code and the GitHub Copilot plugin in PyCharm.
3. Test the extension/plugin by writing a few lines of code in both editors.

### Exercise 8: Learn Git Branching (Optional)

For anyone who would like to dive deeper into the possibilities of Git, I recommend doing the tutorial at [https://learngitbranching.js.org/](https://learngitbranching.js.org/).\
It is a great option to learn Git interactively and in a fun way.


# Python Course - Tutorial 3 & 5 Sample Exam

---

## Part A (4 points)

**Answer the following questions with a single word or at most one sentence.**

### A.1
What is the output of the following code snippet?



In [None]:
from datetime import datetime, timedelta
today = datetime.today()
future_date = today + timedelta(days=100)
print(type(future_date))



### A.2
In Exercise 2 (Managing CSV Files), what function from the `csv` module is used to read CSV data and return each row as a dictionary with column headers as keys?

### A.3
In Exercise 3 (Navigating File Systems), what `os` module function is used to check whether a file or directory exists at a given path?

### A.4
In Exercise 4 (Debugging Sales Data), what was the primary bug that prevented the program from correctly identifying the day with the highest total sales?

---

## Part B (8 points)

**For each question, mark all correct answer options. All correct answers must be marked (and no incorrect ones) to receive 2 points. Each deviation costs 1 point (minimum 0 points per question).**

### B.1
Consider the following code from Exercise 1 (Working with Dates):



In [None]:
from datetime import datetime, timedelta

today = datetime.today()
end_of_year = datetime(2025, 12, 31)
days_until_end_of_year = (end_of_year - today).days



Which of the following statements are correct?

(a) The variable `days_until_end_of_year` will always be a positive integer

(b) The expression `(end_of_year - today)` returns a `timedelta` object

(c) If `today` is exactly midnight at the start of December 31, 2025, then `days_until_end_of_year` evaluates to `0`

(d) The `.days` attribute of a `timedelta` object returns a float value representing the fractional number of days

### B.2
In Exercise 2 (Managing CSV Files), consider the following code that reads and processes CSV data:



In [None]:
import csv

with open('data.csv', mode='r') as file:
    reader = csv.DictReader(file)
    incomes = [int(row["Income"]) for row in reader]
    average_income = sum(incomes) / len(incomes)



Which of the following statements are correct?

(a) `csv.DictReader` automatically converts string values to their appropriate data types (int, float, etc.)

(b) If the CSV file is empty (only headers), the code will raise a ZeroDivisionError

(c) The variable `reader` can be iterated over multiple times without reopening the file

(d) Each `row` returned by `csv.DictReader` is a dictionary where keys are column names from the header row

### B.3
Consider the file operations in Exercise 3 (Navigating File Systems). Which of the following statements about the `os` module functions are correct?



In [None]:
import os

os.mkdir('workspace')
file_path = os.path.join('workspace', 'sample.txt')
new_file_path = os.path.join('workspace', 'sample_updated.txt')
os.rename(file_path, new_file_path)
os.remove(new_file_path)
os.rmdir('workspace')



(a) `os.mkdir()` will raise an exception if the directory already exists

(b) `os.path.join()` correctly handles path separators across different operating systems (Windows, macOS, Linux)

(c) `os.rename()` can move a file to a different directory if the new path includes a different directory name

(d) `os.rmdir()` can delete a directory regardless of whether it contains files

### B.4
In Exercise 4 (Debugging Sales Data), the original buggy function compared individual `sales` amounts instead of total sales per day when finding the highest sales day. Consider the following corrected code:



In [None]:
for date, daily_total in sales_per_day.items():
    if daily_total > highest_sales:
        highest_sales = daily_total
        highest_day = date



Which of the following statements are correct?

(a) The original buggy version compared `sales > highest_sales` instead of `daily_total > highest_sales`

(b) The bug caused the function to return the date of the **transaction** with the highest single sale, not the **day** with the highest total sales

(c) The corrected version properly iterates through the `sales_per_day` dictionary to find the day with maximum accumulated sales

(d) Using `max(sales_per_day.items(), key=lambda item: item[1])` would be an equivalent Pythonic alternative to the corrected loop

---

---

## Answer Key & Explanations

### Part A Solutions

| Question | Answer | Explanation |
|----------|--------|-------------|
| **A.1** | `<class 'datetime.datetime'>` | The `timedelta` is added to a `datetime` object, returning another `datetime` object. The `type()` function confirms this as the `datetime.datetime` class. |
| **A.2** | `csv.DictReader` | Exercise 2 uses `csv.DictReader(file)` to read CSV files and return rows as dictionaries. The standard reader returns lists instead. |
| **A.3** | `os.path.exists()` | Exercise 3 explicitly uses: `if os.path.exists(file_path):` to verify file existence before operations. |
| **A.4** | The code compared individual `sales` amounts instead of daily total sales when finding the highest sales day | The buggy version checked `if sales > highest_sales` (single transaction) instead of `if daily_total > highest_sales` (accumulated daily total). |

---

### Part B Solutions

#### **B.1: Working with Dates - Correct Answers: (b), (c)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ❌ **NO** | If `today` is after December 31, 2025, then `days_until_end_of_year` would be negative. The statement assumes it's always positive, which is incorrect. |
| **(b)** | ✅ **YES** | Subtracting two `datetime` objects returns a `timedelta` object. This is the core datetime arithmetic operation in Python. |
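| **(c)** | ✅ **YES** | If `today` equals `datetime(2025, 12, 31)`, i.e. midnight at the start of December 31, 2025, the difference is `timedelta(0)` and `.days` is `0`. Because `datetime.today()` also carries the time of day, a run later that day yields a small negative difference whose `.days` is `-1`; use `date.today()` when only whole calendar days matter. |
| **(d)** | ❌ **NO** | The `.days` attribute returns an **integer**, not a float. For a fractional number of days, divide `.total_seconds()` (which returns a float) by `86400`. |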

**Points:** (b), (c) = **2 points**
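
One subtlety behind option (c) is how `timedelta` normalizes negative differences. The sketch below uses fixed timestamps (chosen here for illustration) instead of `datetime.today()` so that the result is reproducible.

```python
from datetime import datetime

end_of_year = datetime(2025, 12, 31)           # midnight at the start of Dec 31
same_day_morning = datetime(2025, 12, 31, 10)  # 10:00 on the same day

delta = end_of_year - same_day_morning
print(delta.days)             # -1
print(delta.seconds)          # 50400
print(delta.total_seconds())  # -36000.0

# Comparing date objects counts whole calendar days instead.
whole_days = (end_of_year.date() - same_day_morning.date()).days
print(whole_days)             # 0
```

A difference of minus ten hours is stored as `-1` day plus `50400` seconds, which is why `.days` alone can be misleading for `datetime` values that carry a time of day.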

---

#### **B.2: Managing CSV Files - Correct Answers: (b), (d)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ❌ **NO** | `csv.DictReader` reads all values as **strings**. Type conversion must be done manually, as the code does with `int(row["Income"])`. |
| **(b)** | ✅ **YES** | If the CSV file contains only headers and no data rows, the list `incomes` will be empty. Then `sum(incomes) / len(incomes)` becomes `0 / 0`, raising **ZeroDivisionError**. |
| **(c)** | ❌ **NO** | `csv.DictReader` is a one-pass iterator over the underlying file object. Once the list comprehension has consumed it, a second loop over `reader` yields no rows; you need to reopen the file (or rewind it and create a new reader). Leaving the `with` block also closes the file automatically. |
| **(d)** | ✅ **YES** | This is the core feature of `csv.DictReader`: it reads the header row and uses those column names as dictionary keys for each subsequent data row. |

**Points:** (b), (d) = **2 points**

---

#### **B.3: File System Operations with `os` Module - Correct Answers: (a), (b), (c)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ✅ **YES** | `os.mkdir()` raises a `FileExistsError` if the directory already exists. This is documented behavior. To avoid this, use `os.makedirs(..., exist_ok=True)`. |
| **(b)** | ✅ **YES** | `os.path.join()` automatically uses the correct path separator for the operating system: `\` on Windows, `/` on macOS/Linux. This is its primary purpose. |
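| **(c)** | ✅ **YES** | `os.rename(old_path, new_path)` can move files across directories if the new path specifies a different directory. Example: `os.rename('workspace/file.txt', 'other_dir/file.txt')` moves the file, provided `other_dir` already exists. The call may fail if source and destination are on different filesystems; `shutil.move()` handles that case. |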
| **(d)** | ❌ **NO** | `os.rmdir()` only removes **empty** directories. If the directory contains files, it raises an `OSError`. To remove directories with contents, use `shutil.rmtree()`. |

**Points:** (a), (b), (c) = **2 points**
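
Options (a) and (d) can be checked with a short experiment. The sketch below runs inside a temporary directory created with `tempfile.mkdtemp()` so that it does not touch the real working directory; the directory and file names are illustrative only.

```python
import os
import shutil
import tempfile

base_dir = tempfile.mkdtemp()
workspace = os.path.join(base_dir, "workspace")
os.mkdir(workspace)
with open(os.path.join(workspace, "sample.txt"), "w") as file:
    file.write("not empty")

try:
    os.rmdir(workspace)  # directory still contains sample.txt
    rmdir_failed = False
except OSError:
    rmdir_failed = True

try:
    os.mkdir(workspace)  # directory already exists
    mkdir_failed = False
except FileExistsError:
    mkdir_failed = True

shutil.rmtree(base_dir)  # removes the directory tree including its contents
print(rmdir_failed, mkdir_failed, os.path.exists(base_dir))  # True True False
```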

---

#### **B.4: Debugging Sales Data - Correct Answers: (a), (b), (c), (d)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ✅ **YES** | The original code had: `if sales > highest_sales:` which compared individual transaction amounts. The corrected version uses `if daily_total > highest_sales:`. This is explicitly stated in Exercise 4's bug description. |
| **(b)** | ✅ **YES** | Because the buggy code compared `sales` (single transaction) instead of `daily_total`, it identified the **date of the highest single transaction**, not the **day with the highest accumulated sales**. Exercise 4 specifically identifies this as Bug 2. |
| **(c)** | ✅ **YES** | The corrected code properly iterates `for date, daily_total in sales_per_day.items()` and compares accumulated daily totals. This correctly solves the problem. |
| **(d)** | ✅ **YES** | Using `max(sales_per_day.items(), key=lambda item: item[1])` returns a tuple `(date, daily_total)` where `daily_total` is maximum. For a non-empty dictionary with at least one positive total, this gives the same result as the loop and is more Pythonic. Example: `highest_day, highest_sales = max(sales_per_day.items(), key=lambda item: item[1])`. On an empty dictionary, `max()` raises `ValueError` unless a `default` is supplied. |

**Points:** (a), (b), (c), (d) = **2 points**
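
The `max()` alternative from option (d) can be sketched as follows. The dictionary contents are hypothetical daily totals, and the `default` value used for the empty case is an assumption chosen to mirror the loop's initial values.

```python
# Hypothetical daily totals keyed by date string.
sales_per_day = {"2025-01-02": 850.0, "2025-01-03": 500.0}

highest_day, highest_sales = max(sales_per_day.items(), key=lambda item: item[1])
print(highest_day, highest_sales)  # 2025-01-02 850.0

# Without default=, max() raises ValueError on an empty iterable.
empty_day, empty_sales = max({}.items(), key=lambda item: item[1], default=("", 0.0))
print(repr(empty_day), empty_sales)  # '' 0.0
```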

---

## Summary Table

| Part | Question | Correct Answers | Points |
|------|----------|-----------------|--------|
| **A** | A.1 | `<class 'datetime.datetime'>` | 1 |
| **A** | A.2 | `csv.DictReader` | 1 |
| **A** | A.3 | `os.path.exists()` | 1 |
| **A** | A.4 | Compared individual sales instead of daily totals | 1 |
| **B** | B.1 | (b), (c) | 2 |
| **B** | B.2 | (b), (d) | 2 |
| **B** | B.3 | (a), (b), (c) | 2 |
| **B** | B.4 | (a), (b), (c), (d) | 2 |
| | **Total** | | **12 points** |

---

## Detailed Concept Notes

### Key Concepts Tested

**Tutorial 5 Topics:**

1. **Working with Dates** (Exercise 1)
   - `datetime` module: creating date objects
   - `timedelta` for date arithmetic
   - Difference between `datetime` and `timedelta` objects
   - `.days` attribute (integer) vs `.total_seconds()` (float)

2. **CSV File I/O** (Exercise 2)
   - `csv.writer()` vs `csv.DictReader()`
   - No automatic type conversion: all values are read as strings and must be converted manually
   - Iterator exhaustion and file closure
   - Dictionary-based row access

3. **File System Operations** (Exercise 3)
   - `os.getcwd()`, `os.mkdir()`, `os.path.exists()`
   - `os.path.join()` for cross-platform path handling
   - `os.rename()` for moving/renaming files
   - `os.remove()` and `os.rmdir()` limitations
   - Context managers (`with` statement) for file handling

4. **Debugging Techniques** (Exercise 4 & 5)
   - Using debuggers to identify logic errors
   - Setting breakpoints and stepping through code
   - Comparing individual values vs. aggregated values
   - Exception handling in file operations

---

### Common Pitfalls & Student Mistakes

| Topic | Common Mistake | Correct Approach |
|-------|----------------|------------------|
| **Date Arithmetic** | Assuming date subtraction returns an integer | Date subtraction returns `timedelta`; use `.days` for integer days |
| **CSV Reading** | Expecting automatic type conversion | All CSV values are strings; convert manually with `int()`, `float()`, etc. |
| **CSV Iterator** | Attempting to iterate a `DictReader` multiple times | Must reopen file or create a new reader; iterators are consumed |
| **Path Handling** | Using hardcoded `/` or `\` in paths | Use `os.path.join()` for cross-platform compatibility |
| **Directory Deletion** | Using `os.rmdir()` on non-empty directories | Use `shutil.rmtree()` or delete files first |
| **Debugging Logic** | Confusing individual values with aggregated values | Use breakpoints to inspect intermediate values during iteration |
| **Type Checking** | Not validating data types from external files | Always validate and convert external data before use |
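
The "CSV Iterator" pitfall from the table can be reproduced in a few lines. This sketch uses `io.StringIO` with made-up rows in place of a real file; with an actual file the behaviour is the same.

```python
import csv
import io

source = io.StringIO("Name,Income\nAlice,50000\nBob,60000\n")
reader = csv.DictReader(source)

first_pass = [row["Name"] for row in reader]
second_pass = [row["Name"] for row in reader]  # reader is already exhausted
print(first_pass, second_pass)  # ['Alice', 'Bob'] []

# Rewind the source and build a new reader, then keep the rows in a list
# so that they can be traversed as often as needed.
source.seek(0)
rows = list(csv.DictReader(source))
print(len(rows))  # 2
```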

---

### Exam Strategy Tips

1. **Part A Questions:**
   - Pay close attention to **return types** (e.g., `datetime` vs `timedelta` vs `int`)
   - Know **module function names** (`csv.DictReader`, `os.path.exists`, etc.)
   - Understand **edge cases** (empty files, directory existence, negative time differences)

2. **Part B Questions:**
   - **Read carefully**: "which are correct" means potentially multiple answers
   - **Test edge cases**: empty data, boundary conditions, type mismatches
   - **Know module limitations**: `os.rmdir()` only works on empty directories, `csv.DictReader` returns strings, etc.
   - **Recognize Pythonic alternatives**: `max()` with `key` parameter, list comprehensions, etc.

# Python Course - Tutorial 5 Sample Exam

---

## Part A (4 points)

**Answer the following questions with a single word or at most one sentence.**

### A.1
What is the output of the following code snippet?



In [None]:
from datetime import datetime, timedelta
today = datetime.today()
future_date = today + timedelta(days=100)
print(type(future_date))



### A.2
In Exercise 2 (Managing CSV Files), what function from the `csv` module is used to read CSV data and return each row as a dictionary with column headers as keys?

### A.3
In Exercise 3 (Navigating File Systems), what `os` module function is used to check whether a file or directory exists at a given path?

### A.4
In Exercise 4 (Debugging Sales Data), what was the primary bug that prevented the program from correctly identifying the day with the highest total sales?

---

## Part B (8 points)

**For each question, mark all correct answer options. All correct answers must be marked (and no incorrect ones) to receive 2 points. Each deviation costs 1 point (minimum 0 points per question).**

### B.1
Consider the following code from Exercise 1 (Working with Dates):



In [None]:
from datetime import datetime, timedelta

today = datetime.today()
end_of_year = datetime(2025, 12, 31)
days_until_end_of_year = (end_of_year - today).days



Which of the following statements are correct?

(a) The variable `days_until_end_of_year` will always be a positive integer

(b) The expression `(end_of_year - today)` returns a `timedelta` object

(c) If today is December 31, 2025, then `days_until_end_of_year` evaluates to `0`

(d) The `.days` attribute of a `timedelta` object returns a float value representing the fractional number of days

### B.2
In Exercise 2 (Managing CSV Files), consider the following code that reads and processes CSV data:



In [None]:
import csv

with open('data.csv', mode='r') as file:
    reader = csv.DictReader(file)
    incomes = [int(row["Income"]) for row in reader]
    average_income = sum(incomes) / len(incomes)



Which of the following statements are correct?

(a) `csv.DictReader` automatically converts string values to their appropriate data types (int, float, etc.)

(b) If the CSV file is empty (only headers), the code will raise a ZeroDivisionError

(c) The variable `reader` can be iterated over multiple times without reopening the file

(d) Each `row` returned by `csv.DictReader` is a dictionary where keys are column names from the header row

### B.3
Consider the file operations in Exercise 3 (Navigating File Systems). Which of the following statements about the `os` module functions are correct?



In [None]:
import os

os.mkdir('workspace')
file_path = os.path.join('workspace', 'sample.txt')
os.rename(file_path, new_file_path)
os.remove(new_file_path)
os.rmdir('workspace')



(a) `os.mkdir()` will raise an exception if the directory already exists

(b) `os.path.join()` correctly handles path separators across different operating systems (Windows, macOS, Linux)

(c) `os.rename()` can move a file to a different directory if the new path includes a different directory name

(d) `os.rmdir()` can delete a directory regardless of whether it contains files

### B.4
In Exercise 4 (Debugging Sales Data), the original buggy function compared individual `sales` amounts instead of total sales per day when finding the highest sales day. Consider the following corrected code:



In [None]:
for date, daily_total in sales_per_day.items():
    if daily_total > highest_sales:
        highest_sales = daily_total
        highest_day = date



Which of the following statements are correct?

(a) The original buggy version compared `sales > highest_sales` instead of `daily_total > highest_sales`

(b) The bug caused the function to return the date of the **transaction** with the highest single sale, not the **day** with the highest total sales

(c) The corrected version properly iterates through the `sales_per_day` dictionary to find the day with maximum accumulated sales

(d) Using `max(sales_per_day.items(), key=lambda item: item[1])` would be an equivalent Pythonic alternative to the corrected loop

---

---

## Answer Key & Explanations

### Part A Solutions

| Question | Answer | Explanation |
|----------|--------|-------------|
| **A.1** | `<class 'datetime.datetime'>` | The `timedelta` is added to a `datetime` object, returning another `datetime` object. The `type()` function confirms this as the `datetime.datetime` class. |
| **A.2** | `csv.DictReader` | Exercise 2 uses `csv.DictReader(file)` to read CSV files and return rows as dictionaries. The standard reader returns lists instead. |
| **A.3** | `os.path.exists()` | Exercise 3 explicitly uses: `if os.path.exists(file_path):` to verify file existence before operations. |
| **A.4** | The code compared individual `sales` amounts instead of daily total sales when finding the highest sales day | The buggy version checked `if sales > highest_sales` (single transaction) instead of `if daily_total > highest_sales` (accumulated daily total). |

---

### Part B Solutions

#### **B.1: Working with Dates - Correct Answers: (b), (c)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ❌ **NO** | If `today` is after December 31, 2025, then `days_until_end_of_year` would be negative. The statement assumes it's always positive, which is incorrect. |
| **(b)** | ✅ **YES** | Subtracting two `datetime` objects returns a `timedelta` object. This is the core datetime arithmetic operation in Python. |
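| **(c)** | ✅ **YES** | If `today` and `end_of_year` both represent December 31, 2025, the intended result is `(end_of_year - today).days == 0`. One caveat: `datetime.today()` also carries the current time, so later on that day the difference is slightly negative and `.days` evaluates to `-1`; subtracting `today.date()` from `end_of_year.date()` gives exactly zero days. |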
| **(d)** | ❌ **NO** | The `.days` attribute returns an **integer**, not a float. For fractional seconds/microseconds, you would use `.total_seconds()` which returns a float. |

**Points:** (b), (c) = **2 points**
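
The points in this table can be checked with a short sketch. The dates below are fixed values chosen for illustration (they replace `datetime.today()` so the output is reproducible), and the variable names `start`, `late`, and `same_day` are assumptions, not part of the exercise.

```python
from datetime import datetime

# Fixed example dates (assumed for illustration) so the results are reproducible
start = datetime(2025, 12, 1, 18, 0)   # December 1, 2025 at 18:00
end_of_year = datetime(2025, 12, 31)   # December 31, 2025 at midnight

difference = end_of_year - start
print(type(difference))            # <class 'datetime.timedelta'>
print(difference.days)             # 29 (whole days only, always an int)
print(difference.total_seconds())  # 2527200.0 (a float)

# When the starting point is later than the target, the day count is negative
late = datetime(2026, 1, 2)
print((end_of_year - late).days)   # -2

# On the target date itself, the time of day matters for datetime objects
same_day = datetime(2025, 12, 31, 10, 0)
print((end_of_year - same_day).days)                # -1
print((end_of_year.date() - same_day.date()).days)  # 0
```

Comparing `.date()` values, as in the last line, is one way to count whole calendar days regardless of the time of day.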

---

#### **B.2: Managing CSV Files - Correct Answers: (b), (d)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ❌ **NO** | `csv.DictReader` reads all values as **strings** by default. Type conversion must be done manually, as the exercise code does with `int(row["Income"])`. |
| **(b)** | ✅ **YES** | If the CSV file contains only headers and no data rows, the list `incomes` will be empty. Then `sum(incomes) / len(incomes)` becomes `0 / 0`, raising **ZeroDivisionError**. |
| **(c)** | ❌ **NO** | A `csv.DictReader` is an iterator over the underlying file, so it is exhausted after one full pass. To read the rows again, reopen the file (or rewind it with `file.seek(0)`) and create a new reader, or store the rows in a list on the first pass. Leaving the `with` block also closes the file automatically. |
| **(d)** | ✅ **YES** | This is the core feature of `csv.DictReader`: it reads the header row and uses those column names as dictionary keys for each subsequent data row. |

**Points:** (b), (d) = **2 points**
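
The behaviors analyzed above can be reproduced with a minimal sketch. It uses `io.StringIO` as a stand-in for a file on disk, and the sample rows and the `Name` column are made up for illustration; only the `Income` column comes from the exercise.

```python
import csv
import io

# In-memory CSV text stands in for a file on disk (hypothetical sample data)
csv_text = "Name,Income\nAlice,50000\nBob,62000\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)                  # consume the reader once and keep the rows
print(repr(rows[0]["Income"]))       # '50000' (a string, not an int)
print(list(reader))                  # [] because the reader is already exhausted

incomes = [int(row["Income"]) for row in rows]  # manual type conversion

# Guard against a header-only file, where incomes would be empty
average_income = sum(incomes) / len(incomes) if incomes else None
print(average_income)                # 56000.0

# A file with only a header row yields no data rows at all
empty_rows = list(csv.DictReader(io.StringIO("Name,Income\n")))
print(empty_rows)                    # []
```

Storing the rows in a list on the first pass is one simple way around iterator exhaustion, at the cost of holding the whole file in memory.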

---

#### **B.3: File System Operations with `os` Module - Correct Answers: (a), (b), (c)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ✅ **YES** | `os.mkdir()` raises a `FileExistsError` if the directory already exists. This is documented behavior. To avoid this, use `os.makedirs(..., exist_ok=True)`. |
| **(b)** | ✅ **YES** | `os.path.join()` automatically uses the correct path separator for the operating system: `\` on Windows, `/` on macOS/Linux. This is its primary purpose. |
| **(c)** | ✅ **YES** | `os.rename(old_path, new_path)` can move files across directories if the new path specifies a different directory. Example: `os.rename('workspace/file.txt', 'other_dir/file.txt')` moves the file. |
| **(d)** | ❌ **NO** | `os.rmdir()` only removes **empty** directories. If the directory contains files, it raises an `OSError`. To remove directories with contents, use `shutil.rmtree()`. |

**Points:** (a), (b), (c) = **2 points**
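
A short sketch of these four behaviors follows. It runs inside a temporary directory created with `tempfile.mkdtemp()` so that no real files are touched; the names `workspace`, `other_dir`, and `notes.txt` are placeholders chosen for this example.

```python
import os
import shutil
import tempfile

# Work inside a throwaway temporary directory so nothing real is touched
base = tempfile.mkdtemp()
workspace = os.path.join(base, "workspace")   # hypothetical directory name
os.mkdir(workspace)

try:
    os.mkdir(workspace)                # second call: directory already exists
except FileExistsError:
    print("mkdir refused: directory already exists")

os.makedirs(workspace, exist_ok=True)  # no error when it already exists

file_path = os.path.join(workspace, "notes.txt")
with open(file_path, "w") as f:
    f.write("hello")

try:
    os.rmdir(workspace)                # fails: directory is not empty
except OSError:
    print("rmdir refused: directory is not empty")

# os.rename moves the file when the new path points to another directory
other_dir = os.path.join(base, "other_dir")
os.mkdir(other_dir)
os.rename(file_path, os.path.join(other_dir, "notes.txt"))
moved = os.path.exists(os.path.join(other_dir, "notes.txt"))
print(moved)                           # True

os.rmdir(workspace)                    # succeeds now that it is empty
shutil.rmtree(base)                    # removes a directory tree with contents
```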

---

#### **B.4: Debugging Sales Data - Correct Answers: (a), (b), (c), (d)**

**Analysis:**

| Option | Correct? | Reasoning |
|--------|----------|-----------|
| **(a)** | ✅ **YES** | The original code had: `if sales > highest_sales:` which compared individual transaction amounts. The corrected version uses `if daily_total > highest_sales:`. This is explicitly stated in Exercise 4's bug description. |
| **(b)** | ✅ **YES** | Because the buggy code compared `sales` (single transaction) instead of `daily_total`, it identified the **date of the highest single transaction**, not the **day with the highest accumulated sales**. Exercise 4 specifically identifies this as Bug 2. |
| **(c)** | ✅ **YES** | The corrected code properly iterates `for date, daily_total in sales_per_day.items()` and compares accumulated daily totals. This correctly solves the problem. |
| **(d)** | ✅ **YES** | Using `max(sales_per_day.items(), key=lambda item: item[1])` returns a tuple `(date, daily_total)` whose `daily_total` is the maximum. For non-empty data this is functionally equivalent to the loop and more concise; note that `max()` raises `ValueError` on an empty dictionary. Example: `highest_day, highest_sales = max(sales_per_day.items(), key=lambda item: item[1])`. |

**Points:** (a), (b), (c), (d) = **2 points**
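
To see that the loop and the `max()` call agree, here is a small side-by-side sketch. The dictionary contents and the starting values `highest_sales = 0` and `highest_day = None` are assumptions made for this example; the exercise's own data and initialization may differ.

```python
# Hypothetical daily totals, standing in for the exercise's sales_per_day
sales_per_day = {"2024-01-01": 250.0, "2024-01-02": 410.5, "2024-01-03": 180.0}

# Explicit loop, as in the corrected code (initial values are assumed)
highest_sales = 0
highest_day = None
for date, daily_total in sales_per_day.items():
    if daily_total > highest_sales:
        highest_sales = daily_total
        highest_day = date

# max() with a key function that picks the total out of each (date, total) pair
max_day, max_sales = max(sales_per_day.items(), key=lambda item: item[1])

print(highest_day, highest_sales)  # 2024-01-02 410.5
print(max_day, max_sales)          # 2024-01-02 410.5
```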

---

## Summary Table

| Part | Question | Correct Answers | Points |
|------|----------|-----------------|--------|
| **A** | A.1 | `<class 'datetime.datetime'>` | 1 |
| **A** | A.2 | `csv.DictReader` | 1 |
| **A** | A.3 | `os.path.exists()` | 1 |
| **A** | A.4 | Compared individual sales instead of daily totals | 1 |
| **B** | B.1 | (b), (c) | 2 |
| **B** | B.2 | (b), (d) | 2 |
| **B** | B.3 | (a), (b), (c) | 2 |
| **B** | B.4 | (a), (b), (c), (d) | 2 |
| | **Total** | | **12 points** |

---

## Detailed Concept Notes

### Key Concepts Tested

**Tutorial 5 Topics:**

1. **Working with Dates** (Exercise 1)
   - `datetime` module: creating date objects with `datetime.today()` and explicit dates
   - `timedelta` for date arithmetic: adding/subtracting days
   - Difference between `datetime` and `timedelta` objects
   - `.days` attribute returns **integer**, not float
   - `.total_seconds()` method for fractional time values
   - String formatting with `strftime()` for date display

2. **CSV File I/O** (Exercise 2)
   - `csv.writer()` for writing CSV files with `writerow()` and `writerows()`
   - `csv.DictReader()` for reading CSV as dictionaries
   - **All values are strings** by default - manual type conversion required
   - Iterator exhaustion: cannot iterate `DictReader` twice without reopening
   - File context managers (`with` statement) for automatic file closure
   - Dictionary-based row access using column names as keys

3. **File System Operations with `os` Module** (Exercise 3)
   - `os.getcwd()`: get current working directory
   - `os.mkdir()`: create single directory (raises `FileExistsError` if exists)
   - `os.path.join()`: cross-platform path construction
   - `os.path.exists()`: check file/directory existence
   - File I/O with context managers
   - `os.rename()`: rename or move files/directories across directories
   - `os.remove()`: delete files
   - `os.rmdir()`: delete **empty** directories only
   - Path construction best practices for portability

4. **Debugging Techniques** (Exercise 4)
   - Using debuggers to identify **logic errors** (not syntax errors)
   - Setting breakpoints at critical decision points
   - Stepping through loop iterations
   - Conditional breakpoints: e.g., pause only when `post_score > pre_score and improvement < 0`
   - Inspecting variable values during execution
   - **Individual vs. Aggregate Comparisons**: comparing single values vs. accumulated totals
   - Exception handling: try-except blocks for file parsing
   - Parsing datetime strings with `strptime()` and extracting date component
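
The last three debugging points (aggregate comparisons, exception handling, and `strptime()` parsing) can be combined in one small sketch. The rows and the timestamp format `%Y-%m-%d %H:%M:%S` are assumptions made for this example and may not match the exercise's actual data file.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, amount) rows; the last one is deliberately malformed
raw_rows = [
    ("2024-03-01 09:15:00", "120.50"),
    ("2024-03-01 16:40:00", "80.00"),
    ("2024-03-02 11:05:00", "150.00"),
    ("not a date", "10.00"),
]

sales_per_day = defaultdict(float)
for timestamp, amount in raw_rows:
    try:
        # Parse the string, then keep only the date component
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        sales_per_day[day] += float(amount)
    except ValueError:
        print(f"Skipping malformed row: {timestamp!r}, {amount!r}")

for day, daily_total in sales_per_day.items():
    print(day, daily_total)
# 2024-03-01 200.5
# 2024-03-02 150.0
```

In this sample the largest single transaction (150.00) falls on March 2, while the largest daily total (200.5) belongs to March 1, which is exactly the individual-versus-aggregate distinction the exercise targets.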

---

### Common Pitfalls & Student Mistakes

| Topic | Common Mistake | Correct Approach |
|-------|----------------|------------------|
| **Date Arithmetic** | Assuming `(date1 - date2)` returns an integer | Date subtraction returns `timedelta`; use `.days` for integer days |
| **CSV Type Conversion** | Expecting automatic type conversion from CSV | All CSV values are strings; convert manually with `int()`, `float()`, etc. |
| **CSV Iterator** | Attempting to iterate a `DictReader` multiple times | Reopen the file (or rewind it with `file.seek(0)`) and create a new reader, or store the rows in a list; the reader is consumed after the first pass |
| **Empty CSV Files** | Not handling division by zero with empty data | Check `len(incomes) > 0` before division or handle `ZeroDivisionError` |
| **Path Handling** | Using hardcoded `/` or `\` in file paths | Use `os.path.join()` for automatic cross-platform path separators |
| **Directory Operations** | Using `os.rmdir()` on directories with files | Only works on empty directories; use `shutil.rmtree()` for non-empty dirs |
| **File Closure** | Forgetting to close files after reading | Always use `with` context manager for automatic file closure |
| **Debugging Logic Errors** | Comparing single values instead of aggregates | Debug by inspecting intermediate values; watch for off-by-one or aggregation errors |
| **Time Zones** | Assuming `datetime.today()` or `datetime.now()` is timezone-aware | Both return a naive datetime (no timezone) by default; pass a timezone, e.g. `datetime.now(timezone.utc)`, to get a UTC-aware value |
| **Datetime Formatting** | Mixing up `strptime()` (parse) vs `strftime()` (format) | `strptime()` creates datetime from string; `strftime()` converts datetime to string |
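
Two of the datetime pitfalls are easy to demonstrate in a few lines. This is a minimal sketch: the sample timestamp string is made up, and the weekday and month names printed by `%A` and `%B` assume the default English locale.

```python
from datetime import datetime, timezone

naive = datetime.today()               # no time zone attached
aware = datetime.now(timezone.utc)     # time zone aware, in UTC
print(naive.tzinfo)                    # None
print(aware.tzinfo)                    # UTC

# strptime parses a string into a datetime; strftime formats one as a string
parsed = datetime.strptime("2025-12-31 14:30:00", "%Y-%m-%d %H:%M:%S")
print(parsed.date())                    # 2025-12-31
print(parsed.strftime("%A, %d %B %Y"))  # Wednesday, 31 December 2025
```

Mixing naive and aware values in a subtraction raises `TypeError`, so it helps to pick one style and use it consistently.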

---

### Exam Strategy Tips

1. **Part A Questions:**
   - Pay close attention to **return types**: `datetime` vs `timedelta` vs `int`
   - Know **module function names** exactly: `csv.DictReader`, `os.path.exists()`, `os.path.join()`
   - Understand **edge cases**: empty files, directories that already exist, negative day differences, type conversions
   - Be precise about **integer vs. float** attributes (`.days` is int, `.total_seconds()` is float)

2. **Part B Questions:**
   - **Read carefully**: "which are correct" means potentially multiple answers
   - **Test edge cases**: empty CSV files (`ZeroDivisionError`), non-existent directories, iterator exhaustion
   - **Know module limitations**: 
     - `csv.DictReader` returns strings only
     - `os.rmdir()` only works on empty directories
     - Iterators are consumed after first iteration
   - **Recognize Pythonic alternatives**: `max()` with `key` parameter vs. explicit loops
   - **Datetime subtraction**: subtracting one `datetime` from another always produces a `timedelta`; use `.days` for an integer result
   - **Cross-platform compatibility**: `os.path.join()` is essential, not hardcoded separators

---

### Sample Debugging Workflow (Exercise 4 Reference)

**Problem:** Summary statistics show negative improvement indices despite positive score changes.

**Debugging Steps:**
1. Set a breakpoint before the `compute_improvements()` call
2. Step into the function and inspect the `pre_score` and `post_score` values
3. Check whether `post_score > pre_score` while `improvement < 0`
4. Trace the improvement calculation: `improvement = post_score - pre_score` is correct, whereas `improvement = pre_score - post_score` is the bug
5. Fix the formula and re-run to verify

**Key Insight:** Logic errors (wrong formula) won't raise exceptions; they silently produce incorrect results. Debuggers reveal the truth by letting you observe actual values.
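
The check in step 3 can also be written as plain code, which is handy when no debugger is available. This is a hedged sketch: the signature of `compute_improvements()`, its `buggy` flag, the helper `find_inconsistent()`, and the sample scores are all assumptions introduced for illustration, not the exercise's actual implementation.

```python
def compute_improvements(records, buggy=False):
    """Return one improvement per (pre, post) pair; buggy=True swaps the operands."""
    improvements = []
    for pre_score, post_score in records:
        if buggy:
            improvement = pre_score - post_score   # swapped operands (the bug)
        else:
            improvement = post_score - pre_score   # intended formula
        improvements.append(improvement)
    return improvements


def find_inconsistent(records, improvements):
    """Return rows where the score rose but the improvement is negative."""
    return [
        (pre, post, imp)
        for (pre, post), imp in zip(records, improvements)
        if post > pre and imp < 0
    ]


records = [(60, 75), (80, 78), (55, 70)]   # hypothetical (pre, post) pairs

print(find_inconsistent(records, compute_improvements(records, buggy=True)))
# [(60, 75, -15), (55, 70, -15)]
print(find_inconsistent(records, compute_improvements(records)))
# []
```

The same condition, `post > pre and imp < 0`, is what a conditional breakpoint would use to pause only on the suspicious rows.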