# Python Course - Tutorial 5

### Exercise 1: Working with Dates
Perform operations on dates using Python's [datetime](https://docs.python.org/3/library/datetime.html) module.  
This exercise will help you learn how to manipulate dates and perform date arithmetic, which is useful for analyzing timelines and planning.

1. Import the `datetime` module and create an object representing today's date. This will introduce you to creating and using date objects.
2. Calculate the date 100 days from today using a [timedelta](https://docs.python.org/3/library/datetime.html#datetime.timedelta) of 100 days.
3. Calculate the number of days between today and December 31, 2025. This will give you practice in determining the difference between two dates.
4. Determine the day of the week for your next birthday. This will help you practice extracting specific information from a date object. You can create the date with the `datetime` class and format it with [strftime](https://docs.python.org/3/library/datetime.html#datetime.datetime.strftime), for example using the `%A` directive for the full weekday name.


In [None]:
from datetime import datetime, timedelta

# (i) Create an object representing today's date
today = datetime.today()
print(f"Today's Date: {today}")

# Optionally, you can format the date using strftime
print(f"Today's Date: {today.strftime('%Y-%m-%d')}")

# (ii) Calculate the date 100 days from today
future_date = today + timedelta(days=100)
print(f"Date 100 Days from Today: {future_date.strftime('%Y-%m-%d')}")

# (iii) Calculate the number of days between today and December 31, 2025
end_of_year = datetime(2025, 12, 31)
days_until_end_of_year = (end_of_year - today).days
print(f"Days until December 31, 2025: {days_until_end_of_year}")

# (iv) Determine the day of the week for your next birthday
next_birthday = datetime(2026, 5, 15)  # Example: May 15, 2026
day_of_week = next_birthday.strftime("%A")
print(f"Day of the Week for Next Birthday: {day_of_week}")
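
The solution for part (iv) hard-codes the year of the next birthday. As a possible variation, the sketch below derives the next occurrence from a month and day. The helper name `next_birthday` and its `today` parameter are illustrative assumptions, not part of the exercise, and a February 29 birthday is not handled.

```python
from datetime import date


def next_birthday(birth_month, birth_day, today=None):
    """Return the next occurrence of a birthday on or after `today`.

    Hypothetical helper for illustration; Feb 29 birthdays are not handled.
    """
    if today is None:
        today = date.today()
    # Try this year's birthday first
    candidate = date(today.year, birth_month, birth_day)
    # If it has already passed, move on to next year
    if candidate < today:
        candidate = date(today.year + 1, birth_month, birth_day)
    return candidate


# Use a fixed reference date so the example is reproducible
birthday = next_birthday(5, 15, today=date(2025, 6, 1))
print(f"Next birthday: {birthday} ({birthday.strftime('%A')})")
```

Passing `today` explicitly keeps the function easy to test; calling `next_birthday(5, 15)` without it uses the current date.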

### Exercise 2: Managing CSV Files
Use Python's `csv` module to work with CSV data.  
This exercise will introduce you to reading from and writing to CSV files, which is a common format for storing and sharing data.

1. Use the [csv.writer](https://docs.python.org/3/library/csv.html#csv.writer) function to create a CSV file named `data.csv` with columns `Name`, `Age`, and `Income`.  
   This will help you understand how to create and structure CSV files.
2. Write data for 5 individuals into the CSV file. This step will show you how to add data to a CSV file.
3. Read the CSV file using the [csv.reader](https://docs.python.org/3/library/csv.html#csv.reader) function and calculate the average income.

In [None]:
import csv

# Task 1: Create a CSV file named 'data.csv' with columns Name, Age, and Income
with open('data.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "Income"])

    # Task 2: Write data for 5 individuals
    writer.writerows([
        ["Alice", 28, 50000],
        ["Bob", 34, 60000],
        ["Charlie", 25, 45000],
        ["David", 40, 70000],
        ["Eve", 30, 52000],
    ])

# Task 3: Read the CSV file and calculate the average income
with open('data.csv', mode='r', newline='') as file:
    reader = csv.DictReader(file)
    incomes = [int(row["Income"]) for row in reader]
    average_income = sum(incomes) / len(incomes)
    print("Average Income:", average_income)
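
Task 3 names `csv.reader`, while the solution above uses `csv.DictReader`. Both work. A minimal sketch with plain `csv.reader` could look as follows, assuming the income is in the third column. The function name `average_income` and the temporary demo file are assumptions made for illustration.

```python
import csv
import os
import tempfile


def average_income(path):
    """Average the third column (Income) of a CSV file that has a header row."""
    with open(path, mode="r", newline="") as file:
        reader = csv.reader(file)
        next(reader)  # Skip the header row
        # Each row is a list of strings; index 2 holds the income
        incomes = [int(row[2]) for row in reader if row]
    return sum(incomes) / len(incomes)


# Demo on a temporary copy of the exercise data, so no existing file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "data.csv")
    with open(demo_path, mode="w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Name", "Age", "Income"])
        writer.writerows([
            ["Alice", 28, 50000],
            ["Bob", 34, 60000],
            ["Charlie", 25, 45000],
            ["David", 40, 70000],
            ["Eve", 30, 52000],
        ])
    result = average_income(demo_path)

print("Average Income:", result)
```

With `csv.reader` the columns are addressed by position, so the code breaks silently if the column order changes; `csv.DictReader` avoids that by using the header names.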

### Exercise 3: Navigating File Systems with the OS Module

In this exercise, you will utilize Python's [os](https://docs.python.org/3/library/os.html) module to perform basic file system operations. You'll practice creating directories, handling files, and manipulating file paths.

#### Tasks:

1. Import the `os` module. Use `os.getcwd()` to print the current working directory.
2. Create a new directory named `workspace` using `os.mkdir()`.
3. Within the `workspace` directory, create a text file called `sample.txt` and write "Exploring OS module in Python." into it.
4. Verify if `sample.txt` exists in the `workspace` directory using `os.path.exists()` and print a confirmation message.
5. Open `sample.txt`, read its contents, and display them on the console.
6. Rename `sample.txt` to `sample_updated.txt` using `os.rename()`.
7. Delete `sample_updated.txt` using `os.remove()`, and then remove the `workspace` directory using `os.rmdir()`.


In [None]:
import os

# Task 1: Print the current working directory
current_dir = os.getcwd()
print(f"Current working directory: {current_dir}")

# Task 2: Create a new directory named 'workspace'
os.mkdir('workspace')

# Task 3: Create 'sample.txt' and write content to it
file_path = os.path.join('workspace', 'sample.txt')
with open(file_path, 'w') as file:
    file.write("Exploring OS module in Python.")

# Task 4: Check if 'sample.txt' exists
if os.path.exists(file_path):
    print("'sample.txt' exists in the 'workspace' directory.")

# Task 5: Read and display the contents of 'sample.txt'
with open(file_path, 'r') as file:
    content = file.read()
    print(f"Content of 'sample.txt': {content}")

# Task 6: Rename 'sample.txt' to 'sample_updated.txt'
new_file_path = os.path.join('workspace', 'sample_updated.txt')
os.rename(file_path, new_file_path)

# Task 7: Delete 'sample_updated.txt' and remove 'workspace' directory
os.remove(new_file_path)
os.rmdir('workspace')
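
The solution above raises `FileExistsError` on a second run if `workspace` was left behind by an interrupted earlier run. One way to make the steps re-runnable is sketched below. The function name `run_workspace_demo` and the use of a temporary base directory are assumptions for illustration; `os.makedirs(..., exist_ok=True)` and `os.replace()` are swapped in for `os.mkdir()` and `os.rename()`.

```python
import os
import tempfile


def run_workspace_demo(base_dir):
    """Create, rename, read, and clean up a file inside base_dir/workspace."""
    workspace = os.path.join(base_dir, "workspace")
    # Unlike os.mkdir, this does not fail if the directory already exists
    os.makedirs(workspace, exist_ok=True)

    file_path = os.path.join(workspace, "sample.txt")
    with open(file_path, "w") as file:
        file.write("Exploring OS module in Python.")

    new_file_path = os.path.join(workspace, "sample_updated.txt")
    # os.replace overwrites an existing target on every platform
    os.replace(file_path, new_file_path)

    with open(new_file_path, "r") as file:
        content = file.read()

    # Clean up: the directory must be empty before os.rmdir succeeds
    os.remove(new_file_path)
    os.rmdir(workspace)
    return content, os.path.exists(workspace)


with tempfile.TemporaryDirectory() as tmp_dir:
    content, still_exists = run_workspace_demo(tmp_dir)

print(f"Content: {content}")
print(f"Workspace still exists: {still_exists}")
```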

### Exercise 4: Debugging a Program

The following Python program is meant to read sales data from the text file `sales_data.txt` (in the `tutorials/data` directory) and compute the total sales, the average sales per day, and the day with the highest total sales. However, it contains several bugs that keep it from producing correct results, particularly in how it handles datetime values when grouping sales by date. Your task is to identify and fix these bugs using a debugger.

In [None]:
def process_sales_data(path):
    total_sales = 0.0
    sales_per_day = {}
    highest_sales = 0.0
    highest_day = ''
    sales_count = 0

    with open(path, 'r') as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            date, sales = line.split(',')
            sales = float(sales)
            total_sales += sales
            sales_count += 1
            if date not in sales_per_day:
                sales_per_day[date] = sales
            else:
                sales_per_day[date] += sales
            if sales > highest_sales:
                highest_sales = sales
                highest_day = date

    average_sales_per_day = total_sales / len(sales_per_day)
    return total_sales, average_sales_per_day, highest_day

# Sample usage
total, average_per_day, best_day = process_sales_data('data/sales_data.txt')
print(f"Total Sales: ${total}")
print(f"Average Daily Sales: ${average_per_day}")
print(f"Highest Sales Day: {best_day}")

Total Sales: $22385.000000000004
Average Daily Sales: $1119.2500000000002
Highest Sales Day: 2025-01-03 15:55:00


**Identifying and Fixing Bugs**

When tracking down logical errors, set breakpoints where key values change or computations occur. This lets you observe how the data evolves and pinpoint where it first goes wrong. Remember that the debugger only pauses execution when an exception is raised or when it reaches a breakpoint you have set. By stepping through the code, you can systematically trace the program's flow and identify discrepancies.

(i) Bugs in the Function:
- **Bug 1**: The variable `date` is taken directly from `line.split(',')`, so it still contains the time component (e.g., `2025-01-03 15:55:00`). Every distinct timestamp becomes its own dictionary key, which prevents correct grouping by date.
- **Bug 2**: When checking for the highest sales day, the code compares individual `sales` amounts instead of the total sales per day. This means it identifies the transaction with the highest amount, not the day with the highest total sales.
- **Bug 3**: The average sales per day is calculated as `total_sales / len(sales_per_day)`, but if the date extraction is incorrect, the number of days may be wrong or inflated due to time components.
- **Bug 4**: The program does not handle exceptions that may arise from incorrect date formats or parsing errors.
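
Bugs 1 and 2 can be reproduced on a few lines of made-up data. The sketch below uses three hypothetical records in the same `timestamp,amount` layout as the exercise file (the values are invented for illustration) and compares grouping by the raw timestamp string with grouping by the parsed date.

```python
from datetime import datetime

# Hypothetical sample records, not taken from sales_data.txt
lines = [
    "2025-01-01 09:00:00,100.0",
    "2025-01-01 17:30:00,250.0",
    "2025-01-02 12:00:00,300.0",
]

by_timestamp = {}  # Buggy grouping: key still contains the time
by_date = {}       # Correct grouping: key is the calendar date only
largest_stamp, largest_amount = "", 0.0

for line in lines:
    stamp, amount = line.split(",")
    amount = float(amount)
    by_timestamp[stamp] = by_timestamp.get(stamp, 0.0) + amount

    day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
    by_date[day] = by_date.get(day, 0.0) + amount

    # Buggy "highest day" logic: compares single transactions
    if amount > largest_amount:
        largest_amount, largest_stamp = amount, stamp

# Correct "highest day" logic: compares the daily totals
best_day = max(by_date, key=by_date.get)

print(len(by_timestamp), "groups when keyed by timestamp")
print(len(by_date), "groups when keyed by date")
print("Largest single transaction:", largest_stamp)
print("Day with highest total:", best_day)
```

Here the largest single transaction falls on January 2, but January 1 has the higher daily total, so the two approaches disagree.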

In [None]:
# (ii) Fixed function

from datetime import datetime

def process_sales_data(filename):
    # Initialize variables to track total sales, daily sales, and highest sales
    total_sales = 0.0
    sales_per_day = {}  # Dictionary to store sales grouped by date
    highest_sales = 0.0  # Variable to track the highest sales value for a single day
    highest_day = ''  # Variable to track the date with the highest sales
    sales_count = 0  # Counter for the total number of valid sales records processed

    # Open the file containing sales data
    with open(filename, 'r') as file:
        for line in file:
            line = line.strip()  # Remove any leading/trailing whitespace from the line
            if not line:
                continue  # Skip empty lines
            try:
                # Split the line into date-time and sales values
                date_time_str, sales = line.split(',')
                sales = float(sales)  # Convert the sales value to a float
                # Parse the date-time string into a datetime object
                date_time = datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S')
                date = date_time.date()  # Extract the date (ignoring time)
            except ValueError:
                # Print a warning for lines that can't be processed and skip them
                print(f"Skipping invalid line: {line}")
                continue

            # Accumulate total sales and count the record
            total_sales += sales
            sales_count += 1
            # Add the sales value to the corresponding date in the dictionary
            if date not in sales_per_day:
                sales_per_day[date] = sales
            else:
                sales_per_day[date] += sales

    # Find the day with the highest total sales
    for date, daily_total in sales_per_day.items():
        if daily_total > highest_sales:
            highest_sales = daily_total
            highest_day = date

    # Calculate average sales per day, guarding against division by zero
    if sales_per_day:
        average_sales_per_day = total_sales / len(sales_per_day)
    else:
        average_sales_per_day = 0.0

    # Return a tuple with total sales, average sales per day, and the day with the highest sales
    return total_sales, average_sales_per_day, highest_day


# Sample usage
total, average_per_day, best_day = process_sales_data('data/sales_data.txt')
print(f"Total Sales: ${total}")
print(f"Average Daily Sales: ${average_per_day}")
print(f"Highest Sales Day: {best_day}")
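
The sample output shown earlier carries floating-point noise such as `22385.000000000004`. For display, a format specification can round to two decimals and add thousands separators. A minimal sketch, reusing the two values from that sample output:

```python
# Values copied from the sample output of the original program
total = 22385.000000000004
average_per_day = 1119.2500000000002

# ",.2f" adds thousands separators and rounds to two decimal places
total_text = f"Total Sales: ${total:,.2f}"
average_text = f"Average Daily Sales: ${average_per_day:,.2f}"

print(total_text)
print(average_text)
```

This only changes how the numbers are printed; the stored values are untouched.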

### Exercise 5: Debugging an Experimental Data Analysis Pipeline

This exercise introduces you to a small experimental data analysis workflow.  
The corresponding code and data are provided in the `tutorial5_debugging` directory.

Two CSV files contain information about study participants and their pre- and post-experiment scores.  
The code loads, cleans, merges, and aggregates the data.

A key quantity in this exercise is the **improvement index**, intended to measure relative improvement from pre- to post-score:

$$
\text{improvement index} = \frac{\text{post\_score} - \text{pre\_score}}{\text{pre\_score}},
$$

where a positive value indicates improvement and a negative value indicates a decline.
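
As a minimal sketch, the formula can be written as a small function. The name `improvement_index` and the zero check are assumptions for illustration and are not taken from the course code in `tutorial5_debugging`.

```python
def improvement_index(pre_score, post_score):
    """Relative change from pre_score to post_score, as defined above."""
    if pre_score == 0:
        # The formula divides by pre_score, so zero has no defined index
        raise ValueError("pre_score must be non-zero")
    return (post_score - pre_score) / pre_score


improved = improvement_index(50, 60)  # Score went up
declined = improvement_index(80, 60)  # Score went down

print(f"Improved participant: {improved:+.2f}")
print(f"Declined participant: {declined:+.2f}")
```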

After running the program, you will notice that the summary statistics appear **inconsistent** with the raw data.  
Your task is to **debug the analysis**, identify the source of the inconsistency, and fix the underlying issue.


**Solution**

After running the code, the summary statistics show positive mean changes in raw scores but **negative mean improvement indices** for both groups. Because the program runs without errors, the issue must come from the **logic inside one of the functions**. Using the debugger helps you trace where the incorrect values are produced.

1. **Set a breakpoint in `main.py`**

   Place a breakpoint **before** the index is created:

   ```python
   index = index_measurements_by_participant(measurements)
   ```

   Start debugging (PyCharm or VS Code). You can now:

   * Step through `index_measurements_by_participant`.
   * Step over to the line

     ```python
     improvements = compute_improvements(participants, index)
     ```

   You may also set a **conditional breakpoint**, e.g.:

   * `post_score > pre_score and improvement < 0`.

   This immediately highlights suspicious cases.

2. **Step into `compute_improvements` (in `stats.py`)**

   Use *Step Into* when the debugger reaches the function call.
   Inspect the values for each participant:

   * `pre_score`
   * `post_score`
   * calculated `improvement`

   You will find cases where the post-score is larger than the pre-score, yet the computed improvement is **negative**.
   This reveals that the formula in the function is incorrect.

3. **Locate and correct the bug**

   In `stats.py` the function currently computes:

   ```python
   improvement = pre_score - post_score
   ```

   This produces negative values whenever scores increase.
   The correct direction of change is:

   ```python
   improvement = post_score - pre_score
   ```

4. **Verify the fix**

   Run the debugger again:

   * Step into `compute_improvements` and check a few participants.
   * Confirm that whenever `post_score > pre_score`, the improvement is now positive.
   * Let the program finish and review the updated summary statistics.
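
The check behind the conditional breakpoint can also be written as a small consistency test that runs without a debugger. The sketch below assumes each participant is a dictionary with the keys `pre_score`, `post_score`, and `improvement`; this record layout and the function name `find_sign_mismatches` are hypothetical and may differ from the course code.

```python
def find_sign_mismatches(records):
    """Return records whose improvement sign contradicts the score change."""
    mismatches = []
    for record in records:
        change = record["post_score"] - record["pre_score"]
        # A negative product means the two values point in opposite directions
        if change * record["improvement"] < 0:
            mismatches.append(record)
    return mismatches


# Hypothetical scores, computed once with the buggy and once with the fixed formula
scores = [(50, 60), (70, 65), (40, 55)]
buggy = [
    {"pre_score": pre, "post_score": post, "improvement": pre - post}
    for pre, post in scores
]
fixed = [
    {"pre_score": pre, "post_score": post, "improvement": post - pre}
    for pre, post in scores
]

print("Mismatches with buggy formula:", len(find_sign_mismatches(buggy)))
print("Mismatches with fixed formula:", len(find_sign_mismatches(fixed)))
```

With the buggy formula every record with a score change is flagged; after the fix the list is empty.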

### Exercise 6: Git and GitHub

1. In our `python_course` repository, check out a new branch on your computer called `feature/<some_creative_name>`.
2. Create a new file called `git_exercise.py` which contains a small function that prints "Hello Git!" to the console.
3. Commit and push the changes to the remote repository.
4. On GitHub, create a pull request to merge your branch into the `main` branch (you should only create the pull request, but not actually merge the branches).
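
For step 2, the content of `git_exercise.py` could be as small as the sketch below. The function name `say_hello` is an arbitrary choice; any name works.

```python
def say_hello():
    """Print a greeting to the console."""
    print("Hello Git!")


# Run the function only when the file is executed directly
if __name__ == "__main__":
    say_hello()
```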


### Exercise 7: GitHub Copilot Installation

1. Sign up for the free student plan of GitHub Copilot, e.g., on [GitHub Education](https://github.com/education/students).
2. Install the GitHub Copilot extension in Visual Studio Code and the GitHub Copilot plugin in PyCharm.
3. Test the extension/plugin by writing a few lines of code in both editors.

### Exercise 8: Learn Git Branching (Optional)

For anyone who would like to dive deeper into the possibilities of Git, I recommend doing the tutorial at [https://learngitbranching.js.org/](https://learngitbranching.js.org/).\
It is a great option to learn Git interactively and in a fun way.
