# Week 4 - Pie Charts and Conditional Logic

## Goal

Create a pie chart to show the distribution of runs by time of day.

## Imports

In [1]:
import pandas as pd
file_id = "1ymbNqfv9s6YGZzN93HFKAhjg0Z5xZXV1"
url = f"https://drive.google.com/uc?id={file_id}"
df = pd.read_csv(url)

This week, the goal is to investigate the frequency of morning (AM) and afternoon (PM) runs.

### 1. Create a new DataFrame with `Date` and `AMPM` columns

To do this, we extract from the `start_date` column:
- the date of each run, and
- whether the run occurred in the morning (AM) or afternoon (PM).

You may find the pandas datetime formatting method below useful:

```
.dt.strftime("%p")
```
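With the `%p` format code, this method converts each datetime value into the string `AM` or `PM` (in an English locale). The `.dt` accessor only works on datetime values, so convert the column with `pd.to_datetime()` first.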

For more information, see:  
https://www.geeksforgeeks.org/python/python-strftime-function/
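
As a quick illustration, here is a sketch on two made-up timestamps (the values and the `start_date`/`runs` names are hypothetical stand-ins, not taken from the dataset). It parses the strings with `pd.to_datetime()` and then uses `.dt.date` and `.dt.strftime("%p")`:

```python
import pandas as pd

# Hypothetical timestamps standing in for the real start_date column
start_date = pd.Series(["2025-07-21 17:45:00", "2025-07-19 06:30:00"])

times = pd.to_datetime(start_date)  # the .dt accessor needs datetime values
runs = pd.DataFrame({
    "Date": times.dt.date,              # calendar date of each run
    "AMPM": times.dt.strftime("%p"),    # "AM" or "PM"
})
print(runs)
```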

In [5]:
#Your code here

|   | Date       | AMPM |
|---|------------|------|
| 0 | 2025-07-21 | PM   |
| 1 | 2025-07-19 | AM   |
| 2 | 2025-07-16 | PM   |
| 3 | 2025-07-15 | AM   |
| 4 | 2025-07-12 | AM   |


### 2. Group by day

It is possible to have multiple runs on the same day.  
If runs occur in both the morning and the afternoon, the `AMPM` column should be labelled as `Both`.

We begin by grouping the data by `Date` and applying `.unique()` to the `AMPM` column.

For each day, `.unique()` returns an array of the distinct `AMPM` values, in the order they first appear, for example:

- `['AM']`
- `['PM']`
- `['AM', 'PM']` or `['PM', 'AM']`

This allows us to determine whether a day contains only morning runs, only afternoon runs, or both.
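
The grouping step can be tried on a tiny hypothetical DataFrame first (the dates and the `runs`/`per_day` names are made up for illustration). On 2 July the evening run is listed first, so `.unique()` returns the values in that order:

```python
import pandas as pd

# Hypothetical runs: 2 July has both an evening and a morning run
runs = pd.DataFrame({
    "Date": ["2025-07-02", "2025-07-02", "2025-07-01"],
    "AMPM": ["PM", "AM", "AM"],
})

# One array of distinct AM/PM values per day (days sorted by Date)
per_day = runs.groupby("Date")["AMPM"].unique()
print(per_day)
```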

In [9]:
#Your code here

| Date       | AMPM |
|------------|------|
| 2011-01-05 | [AM] |
| 2011-01-10 | [AM] |
| 2011-01-17 | [AM] |
| 2011-01-21 | [AM] |
| 2011-01-27 | [AM] |


### 3. Create a function to check array length

At this stage, we need a way to convert the arrays returned by `.unique()` into a single label.

The logic is:

- If the `AMPM` array contains more than one value (i.e. both `AM` and `PM`), label the day as `Both`
- Otherwise, keep the existing value (`AM` or `PM`)

This logic can be implemented using a simple function that checks the length of the array.

I've provided a testing template below.

In [10]:
example = ["AM", "PM"]  # Example list (same structure as the array returned by .unique())

def label_ampm(x):
    # Check if the list contains both AM and PM
    # if __________:  # your condition
    #     return __________
    # else:
    #     return x[0]  # [0] refers to the first (and only) item in the list
    pass  # placeholder so the cell runs before you fill it in; replace it with your code

# label_ampm(example)

'Both'

### 4. Apply the function

Once the function has been defined, we can apply it to the grouped data.

Pandas provides the `.apply()` method, which allows a function to be applied to each value in a column.

In this case, `.apply()` will pass each array of `AMPM` values (for each day) into the function and return a single label.

For more information, see:  
https://www.w3schools.com/python/pandas/ref_df_apply.asp
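
To see what `.apply()` hands to the function, here is a sketch that applies the built-in `len()` to each day's array. The data is hypothetical, and `len` simply stands in for the function you write in step 3:

```python
import pandas as pd

# Hypothetical runs, grouped as in step 2
runs = pd.DataFrame({
    "Date": ["2025-07-01", "2025-07-02", "2025-07-02"],
    "AMPM": ["AM", "PM", "AM"],
})
per_day = runs.groupby("Date")["AMPM"].unique()

# .apply() calls len() once per day, passing that day's array of AM/PM values
sizes = per_day.apply(len)
print(sizes)
```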


In [11]:
#Your code here

| Date       | AMPM |
|------------|------|
| 2011-01-05 | AM   |
| 2011-01-10 | AM   |
| 2011-01-17 | AM   |
| 2011-01-21 | AM   |
| 2011-01-27 | AM   |


### 5. Create counts

To create the pie chart, we need a table containing the count for each category.

For example:

| Category | Count |
|----------|-------|
| AM       | 0     |
| PM       | 0     |
| Both     | 0     |

> *(Obviously mornings are for sleeping, and running is well too much effort!)*

> Hint
>
> You may find the pandas method `.value_counts()` useful for this step.
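
On a small hypothetical Series of daily labels, `.value_counts()` followed by `.reset_index()` gives a category/count table like the one above, sorted from most to least frequent. In pandas 2.0 and later the resulting columns are named `AMPM` and `count`, matching the solution output further down; older pandas versions name them differently, so the sketch below checks the columns by position.

```python
import pandas as pd

# Hypothetical daily labels, as produced by applying the labelling function in step 4
labels = pd.Series(["AM", "PM", "AM", "Both", "AM", "PM"], name="AMPM")

# One row per category, most frequent first
counts = labels.value_counts().reset_index()
print(counts)
```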

In [13]:
#Your code here

### 6. Save and check

In [16]:
daily_summary.to_csv("daily_summary.csv", index=False)
pd.read_csv("daily_summary.csv")  # reload the saved file to check its contents

### Solutions

In [14]:
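def label_ampm(x):
    if len(x) == 2:  # True when the array holds two values, i.e. both AM and PM
        return "Both"
    else:
        return x[0]  # [0] refers to the first (and only) item in the array

# Parse the timestamps once and reuse them.
# If start_date is stored in UTC, AM/PM will reflect UTC rather than local time.
start = pd.to_datetime(df["start_date"])
daily_summary = pd.DataFrame()
daily_summary["Date"] = start.dt.date  # calendar date of each run
daily_summary["AMPM"] = start.dt.strftime("%p")  # creates a column with AM/PM
daily_summary = (
    daily_summary
    .groupby("Date")["AMPM"]
    .unique()  # one array of distinct AM/PM values per day
    .apply(label_ampm)  # collapse each array to AM, PM or Both
    .value_counts()  # count the days in each category
    .reset_index()  # turn the counts into a two-column DataFrame
)
daily_summary.head()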

|   | AMPM | count |
|---|------|-------|
| 0 | AM   | 491   |
| 1 | PM   | 421   |
| 2 | Both | 7     |
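
The solutions test the array's length. A variant, sketched below, checks membership instead, so it only returns `Both` when `AM` and `PM` are both actually present. This is an alternative technique rather than the notebook's own method, and the function name `label_ampm_by_membership` is hypothetical; on the arrays produced by `.unique()` the two approaches give the same labels.

```python
import numpy as np

def label_ampm_by_membership(x):
    """Return 'Both' if x contains both AM and PM, otherwise its single value."""
    if {"AM", "PM"}.issubset(x):  # membership check instead of len(x) == 2
        return "Both"
    return x[0]  # only one distinct value, so return it

print(label_ampm_by_membership(np.array(["PM", "AM"])))  # Both
print(label_ampm_by_membership(np.array(["AM"])))        # AM
```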


In [15]:
# Solution 2
# With a lambda function
start = pd.to_datetime(df["start_date"])  # parse the timestamps once and reuse them
daily_summary = pd.DataFrame()
daily_summary["Date"] = start.dt.date
daily_summary["AMPM"] = start.dt.strftime("%p")  # creates a column with AM/PM
daily_summary = (
    daily_summary
    .groupby("Date")["AMPM"]
    .unique()
    .apply(lambda x: "Both" if len(x) == 2 else x[0])  # same logic as label_ampm, written inline
    .value_counts()
    .reset_index()
)
daily_summary.head()

|   | AMPM | count |
|---|------|-------|
| 0 | AM   | 491   |
| 1 | PM   | 421   |
| 2 | Both | 7     |
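
The cells above produce the counts but do not draw the chart itself. One way to draw the pie chart from the Goal in Python is sketched below, assuming matplotlib is installed. The counts are copied from the solution output above into a hypothetical `counts` DataFrame; in the notebook you could pass the columns of `daily_summary` directly.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the solution output above (use daily_summary in the notebook)
counts = pd.DataFrame({"AMPM": ["AM", "PM", "Both"], "count": [491, 421, 7]})

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    counts["count"],        # slice sizes
    labels=counts["AMPM"],  # slice labels: AM, PM, Both
    autopct="%1.1f%%",      # print each slice's share as a percentage
    startangle=90,          # start the first slice at the top
)
ax.set_title("Running days by time of day")
plt.show()
```

Because the `Both` slice is under 1% of days, its label may be hard to read; a bar chart is a reasonable alternative if that matters.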
