# Week 9 Lab: Lists, Dictionaries, and Pandas
This week’s lab gives you practical experience with data analysis in Python.

You will:
- Traverse lists using for loops and the accumulator pattern
- Use dictionaries to represent structured data and practice common iteration patterns
- Load, access, and explore data using pandas DataFrames

**Instructions**
- Work through the problems in order.
- Write tests where indicated and run them to verify your progress.


#### Run the cell below once to set up the test environment.

In [1]:
import piplite
await piplite.install(["pytest", "ipytest"])

import ipytest
ipytest.autoconfig()

## Problem 1: Movie Ratings Dashboard 
**Focus:** Lists, loops, accumulator pattern, dictionaries

You are designing a simple analytics utility for a movie review site.

### Task 1.1 – Summing and Averaging Ratings
Implement `average_rating(ratings)` to compute the mean rating, returning `0.0` for an empty list. Use a **loop + accumulator**; for practice, avoid calling `sum()` or `len()` directly.

**Write some test cases.**
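
As a sketch of the loop + accumulator pattern on a different problem (so the task itself is left to you), the hypothetical helper `count_at_least` below counts how many ratings meet a cutoff. The function name and sample values are illustrative assumptions, not part of the lab.

```python
def count_at_least(ratings: list[float], cutoff: float) -> int:
    """Count ratings that are >= cutoff (hypothetical helper, not part of the lab)."""
    count = 0  # the accumulator starts at the answer for an empty list
    for rating in ratings:
        if rating >= cutoff:
            count += 1  # update the accumulator inside the loop
    return count


def test_count_at_least():
    # A typical list, a boundary value, and the empty list
    assert count_at_least([5, 3, 4, 2], 4) == 2
    assert count_at_least([4], 4) == 1
    assert count_at_least([], 4) == 0
```

Tests for `average_rating` can follow the same shape: one typical list, one single-element list, and the empty list.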

In [2]:
# Implement using a loop + accumulator

def average_rating(ratings: list[float]) -> float:
    # TODO: replace the placeholder implementation below

    pass


In [3]:
%%ipytest -qq
# Your test cases here





### Use the Movie Ratings Dictionary Below for the Next Two Tasks

In [4]:
movie_ratings = {
    "Inception": [5, 4, 5, 5, 4],
    "Avatar": [4, 3, 4, 4],
    "Titanic": [5, 5, 4, 5],
    "Joker": [3, 3.5, 4]
}
movie_ratings

{'Inception': [5, 4, 5, 5, 4],
 'Avatar': [4, 3, 4, 4],
 'Titanic': [5, 5, 4, 5],
 'Joker': [3, 3.5, 4]}

### Task 1.2 – Compute Average Ratings per Movie
Implement `print_movie_averages(movies)` so that it iterates over the dictionary and prints each movie title with its average rating, using your function from Task 1.1.
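
The iteration pattern can be sketched on unrelated data. The `ticket_prices` dictionary and the `format_prices` helper below are hypothetical and exist only to show `.items()` together with f-string formatting.

```python
# Hypothetical data: ticket prices per cinema (not part of the lab)
ticket_prices = {"Odeon": 9.5, "Vue": 8.0, "Rio": 7.25}


def format_prices(prices: dict[str, float]) -> list[str]:
    """Build one formatted line per key/value pair."""
    lines = []
    for name, price in prices.items():  # .items() yields (key, value) pairs
        lines.append(f"{name}: {price:.2f}")  # :.2f keeps two decimal places
    return lines


for line in format_prices(ticket_prices):
    print(line)
```

Dictionaries preserve insertion order, so the lines appear in the order the keys were defined.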

In [5]:
def print_movie_averages(movies: dict[str, list[float]]) -> None:
    # TODO: iterate over items and print each movie with its average rating. 
    # You can use an f-string to format your printed output: f"{title}: {avg:.2f}"
    
    pass

# Call your function to preview


### Task 1.3 – Reverse Engineering a Function
Implement a function `filter_by_threshold(movies, threshold)` so that all tests in the next cell pass.

In [6]:
def filter_by_threshold(movies: dict[str, list[float]], threshold: float) -> list[str]:

    pass

In [7]:
%%ipytest -qq

def test_filter_by_threshold():
    assert filter_by_threshold(movie_ratings, 4.0) == ['Inception', 'Titanic']
    assert filter_by_threshold(movie_ratings, 4.6) == ['Titanic']
    assert filter_by_threshold(movie_ratings, 3.0) == ['Inception', 'Avatar', 'Titanic', 'Joker']


F                                                                                            [100%]
_____________________________________ test_filter_by_threshold _____________________________________

    def test_filter_by_threshold():
>       assert filter_by_threshold(movie_ratings, 4.0) == ['Inception', 'Titanic']
E       AssertionError: assert None == ['Inception', 'Titanic']
E        +  where None = filter_by_threshold({'Avatar': [4, 3, 4, 4], 'Inception': [5, 4, 5, 5, 4], 'Joker': [3, 3.5, 4], 'Titanic': [5, 5, 4, 5]}, 4.0)

<ipython-input-7-0a2ba548cb9f>:2: AssertionError
FAILED t_4d7784b8795443b8b6999c2f0e6b5450.py::test_filter_by_threshold - AssertionError: assert None == ['Inception', 'Titanic']


---
## Problem 2: Pandas (Rows, Columns, Basic Analysis)

You will practice **exactly** the core operations from the lecture:
- `pd.read_csv`
- `head()` / `tail()`
- Row access with `iloc` (including slicing)
- Column access with `orders['column']`
- Series operations: `.mean()`, `.sum()`, `.unique()`

Dataset: **`retail_orders.csv`** (coffee shop sales)

**Columns:** `order_id, date, branch, item, size, quantity, unit_price, order_type, payment_method`


### Load the data

Use `pd.read_csv` and preview the first few rows.

In [13]:
import pandas as pd
orders = pd.read_csv('retail_orders.csv')
orders.head()

| | order_id | date | branch | item | size | quantity | unit_price | order_type | payment_method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2023-07-01 | Riverside | Sandwich | S | 1 | 4.5 | dine-in | card |
| 1 | 1002 | 2023-07-01 | Riverside | Cappuccino | S | 4 | 3.0 | dine-in | card |
| 2 | 1003 | 2023-07-01 | City Centre | Tea | M | 1 | 2.2 | takeout | app |
| 3 | 1004 | 2023-07-01 | University | Latte | L | 3 | 4.3 | dine-in | app |
| 4 | 1005 | 2023-07-01 | City Centre | Tea | M | 1 | 2.2 | takeout | card |


### Task 2.1 - Row and Column access
1. Show rows **5 to 9** (remember slicing excludes the end index).
2. Show every **10th** row starting at 0.
3. Show the **last row** using negative indexing with `iloc`.
4. Get the `quantity` column as a Series and show the first 8 values.
5. Get the **unique** values of `order_type`.
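
The access patterns in this list can be tried first on a small stand-in DataFrame. The `toy` frame below is hypothetical (30 rows, two made-up columns) and only illustrates the syntax; the row ranges are deliberately different from the ones the task asks for.

```python
import pandas as pd

# Hypothetical toy data with 30 rows (not the lab's retail_orders.csv)
toy = pd.DataFrame({"n": range(30), "kind": ["a", "b", "a"] * 10})

middle = toy.iloc[2:5]        # positions 2, 3, 4 (position 5 is excluded)
every_fifth = toy.iloc[::5]   # positions 0, 5, 10, ... using a step of 5
last = toy.iloc[-1]           # last row, returned as a Series
first_n = toy["n"].head(3)    # a column is a Series; head(3) keeps 3 values
kinds = toy["kind"].unique()  # distinct values in order of first appearance
```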

In [14]:
# TODO: 1 Rows 5 to 9


In [15]:
# TODO: 2 Every 10th row


In [16]:
# TODO: 3 Last row


In [17]:
# TODO: 4 Quantity first 8


In [18]:
# TODO: 5 Unique order_type


### Task 2.2 - Create a derived column
Create `total_price = quantity * unit_price` using simple arithmetic. Then preview with `head()`.
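
Column arithmetic works element by element. A minimal sketch on hypothetical data (the `trips` frame and its column names are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical toy data (not the lab dataset)
trips = pd.DataFrame({"hours": [2, 0.5, 4], "speed_kmh": [60, 30, 90]})

# Multiplying two columns works row by row and returns a new Series,
# which is then stored under a new column name
trips["distance_km"] = trips["hours"] * trips["speed_kmh"]
trips.head()
```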

In [19]:
# TODO: Create total_price column then preview


### Task 2.3 - Basic analyses using column selection and Series methods
Use Boolean filters inside `[]`, together with the Series methods `.sum()`, `.mean()`, and `.unique()`:<br>
- **Total quantity** sold for the item `'Latte'` (e.g., `orders[orders['item'] == 'Latte']['quantity'].sum()`).<br>
    - `orders['item'] == 'Latte'`: This expression creates a Boolean Series with one True/False value per row.
    - `orders[orders['item'] == 'Latte']`: This uses Boolean indexing to filter the DataFrame, keeping only the rows where the condition is True.
    - `['quantity']`: This selects just the 'quantity' column from the filtered DataFrame.
- **Average quantity** for orders with `order_type == 'takeout'`.<br>
- **Unique** items sold at the `'University'` branch.
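
To see each step of Boolean indexing separately, here is a sketch on a hypothetical `pets` DataFrame; the data and variable names are invented for illustration and are unrelated to the lab dataset.

```python
import pandas as pd

# Hypothetical toy data (not the lab dataset)
pets = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "city": ["Leeds", "York", "York", "Leeds"],
    "visits": [3, 1, 5, 2],
})

is_cat = pets["species"] == "cat"        # Boolean Series: True, False, True, False
cats = pets[is_cat]                      # keeps only rows where the mask is True
total_cat_visits = cats["visits"].sum()  # sum over the filtered column
mean_cat_visits = cats["visits"].mean()  # mean over the filtered column
york_species = pets[pets["city"] == "York"]["species"].unique()
```

Naming the mask (`is_cat`) before using it is optional, but it makes the intermediate True/False values easy to inspect.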

**Windowed comparisons:** 
<br>
- For rows **0 – 24**, compute the **mean quantity**.<br>
- For rows **25 – 49**, compute the **mean quantity**.<br>
- Which window has the higher mean? 
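
A windowed comparison combines `iloc` slicing with `.mean()`. The sketch below uses a hypothetical six-row frame with windows of three rows each; the window boundaries for the lab data are for you to work out.

```python
import pandas as pd

# Hypothetical toy data with 6 rows (not the lab dataset)
scores = pd.DataFrame({"points": [1, 2, 3, 10, 20, 30]})

first_mean = scores.iloc[0:3]["points"].mean()   # positions 0, 1, 2
second_mean = scores.iloc[3:6]["points"].mean()  # positions 3, 4, 5

# In this sketch a tie is reported as "second"
higher = "first" if first_mean > second_mean else "second"
print(first_mean, second_mean, higher)
```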

In [20]:
# TODO: Total quantity for Latte 
orders[orders['item'] == 'Latte']['quantity'].sum()

np.int64(106)

In [21]:
# TODO: Mean quantity for takeout


In [22]:
# TODO: Unique items at University


In [23]:
# TODO: Compute window means and compare


### Task 2.4 - Total Quantity
Compute the **total quantity** for each of these items:
`['Latte', 'Espresso', 'Cappuccino']` by summing filtered Series.
Then figure out which of the three has the highest total.

**Instructions:**
- Write a function that: 
    - Takes the given DataFrame (`item_df`) containing an 'item' column.
    - Loops through each item name in that column.
    - Filters the main orders DataFrame to select only the rows where the 'item' matches.
    - Stores each result in a dictionary where the key is the item name and the value is its total quantity.
        - (*Hint*) Use `['quantity'].sum()` to compute the total quantity.
    - Returns that dictionary.
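
The steps above can be sketched on unrelated data. Everything in this example (`sales`, `wanted`, `totals_by_name`, and the column names) is hypothetical; it shows one possible shape for the loop, the dictionary, and the final comparison rather than a reference solution.

```python
import pandas as pd

# Hypothetical toy data (not the lab dataset)
sales = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "plum", "pear"],
    "boxes": [2, 1, 3, 4, 1],
})
wanted = pd.DataFrame({"fruit": ["apple", "pear"]})


def totals_by_name(names_df: pd.DataFrame, data: pd.DataFrame) -> dict[str, int]:
    """Map each name in names_df['fruit'] to its summed 'boxes' in data."""
    totals = {}
    for name in names_df["fruit"]:  # iterating a Series yields its values
        matching = data[data["fruit"] == name]  # rows for this name only
        totals[name] = int(matching["boxes"].sum())  # int() gives a plain Python int
    return totals


totals = totals_by_name(wanted, sales)
best = max(totals, key=totals.get)  # key whose value is largest
print(totals, best)
```

Passing `key=totals.get` makes `max` compare the dictionary's values while still returning the matching key.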

In [24]:
# TODO: Write a function to compute totals and find the max
items = {'item': ['Latte','Espresso','Cappuccino']}
item_df = pd.DataFrame(items)

