# Week 9 Lecture 2
## pandas
- [pandas](https://pandas.pydata.org/) is a Python library for working with DataFrames, which play the same role as Tables in Pyret

In [1]:
import pandas as pd

A Pyret table:
```arr
orders = table: date, dish, quantity, order_type
  row: "2023-07-01", "Pasta", 2, "dine-in"
  row: "2023-07-01", "Salad", 1, "takeout"
  row: "2023-07-02", "Burger", 3, "dine-in"
end
```
- The same data as a pandas DataFrame, built from a dictionary that maps each column name to a list of values

In [2]:
data = {
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in']
}

orders = pd.DataFrame(data)
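A DataFrame can also be built row by row, which is closer to Pyret's `row:` syntax. This small sketch passes a list of rows plus a `columns` list to `pd.DataFrame`, using the same example data (the variable name `orders_by_row` is just chosen here so `orders` is not overwritten):

```python
import pandas as pd

# Each inner list is one row, with values in the same order as `columns`
rows = [
    ['2023-07-01', 'Pasta', 2, 'dine-in'],
    ['2023-07-01', 'Salad', 1, 'takeout'],
    ['2023-07-02', 'Burger', 3, 'dine-in'],
]
orders_by_row = pd.DataFrame(rows, columns=['date', 'dish', 'quantity', 'order_type'])

# .shape is (number of rows, number of columns)
print(orders_by_row.shape)  # (3, 4)
```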

## Loading and Accessing Data
- pandas provides the `pd.read_csv` function for loading CSV files into a DataFrame


Loading in Pyret:
```arr
orders = load-table: date, dish, quantity, order_type
  source: csv-table-file("orders.csv", default-options)
end
```


In [3]:
orders = pd.read_csv("orders.csv")
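If `orders.csv` is not on hand, one way to try `pd.read_csv` is to give it an in-memory text buffer (`io.StringIO`) instead of a filename. The CSV text below is made up to match the example data; the first line becomes the column names:

```python
import io
import pandas as pd

# Hypothetical CSV contents matching the example orders data
csv_text = """date,dish,quantity,order_type
2023-07-01,Pasta,2,dine-in
2023-07-01,Salad,1,takeout
2023-07-02,Burger,3,dine-in
"""

# read_csv accepts a file path or any file-like object
orders = pd.read_csv(io.StringIO(csv_text))

# quantity is parsed as a number; date stays text unless parse_dates is used
print(orders.dtypes)
```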

- By default, `head()` shows the first five rows and `tail()` shows the last five

In [None]:
orders.head()

In [None]:
orders.tail()
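Both methods take an optional number of rows. A quick sketch on a three-row version of the example data, where `head()` with no argument simply returns every row:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

first_two = orders.head(2)  # rows 0 and 1
last_one = orders.tail(1)   # row 2 only
print(first_two)
print(last_one)
```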

- Rows can be accessed by integer position (starting from 0) using the `iloc` accessor with square brackets

Pyret way:
```arr
orders.row-n(1)["dish"]
```

In [None]:
orders.iloc[1]

In [None]:
orders.iloc[1]["dish"]
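`orders.iloc[1]` returns the row as a Series whose index is the column names, which is why `["dish"]` works on it. The sketch below also shows that `iloc` can take a row position and a column position together, and that negative positions count from the end:

```python
import pandas as pd

orders = pd.DataFrame({
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
})

row = orders.iloc[1]             # second row (0-based), as a Series
print(row.index.tolist())        # ['date', 'dish', 'quantity', 'order_type']
print(row['dish'])               # Salad
print(orders.iloc[1, 1])         # Salad (row position 1, column position 1)
print(orders.iloc[-1]['dish'])   # Burger (last row)
```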

- Extracting columns: selecting a column by name gives a pandas Series

Pyret way:
```arr
quantities = orders.get-column("quantity")
```

In [5]:
quantities = orders['quantity']
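Unlike Pyret's `get-column`, which returns a list, `orders['quantity']` is a Series that keeps the row index alongside the values. When a plain Python list is needed, `.tolist()` converts it:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

quantities = orders['quantity']
print(type(quantities).__name__)  # Series

as_list = quantities.tolist()
print(as_list)                    # [2, 1, 3]
```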

- There are methods for computing statistics from a column

Pyret way:
```arr
mean(orders, "quantity")    # Direct table operation
sum(orders, "quantity")     # Direct table operation
```

In [None]:
orders['quantity'].mean() 

In [None]:
orders['quantity'].sum()
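Series has other summary methods that work the same way. With the example quantities 2, 1, and 3, the results are easy to check by hand:

```python
import pandas as pd

quantities = pd.Series([2, 1, 3], name='quantity')

print(quantities.mean())    # 2.0
print(quantities.sum())     # 6
print(quantities.max())     # 3
print(quantities.min())     # 1
print(quantities.median())  # 2.0
```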

- The [unique](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html#pandas.Series.unique) method returns an array of the distinct values in a column, in the order they first appear

In [None]:
orders['order_type'].unique()
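Wrapping the result of `unique()` in `list()` gives a plain Python list, and the related `nunique()` method just counts the distinct values. A short sketch on the example `order_type` values:

```python
import pandas as pd

order_type = pd.Series(['dine-in', 'takeout', 'dine-in'])

distinct = order_type.unique()
print(list(distinct))        # ['dine-in', 'takeout']
print(order_type.nunique())  # 2
```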

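- `value_counts()` counts how many times each distinct value appears in a column, from most to least frequent; this is most useful for columns with categorical values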

In [None]:
orders['order_type'].value_counts()
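The result of `value_counts()` is itself a Series indexed by the distinct values, so individual counts can be looked up by label. Passing `normalize=True` returns proportions instead of raw counts:

```python
import pandas as pd

order_type = pd.Series(['dine-in', 'takeout', 'dine-in'], name='order_type')

counts = order_type.value_counts()
print(counts['dine-in'])  # 2
print(counts['takeout'])  # 1

# Proportions: 'dine-in' is 2 of the 3 orders
shares = order_type.value_counts(normalize=True)
print(shares['dine-in'])
```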

## Filtering
Comparing a column to a value gives a Series of booleans, one per row:

In [None]:
orders['order_type'] == "dine-in"

Indexing the DataFrame with a boolean Series keeps only the rows where the value is `True`:

In [None]:
orders[orders['order_type'] == "dine-in"]
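Storing the boolean Series in a variable makes the two steps explicit. Summing it counts the matching rows (each `True` counts as 1), and the filtered DataFrame keeps the original row labels rather than renumbering them:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
})

is_dine_in = orders['order_type'] == 'dine-in'
print(is_dine_in.tolist())        # [True, False, True]
print(is_dine_in.sum())           # 2

dine_in = orders[is_dine_in]
print(dine_in.index.tolist())     # [0, 2]
print(dine_in['dish'].tolist())   # ['Pasta', 'Burger']
```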

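Conditions can be combined with the `&` (and) operator. Wrap each condition in parentheses: in Python `&` binds more tightly than `==`, so leaving them out usually raises an error.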

In [None]:
orders[(orders['order_type'] == "dine-in") & (orders['dish'] == "Pasta")]
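The same pattern works with `|` for "or" and `~` for "not", again with each comparison in its own parentheses. A sketch on the example data:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
})

# "or": takeout orders, or orders of 3 or more items
either = orders[(orders['order_type'] == 'takeout') | (orders['quantity'] >= 3)]
print(either['dish'].tolist())       # ['Salad', 'Burger']

# "not": every order that is not dine-in
not_dine_in = orders[~(orders['order_type'] == 'dine-in')]
print(not_dine_in['dish'].tolist())  # ['Salad']
```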

We can use `.isin()` to check whether each value appears in a given list; it returns a boolean Series:

In [None]:
orders['dish'].isin(["Burger", "Chips"])
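Like any boolean Series, the result of `isin` can be used to filter rows. List entries that never appear in the column (`"Chips"` in the example data) simply match nothing, and `~` inverts the selection:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

mask = orders['dish'].isin(['Burger', 'Chips'])
print(mask.tolist())                    # [False, False, True]
print(orders[mask]['dish'].tolist())    # ['Burger']

# Dishes NOT in the list
print(orders[~mask]['dish'].tolist())   # ['Pasta', 'Salad']
```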

## Class Exercises
### Creating and Loading DataFrames
- Create a DataFrame called `workouts` by hand, with columns `activity` and `duration`. Make at least 5 rows.

In [None]:
# Write your code here

- Load the CSV from `photos.csv` into a DataFrame. Print the first 5 rows.

In [None]:
# Write your code here

### Accessing Data
- Get the second row from your `workouts` DataFrame (remember: Python uses 0-based indexing).

In [None]:
# Write your code here

- Extract the `activity` column and print all unique activity names.

In [None]:
# Write your code here

- Get the duration value from the third workout (combining row and column access).

In [6]:
# Write your code here

- What happens if you try to access a row that doesn't exist? Try it and note the error.


In [None]:
# Write your code here

- What happens if you try to access a column that doesn't exist? Try it and note the error.


In [7]:
# Write your code here

### Extracting Columns & Statistics
- Extract the `duration` column from your `workouts` DataFrame and store it in a variable called `durations`.

In [8]:
# Write your code here

- Use the `durations` Series to find the mean, sum, maximum, and minimum with `.mean()`, `.sum()`, `.max()`, and `.min()`.

In [9]:
# Write your code here

- Calculate the range of workout durations (the difference between the maximum and the minimum).

In [10]:
# Write your code here

- Get the value counts of the `Subject` column from the `photos` DataFrame
- Call `.median()` on those counts to find the median number of photos per subject

In [None]:
# Write your code here