# Session 16: Practice - Data Manipulation

Time to put your filtering and aggregation skills to the test! In this practice session, you'll work with a realistic e-commerce dataset and answer business questions using Pandas.

## Instructions

- Complete each exercise in the provided code cells
- Run your code to verify it works
- Expected outputs are provided to help you verify your solutions
- Try to solve exercises without looking at hints first!

In [None]:
# Setup: Run this cell first!
import pandas as pd

# Create our practice dataset: Online Store Orders
orders = pd.DataFrame({
    'order_id': range(1001, 1026),
    'customer_id': ['C001', 'C002', 'C003', 'C001', 'C004', 'C002', 'C005', 'C003', 'C001', 'C006',
                    'C004', 'C007', 'C002', 'C008', 'C005', 'C001', 'C009', 'C003', 'C010', 'C006',
                    'C007', 'C002', 'C008', 'C004', 'C009'],
    'product': ['Laptop', 'Phone', 'Tablet', 'Headphones', 'Laptop', 'Watch', 'Phone', 'Laptop', 'Tablet',
                'Headphones', 'Watch', 'Phone', 'Tablet', 'Laptop', 'Headphones', 'Phone', 'Watch',
                'Headphones', 'Laptop', 'Tablet', 'Phone', 'Laptop', 'Watch', 'Tablet', 'Headphones'],
    'category': ['Computers', 'Mobile', 'Mobile', 'Audio', 'Computers', 'Wearable', 'Mobile', 'Computers',
                 'Mobile', 'Audio', 'Wearable', 'Mobile', 'Mobile', 'Computers', 'Audio', 'Mobile',
                 'Wearable', 'Audio', 'Computers', 'Mobile', 'Mobile', 'Computers', 'Wearable', 'Mobile', 'Audio'],
    'price': [1200, 800, 500, 150, 1350, 350, 750, 1100, 550, 180, 400, 850, 480, 1500, 200, 900,
              320, 160, 1400, 520, 780, 1250, 380, 490, 175],
    'quantity': [1, 2, 1, 3, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 4, 1, 2, 3, 1, 1, 2, 1, 1, 2, 2],
    'date': pd.to_datetime(['2024-01-15', '2024-01-15', '2024-01-16', '2024-01-16', '2024-01-17',
                            '2024-01-17', '2024-01-18', '2024-01-18', '2024-01-18', '2024-01-19',
                            '2024-01-19', '2024-01-20', '2024-01-20', '2024-01-21', '2024-01-21',
                            '2024-01-22', '2024-01-22', '2024-01-23', '2024-01-23', '2024-01-24',
                            '2024-01-24', '2024-01-25', '2024-01-25', '2024-01-26', '2024-01-26']),
    'region': ['North', 'South', 'East', 'North', 'West', 'South', 'East', 'North', 'West', 'South',
               'East', 'North', 'West', 'South', 'East', 'North', 'West', 'South', 'East', 'North',
               'West', 'South', 'East', 'North', 'West'],
    'payment_method': ['Credit Card', 'PayPal', 'Credit Card', 'Debit Card', 'Credit Card', 'PayPal',
                       'Credit Card', 'Debit Card', 'PayPal', 'Credit Card', 'Debit Card', 'PayPal',
                       'Credit Card', 'Credit Card', 'Debit Card', 'PayPal', 'Credit Card', 'Debit Card',
                       'PayPal', 'Credit Card', 'Debit Card', 'Credit Card', 'PayPal', 'Debit Card', 'Credit Card']
})

# Add total column
orders['total'] = orders['price'] * orders['quantity']

print(f"Dataset: {len(orders)} orders")
orders.head(10)

In [None]:
# Quick overview of the data
orders.info()

---
## Exercise 1: Basic Column Selection

Select only the `order_id`, `product`, and `total` columns and display the first 5 rows.
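Hint: here is a minimal sketch of the syntax on a hypothetical toy DataFrame called `sales`. It is not the `orders` data, so the exercise is still yours to solve. Passing a *list* of names inside the brackets returns a DataFrame. A single name returns a Series.

```python
import pandas as pd

# Hypothetical toy DataFrame, used only to illustrate the syntax
sales = pd.DataFrame({
    'sale_id': [1, 2, 3],
    'item': ['Pen', 'Book', 'Bag'],
    'store': ['A', 'B', 'A'],
    'amount': [5, 20, 45],
})

# Double brackets: the inner list names the columns to keep
subset = sales[['sale_id', 'amount']]
print(subset.head(2))  # first 2 rows of the selected columns
```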

In [None]:
# Your code here


---
## Exercise 2: Row Selection with iloc

Select the rows at positions 5 through 9 (inclusive) using `iloc`.
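Hint: `iloc` selects by integer position, and its slices exclude the stop position, just like Python lists. The sketch below uses a hypothetical 8-row DataFrame and takes positions 2 through 4 inclusive.

```python
import pandas as pd

# Hypothetical toy DataFrame with 8 rows (positions 0..7)
df = pd.DataFrame({'value': [10, 11, 12, 13, 14, 15, 16, 17]})

# Stop position 5 is excluded, so this returns positions 2, 3 and 4
middle = df.iloc[2:5]
print(middle)
```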

In [None]:
# Your code here


---
## Exercise 3: Simple Filtering

Find all orders where the total is greater than $1000.
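Hint: a comparison on a column produces a Boolean Series (a "mask"). Indexing the DataFrame with that mask keeps only the rows where it is `True`. The sketch below uses a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'item': ['Pen', 'Book', 'Bag', 'Lamp'],
    'amount': [5, 20, 45, 30],
})

mask = sales['amount'] > 25   # True/False for every row
expensive = sales[mask]       # keep rows where the mask is True
print(expensive)
```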

In [None]:
# Your code here


---
## Exercise 4: Filtering with Multiple Conditions

Find all orders from the 'Computers' category in the 'North' region.
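Hint: combine masks with `&` (and) or `|` (or). Wrap each comparison in its own parentheses, because `&` binds more tightly than `==`. The sketch below shows this on a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'item': ['Pen', 'Book', 'Bag', 'Book'],
    'store': ['A', 'A', 'B', 'B'],
    'amount': [5, 20, 45, 25],
})

# Both conditions must hold; each comparison is parenthesized
book_in_b = sales[(sales['item'] == 'Book') & (sales['store'] == 'B')]
print(book_in_b)
```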

In [None]:
# Your code here


---
## Exercise 5: Using isin()

Find all orders where the payment method is either 'Credit Card' or 'PayPal'.
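Hint: `Series.isin()` takes a list of allowed values and returns a Boolean mask. It replaces a chain of `==` checks joined with `|`. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'item': ['Pen', 'Book', 'Bag', 'Lamp'],
    'payment': ['Cash', 'Card', 'Voucher', 'Card'],
})

# Keep rows whose payment is any value in the list
cash_or_card = sales[sales['payment'].isin(['Cash', 'Card'])]
print(cash_or_card)

# Prefixing the mask with ~ inverts it ("not in the list")
not_cash_or_card = sales[~sales['payment'].isin(['Cash', 'Card'])]
```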

In [None]:
# Your code here


---
## Exercise 6: Using query()

Use the `query()` method to find all orders where:
- price is between 500 and 1000 (inclusive)
- AND quantity is greater than 1
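Hint: `DataFrame.query()` takes the filter as a string. Inside it, columns are referenced by bare name and conditions are joined with `and`/`or`. The sketch below uses a hypothetical toy DataFrame with a `qty` column and writes an inclusive range as two comparisons.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'item': ['Pen', 'Book', 'Bag', 'Lamp', 'Mug'],
    'price': [10, 25, 40, 50, 30],
    'qty': [3, 2, 1, 2, 2],
})

# Inclusive price range AND more than one unit
picked = sales.query('price >= 20 and price <= 40 and qty > 1')
print(picked)
```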

In [None]:
# Your code here


---
## Exercise 7: Sorting

Sort the orders by `total` in descending order and display the top 5 highest-value orders.
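Hint: `sort_values()` with `ascending=False` puts the largest values first, and `head(n)` keeps the first `n` rows. For "top n" questions, `nlargest(n, column)` is a shorter equivalent. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'item': ['Pen', 'Book', 'Bag', 'Lamp'],
    'amount': [5, 20, 45, 30],
})

top2 = sales.sort_values('amount', ascending=False).head(2)
top2_alt = sales.nlargest(2, 'amount')  # same rows, one call
print(top2)
```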

In [None]:
# Your code here


---
## Exercise 8: Basic Aggregation

Calculate:
1. The total revenue (sum of all totals)
2. The average order value (the mean of `total`)
3. The total number of items sold (sum of quantity)
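Hint: calling an aggregation such as `sum()` or `mean()` on a single column collapses it to one number. The sketch below builds a hypothetical toy DataFrame and derives its `amount` column the same way the setup cell derives `total`.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({'price': [5, 20, 45], 'units': [2, 1, 1]})
sales['amount'] = sales['price'] * sales['units']  # 10, 20, 45

total_revenue = sales['amount'].sum()   # sum of all amounts
avg_order = sales['amount'].mean()      # mean amount per row
units_sold = sales['units'].sum()       # total units
print(total_revenue, avg_order, units_sold)
```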

In [None]:
# Your code here


---
## Exercise 9: Value Counts

Find how many orders were placed for each product. Which product has the most orders?
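Hint: `value_counts()` counts how often each value appears and sorts the result from most to least frequent. `idxmax()` returns the label of the largest count. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({'item': ['Pen', 'Book', 'Pen', 'Bag', 'Pen', 'Book']})

counts = sales['item'].value_counts()  # Pen 3, Book 2, Bag 1
most_common = counts.idxmax()          # label with the highest count
print(counts)
print(most_common)
```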

In [None]:
# Your code here


---
## Exercise 10: GroupBy - Total Revenue by Category

Calculate the total revenue for each product category. Sort by revenue descending.
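Hint: the pattern is `groupby(key)[column].aggregation()`. It returns a Series indexed by the group labels, which you can then sort. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'dept': ['Toys', 'Books', 'Toys', 'Garden', 'Books'],
    'amount': [30, 15, 20, 60, 10],
})

revenue_by_dept = (
    sales.groupby('dept')['amount'].sum()   # one total per dept
    .sort_values(ascending=False)           # largest first
)
print(revenue_by_dept)
```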

In [None]:
# Your code here


---
## Exercise 11: GroupBy - Average by Region

Calculate the average order value for each region. Which region has the highest average?
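Hint: this is the same `groupby` pattern with `mean()` instead of `sum()`. `idxmax()` then answers the "which group is highest" part. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'store': ['A', 'B', 'A', 'B', 'C'],
    'amount': [10, 40, 30, 20, 35],
})

avg_by_store = sales.groupby('store')['amount'].mean()  # A 20, B 30, C 35
best_store = avg_by_store.idxmax()
print(avg_by_store)
print(best_store)
```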

In [None]:
# Your code here


---
## Exercise 12: GroupBy with Multiple Columns

Calculate the total revenue by category AND region. Reset the index to get a regular DataFrame.
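Hint: passing a list of keys to `groupby` produces a MultiIndex. `reset_index()` turns those index levels back into ordinary columns. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'dept': ['Toys', 'Toys', 'Books', 'Toys', 'Books'],
    'store': ['A', 'B', 'A', 'A', 'A'],
    'amount': [10, 20, 5, 15, 25],
})

by_dept_store = (
    sales.groupby(['dept', 'store'])['amount'].sum()  # MultiIndex Series
    .reset_index()                                    # back to a flat DataFrame
)
print(by_dept_store)
```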

In [None]:
# Your code here


---
## Exercise 13: Multiple Aggregations

For each product category, calculate:
- Total revenue
- Average price
- Number of orders
- Total quantity sold

Use named aggregations for clean column names.
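Hint: with named aggregation, each keyword argument to `agg()` becomes an output column. Its value is a `(source_column, function)` tuple. The sketch below uses a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'sale_id': [1, 2, 3, 4],
    'dept': ['Toys', 'Books', 'Toys', 'Books'],
    'price': [10, 8, 20, 12],
    'units': [2, 1, 1, 3],
})
sales['amount'] = sales['price'] * sales['units']

summary = sales.groupby('dept').agg(
    revenue=('amount', 'sum'),      # total amount per dept
    avg_price=('price', 'mean'),    # mean unit price
    n_orders=('sale_id', 'count'),  # non-null rows per dept
    units=('units', 'sum'),         # total units
)
print(summary)
```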

In [None]:
# Your code here


---
## Exercise 14: Apply Function

Create a new column called `order_size` that categorizes orders based on their total:
- 'Small' if total < 500
- 'Medium' if total >= 500 and < 1000
- 'Large' if total >= 1000

Then count how many orders fall into each category.
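Hint: `Series.apply()` calls a Python function on every value and collects the results into a new Series. The thresholds and labels below (`Low`/`Mid`/`High`) are hypothetical and differ from the exercise's, and the data is a toy DataFrame.

```python
import pandas as pd

def size_label(amount):
    """Map an amount to a hypothetical size bucket."""
    if amount < 20:
        return 'Low'
    elif amount < 50:
        return 'Mid'
    return 'High'

# Hypothetical toy DataFrame
sales = pd.DataFrame({'amount': [5, 20, 49, 50, 80]})

sales['bucket'] = sales['amount'].apply(size_label)
bucket_counts = sales['bucket'].value_counts()
print(bucket_counts)
```

Because the checks run in order, each `elif` only needs its upper bound. The lower bound is implied by the branch above it.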

In [None]:
# Your code here


---
## Exercise 15: Business Question - Top Customers

Find the top 5 customers by total spending. Show their customer_id and total amount spent.
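Hint: group by the customer, sum the spending, and keep the largest values with `nlargest()`. `reset_index()` turns the customer labels back into a column. The sketch below keeps the top 2 of a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'customer': ['X', 'Y', 'X', 'Z', 'Y', 'W'],
    'amount': [50, 20, 30, 70, 45, 10],
})

top_customers = (
    sales.groupby('customer')['amount'].sum()  # X 80, Y 65, Z 70, W 10
    .nlargest(2)
    .reset_index()
)
print(top_customers)
```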

In [None]:
# Your code here


---
## Bonus Exercise 1: Complex Filtering

Find all orders that meet ALL of these criteria:
- Category is 'Computers' OR 'Mobile'
- Total is at least $750
- Payment method is NOT 'Debit Card'
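Hint: build one mask per condition and join them with `&`. An "OR over a list of values" becomes `isin()`, and "NOT" can be written as `!=` or by prefixing a mask with `~`. The criteria below are hypothetical and applied to a toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'dept': ['Toys', 'Books', 'Garden', 'Toys', 'Books'],
    'amount': [60, 80, 90, 30, 55],
    'payment': ['Card', 'Cash', 'Card', 'Card', 'Voucher'],
})

mask = (
    sales['dept'].isin(['Toys', 'Books'])  # dept is Toys OR Books
    & (sales['amount'] >= 50)              # at least 50
    & (sales['payment'] != 'Cash')         # NOT paid in cash
)
result = sales[mask]
print(result)
```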

In [None]:
# Your code here


---
## Bonus Exercise 2: Daily Sales Analysis

Calculate the daily total revenue and find:
1. Which day had the highest revenue?
2. Which day had the most orders?
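Hint: group by the date column once for revenue (`sum()`) and once for order counts (`size()`). Then use `idxmax()` on each result. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'date': pd.to_datetime(['2024-03-01', '2024-03-01', '2024-03-02',
                            '2024-03-03', '2024-03-03', '2024-03-03']),
    'amount': [40, 30, 100, 10, 20, 15],
})

daily_revenue = sales.groupby('date')['amount'].sum()  # 70, 100, 45
daily_orders = sales.groupby('date').size()            # 2, 1, 3

best_revenue_day = daily_revenue.idxmax()
busiest_day = daily_orders.idxmax()
print(best_revenue_day, busiest_day)
```

The highest-revenue day and the busiest day need not be the same. If several days tie, `idxmax()` returns only the first one.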

In [None]:
# Your code here


---
## Bonus Exercise 3: Payment Method Analysis

For each payment method, calculate:
- Number of orders
- Total revenue
- Average order value
- Percentage of total orders

Sort by total revenue descending.
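Hint: compute the count, sum and mean in one named aggregation. Derive the percentage afterwards by dividing each group's count by the overall count. The data below is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy DataFrame
sales = pd.DataFrame({
    'payment': ['Card', 'Cash', 'Card', 'Voucher', 'Card'],
    'amount': [40, 10, 20, 30, 60],
})

summary = sales.groupby('payment').agg(
    n_orders=('amount', 'count'),
    revenue=('amount', 'sum'),
    avg_order=('amount', 'mean'),
)
# Share of all orders, as a percentage
summary['pct_orders'] = summary['n_orders'] / summary['n_orders'].sum() * 100
summary = summary.sort_values('revenue', ascending=False)
print(summary)
```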

In [None]:
# Your code here


---
## Summary

Excellent work! In this practice session, you applied:

- Column selection with `[]` and row selection with `iloc`
- Boolean filtering with single and multiple conditions
- The `isin()` method for filtering by multiple values
- The `query()` method for readable filtering
- Sorting with `sort_values()`
- Basic aggregations: `sum()`, `mean()`, `count()`
- `value_counts()` for frequency analysis
- `groupby()` for grouped aggregations
- Named aggregations with `agg()`
- The `apply()` function for custom transformations

### Key Takeaways

1. Wrap each condition in parentheses when combining conditions with `&` or `|`, because these operators bind more tightly than comparisons like `>` or `==`
2. `query()` offers a cleaner syntax for complex filters
3. `groupby()` is essential for answering "by category" questions
4. Named aggregations make results much more readable
5. Before writing code, decide what shape the answer should take: a single number, a Series, or a DataFrame

### Next Session

We'll learn about combining DataFrames and handling data quality issues!