# Pandas Practice Exercise: Analyzing Olympic Athlete Data and Coffee Sales

Welcome to your practice exercise! It reinforces the Pandas concepts covered in the lesson: creating DataFrames, exploring data (`head`/`tail`/`info`/`describe`), filtering, sorting, grouping, merging, adding columns, applying functions (lambda and custom `def`), handling dates, cleaning data, and basic aggregations.

You'll work with two datasets:
- **bios.csv**: Contains biographical data on Olympic athletes (e.g., name, born_date, born_city, height_cm, weight_kg).
- **coffee.csv**: Contains simple coffee sales data (Day, Coffee Type, Units Sold).

**Instructions**:
- Use Pandas to complete each task.
- Write your code in the provided code cells.
- For each task, display the result (e.g., using `head()` or printing the output).
- If a task involves creating or modifying a DataFrame, make a copy of the original to avoid overwriting data.
- Bonus: Use comments in your code to explain what each step does.
- Run all cells and ensure no errors.
- Export your notebook as PDF or HTML for submission.
- Share any insights or challenges you faced in the comments.

**Setup Code** (Run this first):

In [None]:
import pandas as pd
import numpy as np

# Load the datasets
bios = pd.read_csv('bios.csv')
coffee = pd.read_csv('coffee.csv')

## Part 1: Basic Data Exploration (Using bios.csv)

**1. Display the first 10 rows** of the `bios` DataFrame using `head()`. Then, display the last 5 rows using `tail()`.

In [None]:
# Your code here


**2. Get dataset info**: Use `info()` to show the structure of `bios`. How many rows and columns are there? What are the data types? (Answer in a comment.)

In [None]:
# Your code here


**3. Summary statistics**: Use `describe()` on the numerical columns (e.g., height_cm, weight_kg). What is the average height of the athletes? (Answer in a comment.)
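
As a hint, here is a minimal sketch of how `describe()` behaves on a tiny made-up frame (the `toy` data is hypothetical and is not taken from `bios.csv`). It shows that missing values are skipped and that individual statistics can be read back out of the summary table.

```python
import pandas as pd

# Toy data standing in for the real file (all values are made up)
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "height_cm": [170.0, 180.0, 190.0, None],
    "weight_kg": [60.0, 75.0, 90.0, 80.0],
})

# describe() summarizes the numeric columns and ignores missing values
stats = toy[["height_cm", "weight_kg"]].describe()
print(stats)

# A single statistic can be looked up by row label and column name
mean_height = stats.loc["mean", "height_cm"]
print(mean_height)
```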

In [None]:
# Your code here


**4. Columns and Index**: Print the list of columns in `bios`. Convert the index to a list and print the first 5 index values.

In [None]:
# Your code here


## Part 2: Filtering and Sorting (Using bios.csv)

**5. Filter rows**: Create a new DataFrame containing only athletes from France (`NOC == 'FRA'`). How many such athletes are there? (Use `value_counts()` on the 'NOC' column to verify the count and to check how France is encoded in your copy of the data.)
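
The filtering pattern can be sketched on a small hypothetical frame (the `toy` rows and their country codes are invented for illustration). A boolean mask selects the rows, and `value_counts()` gives an independent count to compare against.

```python
import pandas as pd

# Hypothetical stand-in for bios; the codes here are assumptions
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "NOC": ["FRA", "USA", "FRA", "GBR"],
})

# Boolean mask; .copy() so later edits do not touch the original frame
fra = toy[toy["NOC"] == "FRA"].copy()
print(len(fra))

# value_counts() tallies every distinct value in the column
counts = toy["NOC"].value_counts()
print(counts)
```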

In [None]:
# Your code here


**6. Multiple conditions**: Filter for athletes who are taller than 180 cm and weigh more than 80 kg. Sort this filtered DataFrame by `height_cm` in descending order. Display the top 5.
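
A rough sketch of combining conditions, using invented numbers rather than the real data. Each condition needs its own parentheses, and `&` (not `and`) combines them element-wise.

```python
import pandas as pd

# Made-up measurements for illustration only
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "height_cm": [175, 185, 192, 188, 181],
    "weight_kg": [70, 85, 95, 78, 82],
})

# Wrap each condition in parentheses and join them with &
tall_heavy = toy[(toy["height_cm"] > 180) & (toy["weight_kg"] > 80)]

# Sort tallest first, then keep at most five rows
top = tall_heavy.sort_values("height_cm", ascending=False).head(5)
print(top)
```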

In [None]:
# Your code here


**7. Access specific data**: Using `loc` with a boolean condition, select the 'name', 'height_cm', and 'weight_kg' columns for athletes whose `athlete_id` is between 10 and 20 (inclusive). Then, using `iloc`, select the first 5 rows and first 3 columns of the original `bios`.
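
The difference between label-based and position-based selection can be sketched as follows. The frame and the ID range (2 to 4) are hypothetical; `Series.between` is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical frame with the same column names as the task
toy = pd.DataFrame({
    "athlete_id": [1, 2, 3, 4, 5, 6],
    "name": list("ABCDEF"),
    "height_cm": [170, 175, 180, 185, 190, 195],
    "weight_kg": [60, 65, 70, 75, 80, 85],
})

# loc: a boolean row mask plus a list of column labels
mask = toy["athlete_id"].between(2, 4)  # inclusive on both ends
subset = toy.loc[mask, ["name", "height_cm", "weight_kg"]]
print(subset)

# iloc: purely positional, end-exclusive slices (first 2 rows, first 3 columns)
corner = toy.iloc[:2, :3]
print(corner)
```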

In [None]:
# Your code here


## Part 3: Adding Columns and Applying Functions (Using bios.csv)

**8. Handle dates**: Convert the 'born_date' column to datetime format. Add a new column 'born_year' by extracting the year from 'born_date' (use `.dt.year`).
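
A small sketch of the date conversion, assuming ISO-style date strings (the actual format in `bios.csv` may differ, so inspect it first). Passing `errors="coerce"` is an optional choice that turns unparseable entries into `NaT` instead of raising.

```python
import pandas as pd

# Invented dates; the last entry is deliberately unparseable
toy = pd.DataFrame({"born_date": ["1986-12-12", "1969-04-01", "not a date"]})

# Convert strings to datetimes; bad values become NaT with errors="coerce"
toy["born_date"] = pd.to_datetime(toy["born_date"], errors="coerce")

# The .dt accessor exposes datetime parts such as the year
toy["born_year"] = toy["born_date"].dt.year
print(toy)
```

Because of the missing value, `born_year` comes out as a float column here.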

In [None]:
# Your code here


**9. Clean data**: Some heights might have 'cm' appended (check the data). If present, remove 'cm' from 'height_cm' and convert it to float. Fill any missing values in 'height_cm' and 'weight_kg' with the column mean.
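
One possible cleaning sequence, sketched on invented values. It assumes the height column holds strings; if the column is already numeric in your data, the `.str` step is unnecessary and would raise an error.

```python
import pandas as pd

# Made-up messy data: a 'cm' suffix, stray spaces, and missing values
toy = pd.DataFrame({
    "height_cm": ["180cm", "170", None, "190 cm"],
    "weight_kg": [80.0, None, 70.0, 90.0],
})

clean = toy.copy()  # work on a copy so the original stays untouched

# Strip the literal 'cm' and whitespace, then convert to numbers
stripped = clean["height_cm"].str.replace("cm", "", regex=False).str.strip()
clean["height_cm"] = pd.to_numeric(stripped, errors="coerce")

# Fill the remaining gaps with each column's own mean
for col in ["height_cm", "weight_kg"]:
    clean[col] = clean[col].fillna(clean[col].mean())

print(clean)
```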

In [None]:
# Your code here


**10. Lambda function**: Add a column 'height_category' using a lambda function:
- 'Short' if height_cm < 170
- 'Average' if 170 <= height_cm <= 180
- 'Tall' if height_cm > 180
- 'Unknown' if NaN

Display the value counts for this new column.
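
The rules above could be written as a chained conditional expression inside a lambda. This is only one possible shape, shown on hypothetical heights; the NaN check comes first because comparisons with NaN are always False.

```python
import pandas as pd

# Invented heights, including the boundary values and a missing one
toy = pd.DataFrame({"height_cm": [165.0, 170.0, 180.0, 181.0, None]})

# Chained conditional expression: the first matching branch wins
toy["height_category"] = toy["height_cm"].apply(
    lambda h: "Unknown" if pd.isna(h)
    else "Short" if h < 170
    else "Average" if h <= 180
    else "Tall"
)

print(toy["height_category"].value_counts())
```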

In [None]:
# Your code here


**11. Custom def function**: Define a function `bmi_category` that takes a row, computes BMI as `weight_kg / (height_cm / 100) ** 2`, and returns:
- 'Underweight' if BMI < 18.5
- 'Normal' if 18.5 <= BMI < 25
- 'Overweight' if BMI >= 25
- 'Invalid' if height or weight is NaN

Apply this function to each row (use `apply` with `axis=1`) and add a 'bmi_category' column. Display the first 10 rows of 'name', 'height_cm', 'weight_kg', and 'bmi_category'.
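
A minimal sketch of a row-wise function, run on made-up rows. Note that exponentiation in Python is `**`; the `^` operator is bitwise XOR and would not compute a square.

```python
import pandas as pd

def bmi_category(row):
    """Classify one row by BMI; expects 'height_cm' and 'weight_kg' keys."""
    h, w = row["height_cm"], row["weight_kg"]
    if pd.isna(h) or pd.isna(w):
        return "Invalid"
    bmi = w / (h / 100) ** 2  # kilograms divided by metres squared
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    return "Overweight"

# Invented measurements for illustration
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "height_cm": [180.0, 180.0, 170.0, None],
    "weight_kg": [55.0, 70.0, 90.0, 70.0],
})

# axis=1 passes each row (as a Series) to the function
toy["bmi_category"] = toy.apply(bmi_category, axis=1)
print(toy)
```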

In [None]:
# Your code here


## Part 4: Grouping and Aggregating (Using bios.csv and coffee.csv)

**12. Group by NOC**: Group `bios` by 'NOC' and calculate the mean `height_cm` and median `weight_kg` for each country. Sort by mean height descending and display the top 10 countries.
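
Different aggregations per column can be expressed with named aggregation. The sketch below uses invented rows, and the output column names `mean_height` and `median_weight` are arbitrary choices.

```python
import pandas as pd

# Hypothetical athletes from three teams
toy = pd.DataFrame({
    "NOC": ["FRA", "FRA", "USA", "USA", "GBR"],
    "height_cm": [180.0, 190.0, 170.0, 180.0, 178.0],
    "weight_kg": [70.0, 90.0, 60.0, 80.0, 75.0],
})

# Named aggregation: new_column=(source_column, function)
summary = (
    toy.groupby("NOC")
    .agg(
        mean_height=("height_cm", "mean"),
        median_weight=("weight_kg", "median"),
    )
    .sort_values("mean_height", ascending=False)
)
print(summary.head(10))
```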

In [None]:
# Your code here


**13. Aggregate sales**: Using `coffee`, group by 'Day' and calculate the total 'Units Sold'. Which day had the highest sales? (Answer in a comment.)
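
A sketch of a grouped total on invented sales figures, using the column names listed for `coffee.csv`. `idxmax()` returns the index label of the largest value, which here is the day name.

```python
import pandas as pd

# Made-up sales rows; the real file will have more days
toy = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Coffee Type": ["Espresso", "Latte", "Espresso", "Latte"],
    "Units Sold": [25, 15, 30, 20],
})

# Total units per day
daily = toy.groupby("Day")["Units Sold"].sum()
print(daily)

# Label of the day with the largest total
best_day = daily.idxmax()
print(best_day)
```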

In [None]:
# Your code here


**14. Pivot table (Bonus)**: Create a pivot table from `coffee` with 'Day' as index, 'Coffee Type' as columns, and sum of 'Units Sold' as values.
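
For reference, a small pivot table on hypothetical rows. `pivot_table` averages by default, so `aggfunc="sum"` has to be requested explicitly.

```python
import pandas as pd

# Invented sales rows using the coffee.csv column names
toy = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Coffee Type": ["Espresso", "Latte", "Espresso", "Latte"],
    "Units Sold": [25, 15, 30, 20],
})

# One row per day, one column per coffee type, summed units in the cells
pivot = toy.pivot_table(
    index="Day",
    columns="Coffee Type",
    values="Units Sold",
    aggfunc="sum",
)
print(pivot)
```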

In [None]:
# Your code here


## Part 5: Merging DataFrames (Using both CSVs)

**15. Simulate a merge**: Create a small `results` DataFrame as shown below. Merge it with `bios` on 'athlete_id' using an inner join. Display the merged DataFrame.

```python
results_data = {
    'athlete_id': [1, 2, 3, 4, 5],
    'medal': ['Gold', 'Silver', None, 'Bronze', 'Gold'],
    'event': ['Tennis Singles', 'Tennis Doubles', 'Tennis Singles', 'Tennis Doubles', 'Fencing']
}
results = pd.DataFrame(results_data)
```
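
To see what an inner join keeps, here is a sketch with two tiny hypothetical frames (`toy_bios` and `toy_results` are stand-ins, not the real data). Only IDs present in both frames survive.

```python
import pandas as pd

# Stand-ins for bios and results with partially overlapping IDs
toy_bios = pd.DataFrame({
    "athlete_id": [1, 2, 6],
    "name": ["A", "B", "F"],
})
toy_results = pd.DataFrame({
    "athlete_id": [1, 2, 3],
    "medal": ["Gold", "Silver", None],
})

# Inner join: keep only athlete_id values found in both frames
merged = pd.merge(toy_bios, toy_results, on="athlete_id", how="inner")
print(merged)
```

Changing `how` to `"left"` would instead keep every row of `toy_bios` and fill the unmatched medal with a missing value.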

In [None]:
# Your code here


**16. Coffee enhancements**: Add a new column to `coffee` called 'Price' by applying a lambda to 'Coffee Type': 3.5 for 'Espresso', 4.5 for 'Latte'. Then add a 'Revenue' column ('Units Sold' * 'Price'). Group by 'Coffee Type' and sum the 'Revenue'.
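
A sketch of the derived columns on invented rows. It assumes 'Espresso' and 'Latte' are the only coffee types; if your data contains others, the `else` branch would need adjusting.

```python
import pandas as pd

# Made-up sales rows using the coffee.csv column names
toy = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Coffee Type": ["Espresso", "Latte", "Espresso", "Latte"],
    "Units Sold": [25, 15, 30, 20],
})

# Assumption: any type other than 'Espresso' is a 'Latte'
toy["Price"] = toy["Coffee Type"].apply(lambda t: 3.5 if t == "Espresso" else 4.5)

# Column arithmetic is element-wise
toy["Revenue"] = toy["Units Sold"] * toy["Price"]

revenue_by_type = toy.groupby("Coffee Type")["Revenue"].sum()
print(revenue_by_type)
```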

In [None]:
# Your code here


## Bonus (Optional)
Try plotting a histogram of athlete heights using `bios['height_cm'].hist()`. Add appropriate labels and title.

In [None]:
# Your code here


## Submission
- Run all cells and ensure no errors.
- Export your notebook as PDF or HTML.
- Share any insights or challenges you faced in a comment below.

In [None]:
# Your comments here
