# Introduction to Pandas (Exercises)

_This notebook provides exercises to practice basic Pandas concepts and syntax. Exercises are designed to be completed in approximately 90 minutes by students who have some familiarity with the topics._

Note: The exercises in this Jupyter Notebook were originally compiled by Alex Reppel (AR) based on conversations with [ClaudeAI](https://claude.ai/) *(version 3.5 Sonnet)*. For this year's materials, further revisions were made using [Claude Code](https://www.anthropic.com/claude-code) *(Opus 4.1)*, including updated documentation and git commit messages.

In [None]:
import pandas as pd
import numpy as np

## Pandas Series

### Exercise 1

Create a Pandas Series from a list of five fruits. Then, create another Series with the same fruits but with custom numeric indices starting from 1. Print both Series.
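
As a hint rather than a solution, the sketch below shows how the `index` parameter of `pd.Series` replaces the default labels. The data and the names `colours` and `colours_custom` are made up for illustration and are not part of the exercise.

```python
import pandas as pd

# Hypothetical data, unrelated to the exercise
colours = pd.Series(["Red", "Green", "Blue"])

# Passing index= replaces the default labels 0, 1, 2, ...
colours_custom = pd.Series(["Red", "Green", "Blue"], index=[10, 20, 30])

print(colours)
print(colours_custom)
```

The list passed to `index` must have the same length as the data.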

In [None]:
# Series with default index
fruits = pd.Series(["Apple", "Banana", "Cherry", "Date", "Elderberry"])

print("Series with default index:")
print(fruits)

# Series with custom index
# fruits_custom =  # add code

print("\nSeries with custom index:")
# print(fruits_custom)

### Exercise 2

Given the Series `s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])`, perform the following operations:

- a) Select the value at index 'c'
- b) Select the values at indices 'b', 'd', and 'e'
- c) Create a new Series containing only the values greater than 25
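
The three kinds of selection can be sketched on a different Series. This is an illustrative example only: `temps` and its values are hypothetical, and `.loc` is one of several ways to select by label.

```python
import pandas as pd

# Hypothetical Series, unrelated to the exercise data
temps = pd.Series([12, 18, 25, 9], index=["mon", "tue", "wed", "thu"])

# A single label returns a single value
single = temps.loc["tue"]

# A list of labels returns a Series with just those entries
several = temps.loc[["mon", "thu"]]

# A boolean condition keeps only the entries where it is True
warm = temps[temps > 15]

print(single)
print(several)
print(warm)
```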

In [None]:
s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# a) Select the value at index "c"
# print("Value at index 'c':", )  # add code

# b) Select the values at indices "b", "d", and "e"
print("Values at indices 'b', 'd', and 'e':")
# add code

# c) Create a new Series containing only the values greater than 25
# new_s =  # add code

print("Values greater than 25:")
# print(new_s)

## DataFrame

### Exercise 3

Create a DataFrame representing students with columns 'Name', 'Age', and 'Grade'. Include at least 5 students. Then, perform the following tasks:

- Display the first 3 rows of the DataFrame
- Calculate the average age of the students
- Display only the students with a grade above 80
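
The same three operations can be sketched on an unrelated table. The `products` DataFrame and its column names are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data, unrelated to the exercise
products = pd.DataFrame({
    "Item": ["Pen", "Notebook", "Bag", "Lamp"],
    "Price": [2.0, 5.0, 30.0, 23.0],
    "Stock": [100, 40, 5, 12],
})

# head(n) returns the first n rows
first_two = products.head(2)

# Selecting one column gives a Series, which has a mean() method
mean_price = products["Price"].mean()

# A boolean condition on a column filters the rows
low_stock = products[products["Stock"] < 20]

print(first_two)
print(f"Mean price: {mean_price:.2f}")
print(low_stock)
```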

In [None]:
# Create DataFrame
data = {
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve"],
    "Age": [18, 19, 18, 20, 19],
    "Grade": [85, 92, 78, 95, 88]
}
df = pd.DataFrame(data)

# a) Display the first 3 rows
print("First 3 rows:")
# print(df.head(3))  # add code

# b) Calculate average age
# average_age =  # add code
# print(f"Average age: {average_age:.2f}")

# c) Display students with grade above 80
# high_grades =  # add code
print("Students with grade above 80:")
# print(high_grades)

## File I/O

In [None]:
import os
os.makedirs("assets", exist_ok=True)

### Exercise 4

Read the 'people.csv' file created in the demonstration. The first cell below recreates it as 'assets/people.csv' in case it is missing.

Add a new column 'Salary' with random values between 50,000 and 100,000. Then, save the updated DataFrame back to a CSV file named 'people_with_salary.csv'.
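
A minimal sketch of the write, read, and add-column pattern on unrelated data follows. The `cities` DataFrame, the temporary directory, and the use of `np.random.default_rng` are choices made for this example only; the exercise cell below uses `np.random.randint`, whose upper bound is likewise exclusive.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data, unrelated to the exercise
cities = pd.DataFrame({"City": ["Oslo", "Lima", "Cairo"]})

# A seeded generator makes the random values repeatable
rng = np.random.default_rng(seed=0)

# integers() excludes the upper bound by default, so 31 allows values up to 30
cities["Temp"] = rng.integers(10, 31, size=len(cities))

# Write to a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.csv")
    # index=False avoids writing the row labels as an extra column
    cities.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```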

In [None]:
import pandas as pd

# Create the CSV file
data = {
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "San Francisco", "London", "Sydney"],
    "Country": ["USA", "USA", "UK", "Australia"]
}

df = pd.DataFrame(data)
df.to_csv("assets/people.csv", index=False)
print("assets/people.csv has been created.")

In [None]:
# Read the CSV file
df = pd.read_csv("assets/people.csv")

# Add Salary column with random values
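# NumPy (np) is used here only to generate random integers.
# It's important to know that it exists, but we won't be using it otherwise.
# randint excludes the upper bound, so 100001 allows a salary of exactly 100000.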
df["Salary"] = np.random.randint(50000, 100001, size=len(df))

# Save the updated DataFrame
# df.to_csv("assets/people_with_salary.csv", index=False)  # add code

print("Updated DataFrame:")
print(df)
# print("Saved to 'assets/people_with_salary.csv'")

## Data analysis

### Exercise 5

Using the DataFrame from Exercise 4, perform the following tasks:

- Calculate and display the average salary
- Find and display the person with the highest salary
- Create a new DataFrame containing only people from the USA and display it
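
One possible approach to these tasks, sketched on unrelated data, combines `mean()`, `idxmax()` with `.loc`, and an equality filter. The `scores` DataFrame and its columns are hypothetical, and other approaches (for example sorting) would work as well.

```python
import pandas as pd

# Hypothetical data, unrelated to the exercise
scores = pd.DataFrame({
    "Player": ["Ana", "Ben", "Cy"],
    "Team": ["Red", "Blue", "Red"],
    "Points": [14, 22, 9],
})

# Average of a numeric column
mean_points = scores["Points"].mean()

# idxmax() returns the row label of the largest value; .loc fetches that row
top_row = scores.loc[scores["Points"].idxmax()]

# Comparing a column with == gives a boolean mask for filtering
red_team = scores[scores["Team"] == "Red"]

print(f"Mean points: {mean_points:.2f}")
print(top_row)
print(red_team)
```

Note that `idxmax()` returns only the first row label when several rows share the maximum.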

In [None]:
# Assuming we're continuing from Exercise 4; otherwise, read the CSV again
# df = pd.read_csv("assets/people_with_salary.csv")

# a) Calculate and display average salary
# average_salary =  # add code

# print(f"Average salary: ${average_salary:.2f}")

# b) Find and display person with highest salary
# highest_paid =  # add code

print("Person with highest salary:")
# print(highest_paid)

# c) Create DataFrame with only USA people
# usa_people =  # add code

print("People from USA:")
# print(usa_people)