# Pandas Practice Problems for AI Applications

This notebook is a series of practice problems using the Pandas library in Python. The problems cover data manipulation techniques that come up regularly in AI work, and they progress from beginner to advanced level.

## 1. Basic DataFrame Operations

In [1]:
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Score': [85.5, 92.0, 88.0, 76.5, 91.0]
}

df = pd.DataFrame(data)
df

,Name,Age,Score
0,Alice,25,85.5
1,Bob,30,92.0
2,Charlie,35,88.0
3,David,40,76.5
4,Eva,22,91.0


**Exercise 1.1:** Display the first three rows of the DataFrame.

In [2]:
df.head(3)

,Name,Age,Score
0,Alice,25,85.5
1,Bob,30,92.0
2,Charlie,35,88.0


**Exercise 1.2:** Get the summary statistics for the numeric columns.

In [3]:
df.describe()

,Age,Score
count,5.0,5.0
mean,30.4,86.6
std,7.300685,6.19879
min,22.0,76.5
25%,25.0,85.5
50%,30.0,88.0
75%,35.0,91.0
max,40.0,92.0


## 2. Data Cleaning and Preprocessing

In [15]:
# Introducing missing values and inconsistent data
data = {
    'Name': ['Alice', 'Bob', None, 'David', 'Eva'],
    'Age': [25, 30, None, 40, 22],
    'Score': [85.5, None, 88.0, 76.5, 91.0],
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Sales']
}

df_dirty = pd.DataFrame(data)
df_dirty

,Name,Age,Score,Department
0,Alice,25.0,85.5,Sales
1,Bob,30.0,,Sales
2,,,88.0,HR
3,David,40.0,76.5,HR
4,Eva,22.0,91.0,Sales


**Exercise 2.1:** Drop the rows where any value is missing.

In [14]:
# Assign to a new name so df_dirty keeps its missing values for Exercise 2.2
df_dropped = df_dirty.dropna()
df_dropped

,Name,Age,Score,Department
0,Alice,25.0,85.5,Sales
3,David,40.0,76.5,HR
4,Eva,22.0,91.0,Sales


**Exercise 2.2:** Fill missing `Score` values with the column mean.

In [21]:
df_clean = df_dirty.copy()
df_clean["Score"] = df_dirty["Score"].fillna(value=df_dirty["Score"].mean())
df_clean

,Name,Age,Score,Department
0,Alice,25.0,85.5,Sales
1,Bob,30.0,85.25,Sales
2,,,88.0,HR
3,David,40.0,76.5,HR
4,Eva,22.0,91.0,Sales
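
A single column mean ignores differences between groups. As a variation on Exercise 2.2, the sketch below fills each missing `Score` with the mean of that row's `Department` instead. The names `df_example` and `Score_by_dept` are illustrative and not part of the exercise.

```python
import pandas as pd

# Same rows as df_dirty, rebuilt here so the snippet runs on its own.
df_example = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David', 'Eva'],
    'Age': [25, 30, None, 40, 22],
    'Score': [85.5, None, 88.0, 76.5, 91.0],
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Sales'],
})

# transform('mean') returns one value per row: the mean Score of that
# row's Department, computed with NaN values skipped.
dept_mean = df_example.groupby('Department')['Score'].transform('mean')

# Fill only the missing entries; existing scores are left unchanged.
df_example['Score_by_dept'] = df_example['Score'].fillna(dept_mean)
print(df_example[['Name', 'Department', 'Score', 'Score_by_dept']])
```

Here Bob's score is filled from the two known Sales scores rather than from all four known scores.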


## 3. Grouping and Aggregation

**Exercise 3.1:** Group by `Department` and calculate the average `Age` and `Score`.

In [24]:
# groupby returns a new object, so no copy of df_clean is needed.
# mean() skips NaN by default, so the HR average Age uses only David's row.
df_by_Department = df_clean.groupby("Department")[['Age', 'Score']].mean()
df_by_Department

Department,Age,Score
HR,40.0,82.25
Sales,25.666667,87.25
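
When different columns need different statistics, named aggregation is one option. The sketch below is a variation on Exercise 3.1; the output column names (`mean_age`, `mean_score`, `n_rows`, `n_ages`) and the name `df_example` are illustrative choices.

```python
import pandas as pd

# Same rows as df_clean, rebuilt here so the snippet runs on its own.
df_example = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David', 'Eva'],
    'Age': [25, 30, None, 40, 22],
    'Score': [85.5, 85.25, 88.0, 76.5, 91.0],
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Sales'],
})

# Named aggregation: each keyword becomes an output column, defined as
# (source column, aggregation function).
summary = df_example.groupby('Department').agg(
    mean_age=('Age', 'mean'),
    mean_score=('Score', 'mean'),
    n_rows=('Score', 'size'),   # size counts every row in the group
    n_ages=('Age', 'count'),    # count skips NaN values
)
print(summary)
```

Comparing `n_rows` with `n_ages` shows how many values each mean is actually based on, which matters here because one HR row has no `Age`.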


## 4. Merging and Joining

In [26]:
# Sample datasets
df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})

df2 = pd.DataFrame({
    'EmployeeID': [1, 2, 4],
    'Salary': [70000, 80000, 65000]
})

**Exercise 4.1:** Perform an inner join on `EmployeeID`.

In [31]:
joined_df = df1.join(df2.set_index('EmployeeID'), how='inner', on='EmployeeID')
joined_df

,EmployeeID,Name,Salary
0,1,Alice,70000
1,2,Bob,80000
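
An inner join silently drops employees that appear in only one table (IDs 3 and 4 here). The sketch below shows `pd.merge` as an equivalent way to write the inner join, plus an outer merge with `indicator=True` to see which side each row came from. The names `inner`, `outer`, and `origin` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie'],
})
df2 = pd.DataFrame({
    'EmployeeID': [1, 2, 4],
    'Salary': [70000, 80000, 65000],
})

# Inner merge on the shared key column: keeps only IDs present in both.
inner = pd.merge(df1, df2, on='EmployeeID', how='inner')

# Outer merge keeps every ID; indicator=True adds a '_merge' column
# with the values 'both', 'left_only', or 'right_only'.
outer = pd.merge(df1, df2, on='EmployeeID', how='outer', indicator=True)

# Map each EmployeeID to where it was found.
origin = dict(zip(outer['EmployeeID'], outer['_merge'].astype(str)))
print(outer)
```

Checking the `_merge` column before switching to an inner join is a quick way to find out how many rows the join will discard.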


## 5. Working with Time Series

In [32]:
# Sample time series data
dates = pd.date_range(start='2023-01-01', periods=6, freq='D')
values = [100, 110, 108, 115, 120, 125]

# Build the frame and move Date into the index in one step
ts_df = pd.DataFrame({'Date': dates, 'Value': values}).set_index('Date')
ts_df

Date,Value
2023-01-01,100
2023-01-02,110
2023-01-03,108
2023-01-04,115
2023-01-05,120
2023-01-06,125


**Exercise 5.1:** Calculate the rolling mean with a window size of 3.

In [35]:
rolling_mean = ts_df.rolling(window=3).mean()
rolling_mean

Date,Value
2023-01-01,
2023-01-02,
2023-01-03,106.0
2023-01-04,111.0
2023-01-05,114.333333
2023-01-06,120.0
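
The first two rows are empty because a window of 3 needs three observations by default. If partial windows are acceptable, `min_periods` relaxes that requirement. The sketch below rebuilds the same series under the illustrative names `ts_example` and `partial`.

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=6, freq='D')
ts_example = pd.DataFrame(
    {'Value': [100, 110, 108, 115, 120, 125]},
    index=dates,
)

# min_periods=1 produces a mean as soon as one observation is available,
# so the first rows average over 1 and 2 values instead of being NaN.
partial = ts_example['Value'].rolling(window=3, min_periods=1).mean()
print(partial)
```

Whether partial windows are appropriate depends on the use case: they avoid missing values at the start of the series, but the early means are based on fewer points than the later ones.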


## 6. AI-Relevant Applications

**Exercise 6.1:** Given a dataset of feature values and target labels, split the features and target into two separate DataFrames.

In [36]:
# Simulated dataset
data = {
    'feature1': [0.2, 0.4, 0.1, 0.5],
    'feature2': [1.2, 1.5, 1.1, 1.3],
    'label': [0, 1, 0, 1]
}

df_ai = pd.DataFrame(data)
df_ai

,feature1,feature2,label
0,0.2,1.2,0
1,0.4,1.5,1
2,0.1,1.1,0
3,0.5,1.3,1


In [41]:
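# Double brackets select a list of columns and always return a DataFrame
x = df_ai[['feature1', 'feature2']]
y = df_ai[['label']]  # one-column DataFrame, as the exercise asks
x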

,feature1,feature2
0,0.2,1.2
1,0.4,1.5
2,0.1,1.1
3,0.5,1.3
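
Listing feature columns by name gets tedious as the number of features grows. One alternative, sketched below, drops the target column to get the features and selects the target with single brackets, which yields a one-dimensional Series rather than a DataFrame. The names `df_example`, `X`, and `y` are illustrative.

```python
import pandas as pd

df_example = pd.DataFrame({
    'feature1': [0.2, 0.4, 0.1, 0.5],
    'feature2': [1.2, 1.5, 1.1, 1.3],
    'label': [0, 1, 0, 1],
})

# drop returns a new DataFrame with every column except the target;
# df_example itself is not modified.
X = df_example.drop(columns='label')

# Single brackets return a 1-D Series instead of a one-column DataFrame.
y = df_example['label']

print(X.shape, y.shape)
```

Many modelling libraries accept either form for the target, so the choice between a Series and a one-column DataFrame mostly depends on what the downstream code expects.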


**Challenge:** Given a large DataFrame of user interaction logs, perform the following:
- Parse timestamps and set them as the index.
- Keep only the interactions from the past 30 days.
- Compute the average number of interactions per user.
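
One possible approach to the challenge is sketched below on a tiny hypothetical log. The column names `user_id`, `timestamp`, and `action` are assumptions, since the challenge does not specify a schema. The 30-day window is measured back from the latest timestamp in the data so that the result is reproducible; for live data, `pd.Timestamp.now()` could serve as the reference point instead. "Average interactions per user" is read here as the mean of the per-user interaction counts.

```python
import pandas as pd

# Hypothetical interaction log; the schema is an assumption for this sketch.
logs = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u1', 'u3', 'u2', 'u1'],
    'timestamp': [
        '2023-03-01 09:00', '2023-03-10 12:30', '2023-03-15 08:15',
        '2023-01-05 17:45', '2023-03-20 10:00', '2023-03-21 11:00',
    ],
    'action': ['click', 'view', 'click', 'view', 'click', 'view'],
})

# Step 1: parse the timestamp strings and use them as a sorted index.
logs['timestamp'] = pd.to_datetime(logs['timestamp'])
logs = logs.set_index('timestamp').sort_index()

# Step 2: keep rows from the 30 days before the latest timestamp.
cutoff = logs.index.max() - pd.Timedelta(days=30)
recent = logs.loc[logs.index >= cutoff]

# Step 3: count interactions per user, then average those counts.
per_user = recent.groupby('user_id').size()
avg_per_user = per_user.mean()

print(per_user)
print(avg_per_user)
```

Users with no interactions inside the window (here `u3`) do not appear in `per_user`, so they are excluded from the average rather than counted as zero.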