# <b>CaRM Module: Advanced Topics in Data Preparation Using Python (2024/2025)</b>
## <b>Exercises for Session 02 (Solved)</b>

### Exercise 1: Create a Boolean Column with the .apply() Method

Use the .apply() method to create a boolean column based on a condition.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Define a function that returns True if the score is greater than or equal to 80, and False otherwise.<br>
Step 3. Use the .apply() method to apply this function to the 'Score' column and store the result in a new boolean column 'Passed'.<br>
Step 4. Display the updated DataFrame to verify the new 'Passed' column.<br>


In [1]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 78, 88, 92]
}

df = pd.DataFrame(data)

def is80ormore(x):
    # The comparison already evaluates to True or False
    return x >= 80
    
df['Passed'] = df['Score'].apply(is80ormore)

print(df)

      Name  Score  Passed
0    Alice     85    True
1      Bob     90    True
2  Charlie     78   False
3    David     88    True
4      Eva     92    True
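
As a side note, the same column can be built without a helper function, because comparing a Series with a number is evaluated element by element. The sketch below reuses the data from this exercise; the column name 'Passed_Vectorized' is made up here only to compare the two approaches.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 78, 88, 92]
})

# Approach from the exercise: call a function once per score
df['Passed'] = df['Score'].apply(lambda x: x >= 80)

# Alternative: the comparison itself returns a boolean Series
# ('Passed_Vectorized' is a hypothetical column name used for illustration)
df['Passed_Vectorized'] = df['Score'] >= 80

# Check that both columns agree row by row
print((df['Passed'] == df['Passed_Vectorized']).all())
```

For simple conditions like this one, the direct comparison is usually shorter and faster; .apply() is most useful when the logic cannot be written as a single expression.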


### Exercise 2: Apply a Built-In Function with the .apply() Method

Apply a built-in function to transform a DataFrame column.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Use the .apply() method to convert the values in the 'Age' column to strings.<br>
Step 3. Verify the transformation by checking the data type of the 'Age' column.<br>

In [7]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 90, 78, 88, 92]
}

df = pd.DataFrame(data)

print('Original data type of Age column:')
print(df['Age'].dtype)

df['Age'] = df['Age'].apply(str)

print('\nNew data type of Age column:')
print(df['Age'].dtype)

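# Note 1: The values are now native Python strings, which pandas keeps in an
# object-dtype column (newer pandas versions may report a dedicated string dtype instead).
# Note 2: The .astype() method is a more direct way to change a column's data type. For example:
# df['Age'] = df['Age'].astype('str')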

Original data type of Age column:
int64

New data type of Age column:
object
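
The two conversion routes mentioned in the notes above can be compared side by side. This is a minimal sketch on a standalone Series; the variable names `via_apply` and `via_astype` are chosen here for illustration.

```python
import pandas as pd

ages = pd.Series([24, 27, 22, 32, 29], name='Age')

# Route 1: call the built-in str() on every element
via_apply = ages.apply(str)

# Route 2: let pandas convert the whole column at once
via_astype = ages.astype(str)

# Both routes produce the same string values
print(via_apply.tolist())
print(via_astype.tolist())

# Each element is an ordinary Python string
print(type(via_astype.iloc[0]))
```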


### Exercise 3: Apply a Custom Function with the .apply() Method

Apply a custom function to a DataFrame column.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Define a custom function that categorizes scores into grades: Grade A for scores greater than or equal to 90, Grade C for scores less than 80, and Grade B for scores in between (at least 80 but less than 90).<br>
Step 3. Use the .apply() method to apply this function to the 'Score' column and store the result in a new column 'Grade'.<br>
Step 4. Display the updated DataFrame to verify the new 'Grade' column.<br>

In [8]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 78, 88, 92]
}

df = pd.DataFrame(data)

def scores2grades(x):
    if x >= 90:
        return 'Grade A'
    elif x < 80:
        return 'Grade C'
    else:
        return 'Grade B'
    
df['Grade'] = df['Score'].apply(scores2grades)

print(df)

      Name  Score    Grade
0    Alice     85  Grade B
1      Bob     90  Grade A
2  Charlie     78  Grade C
3    David     88  Grade B
4      Eva     92  Grade A
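
When the categories are defined by numeric ranges, as they are here, pd.cut() offers an alternative to a hand-written function. The sketch below is one possible setup, not part of the original solution: the bin edges and the use of right=False (so that each bin includes its lower edge) are choices made to reproduce the same grade boundaries.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 78, 88, 92]
})

# Bins with right=False are [-inf, 80), [80, 90), [90, inf)
bins = [float('-inf'), 80, 90, float('inf')]
labels = ['Grade C', 'Grade B', 'Grade A']

# pd.cut returns a categorical column with one label per row
df['Grade'] = pd.cut(df['Score'], bins=bins, labels=labels, right=False)

print(df)
```

Note that the resulting 'Grade' column has a categorical dtype rather than plain strings, which can be convenient for sorting and grouping.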


### Exercise 4: Apply a Lambda Function with the .apply() Method

Use a lambda function with the .apply() method to modify a DataFrame column.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Use a lambda function with the .apply() method to calculate the discounted price for each product and store it in a new column 'Discounted_Price'.<br>
Step 3. Display the updated DataFrame to verify the new 'Discounted_Price' column.<br>

In [None]:
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Discount': [10, 20, 15, 30, 25]
}
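
No solution is recorded for this exercise, so the following is only one possible sketch. It assumes that 'Discount' is a percentage of 'Price' (the exercise does not say so explicitly) and applies the lambda row by row with axis=1, since the calculation needs two columns.

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Discount': [10, 20, 15, 30, 25]
}

df = pd.DataFrame(data)

# Assumption: 'Discount' is a percentage, so 10 means 10% off the price.
# axis=1 passes each row to the lambda, giving access to both columns.
df['Discounted_Price'] = df.apply(
    lambda row: row['Price'] * (100 - row['Discount']) / 100,
    axis=1
)

print(df)
```

Under that assumption the new column contains 90.0, 160.0, 127.5, 210.0 and 187.5. If 'Discount' were instead an absolute amount, the lambda body would be `row['Price'] - row['Discount']`.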

### Exercise 5: Querying Based on Conditions

Use the .query() method to filter data and create a boolean column.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Use the .query() method to filter rows where the 'Score' is greater than 80.<br>
Step 3. Create a boolean column 'High_Score' in the original DataFrame that is True if the 'Score' is greater than 80, and False otherwise.<br>
Step 4. Display the updated DataFrame to verify the new 'High_Score' column.

In [None]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 90, 78, 88, 92]
}

df = pd.DataFrame(data)

print('Original data:')
print(df)

df_filtered = df.query('Score > 80')
print('\nFiltered data (Scores greater than 80):')
print(df_filtered)

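# The assignment aligns on the index: rows that pass the filter keep their score,
# all other rows become NaN.
df['High_Score'] = df.query('Score > 80')['Score']
df['High_Score'] = df['High_Score'].notna() # True where a value is present, False where it is NaN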

print('\nNew data:')
print(df)


Original data:
      Name  Age  Score
0    Alice   24     85
1      Bob   27     90
2  Charlie   22     78
3    David   32     88
4      Eva   29     92

Filtered data (Scores greater than 80):
    Name  Age  Score
0  Alice   24     85
1    Bob   27     90
3  David   32     88
4    Eva   29     92

New data:
      Name  Age  Score  High_Score
0    Alice   24     85        True
1      Bob   27     90        True
2  Charlie   22     78       False
3    David   32     88        True
4      Eva   29     92        True
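
The two-step approach used in this exercise (filter, then test for missing values) can also be shortened. DataFrame.eval() accepts the same expression string as .query() but returns the boolean result for every row instead of the filtered rows. The sketch below shows this on the same data, as an alternative rather than the intended solution.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 90, 78, 88, 92]
})

# .query() returns only the rows where the condition holds
df_filtered = df.query('Score > 80')

# .eval() evaluates the same expression and returns one boolean per row
df['High_Score'] = df.eval('Score > 80')

print(df_filtered)
print(df)
```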


### Exercise 6: Querying by Combining Multiple Conditions

Use the .query() method to filter data based on multiple conditions and create a boolean column.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Use the .query() method to filter rows where the 'Price' is greater than 150 and the 'Quantity' is less than 40.<br>
Step 3. Create a boolean column 'Pricey_Low_Stock' in the original DataFrame that is True if the 'Price' is greater than 150 and the 'Quantity' is less than 40, and False otherwise.<br>
Step 4. Display the updated DataFrame to verify the new 'Pricey_Low_Stock' column.

In [None]:
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Quantity': [30, 20, 50, 10, 40]
}

df = pd.DataFrame(data)

print('Original data:')
print(df)

df_filtered = df.query('Price > 150 and Quantity < 40')
print('\nFiltered data (Price greater than 150 and Quantity less than 40):')
print(df_filtered)

df['Pricey_Low_Stock'] = df.query('Price > 150 and Quantity < 40')['Product'] # 'Price' or 'Quantity' would work equally well
df['Pricey_Low_Stock'] = df['Pricey_Low_Stock'].notna() # True where a value is present, False where it is NaN

print('\nNew data:')
print(df)

# Note: selecting a column from the query result is only a way to obtain a Series indexed
# by the matching rows, so any column without missing values works. The assignment aligns
# on the index: rows that did not match become NaN, and .notna() then returns True only
# for the rows that matched the query.

Original data:
  Product  Price  Quantity
0       A    100        30
1       B    200        20
2       C    150        50
3       D    300        10
4       E    250        40

Filtered data (Price greater than 150 and Quantity less than 40):
  Product  Price  Quantity
1       B    200        20
3       D    300        10

New data:
  Product  Price  Quantity  Pricey_Low_Stock
0       A    100        30             False
1       B    200        20              True
2       C    150        50             False
3       D    300        10              True
4       E    250        40             False
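
Since the technique above relies only on which index labels survive the filter, the same column can be derived from the index directly, without picking a reference column. The following is a minimal sketch of that idea on the same data; the variable name `matching` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Quantity': [30, 20, 50, 10, 40]
})

# Rows that satisfy both conditions
matching = df.query('Price > 150 and Quantity < 40')

# True for every row whose index label appears in the filtered result
df['Pricey_Low_Stock'] = df.index.isin(matching.index)

print(df)
```

This variant assumes the index labels are unique, which is the case for the default integer index used here.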


### Exercise 7: Using .query() with String Conditions

Use the .query() method to filter data based on string conditions and create a boolean column.<br>
Step 1. Create a DataFrame from the dictionary below.<br>
Step 2. Use the .query() method to filter rows where the 'Department' is 'Finance'.<br>
Step 3. Create a boolean column 'Is_Finance' in the original DataFrame that is True if the 'Department' is 'Finance', and False otherwise.<br>
Step 4. Display the updated DataFrame to verify the new 'Is_Finance' column.

In [25]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Finance'],
    'Score': [85, 90, 78, 88, 92]
}

df = pd.DataFrame(data)

print('Original data:')
print(df)

df_filtered = df.query('Department == "Finance"')
print('\nFiltered data (Department is Finance):')
print(df_filtered)

# The assignment aligns on the index: non-matching rows become NaN.
df['Is_Finance'] = df.query('Department == "Finance"')['Department']
df['Is_Finance'] = df['Is_Finance'].notna() # True where a value is present, False where it is NaN

print('\nNew data:')
print(df)

Original data:
      Name Department  Score
0    Alice         HR     85
1      Bob    Finance     90
2  Charlie         IT     78
3    David  Marketing     88
4      Eva    Finance     92

Filtered data (Department is Finance):
  Name Department  Score
1  Bob    Finance     90
4  Eva    Finance     92

New data:
      Name Department  Score  Is_Finance
0    Alice         HR     85       False
1      Bob    Finance     90        True
2  Charlie         IT     78       False
3    David  Marketing     88       False
4      Eva    Finance     92        True
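
In the query string above, the department name is written as a quoted literal. When the value comes from a variable, .query() lets you reference it with the @ prefix, which avoids nesting quotes. The sketch below illustrates this on the same data; the variable name `target` is an assumption made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Finance'],
    'Score': [85, 90, 78, 88, 92]
})

# Hypothetical variable holding the department to look for
target = 'Finance'

# @target refers to the Python variable defined above
df_filtered = df.query('Department == @target')

# The boolean column can be built with a direct comparison
df['Is_Finance'] = df['Department'] == target

print(df_filtered)
print(df)
```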
