## Hands-on - Python ReCap - Sample Answer

In [2]:
# Import necessary libraries
import pandas as pd  # pandas is used for handling tabular datasets (dataframes) and performing operations such as reading CSV files
import numpy as np  # numpy is used for numerical computations such as working with arrays and applying mathematical operations

# Load dataset from GitHub URL
file_path = "https://raw.githubusercontent.com/Hamed-Ahmadinia/DASP-2025/main/adult.data.csv"  # URL link to the dataset

# Read the dataset into a pandas dataframe
df = pd.read_csv(file_path, header=0)  # header=0 means the first row in the CSV is used as column names
print("Dataset Preview:")
print(df.head(5))  # Display the first 5 rows of the dataframe

Dataset Preview:
   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   
2   38           Private  215646    HS-grad              9   
3   53           Private  234721       11th              7   
4   28           Private  338409  Bachelors             13   

       marital-status         occupation   relationship   race     sex  \
0       Never-married       Adm-clerical  Not-in-family  White    Male   
1  Married-civ-spouse    Exec-managerial        Husband  White    Male   
2            Divorced  Handlers-cleaners  Not-in-family  White    Male   
3  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male   
4  Married-civ-spouse     Prof-specialty           Wife  Black  Female   

   capital-gain  capital-loss  hours-per-week native-country salary  
0          2174             0              40  United-States  <=50K  
1             0          
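The original UCI `adult.data` file puts a space after each comma. If this GitHub copy does the same (an assumption), every header and string value keeps a leading space unless it is removed. `read_csv` can drop those spaces at load time with `skipinitialspace=True`. The sketch below reads a small inline CSV through `io.StringIO` instead of the URL, so it runs offline. Its two rows are taken from the preview above, and the `", "` layout is assumed.

```python
import io

import pandas as pd

# Two rows from the preview, written with ", " separators to mimic the raw
# UCI layout (an assumption about how the remote file is formatted).
csv_text = (
    "age, workclass, education\n"
    "39, State-gov, Bachelors\n"
    "50, Self-emp-not-inc, Bachelors\n"
)

# Default parsing keeps the space that follows each comma.
raw = pd.read_csv(io.StringIO(csv_text), header=0)

# skipinitialspace=True drops whitespace right after each delimiter,
# in the header row as well as in the data rows.
clean = pd.read_csv(io.StringIO(csv_text), header=0, skipinitialspace=True)

print(list(raw.columns))                    # ['age', ' workclass', ' education']
print(list(clean.columns))                  # ['age', 'workclass', 'education']
print(repr(clean["workclass"].iloc[0]))     # 'State-gov'
```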

### **Exercise 1: Common Python Operations**
**Task:** Print the sum of the first row's `age` and `hours-per-week`.

In [None]:
answer = df['age'].iloc[0] + df['hours-per-week'].iloc[0]  # Add the first-row values of the two columns
print("Sum of age and hours-per-week:", answer)
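Adding two columns with `+` sums them row by row in one step. `.iloc` is always positional, so `.iloc[0]` means "first row" even after filtering changes the index labels. The sketch below shows both on a small hand-built frame. Row 0 matches the preview, and the other `hours-per-week` values are made up for illustration.

```python
import pandas as pd

# Toy data: row 0 matches the preview; the other hours values are made up.
df = pd.DataFrame({"age": [39, 50, 38], "hours-per-week": [40, 20, 45]})

# Vectorised: adding two columns sums them row by row.
row_sums = df["age"] + df["hours-per-week"]
print(row_sums.tolist())            # [79, 70, 83]

# After filtering, labels no longer start at 0, but .iloc stays positional.
subset = df[df["age"] != 39]        # keeps the rows labelled 1 and 2
print(subset.index.tolist())        # [1, 2]
print(subset["age"].iloc[0])        # 50: first row by position
print(0 in subset.index)            # False: .loc[0] would raise KeyError here
```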

### **Exercise 2: String Manipulation**
**Task:** Convert the first `workclass` entry to lowercase and strip any leading or trailing spaces.

In [None]:
answer = df['workclass'].iloc[0].strip().lower()
print("Workclass (lowercase, stripped):", answer)
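To clean a whole column rather than one entry, pandas exposes the same string methods through the `.str` accessor. These work element-wise and leave missing values missing. A plain `.strip()` on a missing value (`None` or a float `NaN`) would raise `AttributeError`. The sample values below, including the stray spaces and the gap, are hypothetical.

```python
import pandas as pd

# Hypothetical workclass values with stray spaces and one missing entry.
workclass = pd.Series([" State-gov", " Self-emp-not-inc", "Private ", None])

# Chain the vectorised equivalents of str.strip() and str.lower().
cleaned = workclass.str.strip().str.lower()

print(cleaned.tolist()[:3])      # ['state-gov', 'self-emp-not-inc', 'private']
print(cleaned.isna().tolist())   # [False, False, False, True]
```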

### **Exercise 3: Lists in Python**
**Task:** Create a list of the unique values in the `education` column and print its length.

In [None]:
unique_education_levels = list(set(df['education']))  # Create a list of unique education levels
print("Number of unique education levels:", len(unique_education_levels))
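`set()` works, but its order is arbitrary. pandas also provides `Series.unique()`, which keeps values in order of first appearance, and `Series.nunique()`, which returns the count directly and skips `NaN` by default. The sketch below compares all three on the five `education` values shown in the preview.

```python
import pandas as pd

# The education values from the five preview rows.
education = pd.Series(["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors"])

as_set = set(education)               # unique values, arbitrary order
in_order = list(education.unique())   # unique values, first-appearance order
count = education.nunique()           # number of unique values (NaN excluded)

print(in_order)                       # ['Bachelors', 'HS-grad', '11th']
print(len(as_set), count)             # 3 3
```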

### **Exercise 4: Sets in Python**
**Task:** Check if the value "Self-employed" exists in the unique set of `workclass` values.

In [None]:
unique_workclasses = set(df['workclass'].dropna())  # Convert to set for unique values
is_self_employed = "Self-employed" in unique_workclasses
print("Does 'Self-employed' exist in workclasses?:", is_self_employed)
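The `in` test on a set is an exact match, so `"Self-employed"` is found only if a value is spelled exactly that way. The Adult data labels self-employment as `Self-emp-not-inc` (visible in the preview) and `Self-emp-inc`. The sketch below uses a small hand-made set of such labels and shows a prefix search with `str.startswith` as one way to find the related categories.

```python
# A small hand-made set of workclass labels, not the full column.
workclasses = {"State-gov", "Self-emp-not-inc", "Private", "Self-emp-inc"}

print("Self-employed" in workclasses)   # False: no exact match

# A prefix search picks up the self-employment categories instead.
self_emp = sorted(w for w in workclasses if w.startswith("Self-emp"))
print(self_emp)                         # ['Self-emp-inc', 'Self-emp-not-inc']
```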

### **Exercise 5: Dictionaries in Python**
**Task:** Create a dictionary from the first row's data and print the `occupation` value.

In [None]:
first_person = df.iloc[0].to_dict()  # Convert first row to dictionary
print("Occupation:", first_person['occupation'])
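Once a row is a plain dictionary, the usual `dict` tools apply. `.get()` returns a default instead of raising `KeyError` for a missing key, and a comprehension can pick out a subset of fields. The dictionary below is a trimmed, hand-written stand-in for `df.iloc[0].to_dict()` that uses values from the preview. The key `"income"` is hypothetical and chosen because it is absent.

```python
# Hand-written stand-in for df.iloc[0].to_dict(), trimmed to three fields.
first_person = {"age": 39, "workclass": "State-gov", "occupation": "Adm-clerical"}

print(first_person["occupation"])          # Adm-clerical

# .get() falls back to a default when the key is missing.
print(first_person.get("income", "n/a"))   # n/a

# Keep only selected keys with a dictionary comprehension.
wanted = ["age", "occupation"]
subset = {k: first_person[k] for k in wanted}
print(subset)                              # {'age': 39, 'occupation': 'Adm-clerical'}
```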

### **Exercise 6: Conditional Statements in Python**
**Task:** Write a function that returns "Minor" if the age is less than 18, "Adult" if between 18 and 65, and "Senior" if greater than 65.

In [None]:
def categorize_by_age(age):
    if age < 18:
        return "Minor"
    elif age <= 65:  # 65 itself counts as "Adult", per the task
        return "Adult"
    else:
        return "Senior"

# Test the function
age_category = categorize_by_age(df['age'].iloc[0])
print("Age category:", age_category)
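Off-by-one mistakes show up at the cut-offs, so the ages on either side of 18 and 65 are worth testing. The task puts 65 itself in the "Adult" range. The sketch below therefore redefines the function with `<= 65` and checks the four boundary ages.

```python
def categorize_by_age(age):
    """Minor below 18, Adult from 18 to 65 inclusive, Senior above 65."""
    if age < 18:
        return "Minor"
    elif age <= 65:
        return "Adult"
    else:
        return "Senior"

# Check the values just below and at each boundary.
for age in [17, 18, 65, 66]:
    print(age, categorize_by_age(age))
# 17 Minor
# 18 Adult
# 65 Adult
# 66 Senior
```

On the full dataset, `df['age'].apply(categorize_by_age)` applies the same function to every row and returns a Series of labels.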

### **Exercise 7: For Loops**
**Task:** Iterate through the first 5 rows of `workclass` and print each entry.

In [None]:
print("First 5 workclasses:")
for workclass in df['workclass'].head(5):
    print(workclass)
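Two built-ins often pair with `for` loops. `enumerate()` supplies a counter alongside each value, and `zip()` walks two sequences in step, such as two columns of the same rows. The lists below copy the `workclass` and `age` values from the preview.

```python
# workclass and age values from the five preview rows.
workclasses = ["State-gov", "Self-emp-not-inc", "Private", "Private", "Private"]
ages = [39, 50, 38, 53, 28]

# enumerate(..., start=1) numbers the entries from 1 instead of 0.
for i, workclass in enumerate(workclasses, start=1):
    print(i, workclass)

# zip() pairs the two lists element by element.
pairs = [f"{w} ({a})" for w, a in zip(workclasses, ages)]
print(pairs[0])   # State-gov (39)
```

With the real data, `zip(df['workclass'].head(5), df['age'].head(5))` pairs the columns in the same way.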

### **Exercise 8: Functions**
**Task:** Define a function `sum_columns` that takes two column names and an index, sums the values at the given index, and returns the result.

In [None]:
def sum_columns(col1, col2, index):
    """Return df[col1] + df[col2] at row position `index` (uses the global df)."""
    return df[col1].iloc[index] + df[col2].iloc[index]

# Test the function
sum_result = sum_columns('age', 'hours-per-week', 0)
print("Sum of age and hours-per-week:", sum_result)
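`sum_columns` reads the global `df`, so it only works after that variable exists. A variant that takes the DataFrame as an explicit argument is easier to reuse and test. The sketch below adds a hypothetical `frame` parameter for this purpose and runs the function on a toy frame. Row 0 matches the preview, and row 1's hours value is made up.

```python
import pandas as pd

def sum_columns(frame, col1, col2, index):
    """Return frame[col1] + frame[col2] at the given row position."""
    return frame[col1].iloc[index] + frame[col2].iloc[index]

# Toy frame; row 1's hours value is made up for illustration.
people = pd.DataFrame({"age": [39, 50], "hours-per-week": [40, 20]})

print(sum_columns(people, "age", "hours-per-week", 0))    # 79
print(sum_columns(people, "age", "hours-per-week", -1))   # 70 (negative positions count from the end)
```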

### **Exercise 9: File Handling**
**Task:** Save the first 10 rows of the dataframe to `output.csv` and read it back.

In [None]:
# Save the first 10 rows to a CSV file; index=False leaves out the row index
sample_df = df.head(10)
sample_df.to_csv("output.csv", index=False)

# Read the file back to check the round trip
read_sample = pd.read_csv("output.csv")
print("First 2 rows of read CSV:")
print(read_sample.head(2))
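Writing into a temporary directory avoids leaving a stray `output.csv` behind. The same round trip also shows why `index=False` matters. By default `to_csv` writes the row index as an unnamed first column, and `read_csv` then loads it back as `Unnamed: 0`. The two-row frame below uses values from the preview.

```python
import os
import tempfile

import pandas as pd

sample = pd.DataFrame({"age": [39, 50], "workclass": ["State-gov", "Self-emp-not-inc"]})

# TemporaryDirectory deletes itself and its contents when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")

    sample.to_csv(path, index=False)   # data columns only
    back = pd.read_csv(path)

    sample.to_csv(path)                # default: the index is written too
    with_index = pd.read_csv(path)

print(back.equals(sample))             # True
print(with_index.columns.tolist())     # ['Unnamed: 0', 'age', 'workclass']
```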

### **Exercise 10: Debugging in Python**
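**Task:** Write a try/except block to handle the `KeyError` raised when accessing a column that does not exist. `salary` is an actual column in this dataset (see the preview), so accessing it raises no error; a missing name such as `"income"` is used instead.

In [None]:
try:
    df['income']  # 'income' is not a column in this dataset, so pandas raises KeyError
except KeyError:
    print("KeyError: Column 'income' does not exist!")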
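`salary` appears in this dataset's header, so a column-existence check needs a name that is actually missing. The sketch below builds a one-row frame that has `salary` but not `income` (a name chosen for illustration). It catches the `KeyError` and then shows two ways to avoid the exception: a membership test on `df.columns`, and `DataFrame.get`, which returns `None` by default when the key is absent.

```python
import pandas as pd

df = pd.DataFrame({"age": [39], "salary": ["<=50K"]})

caught = False
try:
    df["income"]                    # not a column in this frame
except KeyError as err:
    caught = True
    print("KeyError:", err)         # err holds the missing key

# Alternatives that check first instead of catching the exception.
print("income" in df.columns)       # False
print(df.get("income") is None)     # True
print("salary" in df.columns)       # True
```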