## Hands-on - Python ReCap 1

In [39]:
# 1. Import necessary libraries
import pandas as pd  # pandas is used for handling tabular datasets (dataframes) and performing operations such as reading CSV files
import numpy as np  # numpy is used for numerical computations such as working with arrays and applying mathematical operations

# 2. Load dataset from GitHub URL
file_path = "https://raw.githubusercontent.com/Hamed-Ahmadinia/DASP-2025/main/adult.data.csv"  # URL link to the dataset stored on GitHub

# 3. Read the dataset into a pandas dataframe
df = pd.read_csv(file_path, header=0)  # header=0 means the first row in the CSV is used as column names

# 4. Display the first few rows of the dataframe to confirm the data has been loaded correctly
print("Dataset Preview:")  # Print a label for context
print(df.head(5))  # Display the first 5 rows of the dataset

Dataset Preview:
   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   
2   38           Private  215646    HS-grad              9   
3   53           Private  234721       11th              7   
4   28           Private  338409  Bachelors             13   

       marital-status         occupation   relationship   race     sex  \
0       Never-married       Adm-clerical  Not-in-family  White    Male   
1  Married-civ-spouse    Exec-managerial        Husband  White    Male   
2            Divorced  Handlers-cleaners  Not-in-family  White    Male   
3  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male   
4  Married-civ-spouse     Prof-specialty           Wife  Black  Female   

   capital-gain  capital-loss  hours-per-week native-country salary  
0          2174             0              40  United-States  <=50K  
1             0          
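
A minimal sketch of what `header=0` does, runnable without network access: it parses a short CSV held in a string (via `io.StringIO`) instead of the GitHub file. The two data rows reuse values from the preview above, and the names `csv_text`, `df_small` and `df_noheader` are illustrative only.

```python
import io

import pandas as pd

# Small CSV snippet with column names from the dataset; values copied from the preview
csv_text = """age,workclass,education,hours-per-week
39,State-gov,Bachelors,40
50,Self-emp-not-inc,Bachelors,13
"""

# header=0: the first line becomes the column names, the remaining lines become rows
df_small = pd.read_csv(io.StringIO(csv_text), header=0)
print(df_small.columns.tolist())  # ['age', 'workclass', 'education', 'hours-per-week']
print(df_small.shape)             # (2, 4)

# header=None: every line, including the name line, is treated as data
df_noheader = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_noheader.shape)          # (3, 4)
```

With `header=None`, pandas numbers the columns 0 to 3 and the original column names end up as the first data row.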

### **Exercise 1: Common Python Operations**
**Question:** Print the sum of the first row's `age` and `hours-per-week`.

In [15]:
# Your code here
# Sum of the first row's 'age' and 'hours-per-week'
sum_age_hours = df.iloc[0]['age'] + df.iloc[0]['hours-per-week']
print("Sum of age and hours-per-week for the first row:", sum_age_hours)

Sum of age and hours-per-week for the first row: 79
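
Exercise 1 uses chained indexing (`df.iloc[0]` first, then the column). As an illustrative sketch on a hypothetical two-row frame `df_demo`, the same sum can also be computed with one scalar lookup per value or with a single multi-column selection:

```python
import pandas as pd

# Hypothetical two-row frame using values from the preview above
df_demo = pd.DataFrame({"age": [39, 50], "hours-per-week": [40, 13]})

# Chained form used in the exercise: take row 0 as a Series, then index each column
chained = df_demo.iloc[0]["age"] + df_demo.iloc[0]["hours-per-week"]

# Single-step scalar lookup: .at[row_label, column_label]
# (with the default RangeIndex, row label 0 is also the first row)
direct = df_demo.at[0, "age"] + df_demo.at[0, "hours-per-week"]

# Select both columns of row 0 at once and add them up
summed = df_demo.loc[0, ["age", "hours-per-week"]].sum()

print(chained, direct, summed)  # 79 79 79
```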


### **Exercise 2: String Manipulation**
**Question:** Convert the first `workclass` entry to lowercase and strip any leading or trailing spaces.

In [17]:
# Your code here
# Convert the first 'workclass' entry to lowercase and strip spaces
first_workclass = df.iloc[0]['workclass'].strip().lower()
print("First workclass entry after conversion:", first_workclass)


First workclass entry after conversion: state-gov
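
Whitespace cleanup matters more with messier data than this file. A small sketch with hypothetical raw strings shows what `strip()` does and does not remove. For a whole column, pandas offers the vectorized equivalent `df['workclass'].str.strip().str.lower()`.

```python
# Hypothetical raw value with stray whitespace, as can appear in CSV exports
raw = "  State-gov \t"

# strip() removes leading/trailing whitespace (spaces, tabs, newlines); lower() lowercases
cleaned = raw.strip().lower()
print(repr(cleaned))  # 'state-gov'

# Inner characters are left alone; only the ends are trimmed
inner = " Self emp ".strip()
print(repr(inner))    # 'Self emp'

# Same cleanup applied to several values with a list comprehension
values = [" State-gov", "Self-emp-not-inc ", " Private "]
cleaned_values = [v.strip().lower() for v in values]
print(cleaned_values)  # ['state-gov', 'self-emp-not-inc', 'private']
```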


### **Exercise 3: Lists in Python**
**Question:** Create a list of the unique values in the `education` column and print its length.

In [19]:
# Your code here
# Create a list of unique values in the 'education' column
unique_education = df['education'].unique().tolist()
print("Unique education values:", unique_education)
print("Number of unique education values:", len(unique_education))

Unique education values: ['Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college', 'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school', '5th-6th', '10th', '1st-4th', 'Preschool', '12th']
Number of unique education values: 16
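
`Series.unique()` returns values in order of first appearance. A plain-Python sketch of the same idea, using the first five `education` values from the preview, relies on dictionaries preserving insertion order. If only the count is needed, `df['education'].nunique()` returns it directly (it ignores missing values by default, while `unique()` includes them).

```python
# First five 'education' values from the preview above
education = ["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors"]

# dict keys keep insertion order (Python 3.7+), so this removes duplicates
# while preserving the order in which values first appear
unique_ordered = list(dict.fromkeys(education))
print(unique_ordered)       # ['Bachelors', 'HS-grad', '11th']
print(len(unique_ordered))  # 3

# A set holds the same members, but its iteration order is not guaranteed
print(len(set(education)))  # 3
```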


### **Exercise 4: Sets in Python**
**Question:** Check if the value "Self-employed" exists in the unique set of `workclass` values.

In [24]:
# Your code here
# Check if "Self-employed" exists in the unique set of 'workclass' values
unique_workclass = set(df['workclass'].unique())
print("Does 'Self-employed' exist in workclass?", "Self-employed" in unique_workclass)


Does 'Self-employed' exist in workclass? False
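
The answer is `False` because set membership is an exact, case-sensitive comparison, and the self-employed categories in this data are spelled differently (for example `Self-emp-not-inc`). A sketch using only workclass values visible in the preview shows the exact check next to a prefix-based filter:

```python
# Workclass values seen in the preview above
workclasses = {"State-gov", "Self-emp-not-inc", "Private"}

# Membership tests compare whole strings exactly
print("Self-employed" in workclasses)     # False
print("Self-emp-not-inc" in workclasses)  # True

# To find related categories, filter by prefix instead of exact match
self_emp = {w for w in workclasses if w.startswith("Self-emp")}
print(self_emp)  # {'Self-emp-not-inc'}
```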


### **Exercise 5: Dictionaries in Python**
**Question:** Create a dictionary from the first row's data. Print the `occupation` value from the dictionary.

In [26]:
# Your code here
# Create a dictionary from the first row's data
first_row_dict = df.iloc[0].to_dict()
print("Occupation from the first row:", first_row_dict['occupation'])


Occupation from the first row: Adm-clerical
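
`Series.to_dict()` maps column names to values. The same structure can be built in plain Python by zipping names with values. The sketch below uses a few first-row values from the preview, and `"income"` is a hypothetical key used only to show `dict.get()` with a default:

```python
# A few column names and their first-row values, copied from the preview above
columns = ["age", "workclass", "education", "occupation"]
first_row = [39, "State-gov", "Bachelors", "Adm-clerical"]

# zip() pairs each name with its value; dict() turns the pairs into entries
row_dict = dict(zip(columns, first_row))
print(row_dict["occupation"])  # Adm-clerical

# .get() returns a default instead of raising KeyError for a missing key
# ("income" is a hypothetical key that is not in the row)
print(row_dict.get("income", "not present"))  # not present
```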


### **Exercise 6: Conditional Statements in Python**
**Question:** Write a function that returns "Minor" if the age is less than 18, "Adult" if it is between 18 and 65 (inclusive), and "Senior" if it is greater than 65. Test it on the first row's `age`.

In [28]:
# Your code here
# Function to categorize age
def age_category(age):
    if age < 18:
        return "Minor"
    elif 18 <= age <= 65:
        return "Adult"
    else:
        return "Senior"

# Test the function on the first row's age
first_age = df.iloc[0]['age']
print("Age category for the first row:", age_category(first_age))


Age category for the first row: Adult
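
Because the first branch already handles ages below 18, the middle condition only needs the upper bound. The sketch below is an equivalent rewrite of `age_category` and checks the boundary ages, where off-by-one mistakes usually appear:

```python
def age_category(age):
    """Return "Minor" (< 18), "Adult" (18 to 65 inclusive) or "Senior" (> 65)."""
    if age < 18:
        return "Minor"
    elif age <= 65:  # reached only when age >= 18
        return "Adult"
    else:
        return "Senior"

# Test values on both sides of each boundary
results = [(age, age_category(age)) for age in [17, 18, 39, 65, 66]]
for age, category in results:
    print(age, category)
```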


### **Exercise 7: For Loops**
**Question:** Iterate through the first 5 rows of `workclass` and print each entry.

In [30]:
# Your code here
# Iterate through the first 5 rows of 'workclass' and print each entry
for i in range(5):
    print(df.iloc[i]['workclass'])

State-gov
Self-emp-not-inc
Private
Private
Private
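
Looping over `range(5)` and indexing works, but Python can iterate over values directly (a pandas Series such as `df['workclass'].head(5)` can be looped over the same way). A plain-list sketch using the five values printed above:

```python
# First five 'workclass' entries, as printed above
workclass = ["State-gov", "Self-emp-not-inc", "Private", "Private", "Private"]

# Iterate over the values themselves, no index needed
for value in workclass:
    print(value)

# enumerate() yields (position, value) pairs when the position is also needed
for i, value in enumerate(workclass):
    print(i, value)

# A loop can also accumulate a result, e.g. counting one category
private_count = 0
for value in workclass:
    if value == "Private":
        private_count += 1
print(private_count)  # 3
```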


### **Exercise 8: Functions**
**Question:** Define a function `sum_columns` that takes two column names and an index, sums the values of the specified columns at the given index, and returns the result. Test it with `age` and `hours-per-week` at the first row.

In [32]:
# Your code here
# Function to sum two columns at a given index
def sum_columns(col1, col2, index):
    return df.iloc[index][col1] + df.iloc[index][col2]

# Test the function with 'age' and 'hours-per-week' at the first row
result = sum_columns('age', 'hours-per-week', 0)
print("Sum of age and hours-per-week for the first row:", result)


Sum of age and hours-per-week for the first row: 79
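
`sum_columns` reads the global `df`, so it only works on that one dataframe. A variant that takes the data as an extra argument is easier to reuse and test. The sketch below uses a hypothetical list of row dictionaries with values from the preview, and the extra `data` parameter is a departure from the exercise's signature:

```python
# Hypothetical sample: two rows as dictionaries, values taken from the preview above
rows = [
    {"age": 39, "hours-per-week": 40},
    {"age": 50, "hours-per-week": 13},
]

def sum_columns(data, col1, col2, index):
    """Return data[index][col1] + data[index][col2].

    The data is passed in explicitly instead of being read from a global variable.
    """
    row = data[index]
    return row[col1] + row[col2]

print(sum_columns(rows, "age", "hours-per-week", 0))  # 79
print(sum_columns(rows, "age", "hours-per-week", 1))  # 63
```

The same change works for the pandas version: add a `frame` parameter and use `frame.iloc[index][col1] + frame.iloc[index][col2]`.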


### **Exercise 9: File Handling**
**Question:** Save the first 10 rows of the dataframe to `output.csv` and read it back. Print the first 2 rows from the loaded CSV.

In [34]:
# Your code here
# Save the first 10 rows to 'output.csv'
df.head(10).to_csv('output.csv', index=False)

# Read the CSV back
loaded_df = pd.read_csv('output.csv')

# Print the first 2 rows from the loaded CSV
print(loaded_df.head(2))


   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   

       marital-status       occupation   relationship   race   sex  \
0       Never-married     Adm-clerical  Not-in-family  White  Male   
1  Married-civ-spouse  Exec-managerial        Husband  White  Male   

   capital-gain  capital-loss  hours-per-week native-country salary  
0          2174             0              40  United-States  <=50K  
1             0             0              13  United-States  <=50K  
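
`index=False` keeps pandas from writing the row index as an extra unnamed first column. For comparison, here is a sketch of the same save-and-reload round trip with Python's built-in `csv` module, written to a temporary directory so no file is left behind. The rows are a hypothetical subset of the preview.

```python
import csv
import os
import tempfile

# Hypothetical subset of columns and rows from the preview above
header = ["age", "workclass", "hours-per-week"]
rows = [[39, "State-gov", 40], [50, "Self-emp-not-inc", 13]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")

    # newline="" is what the csv module docs recommend when opening CSV files
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # column names first
        writer.writerows(rows)   # then the data rows

    # DictReader uses the first line as keys; every value is read back as a string
    with open(path, newline="") as f:
        loaded = list(csv.DictReader(f))

print(loaded[0])                         # {'age': '39', 'workclass': 'State-gov', 'hours-per-week': '40'}
print(int(loaded[1]["hours-per-week"]))  # 13
```

Unlike `pd.read_csv`, `csv.DictReader` does not infer types, so numbers must be converted explicitly.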


### **Exercise 10: Debugging in Python**
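**Question:** Write a try/except block to handle a `KeyError` when accessing a column `"salary"` that may not exist. Print an appropriate error message when the error is caught. Note that this copy of the dataset does include a `salary` column (see the preview above), so here the access succeeds and the `except` branch does not run.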

In [41]:
# Your code here
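# Try/except block to handle KeyError
try:
    salary_column = df['salary']
except KeyError:
    print("Error: The column 'salary' does not exist in the dataframe.")
else:
    # Runs only if no KeyError was raised, i.e. the column exists
    print("The column 'salary' exists. First value:", salary_column.iloc[0])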

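To see the `except` branch actually run, the key has to be genuinely missing. Below is a sketch on a plain dictionary, where `"income"` is a hypothetical missing key and `get_field` is an illustrative helper. Indexing a pandas DataFrame with a missing column name raises the same `KeyError`.

```python
# One record with values from the preview above
record = {"age": 39, "workclass": "State-gov", "salary": "<=50K"}

def get_field(row, name):
    """Return row[name], or print a message and return None if the key is missing."""
    try:
        return row[name]
    except KeyError as err:
        # err.args[0] holds the key that was not found
        print(f"Error: the field {err.args[0]!r} does not exist.")
        return None

print(get_field(record, "salary"))  # <=50K
print(get_field(record, "income"))  # prints the error message, then None
```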