## Hands-on - Python ReCap 1

In [1]:
# 1. Import necessary libraries
import pandas as pd  # pandas is used for handling tabular datasets (dataframes) and performing operations such as reading CSV files
import numpy as np  # numpy is used for numerical computations such as working with arrays and applying mathematical operations

# 2. Load dataset from GitHub URL
file_path = "https://raw.githubusercontent.com/Hamed-Ahmadinia/DASP-2025/main/adult.data.csv"  # URL link to the dataset stored on GitHub

# 3. Read the dataset into a pandas dataframe
df = pd.read_csv(file_path, header=0)  # header=0 means the first row in the CSV is used as column names

# 4. Display the first few rows of the dataframe to confirm the data has been loaded correctly
print("Dataset Preview:")  # Print a label for context
print(df.head(5))  # Display the first 5 rows of the dataset

Dataset Preview:
   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   
2   38           Private  215646    HS-grad              9   
3   53           Private  234721       11th              7   
4   28           Private  338409  Bachelors             13   

       marital-status         occupation   relationship   race     sex  \
0       Never-married       Adm-clerical  Not-in-family  White    Male   
1  Married-civ-spouse    Exec-managerial        Husband  White    Male   
2            Divorced  Handlers-cleaners  Not-in-family  White    Male   
3  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male   
4  Married-civ-spouse     Prof-specialty           Wife  Black  Female   

   capital-gain  capital-loss  hours-per-week native-country salary  
0          2174             0              40  United-States  <=50K  
1             0          

### **Exercise 1: Common Python Operations**
**Question:** Print the sum of the first row's `age` and `hours-per-week`.

In [2]:
# Your code here
age_hours = df['age'][0] + df['hours-per-week'][0]
print("Sum of age and hours-per-week for the first row: ", age_hours)

Sum of age and hours-per-week for the first row:  79
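
Chained indexing such as `df['age'][0]` looks up the index label `0`, which happens to be the first row only because the dataframe uses the default index. The sketch below shows the more explicit `.iloc` (position) and `.loc` (label) accessors. The `sample` frame and its custom index are assumptions made for illustration; the two rows are copied from the dataset preview.

```python
import pandas as pd

# Hypothetical two-row sample copied from the dataset preview, given a
# non-default index to separate "label" from "position".
sample = pd.DataFrame(
    {"age": [39, 50], "hours-per-week": [40, 13]},
    index=[10, 11],
)

# .iloc selects by position, so position 0 is always the first row.
first_by_position = sample.iloc[0]["age"] + sample.iloc[0]["hours-per-week"]

# .loc selects by index label; in this sample the first row has label 10.
first_by_label = sample.loc[10, "age"] + sample.loc[10, "hours-per-week"]

print(first_by_position, first_by_label)
```

With a default index the two accessors agree, so the exercise result is unchanged.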


### **Exercise 2: String Manipulation**
**Question:** Convert the first `workclass` entry to lowercase and strip any leading or trailing spaces.

In [6]:
# Your code here
workclass_lowercase = df['workclass'][0].lower().strip()
print("Workclass for the first row in lowercase: ", workclass_lowercase)

Workclass for the first row in lowercase:  state-gov
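
The same cleanup can be applied to a whole column at once through the pandas `.str` accessor. This is a minimal sketch on a hand-built series; the stray spaces in `workclass` are an assumption for illustration and may not occur in the real column.

```python
import pandas as pd

# Hypothetical values with stray whitespace; the real column may differ.
workclass = pd.Series([" State-gov", "Self-emp-not-inc ", " Private "])

# The .str accessor applies string methods element-wise.
cleaned = workclass.str.strip().str.lower()

print(cleaned.tolist())
```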


### **Exercise 3: Lists in Python**
**Question:** Create a list of the unique values in the `education` column and print its length.

In [None]:
# Your code here
education_list = df['education'].tolist()
education_list_unique = list(set(education_list))
print("There are",len(education_list_unique), "unique values in the education column.")

There are 16 unique values in the education column.
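
Converting to a `set` removes duplicates but does not keep the original order. If first-seen order matters, `dict.fromkeys` is one way to deduplicate a list while preserving it. The sketch below uses the five `education` values from the dataset preview; `education_sample` is a name introduced here for illustration.

```python
# The five education values shown in the dataset preview.
education_sample = ["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors"]

# Dict keys are unique and keep insertion order, so this deduplicates
# while preserving the order in which values first appear.
unique_in_order = list(dict.fromkeys(education_sample))

print(unique_in_order, len(unique_in_order))
```

On a pandas column, `df['education'].unique()` and `df['education'].nunique()` should give the unique values and their count directly.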


### **Exercise 4: Sets in Python**
**Question:** Check if the value "Self-employed" exists in the unique set of `workclass` values.

In [10]:
# Your code here
unique_workclass = set(df['workclass'].dropna())
check = 'Self-employed' in unique_workclass
if check:
  print("Self-employed is in the workclass column.")
else:
  print("Self-employed is not in the workclass column.")

Self-employed is not in the workclass column.
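
A membership test on a set only finds exact matches. When the test fails, it can help to look for similar values. The sketch below uses only the three `workclass` values visible in the dataset preview, so the real set is larger; the `"Self-"` prefix search is an illustrative choice.

```python
# Workclass values taken from the dataset preview (not the full set).
workclass_values = {"State-gov", "Self-emp-not-inc", "Private"}

target = "Self-employed"
exact_match = target in workclass_values

# Collect values that share the "Self-" prefix with the target.
similar = sorted(v for v in workclass_values if v.startswith("Self-"))

print(exact_match, similar)
```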


### **Exercise 5: Dictionaries in Python**
**Question:** Create a dictionary from the first row's data. Print the `occupation` value from the dictionary.

In [11]:
# Your code here
dictionary_first_row = df.iloc[0].to_dict()
print("Occupation for the first row: ", dictionary_first_row['occupation'])

Occupation for the first row:  Adm-clerical


### **Exercise 6: Conditional Statements in Python**
**Question:** Write a function that returns "Minor" if the age is less than 18, "Adult" if between 18 and 65, and "Senior" if greater than 65. Test it on the first row's `age`.

In [12]:
# Your code here
def classify_age(age):
  # "Minor" below 18, "Adult" from 18 through 65, "Senior" above 65
  if age < 18:
    return "Minor"
  elif age <= 65:
    return "Adult"
  else:
    return "Senior"

print(classify_age(dictionary_first_row['age']))


Adult
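
The boundary ages are where a rule like this most easily goes wrong. The sketch below restates the rule in a hypothetical helper `age_group`, so that the snippet runs on its own, and checks the values on either side of each boundary.

```python
def age_group(age):
    """Return 'Minor' below 18, 'Adult' from 18 through 65, 'Senior' above 65."""
    if age < 18:
        return "Minor"
    if age <= 65:
        return "Adult"
    return "Senior"

# Check the ages immediately around both boundaries.
boundary_results = {age: age_group(age) for age in (17, 18, 65, 66)}

print(boundary_results)
```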


### **Exercise 7: For Loops**
**Question:** Iterate through the first 5 rows of `workclass` and print each entry.

In [13]:
# Your code here
workclass_first_five = df['workclass'].head(5).tolist()
for i in range(5):
  print("Workclass for row", i, "is", workclass_first_five[i])

Workclass for row 0 is State-gov
Workclass for row 1 is Self-emp-not-inc
Workclass for row 2 is Private
Workclass for row 3 is Private
Workclass for row 4 is Private
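
Looping over `range(5)` and indexing into the list works, but `enumerate` yields the position and the value together and avoids hard-coding the length. This is a minimal sketch using the five values printed above.

```python
# The first five workclass values, as printed above.
workclass_first_five = ["State-gov", "Self-emp-not-inc", "Private", "Private", "Private"]

lines = []
for i, value in enumerate(workclass_first_five):
    # enumerate pairs each value with its zero-based position.
    lines.append(f"Workclass for row {i} is {value}")

print("\n".join(lines))
```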


### **Exercise 8: Functions**
**Question:** Define a function `sum_columns` that takes two column names and an index, sums the values of the specified columns at the given index, and returns the result. Test it with `age` and `hours-per-week` at the first row.

In [14]:
# Your code here
def sum_columns(col1, col2, index):
  # Sum the values of two columns at the given row index
  return df.loc[index, col1] + df.loc[index, col2]

test_result = sum_columns('age', 'hours-per-week', 0)
print("Sum of age and hours-per-week for the first row: ", test_result)

Sum of age and hours-per-week for the first row:  79


### **Exercise 9: File Handling**
**Question:** Save the first 10 rows of the dataframe to `output.csv` and read it back. Print the first 2 rows from the loaded CSV.

In [16]:
# Your code here
df.head(10).to_csv('output.csv', index=False)  # to_csv returns None when given a path, so the result is not assigned
print("Data has been saved to output.csv")

# Print the first 2 rows of output.csv
output_data = pd.read_csv('output.csv')
print(output_data.head(2))

Data has been saved to output.csv
   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   

       marital-status       occupation   relationship   race   sex  \
0       Never-married     Adm-clerical  Not-in-family  White  Male   
1  Married-civ-spouse  Exec-managerial        Husband  White  Male   

   capital-gain  capital-loss  hours-per-week native-country salary  
0          2174             0              40  United-States  <=50K  
1             0             0              13  United-States  <=50K  
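
For comparison, the same write-then-read round trip can be sketched with the standard-library `csv` module. The two-column `rows` sample and the temporary directory are assumptions for illustration. Unlike `pd.read_csv`, which infers numeric types, `csv.DictReader` returns every value as a string.

```python
import csv
import os
import tempfile

# Hypothetical two-row, two-column sample based on the dataset preview.
rows = [
    {"age": 39, "workclass": "State-gov"},
    {"age": 50, "workclass": "Self-emp-not-inc"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.csv")

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["age", "workclass"])
        writer.writeheader()
        writer.writerows(rows)

    # Read the file back; each row becomes a dict of strings.
    with open(path, newline="") as handle:
        loaded = list(csv.DictReader(handle))

print(loaded)
```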


### **Exercise 10: Debugging in Python**
**Question:** Write a try/except block to handle a `KeyError` when accessing a non-existent column `"salary"`. Print an appropriate error message when caught.

In [None]:
# Your code here
try:
  print("Trying to access a column that does not exist.")
  print(df['salary'])
except KeyError:
  print("Column does not exist.")
finally:
  print("Debugging example completed.")


# The "salary" column does exist in this dataset, so the block above prints it instead of raising a KeyError.
# The block below uses "nation", which is not a column in the dataset.
try:
  print("Trying to access a column that does not exist.")
  print(df['nation'])
except KeyError:
  print("Column does not exist.")
finally:
  print("Debugging example completed.")

Trying to access a column that does not exist.
0        <=50K
1        <=50K
2        <=50K
3        <=50K
4        <=50K
         ...  
32556    <=50K
32557     >50K
32558    <=50K
32559    <=50K
32560     >50K
Name: salary, Length: 32561, dtype: object
Debugging example completed.
Trying to access a column that does not exist.
Column does not exist.
Debugging example completed.
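
Binding the exception with `as err` makes it possible to report which key was missing. The sketch below uses a plain dict as a stand-in for the dataframe, since both raise `KeyError` for a missing key; `columns` and `get_column` are hypothetical names introduced for illustration.

```python
# A plain dict stands in for the dataframe in this sketch.
columns = {"salary": ["<=50K", "<=50K"]}

def get_column(name):
    """Return the column, or None after reporting which key was missing."""
    try:
        return columns[name]
    except KeyError as err:
        # str(err) is the missing key, shown with quotes.
        print(f"Column {err} does not exist.")
        return None

found = get_column("salary")
missing = get_column("nation")
```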


In [None]:
print("Name: Yue Zhang")
print("Student ID: 2421832")