# 🧱 Column Operations: Creating, Renaming, and Dropping Columns

## 🔹 LEARNING GOALS:
- Create new columns using calculations or functions
- Rename one or multiple columns
- Drop unwanted columns safely
- Apply functions across columns using `.apply()` and lambdas


### 🏗️ 1. Setup and Sample Data

In [13]:
import pandas as pd

data = {
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "last_name": ["Smith", "Jones", "Brown", "Wilson"],
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95]
}

df = pd.DataFrame(data)
df

|   | first_name | last_name | math_score | science_score |
|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 |
| 1 | Bob | Jones | 90 | 85 |
| 2 | Charlie | Brown | 78 | 82 |
| 3 | David | Wilson | 92 | 95 |


### ➕ 2. Creating New Columns

In [14]:
# Simple column arithmetic
df["average_score"] = (df["math_score"] + df["science_score"]) / 2
df

|   | first_name | last_name | math_score | science_score | average_score |
|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 |
| 1 | Bob | Jones | 90 | 85 | 87.5 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 |
| 3 | David | Wilson | 92 | 95 | 93.5 |
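When an average spans several columns, `DataFrame.mean(axis=1)` is easier to extend than adding columns by hand, and `DataFrame.assign()` attaches the result to a new DataFrame without modifying the original. A minimal sketch, reusing only the score columns from the setup cell:

```python
import pandas as pd

# Same sample scores as the setup cell (name columns omitted for brevity)
df = pd.DataFrame({
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95],
})

# Row-wise mean across the listed columns; equivalent to (math + science) / 2 here
mean_scores = df[["math_score", "science_score"]].mean(axis=1)

# assign() returns a new DataFrame and leaves df itself unchanged
df_with_avg = df.assign(average_score=mean_scores)

print(df_with_avg["average_score"].tolist())  # [86.5, 87.5, 80.0, 93.5]
print("average_score" in df.columns)          # False
```

Adding a third subject would then only mean extending the column list passed to `mean`.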


In [15]:
# Create full name column
df["full_name"] = df["first_name"] + " " + df["last_name"]
df

|   | first_name | last_name | math_score | science_score | average_score | full_name |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | Alice Smith |
| 1 | Bob | Jones | 90 | 85 | 87.5 | Bob Jones |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | Charlie Brown |
| 3 | David | Wilson | 92 | 95 | 93.5 | David Wilson |
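The same full-name column can also be built with the `.str.cat()` string method, which takes the separator as a parameter instead of an extra `" "` operand. A short sketch on the name columns from the setup cell:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "last_name": ["Smith", "Jones", "Brown", "Wilson"],
})

# str.cat joins the two Series element-wise using sep;
# its na_rep parameter (not used here) can substitute a string for missing values
df["full_name"] = df["first_name"].str.cat(df["last_name"], sep=" ")

print(df["full_name"].tolist())
# ['Alice Smith', 'Bob Jones', 'Charlie Brown', 'David Wilson']
```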


### ✍️ 3. Renaming Columns

In [16]:
# Rename specific columns by passing an {old_name: new_name} mapping
df.rename(columns={"math_score": "Math", "science_score": "Science"}, inplace=True)
df

|   | first_name | last_name | Math | Science | average_score | full_name |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | Alice Smith |
| 1 | Bob | Jones | 90 | 85 | 87.5 | Bob Jones |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | Charlie Brown |
| 3 | David | Wilson | 92 | 95 | 93.5 | David Wilson |
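Two details of `rename()` are worth knowing: without `inplace=True` it returns a renamed copy, and by default any key that doesn't match an existing column is silently ignored, so a typo goes unnoticed. Passing `errors="raise"` turns that into a `KeyError`. A minimal sketch (the misspelled `"maths_score"` is deliberate):

```python
import pandas as pd

df = pd.DataFrame({"math_score": [85, 90], "science_score": [88, 85]})

# Without inplace=True, rename() returns a renamed copy and leaves df untouched
renamed = df.rename(columns={"math_score": "Math", "science_score": "Science"})
print(list(renamed.columns))  # ['Math', 'Science']
print(list(df.columns))       # ['math_score', 'science_score']

# A misspelled key is ignored by default; errors="raise" surfaces the mistake
caught = None
try:
    df.rename(columns={"maths_score": "Math"}, errors="raise")
except KeyError as exc:
    caught = exc
    print("rename failed:", exc)
```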


In [22]:
# Rename all columns to uppercase
df.columns = [col.upper() for col in df.columns]
df

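|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | FULL_NAME |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | Alice Smith |
| 1 | Bob | Jones | 90 | 85 | 87.5 | Bob Jones |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | Charlie Brown |
| 3 | David | Wilson | 92 | 95 | 93.5 | David Wilson |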
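The list comprehension works, but pandas also provides two shorter forms: the vectorized `.str.upper()` on the column `Index`, and `rename()` with a function, which is applied to every column label. A small sketch on an empty frame that only carries column names:

```python
import pandas as pd

df = pd.DataFrame(columns=["first_name", "last_name", "math_score"])

# Vectorized string method on the column Index (returns a new Index)
upper_cols = df.columns.str.upper()

# rename() also accepts a callable, applied to each column label
upper_df = df.rename(columns=str.upper)

print(list(upper_cols))        # ['FIRST_NAME', 'LAST_NAME', 'MATH_SCORE']
print(list(upper_df.columns))  # ['FIRST_NAME', 'LAST_NAME', 'MATH_SCORE']
```

To apply the first form in place, assign it back with `df.columns = df.columns.str.upper()`.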


### ❌ 4. Dropping Columns

In [21]:
# Drop by name
df.drop(columns=["FULL_NAME"], inplace=True)
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE |
|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 |
| 1 | Bob | Jones | 90 | 85 | 87.5 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 |
| 3 | David | Wilson | 92 | 95 | 93.5 |
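Because the cell above uses `inplace=True`, running it a second time raises a `KeyError`: `FULL_NAME` is already gone. Two ways to make drops safe to re-run, sketched on a two-column toy frame (`NICKNAME` is a made-up column that deliberately does not exist):

```python
import pandas as pd

df = pd.DataFrame({
    "FIRST_NAME": ["Alice", "Bob"],
    "FULL_NAME": ["Alice Smith", "Bob Jones"],
})

# Option 1: drop only the labels that actually exist
to_drop = [col for col in ["FULL_NAME", "NICKNAME"] if col in df.columns]
trimmed = df.drop(columns=to_drop)

# Option 2: errors="ignore" skips missing labels instead of raising KeyError,
# and reassigning (rather than inplace=True) keeps the cell easy to re-run
df = df.drop(columns=["FULL_NAME", "NICKNAME"], errors="ignore")
df = df.drop(columns=["FULL_NAME", "NICKNAME"], errors="ignore")  # no-op the second time

print(list(trimmed.columns))  # ['FIRST_NAME']
print(list(df.columns))       # ['FIRST_NAME']
```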


### 🔁 5. Applying Functions Across Columns

In [23]:
# Categorize based on average score
def grade(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"

df["GRADE"] = df["AVERAGE_SCORE"].apply(grade)
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | GRADE |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | B |
| 1 | Bob | Jones | 90 | 85 | 87.5 | B |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | B |
| 3 | David | Wilson | 92 | 95 | 93.5 | A |
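The learning goals also mention lambdas, and a function can read several columns at once when `DataFrame.apply` is called with `axis=1`, which passes each row as a Series. A short sketch of both; `ABOVE_85` and `BEST_SUBJECT` are hypothetical column names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "MATH": [85, 90, 78, 92],
    "SCIENCE": [88, 85, 82, 95],
    "AVERAGE_SCORE": [86.5, 87.5, 80.0, 93.5],
})

# Element-wise: a lambda applied to each value of a single column
df["ABOVE_85"] = df["AVERAGE_SCORE"].apply(lambda s: s >= 85)

# Row-wise: axis=1 hands the lambda one row at a time, so it can compare columns
df["BEST_SUBJECT"] = df.apply(
    lambda row: "Math" if row["MATH"] >= row["SCIENCE"] else "Science",
    axis=1,
)

print(df[["ABOVE_85", "BEST_SUBJECT"]])
```

Row-wise `apply` calls Python code once per row, so for large frames a vectorized expression (plain column arithmetic or comparisons) is usually faster when one exists.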


### 🧪 Try It Yourself

- Add a new column called `"NAME_LENGTH"` that contains the length of each `FIRST_NAME`
- Create a column `"MATH_PLUS_5"` that adds a 5-point bonus to each `MATH` score


In [24]:
df["NAME_LENGTH"] = df["FIRST_NAME"].apply(len)
df

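|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | GRADE | NAME_LENGTH |
|---|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | B | 5 |
| 1 | Bob | Jones | 90 | 85 | 87.5 | B | 3 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | B | 7 |
| 3 | David | Wilson | 92 | 95 | 93.5 | A | 5 |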
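Pandas also has a vectorized string accessor for this task, `.str.len()`. A small sketch, with one missing name (`None`) added to the toy data to show where the two approaches differ:

```python
import pandas as pd

df = pd.DataFrame({"FIRST_NAME": ["Alice", "Bob", "Charlie", "David", None]})

# .str.len() returns NaN for the missing name,
# whereas .apply(len) would raise TypeError on None
df["NAME_LENGTH"] = df["FIRST_NAME"].str.len()

print(df)
```

Because of the `NaN`, the resulting column has a float dtype rather than an integer one.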


In [26]:
df["MATH_PLUS_5"] = df["MATH"] + 5
df

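|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | GRADE | NAME_LENGTH | MATH_PLUS_5 |
|---|---|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | B | 5 | 90 |
| 1 | Bob | Jones | 90 | 85 | 87.5 | B | 3 | 95 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | B | 7 | 83 |
| 3 | David | Wilson | 92 | 95 | 93.5 | A | 5 | 97 |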


### 🧠 Mini-Challenge

> Load `"data/students.csv"` and:
> - Create a `"total_score"` column by summing all numeric test columns
> - Rename any columns whose names contain spaces (e.g., `"Test 1"`) to use underscores instead
> - Drop any column that contains only `NaN` values


In [28]:
# Practice on Colab's bundled California housing sample (a stand-in for "data/students.csv")
df = pd.read_csv("/content/sample_data/california_housing_test.csv")
df

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.05 | 37.37 | 27.0 | 3885.0 | 661.0 | 1537.0 | 606.0 | 6.6085 | 344700.0 |
| 1 | -118.30 | 34.26 | 43.0 | 1510.0 | 310.0 | 809.0 | 277.0 | 3.5990 | 176500.0 |
| 2 | -117.81 | 33.78 | 27.0 | 3589.0 | 507.0 | 1484.0 | 495.0 | 5.7934 | 270500.0 |
| 3 | -118.36 | 33.82 | 28.0 | 67.0 | 15.0 | 49.0 | 11.0 | 6.1359 | 330000.0 |
| 4 | -119.67 | 36.33 | 19.0 | 1241.0 | 244.0 | 850.0 | 237.0 | 2.9375 | 81700.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | -119.86 | 34.42 | 23.0 | 1450.0 | 642.0 | 1258.0 | 607.0 | 1.1790 | 225000.0 |
| 2996 | -118.14 | 34.06 | 27.0 | 5257.0 | 1082.0 | 3496.0 | 1036.0 | 3.3906 | 237200.0 |
| 2997 | -119.70 | 36.30 | 10.0 | 956.0 | 201.0 | 693.0 | 220.0 | 2.2895 | 62000.0 |
| 2998 | -117.12 | 34.10 | 40.0 | 96.0 | 14.0 | 46.0 | 14.0 | 3.2708 | 162500.0 |


In [30]:
df["totalRoomsNotBedrooms"] = df["total_rooms"] - df["total_bedrooms"]
df

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | totalRoomsNotBedrooms |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.05 | 37.37 | 27.0 | 3885.0 | 661.0 | 1537.0 | 606.0 | 6.6085 | 344700.0 | 3224.0 |
| 1 | -118.30 | 34.26 | 43.0 | 1510.0 | 310.0 | 809.0 | 277.0 | 3.5990 | 176500.0 | 1200.0 |
| 2 | -117.81 | 33.78 | 27.0 | 3589.0 | 507.0 | 1484.0 | 495.0 | 5.7934 | 270500.0 | 3082.0 |
| 3 | -118.36 | 33.82 | 28.0 | 67.0 | 15.0 | 49.0 | 11.0 | 6.1359 | 330000.0 | 52.0 |
| 4 | -119.67 | 36.33 | 19.0 | 1241.0 | 244.0 | 850.0 | 237.0 | 2.9375 | 81700.0 | 997.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | -119.86 | 34.42 | 23.0 | 1450.0 | 642.0 | 1258.0 | 607.0 | 1.1790 | 225000.0 | 808.0 |
| 2996 | -118.14 | 34.06 | 27.0 | 5257.0 | 1082.0 | 3496.0 | 1036.0 | 3.3906 | 237200.0 | 4175.0 |
| 2997 | -119.70 | 36.30 | 10.0 | 956.0 | 201.0 | 693.0 | 220.0 | 2.2895 | 62000.0 | 755.0 |
| 2998 | -117.12 | 34.10 | 40.0 | 96.0 | 14.0 | 46.0 | 14.0 | 3.2708 | 162500.0 | 82.0 |
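The housing cells practice column creation but do not cover the three challenge steps themselves. A minimal sketch of those steps, using a small hypothetical DataFrame in place of `data/students.csv` (the `Test 1`, `Test 2`, and `Notes` columns are invented for illustration). It drops the all-`NaN` columns before summing so an empty numeric column can't slip into the total:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/students.csv; with the real file use
# students = pd.read_csv("data/students.csv") instead
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "Test 1": [85, 90, 78],
    "Test 2": [88, 85, 82],
    "Notes": [np.nan, np.nan, np.nan],  # a column containing only NaN
})

# Replace spaces in column names with underscores ("Test 1" -> "Test_1")
students.columns = students.columns.str.replace(" ", "_")

# Drop every column in which all values are NaN
students = students.dropna(axis=1, how="all")

# Sum the numeric columns row-wise; if the real file has other numeric
# columns (e.g. an ID), select the test columns explicitly instead
test_cols = students.select_dtypes(include="number").columns
students["total_score"] = students[test_cols].sum(axis=1)

print(students)
```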


### 📝 Summary

| Action             | Method                     |
|--------------------|----------------------------|
| Create column      | `df["new"] = ...`          |
| Rename column(s)   | `df.rename(columns={...})` |
| Rename all columns | `df.columns = [...]`       |
| Drop column(s)     | `df.drop(columns=[...])`   |
| Apply function     | `df["col"].apply(func)`    |
