
# 🧱 Column Operations: Creating, Renaming, and Dropping Columns
## 🔹 LEARNING GOALS:
* Create new columns using calculations or functions
* Rename one or multiple columns
* Drop unwanted columns safely
* Apply functions across columns using `.apply()` and lambdas

## 🏗️ 1. Setup and Sample Data

In [None]:
import pandas as pd

data = {
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "last_name": ["Smith", "Jones", "Brown", "Wilson"],
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95]
}

df = pd.DataFrame(data)
df

Unnamed: 0,first_name,last_name,math_score,science_score
0,Alice,Smith,85,88
1,Bob,Jones,90,85
2,Charlie,Brown,78,82
3,David,Wilson,92,95


## ➕ 2. Creating New Columns

In [None]:
# Simple column arithmetic
df["average_score"] = (df["math_score"] + df["science_score"]) / 2
df

Unnamed: 0,first_name,last_name,math_score,science_score,average_score
0,Alice,Smith,85,88,86.5
1,Bob,Jones,90,85,87.5
2,Charlie,Brown,78,82,80.0
3,David,Wilson,92,95,93.5
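
The same column can also be built without modifying the DataFrame in place. The sketch below is one possible variant, not the notebook's own method: it rebuilds the sample scores in a separate DataFrame called `scores` (a name chosen here for illustration) and uses `DataFrame.assign()` with a row-wise `mean(axis=1)`, which should give the same averages as the arithmetic above.

```python
import pandas as pd

# Illustrative copy of the sample scores; `scores` is a hypothetical name
scores = pd.DataFrame({
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95],
})

# assign() returns a new DataFrame and leaves `scores` unchanged
with_avg = scores.assign(
    average_score=scores[["math_score", "science_score"]].mean(axis=1)
)

print(with_avg)
```

Using `mean(axis=1)` over a list of columns scales more easily than writing out the sum by hand when more score columns are added.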


In [None]:
# Create full name column
df["full_name"] = df["first_name"] + " " + df["last_name"]
df

Unnamed: 0,first_name,last_name,math_score,science_score,average_score,full_name
0,Alice,Smith,85,88,86.5,Alice Smith
1,Bob,Jones,90,85,87.5,Bob Jones
2,Charlie,Brown,78,82,80.0,Charlie Brown
3,David,Wilson,92,95,93.5,David Wilson
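
Concatenating strings with `+` produces a missing value whenever either part is missing. As a hedged sketch, `Series.str.cat()` with `na_rep` lets you choose what to substitute instead. The `names` DataFrame and its missing last name are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing last name
names = pd.DataFrame({
    "first_name": ["Alice", "Bob"],
    "last_name": ["Smith", None],
})

# With "+", the second row becomes a missing value
plus_result = names["first_name"] + " " + names["last_name"]

# str.cat() substitutes na_rep for missing parts; strip() removes the
# trailing space left behind when the last name is absent
cat_result = (
    names["first_name"]
    .str.cat(names["last_name"], sep=" ", na_rep="")
    .str.strip()
)

print(plus_result)
print(cat_result)
```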


## ✍️ 3. Renaming Columns

In [None]:
# Rename specific columns with an {old_name: new_name} mapping
df.rename(columns={"math_score": "Math", "science_score": "Science"}, inplace=True)
df

Unnamed: 0,first_name,last_name,Math,Science,average_score,full_name
0,Alice,Smith,85,88,86.5,Alice Smith
1,Bob,Jones,90,85,87.5,Bob Jones
2,Charlie,Brown,78,82,80.0,Charlie Brown
3,David,Wilson,92,95,93.5,David Wilson
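
By default, `rename()` ignores mapping keys that do not match any column, so a typo can go unnoticed. The sketch below uses a small made-up `scores` DataFrame to show the non-mutating form of `rename()` and the `errors="raise"` option, which turns an unmatched key into a `KeyError`.

```python
import pandas as pd

# Hypothetical two-row sample
scores = pd.DataFrame({"math_score": [85, 90], "science_score": [88, 85]})

# Without inplace=True, rename() returns a new DataFrame
renamed = scores.rename(columns={"math_score": "Math"})

# A misspelled key ("maths_score") is silently ignored by default
unchanged = scores.rename(columns={"maths_score": "Math"})

# errors="raise" reports the unmatched key instead
try:
    scores.rename(columns={"maths_score": "Math"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(renamed.columns.tolist())
print(unchanged.columns.tolist())
print(typo_detected)
```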


In [None]:
# Rename all columns to uppercase
df.columns = [col.upper() for col in df.columns]
df

Unnamed: 0,FIRST_NAME,LAST_NAME,MATH,SCIENCE,AVERAGE_SCORE,FULL_NAME
0,Alice,Smith,85,88,86.5,Alice Smith
1,Bob,Jones,90,85,87.5,Bob Jones
2,Charlie,Brown,78,82,80.0,Charlie Brown
3,David,Wilson,92,95,93.5,David Wilson
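
Two alternatives to the list comprehension, shown as a sketch on a hypothetical one-row `sample` DataFrame: passing a function to `rename()`, or using the vectorized `.str` methods on the column index. Both should produce the same uppercase labels.

```python
import pandas as pd

# Hypothetical one-row sample
sample = pd.DataFrame({"first_name": ["Alice"], "math_score": [85]})

# rename() calls the function on every column label and returns a new DataFrame
upper_a = sample.rename(columns=str.upper)

# The column Index supports vectorized string methods
upper_b = sample.copy()
upper_b.columns = upper_b.columns.str.upper()

print(upper_a.columns.tolist())
print(upper_b.columns.tolist())
```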


## ❌ 4. Dropping Columns

In [None]:
# Drop by name
df.drop(columns=["FULL_NAME"], inplace=True)
df

Unnamed: 0,FIRST_NAME,LAST_NAME,MATH,SCIENCE,AVERAGE_SCORE
0,Alice,Smith,85,88,86.5
1,Bob,Jones,90,85,87.5
2,Charlie,Brown,78,82,80.0
3,David,Wilson,92,95,93.5
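
Dropping "safely" can mean two things: not destroying the original DataFrame, and not failing when a column is already gone. The sketch below illustrates both on a hypothetical `sample` DataFrame; `NICKNAME` is a made-up column name that does not exist in the data.

```python
import pandas as pd

# Hypothetical sample with one column we want to remove
sample = pd.DataFrame({
    "FIRST_NAME": ["Alice", "Bob"],
    "MATH": [85, 90],
    "FULL_NAME": ["Alice Smith", "Bob Jones"],
})

# Without inplace=True, drop() returns a new DataFrame; `sample` is untouched
trimmed = sample.drop(columns=["FULL_NAME"])

# Dropping a column that is not present raises KeyError by default;
# errors="ignore" skips missing labels instead
still_trimmed = trimmed.drop(columns=["FULL_NAME"], errors="ignore")

# Alternatively, filter the list down to columns that actually exist
to_drop = [col for col in ["FULL_NAME", "NICKNAME"] if col in sample.columns]
filtered = sample.drop(columns=to_drop)

print(trimmed.columns.tolist())
print(to_drop)
```

Keeping the original DataFrame around makes it easy to recover if the wrong column was dropped.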


## 🧮 5. Applying Functions with `.apply()`

In [None]:
# Categorize based on average score
def grade(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"

df["GRADE"] = df["AVERAGE_SCORE"].apply(grade)
df

Unnamed: 0,FIRST_NAME,LAST_NAME,MATH,SCIENCE,AVERAGE_SCORE,GRADE
0,Alice,Smith,85,88,86.5,B
1,Bob,Jones,90,85,87.5,B
2,Charlie,Brown,78,82,80.0,B
3,David,Wilson,92,95,93.5,A
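
For short, one-off logic, a lambda can stand in for a named function. The following is a minimal sketch on a hypothetical `results` DataFrame: the first call applies a lambda across columns (with `axis=1`, each row is passed in as a Series), and the second applies a lambda to each value of a single column. The column names `BEST_SCORE` and `MATH_PASSED` and the pass mark of 80 are assumptions made for the example.

```python
import pandas as pd

# Hypothetical copy of the score columns
results = pd.DataFrame({
    "MATH": [85, 90, 78, 92],
    "SCIENCE": [88, 85, 82, 95],
})

# Row-wise: axis=1 passes each row to the lambda as a Series
results["BEST_SCORE"] = results[["MATH", "SCIENCE"]].apply(
    lambda row: max(row["MATH"], row["SCIENCE"]), axis=1
)

# Element-wise: the lambda receives one value of the column at a time
results["MATH_PASSED"] = results["MATH"].apply(lambda score: score >= 80)

print(results)
```

Row-wise `apply` is flexible but runs a Python function per row, so plain column arithmetic is usually the better choice when it can express the same calculation.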


## 🧪 Try It Yourself
* Add a new column called `"NAME_LENGTH"` that contains the length of each `FIRST_NAME`
* Create a column `"MATH_PLUS_5"` that adds a 5-point bonus to the math score

In [None]:
import pandas as pd

data = {
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "last_name": ["Smith", "Jones", "Brown", "Wilson"],
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95]
}

df = pd.DataFrame(data)

# Add a new column NAME_LENGTH
df["NAME_LENGTH"] = df["first_name"].str.len()

# Add a new column MATH_PLUS_5
df["MATH_PLUS_5"] = df["math_score"] + 5

df

Unnamed: 0,first_name,last_name,math_score,science_score,NAME_LENGTH,MATH_PLUS_5
0,Alice,Smith,85,88,5,90
1,Bob,Jones,90,85,3,95
2,Charlie,Brown,78,82,7,83
3,David,Wilson,92,95,5,97


## 🧠 Mini-Challenge
Load `"data/students.csv"`, then:

* Create a `"total_score"` column by summing up all numeric test columns
* Rename any columns that contain spaces (e.g., `"Test 1"`) to use underscores
* Drop any column that contains only `NaN` values

In [14]:
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/rugbyprof/3603-Programming-for-Data-Science/refs/heads/main/data/students.csv")

# 1. Rename columns with spaces → underscores
df.columns = df.columns.str.replace(" ", "_")

# 2. Identify numeric test columns: the name mentions "test" or "score"
#    and the column has a numeric dtype
numeric_test_cols = [
    col for col in df.columns
    if ("test" in col.lower() or "score" in col.lower())
    and pd.api.types.is_numeric_dtype(df[col])
]

print("Numeric test columns used for total_score:")
print(numeric_test_cols)

# 3. Create total_score column
df["total_score"] = df[numeric_test_cols].sum(axis=1)

# 4. Drop columns that contain ONLY NaN values
df = df.dropna(axis=1, how="all")

# 5. Show results
print("\nUpdated Columns:")
print(df.columns)

print("\nPreview:")
print(df.head())

Numeric test columns used for total_score:
['math_score', 'science_score']

Updated Columns:
Index(['first_name', 'last_name', 'math_score', 'science_score',
       'total_score'],
      dtype='object')

Preview:
  first_name last_name  math_score  science_score  total_score
0   Danielle      Wood         100             95          195
1      Angel     Clark          67             78          145
2     Joshua     Adams          61            100          161
3    Jeffrey    Zuniga          77             99          176
4       Jill      Wong          75             83          158
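
The output above does not make it obvious whether the rename and drop steps changed anything for this particular file. To see all three steps take effect, here is a minimal sketch on a made-up `toy` DataFrame (its names and values are invented for illustration) with spaces in the column names, a missing score, and an all-`NaN` column.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data: spaced column names, one missing score, one empty column
toy = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Test 1": [80, 70],
    "Test 2": [90, nan],
    "Notes": [nan, nan],
})

# Replace spaces with underscores (regex=False treats " " as a literal)
toy.columns = toy.columns.str.replace(" ", "_", regex=False)

# Keep numeric columns whose name mentions "test"
test_cols = [
    col for col in toy.columns
    if "test" in col.lower() and pd.api.types.is_numeric_dtype(toy[col])
]

# sum(axis=1) skips NaN by default, so Ben's total counts only Test_1
toy["total_score"] = toy[test_cols].sum(axis=1)

# Drop columns in which every value is NaN ("Notes")
toy = toy.dropna(axis=1, how="all")

print(toy)
```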


## 📝 Summary

| Action | Method |
| ------ | ------ |
| Create a column | `df["new"] = ...` |
| Rename column(s) | `df.rename(columns={...})` |
| Rename all columns | `df.columns = [...]` |
| Drop column(s) | `df.drop(columns=[...])` |
| Apply a function | `df["col"].apply(func)` |
