

In this subsection, you will learn essential DataFrame operations, including:

- Selecting columns  
- Adding new columns  
- Deleting columns  
- Summary statistics  
- Sorting values  
- Setting and resetting index  
- Renaming columns (bonus)  

These operations are the foundation of almost every real-world pandas workflow.


🟦 1. Creating Sample Data

In [11]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [24, 19, 22, 28, 31],
    "Score": [88, 75, 90, 65, 80],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto"]
}

df = pd.DataFrame(data)
df

      Name  Age  Score       City
0    Alice   24     88    Toronto
1      Bob   19     75   Montreal
2  Charlie   22     90  Vancouver
3    David   28     65    Calgary
4      Eva   31     80    Toronto


🟦 2. Selecting Columns

2.1 Single Column

In [12]:
df["Name"]

0      Alice
1        Bob
2    Charlie
3      David
4        Eva
Name: Name, dtype: object

2.2 Multiple Columns

In [13]:
df[["Name", "Score"]]

      Name  Score
0    Alice     88
1      Bob     75
2  Charlie     90
3    David     65
4      Eva     80
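
The difference between 2.1 and 2.2 is more than the number of columns: single brackets with one label return a Series, while a list of labels returns a DataFrame, even when the list holds a single name. A minimal sketch, using a shortened version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [88, 75, 90],
})

single = df["Name"]      # one label in single brackets -> Series
as_frame = df[["Name"]]  # a list of labels -> DataFrame, even with one column

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(as_frame.shape)           # (3, 1)
```

This matters when a later step expects one type or the other, for example a function that needs two-dimensional input.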


2.3 Using dot notation

In [14]:
df.Name

0      Alice
1        Bob
2    Charlie
3      David
4        Eva
Name: Name, dtype: object
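
Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method. The sketch below uses two hypothetical column names (`"Final Score"` and `"count"`) that are not part of the sample data, purely to illustrate both cases:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Final Score": [88, 75],  # hypothetical name containing a space
    "count": [1, 2],          # hypothetical name that matches a DataFrame method
})

names = df.Name                    # works: valid identifier, no clash
final_scores = df["Final Score"]   # brackets are required; df.Final Score is a syntax error
counts = df["count"]               # brackets return the column

# Attribute access resolves to the DataFrame.count method, not the column
print(callable(df.count))  # True
```

Bracket notation works in every case, so it is the safer default in reusable code.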

🟦 3. Adding New Columns

3.1 Add with direct assignment

In [15]:
df["Passed"] = df["Score"] >= 70
df

      Name  Age  Score       City  Passed
0    Alice   24     88    Toronto    True
1      Bob   19     75   Montreal    True
2  Charlie   22     90  Vancouver    True
3    David   28     65    Calgary   False
4      Eva   31     80    Toronto    True


3.2 Add using calculations

In [16]:
df["Score_Percent"] = df["Score"] / 100
df

      Name  Age  Score       City  Passed  Score_Percent
0    Alice   24     88    Toronto    True           0.88
1      Bob   19     75   Montreal    True           0.75
2  Charlie   22     90  Vancouver    True           0.90
3    David   28     65    Calgary   False           0.65
4      Eva   31     80    Toronto    True           0.80


3.3 Add from external list

In [17]:
df["Rank"] = [1, 2, 3, 4, 5]
df

      Name  Age  Score       City  Passed  Score_Percent  Rank
0    Alice   24     88    Toronto    True           0.88     1
1      Bob   19     75   Montreal    True           0.75     2
2  Charlie   22     90  Vancouver    True           0.90     3
3    David   28     65    Calgary   False           0.65     4
4      Eva   31     80    Toronto    True           0.80     5
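
When adding a column from a plain list, the list needs exactly one value per row. The sketch below, on a shortened version of the sample data, shows what happens on a mismatch and also uses `assign()`, which returns a new DataFrame instead of modifying the original. The rank values are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Score": [88, 75, 90]})

try:
    df["Rank"] = [1, 2]  # only 2 values for 3 rows
except ValueError as exc:
    error_message = str(exc)
    print("Assignment failed:", error_message)

# assign() builds a new DataFrame and leaves df untouched
ranked = df.assign(Rank=[2, 3, 1])
print(list(ranked.columns))  # ['Name', 'Score', 'Rank']
print(list(df.columns))      # ['Name', 'Score']
```

`assign()` is handy when you want to keep the original DataFrame intact or chain several steps together.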


🟦 4. Deleting Columns

4.1 Drop column (returns a new DataFrame; `df` is unchanged)

In [18]:
df.drop(columns=["Score_Percent"])

      Name  Age  Score       City  Passed  Rank
0    Alice   24     88    Toronto    True     1
1      Bob   19     75   Montreal    True     2
2  Charlie   22     90  Vancouver    True     3
3    David   28     65    Calgary   False     4
4      Eva   31     80    Toronto    True     5


4.2 Drop permanently (in place)

In [19]:
df.drop(columns=["Score_Percent"], inplace=True)
df

      Name  Age  Score       City  Passed  Rank
0    Alice   24     88    Toronto    True     1
1      Bob   19     75   Montreal    True     2
2  Charlie   22     90  Vancouver    True     3
3    David   28     65    Calgary   False     4
4      Eva   31     80    Toronto    True     5
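
An alternative to `inplace=True` is to keep the result of `drop()` in a variable. The sketch below, on a two-row version of the sample data, shows that the original DataFrame keeps its column until you reassign, and that `errors="ignore"` lets the call pass quietly when the column is already gone:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Score": [88, 75],
    "Score_Percent": [0.88, 0.75],
})

trimmed = df.drop(columns=["Score_Percent"])  # new DataFrame; df is unchanged
print("Score_Percent" in df.columns)       # True
print("Score_Percent" in trimmed.columns)  # False

# errors="ignore" skips labels that are not present instead of raising KeyError
trimmed_again = trimmed.drop(columns=["Score_Percent"], errors="ignore")
print(list(trimmed_again.columns))  # ['Name', 'Score']
```

Reassignment makes each step explicit, which is useful when a notebook cell may be re-run more than once.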


🟦 5. Summary Statistics

5.1 Default describe

In [20]:
df.describe()

             Age      Score      Rank
count   5.000000   5.000000  5.000000
mean   24.800000  79.600000  3.000000
std     4.764452  10.163661  1.581139
min    19.000000  65.000000  1.000000
25%    22.000000  75.000000  2.000000
50%    24.000000  80.000000  3.000000
75%    28.000000  88.000000  4.000000
max    31.000000  90.000000  5.000000
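
By default `describe()` reports only numeric columns, which is why `Name`, `City`, and `Passed` are missing from the output above. Passing `include="all"` adds the other columns, with `unique`, `top`, and `freq` rows for text data. A minimal sketch on the same sample values:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [88, 75, 90, 65, 80],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto"],
})

full_summary = df.describe(include="all")

# Text columns get unique/top/freq; numeric-only rows are NaN for them
print(full_summary.loc["unique", "City"])  # 4
print(full_summary.loc["top", "City"])     # Toronto
print(full_summary.loc["freq", "City"])    # 2
```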


5.2 Custom statistics

In [21]:
df["Score"].mean(), df["Score"].max(), df["Score"].min()

(79.6, 90, 65)
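
The same three statistics can also be requested in one call with `agg()`, which returns a labelled result instead of a bare tuple. This is one possible alternative, not a replacement for the cell above:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [24, 19, 22, 28, 31],
    "Score": [88, 75, 90, 65, 80],
})

# One column: a Series indexed by statistic name
score_stats = df["Score"].agg(["mean", "max", "min"])
print(score_stats["mean"])  # 79.6

# Several columns: a DataFrame with one row per statistic
table = df[["Age", "Score"]].agg(["mean", "max", "min"])
print(table.loc["max", "Age"])  # 31.0
```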

🟦 6. Sorting Data

6.1 Sort by one column

In [22]:
df.sort_values(by="Score")

      Name  Age  Score       City  Passed  Rank
3    David   28     65    Calgary   False     4
1      Bob   19     75   Montreal    True     2
4      Eva   31     80    Toronto    True     5
0    Alice   24     88    Toronto    True     1
2  Charlie   22     90  Vancouver    True     3


6.2 Sort descending

In [23]:
df.sort_values(by="Score", ascending=False)

      Name  Age  Score       City  Passed  Rank
2  Charlie   22     90  Vancouver    True     3
0    Alice   24     88    Toronto    True     1
4      Eva   31     80    Toronto    True     5
1      Bob   19     75   Montreal    True     2
3    David   28     65    Calgary   False     4


6.3 Sort by multiple columns

In [24]:
df.sort_values(by=["City", "Score"])

      Name  Age  Score       City  Passed  Rank
3    David   28     65    Calgary   False     4
1      Bob   19     75   Montreal    True     2
4      Eva   31     80    Toronto    True     5
0    Alice   24     88    Toronto    True     1
2  Charlie   22     90  Vancouver    True     3
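
When sorting by several columns, `ascending` can also be a list with one flag per column. The sketch below sorts cities alphabetically but puts the highest score first within each city, so the two Toronto rows swap compared with the output above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [88, 75, 90, 65, 80],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto"],
})

# City A-Z, then Score high-to-low inside each city
ordered = df.sort_values(by=["City", "Score"], ascending=[True, False])
print(list(ordered["Name"]))  # ['David', 'Bob', 'Alice', 'Eva', 'Charlie']
```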


🟦 7. Setting and Resetting Index

7.1 Set index

In [25]:
df_indexed = df.set_index("Name")
df_indexed

         Age  Score       City  Passed  Rank
Name
Alice     24     88    Toronto    True     1
Bob       19     75   Montreal    True     2
Charlie   22     90  Vancouver    True     3
David     28     65    Calgary   False     4
Eva       31     80    Toronto    True     5
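
One practical reason to set a meaningful index is label-based lookup with `.loc`. A minimal sketch, assuming the names are unique as they are in the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [88, 75, 90],
    "City": ["Toronto", "Montreal", "Vancouver"],
})

df_indexed = df.set_index("Name")

alice = df_indexed.loc["Alice"]             # whole row as a Series
bob_score = df_indexed.loc["Bob", "Score"]  # single cell: row label, column label

print(alice["City"])  # Toronto
print(bob_score)      # 75
```

If the index contained duplicate labels, `.loc["Alice"]` would return several rows instead of one, so it is worth checking uniqueness first on real data.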


7.2 Reset index

In [26]:
df_reset = df_indexed.reset_index()
df_reset

      Name  Age  Score       City  Passed  Rank
0    Alice   24     88    Toronto    True     1
1      Bob   19     75   Montreal    True     2
2  Charlie   22     90  Vancouver    True     3
3    David   28     65    Calgary   False     4
4      Eva   31     80    Toronto    True     5
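
`reset_index()` is also useful after sorting, when the old row numbers are no longer meaningful. Passing `drop=True` discards the old index instead of moving it into a new column. A short sketch on three rows of the sample data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Score": [88, 75, 90]})

ranked = df.sort_values(by="Score", ascending=False)
print(list(ranked.index))  # [2, 0, 1]

# drop=True throws the old index away rather than adding an "index" column
renumbered = ranked.reset_index(drop=True)
print(list(renumbered.index))    # [0, 1, 2]
print(list(renumbered.columns))  # ['Name', 'Score']
```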


🟦 8. Bonus: Renaming Columns

8.1 Rename select columns

In [27]:
df.rename(columns={"Score": "Final_Score", "Age": "Student_Age"})

      Name  Student_Age  Final_Score       City  Passed  Rank
0    Alice           24           88    Toronto    True     1
1      Bob           19           75   Montreal    True     2
2  Charlie           22           90  Vancouver    True     3
3    David           28           65    Calgary   False     4
4      Eva           31           80    Toronto    True     5


8.2 Rename all columns

In [28]:
df.columns = [col.lower() for col in df.columns]
df

      name  age  score       city  passed  rank
0    Alice   24     88    Toronto    True     1
1      Bob   19     75   Montreal    True     2
2  Charlie   22     90  Vancouver    True     3
3    David   28     65    Calgary   False     4
4      Eva   31     80    Toronto    True     5
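
The list comprehension above can also be written with `rename()` and a function, or with the vectorized `.str` methods on the column index. The sketch below uses hypothetical, deliberately messy column names (stray spaces, mixed case) that are not in the sample data, to show a common clean-up pattern:

```python
import pandas as pd

# Hypothetical messy headers, as they might arrive from a spreadsheet
df = pd.DataFrame({" Name": ["Alice"], "Final Score": [88], "City ": ["Toronto"]})

# rename() accepts a function and returns a new DataFrame
lowered = df.rename(columns=str.lower)
print(list(lowered.columns))  # [' name', 'final score', 'city ']

# Chained .str methods: trim, lowercase, replace spaces with underscores
cleaned = df.copy()
cleaned.columns = cleaned.columns.str.strip().str.lower().str.replace(" ", "_")
print(list(cleaned.columns))  # ['name', 'final_score', 'city']
```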


## 🟦 9. Summary

In this subsection, you learned how to:

### 🧱 Column Operations
- Select single and multiple columns  
- Add new columns using calculations and logic  
- Remove columns safely  

### 📊 Data Understanding
- Generate summary statistics (`describe()`, `mean()`, `max()`, `min()`)  

### 🔃 Data Organization
- Sort by one or multiple columns  
- Set and reset DataFrame index  

### ⭐ Extra Skill
- Rename columns for cleaner datasets  

These operations cover the most common ways to inspect, restructure, and organize a DataFrame.  

