
# 🧮 Sorting & Filtering in Pandas

## 🔹 Learning Goals
- Sort rows using `.sort_values()`
- Filter data using conditional logic
- Chain multiple filters together
- Use `.query()` for readable filtering


### 📥 1. Load the Dataset

In [1]:
import pandas as pd

df = pd.read_csv("students.csv")
df.head()

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
1,Angel,Clark,67,78
2,Joshua,Adams,61,100
3,Jeffrey,Zuniga,77,99
4,Jill,Wong,75,83
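
The `Unnamed: 0` column in the output above usually means the CSV was saved with its row index included and then read back as ordinary data. The following is a minimal sketch of one way to handle that, assuming `students.csv` was written that way. The inline `csv_text` is a hypothetical stand-in built from the first rows shown above, not the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for students.csv: the first column has no header,
# so pandas names it "Unnamed: 0" when it is read as a regular column.
csv_text = """,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
1,Angel,Clark,67,78
2,Joshua,Adams,61,100
"""

# Default read: the saved index shows up as an extra data column.
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

The rest of this notebook keeps the default read, so its outputs still show the `Unnamed: 0` column.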


### 🔢 2. Sorting Data

In [2]:
# Sort by math_score (ascending is the default)
df.sort_values(by="math_score")

Unnamed: 0,first_name,last_name,math_score,science_score
28,Melanie,Baker,60,94
2,Joshua,Adams,61,100
19,Adam,Adams,61,70
13,James,Cohen,61,66
12,Anthony,Hawkins,62,74
44,Christopher,Brown,62,64
14,Debra,Cameron,65,84
9,Anthony,Gray,65,74
37,Christine,Hicks,65,80
49,Michael,Valencia,65,91


In [3]:
# Sort by science_score descending
df.sort_values(by="science_score", ascending=False)

Unnamed: 0,first_name,last_name,math_score,science_score
17,Bridget,Johnson,92,100
2,Joshua,Adams,61,100
34,Nicholas,Ross,73,100
26,Christopher,Rocha,97,100
3,Jeffrey,Zuniga,77,99
25,Matthew,Foster,88,98
46,Craig,Ferrell,94,96
5,Erica,Lynch,74,96
0,Danielle,Wood,100,95
35,Andrea,Robinson,81,95


In [7]:
# Sort by science_score ascending, breaking ties by math_score descending
df.sort_values(by=["science_score", "math_score"], ascending=[True, False])

Unnamed: 0,first_name,last_name,math_score,science_score
8,Robert,Martin,94,62
40,Janet,Rodriguez,82,62
38,Michelle,Davis,84,63
24,Susan,Rios,74,64
7,Christopher,Daniel,66,64
44,Christopher,Brown,62,64
11,William,Bowman,87,65
13,James,Cohen,61,66
30,Lindsay,Hoffman,87,70
27,Colin,Ellis,77,70


### 🧪 3. Filtering with Conditions

In [8]:
# Who scored at least 90 in math?
df[df["math_score"] >= 90]

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
8,Robert,Martin,94,62
10,Jeffery,Mayo,97,78
17,Bridget,Johnson,92,100
18,Lisa,Stewart,98,83
20,Linda,Clark,95,83
22,Matthew,Walter,94,73
26,Christopher,Rocha,97,100
42,Jeremy,Conner,98,85
46,Craig,Ferrell,94,96


In [11]:
# Who scored less than 70 in at least one subject?
df[(df["math_score"] < 70) | (df["science_score"] < 70)]

Unnamed: 0,first_name,last_name,math_score,science_score
1,Angel,Clark,67,78
2,Joshua,Adams,61,100
6,Patricia,Jackson,68,72
7,Christopher,Daniel,66,64
8,Robert,Martin,94,62
9,Anthony,Gray,65,74
11,William,Bowman,87,65
12,Anthony,Hawkins,62,74
13,James,Cohen,61,66
14,Debra,Cameron,65,84


### 🔗 4. Combining Filters

In [12]:
# Students scoring 87 or above (B+ or higher) in both subjects
df[(df["math_score"] >= 87) & (df["science_score"] >= 87)]

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
17,Bridget,Johnson,92,100
25,Matthew,Foster,88,98
26,Christopher,Rocha,97,100
46,Craig,Ferrell,94,96


### 🧾 5. Using `.query()` for Readability

In [13]:
# Same filter as the boolean-mask version, written with .query()
df.query("math_score >= 87 and science_score >= 87")

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
17,Bridget,Johnson,92,100
25,Matthew,Foster,88,98
26,Christopher,Rocha,97,100
46,Craig,Ferrell,94,96
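
Inside a `.query()` string, bare names refer to columns. To use a Python variable, prefix it with `@`. The sketch below, on a small illustrative frame, uses a hypothetical `threshold` variable and checks that the query and the boolean-mask version select the same rows.

```python
import pandas as pd

# Small illustrative sample using rows from the outputs above.
scores = pd.DataFrame({
    "last_name": ["Wood", "Foster", "Martin", "Adams"],
    "math_score": [100, 88, 94, 61],
    "science_score": [95, 98, 62, 100],
})

threshold = 87  # hypothetical cutoff for a B+

# @threshold pulls the value from the surrounding Python scope.
via_query = scores.query(
    "math_score >= @threshold and science_score >= @threshold"
)

# Equivalent boolean-mask form.
via_mask = scores[
    (scores["math_score"] >= threshold) & (scores["science_score"] >= threshold)
]

print(via_query)
print(via_query.equals(via_mask))
```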


### 🧪 Try It Yourself

- Sort the dataset by first name alphabetically
- Find students with the same math and science score
- Use `.query()` to find students who scored over 95 in *either* subject


In [14]:
# Sort alphabetically by first name
df.sort_values(by="first_name")

Unnamed: 0,first_name,last_name,math_score,science_score
19,Adam,Adams,61,70
32,Amanda,Richards,77,84
15,Amy,Parsons,73,77
35,Andrea,Robinson,81,95
1,Angel,Clark,67,78
12,Anthony,Hawkins,62,74
9,Anthony,Gray,65,74
39,Barbara,Williams,66,74
41,Brandon,Burton,82,80
17,Bridget,Johnson,92,100


In [15]:
# Rows where the math and science scores are equal (none in this dataset)
df[df["math_score"] == df["science_score"]]

Unnamed: 0,first_name,last_name,math_score,science_score
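
A filter that matches nothing is not an error: it returns a DataFrame with the same columns and zero rows, as in the output above. A minimal sketch of checking for that case, using a small illustrative frame:

```python
import pandas as pd

# Small illustrative sample where no row has equal scores.
scores = pd.DataFrame({
    "last_name": ["Wood", "Clark", "Adams"],
    "math_score": [100, 67, 61],
    "science_score": [95, 78, 100],
})

same = scores[scores["math_score"] == scores["science_score"]]

print(same.empty)          # True
print(len(same))           # 0
print(list(same.columns))  # columns are preserved
```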


In [16]:
# Students who scored over 95 in either subject
df.query("math_score > 95 or science_score > 95")

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
2,Joshua,Adams,61,100
3,Jeffrey,Zuniga,77,99
5,Erica,Lynch,74,96
10,Jeffery,Mayo,97,78
17,Bridget,Johnson,92,100
18,Lisa,Stewart,98,83
25,Matthew,Foster,88,98
26,Christopher,Rocha,97,100
34,Nicholas,Ross,73,100


### 🧠 Mini-Challenge

> Load `"data/survey.csv"` and:
- Sort by the most recent signup date
- Drop anyone who didn't complete the survey (null values or the string "N/A")
- Use `.query()` to select rows with a specific condition of your choice
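
As a starting point, here is a hedged sketch of the first two steps. The layout of `data/survey.csv` is not shown in this notebook, so the inline `csv_text` and the column names `name`, `signup_date`, and `response` are assumptions made purely for illustration. Adjust them to the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/survey.csv; real column names may differ.
csv_text = """name,signup_date,response
Ana,2024-03-01,Yes
Ben,2024-05-12,N/A
Cy,2024-01-20,
Di,2024-04-02,No
"""

# By default, read_csv treats both empty fields and the string "N/A"
# as missing values (NaN). parse_dates makes the date column sortable.
survey = pd.read_csv(io.StringIO(csv_text), parse_dates=["signup_date"])

# Keep only rows with a response, then put the newest signups first.
completed = survey.dropna(subset=["response"])
newest_first = completed.sort_values(by="signup_date", ascending=False)

print(newest_first)
```

The `.query()` step is left open, since the condition is yours to choose.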


### 📝 Summary

| Task             | Code Example                                 |
|------------------|----------------------------------------------|
| Sort by column   | `df.sort_values(by="col")`                   |
| Filter condition | `df[df["col"] > value]`                      |
| Combine filters  | `df[(cond1) & (cond2)]`                      |
| Query method     | `df.query("col > @value and col2 < @value")` |
