# 🧮 Sorting & Filtering in Pandas

## 🔹 LEARNING GOALS:
- Sort rows using `.sort_values()`
- Filter data using conditional logic
- Chain multiple filters together
- Use `.query()` for readable filtering


### 📥 1. Load the Dataset

In [2]:
import pandas as pd

# Create a sample DataFrame
data = {
    'first_name': ['Danielle', 'Angel', 'Joshua', 'Jeffrey', 'Jill', 'Erica', 'Patricia', 'Christopher', 'Robert', 'Anthony'],
    'last_name': ['Wood', 'Clark', 'Adams', 'Zuniga', 'Wong', 'Lynch', 'Jackson', 'Daniel', 'Martin', 'Gray'],
    'math_score': [100, 67, 61, 77, 75, 74, 68, 66, 94, 65],
    'science_score': [95, 78, 100, 99, 83, 96, 72, 64, 62, 74]
}
df_sample = pd.DataFrame(data)

# Save the DataFrame to a CSV file in the sample_data folder
df_sample.to_csv('/content/sample_data/students.csv', index=False)

In [3]:
# Load the CSV file into a DataFrame
df = pd.read_csv('/content/sample_data/students.csv')
df.head()

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
1,Angel,Clark,67,78
2,Joshua,Adams,61,100
3,Jeffrey,Zuniga,77,99
4,Jill,Wong,75,83


### 🔢 2. Sorting Data

In [4]:
# Sort by math_score
df.sort_values(by="math_score")

Unnamed: 0,first_name,last_name,math_score,science_score
2,Joshua,Adams,61,100
9,Anthony,Gray,65,74
7,Christopher,Daniel,66,64
1,Angel,Clark,67,78
6,Patricia,Jackson,68,72
5,Erica,Lynch,74,96
4,Jill,Wong,75,83
3,Jeffrey,Zuniga,77,99
8,Robert,Martin,94,62
0,Danielle,Wood,100,95
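
The output above keeps each row's original index label (2, 9, 7, ...), and `df` itself is left unchanged because `.sort_values()` returns a new DataFrame. The sketch below illustrates this with a small hypothetical frame (`scores` and its names are made up for the example, not part of the lesson's dataset):

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the behaviour
scores = pd.DataFrame({
    "first_name": ["Ava", "Ben", "Cy"],
    "math_score": [80, 70, 90],
})

# sort_values() returns a new, sorted DataFrame; `scores` keeps its order
sorted_scores = scores.sort_values(by="math_score")
print(sorted_scores.index.tolist())   # original labels, now reordered

# reset_index(drop=True) renumbers the sorted rows 0..n-1
renumbered = sorted_scores.reset_index(drop=True)
print(renumbered)
```

If you want to keep the sorted order, assign the result to a variable (or back to `df`).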


In [5]:
# Sort by science_score descending
df.sort_values(by="science_score", ascending=False)

Unnamed: 0,first_name,last_name,math_score,science_score
2,Joshua,Adams,61,100
3,Jeffrey,Zuniga,77,99
5,Erica,Lynch,74,96
0,Danielle,Wood,100,95
4,Jill,Wong,75,83
1,Angel,Clark,67,78
9,Anthony,Gray,65,74
6,Patricia,Jackson,68,72
7,Christopher,Daniel,66,64
8,Robert,Martin,94,62


In [6]:
# Sort by multiple columns
df.sort_values(by=["science_score", "math_score"], ascending=[False, True])

Unnamed: 0,first_name,last_name,math_score,science_score
2,Joshua,Adams,61,100
3,Jeffrey,Zuniga,77,99
5,Erica,Lynch,74,96
0,Danielle,Wood,100,95
4,Jill,Wong,75,83
1,Angel,Clark,67,78
9,Anthony,Gray,65,74
6,Patricia,Jackson,68,72
7,Christopher,Daniel,66,64
8,Robert,Martin,94,62
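
In this dataset every `science_score` is unique, so the second sort key never comes into play and the result matches the single-column sort. A minimal sketch with a hypothetical frame that does contain a tie (the names and scores are invented for illustration) shows the tie-breaker at work:

```python
import pandas as pd

# Hypothetical data: Ava and Ben tie on science_score
ties = pd.DataFrame({
    "first_name": ["Ava", "Ben", "Cy"],
    "math_score": [80, 70, 90],
    "science_score": [88, 88, 75],
})

# Primary key: science_score, high to low.
# Tie-breaker: math_score, low to high.
ranked = ties.sort_values(
    by=["science_score", "math_score"],
    ascending=[False, True],
)
print(ranked["first_name"].tolist())  # ['Ben', 'Ava', 'Cy']
```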


### 🧪 3. Filtering with Conditions

In [7]:
# Who scored at least 90 in math?
df[df["math_score"] >= 90]

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
8,Robert,Martin,94,62


In [8]:
# Who scored less than 70 in either subject?
df[(df["math_score"] < 70) | (df["science_score"] < 70)]

Unnamed: 0,first_name,last_name,math_score,science_score
1,Angel,Clark,67,78
2,Joshua,Adams,61,100
6,Patricia,Jackson,68,72
7,Christopher,Daniel,66,64
8,Robert,Martin,94,62
9,Anthony,Gray,65,74
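
Each comparison is wrapped in parentheses because `|` and `&` bind more tightly than `<` or `>=` in Python. One way to keep longer conditions readable is to store each boolean mask in a variable first. The sketch below assumes a trimmed four-row copy of the lesson's data; the mask names are illustrative:

```python
import pandas as pd

# Trimmed copy of the lesson's data (four of the ten students)
students = pd.DataFrame({
    "first_name": ["Danielle", "Angel", "Robert", "Jill"],
    "math_score": [100, 67, 94, 75],
    "science_score": [95, 78, 62, 83],
})

# Name each condition, then combine the masks
low_math = students["math_score"] < 70
low_science = students["science_score"] < 70

either_low = students[low_math | low_science]        # OR
neither_low = students[~(low_math | low_science)]    # NOT (~ inverts a mask)

print(either_low["first_name"].tolist())   # ['Angel', 'Robert']
print(neither_low["first_name"].tolist())  # ['Danielle', 'Jill']
```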


### 🔗 4. Combining Filters

In [9]:
# Students with a B+ or higher (87+) in both subjects
df[(df["math_score"] >= 87) & (df["science_score"] >= 87)]

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95


### 🧾 5. Using `.query()` for Readability

In [10]:
# Same query using .query()
df.query("math_score >= 87 and science_score >= 87")

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
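
`.query()` can also read Python variables: prefix the variable name with `@` inside the query string. A small sketch, assuming a trimmed copy of the lesson's data and a hypothetical `threshold` variable:

```python
import pandas as pd

# Trimmed copy of the lesson's data (four of the ten students)
students = pd.DataFrame({
    "first_name": ["Danielle", "Angel", "Robert", "Jill"],
    "math_score": [100, 67, 94, 75],
    "science_score": [95, 78, 62, 83],
})

threshold = 87  # hypothetical cutoff, defined outside the query string

# @threshold refers to the Python variable above
top = students.query("math_score >= @threshold and science_score >= @threshold")
print(top["first_name"].tolist())  # ['Danielle']
```

This avoids hard-coding the cutoff in several places when the same value is reused.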


### 🧪 Try It Yourself

- Sort the dataset by first name alphabetically
- Find students with the same math and science score
- Use `.query()` to find students who scored over 95 in *either* subject


In [11]:
# sorting values by first name (alphabetically).
df.sort_values(by="first_name")

Unnamed: 0,first_name,last_name,math_score,science_score
1,Angel,Clark,67,78
9,Anthony,Gray,65,74
7,Christopher,Daniel,66,64
0,Danielle,Wood,100,95
5,Erica,Lynch,74,96
3,Jeffrey,Zuniga,77,99
4,Jill,Wong,75,83
2,Joshua,Adams,61,100
6,Patricia,Jackson,68,72
8,Robert,Martin,94,62


In [12]:
# finding students with the same score in math and science.
df[df["math_score"] == df["science_score"]]

Unnamed: 0,first_name,last_name,math_score,science_score


In [13]:
# using .query() to find students who scored over 95 in either subject.
df.query("math_score > 95 or science_score > 95")

Unnamed: 0,first_name,last_name,math_score,science_score
0,Danielle,Wood,100,95
2,Joshua,Adams,61,100
3,Jeffrey,Zuniga,77,99
5,Erica,Lynch,74,96


### 🧠 Mini-Challenge

> Load `"data/survey.csv"` and:
- Sort by the most recent signup date
- Filter out anyone who didn't complete the survey (nulls or "N/A")
- Use `.query()` to select rows with a specific condition of your choice


In [27]:
# Create a sample DataFrame for the survey data
survey_data = {
    'signup_date': ['2023-01-15', '2023-02-20', '2023-03-10', '2023-04-05', '2023-05-18', None, '2023-07-22', '2023-08-30', '2023-09-12', '2023-10-01'],
    'completion_status': ['Completed', 'Completed', 'N/A', 'Completed', 'Completed', 'Not Completed', 'Completed', 'Completed', 'N/A', 'Completed'],
    'satisfaction_score': [8, 7, None, 9, 6, 4, 7, 8, None, 9],
    'feedback': ['Good', 'Helpful', None, 'Excellent', 'Average', 'Poor', 'Informative', 'Great', None, 'Very Good']
}
df_survey = pd.DataFrame(survey_data)

# Save the DataFrame to a CSV file in the sample_data folder
df_survey.to_csv('/content/sample_data/survey.csv', index=False)

df2 = pd.read_csv('/content/sample_data/survey.csv')

In [28]:
# sorting by the most recent signup date.
df2.sort_values(by="signup_date", ascending=False)

Unnamed: 0,signup_date,completion_status,satisfaction_score,feedback
9,2023-10-01,Completed,9.0,Very Good
8,2023-09-12,,,
7,2023-08-30,Completed,8.0,Great
6,2023-07-22,Completed,7.0,Informative
4,2023-05-18,Completed,6.0,Average
3,2023-04-05,Completed,9.0,Excellent
2,2023-03-10,,,
1,2023-02-20,Completed,7.0,Helpful
0,2023-01-15,Completed,8.0,Good
5,,Not Completed,4.0,Poor
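
The sort above works on text: `read_csv` loads `signup_date` as strings, and ISO-style `YYYY-MM-DD` strings happen to sort in date order. For other date formats it is safer to convert the column first. A minimal sketch with a hypothetical four-row frame (`signups` is invented for the example):

```python
import pandas as pd

# Hypothetical signup dates, including one missing value
signups = pd.DataFrame({
    "signup_date": ["2023-01-15", None, "2023-10-01", "2023-03-10"],
})

# Convert strings to real datetimes; the missing value becomes NaT
signups["signup_date"] = pd.to_datetime(signups["signup_date"])

# Newest first; missing dates are placed last by default (na_position="last")
newest_first = signups.sort_values(by="signup_date", ascending=False)
print(newest_first)
```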


In [30]:
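# Keep only respondents who completed the survey.
# read_csv already parsed "N/A" as NaN, so dropna() removes those rows;
# the equality check then drops "Not Completed".
df2 = df2.dropna(subset=['completion_status'])
df2 = df2[df2['completion_status'] == 'Completed']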
df2

Unnamed: 0,signup_date,completion_status,satisfaction_score,feedback
0,2023-01-15,Completed,8.0,Good
1,2023-02-20,Completed,7.0,Helpful
3,2023-04-05,Completed,9.0,Excellent
4,2023-05-18,Completed,6.0,Average
6,2023-07-22,Completed,7.0,Informative
7,2023-08-30,Completed,8.0,Great
9,2023-10-01,Completed,9.0,Very Good
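
The `"N/A"` strings never reach the filter as text because `pd.read_csv` treats `N/A` as a missing value by default. The sketch below shows this on a small in-memory CSV (the `csv_text` contents are made up for illustration) and compares it with `keep_default_na=False`:

```python
import io

import pandas as pd

# Hypothetical one-column CSV held in memory
csv_text = "completion_status\nCompleted\nN/A\nNot Completed\n"

# Default behaviour: "N/A" is read as NaN
parsed = pd.read_csv(io.StringIO(csv_text))
print(parsed["completion_status"].isna().tolist())  # [False, True, False]

# With keep_default_na=False the literal string "N/A" is preserved
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(raw["completion_status"].tolist())

# NaN never equals "Completed", so the equality filter alone
# already excludes the missing rows
completed = parsed[parsed["completion_status"] == "Completed"]
print(len(completed))  # 1
```

Because of that last point, the `dropna()` call in the cell above is optional; it mainly documents intent.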


In [31]:
# using .query() to keep rows with a satisfaction score above 7
df2.query("satisfaction_score > 7.0")

Unnamed: 0,signup_date,completion_status,satisfaction_score,feedback
0,2023-01-15,Completed,8.0,Good
3,2023-04-05,Completed,9.0,Excellent
7,2023-08-30,Completed,8.0,Great
9,2023-10-01,Completed,9.0,Very Good


### 📝 Summary

| Task             | Code Example                               |
|------------------|--------------------------------------------|
| Sort by column   | `df.sort_values(by="col")`                 |
| Filter condition | `df[df["col"] > value]`                    |
| Combine filters  | `df[(cond1) & (cond2)]`                    |
| Query method     | `df.query("col > value and col2 < value")` |
