# Lecture 10 - Pandas Filtering and Sorting

Thursday 2022/02/17

## Lecture Notes and in-class exercises

▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder.

In [1]:
import unittest
tc = unittest.TestCase()

#### 👇 Tasks

- ✔️ Import the following Python packages.
    1. `pandas`: Use alias `pd`.
    2. `numpy`: Use alias `np`.

In [2]:
### BEGIN SOLUTION
import pandas as pd
import numpy as np
### END SOLUTION

#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [3]:
tc.assertTrue("pd" in globals(), "Check whether you have correctly imported Pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly imported NumPy with an alias.")

---

### 📌 Filtering rows

Let's step back and work with a `Series` again.

▶️ Create a `Series` named `nums` with the following four integers: `-20`, `-10`, `10`, `20`. 

In [4]:
### BEGIN SOLUTION
nums = pd.Series([-20, -10, 10, 20])
### END SOLUTION

nums

0   -20
1   -10
2    10
3    20
dtype: int64

👉 Is there a way to *filter* the `Series` so that it only contains **positive** values? Let's first try this **manually**.

▶️ Create a new `Series` named `keep` with the following four boolean values: `False`, `False`, `True`, `True`.

In [5]:
### BEGIN SOLUTION
keep = pd.Series([False, False, True, True])
### END SOLUTION

# Check your work
pd.testing.assert_series_equal(keep,
                              pd.Series([0, 0, 1, 1]).astype(bool))

# Display keep
keep

0    False
1    False
2     True
3     True
dtype: bool

Let's visualize the two `Series` (`nums` and `keep`) you've created.

![nums-and-keep](https://github.com/bdi475/images/blob/main/nums-and-keep-series.png?raw=true)

▶️ Now, you can use the boolean `Series` to filter another `Series`. Type in `nums[keep]` below and run the cell.

In [6]:
### BEGIN SOLUTION
nums[keep]
### END SOLUTION

2    10
3    20
dtype: int64

If you're confused about what just happened, the visualization below may give you a better idea.

![nums-and-keep-filter-result](https://github.com/bdi475/images/blob/main/nums-and-keep-filter-result.png?raw=true)

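The syntax for filtering a `Series` is `my_series[keep]`, where `keep` is a `Series` of boolean values indicating whether to keep each element. `keep` should have exactly the same number of elements as `my_series` (pandas actually matches a boolean `Series` to `my_series` by index label, so their index labels need to line up).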
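A minimal sketch of two related behaviours, assuming a reasonably recent pandas version: the mask does not have to be a `Series` (a plain list or a NumPy array of booleans with the same length also works), and when the mask *is* a `Series`, pandas matches it to `nums` by index label rather than by position. The names `from_list`, `from_array`, `keep_shuffled`, and `from_shuffled` are illustrative only.

```python
import numpy as np
import pandas as pd

nums = pd.Series([-20, -10, 10, 20])

# A plain Python list of booleans works as a mask (same length as nums)
from_list = nums[[False, False, True, True]]

# A NumPy boolean array works the same way
from_array = nums[np.array([False, False, True, True])]

# A boolean Series is matched by index label, not by position:
# labels 3 and 2 are True here even though they are listed first
keep_shuffled = pd.Series([True, True, False, False], index=[3, 2, 1, 0])
from_shuffled = nums[keep_shuffled]

print(from_list.tolist())      # [10, 20]
print(from_array.tolist())     # [10, 20]
print(from_shuffled.tolist())  # [10, 20]
```

In all three cases the result keeps the original index labels `2` and `3`.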

▶️ **Uncomment the code cell below first** and run it to see what happens when your `keep` does not have the same number of elements as `my_series`.

(⛔️ **Heads-up**: The code will throw an error! Once you're done running the cell, comment the lines out again.)

In [7]:
# keep_incorrect = pd.Series([False, False, True])
# nums[keep_incorrect]

👉 Is there a way to *filter* the `Series` so that it only contains **positive** values without typing every boolean by hand? The manual method we used above is inefficient. Imagine if your `Series` contained a million elements. You would need to spend a few months continuously typing `True` and `False`! 🤡

As a data analyst, your goal is to perform tasks *programmatically*.

▶️ Type `keep_by_comparison = nums > 0` in the code cell below to perform a comparison on the `nums` Series.

In [8]:
### BEGIN SOLUTION
keep_by_comparison = nums > 0
### END SOLUTION

keep_by_comparison

0    False
1    False
2     True
3     True
dtype: bool

Notice that `keep_by_comparison` is identical to the original `keep` Series.

▶️ Use `keep_by_comparison` to filter positive values in `nums`.

In [9]:
### BEGIN SOLUTION
nums[keep_by_comparison]
### END SOLUTION

2    10
3    20
dtype: int64

Note that applying a filter returns **a new `Series`** without modifying the original `Series`.

▶️ Run the code below.

In [10]:
print("Negative Values (filtered):")
display(nums[nums < 0])

print("\n\nOriginal Values:")
display(nums)

Negative Values (filtered):


0   -20
1   -10
dtype: int64



Original Values:


0   -20
1   -10
2    10
3    20
dtype: int64

---

### 🎯 Challenge 1: Filter even numbers

#### 👇 Tasks

- ✔️ Using `all_nums`, filter only even numbers.
    - Store the result to a new variable named `even_nums`.
- ✔️ `all_nums` should remain unaltered after your code.

#### 🚀 Hints

- Use the modulo operator (`%`) to check whether a number is even.
    - `some_num % 2 == 0`
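The `%` operator is applied element-wise when one side is a `Series`, so `some_series % 2` gives a `Series` of remainders. A small sketch with hypothetical values (not `all_nums`); note that, as in plain Python, the remainder of a negative integer divided by 2 is still `0` or `1`, so negative even numbers are detected correctly.

```python
import pandas as pd

sample = pd.Series([-5, -2, 3, 4])  # hypothetical values for illustration

remainders = sample % 2        # element-wise modulo
is_even = remainders == 0      # boolean mask

print(remainders.tolist())  # [1, 0, 1, 0]
print(is_even.tolist())     # [False, True, False, True]
```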

In [11]:
all_nums = pd.Series([2, 5, 4, 8, -2, -5, -11, 13, 4])

### BEGIN SOLUTION
even_nums = all_nums[all_nums % 2 == 0]
### END SOLUTION

even_nums

0    2
2    4
3    8
4   -2
8    4
dtype: int64

#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [12]:
pd.testing.assert_series_equal(all_nums, pd.Series([2, 5, 4, 8, -2, -5, -11, 13, 4]))
pd.testing.assert_series_equal(even_nums.reset_index(drop=True),
                               pd.Series([2, 4, 8, -2, 4]))

---

### 📌 Filtering a `DataFrame`

👉 I will keep saying this. A `DataFrame` is a combination of one or more columns. Filtering a `DataFrame` is very similar to filtering a `Series`.

▶️ Run the code cell below to create a new `DataFrame` named `df`.

In [13]:
df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})

df

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  1 | Mary   |      -10 |
|  2 | Tom    |       10 |
|  3 | John   |       20 |


To only keep rows where the `name` is `'John'`, we can again supply a `Series` of boolean values. Only the first and last rows of the `DataFrame` contain `'John'`.

▶️ Create a new `Series` named `is_john` with the following boolean values: `True`, `False`, `False`, `True`.

In [14]:
### BEGIN SOLUTION
is_john = pd.Series([True, False, False, True])
### END SOLUTION

# Check your work
tc.assertEqual(is_john.to_list(), pd.Series([1, 0, 0, 1]).astype(bool).to_list())

# Display is_john
is_john

0     True
1    False
2    False
3     True
dtype: bool

▶️ Type `result = df[is_john]` in the code cell below and run it.

In [15]:
### BEGIN SOLUTION
result = df[is_john]
### END SOLUTION

result

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  3 | John   |       20 |


Here is a visualization of how `df[is_john]` works.

![mini-dataframe-filter-rows](https://github.com/bdi475/images/blob/main/filter-mini-dataframe-result.png?raw=true)
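The same boolean mask can also be passed to `.loc`, which optionally takes a column selection as a second argument. A sketch re-creating the `df` from above so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"],
                   "amount": [-20, -10, 10, 20]})
is_john = df["name"] == "John"

# Equivalent to df[is_john]
rows = df.loc[is_john]

# Filter rows and keep only the "amount" column (returns a Series)
john_amounts = df.loc[is_john, "amount"]

print(rows.index.tolist())    # [0, 3]
print(john_amounts.tolist())  # [-20, 20]
```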

---

### 🎯 Challenge 2: Find all positive transactions

#### 👇 Tasks

- ✔️ Given `df`, filter rows with positive `amount` values.
    - Store the result to a new variable named `df_pos`.
    - `df_pos` should be a `DataFrame`.
- ✔️ `df` should remain unaltered after running your code.

▶️ Run the code cell below to create `df`.

In [16]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})
df

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  1 | Mary   |      -10 |
|  2 | Tom    |       10 |
|  3 | John   |       20 |


In [17]:
### BEGIN SOLUTION
df_pos = df[df["amount"] > 0]
### END SOLUTION

df_pos

|    | name   |   amount |
|---:|:-------|---------:|
|  2 | Tom    |       10 |
|  3 | John   |       20 |


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [18]:
df_check = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_pos.reset_index(drop=True),
                              df_check.iloc[[2, 3]].reset_index(drop=True))

---

### 📌 Logical operators in pandas `Series`

👉 There are only three *logical* operators in Pandas you need to remember.

- `&`: Logical **AND**
- `|`: Logical **OR**
- `~`: Logical **NOT**

These operators perform element-wise *logical* operations.
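Two practical points, sketched below. First, `&`, `|`, and `~` bind more tightly than comparison operators such as `==` and `>`, so each comparison should be wrapped in parentheses before combining. Second, Python's keyword operators `and`, `or`, and `not` do not work element-wise on a `Series`; pandas raises a `ValueError` because it cannot reduce a whole `Series` to a single `True`/`False`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"],
                   "amount": [-20, -10, 10, 20]})

# Parenthesize each comparison before combining with & or |
mask = (df["name"] == "John") & (df["amount"] > 0)
print(mask.tolist())  # [False, False, False, True]

# The keyword `and` tries to turn the first Series into a single bool
raised_value_error = False
try:
    (df["name"] == "John") and (df["amount"] > 0)
except ValueError as err:
    raised_value_error = True
    print("ValueError:", err)
```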

#### 📍 Logical AND

👉 A logical AND operator `&` returns `True` only if both operands are `True`.

![s1_AND_s2](https://github.com/bdi475/images/blob/main/s1-AND-s2.png?raw=true)

▶️ Perform a logical AND operation (`&`) on `s1` and `s2` and store the result to a new variable named `s1_AND_s2`.

In [19]:
s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

### BEGIN SOLUTION
s1_AND_s2 = s1 & s2
### END SOLUTION

# 🧭 Check your work
pd.testing.assert_series_equal(s1_AND_s2, pd.Series([1, 0, 0, 0]).astype(bool))

# Display s1, s2, s1_AND_s2 together as a DataFrame
pd.DataFrame({"s1": s1, "s2": s2, "s1_AND_s2": s1_AND_s2})

|    | s1    | s2    | s1_AND_s2   |
|---:|:------|:------|:------------|
|  0 | True  | True  | True        |
|  1 | True  | False | False       |
|  2 | False | True  | False       |
|  3 | False | False | False       |


#### 📍 Logical OR

👉 A logical OR operator `|` returns `True` if at least one of the operands is `True`.

![s1_OR_s2](https://github.com/bdi475/images/blob/main/s1-OR-s2.png?raw=true)

▶️ Perform a logical OR operation (`|`) on `s1` and `s2` and store the result to a new variable named `s1_OR_s2`.

In [20]:
s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

### BEGIN SOLUTION
s1_OR_s2 = s1 | s2
### END SOLUTION

# 🧭 Check your work
pd.testing.assert_series_equal(s1_OR_s2, pd.Series([1, 1, 1, 0]).astype(bool))

# Display s1, s2, s1_OR_s2 together as a DataFrame
pd.DataFrame({"s1": s1,
              "s2": s2,
              "s1_OR_s2": s1_OR_s2})

|    | s1    | s2    | s1_OR_s2   |
|---:|:------|:------|:-----------|
|  0 | True  | True  | True       |
|  1 | True  | False | True       |
|  2 | False | True  | True       |
|  3 | False | False | False      |


#### 📍 Logical NOT

👉 A logical NOT operator `~` reverses each element: `True` becomes `False`, and `False` becomes `True`.

![NOT_s1](https://github.com/bdi475/images/blob/main/NOT-s1.png?raw=true)
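One caveat worth knowing: `~` is Python's bitwise NOT. On a boolean `Series` it flips `True`/`False` as shown above, but on an integer `Series` it computes the bitwise complement instead (`~x` equals `-x - 1`). A quick sketch:

```python
import pandas as pd

flags = pd.Series([True, False])
ints = pd.Series([1, 0])

print((~flags).tolist())  # [False, True]
print((~ints).tolist())   # [-2, -1]  (bitwise complement, not logical NOT)

# Convert 0/1 integers to bool first if you want a logical NOT
print((~ints.astype(bool)).tolist())  # [False, True]
```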

▶️ Perform a logical NOT operation (`~`) on `s1` and store the result to a new variable named `NOT_s1`.

In [21]:
s1 = pd.Series([True, True, False, False])

### BEGIN SOLUTION
NOT_s1 = ~s1
### END SOLUTION

# 🧭 Check your work
pd.testing.assert_series_equal(NOT_s1, pd.Series([0, 0, 1, 1]).astype(bool))

# Display s1 and NOT_s1 together as a DataFrame
pd.DataFrame({"s1": s1,
              "NOT_s1": NOT_s1})

|    | s1    | NOT_s1   |
|---:|:------|:---------|
|  0 | True  | False    |
|  1 | True  | False    |
|  2 | False | True     |
|  3 | False | True     |


---

### 🎯 Challenge 3: Find John's positive transaction(s)

#### 👇 Tasks

- ✔️ Given `df`, find rows where the name is `'John'` **and** the amount is positive.
    - Store the result to a new variable named `df_john_and_pos`.
    - `df_john_and_pos` should be a `DataFrame`.
- ✔️ `df` should remain unaltered after running your code.

#### 🚀 Hints

- Create a boolean Series `is_john` using an equality comparison (`df['name'] == 'John'`).
- Create another boolean Series `is_positive` using a *greater than* comparison (`df['amount'] > 0`).
- Use a logical AND operator `&` to combine `is_john` and `is_positive`.

▶️ Run the code cell below to create `df`.

In [22]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})
df

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  1 | Mary   |      -10 |
|  2 | Tom    |       10 |
|  3 | John   |       20 |


In [23]:
### BEGIN SOLUTION
is_john = df["name"] == "John"
is_positive = df["amount"] > 0

df_john_and_pos = df[is_john & is_positive]
### END SOLUTION

df_john_and_pos

|    | name   |   amount |
|---:|:-------|---------:|
|  3 | John   |       20 |


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [24]:
df_check = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_john_and_pos.reset_index(drop=True),
                              df_check.iloc[[3]].reset_index(drop=True))

#### ⚜️ A diagram to help your understanding
![is_john_AND_is_positive](https://github.com/bdi475/images/blob/main/is-john-AND-is-positive.png?raw=true)

---

### 🎯 Challenge 4: Find transactions that are made by John OR are positive

#### 👇 Tasks

- ✔️ Given `df`, find rows where the name is `"John"` **or** the amount is positive.
    - Store the result to a new variable named `df_john_or_pos`.
    - `df_john_or_pos` should be a `DataFrame`.
- ✔️ `df` should remain unaltered after running your code.

#### 🚀 Hints

- Create a boolean Series `is_john` using an equality comparison (`df['name'] == 'John'`).
- Create another boolean Series `is_positive` using a *greater than* comparison (`df['amount'] > 0`).
- Use a logical OR operator `|` to combine `is_john` and `is_positive`.

▶️ Run the code cell below to create `df`.

In [25]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})
df

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  1 | Mary   |      -10 |
|  2 | Tom    |       10 |
|  3 | John   |       20 |


In [26]:
### BEGIN SOLUTION
is_john = df["name"] == "John"
is_positive = df["amount"] > 0

df_john_or_pos = df[is_john | is_positive]
### END SOLUTION

df_john_or_pos

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  2 | Tom    |       10 |
|  3 | John   |       20 |


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [27]:
df_check = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_john_or_pos.reset_index(drop=True),
                              df_check.iloc[[0, 2, 3]].reset_index(drop=True))

#### ⚜️ A diagram to help your understanding
![is_john_OR_is_positive](https://github.com/bdi475/images/blob/main/is-john-OR-is-positive.png?raw=true)

---

### 🎯 Challenge 5: Find transactions that are NOT made by John

#### 👇 Tasks

- ✔️ Given `df`, find rows where the name is NOT `'John'`.
    - Store the result to a new variable named `df_not_john`.
    - `df_not_john` should be a `DataFrame`.
- ✔️ Although you can do this without the NOT operator (`~`), **your goal is to use `~`**.
- ✔️ `df` should remain unaltered after running your code.

#### 🚀 Hints

- Create a boolean Series `is_john` using an equality comparison (`df['name'] == 'John'`).
- Use a logical NOT operator `~` to reverse `is_john`.

▶️ Run the code cell below to create `df`.

In [28]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})
df

|    | name   |   amount |
|---:|:-------|---------:|
|  0 | John   |      -20 |
|  1 | Mary   |      -10 |
|  2 | Tom    |       10 |
|  3 | John   |       20 |


In [29]:
### BEGIN SOLUTION
is_john = df["name"] == "John"

df_not_john = df[~is_john]
### END SOLUTION

df_not_john

|    | name   |   amount |
|---:|:-------|---------:|
|  1 | Mary   |      -10 |
|  2 | Tom    |       10 |


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [30]:
df_check = pd.DataFrame({"name": ["John", "Mary", "Tom", "John"], "amount": [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_not_john.reset_index(drop=True),
                              df_check.iloc[[1, 2]].reset_index(drop=True))

#### ⚜️ A diagram to help your understanding
![not_john](https://github.com/bdi475/images/blob/main/not-john.png?raw=true)

---

### 📌 Element-wise comparison in a `Series`

▶️ Run the code cell below to create a new `Series` named `countries`.

In [31]:
countries = pd.Series(["United States", "Oman", "United States",
                       "China", "South Korea", "United States"])

display(countries)

0    United States
1             Oman
2    United States
3            China
4      South Korea
5    United States
dtype: object

What happens when you perform an equality comparison on strings?

▶️ Compare `countries` with the string `'United States'` using an equality comparison operator (`==`).

In [32]:
### BEGIN SOLUTION
countries == "United States"
### END SOLUTION

0     True
1    False
2     True
3    False
4    False
5     True
dtype: bool

▶️ Run the code cell below to check the data type of the result.

In [33]:
type(countries == "United States")

pandas.core.series.Series

The result is **another `Series`** containing boolean (`True`/`False`) values. Pandas performs a string comparison (`my_str == 'United States'`) on **each element**.

Remember, you can also supply more than one condition using the following two operators:

1. logical OR (`|`)
2. logical AND (`&`)

▶️ Run the code cell below to check whether a country is **either** `'Oman'` **or** `'China'`.

In [34]:
(countries == "Oman") | (countries == "China")

0    False
1     True
2    False
3     True
4    False
5    False
dtype: bool

In [35]:
countries[(countries == "Oman") | (countries == "China")]

1     Oman
3    China
dtype: object
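When you need to match one of several values, chaining `==` with `|` gets long. `Series.isin()` builds the same mask from a list of values. A sketch that checks it produces the same result as the OR chain above:

```python
import pandas as pd

countries = pd.Series(["United States", "Oman", "United States",
                       "China", "South Korea", "United States"])

or_chain = (countries == "Oman") | (countries == "China")
with_isin = countries.isin(["Oman", "China"])

print(with_isin.equals(or_chain))     # True
print(countries[with_isin].tolist())  # ['Oman', 'China']
```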

---

### 📌 Another example of filtering a `DataFrame`

▶️ Run the code cell below to create a new `DataFrame` named `df_cities`.

In [36]:
df_cities = pd.DataFrame({"city": ["Lisle", "Dubai", "Niles", "Shanghai", "Seoul", "Chicago"],
                          "country": ["United States", "United Arab Emirates", "United States",
                                      "China", "South Korea", "United States"],
                          "population": [23270, 3331409, 28938, 26320000, 21794000, 8604203]})

df_cities

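|    | city     | country              |   population |
|---:|:---------|:---------------------|-------------:|
|  0 | Lisle    | United States        |        23270 |
|  1 | Dubai    | United Arab Emirates |      3331409 |
|  2 | Niles    | United States        |        28938 |
|  3 | Shanghai | China                |     26320000 |
|  4 | Seoul    | South Korea          |     21794000 |
|  5 | Chicago  | United States        |      8604203 |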


To only keep rows where the `country` is `'United States'`, we can again supply a `Series` of boolean values.

▶️ Create a new `Series` named `keep` with the following 6 boolean values: `True`, `False`, `True`, `False`, `False`, `True`.

In [37]:
### BEGIN SOLUTION
keep = pd.Series([True, False, True, False, False, True])
# OR
keep = df_cities["country"] == "United States"
### END SOLUTION

# Check your work
pd.testing.assert_series_equal(keep.reset_index(drop=True),
                               pd.Series([1, 0, 1, 0, 0, 1]).astype(bool).reset_index(drop=True),
                               check_names=False)

# Display keep
keep

0     True
1    False
2     True
3    False
4    False
5     True
Name: country, dtype: bool

🤠 You know the drill now.

▶️ Type `df_cities[keep]` in the code cell below and run it.

In [38]:
### BEGIN SOLUTION
df_cities[keep]
### END SOLUTION

|    | city    | country       |   population |
|---:|:--------|:--------------|-------------:|
|  0 | Lisle   | United States |        23270 |
|  2 | Niles   | United States |        28938 |
|  5 | Chicago | United States |      8604203 |
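pandas also offers `DataFrame.query()`, which takes the condition as a string; the check cells in this notebook use it to build the expected results. A minimal sketch showing that it selects the same rows as the boolean mask:

```python
import pandas as pd

df_cities = pd.DataFrame({
    "city": ["Lisle", "Dubai", "Niles", "Shanghai", "Seoul", "Chicago"],
    "country": ["United States", "United Arab Emirates", "United States",
                "China", "South Korea", "United States"],
    "population": [23270, 3331409, 28938, 26320000, 21794000, 8604203],
})

by_mask = df_cities[df_cities["country"] == "United States"]
by_query = df_cities.query("country == 'United States'")

print(by_query["city"].tolist())  # ['Lisle', 'Niles', 'Chicago']
print(by_mask.equals(by_query))   # True
```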


---

### 🎯 Challenge 6: Cities with population over a million

#### 👇 Tasks

- ✔️ Using `df_cities`, filter rows with a population greater than a million (`1000000`).
    - Store the result to a new variable named `df_large_cities`.
- ✔️ `df_cities` should remain unaltered after your code.

In [39]:
### BEGIN SOLUTION
df_large_cities = df_cities[df_cities['population'] > 1000000]
### END SOLUTION

df_large_cities

|    | city     | country              |   population |
|---:|:---------|:---------------------|-------------:|
|  1 | Dubai    | United Arab Emirates |      3331409 |
|  3 | Shanghai | China                |     26320000 |
|  4 | Seoul    | South Korea          |     21794000 |
|  5 | Chicago  | United States        |      8604203 |


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [40]:
pd.testing.assert_frame_equal(df_large_cities.reset_index(drop=True),
                              df_cities.query('population > 1000000').reset_index(drop=True))

---

## 👉 Sorting a `DataFrame`

You can sort a `DataFrame` using `df.sort_values()`.

![sort_values usage](https://github.com/bdi475/images/blob/main/pandas/sort-values-01.png?raw=true)

▶️ Run the code cell below to **sort** `df_cities` by `population`.

In [41]:
df_cities.sort_values('population')

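|    | city     | country              |   population |
|---:|:---------|:---------------------|-------------:|
|  0 | Lisle    | United States        |        23270 |
|  2 | Niles    | United States        |        28938 |
|  1 | Dubai    | United Arab Emirates |      3331409 |
|  5 | Chicago  | United States        |      8604203 |
|  4 | Seoul    | South Korea          |     21794000 |
|  3 | Shanghai | China                |     26320000 |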


▶️ Run the code cell below to **sort** `df_cities` by `population` in descending order.

In [42]:
df_cities.sort_values('population', ascending=False)

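|    | city     | country              |   population |
|---:|:---------|:---------------------|-------------:|
|  3 | Shanghai | China                |     26320000 |
|  4 | Seoul    | South Korea          |     21794000 |
|  5 | Chicago  | United States        |      8604203 |
|  1 | Dubai    | United Arab Emirates |      3331409 |
|  2 | Niles    | United States        |        28938 |
|  0 | Lisle    | United States        |        23270 |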
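Like filtering, `sort_values()` returns a new `DataFrame` and leaves the original untouched (unless you pass `inplace=True`). Chaining it with `head()` is a common way to get the top rows; a sketch with the `df_cities` data:

```python
import pandas as pd

df_cities = pd.DataFrame({
    "city": ["Lisle", "Dubai", "Niles", "Shanghai", "Seoul", "Chicago"],
    "country": ["United States", "United Arab Emirates", "United States",
                "China", "South Korea", "United States"],
    "population": [23270, 3331409, 28938, 26320000, 21794000, 8604203],
})

# Sort largest first, then keep the first three rows
top3 = df_cities.sort_values("population", ascending=False).head(3)

print(top3["city"].tolist())       # ['Shanghai', 'Seoul', 'Chicago']
print(df_cities["city"].tolist())  # original order is unchanged
```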


---

## Exercises using the Yeezys dataset

### 📌 Load data

▶️ Run the code cell below to create a new `DataFrame` named `df_sneakers`.

In [43]:
df_sneakers = pd.read_csv("yeezy_sneakers.csv")

# Used to keep a clean copy
df_sneakers_backup = df_sneakers.copy()

# head() displays the first 5 rows of a DataFrame
df_sneakers.head()

|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
|  4 | Nike    | Air Yeezy Blink                        |    3142 |


The table below describes the columns in `df_sneakers`.

| Column Name             | Description           |
|-------------------------|-----------------------|
| brand                   | Brand of the sneaker  |
| product                 | Name of the sneaker   |
| price                   | Price of the sneaker  |

---

### 📌 Concise summary of a `DataFrame`

▶️ Print out a summary of `df_sneakers` using the `info()` method.

In [44]:
### BEGIN SOLUTION
df_sneakers.info()
### END SOLUTION

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   brand    15 non-null     object
 1   product  15 non-null     object
 2   price    15 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 488.0+ bytes


---

### 📌 Number of rows and columns in a `DataFrame`

👉 How many rows and columns does `df_sneakers` have?

▶️ Run `df_sneakers.shape` below to see the *shape* of the `DataFrame`.

In [45]:
### BEGIN SOLUTION
df_sneakers.shape
### END SOLUTION

(15, 3)

---

### 🎯 Challenge 7: Find the number of rows and columns in a `DataFrame`

#### 👇 Tasks

- ✔️ Store the number of rows in `df_sneakers` to a new variable named `num_rows`.
- ✔️ Store the number of columns in `df_sneakers` to a new variable named `num_cols`.
- ✔️ Use `.shape`, not `len()`.

In [46]:
### BEGIN SOLUTION
num_rows = df_sneakers.shape[0]
num_cols = df_sneakers.shape[1]
### END SOLUTION

print(num_rows)
print(num_cols)

15
3


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [47]:
tc.assertEqual(num_rows, len(df_sneakers.index), f"Number of rows should be {len(df_sneakers.index)}")
tc.assertEqual(num_cols, len(df_sneakers.columns), f"Number of columns should be {len(df_sneakers.columns)}")
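`shape` is a plain tuple of `(number of rows, number of columns)`, so besides indexing it with `[0]` and `[1]` you can unpack it in one line. A small sketch on a hypothetical three-row `DataFrame` (not the sneakers data):

```python
import pandas as pd

df_demo = pd.DataFrame({"brand": ["Adidas", "Nike", "Adidas"],
                        "price": [910, 3142, 1578]})  # hypothetical data

print(df_demo.shape)  # (3, 2)

# Tuple unpacking gives both values at once
n_rows, n_cols = df_demo.shape
print(n_rows, n_cols)  # 3 2
```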

---

### 🎯 Challenge 8: Find `Adidas` sneakers

#### 👇 Tasks

- ✔️ Find Adidas sneakers and store the filtered result to `df_adidas`.
- ✔️ `df_sneakers` should remain unaltered.

#### 🔑 Expected Output

|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
| 10 | Adidas  | Yeezy Boost 350 V2 Black Reflective    |    1437 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective   |     912 |
| 12 | Adidas  | Yeezy Boost 350 V2 Synth Reflective    |    1292 |
| 13 | Adidas  | Yeezy 350 Boost Turtledove             |    1279 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark       |     917 |

In [48]:
### BEGIN SOLUTION
df_adidas = df_sneakers[df_sneakers["brand"] == "Adidas"]
### END SOLUTION

df_adidas

|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
| 10 | Adidas  | Yeezy Boost 350 V2 Black Reflective    |    1437 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective   |     912 |
| 12 | Adidas  | Yeezy Boost 350 V2 Synth Reflective    |    1292 |
| 13 | Adidas  | Yeezy 350 Boost Turtledove             |    1279 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark       |     917 |


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [49]:
df_sneakers_copy = df_sneakers_backup.copy()

pd.testing.assert_frame_equal(
    df_sneakers_copy \
        .query("brand == 'Adidas'") \
        .sort_values(df_sneakers_copy.columns.to_list()) \
        .reset_index(drop=True),
    df_adidas.reset_index(drop=True) \
        .sort_values(df_adidas.columns.to_list()) \
        .reset_index(drop=True)
)

---

### 🎯 Challenge 9: Find Sneakers under \\$1,000

#### 👇 Tasks

- ✔️ Find sneakers under \\$1,000 and store the filtered result to `df_under_1000`.
- ✔️ `df_sneakers` should remain unaltered.

#### 🔑 Expected Output

|    | brand   | product                              |   price |
|---:|:--------|:-------------------------------------|--------:|
|  1 | Adidas  | Yeezy 350 Boost Pirate Black         |     910 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red         |     954 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective |     912 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark     |     917 |

In [50]:
### BEGIN SOLUTION
df_under_1000 = df_sneakers[df_sneakers["price"] < 1000]
### END SOLUTION

df_under_1000

|    | brand   | product                              |   price |
|---:|:--------|:-------------------------------------|--------:|
|  1 | Adidas  | Yeezy 350 Boost Pirate Black         |     910 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red         |     954 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective |     912 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark     |     917 |


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [51]:
df_sneakers_copy = df_sneakers_backup.copy()

pd.testing.assert_frame_equal(
    df_sneakers_copy \
        .query("price < 1000") \
        .sort_values(df_sneakers_copy.columns.to_list()) \
        .reset_index(drop=True),
    df_under_1000.reset_index(drop=True) \
        .sort_values(df_under_1000.columns.to_list()) \
        .reset_index(drop=True)
)

---

### 🎯 Challenge 10: Find `Nike` Sneakers over \\$3,000

#### 👇 Tasks

- ✔️ Find Nike sneakers over \$3,000 and store the filtered result to `df_nike_over_3000`.
- ✔️ `df_sneakers` should remain unaltered.

#### 🔑 Expected Output

|    | brand   | product                   |   price |
|---:|:--------|:--------------------------|--------:|
|  4 | Nike    | Air Yeezy Blink           |    3142 |
|  5 | Nike    | Air Yeezy 2 Red October   |    6075 |
|  6 | Nike    | Air Yeezy 2 Solar Red     |    4239 |
|  7 | Nike    | Air Yeezy 2 Pure Platinum |    3448 |

In [52]:
### BEGIN SOLUTION
df_nike_over_3000 = df_sneakers[(df_sneakers["brand"] == "Nike") & (df_sneakers["price"] > 3000)]
### END SOLUTION

df_nike_over_3000

|    | brand   | product                   |   price |
|---:|:--------|:--------------------------|--------:|
|  4 | Nike    | Air Yeezy Blink           |    3142 |
|  5 | Nike    | Air Yeezy 2 Red October   |    6075 |
|  6 | Nike    | Air Yeezy 2 Solar Red     |    4239 |
|  7 | Nike    | Air Yeezy 2 Pure Platinum |    3448 |


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [53]:
df_sneakers_copy = df_sneakers_backup.copy()

pd.testing.assert_frame_equal(
    df_sneakers_copy \
        .query("(brand == 'Nike') & (price > 3000)") \
        .sort_values(df_sneakers_copy.columns.to_list()) \
        .reset_index(drop=True),
    df_nike_over_3000.reset_index(drop=True) \
        .sort_values(df_nike_over_3000.columns.to_list()) \
        .reset_index(drop=True)
)

---

### 🎯 Challenge 11: Sort sneakers by price in descending order

#### 👇 Tasks

- ✔️ Sort sneakers by price in descending order and store the result to `df_sorted_by_price_desc`.
- ✔️ `df_sneakers` should remain unaltered.

#### 🔑 Expected Output

|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  5 | Nike    | Air Yeezy 2 Red October                |    6075 |
|  6 | Nike    | Air Yeezy 2 Solar Red                  |    4239 |
|  7 | Nike    | Air Yeezy 2 Pure Platinum              |    3448 |
|  4 | Nike    | Air Yeezy Blink                        |    3142 |
|  9 | Nike    | Air Yeezy Zen Grey                     |    2139 |
|  8 | Nike    | Air Yeezy Net                          |    1888 |
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
| 10 | Adidas  | Yeezy Boost 350 V2 Black Reflective    |    1437 |
| 12 | Adidas  | Yeezy Boost 350 V2 Synth Reflective    |    1292 |
| 13 | Adidas  | Yeezy 350 Boost Turtledove             |    1279 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark       |     917 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective   |     912 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |

In [54]:
### BEGIN SOLUTION
df_sorted_by_price_desc = df_sneakers.sort_values("price", ascending=False)
### END SOLUTION

df_sorted_by_price_desc

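|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  5 | Nike    | Air Yeezy 2 Red October                |    6075 |
|  6 | Nike    | Air Yeezy 2 Solar Red                  |    4239 |
|  7 | Nike    | Air Yeezy 2 Pure Platinum              |    3448 |
|  4 | Nike    | Air Yeezy Blink                        |    3142 |
|  9 | Nike    | Air Yeezy Zen Grey                     |    2139 |
|  8 | Nike    | Air Yeezy Net                          |    1888 |
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
| 10 | Adidas  | Yeezy Boost 350 V2 Black Reflective    |    1437 |
| 12 | Adidas  | Yeezy Boost 350 V2 Synth Reflective    |    1292 |
| 13 | Adidas  | Yeezy 350 Boost Turtledove             |    1279 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark       |     917 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective   |     912 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |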


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [55]:
df_sneakers_copy = df_sneakers_backup.copy()

pd.testing.assert_frame_equal(
    df_sorted_by_price_desc \
        .reset_index(drop=True),
    df_sneakers_copy.sort_values("price").iloc[::-1] \
        .reset_index(drop=True)
)

---

### 🎯 Challenge 12: Sneakers `> 6000` or `< 1000`

#### 👇 Tasks

- ✔️ Find sneakers that are over \\$6,000 **or** under \\$1,000.
- ✔️ Store the result to a new DataFrame named `df_polar`.
- ✔️ `df_sneakers` should remain unaltered.

#### 🔑 Expected Output

|    | brand   | product                              |   price |
|---:|:--------|:-------------------------------------|--------:|
|  1 | Adidas  | Yeezy 350 Boost Pirate Black         |     910 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red         |     954 |
|  5 | Nike    | Air Yeezy 2 Red October              |    6075 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective |     912 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark     |     917 |

In [56]:
### BEGIN SOLUTION
df_polar = df_sneakers[(df_sneakers["price"] > 6000) | (df_sneakers["price"] < 1000)]
### END SOLUTION

df_polar

|    | brand   | product                              |   price |
|---:|:--------|:-------------------------------------|--------:|
|  1 | Adidas  | Yeezy 350 Boost Pirate Black         |     910 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red         |     954 |
|  5 | Nike    | Air Yeezy 2 Red October              |    6075 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective |     912 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark     |     917 |


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [57]:
df_sneakers_copy = df_sneakers_backup.copy()

pd.testing.assert_frame_equal(
    df_polar.reset_index(drop=True) \
        .sort_values(df_polar.columns.to_list()) \
        .reset_index(drop=True),
    df_sneakers_copy \
        .query("(price > 6000) | (price < 1000)") \
        .sort_values(df_sneakers_copy.columns.to_list()) \
        .reset_index(drop=True)
)
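Another way to express "outside a range" is to negate `Series.between()`, which includes both endpoints by default. So `~price.between(1000, 6000)` keeps prices below 1000 or above 6000, the same condition as the `|` version in Challenge 12. A sketch on a small hypothetical list of prices:

```python
import pandas as pd

prices = pd.Series([910, 1000, 3142, 6000, 6075])  # hypothetical values

with_or = (prices > 6000) | (prices < 1000)
with_between = ~prices.between(1000, 6000)  # between() includes both ends by default

print(with_or.equals(with_between))   # True
print(prices[with_between].tolist())  # [910, 6075]
```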

---

### 🎯 Challenge 13: Sort by brand ascending and price descending

#### 👇 Tasks

- ✔️ Sort `df_sneakers` by (1) brand ascending and (2) price descending within each brand.
- ✔️ Store the result to a new DataFrame named `df_sorted_by_brand_price`.
- ✔️ `df_sneakers` should remain unaltered.

#### 🔑 Expected Output

|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
| 10 | Adidas  | Yeezy Boost 350 V2 Black Reflective    |    1437 |
| 12 | Adidas  | Yeezy Boost 350 V2 Synth Reflective    |    1292 |
| 13 | Adidas  | Yeezy 350 Boost Turtledove             |    1279 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark       |     917 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective   |     912 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |
|  5 | Nike    | Air Yeezy 2 Red October                |    6075 |
|  6 | Nike    | Air Yeezy 2 Solar Red                  |    4239 |
|  7 | Nike    | Air Yeezy 2 Pure Platinum              |    3448 |
|  4 | Nike    | Air Yeezy Blink                        |    3142 |
|  9 | Nike    | Air Yeezy Zen Grey                     |    2139 |
|  8 | Nike    | Air Yeezy Net                          |    1888 |

In [58]:
### BEGIN SOLUTION
df_sorted_by_brand_price = df_sneakers.sort_values(["brand", "price"], ascending=[True, False])
### END SOLUTION

df_sorted_by_brand_price

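|    | brand   | product                                |   price |
|---:|:--------|:---------------------------------------|--------:|
|  0 | Adidas  | Yeezy 750 Boost Light Brown            |    1578 |
| 10 | Adidas  | Yeezy Boost 350 V2 Black Reflective    |    1437 |
| 12 | Adidas  | Yeezy Boost 350 V2 Synth Reflective    |    1292 |
| 13 | Adidas  | Yeezy 350 Boost Turtledove             |    1279 |
|  2 | Adidas  | Yeezy Boost 350 V2 Lundmark Reflective |    1009 |
|  3 | Adidas  | Yeezy 350 Boost V2 Black/Red           |     954 |
| 14 | Adidas  | Yeezy 750 Boost Glow in the Dark       |     917 |
| 11 | Adidas  | Yeezy Boost 350 V2 Antlia Reflective   |     912 |
|  1 | Adidas  | Yeezy 350 Boost Pirate Black           |     910 |
|  5 | Nike    | Air Yeezy 2 Red October                |    6075 |
|  6 | Nike    | Air Yeezy 2 Solar Red                  |    4239 |
|  7 | Nike    | Air Yeezy 2 Pure Platinum              |    3448 |
|  4 | Nike    | Air Yeezy Blink                        |    3142 |
|  9 | Nike    | Air Yeezy Zen Grey                     |    2139 |
|  8 | Nike    | Air Yeezy Net                          |    1888 |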


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [59]:
df_sneakers_copy = df_sneakers_backup.copy()

pd.testing.assert_frame_equal(
    df_sorted_by_brand_price \
        .reset_index(drop=True),
    df_sneakers_copy.sort_values(["brand", "price"], ascending=[False, True]).iloc[::-1] \
        .reset_index(drop=True)
)
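When `sort_values()` receives a list of columns, rows are ordered by the first column and ties are broken by the next one; `ascending` can then be a list of the same length giving the direction for each column. A sketch with a small hypothetical `DataFrame`:

```python
import pandas as pd

df_demo = pd.DataFrame({"brand": ["Nike", "Adidas", "Nike", "Adidas"],
                        "price": [1888, 910, 6075, 1578]})  # hypothetical data

# brand A to Z first, then price high to low within each brand
result = df_demo.sort_values(["brand", "price"], ascending=[True, False])

print(result["brand"].tolist())  # ['Adidas', 'Adidas', 'Nike', 'Nike']
print(result["price"].tolist())  # [1578, 910, 6075, 1888]
print(result.index.tolist())     # [3, 1, 2, 0]
```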