# Lecture 13 - Pandas Filter Recap, Sort

Wednesday 2021/03/10

## Lecture Notes and in-class exercises

▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder.

In [1]:
import unittest
tc = unittest.TestCase()

#### 👇 Tasks

- ✔️ Import the following Python packages.
    1. `pandas`: Use alias `pd`.
    2. `numpy`: Use alias `np`.

In [2]:
# YOUR CODE BEGINS
import pandas as pd
import numpy as np
# YOUR CODE ENDS

#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [3]:
import sys
tc.assertTrue('pd' in globals(), 'Check whether you have correctly imported Pandas with an alias.')
tc.assertTrue('np' in globals(), 'Check whether you have correctly imported NumPy with an alias.')

---

### 📌 Filtering rows in a `Series`

We're going to start today's lesson with filtering a `Series`. We discussed how to filter a `Series` last time. But understanding how filtering works is so important that it deserves another discussion! 😺

▶️ Create a `Series` named `nums` with the following four integers: `-20`, `-10`, `10`, `20`. 

In [4]:
# YOUR CODE BEGINS
nums = pd.Series([-20, -10, 10, 20])
# YOUR CODE ENDS

nums

0   -20
1   -10
2    10
3    20
dtype: int64

👉 Is there a way to *filter* the `Series` so that it only contains **positive** values? Let's first try this **manually**.

▶️ Create a new `Series` named `keep` with the following four boolean values: `False`, `False`, `True`, `True`.

In [5]:
# YOUR CODE BEGINS
keep = pd.Series([False, False, True, True])
# YOUR CODE ENDS

# Check your work
pd.testing.assert_series_equal(keep,
                              pd.Series([0, 0, 1, 1]).astype(bool))

# Display keep
keep

0    False
1    False
2     True
3     True
dtype: bool

Let's visualize the two `Series` (`nums` and `keep`) you've created.

![nums-and-keep](https://github.com/bdi475/images/blob/main/nums-and-keep-series.png?raw=true)

The syntax for filtering a `Series` is `my_series[my_filter]` where `my_filter` is a `Series` of boolean values indicating whether to keep an element or not. `my_filter` should have the exact same number of elements as `my_series`.
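
As a minimal sketch of the `my_series[my_filter]` pattern, the example below filters a small `Series`. The names `prices`, `is_cheap`, and `cheap_prices` and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, not part of the lecture dataset
prices = pd.Series([5, 50, 7, 70])

# One boolean per element: True means "keep this element"
is_cheap = pd.Series([True, False, True, False])

# Only the elements paired with True survive
cheap_prices = prices[is_cheap]
print(cheap_prices)
```

Notice that the filtered result keeps the original index labels (`0` and `2`) rather than renumbering from zero.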

▶️ Now, you can use the boolean `Series` to filter another `Series`. Type in `nums[keep]` below and run the cell.

In [6]:
# YOUR CODE BEGINS
nums[keep]
# YOUR CODE ENDS

2    10
3    20
dtype: int64

If you're confused about what just happened, the visualization below may give you a better idea.

![nums-and-keep-filter-result](https://github.com/bdi475/images/blob/main/nums-and-keep-filter-result.png?raw=true)

So far, we have created our `keep` Series manually by typing `True`s and `False`s. A better approach is to create it programmatically.

▶️ Type `keep2 = nums > 0` in the code cell below to perform a comparison on the `nums` Series.

In [7]:
# YOUR CODE BEGINS
keep2 = nums > 0
# YOUR CODE ENDS

keep2

0    False
1    False
2     True
3     True
dtype: bool

▶️ Use `keep2` to keep only the positive values in `nums`.

In [8]:
# YOUR CODE BEGINS
nums[keep2]
# YOUR CODE ENDS

2    10
3    20
dtype: int64
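
The two steps above (build a boolean `Series`, then use it as a filter) can also be written on a single line. This is only a sketch of that shorthand; the variable name `positive_nums` is an assumption chosen for this example.

```python
import pandas as pd

nums = pd.Series([-20, -10, 10, 20])

# The comparison inside the brackets produces the boolean Series,
# and the brackets immediately use it as the filter
positive_nums = nums[nums > 0]
print(positive_nums)
```

Storing the comparison in its own variable first, as done with `keep2`, can be easier to read when the condition gets long.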

---

### 🎯 Mini-exercise: Filter large transactions

#### 👇 Tasks

- ✔️ Using `transactions`, filter amounts that exceed $10,000.
    - Store the result to a new variable named `large_transactions`.
    - `large_transactions` should be a `Series` type.
- ✔️ `transactions` should remain unaltered after running your code.

In [9]:
transactions = pd.Series([8161.7, 11873.7, 11922.3, 9741.2, 11676.7, 
                          8375.6, 7226.1, 5788.8, 8185.2, 9175.4])

# YOUR CODE BEGINS
large_transactions = transactions[transactions > 10000]
# YOUR CODE ENDS

large_transactions

1    11873.7
2    11922.3
4    11676.7
dtype: float64

#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [10]:
pd.testing.assert_series_equal(transactions,
                              pd.Series([8161.7, 11873.7, 11922.3, 9741.2, 11676.7, 
                                         8375.6, 7226.1, 5788.8, 8185.2, 9175.4]))

pd.testing.assert_series_equal(large_transactions.sort_values().reset_index(drop=True),
                              pd.Series([11676.7, 11873.7, 11922.3]))

---

### 🎯 Mini-exercise: Filter odd numbers

#### 👇 Tasks

- ✔️ Using `all_nums`, filter only odd numbers.
    - Store the result to a new variable named `odd_nums`.
    - `odd_nums` should be a `Series` type.
- ✔️ `all_nums` should remain unaltered after running your code.

#### 🚀 Hints

- Use the modulo operator (`%`) to check whether a number is odd.
    - `some_num % 2 == 1`
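
You might wonder whether the hint also works for negative numbers. The sketch below, using made-up values in a `Series` named `sample`, shows that the remainder of an odd number divided by `2` is `1` whether the number is positive or negative.

```python
import pandas as pd

# Hypothetical values: two odd numbers and two even numbers
sample = pd.Series([3, -3, 4, -4])

# Element-wise remainder after dividing by 2
remainders = sample % 2
print(remainders.tolist())  # [1, 1, 0, 0]

# Comparing to 1 gives a boolean Series that marks the odd numbers
is_odd = remainders == 1
print(is_odd.tolist())  # [True, True, False, False]
```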

In [11]:
all_nums = pd.Series([2, 5, 4, 8, -2, -5, -11, 13, 4])

# YOUR CODE BEGINS
is_odd = all_nums % 2 == 1
odd_nums = all_nums[is_odd]
# YOUR CODE ENDS

odd_nums

1     5
5    -5
6   -11
7    13
dtype: int64

#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [12]:
pd.testing.assert_series_equal(all_nums, pd.Series([2, 5, 4, 8, -2, -5, -11, 13, 4]))
pd.testing.assert_series_equal(odd_nums.reset_index(drop=True),
                               pd.Series([5, -5, -11, 13]))

---

### 🎯 Mini-exercise: Find all `John`s

#### 👇 Tasks

- ✔️ Given `names`, your goal is to find `John`s.
    - Store the result to a new variable named `johns`.
    - `johns` should be a `Series` type.
- ✔️ Use an equality comparison operator `==`.
- ✔️ `names` should remain unaltered after running your code.

In [13]:
names = pd.Series(['John', 'Mary', 'Tom', 'John'])

# YOUR CODE BEGINS
johns = names[names == 'John']
# YOUR CODE ENDS

johns

0    John
3    John
dtype: object

#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [14]:
pd.testing.assert_series_equal(names,
                              pd.Series(['John', 'Mary', 'Tom', 'John']))
pd.testing.assert_series_equal(johns.reset_index(drop=True),
                              pd.Series(['John', 'John']))

---

### 📌 Filtering a `DataFrame`

👉 I will keep saying this. A `DataFrame` is a combination of one or more columns. Filtering a `DataFrame` is very similar to filtering a `Series`.

▶️ Run the code cell below to create a new `DataFrame` named `df`.

In [15]:
df = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})

df

Unnamed: 0,name,amount
0,John,-20
1,Mary,-10
2,Tom,10
3,John,20


To only keep rows where the `name` is `'John'`, we can again supply a `Series` of boolean values. Only the first and last row of the `DataFrame` contain `'John'`.

▶️ Create a new `Series` named `is_john` with the following boolean values - `True`, `False`, `False`, `True`.

In [16]:
# YOUR CODE BEGINS
is_john = pd.Series([True, False, False, True])
# YOUR CODE ENDS

# Check your work
pd.testing.assert_series_equal(is_john,
                               pd.Series([1, 0, 0, 1]).astype(bool))

# Display is_john
is_john

0     True
1    False
2    False
3     True
dtype: bool

▶️ Type `result = df[is_john]` in the code cell below and run it.

In [17]:
# YOUR CODE BEGINS
result = df[is_john]
# YOUR CODE ENDS

result

Unnamed: 0,name,amount
0,John,-20
3,John,20


Here is a visualization of how `df[is_john]` works.

![mini-dataframe-filter-rows](https://github.com/bdi475/images/blob/main/filter-mini-dataframe-result.png?raw=true)
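
As a side note that goes slightly beyond this lecture, the same row filter can be written with `.loc`. The sketch below builds the boolean `Series` with a comparison instead of typing it by hand and checks that both spellings select the same rows. The names `by_brackets` and `by_loc` are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})

# Boolean Series: True where the name is 'John'
is_john = df['name'] == 'John'

by_brackets = df[is_john]    # the syntax used in this lecture
by_loc = df.loc[is_john]     # an equivalent spelling using .loc

print(by_brackets.equals(by_loc))  # True
```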

---

### 🎯 Mini-exercise: Find all positive transactions

#### 👇 Tasks

- ✔️ Given `df`, filter rows with positive `amount` values.
    - Store the result to a new variable named `df_pos`.
    - `df_pos` should be a `DataFrame`.
- ✔️ `df` should remain unaltered after running your code.

▶️ Run the code cell below to create `df`.

In [18]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})
df

Unnamed: 0,name,amount
0,John,-20
1,Mary,-10
2,Tom,10
3,John,20


In [19]:
# YOUR CODE BEGINS
df_pos = df[df['amount'] > 0]
# YOUR CODE ENDS

df_pos

Unnamed: 0,name,amount
2,Tom,10
3,John,20


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [20]:
df_check = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_pos.reset_index(drop=True),
                              df_check.iloc[[2, 3]].reset_index(drop=True))

---

### 📌 Logical operators in pandas `Series`

👉 There are only three *logical* operators in Pandas you need to remember.

- `&`: Logical **AND**
- `|`: Logical **OR**
- `~`: Logical **NOT**

These operators perform element-wise *logical* operations.
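
One detail worth knowing before using these operators: `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so each comparison should be wrapped in parentheses when conditions are combined on one line. The sketch below uses a made-up `Series` named `s`.

```python
import pandas as pd

# Hypothetical values for illustration
s = pd.Series([-5, 3, 8, 12])

# Each comparison is wrapped in parentheses before combining with &
in_range = (s > 0) & (s < 10)
print(in_range.tolist())  # [False, True, True, False]

# ~ flips every element of a boolean Series
out_of_range = ~in_range
print(out_of_range.tolist())  # [True, False, False, True]
```

Storing each condition in its own variable first, as the exercises below do, avoids the need for parentheses.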

#### 📍 Logical AND

👉 A logical AND operator `&` returns `True` only if both operands are `True`.

![s1_AND_s2](https://github.com/bdi475/images/blob/main/s1-AND-s2.png?raw=true)

▶️ Perform a logical AND operation (`&`) on `s1` and `s2` and store the result to a new variable named `s1_AND_s2`.

In [21]:
s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

# YOUR CODE BEGINS
s1_AND_s2 = s1 & s2
# YOUR CODE ENDS

# 🧭 Check your work
pd.testing.assert_series_equal(s1_AND_s2, pd.Series([1, 0, 0, 0]).astype(bool))

# Display s1, s2, s1_AND_s2 together as a DataFrame
pd.DataFrame({'s1': s1, 's2': s2, 's1_AND_s2': s1_AND_s2})

Unnamed: 0,s1,s2,s1_AND_s2
0,True,True,True
1,True,False,False
2,False,True,False
3,False,False,False


#### 📍 Logical OR

👉 A logical OR operator `|` returns `True` if either of the operands is `True`.

![s1_OR_s2](https://github.com/bdi475/images/blob/main/s1-OR-s2.png?raw=true)

▶️ Perform a logical OR operation (`|`) on `s1` and `s2` and store the result to a new variable named `s1_OR_s2`.

In [22]:
s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

# YOUR CODE BEGINS
s1_OR_s2 = s1 | s2
# YOUR CODE ENDS

# 🧭 Check your work
pd.testing.assert_series_equal(s1_OR_s2, pd.Series([1, 1, 1, 0]).astype(bool))

# Display s1, s2, s1_OR_s2 together as a DataFrame
pd.DataFrame({'s1': s1,
              's2': s2,
              's1_OR_s2': s1_OR_s2})

Unnamed: 0,s1,s2,s1_OR_s2
0,True,True,True
1,True,False,True
2,False,True,True
3,False,False,False


#### 📍 Logical NOT

👉 A logical NOT operator `~` flips each element: `True` becomes `False` and `False` becomes `True`.

![NOT_s1](https://github.com/bdi475/images/blob/main/NOT-s1.png?raw=true)

▶️ Perform a logical NOT operation (`~`) on `s1` and store the result to a new variable named `NOT_s1`.

In [23]:
s1 = pd.Series([True, True, False, False])

# YOUR CODE BEGINS
NOT_s1 = ~s1
# YOUR CODE ENDS

# 🧭 Check your work
pd.testing.assert_series_equal(NOT_s1, pd.Series([0, 0, 1, 1]).astype(bool))

# Display s1 and NOT_s1 together as a DataFrame
pd.DataFrame({'s1': s1,
              'NOT_s1': NOT_s1})

Unnamed: 0,s1,NOT_s1
0,True,False
1,True,False
2,False,True
3,False,True


---

### 🎯 Mini-exercise: Find John's positive transaction(s)

#### 👇 Tasks

- ✔️ Given `df`, find rows where the name is `'John'` **and** the amount is positive.
    - Store the result to a new variable named `df_john_and_pos`.
    - `df_john_and_pos` should be a `DataFrame`.
- ✔️ `df` should remain unaltered after running your code.

#### 🚀 Hints

- Create a boolean Series `is_john` using an equality comparison (`df['name'] == 'John'`).
- Create another boolean Series `is_positive` using a *greater than* comparison (`df['amount'] > 0`).
- Use a logical AND operator `&` to combine `is_john` and `is_positive`.

▶️ Run the code cell below to create `df`.

In [24]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})
df

Unnamed: 0,name,amount
0,John,-20
1,Mary,-10
2,Tom,10
3,John,20


In [25]:
# YOUR CODE BEGINS
is_john = df['name'] == 'John'
is_positive = df['amount'] > 0

df_john_and_pos = df[is_john & is_positive]
# YOUR CODE ENDS

df_john_and_pos

Unnamed: 0,name,amount
3,John,20


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [26]:
df_check = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_john_and_pos.reset_index(drop=True),
                              df_check.iloc[[3]].reset_index(drop=True))

#### ⚜️ A diagram to help your understanding
![is_john_AND_is_positive](https://github.com/bdi475/images/blob/main/is-john-AND-is-positive.png?raw=true)

---

### 🎯 Mini-exercise: Find transactions that are made by John OR are positive

#### 👇 Tasks

- ✔️ Given `df`, find rows where the name is `'John'` **or** the amount is positive.
    - Store the result to a new variable named `df_john_or_pos`.
    - `df_john_or_pos` should be a `DataFrame`.
- ✔️ `df` should remain unaltered after running your code.

#### 🚀 Hints

- Create a boolean Series `is_john` using an equality comparison (`df['name'] == 'John'`).
- Create another boolean Series `is_positive` using a *greater than* comparison (`df['amount'] > 0`).
- Use a logical OR operator `|` to combine `is_john` and `is_positive`.

▶️ Run the code cell below to create `df`.

In [27]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})
df

Unnamed: 0,name,amount
0,John,-20
1,Mary,-10
2,Tom,10
3,John,20


In [28]:
# YOUR CODE BEGINS
is_john = df['name'] == 'John'
is_positive = df['amount'] > 0

df_john_or_pos = df[is_john | is_positive]
# YOUR CODE ENDS

df_john_or_pos

Unnamed: 0,name,amount
0,John,-20
2,Tom,10
3,John,20


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [29]:
df_check = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_john_or_pos.reset_index(drop=True),
                              df_check.iloc[[0, 2, 3]].reset_index(drop=True))

#### ⚜️ A diagram to help your understanding
![is_john_OR_is_positive](https://github.com/bdi475/images/blob/main/is-john-OR-is-positive.png?raw=true)

---

### 🎯 Mini-exercise: Find transactions that are NOT made by John

#### 👇 Tasks

- ✔️ Given `df`, find rows where the name is NOT `'John'`.
    - Store the result to a new variable named `df_not_john`.
    - `df_not_john` should be a `DataFrame`.
- ✔️ Although you can do this without the NOT operator (`~`), **your goal is to use `~`**.
- ✔️ `df` should remain unaltered after running your code.

#### 🚀 Hints

- Create a boolean Series `is_john` using an equality comparison (`df['name'] == 'John'`).
- Use a logical NOT operator `~` to reverse `is_john`.
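
The task mentions that the same result is possible without `~`. As a small sketch on a standalone `Series`, negating an equality comparison gives the same boolean values as the "not equal" operator `!=`. The names `not_john_a` and `not_john_b` are made up for this example.

```python
import pandas as pd

names = pd.Series(['John', 'Mary', 'Tom', 'John'])

# Option A: build the equality comparison, then flip it with ~
not_john_a = ~(names == 'John')

# Option B: use the "not equal" comparison directly
not_john_b = names != 'John'

print(not_john_a.equals(not_john_b))  # True
```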

▶️ Run the code cell below to create `df`.

In [30]:
# DO NOT CHANGE THE CODE IN THIS CELL
df = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})
df

Unnamed: 0,name,amount
0,John,-20
1,Mary,-10
2,Tom,10
3,John,20


In [31]:
# YOUR CODE BEGINS
is_john = df['name'] == 'John'

df_not_john = df[~is_john]
# YOUR CODE ENDS

df_not_john

Unnamed: 0,name,amount
1,Mary,-10
2,Tom,10


#### 🧭 Check your work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [32]:
df_check = pd.DataFrame({'name': ['John', 'Mary', 'Tom', 'John'], 'amount': [-20, -10, 10, 20]})

pd.testing.assert_frame_equal(df, df_check)
pd.testing.assert_frame_equal(df_not_john.reset_index(drop=True),
                              df_check.iloc[[1, 2]].reset_index(drop=True))

#### ⚜️ A diagram to help your understanding
![not_john](https://github.com/bdi475/images/blob/main/not-john.png?raw=true)

---

### 📌 Load data

▶️ Run the code cell below to create a new `DataFrame` named `df_you`.

In [33]:
df_you = pd.read_csv('https://raw.githubusercontent.com/bdi475/datasets/main/about-you.csv')

# Used to keep a clean copy
df_you_backup = df_you.copy()

# head() displays the first 5 rows of a DataFrame
df_you.head()

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
0,Citlalli,Anthropology,,Chicago,125.86,Seven Saints,True
1,Zach,Finance,Information Systems,Glenview,137.04,,False
2,Ori,Information Science,,Skokie,134.94,Culvers,True
3,Dylan,Accountancy,,Chicago,125.86,Signature Grill,True
4,Ajay,Organizational Psychology,Statistics,Fairview Heights,141.24,Chipotle,True


The table below describes each column in `df_you`.

| Column Name             | Description                                               |
|-------------------------|-----------------------------------------------------------|
| name                    | First name                                                |
| major1                  | Major                                                     |
| major2                  | Second major OR minor (blank if no second major or minor) |
| city                    | City the person is from                                   |
| distance_from_champaign | Straight distance from the city to Champaign in miles     |
| fav_restaurant          | Favorite restaurant (blank if no restaurant was given)    |
| has_iphone              | Whether the person uses an iPhone                         |

---

### 📌 Concise summary of a `DataFrame`

▶️ Run `df_you.info()` below to see the `info()` method in action.

In [34]:
# YOUR CODE BEGINS
df_you.info()
# YOUR CODE ENDS

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 7 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   name                     40 non-null     object 
 1   major1                   40 non-null     object 
 2   major2                   20 non-null     object 
 3   city                     40 non-null     object 
 4   distance_from_champaign  40 non-null     float64
 5   fav_restaurant           19 non-null     object 
 6   has_iphone               40 non-null     bool   
dtypes: bool(1), float64(1), object(5)
memory usage: 2.0+ KB
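
The `Non-Null Count` column reported by `info()` tells you how many values in each column are present. If you want those counts as numbers you can work with, one possible approach is sketched below on a tiny made-up `DataFrame` named `people` (not the lecture dataset).

```python
import pandas as pd

# Hypothetical data with two missing values in the 'minor' column
people = pd.DataFrame({
    'name': ['Ann', 'Bo', 'Cy'],
    'minor': ['Math', None, None],
})

# isna() marks missing values with True; sum() counts the Trues per column
missing_per_column = people.isna().sum()
print(missing_per_column)

# notna() is the opposite: it counts the values that are present
non_null_per_column = people.notna().sum()
print(non_null_per_column)
```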


---

### 📌 Number of rows and columns in a `DataFrame`

👉 How many rows and columns does `df_you` have?

▶️ Run `df_you.shape` below to see the *shape* of the `DataFrame`.

In [35]:
# YOUR CODE BEGINS
df_you.shape
# YOUR CODE ENDS

(40, 7)
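
`shape` is a tuple of the form `(number of rows, number of columns)`, so each part can be read by position or unpacked into two variables. The sketch below uses a small made-up `DataFrame` named `small`.

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns
small = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(small.shape)     # (3, 2)
print(small.shape[0])  # 3, the number of rows
print(small.shape[1])  # 2, the number of columns

# Tuple unpacking assigns both values at once
n_rows, n_cols = small.shape
```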

---

### 🎯 Mini-exercise: Find the number of rows and columns in a `DataFrame`

#### 👇 Tasks

- ✔️ Store the number of rows in `df_you` to a new variable named `num_rows`.
- ✔️ Store the number of columns in `df_you` to a new variable named `num_cols`.
- ✔️ Use `.shape`, not `len()`.

In [36]:
# YOUR CODE BEGINS
num_rows = df_you.shape[0]
num_cols = df_you.shape[1]

print(num_rows)
print(num_cols)
# YOUR CODE ENDS

40
7


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [37]:
tc.assertEqual(num_rows, len(df_you.index), f'Number of rows should be {len(df_you.index)}')
tc.assertEqual(num_cols, len(df_you.columns), f'Number of columns should be {len(df_you.columns)}')

---

### 🎯 Mini-exercise: People who do not use an iPhone

▶️ Run the code cell below to see the **first** 3 rows of `df_you`.

In [38]:
# Restore clean df_you
df_you = df_you_backup.copy()

df_you.head(3)

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
0,Citlalli,Anthropology,,Chicago,125.86,Seven Saints,True
1,Zach,Finance,Information Systems,Glenview,137.04,,False
2,Ori,Information Science,,Skokie,134.94,Culvers,True


#### 👇 Tasks

- ✔️ Using `df_you`, filter rows where the person does not use an iPhone.
    - Store the result to a new variable named `df_no_iphone`.
- ✔️ `df_you` should remain unaltered after your code.

In [39]:
# YOUR CODE BEGINS
df_no_iphone = df_you[df_you['has_iphone'] == False]
# YOUR CODE ENDS

df_no_iphone

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
1,Zach,Finance,Information Systems,Glenview,137.04,,False
30,Mark,Finance,,Metamora,74.97,Taco Bell,False
34,Michelle,Accountancy,,Chicago,125.86,,False
38,Joe,Agricultural and Consumer Economics,Marketing,Chicago,125.86,,False


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [40]:
pd.testing.assert_frame_equal(df_no_iphone.reset_index(drop=True),
                              df_you[~df_you['has_iphone']].reset_index(drop=True))

---

### 🎯 Mini-exercise: People who are within a 200-mile radius of Champaign

▶️ Run the code cell below to see the **last** 2 rows of `df_you`.

In [41]:
# Restore clean df_you
df_you = df_you_backup.copy()

df_you.tail(2)

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
38,Joe,Agricultural and Consumer Economics,Marketing,Chicago,125.86,,False
39,Clint,Political Science,,Springfield,85.11,Chipotle,True


#### 👇 Tasks

- ✔️ Using `df_you`, filter rows where the person is from a city that is within 200 miles of Champaign.
    - Use the `distance_from_champaign` column.
    - Store the result to a new variable named `df_nearby`.
- ✔️ `df_you` should remain unaltered after your code.

In [42]:
# YOUR CODE BEGINS
df_nearby = df_you[df_you['distance_from_champaign'] <= 200]
# YOUR CODE ENDS

df_nearby

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
0,Citlalli,Anthropology,,Chicago,125.86,Seven Saints,True
1,Zach,Finance,Information Systems,Glenview,137.04,,False
2,Ori,Information Science,,Skokie,134.94,Culvers,True
3,Dylan,Accountancy,,Chicago,125.86,Signature Grill,True
4,Ajay,Organizational Psychology,Statistics,Fairview Heights,141.24,Chipotle,True
5,Andrew,Economics,Statistics,Skokie,134.94,,True
6,Sarah,Marketing,Theatre,Morris,86.38,,True
10,James,Accountancy,Informatics,Orland Park,106.11,,True
12,Max,Finance,Informatics,Clarendon Hills,117.12,Chick-fil-A,True
13,Nick,Information Science,,Northbrook,140.84,Potbelly,True


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [43]:
df_check = df_you_backup[df_you_backup['distance_from_champaign'] <= 200]

pd.testing.assert_frame_equal(df_nearby.sort_values(df_nearby.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Mini-exercise: People who like Chipotle

▶️ Run the code cell below to **randomly select** 3 rows from `df_you`.

In [44]:
# Restore clean df_you
df_you = df_you_backup.copy()

df_you.sample(3)

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
30,Mark,Finance,,Metamora,74.97,Taco Bell,False
26,Jake,Accountancy,Informatics,Orland Park,106.11,Chipotle,True
31,Yushan,Accountancy,Statistics,Nanjing,7159.39,,True
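
Because `sample()` picks rows at random, you will most likely see different rows each time you run the cell. If you ever need the same "random" rows on every run, `sample()` accepts a `random_state` argument. The sketch below uses a made-up `DataFrame` named `numbers`, and the seed value `42` is an arbitrary choice.

```python
import pandas as pd

# Hypothetical DataFrame with 10 rows
numbers = pd.DataFrame({'n': range(10)})

# The same random_state gives the same selection every time
first_pick = numbers.sample(3, random_state=42)
second_pick = numbers.sample(3, random_state=42)

print(first_pick.equals(second_pick))  # True
```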


#### 👇 Tasks

- ✔️ Using `df_you`, filter rows where the person's favorite restaurant is `Chipotle`.
    - Store the result to a new variable named `df_chipotle`.
- ✔️ `df_you` should remain unaltered after your code.

In [45]:
# YOUR CODE BEGINS
df_chipotle = df_you[df_you['fav_restaurant'] == 'Chipotle']
# YOUR CODE ENDS

df_chipotle

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
4,Ajay,Organizational Psychology,Statistics,Fairview Heights,141.24,Chipotle,True
16,Harsha,Accountancy,,Lisle,116.73,Chipotle,True
18,Jainil,Information Systems,,Niles,133.44,Chipotle,True
26,Jake,Accountancy,Informatics,Orland Park,106.11,Chipotle,True
36,Jim,Accountancy,,Orland Park,106.11,Chipotle,True
39,Clint,Political Science,,Springfield,85.11,Chipotle,True


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [46]:
df_check = df_you_backup[df_you_backup['fav_restaurant'] == 'Chipotle']

pd.testing.assert_frame_equal(df_chipotle.sort_values(df_chipotle.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Mini-exercise: `Accountancy` or `Finance` people

▶️ Run the code cell below to see the **last** 5 rows of `df_you`.

In [47]:
# Restore clean df_you
df_you = df_you_backup.copy()

df_you.tail(5)

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
35,Nicole,Community Health,,Lake Bluff,152.55,,True
36,Jim,Accountancy,,Orland Park,106.11,Chipotle,True
37,Tyler,Information Systems,,Homewood-Flossmoor,134.94,Lou Malnati's,True
38,Joe,Agricultural and Consumer Economics,Marketing,Chicago,125.86,,False
39,Clint,Political Science,,Springfield,85.11,Chipotle,True


#### 👇 Tasks

- ✔️ Using `df_you`, filter rows that match the following criteria:
    - The person's `major1` is `Accountancy`, **OR**
    - The person's `major1` is `Finance`.
- ✔️ Store the filtered `DataFrame` to a new variable named `df_accyfi`.
- ✔️ `df_you` should remain unaltered after your code.

In [48]:
# YOUR CODE BEGINS
df_accyfi = df_you[(df_you['major1'] == 'Accountancy') | (df_you['major1'] == 'Finance')]
# YOUR CODE ENDS

df_accyfi

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
1,Zach,Finance,Information Systems,Glenview,137.04,,False
3,Dylan,Accountancy,,Chicago,125.86,Signature Grill,True
7,Ahsaas,Finance,,Muscat,7543.35,Five Guys,True
9,Ella,Accountancy,,Hong Kong,7789.61,,True
10,James,Accountancy,Informatics,Orland Park,106.11,,True
12,Max,Finance,Informatics,Clarendon Hills,117.12,Chick-fil-A,True
15,Nicole,Accountancy,Finance,Shanghai,7154.42,,True
16,Harsha,Accountancy,,Lisle,116.73,Chipotle,True
19,Erin,Accountancy,,Highland Park,144.55,,True
21,Victoria,Finance,Accountancy,Chicago,125.86,Pokelab,True


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [49]:
df_check = df_you_backup[(df_you_backup['major1'] == 'Accountancy') | (df_you_backup['major1'] == 'Finance')]

pd.testing.assert_frame_equal(df_accyfi.sort_values(df_accyfi.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))
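
When several `==` comparisons on the same column are joined with `|`, pandas also offers `isin()` as a shorter alternative. This method is not part of the exercise above; the sketch below only shows that both approaches select the same rows on a made-up `DataFrame` named `majors`.

```python
import pandas as pd

# Hypothetical data for illustration
majors = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'major1': ['Accountancy', 'Finance', 'Marketing', 'Economics'],
})

# Approach used in the exercise: two comparisons combined with |
with_or = majors[(majors['major1'] == 'Accountancy') | (majors['major1'] == 'Finance')]

# Alternative: isin() checks each value against a list of allowed values
with_isin = majors[majors['major1'].isin(['Accountancy', 'Finance'])]

print(with_or.equals(with_isin))  # True
```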

---

### 📌 Sorting a `DataFrame`

▶️ Run the code cell below to **sort** `df_you` by `distance_from_champaign`.

In [50]:
df_you.sort_values('distance_from_champaign')

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
30,Mark,Finance,,Metamora,74.97,Taco Bell,False
39,Clint,Political Science,,Springfield,85.11,Chipotle,True
6,Sarah,Marketing,Theatre,Morris,86.38,,True
32,Maura,Consumer Economics,Finance,Glenview,104.55,Wildfire,True
24,Marissa,Finance,,Plainfield,104.74,,True
36,Jim,Accountancy,,Orland Park,106.11,Chipotle,True
26,Jake,Accountancy,Informatics,Orland Park,106.11,Chipotle,True
10,James,Accountancy,Informatics,Orland Park,106.11,,True
17,Keziah,Agricultural and Consumer Economics,Communications,Bolingbrook,109.64,,True
23,Kevin,Accountancy,Statistics,Naperville,115.46,,True


▶️ Run the code cell below to **sort** `df_you` by `major1` and then by `major2` for people with same `major1` values.

In [51]:
df_you.sort_values(['major1', 'major2'])

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
15,Nicole,Accountancy,Finance,Shanghai,7154.42,,True
10,James,Accountancy,Informatics,Orland Park,106.11,,True
26,Jake,Accountancy,Informatics,Orland Park,106.11,Chipotle,True
22,Bella,Accountancy,Information Systems,Seoul,6623.7,,True
23,Kevin,Accountancy,Statistics,Naperville,115.46,,True
31,Yushan,Accountancy,Statistics,Nanjing,7159.39,,True
25,Hwanjae,Accountancy,Technology & Management,Seoul,6623.7,,True
3,Dylan,Accountancy,,Chicago,125.86,Signature Grill,True
9,Ella,Accountancy,,Hong Kong,7789.61,,True
16,Harsha,Accountancy,,Lisle,116.73,Chipotle,True


▶️ Run the code cell below to **sort** `df_you` by `distance_from_champaign` in descending order.

In [52]:
df_you.sort_values('distance_from_champaign', ascending=False)

Unnamed: 0,name,major1,major2,city,distance_from_champaign,fav_restaurant,has_iphone
8,Jennifer,Food Science,Human Nutrition,Macau,7807.02,,True
9,Ella,Accountancy,,Hong Kong,7789.61,,True
29,Claudia,Economics,,Hong Kong,7789.61,,True
7,Ahsaas,Finance,,Muscat,7543.35,Five Guys,True
31,Yushan,Accountancy,Statistics,Nanjing,7159.39,,True
15,Nicole,Accountancy,Finance,Shanghai,7154.42,,True
11,Jaewon,Acturial Science,,Seoul,6623.7,Jimmy Johns,True
22,Bella,Accountancy,Information Systems,Seoul,6623.7,,True
25,Hwanjae,Accountancy,Technology & Management,Seoul,6623.7,,True
14,Jackie,Supply Chain Management,Marketing,Wheeling,397.48,Portillos,True
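
To recap the three sorting patterns above, here is a minimal sketch on a tiny made-up `DataFrame` named `tiny` (not the lecture dataset). It also illustrates two things the outputs above hint at: missing values are placed last by default, and `sort_values()` returns a new `DataFrame` instead of changing the original. Passing a list to `ascending` to mix directions is an extra that was not shown above.

```python
import pandas as pd
import numpy as np

# Hypothetical data; one 'minor' value is missing
tiny = pd.DataFrame({
    'major': ['Fin', 'Acc', 'Acc', 'Fin'],
    'minor': ['Stats', np.nan, 'Econ', 'Art'],
    'dist': [10.0, 30.0, 20.0, 40.0],
})

# Sort by major, then by minor within the same major
by_major_minor = tiny.sort_values(['major', 'minor'])
print(by_major_minor.index.tolist())  # [2, 1, 3, 0]

# One direction per column: major ascending, dist descending
mixed = tiny.sort_values(['major', 'dist'], ascending=[True, False])
print(mixed.index.tolist())  # [1, 2, 3, 0]

# The original DataFrame keeps its row order
print(tiny.index.tolist())  # [0, 1, 2, 3]
```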
