# 11. Advanced filtering, first subtopic: `.query()`

-----

The `.query()` method is a powerful and expressive way to filter a DataFrame. It allows you to select rows by passing a **string** that looks very much like a SQL `WHERE` clause. This can be much cleaner and more readable than standard boolean indexing, especially when you have many conditions.

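**How It Works in Memory**: Behind the scenes, `.query()` parses the string and hands it to pandas' expression evaluator. The evaluator uses the `numexpr` library by default when `numexpr` is installed, and plain Python evaluation otherwise. On large DataFrames this *can* be faster than standard boolean indexing (`df[(df['A'] > 5) & (df['B'] < 10)]`). The reason is that `numexpr` evaluates the expression in chunks instead of allocating a full-size intermediate boolean Series for every sub-condition. On small DataFrames, the cost of parsing the string usually outweighs that gain.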
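
The sketch below compares the two styles on a small, made-up DataFrame. The columns `A` and `B` are illustrative only. It also passes `engine='python'`, a keyword that `.query()` forwards to pandas' evaluator so that `numexpr` is skipped. All three versions should select the same rows; only their speed can differ.

```python
import pandas as pd

# Small illustrative DataFrame (columns A and B are made up for this sketch)
df = pd.DataFrame({'A': [3, 6, 8, 12], 'B': [1, 9, 4, 15]})

# Standard boolean indexing: each comparison creates its own boolean Series
mask_a = df['A'] > 5
mask_b = df['B'] < 10
via_mask = df[mask_a & mask_b]

# .query(): the whole condition is handed to pandas' evaluator as one string
via_query = df.query('A > 5 and B < 10')

# Same query, but forcing the pure-Python engine instead of numexpr
via_python = df.query('A > 5 and B < 10', engine='python')

print(via_query)
```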

**When to Use This**:

  * Use `.query()` when you have **multiple conditions** that are becoming hard to read with `&` and `|`.
  * Use it when you want to write **clean, "English-like"** filtering logic (e.g., `df.query('Age > 30 and City == "New York"')`).
  * It's especially powerful for referencing **variables from your code** directly in the query string (e.g., `df.query('Age > @min_age')`).

-----

### 0\. Syntax & Parameters

```python
dataframe.query(expr, inplace=False, **kwargs)
```

  * **`expr`** (expression)
      * **What it does:** This is a **string** that contains the condition you want to filter on.
      * **Default value:** (Required)
      * **When you would use it:** You *always* provide this. The string must be a valid boolean expression.
          * `df.query('Age > 18')`
          * `df.query('Age > 18 and City == "New York"')` (Use `and`, `or`, `not` keywords)
          * ``df.query('`Total Sales` > 100')`` (Wrap column names that contain spaces in backticks)
          * `df.query('Age > @min_age')` (Use `@` to reference a Python variable)
  * **`inplace`**
      * **What it does:** A boolean (True/False). If `True`, it filters the original DataFrame in place and returns `None`. If `False`, it returns a *new filtered DataFrame* and leaves the original unchanged.
      * **Default value:** `False`
      * **When you would use it:** It's **strongly recommended** to leave this as `False` and use re-assignment: `df_filtered = df.query(...)`.
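      * **What happens if you don't specify it:** The default `False` is used, which is safer.
  * **`**kwargs`**
      * **What it does:** Any extra keyword arguments are passed on to `DataFrame.eval()`, which performs the actual evaluation. For example, `engine='python'` skips `numexpr`.
      * **When you would use it:** Usually not needed for everyday filtering.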
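
The two `inplace` settings are easiest to compare side by side. Here is a minimal sketch that uses a throwaway `Age` column:

```python
import pandas as pd

df = pd.DataFrame({'Age': [15, 22, 40]})

# Recommended: the default inplace=False returns a new DataFrame
adults = df.query('Age > 18')
print(len(df), len(adults))  # df still has all 3 rows; adults has 2

# inplace=True filters df itself and returns None
result = df.query('Age > 18', inplace=True)
print(result)  # None
print(df)      # now only the rows with Age 22 and 40
```

A common bug is writing `df = df.query(..., inplace=True)`. Because `.query()` returns `None` in that mode, this line replaces `df` with `None`.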

-----

### 1\. Basic Example

Let's do a simple filter, just like we did with boolean indexing.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
})
print("--- 1. Original DataFrame ---")
print(df)

# Example 1: A simple query
# Note: The string 'Age > 28' is the expression
df_filtered = df.query('Age > 28')

print("\n--- 2. Filtered DataFrame (df.query('Age > 28')) ---")
print(df_filtered)
```

**Output:**

```
--- 1. Original DataFrame ---
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
2  Clara   22     New York
3  David   35      Chicago

--- 2. Filtered DataFrame (df.query('Age > 28')) ---
    Name  Age         City
1    Bob   30  Los Angeles
3  David   35      Chicago
```

**Explanation:**
We passed the string `'Age > 28'` to `.query()`. Pandas evaluated it in the context of the DataFrame, treating `Age` as if it were a variable holding that column. It returned a new DataFrame containing only the rows where the expression was `True` (index labels 1 and 3).
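
The filtered rows keep their original index labels. If you want a fresh 0, 1, 2, ... index afterwards, a common follow-up is `.reset_index(drop=True)`. Here is a sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
})

df_filtered = df.query('Age > 28')
print(df_filtered.index.tolist())  # [1, 3]: original labels are preserved

# drop=True discards the old labels instead of turning them into a column
renumbered = df_filtered.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1]
```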

-----

### 2\. Intermediate Example (Multiple Conditions)

This is where `.query()` starts to look much cleaner than standard boolean indexing.

**Example 2: Multiple conditions with `and`**
Find all people who are older than 20 *AND* live in New York.

```python
# With .query(), you can use the 'and' keyword
# Note: String values must be in quotes ("New York")
query_and = 'Age > 20 and City == "New York"'
df_and = df.query(query_and)

print("--- 1. Filtered with AND ---")
print(df_and)
```

**Output:**

```
--- 1. Filtered with AND ---
    Name  Age      City
0  Alice   25  New York
2  Clara   22  New York
```

**Explanation:**
This is much more readable than `df[(df['Age'] > 20) & (df['City'] == 'New York')]`. We just wrote the logic in a simple string.

**Example 3: Multiple conditions with `or`**
Find all people who are older than 30 *OR* live in New York.

```python
# You can also use the 'or' keyword
query_or = 'Age > 30 or City == "New York"'
df_or = df.query(query_or)

print("\n--- 2. Filtered with OR ---")
print(df_or)
```

**Output:**

```
--- 2. Filtered with OR ---
    Name  Age      City
0  Alice   25  New York
2  Clara   22  New York
3  David   35   Chicago
```

**Explanation:**
Again, this is very clear. It selected Alice and Clara (who live in New York) and David (who is older than 30).
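
You can also group conditions with parentheses and negate them with `not`, just as in a Python `if` statement. Here is a small sketch on the same four-person DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
})

# Parentheses control grouping: the 'or' is evaluated before the 'and'
grouped = df.query('(Age > 30 or City == "New York") and Name != "Clara"')
print(grouped)  # Alice and David

# 'not' negates a condition
not_ny_young = df.query('not (City == "New York") and Age < 33')
print(not_ny_young)  # Bob
```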

-----

### 3\. Advanced or Tricky Case

The two "killer features" of `.query()` are **backticks** for column names that aren't valid Python identifiers (for example, names with spaces) and the **`@`** prefix for accessing local Python variables.

**Example 4: Using backticks for columns with spaces**

```python
df_spaces = pd.DataFrame({
    'Full Name': ['Alice Smith', 'Bob Jones'],
    'Age Level': [25, 30]
})
print("--- 1. Original (with spaces in names) ---")
print(df_spaces)

# This would fail: df_spaces.query('Full Name == "Alice Smith"')
# You MUST use backticks ``

df_filtered = df_spaces.query('`Full Name` == "Alice Smith"')
print("\n--- 2. Filtered with backticks ---")
print(df_filtered)
```

**Output:**

```
--- 1. Original (with spaces in names) ---
     Full Name  Age Level
0  Alice Smith         25
1    Bob Jones         30

--- 2. Filtered with backticks ---
     Full Name  Age Level
0  Alice Smith         25
```

**Explanation:**
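The column name `Full Name` contains a space, so it is not a valid Python identifier. Without backticks, the parser would read `Full` and `Name` as two separate names. Wrapping the name in backticks, as in `` `Full Name` ``, tells `.query()` to treat it as a single column reference.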
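
Backticks are not limited to spaces. They also cover other characters that are not allowed in Python names, such as parentheses. Here is a brief sketch with a hypothetical `Sales (USD)` column:

```python
import pandas as pd

# Hypothetical sales data whose column name contains a space and parentheses
df_usd = pd.DataFrame({
    'Item': ['pen', 'desk', 'lamp'],
    'Sales (USD)': [40, 250, 120]
})

# Backticks make the whole label a single column reference
big_sellers = df_usd.query('`Sales (USD)` > 100')
print(big_sellers)  # desk and lamp
```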

**Example 5: Using `@` to reference a Python variable**
This is *extremely* useful.

```python
# A variable from our code
min_age = 30

print(f"\n--- 3. Our local variable: min_age = {min_age} ---")

# Use @min_age to access the variable INSIDE the query string
df_filtered_var = df.query('Age > @min_age')

print("\n--- 4. Filtered using the @variable ---")
print(df_filtered_var)
```

**Output:**

```
--- 3. Our local variable: min_age = 30 ---

--- 4. Filtered using the @variable ---
    Name  Age     City
3  David   35  Chicago
```

**Explanation:**
The `@` prefix told `.query()` to look up a Python variable named `min_age` in the surrounding scope and use its value (30) in the comparison. The threshold stays in ordinary Python code, so you can change it without editing the query string.
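
`@` looks the name up in the scope where `.query()` is called, so it also works with function arguments. This makes it easy to wrap a filter in a reusable function. `older_than` below is a hypothetical helper, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
})

def older_than(frame, threshold):
    """Hypothetical helper: rows of `frame` whose Age exceeds `threshold`."""
    # @threshold resolves to this function's local argument
    return frame.query('Age > @threshold')

print(older_than(df, 24)['Name'].tolist())  # ['Alice', 'Bob', 'David']
print(older_than(df, 30)['Name'].tolist())  # ['David']
```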

-----

### 4\. Real-World Use Case

Imagine you're running a report and the parameters (like region, min sales) are stored in variables. `.query()` is perfect for this.

**Example 6: Building a dynamic query**

```python
# Parameters for our report
target_city = 'New York'
min_age = 25
max_age = 40

# The query string references the parameters with @; pandas looks up their values when the query runs
query_string = 'City == @target_city and Age >= @min_age and Age <= @max_age'

print(f"--- 1. Running query: {query_string} ---")

# Run the query
report = df.query(query_string)

print("\n--- 2. Report Output ---")
print(report)
```

**Output:**

```
--- 1. Running query: City == @target_city and Age >= @min_age and Age <= @max_age ---

--- 2. Report Output ---
    Name  Age      City
0  Alice   25  New York
```

**Explanation:**
We built a complex query string using three different local variables. This makes the code very easy to read and maintain. If the parameters change, we don't have to rewrite the `&` and `|` logic; we just change the variables.
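
`.query()` also accepts Python-style chained comparisons, so the age range can be written as a single `@min_age <= Age <= @max_age` clause. This sketch checks the chained form against the version above, using the same parameters:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
})
target_city = 'New York'
min_age = 25
max_age = 40

# Two separate comparisons joined with 'and'
report = df.query('City == @target_city and Age >= @min_age and Age <= @max_age')

# The same range as one chained comparison
report_chained = df.query('City == @target_city and @min_age <= Age <= @max_age')

print(report_chained)  # only Alice
```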

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 7: Forgetting quotes inside the string**

```python
# Wrong code
try:
    # Without quotes, New York is read as two bare names, not as a string
    df.query('City == New York')
except Exception as e:
    print(f"\n--- Mistake 1: Missing quotes ---")
    print(e)
```

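**Error/Wrong Output:**
`SyntaxError: invalid syntax`
**Why it happens:** Without quotes, `.query()` treats `New` and `York` as names (like `Age` or `City`), not as the *string* "New York". Two bare names side by side are not valid Python syntax, so parsing fails. A single unquoted word fails too. For `City == Chicago`, pandas looks for a column or variable named `Chicago` and raises an undefined-name error.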
**Example 8: Corrected code:**

```python
# Correct code
df_correct = df.query('City == "New York"') # Use quotes
print("\n--- Corrected ---")
print(df_correct)
```

**Mistake 9: Putting quotes around an `@variable`**
This is a subtle cousin of the previous mistake, and it fails *silently*.

```python
# Wrong code
city_var = "New York"
# The quotes make "@city_var" a literal string, so City is compared to the
# text "@city_var" rather than to the value of the variable
df_wrong = df.query('City == "@city_var"')
print("\n--- Mistake 2: @variable inside quotes ---")
print(df_wrong)
```

**Error/Wrong Output:**
No error is raised; the result is simply empty:

```
--- Mistake 2: @variable inside quotes ---
Empty DataFrame
Columns: [Name, Age, City]
Index: []
```

**Why it happens:** `@` only works on a bare name. Inside quotes it is just part of the string `"@city_var"`, and no row has that city, so every row is filtered out.
**Example 10: Corrected code:**
Reference the variable without quotes. Pandas uses its value (the string `"New York"`) directly, so no extra quoting is needed.

```python
# Correct code
city_var = "New York"
df_correct = df.query('City == @city_var')  # No quotes around @city_var
print("\n--- Corrected ---")
print(df_correct)
```

**Mistake 11: Forgetting backticks around column names with spaces**

```python
df_spaces = pd.DataFrame({'Sales Goal': [100]})

# Wrong code
try:
    df_spaces.query('Sales Goal > 50')
except Exception as e:
    print(f"\n--- Mistake 3: Forgetting backticks ---")
    print(e)
```

**Error/Wrong Output:**
`SyntaxError: invalid syntax`
**Why it happens:** The parser reads `Sales` and `Goal` as two separate names, and two names side by side are not valid Python syntax.
**Example 12: Corrected code:**
``df_spaces.query('`Sales Goal` > 50')``

----





# Advanced filtering, second subtopic: `.isin()`

-----

The `.isin()` method is a specialized and highly efficient filter. It checks for **membership**, letting you select rows where a column's value *is in* a list of allowed values.

Think of it as checking IDs at a party. You have a "guest list" (your list of values), and `.isin()` is the bouncer. It goes down your DataFrame, row by row, and asks, "Is your 'City' on this list?" If `True`, the row gets in. If `False`, it's filtered out.

**How It Works in Memory**: `.isin()` is highly optimized. For most inputs, pandas builds a **hash table** from the values you pass, much like the keys of a dictionary, so each membership check is an O(1) operation on average. It then scans your column (`df['City']`) once, in compiled code, and does a fast lookup for each value. Chaining `|` (OR) conditions instead needs one full comparison pass over the column per allowed value. As your list of allowed values grows, `.isin()` pulls further ahead.
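
The idea can be sketched in plain Python: build a set once, then test each value against it. The list comprehension below is only a conceptual model, because pandas does the real work in compiled code. The sketch checks the model against both `.isin()` and the chained-`|` version:

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', 'New York', 'Chicago', 'Boston'])
allowed = ['New York', 'Chicago', 'San Francisco']

# 1. pandas: one membership test over the column
mask_isin = cities.isin(allowed)

# 2. Chained ==: one full comparison pass per allowed value, then OR-ed together
mask_chain = (
    (cities == 'New York') | (cities == 'Chicago') | (cities == 'San Francisco')
)

# 3. Conceptual model: build a set once, then do one average O(1) lookup per value
allowed_set = set(allowed)
mask_manual = pd.Series([city in allowed_set for city in cities], index=cities.index)

print(mask_isin.tolist())  # [True, False, True, True, False]
```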

**When to Use This**:

  * Reach for this whenever you need to filter a column against a **list of multiple values**.
  * **Good:** "Find all sales from 'New York', 'Chicago', or 'San Francisco'."
  * **Bad:** `(df['City'] == 'New York') | (df['City'] == 'Chicago') | (df['City'] == 'San Francisco')`. This is slow, hard to read, and error-prone.
  * **Good:** `cities = ['New York', 'Chicago', 'San Francisco']` and then `df[df['City'].isin(cities)]`.

-----

### 0\. Syntax & Parameters

This is a method of a **Series**, so you call it *on* a column.

```python
series.isin(values)
```

  * **`values`**
      * **What it does:** The collection of "allowed" values to check for. This can be a Python **list**, **set**, **Series**, or `dict` (in which case only the *keys* are used).
      * **Default value:** (Required)
      * **When you would use it:** You *always* provide this. This is your "guest list."
      * **Example:** `df['City'].isin(['New York', 'Chicago'])`
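
Here is a small sketch of the different `values` types listed above. All four calls should produce the same mask. With the `dict`, only its keys (the city names) are matched, and the numbers are ignored:

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', 'Chicago', 'Boston'])

from_list = cities.isin(['New York', 'Chicago'])
from_set = cities.isin({'New York', 'Chicago'})
from_series = cities.isin(pd.Series(['Chicago', 'New York']))
# Only the keys of a dict are used; the values (1, 2) play no part
from_dict = cities.isin({'New York': 1, 'Chicago': 2})

print(from_list.tolist())  # [True, False, True, False]
```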

-----

### 1\. Basic Example

Let's filter our DataFrame to find people in specific cities.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David', 'Eva'],
    'Age': [25, 30, 22, 35, 28],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Boston']
})
print("--- 1. Original DataFrame ---")
print(df)

# Example 1: Define the list of values
target_cities = ['New York', 'Chicago']

# Example 2: Create the boolean mask
mask_isin = df['City'].isin(target_cities)
print("\n--- 2. The Boolean Mask (df['City'].isin(target_cities)) ---")
print(mask_isin)

# Example 3: Apply the mask
df_filtered = df[mask_isin]
print("\n--- 3. Filtered DataFrame ---")
print(df_filtered)
```

**Output:**

```
--- 1. Original DataFrame ---
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
2  Clara   22     New York
3  David   35      Chicago
4    Eva   28       Boston

--- 2. The Boolean Mask (df['City'].isin(target_cities)) ---
0     True
1    False
2     True
3     True
4    False
Name: City, dtype: bool

--- 3. Filtered DataFrame ---
    Name  Age      City
0  Alice   25  New York
2  Clara   22  New York
3  David   35   Chicago
```

**Explanation:**
The `.isin(target_cities)` method created a boolean mask. It was `True` for 'New York' (rows 0, 2) and 'Chicago' (row 3), and `False` for 'Los Angeles' and 'Boston'. The final DataFrame includes only the `True` rows.

-----

### 2\. Intermediate Example

You can combine `.isin()` with other conditions, and you can also use it to find what's *NOT* in the list.

**Example 4: `~` (NOT) with `.isin()`**
Find all people who are *not* in 'New York' or 'Chicago'.

```python
target_cities = ['New York', 'Chicago']

# Just add a ~ in front of the mask
mask_not_in = ~df['City'].isin(target_cities)
print("\n--- 1. The NOT IN Mask ---")
print(mask_not_in)

print("\n--- 2. Filtered with NOT IN ---")
print(df[mask_not_in])
```

**Output:**

```
--- 1. The NOT IN Mask ---
0    False
1     True
2    False
3    False
4     True
Name: City, dtype: bool

--- 2. Filtered with NOT IN ---
  Name  Age         City
1  Bob   30  Los Angeles
4  Eva   28       Boston
```
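
You can also write the same NOT IN filter inside a `.query()` string, where `not in` combined with an `@` list performs the negated membership test. Here is a sketch using the five-person DataFrame from this section:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David', 'Eva'],
    'Age': [25, 30, 22, 35, 28],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Boston']
})
target_cities = ['New York', 'Chicago']

via_mask = df[~df['City'].isin(target_cities)]
via_query = df.query('City not in @target_cities')

print(via_query)  # Bob and Eva
```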

**Example 5: Combining `.isin()` with `&`**
Find all people who live in a target city *AND* are older than 28.

```python
target_cities = ['New York', 'Chicago']

# Create the two conditions and wrap them in ()
mask_combined = (df['City'].isin(target_cities)) & (df['Age'] > 28)
print("\n--- 3. Combined Mask ---")
print(mask_combined)

print("\n--- 4. Filtered with Combined Mask ---")
print(df[mask_combined])
```

**Output:**

```
--- 3. Combined Mask ---
0    False
1    False
2    False
3     True
4    False
dtype: bool

--- 4. Filtered with Combined Mask ---
    Name  Age     City
3  David   35  Chicago
```

**Explanation:**

  * Alice (NY, 25): `True & False` -> `False`
  * Clara (NY, 22): `True & False` -> `False`
  * David (Chicago, 35): `True & True` -> `True`
  * Bob (Los Angeles) and Eva (Boston) fail the first condition, so they are `False` regardless of age.
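
For comparison, the same combined filter fits in one `.query()` string, where `in @target_cities` plays the role of `.isin()`. A quick sketch confirms that both versions agree:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David', 'Eva'],
    'Age': [25, 30, 22, 35, 28],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Boston']
})
target_cities = ['New York', 'Chicago']

via_mask = df[(df['City'].isin(target_cities)) & (df['Age'] > 28)]
via_query = df.query('City in @target_cities and Age > 28')

print(via_query)  # only David
```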

-----

### 3\. Advanced or Tricky Case

You can also pass a **Series** or a **dictionary's keys** as the `values` argument.

**Example 6: Using another Series with `.isin()`**
This is a powerful way to filter one DataFrame based on the values in another.

```python
# Our main DataFrame (a separate name, so the earlier df is not overwritten)
df_codes = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'City': ['NY', 'LA', 'NY', 'CHI']
})

# A separate Series (e.g., from another file)
allowed_cities = pd.Series(['NY', 'SF', 'CHI'], name='Allowed')
print("--- 1. Allowed Cities Series ---")
print(allowed_cities)

# Filter df_codes based on the values in allowed_cities
df_filtered = df_codes[df_codes['City'].isin(allowed_cities)]
print("\n--- 2. Filtered by Series ---")
print(df_filtered)
```

**Output:**

```
--- 1. Allowed Cities Series ---
0     NY
1     SF
2    CHI
Name: Allowed, dtype: object

--- 2. Filtered by Series ---
    Name City
0  Alice   NY
2  Clara   NY
3  David  CHI
```

**Explanation:**
`.isin()` looked at the *values* of the `allowed_cities` Series (`['NY', 'SF', 'CHI']`) and used them as the "guest list" to filter the main DataFrame. Alice, Clara (both NY) and David (CHI) are on the list. 'LA' (Bob) is not, so he was filtered out. 'SF' simply matched no rows.
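
When you pass a Series, `.isin()` matches on its *values* only. The Series' index is not aligned with the DataFrame's index. The sketch below uses a hypothetical lookup table, `regions`, whose index labels have nothing to do with the main DataFrame:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Clara', 'David'],
    'City': ['NY', 'LA', 'NY', 'CHI']
})

# Hypothetical lookup table, e.g. loaded from another file;
# its index labels (100, 200, 300) do not line up with people's index
regions = pd.DataFrame(
    {'code': ['CHI', 'NY', 'SF'], 'region': ['Midwest', 'East', 'West']},
    index=[100, 200, 300]
)

# Only the values of regions['code'] matter; no index alignment happens
in_lookup = people[people['City'].isin(regions['code'])]
print(in_lookup)  # Alice, Clara and David
```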

-----

### 4\. Real-World Use Case

**Example 7: Filtering a large DataFrame based on a list of IDs**
This is a *very* common operation. You have a main table (e.g., `all_transactions`) and a list of specific `customer_ids` you want to investigate.

```python
# All transactions
all_transactions = pd.DataFrame({
    'tx_id': [1, 2, 3, 4, 5],
    'customer_id': ['c100', 'c101', 'c102', 'c101', 'c103'],
    'amount': [50, 25, 10, 30, 100]
})
print("--- 1. All Transactions ---")
print(all_transactions)

# List of customers to investigate
target_customers = ['c101', 'c103']

# Get all transactions for just those customers
target_transactions = all_transactions[
    all_transactions['customer_id'].isin(target_customers)
]

print("\n--- 2. Target Transactions ---")
print(target_transactions)
```

**Output:**

```
--- 1. All Transactions ---
   tx_id customer_id  amount
0      1        c100      50
1      2        c101      25
2      3        c102      10
3      4        c101      30
4      5        c103     100

--- 2. Target Transactions ---
   tx_id customer_id  amount
1      2        c101      25
3      4        c101      30
4      5        c103     100
```

**Explanation:**
This is the power of `.isin()`. We instantly pulled all records (transactions 2, 4, and 5) that matched our list of target customers. The alternative (`(df['customer_id'] == 'c101') | (df['customer_id'] == 'c103')`) is much clunkier.

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 8: Passing a single string**

```python
# Wrong code
try:
    # You passed a string, not a list
    df[df['City'].isin('New York')]
except TypeError as e:
    print(f"\n--- Mistake 1: Passing a string ---")
    print(e)
```

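**Error/Wrong Output:**
A `TypeError` stating that only list-like objects are allowed to be passed to `isin()`. The exact wording varies between pandas versions.
**Why it happens:** `.isin()` expects a *list-like* collection of values to match against, such as a list or set. A Python string is technically iterable, but pandas deliberately rejects it, because iterating over it would yield single characters rather than the whole city name.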
**Example 9: Corrected code:**
You *must* wrap it in a list, even if there's only one value.

```python
# Correct code
df_correct = df[df['City'].isin(['New York'])]
print("\n--- Corrected ---")
print(df_correct)
```

**Mistake 10: Using `.isin()` to check a single value**
This isn't an error, but it's bad practice.

```python
# Bad, but works
df_bad = df[df['City'].isin(['New York'])]

# Good: more direct, and typically a little faster
df_good = df[df['City'] == 'New York']
```

**Why it happens:** `.isin(['New York'])` works, but it routes a single comparison through the general membership machinery, typically building a hash table for just one item. The simple `==` (equals) operator states the intent directly and is optimized for exactly this case. Use `==` for one value; use `.isin()` for *two or more* values.



----

### 6\. Key Terms (Explained Simply)

  * **Boolean Mask:** A Series of `True`/`False` values used to select rows. `.isin()` creates one, and `.query()` uses one internally.
  * **`.query()`:** A DataFrame method that filters rows using a **string expression** (e.g., `'Age > 30'`).
  * **Expression (expr):** The string passed to `.query()` that contains the filtering logic.
  * **`@variable`:** The syntax used inside a `.query()` string to reference a local Python variable (e.g., `df.query('Age > @min_age')`).
  * **Backticks**: Used in a `.query()` string to wrap column names that have spaces or special characters (e.g., ``df.query('`Sales (USD)` > 100')``).
  * **`.isin()`:** A **Series** method (called on a column) that filters based on **membership in a list**.
  * **Membership:** Checking if a value is "in" a collection of other values (e.g., `'New York'` is in `['New York', 'Boston']`).
  * **Hash Set:** A fast lookup structure (average-case O(1) per check) that `.isin()` typically uses internally to check for membership.

-----

### 7\. Best Practices

  * **For `.query()`:**
      * Use it when you have **multiple conditions**. `df.query('A > B and C < D')` is often cleaner than `df[(df['A'] > df['B']) & (df['C'] < df['D'])]`.
      * Use the `@` prefix to **reference external variables** (`@my_var`). It's clean and safe.
      * **Always** use backticks \`\` around column names with spaces or special characters.
      * Prefer `and`, `or`, `not` inside the query string for readability. `&`, `|`, `~` are also accepted there.
  * **For `.isin()`:**
      * Use it *any time* you need to check if a column's value is one of **two or more** items in a list.
      * For a *single* item, `df[df['City'] == 'New York']` is faster and clearer than `df[df['City'].isin(['New York'])]`.
      * To filter for what's *not* in the list, combine it with `~`: `df[~df['City'].isin(my_list)]`.
      * Store your "allowed values" in a **list** or **set** variable first, then pass the variable to `.isin()`. It's cleaner: `allowed_cities = [...]`, `df[df['City'].isin(allowed_cities)]`.

-----

### 8\. Mini Summary

  * **Boolean Indexing (`[]`, `&`, `|`)**: The fundamental, most common way to filter. Best for simple filters.
  * **`.query()`**: A "shortcut" that uses a **string** to filter. Excellent for readability with multiple `and`/`or` conditions and for using local variables (`@my_var`).
  * **`.isin()`**: A **Series** method used *inside* a filter. It's the best way to check for membership in a list (e.g., "City is in `['NY', 'LA', 'SF']`").

-----

### 10\. Practice Tasks

**Data for Tasks:**

```python
df_practice = pd.DataFrame({
    'Employee ID': ['e101', 'e102', 'e103', 'e104', 'e105'],
    'Name': ['Alice', 'Bob', 'Clara', 'David', 'Eva'],
    'Department': ['Sales', 'IT', 'IT', 'Sales', 'HR'],
    'Salary': [70000, 85000, 82000, 76000, 60000]
})
```

**Task 11 (Easy - `.isin()`):**
Select a new DataFrame `df_sales_hr` containing only the employees in the 'Sales' or 'HR' departments.

**Task 12 (Easy - `.query()`):**
Select a new DataFrame `df_high_earners` containing only employees with a 'Salary' greater than 75000.

**Task 13 (Medium - `.query()`):**
Select a new DataFrame `df_it_high` containing only employees who are in the 'IT' department *and* have a 'Salary' over 80000.

**Task 14 (Hard - `.query()` + `.isin()`):**
You have a list of target departments: `target_depts = ['Sales', 'IT']`.
You also have a minimum salary: `min_sal = 75000`.
Select a new DataFrame `df_report` containing employees who are in one of the `target_depts` *and* earn more than `min_sal`. **Write a single `.query()` string that handles both the department membership and the salary. Then check your answer with an `.isin()` mask combined with `&`.**

*Hint for Task 14:* Inside a query string, `in` performs the same membership test as `.isin()`: `df.query('Department in @target_depts and Salary > @min_sal')`. The boolean-mask equivalent is `df[df['Department'].isin(target_depts) & (df['Salary'] > min_sal)]`.

-----

### 11\. Recommended Next Topic

You have now mastered the most important ways to select, filter, and structure a DataFrame. The next logical step from the roadmap is to ensure the *data inside* your columns is correct.

**Recommended:** **Data Types & Conversions (`.astype()`, `pd.to_numeric()`, `pd.to_datetime()`)** [cite: 35-37]

-----

### 12\. Quick Reference Card

| Method | Syntax | Example |
| :--- | :--- | :--- |
| **Boolean Indexing** | `df[(cond1) & (cond2)]` | `df[(df['Age'] > 18) & (df['City'] == 'NY')]` |
| **`.query()` (Simple)**| `df.query('col > val')` | `df.query('Age > 18')` |
| **`.query()` (Complex)**| `df.query('cond1 and cond2')`| `df.query('Age > 18 and City == "NY"')` |
| **`.query()` (Variables)**| `df.query('col > @var')` | `df.query('Age > @min_age')` |
| **`.query()` (Spaces)** | ``df.query('`col name` > val')`` | ``df.query('`Sales Goal` > 100')`` |
| **`.isin()` (Filter)** | `df[df['col'].isin(list)]` | `df[df['City'].isin(['NY', 'LA'])]` |
| **`not .isin()` (Filter)**| `df[~df['col'].isin(list)]` | `df[~df['City'].isin(['NY', 'LA'])]` |
| **`.query()` + `.isin()`**| `df.query('col in @list')` | `df.query('City in @allowed_cities')` |

-----

### 13\. Common Interview Questions

1.  **When would you use `.query()` instead of standard boolean indexing?**
      * When I have multiple (`and`/`or`) conditions, as `.query()` is often more readable (e.g., `'A > B and C < D'`).
      * When I need to use local Python variables in my filter, as the `@variable` syntax is very clean.
2.  **What is the main advantage of `.isin()`?**
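      * Performance and readability. Chaining k `==` comparisons with `|` makes k full passes over the column, while `.isin()` typically builds one hash table and scans the column once. With a long list, such as `df[df['ID'].isin(list_of_100_ids)]`, it is far faster and far easier to read than 100 `|` (OR) conditions.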
3.  **My `.query()` is failing, but my column name `Sales (USD)` seems correct. Why?**
      * The column name contains spaces and parentheses, so it is not a valid Python identifier. You *must* wrap it in backticks inside the query string: ``df.query('`Sales (USD)` > 1000')``.
4.  **How do you select rows where 'City' is *not* in a list `['A', 'B']`?**
      * You use the `~` (NOT) operator in front of the `.isin()` mask: `bad_cities = ['A', 'B']`, `df[~df['City'].isin(bad_cities)]`.

-----

### 14\. Performance Considerations

  * **`.isin()`:**
      * **Time Complexity:** Very fast. **O(n + k)**, where 'n' is the length of your DataFrame's column and 'k' is the length of your "allowed" list. It builds a hash set (O(k)) and then does an O(1) lookup for each of the 'n' rows.
      * **This is *vastly* superior** to chaining `|` (OR) statements, which would be **O(n \* k)**.
  * **`.query()`:**
      * **Time Complexity:** Generally **O(n)**.
      * **Performance:** On small DataFrames, the overhead of parsing the string can make `.query()` *slower* than standard boolean indexing. On large DataFrames with complex expressions (many `and`/`or` conditions), it can be *faster*. There, the `numexpr` engine evaluates the whole expression without allocating a full-size intermediate boolean Series for every sub-condition. Pandas' documentation notes that the `numexpr` speedup only shows up on large frames.
  * **Memory Usage:** `.isin()` itself returns a boolean mask. Filtering with a mask (`df[mask]`) and filtering with `.query()` both return a new DataFrame holding a **copy** of the selected rows, not a view.
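
To check the O(n + k) versus O(n \* k) claim on your own machine, a rough `timeit` sketch like the one below can help. The data is random, and absolute timings will vary by hardware and pandas version. What the sketch does guarantee is that both approaches select the same rows:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# 100,000 random IDs and a list of 100 target IDs (illustrative data)
ids = pd.Series(rng.integers(0, 10_000, size=100_000))
targets = list(range(0, 10_000, 100))

def with_isin():
    # Typically one hash-table build plus one pass over the column
    return ids.isin(targets)

def with_chained_or():
    # One full comparison pass over the column per target ID
    mask = pd.Series(False, index=ids.index)
    for t in targets:
        mask |= ids == t
    return mask

t_isin = timeit.timeit(with_isin, number=5)
t_chain = timeit.timeit(with_chained_or, number=5)
print(f".isin(): {t_isin:.4f}s   chained ==/|: {t_chain:.4f}s")
```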

-----

### 15\. When NOT to Use This

  * **When NOT to use `.query()`:**
      * Don't use it to select columns (e.g., `df['Age']`). `.query()` filters *rows*; plain `[]` indexing is the tool for picking columns.
      * Don't use it to set values based on a condition. `.query()` returns a new DataFrame, so assigning to its result does not change the original. **Always** use `df.loc[mask, 'col'] = value` for setting; `loc` is the unambiguous tool for the job.
  * **When NOT to use `.isin()`:**
      * **Don't use it for a single value.** `df[df['City'] == 'New York']` is clearer and more efficient than `df[df['City'].isin(['New York'])]`.
      * **Don't pass it a single string.** `df['City'].isin('New York')` raises a `TypeError`. You must pass a list: `df['City'].isin(['New York'])`.
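
Here is a minimal sketch of the read-versus-write split described above: `.query()` reads a filtered copy, and `.loc` with a boolean mask changes the original. The `Status` column and its labels are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'], 'Age': [25, 30, 35]})

# Reading: .query() returns a new, filtered DataFrame
seniors = df.query('Age > 28')

# Writing: create the column, then use .loc[mask, column] to update df itself
df['Status'] = 'regular'
df.loc[df['Age'] > 28, 'Status'] = 'senior'
print(df)
```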