# 14. Detection methods, first subtopic: `.isna()`, `.notna()`, and `.isnull().sum()`.

-----

These functions are your "missing data detectors." Their job is to scan your Series or DataFrame and return a `True` or `False` for every single cell, telling you if it's empty or not.

  * **`.isna()` / `.isnull()`:** These are **identical**. They ask the question, "Is this cell missing?" They return `True` for `np.nan`, `None`, and `pd.NaT` (Not a Time).
  * **`.notna()`:** This is the exact **opposite**. It asks, "Does this cell have a value?" It returns `True` for any cell that is *not* missing.
  * **`.sum()`:** This is the method you chain onto the mask. When you call `.sum()` on the `True`/`False` mask that `.isna()` gives you, it counts all the `True` values (since `True` acts like `1` and `False` acts like `0`). `df.isna().sum()` is the standard, vectorized way to get a *count* of missing data in every column.

**How It Works in Memory**: These methods are vectorized operations. `df.isna()` creates a *new* DataFrame of the exact same shape, but filled with boolean (`True`/`False`) values. This "boolean mask" is compact: each boolean takes 1 byte, compared with 8 bytes for an `int64` or `float64` cell. When you call `.sum()` on this mask, pandas sums the `True` values (as `1`s) down each column in compiled code, resulting in a small Series of counts.
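
As a rough illustration of the memory point above, the sketch below builds a hypothetical one-column frame (`demo`, with made-up values) and compares its footprint with the mask's. Exact byte counts depend on the dtypes involved, so treat the numbers as indicative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one float64 column with two gaps
demo = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan]})

mask = demo.isna()      # new DataFrame, same shape, dtype bool
counts = mask.sum()     # True counts as 1, False as 0

print(mask.shape == demo.shape)         # the mask has the same shape as the data
print(mask.dtypes["x"])                 # bool
print(demo.memory_usage(index=False))   # float64 column: 8 bytes per cell
print(mask.memory_usage(index=False))   # bool column: 1 byte per cell
print(counts)                           # missing values per column
```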

**When to Use This**:

  * You will use `df.isna().sum()` **every time** you load a new dataset. It is step \#1 (along with `df.info()`) for assessing data quality.
  * Use `s.isna()` (or `s.notna()`) inside a filter to *find* the actual rows that are missing or complete (e.g., `df[df['email'].notna()]`).
  * Use `.isnull()` if you see it in older code; it's just an alias for `.isna()`. The modern standard is `.isna()` and `.notna()`.

-----

### 0\. Syntax & Parameters

These methods take no arguments and are usually chained with other methods.

```python
# On a Series (or DataFrame)
series_or_df.isna()

# Alias for .isna()
series_or_df.isnull()

# The opposite of .isna()
series_or_df.notna()
```

  * **Parameters:** These methods take no parameters.
  * **Returns:** A Series or DataFrame of the same shape, filled with `True`/`False` values.
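
A quick way to convince yourself of the relationships above is to compare the masks directly. This is a minimal sketch on a made-up Series `s_demo`; it only uses `Series.equals` and the `~` (element-wise NOT) operator.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([10, np.nan, 30, None])

# .isnull() is an alias of .isna(), so the two masks match exactly
same_as_alias = s_demo.isna().equals(s_demo.isnull())

# .notna() is the element-wise negation of .isna()
is_inverse = s_demo.notna().equals(~s_demo.isna())

print(same_as_alias)
print(is_inverse)
```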

-----

#### Chaining with `.sum()`

```python
# Get counts for each COLUMN (default axis=0)
dataframe.isna().sum()

# Get counts for each ROW
dataframe.isna().sum(axis=1)
```

  * **`axis=0` (or `'index'`)**: The default. Sums "down" the rows, giving a count *per column*.
  * **`axis=1` (or `'columns'`)**: Sums "across" the columns, giving a count *per row*.
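
To see both axes side by side, here is a small sketch on a hypothetical frame `df_axis`. It also shows one common idiom for a single grand total, chaining `.sum()` twice; that idiom is an addition here, not something the section above relies on.

```python
import numpy as np
import pandas as pd

df_axis = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

per_column = df_axis.isna().sum()          # one count per column label
per_row = df_axis.isna().sum(axis=1)       # one count per row label
total = int(df_axis.isna().sum().sum())    # one number for the whole frame

print(per_column.to_dict())
print(per_row.to_dict())
print(total)
```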

-----

### 1\. Basic Example (on a Series)

Let's see the three functions in action.

```python
import pandas as pd
import numpy as np

s = pd.Series([10, 20, np.nan, 40, None])
print("--- 1. Original Series ---")
print(s)

# Example 1: .isna()
# Returns True for np.nan and None
print("\n--- 2. Example 1: s.isna() ---")
print(s.isna())

# Example 2: .isnull() (Identical)
print("\n--- 3. Example 2: s.isnull() ---")
print(s.isnull())

# Example 3: .notna() (The opposite)
print("\n--- 4. Example 3: s.notna() ---")
print(s.notna())

# Example 4: Counting the missing values
print("\n--- 5. Example 4: s.isna().sum() ---")
print(s.isna().sum())
```

**Output:**

```
--- 1. Original Series ---
0    10.0
1    20.0
2     NaN
3    40.0
4     NaN
dtype: float64

--- 2. Example 1: s.isna() ---
0    False
1    False
2     True
3    False
4     True
dtype: bool

--- 3. Example 2: s.isnull() ---
0    False
1    False
2     True
3    False
4     True
dtype: bool

--- 4. Example 3: s.notna() ---
0     True
1     True
2    False
3     True
4    False
dtype: bool

--- 5. Example 4: s.isna().sum() ---
2
```

**Explanation:**
`.isna()` and `.isnull()` correctly identified the two missing values (`np.nan` and `None`) as `True`. `.notna()` did the opposite. `.isna().sum()` added up the `True` values (`1 + 1`) and gave us a total count of `2`.

-----

### 2\. Intermediate Example (on a DataFrame)

This is the most common use case: finding the total missing data *per column*.

```python
df = pd.DataFrame({
    'ID': [100, 101, 102, 103],
    'Name': ['Alice', 'Bob', 'Clara', np.nan],
    'Age': [25, 30, np.nan, 42],
    'Email': [None, 'bob@x.com', 'clara@x.com', np.nan]
})
print("--- 6. Original DataFrame ---")
print(df)

# Example 5: .isna() on a DataFrame
print("\n--- 7. Example 5: df.isna() (Boolean Mask) ---")
print(df.isna())

# Example 6: .isna().sum() (The *key* command)
# This is the summary you almost always want.
print("\n--- 8. Example 6: df.isna().sum() (Count per Column) ---")
print(df.isna().sum())

# Example 7: .notna().sum()
print("\n--- 9. Example 7: df.notna().sum() (Count per Column) ---")
print(df.notna().sum())
```

**Output:**

```
--- 6. Original DataFrame ---
    ID   Name   Age        Email
0  100  Alice  25.0         None
1  101    Bob  30.0    bob@x.com
2  102  Clara   NaN  clara@x.com
3  103    NaN  42.0          NaN

--- 7. Example 5: df.isna() (Boolean Mask) ---
      ID   Name    Age  Email
0  False  False  False   True
1  False  False  False  False
2  False  False   True  False
3  False   True  False   True

--- 8. Example 6: df.isna().sum() (Count per Column) ---
ID       0
Name     1
Age      1
Email    2
dtype: int64

--- 9. Example 7: df.notna().sum() (Count per Column) ---
ID       4
Name     3
Age      3
Email    2
dtype: int64
```

**Explanation:**
The output of `df.isna().sum()` is a new Series. The *index* of this new Series is the *columns* of `df`, and the *values* are the *counts* of `NaN`s in those columns. This instantly tells us: 'Name' is missing 1 value, 'Age' is missing 1, and 'Email' is missing 2.
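
If you want proportions rather than raw counts, one common idiom (an addition to the example above, so treat it as a sketch) is to call `.mean()` on the mask instead of `.sum()`. Because `True` counts as `1`, the mean of each column is the fraction of missing cells. The frame `df_frac` is a stand-in copy of the example data.

```python
import numpy as np
import pandas as pd

df_frac = pd.DataFrame({
    "ID": [100, 101, 102, 103],
    "Name": ["Alice", "Bob", "Clara", np.nan],
    "Age": [25, 30, np.nan, 42],
    "Email": [None, "bob@x.com", "clara@x.com", np.nan],
})

# Mean of a boolean column = share of True values = share of missing cells
missing_fraction = df_frac.isna().mean()
print(missing_fraction)
```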

-----

### 3\. Advanced or Tricky Case (Dates, Filtering, and Axis)

**Example 8: `NaT` (Not a Time) also counts as missing**
`.isna()` recognizes every missing-value marker, including `pd.NaT` in datetime columns.

```python
s_time = pd.Series([
    pd.to_datetime('2025-01-01'), 
    pd.NaT, 
    pd.to_datetime('2025-01-03')
])
print("--- 10. Series with NaT ---")
print(s_time)

# Example 9: .isna() detects NaT
print("\n--- 11. Example 9: s_time.isna() ---")
print(s_time.isna())
```

**Output:**

```
--- 10. Series with NaT ---
0   2025-01-01
1          NaT
2   2025-01-03
dtype: datetime64[ns]

--- 11. Example 9: s_time.isna() ---
0    False
1     True
2    False
dtype: bool
```

**Example 10: Summing missing values by *row***
Use `axis=1` to find out *how many* columns are missing for each row.

```python
# Use the same df from Example 6
print("\n--- 12. Example 10: df.isna().sum(axis=1) (Count per Row) ---")
print(df.isna().sum(axis=1))
```

**Output:**

```
--- 12. Example 10: df.isna().sum(axis=1) (Count per Row) ---
0    1
1    0
2    1
3    2
dtype: int64
```

**Explanation:** This tells us: Row 0 is missing 1 value (`Email`). Row 1 is complete. Row 2 is missing 1 value (`Age`). Row 3 is missing 2 values (`Name` and `Email`).
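
The per-row counts can also drive a filter. The sketch below flags rows that are missing at least a chosen number of fields; the `threshold` value and the name `df_rows` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

df_rows = pd.DataFrame({
    "ID": [100, 101, 102, 103],
    "Name": ["Alice", "Bob", "Clara", np.nan],
    "Age": [25, 30, np.nan, 42],
    "Email": [None, "bob@x.com", "clara@x.com", np.nan],
})

missing_per_row = df_rows.isna().sum(axis=1)

# Hypothetical rule: a row is "sparse" if it is missing two or more fields
threshold = 2
sparse_rows = df_rows[missing_per_row >= threshold]
print(sparse_rows)
```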

**Example 11: Filtering for "good" rows with `.notna()`**
This is a very common use.

```python
# Get all rows that have a NON-missing email
print("\n--- 13. Example 11: Filter with .notna() ---")
good_rows = df[df['Email'].notna()]
print(good_rows)
```

**Output:**

```
--- 13. Example 11: Filter with .notna() ---
    ID   Name   Age        Email
1  101    Bob  30.0    bob@x.com
2  102  Clara   NaN  clara@x.com
```

**Example 12: Filtering for "bad" rows with `.isna()`**

```python
# Get all rows that are missing an email
print("\n--- 14. Example 12: Filter with .isna() ---")
bad_rows = df[df['Email'].isna()]
print(bad_rows)
```

**Output:**

```
--- 14. Example 12: Filter with .isna() ---
    ID   Name   Age Email
0  100  Alice  25.0  None
3  103    NaN  42.0   NaN
```

-----

### 4\. Real-World Use Case

**Example 13: The `df.info()` vs. `df.isna().sum()` Workflow**
This is the standard "data check" workflow.

```python
# 1. df.info() gives you "non-null" counts
print("--- 15. Example 13: The df.info() view ---")
df.info()

# 2. df.isna().sum() gives you "null" counts
print("\n--- 16. The df.isna().sum() view (more direct) ---")
print(df.isna().sum())
```

**Output:**

```
--- 15. Example 13: The df.info() view ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   ID      4 non-null      int64  
 1   Name    3 non-null      object 
 2   Age     3 non-null      float64
 3   Email   2 non-null      object 
dtypes: float64(1), int64(1), object(2)
memory usage: 256.0+ bytes

--- 16. The df.isna().sum() view (more direct) ---
ID       0
Name     1
Age      1
Email    2
dtype: int64
```

**Explanation:** `info()` shows 4 total entries, and 'Email' has 2 "non-nulls" (so you have to do the math: 4 - 2 = 2 missing). `isna().sum()` *directly* tells you: 'Email' has 2 missing. It's often clearer.
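
The "do the math" step can be checked in code. This sketch, using a stand-in copy of the example frame called `df_check`, derives the missing counts from `df.count()` (which counts non-missing values) and confirms they match `df.isna().sum()`.

```python
import numpy as np
import pandas as pd

df_check = pd.DataFrame({
    "ID": [100, 101, 102, 103],
    "Name": ["Alice", "Bob", "Clara", np.nan],
    "Age": [25, 30, np.nan, 42],
    "Email": [None, "bob@x.com", "clara@x.com", np.nan],
})

non_null = df_check.count()          # non-missing values per column
missing = df_check.isna().sum()      # missing values per column
derived = len(df_check) - non_null   # total rows minus non-missing

print(derived.equals(missing))
```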

**Example 14: Finding columns *with any* missing data**

```python
print("\n--- 17. Example 14: Programmatic list of bad columns ---")
missing_counts = df.isna().sum()
cols_with_missing = missing_counts[missing_counts > 0]
print(cols_with_missing)
```

**Output:**

```
--- 17. Example 14: Programmatic list of bad columns ---
Name     1
Age      1
Email    2
dtype: int64
```

**Explanation:** This is a powerful programming pattern. We get the counts, then use a boolean filter *on the counts* to find only the columns that have 1 or more missing values.
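
When you only need the column *names* (for example, to loop over them later), you can take the index of the filtered counts. The sketch below shows that, plus an equivalent route through `.any()`; the frame `df_report` is a stand-in copy of the example data.

```python
import numpy as np
import pandas as pd

df_report = pd.DataFrame({
    "ID": [100, 101, 102, 103],
    "Name": ["Alice", "Bob", "Clara", np.nan],
    "Age": [25, 30, np.nan, 42],
    "Email": [None, "bob@x.com", "clara@x.com", np.nan],
})

missing_counts = df_report.isna().sum()

# Route 1: filter the counts, then keep only the labels
names_from_counts = missing_counts[missing_counts > 0].index.tolist()

# Route 2: a per-column boolean mask applied to the column labels
names_from_any = df_report.columns[df_report.isna().any()].tolist()

print(names_from_counts)
print(names_from_any)
```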

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 15: `np.nan == np.nan` (The Classic)**
You *cannot* use `==` to find `NaN`.

```python
print("\n--- 18. Mistake 15: The np.nan == np.nan pitfall ---")
print(f"np.nan == np.nan is: {np.nan == np.nan}")

# Wrong code
s = pd.Series([1, np.nan])
print("\n--- 19. Trying to filter with == np.nan (FAILS) ---")
print(s[s == np.nan])
```

**Error/Wrong Output:**

```
--- 18. Mistake 15: The np.nan == np.nan pitfall ---
np.nan == np.nan is: False

--- 19. Trying to filter with == np.nan (FAILS) ---
Series([], dtype: float64)
```

**Why it happens:** Under the IEEE 754 floating-point standard, `NaN` (Not a Number) compares as *not equal* to everything, including itself.
**Correction:** You *must* use `s.isna()`: `s[s.isna()]`.
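
The self-inequality of `NaN` is easy to check on a scalar as well. A minimal sketch, using only the standard library's `math.isnan` alongside pandas (the names `x` and `s_nan` are just for illustration):

```python
import math

import numpy as np
import pandas as pd

x = np.nan
print(x == x)           # NaN is not equal to itself
print(x != x)           # ...so this comparison is the one that is true
print(math.isnan(x))    # the explicit scalar check

s_nan = pd.Series([1.0, np.nan])
found = s_nan[s_nan.isna()]              # the working filter
never_matches = (s_nan == np.nan).any()  # == never flags a NaN

print(found.index.tolist())
print(never_matches)
```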

**Mistake 16: Forgetting `.sum()`**

```python
# Wrong code
print("\n--- 20. Mistake 16: Forgetting .sum() ---")
print(df.isna())
```

**Error/Wrong Output:**
(This prints a giant boolean DataFrame, not the counts you wanted.)
**Why it happens:** This is a common beginner mistake. You get the boolean "mask" but forget to "count" the `True` values with `.sum()`.

**Mistake 17: `.isna().count()` vs. `.isna().sum()`**

```python
s = pd.Series([1, 2, np.nan])
print("\n--- 21. Mistake 17: .count() is not .sum() ---")
print(f".isna().sum(): {s.isna().sum()}")   # Correct (1)
print(f".isna().count(): {s.isna().count()}") # Wrong (3)
```

**Error/Wrong Output:**

```
--- 21. Mistake 17: .count() is not .sum() ---
.isna().sum(): 1
.isna().count(): 3
```

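**Why it happens:** `.count()` counts the *non-missing* entries, and the boolean mask itself has no missing entries (every cell is `True` or `False`), so you always get the full length of the Series. `.sum()` adds up the `True` values. You want `.sum()`.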

**Mistake 18: `df.sum()` vs. `df.isna().sum()`**

```python
df_nums = pd.DataFrame({'A': [1, 10, np.nan]})
print("\n--- 22. Mistake 18: Confusing .sum() ---")
print(f"df.sum(): \n{df_nums.sum()}")
print(f"\ndf.isna().sum(): \n{df_nums.isna().sum()}")
```

**Error/Wrong Output:**

```
--- 22. Mistake 18: Confusing .sum() ---
df.sum(): 
A    11.0
dtype: float64

df.isna().sum(): 
A    1
dtype: int64
```

**Why it happens:** `df.sum()` adds the *data*. `df.isna().sum()` adds the *missing value flags*. They are totally different.



# Second subtopic: Detecting Missing Data Patterns.

-----

This topic is about moving from *counting* missing values to *finding the specific rows* that contain them. The two tools for this are `.any()` and `.all()`.

  * **`.any()`** asks the question: "Is there *at least one* `True` value here?"
      * `df.isna().any(axis=1)`: "Show me any **row** that has *at least one* missing value."
  * **`.all()`** asks the question: "Are *all* the values `True` here?"
      * `df.isna().all(axis=1)`: "Show me any **row** that is *completely empty* (all values are `NaN`)."

This is how you find the "problem rows" that you need to either drop or fix.

**How It Works in Memory**: First, `df.isna()` creates the `True`/`False` boolean mask. Then, `.any(axis=1)` or `.all(axis=1)` performs a "reduction" *across* the columns of each row (horizontally). It looks at all the `True`/`False` values in a single row and "reduces" them to a single `True` or `False` answer. The reduction runs in compiled code rather than in a Python loop, so it's fast. The result is a new, small `pd.Series` of booleans (one for each row) which you can then use as a filter.
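
To make the "reduction" idea concrete, the sketch below spells out what `.any(axis=1)` and `.all(axis=1)` compute by applying Python's built-in `any()` and `all()` to each row of the mask. This is only a mental model on a small hypothetical frame `df_red`; pandas does not actually loop in Python.

```python
import numpy as np
import pandas as pd

df_red = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 5.0, np.nan],
})
mask = df_red.isna()

row_any = mask.any(axis=1)   # True if the row has at least one missing cell
row_all = mask.all(axis=1)   # True if every cell in the row is missing

# The same logic written out one row at a time
manual_any = [any(row) for row in mask.to_numpy()]
manual_all = [all(row) for row in mask.to_numpy()]

print(row_any.tolist() == manual_any)
print(row_all.tolist() == manual_all)
```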

**When to Use This**:

  * Use `df[df.isna().any(axis=1)]` to get a "to-do list" of all rows that need cleaning.
  * Use `df[df.notna().all(axis=1)]` to find all "perfectly clean" rows that have *no* missing data.
  * Use `df[df.isna().all(axis=1)]` to find and delete *completely blank* rows, which are sometimes in data files.
  * Use `df.isna().any(axis=0)` (or just `df.isna().any()`) to find out *which columns* contain *any* `NaN`s.

-----

### 0\. Syntax & Parameters

You call `.any()` or `.all()` on a boolean DataFrame (like the one `df.isna()` returns).

```python
dataframe_mask.any(axis=0, ...)
dataframe_mask.all(axis=0, ...)
```

  * **`axis`**
      * **What it does:** This is the most important parameter. It tells Pandas which *direction* to "collapse" or "reduce" the data.
      * **Default value:** `0` (or `'index'`)
      * **When you would use it:**
          * `axis=0` (default): Collapses "down" the rows. It asks, "Does this **column** have any `True`s?" This will return one answer *per column*.
          * `axis=1` (or `'columns'`): Collapses "across" the columns. It asks, "Does this **row** have any `True`s?" This will return one answer *per row*.
      * **Mnemonic:** `axis=1` moves horizontally: it scans across the columns of each row and returns one answer *per row*.
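
One way to keep the axes straight is to look at the *index* of the result: it tells you what you got one answer per. The sketch below uses a hypothetical frame `df_ax` with string row labels so the two cases are easy to tell apart.

```python
import numpy as np
import pandas as pd

df_ax = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]},
    index=["r0", "r1", "r2"],
)

by_column = df_ax.isna().any(axis=0)   # indexed by column labels
by_row = df_ax.isna().any(axis=1)      # indexed by row labels

print(by_column.index.tolist())
print(by_row.index.tolist())
```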

-----

### 1\. Basic Example

Let's see the two axes in action.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ID': [100, 101, 102, 103],
    'Name': ['Alice', 'Bob', 'Clara', np.nan],
    'Age': [25, 30, np.nan, 42],
    'Email': [None, 'bob@x.com', 'clara@x.com', np.nan]
})
print("--- 1. Original DataFrame ---")
print(df)

# Example 1: Create the boolean mask
mask_df = df.isna()
print("\n--- 2. Example 1: The .isna() mask ---")
print(mask_df)

# Example 2: .any(axis=0) (the default)
# "Does this COLUMN have any True (missing) values?"
print("\n--- 3. Example 2: mask_df.any(axis=0) (per COLUMN) ---")
print(mask_df.any(axis=0))

# Example 3: .any(axis=1)
# "Does this ROW have any True (missing) values?"
print("\n--- 4. Example 3: mask_df.any(axis=1) (per ROW) ---")
print(mask_df.any(axis=1))
```

**Output:**

```
--- 1. Original DataFrame ---
    ID   Name   Age        Email
0  100  Alice  25.0         None
1  101    Bob  30.0    bob@x.com
2  102  Clara   NaN  clara@x.com
3  103    NaN  42.0          NaN

--- 2. Example 1: The .isna() mask ---
      ID   Name    Age  Email
0  False  False  False   True
1  False  False  False  False
2  False  False   True  False
3  False   True  False   True

--- 3. Example 2: mask_df.any(axis=0) (per COLUMN) ---
ID       False
Name      True
Age       True
Email     True
dtype: bool

--- 4. Example 3: mask_df.any(axis=1) (per ROW) ---
0     True
1    False
2     True
3     True
dtype: bool
```

**Explanation:**

  * `axis=0` (per column): 'ID' had *no* `True`s, so it's `False`. 'Name', 'Age', and 'Email' all had *at least one* `True`, so they are `True`.
  * `axis=1` (per row): Row 0 had a `True` (in 'Email'), so it's `True`. Row 1 was all `False`, so it's `False`. Rows 2 and 3 both had `True`s, so they are `True`.

-----

### 2\. Intermediate Example (Using as a Filter)

Now we can use the `True`/`False` Series from `axis=1` as a filter to *select* the problem rows.

```python
# Use the same mask_df from the previous example
mask_df = df.isna()
print("--- 5. Original DataFrame ---")
print(df)

# Example 4: Get the mask for "problem rows"
problem_rows_mask = mask_df.any(axis=1)
print("\n--- 6. Example 4: Mask for rows with ANY NaN ---")
print(problem_rows_mask)

# Example 5: Use the mask to filter the DataFrame
# This is the key pattern from the roadmap
problem_rows = df[problem_rows_mask]
print("\n--- 7. Example 5: df[df.isna().any(axis=1)] ---")
print(problem_rows)
```

**Output:**

```
--- 5. Original DataFrame ---
    ID   Name   Age        Email
0  100  Alice  25.0         None
1  101    Bob  30.0    bob@x.com
2  102  Clara   NaN  clara@x.com
3  103    NaN  42.0          NaN

--- 6. Example 4: Mask for rows with ANY NaN ---
0     True
1    False
2     True
3     True
dtype: bool

--- 7. Example 5: df[df.isna().any(axis=1)] ---
    ID   Name   Age        Email
0  100  Alice  25.0         None
2  102  Clara   NaN  clara@x.com
3  103    NaN  42.0          NaN
```

**Explanation:**
This is the core concept. We generated the `problem_rows_mask` (`[True, False, True, True]`) and then used it as a filter on the original `df`. The result is a new DataFrame `problem_rows` containing *only* the rows (0, 2, and 3) that had at least one `NaN`.

**Example 6: Filtering for "perfect rows"**
We can do the opposite by using `.notna()` and `.all()`.

```python
# "Find rows where ALL values are NOTNA"
perfect_rows_mask = df.notna().all(axis=1)
print("\n--- 8. Example 6: Mask for 'perfect' rows ---")
print(perfect_rows_mask)

print("\n--- 9. 'Perfect' rows ---")
print(df[perfect_rows_mask])
```

**Output:**

```
--- 8. Example 6: Mask for 'perfect' rows ---
0    False
1     True
2    False
3    False
dtype: bool

--- 9. 'Perfect' rows ---
    ID Name   Age      Email
1  101  Bob  30.0  bob@x.com
```

**Explanation:**
`df.notna()` creates the opposite mask. `all(axis=1)` then checks which rows are `True` in *all* columns. Only Row 1 (Bob) was "perfectly" complete.

-----

### 3\. Advanced or Tricky Case (Using `.all()`)

`.all()` is less common, but useful for finding *completely* blank rows.

```python
# Add a completely blank row
df.loc[4] = [np.nan, np.nan, np.nan, np.nan]
print("--- 10. DataFrame with blank row ---")
print(df)

# Example 7: Find rows that are ALL NaN
all_nan_mask = df.isna().all(axis=1)
print("\n--- 11. Example 7: Mask for ALL NaN rows ---")
print(all_nan_mask)

print("\n--- 12. Blank rows ---")
print(df[all_nan_mask])

# Example 8: Find rows that are missing BOTH Name AND Email
cols_of_interest = ['Name', 'Email']
mask_both_missing = df[cols_of_interest].isna().all(axis=1)
print("\n--- 13. Example 8: Mask for missing Name AND Email ---")
print(mask_both_missing)

print("\n--- 14. Rows missing both ---")
print(df[mask_both_missing])
```

**Output:**

```
--- 10. DataFrame with blank row ---
      ID   Name   Age        Email
0  100.0  Alice  25.0         None
1  101.0    Bob  30.0    bob@x.com
2  102.0  Clara   NaN  clara@x.com
3  103.0    NaN  42.0          NaN
4    NaN    NaN   NaN          NaN

--- 11. Example 7: Mask for ALL NaN rows ---
0    False
1    False
2    False
3    False
4     True
dtype: bool

--- 12. Blank rows ---
    ID Name  Age Email
4  NaN  NaN  NaN   NaN

--- 13. Example 8: Mask for missing Name AND Email ---
0    False
1    False
2    False
3     True
4     True
dtype: bool

--- 14. Rows missing both ---
      ID Name   Age Email
3  103.0  NaN  42.0   NaN
4    NaN  NaN   NaN   NaN
```

**Explanation:**

  * **Example 7:** `df.isna().all(axis=1)` only returned `True` for row 4, which was `NaN` in every single column.
  * **Example 8:** This is an advanced trick. We first *sliced* the DataFrame (`df[cols_of_interest]`) and *then* ran our `isna().all(axis=1)` check. This let us find rows (3 and 4) that were missing *all* of the specific columns we cared about.

-----

### 4\. Real-World Use Case

**Example 9: Dropping all "problem" rows**
You have a dataset and you decide that any row with *any* missing data is unusable and must be dropped.

```python
print("--- 15. Example 9: Original DataFrame ---")
print(df) # This is our 5-row df

# 1. Get the mask for "problem" rows
mask = df.isna().any(axis=1)

# 2. Get the *indexes* of those rows
rows_to_drop = df[mask].index
print("\n--- 16. Indexes to drop ---")
print(rows_to_drop)

# 3. Drop them
df_clean = df.drop(rows_to_drop)
print("\n--- 17. Cleaned DataFrame ---")
print(df_clean)
```

**Output:**

```
--- 15. Example 9: Original DataFrame ---
      ID   Name   Age        Email
0  100.0  Alice  25.0         None
1  101.0    Bob  30.0    bob@x.com
2  102.0  Clara   NaN  clara@x.com
3  103.0    NaN  42.0          NaN
4    NaN    NaN   NaN          NaN

--- 16. Indexes to drop ---
Index([0, 2, 3, 4], dtype='int64')

--- 17. Cleaned DataFrame ---
      ID Name   Age      Email
1  101.0  Bob  30.0  bob@x.com
```

**Explanation:**
This workflow found all rows (0, 2, 3, 4) with at least one `NaN`, got their index labels, and then used `df.drop()` to remove them, leaving only the one "perfect" row. (Note: `df.dropna()` is the shortcut for this exact operation; it is covered in the next topic.)
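
As a sanity check on that note, the sketch below compares three routes to the same result on a small hypothetical frame `df_drop`: dropping by index label, filtering with the inverted mask, and `df.dropna()` with its default settings.

```python
import numpy as np
import pandas as pd

df_drop = pd.DataFrame({
    "ID": [100, 101, 102],
    "Name": ["Alice", "Bob", np.nan],
    "Age": [25.0, 30.0, np.nan],
})

mask = df_drop.isna().any(axis=1)

via_drop = df_drop.drop(df_drop[mask].index)   # drop by index label
via_filter = df_drop[~mask]                    # keep rows where mask is False
via_dropna = df_drop.dropna()                  # built-in shortcut

print(via_drop.equals(via_filter))
print(via_drop.equals(via_dropna))
```

One design point: `df.drop(...)` removes rows by *label*, so on a frame with duplicate index labels it can remove more rows than the mask selected, whereas `df[~mask]` filters row by row.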

**Example 10: Dropping only *completely blank* rows**
This is a safer cleaning step. You don't want to drop `row 0` (Alice, missing email), but you *do* want to drop `row 4` (all `NaN`).

```python
# 1. Get the mask for *all* NaN rows
mask = df.isna().all(axis=1)

# 2. Get the indexes
rows_to_drop = df[mask].index
print("\n--- 18. Example 10: Blank row indexes ---")
print(rows_to_drop)

# 3. Drop them
df_safer_clean = df.drop(rows_to_drop)
print("\n--- 19. Safer Clean (only blank row dropped) ---")
print(df_safer_clean)
```

**Output:**

```
--- 18. Example 10: Blank row indexes ---
Index([4], dtype='int64')

--- 19. Safer Clean (only blank row dropped) ---
      ID   Name   Age        Email
0  100.0  Alice  25.0         None
1  101.0    Bob  30.0    bob@x.com
2  102.0  Clara   NaN  clara@x.com
3  103.0    NaN  42.0          NaN
```

**Explanation:**
This time, the mask only identified `row 4`. The resulting `df_safer_clean` still has the partially-null rows, but the *completely useless* blank row is gone.
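
For comparison, `df.dropna()` accepts `how="all"`, which drops only rows where every value is missing. The sketch below, on a hypothetical frame `df_blank`, checks that it agrees with the mask-based approach.

```python
import numpy as np
import pandas as pd

df_blank = pd.DataFrame({
    "Name": ["Alice", np.nan, np.nan],
    "Age": [25.0, 42.0, np.nan],
})

blank_mask = df_blank.isna().all(axis=1)     # True only for the fully blank row

via_mask = df_blank[~blank_mask]             # keep everything except blank rows
via_dropna = df_blank.dropna(how="all")      # same idea via dropna

print(via_mask.equals(via_dropna))
print(via_mask.index.tolist())
```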

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 11: Confusing `axis=0` and `axis=1`**
This is the \#1 mistake. You want to find *problem rows* but you use the *default axis*.

```python
# Wrong code
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})
print("\n--- 20. Example 11: Original ---")
print(df)

# You want to find the problem ROW (row 1)
# But you use the default axis=0
mask_wrong = df.isna().any() # axis=0 is default
print("\n--- 21. Wrong: .any(axis=0) ---")
print(mask_wrong)

print("\n--- 22. Filter (FAILS) ---")
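try:
    df[mask_wrong]
except Exception as e:  # pandas raises IndexingError here, not KeyError
    print(f"{type(e).__name__}: {e}")
```

**Error/Wrong Output:**

```
--- 20. Example 11: Original ---
     A  B
0  1.0  3
1  NaN  4

--- 21. Wrong: .any(axis=0) ---
A     True
B    False
dtype: bool

--- 22. Filter (FAILS) ---
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
```

**Why it happens:** `df.isna().any(axis=0)` returned a boolean Series *indexed by the column labels* (`A` is `True`, `B` is `False`). When you pass a boolean Series to `df[...]`, pandas treats it as a *row* filter and tries to align its index (`A`, `B`) with the DataFrame's row index (`0`, `1`). The labels don't match, so recent pandas versions raise an `IndexingError` (and may first warn that the key will be reindexed).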
**Correction:** You *must* use `axis=1` to get a mask for the *rows*: `df[df.isna().any(axis=1)]`.
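
The per-column mask is not useless, though: it is the right tool for selecting *columns*. The sketch below, on a hypothetical frame `df_two`, uses it with `.loc[:, mask]` and contrasts that with the row filter.

```python
import numpy as np
import pandas as pd

df_two = pd.DataFrame({"A": [1, np.nan], "B": [3, 4]})

col_mask = df_two.isna().any()          # one boolean per column
row_mask = df_two.isna().any(axis=1)    # one boolean per row

cols_with_gaps = df_two.loc[:, col_mask]   # column mask goes in the column slot
rows_with_gaps = df_two[row_mask]          # row mask filters rows

print(cols_with_gaps.columns.tolist())
print(rows_with_gaps.index.tolist())
```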

**Mistake 12: Using `.any()` to get "perfect rows"**
This is a logic mistake.

```python
# Wrong code
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})
print("\n--- 23. Example 12: Logic mistake ---")
# You want row 0, but this...
mask = df.notna().any(axis=1) # "Has at least one good value"
print(df[mask])
```

**Error/Wrong Output:**

```
--- 23. Example 12: Logic mistake ---
     A  B
0  1.0  3
1  NaN  4
```

**Why it happens:** You asked "Show me rows with *any* good value." Row 1 has a `NaN`... but it *also* has a `4`, which is a good value. So `any()` returns `True`.
**Correction:** To find "perfect rows", you must use `.all()`: `df[df.notna().all(axis=1)]`.

**Mistake 13: Using `df.any()` instead of `df.isna().any()`**

```python
# Wrong code
df = pd.DataFrame({'A': [1, 0], 'B': [0, 0]})
print("\n--- 24. Example 13: .any() on numbers ---")
print(df.any(axis=1))
```

**Error/Wrong Output:**

```
--- 24. Example 13: .any() on numbers ---
0    True
1    False
dtype: bool
```

**Why it happens:** `.any()` on a numeric DataFrame checks for `0` (False) vs. non-zero (True). This has *nothing* to do with `NaN`s.
**Correction:** Always start with `.isna()` or `.notna()` to create the boolean mask *first*.

**Mistake 14: Confusing `.sum()` and `.any()`**

```python
# Both can be used for filtering, but one is clearer
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})

# Using .sum()
mask_sum = df.isna().sum(axis=1) > 0
print("\n--- 25. Example 14: Filtering with .sum() ---")
print(df[mask_sum])

# Using .any()
mask_any = df.isna().any(axis=1)
print("\n--- 26. Filtering with .any() ---")
print(df[mask_any])
```

**Output:**

```
--- 25. Example 14: Filtering with .sum() ---
     A  B
1  NaN  4

--- 26. Filtering with .any() ---
     A  B
1  NaN  4
```

**Why it happens:** Both work. `sum() > 0` (is the sum of `True`s greater than 0?) is logically identical to `.any()` (is there *any* `True`?). However, `.any()` is more direct and more readable. It may also be slightly faster, since a boolean reduction can in principle stop at the first `True` while `.sum()` has to count them all, but whether that happens depends on pandas and NumPy internals, so measure before relying on it.

**Mistake 15: Not subsetting columns for `.all()`**

```python
df = pd.DataFrame({
    'A': [1, np.nan], 
    'B': [np.nan, 4], 
    'C': [5, 6]
})
print("\n--- 27. Example 15: .all() on whole DF ---")
# You're looking for rows missing both A and B
# But you do this...
mask = df.isna().all(axis=1)
print(df[mask])
```

**Error/Wrong Output:**

```
--- 27. Example 15: .all() on whole DF ---
Empty DataFrame
Columns: [A, B, C]
Index: []
```

**Why it happens:** You asked, "Show me rows where *all* columns (A, B, *and C*) are `NaN`." No row matched that.
**Correction:** You must *subset* the columns first: `df[df[['A', 'B']].isna().all(axis=1)]`.
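
Here is the correction as a runnable sketch. It uses a slightly larger hypothetical frame `df_sub` (one extra row that really is missing both `A` and `B`) so that the corrected filter has something to find.

```python
import numpy as np
import pandas as pd

df_sub = pd.DataFrame({
    "A": [1, np.nan, np.nan],
    "B": [np.nan, 4, np.nan],
    "C": [5, 6, 7],
})

# Whole-frame check: column C is never missing, so no row qualifies
whole_frame = df_sub[df_sub.isna().all(axis=1)]

# Subset first, then check: only A and B have to be missing
both_missing = df_sub[df_sub[["A", "B"]].isna().all(axis=1)]

print(whole_frame.empty)
print(both_missing.index.tolist())
```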

----

The remaining sections cover all of the **Detection Methods** together (`.isna()`, `.notna()`, `.isnull()`, `.sum()`, `.any()`, `.all()`).

-----

### 6\. Key Terms (Explained Simply)

  * **`NaN` (Not a Number):** The standard marker for missing *numeric* data.
  * **`None`:** The Python object for "nothing." It's also treated as a missing value in Pandas.
  * **`NaT` (Not a Time):** The "missing value" marker for `datetime64[ns]` columns. `.isna()` finds this too.
  * **Boolean Mask:** A DataFrame or Series of the *same shape* as the original, but filled with only `True` and `False` values. `df.isna()` returns one.
  * **`.isna()` / `.isnull()`:** Identical methods. They create a boolean mask, returning `True` for any missing value (`NaN`, `None`, `NaT`).
  * **`.notna()`:** The opposite. It returns `True` for any cell that *has a value*.
  * **`.sum()`:** When chained (e.g., `df.isna().sum()`), it treats `True` as `1` and `False` as `0`, effectively *counting* the missing values.
  * **`.any()`:** A method that "collapses" a boolean mask. It checks if *any* `True` value exists along an axis.
  * **`.all()`:** A method that "collapses" a boolean mask. It checks if *all* values are `True` along an axis.
  * **`axis=0`**: The "down" axis. When used in `.sum()`, `.any()`, or `.all()`, it produces one result *per column*.
  * **`axis=1`**: The "across" axis. When used in `.sum()`, `.any()`, or `.all()`, it produces one result *per row*.

-----

### 7\. Best Practices

  * **Always Check:** Run `df.info()` and `df.isna().sum()` *every time* you load a new dataset. This is the first step of data cleaning.
  * **Prefer `.isna()`:** Use `.isna()` and `.notna()`. They are the modern, clear standard. `.isnull()` is just an alias.
  * **Use `.sum()` for Counts:** The standard "missing value report" is `df.isna().sum()`. This gives you a fast count of `NaN`s *per column*.
  * **Use `.any(axis=1)` to Find Rows:** The standard way to find *all rows* that have *any* missing data is `df[df.isna().any(axis=1)]`.
  * **Use `.all(axis=1)` to Find Blank Rows:** The standard way to find *completely blank* rows is `df[df.isna().all(axis=1)]`.
  * **Never Use `== np.nan`:** You cannot find `NaN`s with `df == np.nan`. This will always return `False`. You *must* use `df.isna()`.

-----

### 8\. Mini Summary

  * Missing values are `NaN`, `None`, or `NaT`.
  * **`.isna()`** (or `.isnull()`) returns a `True`/`False` mask where `True` means "missing."
  * **`.notna()`** is the opposite, where `True` means "has a value."
  * **`df.isna().sum()`** (with default `axis=0`) is the most important command. It gives you a **count of missing values in every column**.
  * **`df.isna().any(axis=1)`** creates a mask to find **rows with *at least one* `NaN`**.
  * **`df.isna().all(axis=1)`** creates a mask to find **rows that are *completely* `NaN`**.

-----

### 10\. Practice Tasks

**Data for Tasks:**

```python
df_practice = pd.DataFrame({
    'name': ['Frodo', 'Sam', 'Pippin', 'Merry', np.nan],
    'age': [50, 33, np.nan, 36, 35],
    'email': ['f@shire.com', 's@shire.com', 'p@shire.com', np.nan, np.nan]
})
```

**Task 22 (Easy):**
Run the single command that will show you the *total count* of missing values for *each column* in `df_practice`.

**Task 23 (Medium):**
Find all rows in `df_practice` that are **missing an 'email'** address.

**Task 24 (Hard):**
Find all rows in `df_practice` that have **at least one missing value** (in *any* column).

**Bonus Task 25 (Hardest):**
Find all rows in `df_practice` that are "fully complete" (have **zero missing values**).

-----

### 11\. Recommended Next Topic

You have now mastered *detecting* missing data. The next logical step from the roadmap is to learn the specific methods for *handling* and *fixing* them.

**Recommended:** **Handling strategies (`.dropna()`, `.fillna()`, `.interpolate()`)**

-----

### 12\. Quick Reference Card

| Method | What It Does | Example |
| :--- | :--- | :--- |
| **`.isna()`** | (Alias `.isnull()`) Returns `True`/`False` mask. `True` = missing. | `df.isna()` |
| **`.notna()`** | Returns `True`/`False` mask. `True` = has value. | `df.notna()` |
| **`.isna().sum()`** | **(Most common)** Counts `NaN`s *per column*. | `df.isna().sum()` |
| **`.isna().sum(axis=1)`** | Counts `NaN`s *per row*. | `df.isna().sum(axis=1)` |
| **`.any(axis=1)`** | (On `isna()` mask) Returns `True` for rows with *any* `NaN`. | `df.isna().any(axis=1)` |
| **`.all(axis=1)`** | (On `isna()` mask) Returns `True` for rows that are *all* `NaN`. | `df.isna().all(axis=1)` |
| **`.any()` (default)** | (On `isna()` mask) Returns `True` for *columns* with *any* `NaN`. | `df.isna().any()` |

-----

### 13\. Common Interview Questions

1.  **What's the first thing you do when you get a new DataFrame?**
      * I run `df.info()` and `df.isna().sum()`. `info()` tells me the `dtypes` and non-null counts, and `isna().sum()` gives me a direct, scannable list of missing value counts per column.
2.  **How do you find all rows that have *any* missing data?**
      * I use a boolean filter with `.any()` on `axis=1`: `df[df.isna().any(axis=1)]`.
3.  **How do you find all rows that are *perfectly* complete, with no missing data?**
      * I use `.notna()` with `.all()` on `axis=1`: `df[df.notna().all(axis=1)]`.
4.  **What's the difference between `df.isna().sum()` and `df.isna().count()`?**
      * `.sum()` adds the `True` values (which are `1`s) to *count the `NaN`s*. This is what you want.
      * `.count()` simply counts *all* the boolean values in the mask (the total number of rows), which is not useful.
5.  **Why doesn't `df[df['col'] == np.nan]` work?**
      * `np.nan` is a special float, and by definition, it is *never* equal to anything, including itself. You *must* use the `df['col'].isna()` method.

-----

### 14\. Performance Considerations

  * **Time Complexity:** All these methods (`.isna()`, `.notna()`, `.sum()`, `.any()`, `.all()`) are highly optimized C-level operations. They are **O(n\*m)**, where 'n' is rows and 'm' is columns. For a Series, they are **O(n)**.
  * **Vectorization:** These are all fully vectorized and are the idiomatic fast way to detect `NaN`s. A Python `for` loop that checks each cell is typically orders of magnitude slower on large data.
  * **Memory Usage:** `df.isna()` creates a *new DataFrame* of booleans with the same shape as the original. Each boolean takes 1 byte, versus 8 bytes for an `int64` or `float64` cell, so the mask is usually much smaller than the data. The chained methods (`.sum()`, `.any()`, etc.) create a *new, small Series* (one value per column or per row) as their output.
  * **Short-Circuiting:** In principle, `.any()` can stop at the *first* `True` it finds and `.all()` at the *first* `False`, while `.sum()` has to visit every value. Whether you see a speedup in practice depends on pandas and NumPy internals, so treat any difference as small and measure it if it matters.
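
If you want to measure the loop-versus-vectorized gap yourself, a sketch along these lines works. The array size, the 10% missing rate, and the helper `count_with_loop` are all arbitrary choices for illustration, and the timings depend on your machine, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 floats with roughly 10% set to NaN
rng = np.random.default_rng(0)
values = rng.random(100_000)
values[rng.random(100_000) < 0.1] = np.nan
s_big = pd.Series(values)

def count_with_loop(series):
    """Count NaNs one value at a time (slow, for comparison only)."""
    total = 0
    for v in series:
        if v != v:  # only NaN is unequal to itself
            total += 1
    return total

vectorized = int(s_big.isna().sum())
looped = count_with_loop(s_big)
print(vectorized == looped)  # both approaches agree on the count

t_vec = timeit.timeit(lambda: s_big.isna().sum(), number=10)
t_loop = timeit.timeit(lambda: count_with_loop(s_big), number=10)
print(f"vectorized: {t_vec:.4f}s, loop: {t_loop:.4f}s")
```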

-----

### 15\. When NOT to Use This

  * **Don't use `== np.nan`:** (As mentioned.) It raises no error, but the comparison is never `True`, so the filter silently matches nothing.
  * **Don't use `.any()` or `.all()` on a non-boolean DataFrame:** `df.any()` will check if values are non-zero, not if they are `NaN`. Always start with `.isna()` or `.notna()` first.
  * **Don't confuse the axes:** `df.isna().sum(axis=1)` (count *per row*) is very different from `df.isna().sum()` (count *per column*). Be sure you know which one you're asking for.
  * **Don't use `.count()` to find `NaN`s:** `df.count()` counts *non-missing* values. `df.isna().sum()` counts *missing* values.