Here is the complete explanation for `.fillna()` and `.dropna()`.

-----

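`NaN` (Not a Number) values are the "empty cells" in your data. They propagate through arithmetic: `10 + NaN` is `NaN`, so one missing value can quietly turn a whole computed column into `NaN`. Aggregations such as `.sum()` and `.mean()` skip `NaN` by default. That avoids the propagation, but it can also hide how much data is missing. `.fillna()` and `.dropna()` are the two primary tools for dealing with missing data before you analyze it.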
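
The contrast between element-wise propagation and skipping aggregations can be checked directly. This is a minimal sketch. The series `s` is made up for illustration, and `skipna` is the standard keyword on Pandas reductions.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 5.0])

# Element-wise arithmetic propagates NaN
plus_ten = s + 10
print(plus_ten.tolist())  # [20.0, nan, 15.0]

# Aggregations skip NaN by default (skipna=True)...
total = s.sum()
print(total)  # 15.0

# ...unless you explicitly ask them not to
total_strict = s.sum(skipna=False)
print(total_strict)  # nan
```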

Think of `.fillna()` as a "patch" - it fills the empty hole with a value you choose (like 0, the average, or the last known value). Think of `.dropna()` as "surgery" - it completely removes the entire row or entry that has a hole.

**How It Works in Memory**: By default, both methods return a **new** Series and leave the original untouched. `.fillna()` returns a Series of the *same length* with the `NaN` values replaced. `.dropna()` returns a *shorter* Series because it filters out the missing entries. With `inplace=True`, Pandas modifies the original Series object and returns `None` instead. People often reach for this to save memory, but Pandas frequently still builds a new copy of the data internally, so the gain is usually small. `inplace=True` is generally discouraged.
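
A small sketch of these copy semantics, using a made-up three-element Series. It compares lengths and object identity before and after each call.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

filled = s.fillna(0)   # new object, same length
dropped = s.dropna()   # new object, shorter

print(len(s), len(filled), len(dropped))  # 3 3 2
print(filled is s, dropped is s)          # False False
print(s.isna().sum())                     # 1 (the original keeps its NaN)

# With inplace=True the method mutates s itself and returns None
result = s.fillna(0, inplace=True)
print(result, s.isna().sum())             # None 0
```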

**When to Use This**: This is one of the first steps in any data cleaning workflow.

  * Use **`.fillna(0)`** when a missing value truly means "zero" (e.g., "sales" for a product that didn't sell).
  * Use **`.fillna(s.mean())`** when you want to fill a missing number without skewing the dataset's average (e.g., a missing test score).
  * Use **`.ffill()`** (forward-fill) when data is sequential (like time-series) and you want to carry the last known value forward. Older code writes this as `.fillna(method='ffill')`, a form deprecated since pandas 2.1.
  * Use **`.dropna()`** as a last resort, when you have enough data that you can afford to lose the entire entry, or when the entry is so incomplete that it's unusable.
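
The four strategies above can be compared side by side. This is a minimal sketch on a hypothetical series of daily readings with two gaps. It uses `.ffill()`, the current spelling of forward-fill.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two missing days
readings = pd.Series([3.0, np.nan, 5.0, np.nan, 4.0])

as_zero = readings.fillna(0)                  # missing truly means "zero"
as_mean = readings.fillna(readings.mean())    # mean of 3, 5, 4 is 4.0
carried = readings.ffill()                    # carry the last known value forward
dropped = readings.dropna()                   # discard incomplete entries

print(as_zero.tolist())  # [3.0, 0.0, 5.0, 0.0, 4.0]
print(as_mean.tolist())  # [3.0, 4.0, 5.0, 4.0, 4.0]
print(carried.tolist())  # [3.0, 3.0, 5.0, 5.0, 4.0]
print(dropped.tolist())  # [3.0, 5.0, 4.0]
```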

-----

### 0\. Syntax & Parameters

#### `series.fillna()`

Fills missing `NaN` values using a specified method.

```python
series.fillna(value=None, method=None, inplace=False, limit=None, ...)
```

  * **`value`**

      * **What it does:** The value you want to use to fill the holes. This can be a single value (like `0`), a dictionary, or another Series.
      * **Default value:** `None`
      * **When you would use it:** This is the most common parameter. Use it to fill all `NaN`s with a specific value (e.g., `s.fillna(0)`).
      * **What happens if you don't specify it:** You must specify either `value` or `method`.

  * **`method`**

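      * **What it does:** Specifies *how* to fill the `NaN`s from neighboring values. The two you must know are `'ffill'` (forward-fill) and `'bfill'` (backward-fill). This parameter is deprecated as of pandas 2.1. New code should call the equivalent `.ffill()` and `.bfill()` methods instead.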
      * **Default value:** `None`
      * **When you would use it:** Use `'ffill'` (or its alias `'pad'`) to fill a `NaN` with the *last valid value* that came before it. Use `'bfill'` (or its alias `'backfill'`) to fill with the *next valid value* after it. This is essential for time-series data.
      * **What happens if you don't specify it:** No fill method is used.

  * **`inplace`**

      * **What it does:** A boolean (True/False). If `True`, it modifies the original Series *directly* and returns `None`. If `False`, it returns a *new copy* of the Series with the changes, leaving the original untouched.
      * **Default value:** `False`
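      * **When you would use it:** People often choose `inplace=True` to save memory, but Pandas frequently still copies the data internally, so the savings are usually small. It is generally **discouraged** for two reasons. It returns `None`, which breaks method chaining, and it silently changes an object that other variables may also refer to.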
      * **What happens if you don't specify it:** The default `False` is used, which is safer.

  * **`limit`**

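      * **What it does:** An integer cap on how many `NaN`s get filled. With `method` (or `.ffill()`/`.bfill()`), it is the maximum number of *consecutive* `NaN`s filled in each gap, so a longer gap is only partially filled. With a plain `value`, it is the maximum number of `NaN`s filled across the whole Series.
      * **Default value:** `None`
      * **When you would use it:** Use this when you only want to patch "small" gaps. For example, forward-filling with `limit=1` fills a one-`NaN` gap completely but fills only the first `NaN` of a two-`NaN` gap.
      * **What happens if you don't specify it:** All `NaN`s are filled. The exceptions are leading `NaN`s for `'ffill'` and trailing `NaN`s for `'bfill'`, because they have no neighbor to copy from.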
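
The `value` parameter also accepts a dict or a Series, as noted above. This is a minimal sketch with made-up labels. Pandas matches fill values to entries by index label, and any label without a fill value stays `NaN`.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan], index=["a", "b", "c"])

# A dict maps index labels to fill values; labels not in the dict stay NaN
by_dict = s.fillna({"a": 0.0})
print(by_dict.tolist())  # [0.0, 2.0, nan]

# A Series is aligned on the index in the same way
by_series = s.fillna(pd.Series({"a": -1.0, "c": 99.0}))
print(by_series.tolist())  # [-1.0, 2.0, 99.0]
```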

-----

#### `series.dropna()`

Removes (drops) missing `NaN` values.

```python
series.dropna(inplace=False, ...)
```

  * **`inplace`**
      * **What it does:** Same as in `fillna`. If `True`, it removes the `NaN`s from the original Series and returns `None`. If `False`, it returns a *new, smaller* copy.
      * **Default value:** `False`
      * **When you would use it:** Same as `fillna`.
      * **What happens if you don't specify it:** The default `False` is used.
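
Conceptually, `.dropna()` on a Series behaves like boolean-mask filtering with `.notna()`. This is a sketch of that equivalence on made-up data. It also shows that the surviving entries keep their original index labels, and that `reset_index(drop=True)` renumbers them if you need to.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, None, np.nan, 4.0])  # None becomes NaN in a float Series

kept = s.dropna()
manual = s[s.notna()]  # keep only the non-missing entries

print(kept.index.tolist())  # [0, 3]: original labels are preserved
print(kept.equals(manual))  # True

# Optional: renumber the surviving entries 0..n-1
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```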

-----

### 1\. Basic Example

Let's see the two methods in their simplest forms.

**Example 1: Using `.dropna()`**

```python
import pandas as pd
import numpy as np

scores = pd.Series([85, 92, np.nan, 78], index=['Alice', 'Bob', 'Clara', 'David'])
print("--- Original Series ---")
print(scores)

# Drop the NaN entry
scores_dropped = scores.dropna()

print("\n--- After .dropna() ---")
print(scores_dropped)

print("\n--- Original is Unchanged ---")
print(scores)
```

**Output:**

```
--- Original Series ---
Alice    85.0
Bob      92.0
Clara     NaN
David    78.0
dtype: float64

--- After .dropna() ---
Alice    85.0
Bob      92.0
David    78.0
dtype: float64

--- Original is Unchanged ---
Alice    85.0
Bob      92.0
Clara     NaN
David    78.0
dtype: float64
```

**Explanation:**
`.dropna()` created a new Series `scores_dropped` that simply omits the `'Clara'` entry. Notice the index for 'Clara' is gone. The original `scores` Series is left unchanged because `inplace=False` is the default.
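
Before dropping, it helps to know how much you are about to lose. This is a minimal sketch on the same `scores` data. It counts the missing entries, measures their share, and lists which labels `.dropna()` would remove.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 92, np.nan, 78], index=["Alice", "Bob", "Clara", "David"])

n_missing = scores.isna().sum()        # how many entries dropna() would remove
share_missing = scores.isna().mean()   # fraction of entries that are missing
missing_labels = scores[scores.isna()].index.tolist()

print(n_missing, share_missing, missing_labels)  # 1 0.25 ['Clara']
```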

**Example 2: Using `.fillna()` with a value**

```python
# Use the same original 'scores' Series
print("--- Original Series ---")
print(scores)

# Fill the NaN with 0 (e.g., if Clara didn't take the test)
scores_filled = scores.fillna(0)

print("\n--- After .fillna(0) ---")
print(scores_filled)
```

**Output:**

```
--- Original Series ---
Alice    85.0
Bob      92.0
Clara     NaN
David    78.0
dtype: float64

--- After .fillna(0) ---
Alice    85.0
Bob      92.0
Clara     0.0
David    78.0
dtype: float64
```

**Explanation:**
`.fillna(0)` created a new Series `scores_filled` that is the *same size* as the original, but the `NaN` value at index 'Clara' has been replaced with `0.0`.
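
A filled `0` is a real number that takes part in every later calculation. This is a sketch on the same `scores` data. It compares the average that skips the `NaN` with the average after `.fillna(0)`.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 92, np.nan, 78], index=["Alice", "Bob", "Clara", "David"])

mean_skipping_nan = scores.mean()           # (85 + 92 + 78) / 3
mean_after_zero = scores.fillna(0).mean()   # (85 + 92 + 0 + 78) / 4

print(mean_skipping_nan)  # 85.0
print(mean_after_zero)    # 63.75
```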

-----

### 2\. Intermediate Example

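Forward- and backward-filling are powerful for sequential data. Modern Pandas provides them as the `.ffill()` and `.bfill()` methods. Older code passes `method='ffill'` or `method='bfill'` to `fillna`, a form deprecated since pandas 2.1.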

**Example 3: Forward-fill (`ffill`)**
This fills a `NaN` with the last *good* value that came before it.

```python
# Stock prices, with 'Wed' and 'Thu' missing
prices = pd.Series(
    [100, 102, np.nan, np.nan, 101], 
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
)
print("--- Original Prices ---")
print(prices)

# Forward-fill: carry the last known price (102 from Tue)
prices_ffill = prices.ffill()  # older, now-deprecated spelling: prices.fillna(method='ffill')

print("\n--- After .ffill() ---")
print(prices_ffill)
```

**Output:**

```
--- Original Prices ---
Mon    100.0
Tue    102.0
Wed      NaN
Thu      NaN
Fri    101.0
dtype: float64

--- After .ffill() ---
Mon    100.0
Tue    102.0
Wed    102.0
Thu    102.0
Fri    101.0
dtype: float64
```

**Explanation:**
The `NaN` at 'Wed' was filled with the value from 'Tue' (102.0). Then, the `NaN` at 'Thu' was filled with the *newly filled* value from 'Wed' (also 102.0).
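
Forward-fill can only copy from earlier rows, so a `NaN` at the very start has nothing to copy. This is a minimal sketch on a made-up series.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 5.0, np.nan, np.nan])

forward = s.ffill()
print(forward.tolist())  # [nan, 5.0, 5.0, 5.0]: the leading NaN stays
```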

**Example 4: Backward-fill (`bfill`)**
This fills a `NaN` with the next *good* value that comes after it.

```python
# Use the same original 'prices' Series
print("--- Original Prices ---")
print(prices)

# Backward-fill: fill with the next known price (101 from Fri)
prices_bfill = prices.bfill()  # older, now-deprecated spelling: prices.fillna(method='bfill')

print("\n--- After .bfill() ---")
print(prices_bfill)
```

**Output:**

```
--- Original Prices ---
Mon    100.0
Tue    102.0
Wed      NaN
Thu      NaN
Fri    101.0
dtype: float64

--- After .bfill() ---
Mon    100.0
Tue    102.0
Wed    101.0
Thu    101.0
Fri    101.0
dtype: float64
```

**Explanation:**
The `NaN` at 'Thu' was filled with the value from 'Fri' (101.0). The `NaN` at 'Wed' was then filled with the *newly filled* value from 'Thu' (also 101.0).
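
The direction you choose changes the answer, and backward-fill has the mirror-image blind spot of forward-fill. This is a sketch on the same `prices` data plus a made-up two-element series. A trailing `NaN` has no "next" value, so `.bfill()` leaves it alone.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [100, 102, np.nan, np.nan, 101],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

# The same gap gets different values depending on the direction
wed_ffill = prices.ffill()["Wed"]
wed_bfill = prices.bfill()["Wed"]
print(wed_ffill, wed_bfill)  # 102.0 101.0

# A trailing NaN has no "next" value, so bfill leaves it as NaN
tail = pd.Series([1.0, np.nan])
tail_filled = tail.bfill()
print(tail_filled.tolist())  # [1.0, nan]
```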

-----

### 3\. Advanced or Tricky Case

**Example 5: Using `inplace=True`**
This is a common point of confusion. `inplace=True` modifies the *original* object and returns `None`.

```python
s = pd.Series([1, 2, np.nan])
print("--- Original s (id) ---")
print(id(s))
print(s)

# This returns NOTHING
return_value = s.fillna(0, inplace=True)

print("\n--- Return Value from inplace=True ---")
print(return_value)

print("\n--- Original s is now CHANGED ---")
print(id(s)) # Note: The id is the SAME
print(s)
```

**Output:**

```
--- Original s (id) ---
2216110996880
0    1.0
1    2.0
2    NaN
dtype: float64

--- Return Value from inplace=True ---
None

--- Original s is now CHANGED ---
2216110996880
0    1.0
1    2.0
2    0.0
dtype: float64
```

**Explanation:**
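The `return_value` is `None`. The original Series `s` was modified in place. `id(s)` is unchanged because `s` is still the same Python object (the exact number will differ on your machine). This is convenient, but dangerous if you, or any other variable pointing at the same object, aren't expecting the change.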
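
The "other variable" risk is easy to reproduce. This is a minimal sketch. `alias` is just a second name bound to the same Series, not a copy, so the in-place fill is visible through it as well.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])
alias = s  # a second name for the SAME object, not a copy

s.fillna(0, inplace=True)

# The change is visible through every name bound to that object
print(alias.tolist())  # [1.0, 2.0, 0.0]
print(alias is s)      # True
```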

**Example 6: Using `limit`**
This is useful for patching small gaps but leaving large ones.

```python
# A Series with a 1-day gap and a 2-day gap
s = pd.Series([10, np.nan, 20, np.nan, np.nan, 30])
print("--- Original ---")
print(s)

# Forward-fill, but fill at most 1 consecutive NaN per gap
s_filled = s.ffill(limit=1)  # older, now-deprecated spelling: s.fillna(method='ffill', limit=1)

print("\n--- After ffill with limit=1 ---")
print(s_filled)
```

**Output:**

```
--- Original ---
0    10.0
1     NaN
2    20.0
3     NaN
4     NaN
5    30.0
dtype: float64

--- After ffill with limit=1 ---
0    10.0
1    10.0
2    20.0
3    20.0
4     NaN
5    30.0
dtype: float64
```

**Explanation:**

  * The `NaN` at index 1 was filled with `10.0` from index 0.
  * The `NaN` at index 3 was filled with `20.0` from index 2.
  * The `NaN` at index 4 was *not* filled. `limit=1` applies to each gap separately, and the fill at index 3 already used up the one fill allowed for that two-`NaN` gap.
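
`limit` works the same way with backward-fill, except that filling starts from the *end* of each gap. This is a sketch on the same series. In the two-`NaN` gap, only the `NaN` closest to the next valid value (index 4) gets filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 20, np.nan, np.nan, 30])

# Backward-fill with limit=1: in each gap, only the NaN nearest
# the next valid value is filled
s_bfilled = s.bfill(limit=1)
print(s_bfilled.tolist())  # [10.0, 20.0, 20.0, nan, 30.0, 30.0]
```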

-----

### 4\. Real-World Use Case

**Example 7: Filling with the mean**
This is a very common technique in data science to avoid losing a row of data.

```python
# Ages of survey respondents, with one missing
ages = pd.Series([25, 30, 42, np.nan, 28])
print("--- Original Ages ---")
print(ages)

# Calculate the mean of the *existing* data
mean_age = ages.mean()
print(f"\nMean age (calculated): {mean_age}")

# Fill the missing value with the mean
ages_filled = ages.fillna(mean_age)

print("\n--- Filled Ages ---")
print(ages_filled)
```

**Output:**

```
--- Original Ages ---
0    25.0
1    30.0
2    42.0
3     NaN
4    28.0
dtype: float64

Mean age (calculated): 31.25

--- Filled Ages ---
0    25.00
1    30.00
2    42.00
3    31.25
4    28.00
dtype: float64
```

**Explanation:**
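We filled the missing age with `31.25`. This is a *statistical* imputation (mean imputation). It keeps the row in our dataset, so we don't lose the person's other data, and it leaves the mean unchanged. It does shrink the spread (standard deviation) slightly, because the new value sits exactly at the center of the data.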
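
Both effects can be checked numerically. This is a sketch on the same `ages` data. It also shows median imputation, a common alternative when outliers would pull the mean.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 30, 42, np.nan, 28])
ages_filled = ages.fillna(ages.mean())

# The mean is preserved...
print(ages.mean(), ages_filled.mean())  # 31.25 31.25

# ...but the spread shrinks, because the new value sits exactly at the mean
print(ages_filled.std() < ages.std())   # True

# Median imputation: median of 25, 28, 30, 42 is (28 + 30) / 2
ages_median = ages.fillna(ages.median())
print(ages_median[3])  # 29.0
```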

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 8: Forgetting to assign the result (The \#1 Mistake)**

```python
# Wrong code
s = pd.Series([1, 2, np.nan])
print("--- Before ---")
print(s)

# Try to fill, but don't assign the result
s.fillna(0) 

print("\n--- After (Still has NaN!) ---")
print(s)
```

**Error/Wrong Output:**

```
--- Before ---
0    1.0
1    2.0
2    NaN
dtype: float64

--- After (Still has NaN!) ---
0    1.0
1    2.0
2    NaN
dtype: float64
```

**Why it happens:**
`s.fillna(0)` creates a *new* Series with the fill, but you didn't save it. The original `s` is unchanged because `inplace=False` is the default.
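
A cheap habit that catches this mistake is to verify the result right after cleaning. This is a minimal sketch using `.isna().any()`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

s.fillna(0)                 # result discarded
print(s.isna().any())       # True: s still has its NaN

s = s.fillna(0)             # result saved back to s
print(s.isna().any())       # False
```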

**Example 9: Corrected code (Two ways)**

```python
# Way 1: Re-assignment (Preferred)
s = pd.Series([1, 2, np.nan])
s = s.fillna(0) 
print("--- Corrected 1 (Re-assigned) ---")
print(s)

# Way 2: In-place
s = pd.Series([1, 2, np.nan])
s.fillna(0, inplace=True)
print("\n--- Corrected 2 (In-place) ---")
print(s)
```

**Output:**

```
--- Corrected 1 (Re-assigned) ---
0    1.0
1    2.0
2    0.0
dtype: float64

--- Corrected 2 (In-place) ---
0    1.0
1    2.0
2    0.0
dtype: float64
```

-----

### 6\. Key Terms (Explained Simply)

  * **`NaN` (Not a Number):** The standard marker for missing data in Pandas (and NumPy).
  * **Imputation:** The act of filling in missing data with substitute values (e.g., filling with the mean).
  * **`inplace`:** A parameter that makes a method change the original object (and return `None`) instead of returning a new one. It does not guarantee that Pandas avoids copying the data internally.
  * **`ffill` (Forward-fill):** Filling a `NaN` with the last valid value that appeared *before* it.
  * **`bfill` (Backward-fill):** Filling a `NaN` with the next valid value that appears *after* it.
  * **Re-assignment:** The practice of saving the result of an operation back to the original variable (e.g., `s = s.fillna(0)`).

-----

### 7\. Best Practices

  * **Prefer re-assignment over `inplace=True`**. It's safer, more readable, and works with method chaining (e.g., `s.fillna(0).astype(int)`).
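  * **Think about your fill value.** Don't just `fillna(0)`. Is `0` a meaningful value, or should you use the `mean()`, the `median()`, or the most common value? For the most common value, use `s.mode().iloc[0]`, because `mode()` returns a Series.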
  * **Use `ffill`/`bfill` for ordered data only.** It makes sense for time-series or a sorted list, but not for unordered data like customer IDs.
  * **Drop data sparingly.** `dropna()` is a last resort. You are *deleting* information, which can bias your results if not done carefully.
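  * **Check `dtype`:** An integer Series that contains `NaN` is stored as `float64`, because `NaN` is a float. Filling the `NaN`s (e.g., with `0`) keeps it `float64`, so call `.astype(int)` afterwards if you need integers. Alternatively, use Pandas' nullable `"Int64"` dtype to keep integers alongside missing values.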
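
The dtype and mode points can be seen together on a made-up series. This is a minimal sketch. `"Int64"` (capital I) is Pandas' nullable integer dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, np.nan])
print(s.dtype)  # float64: the NaN forces a float dtype

# mode() returns a Series (there can be ties), so take its first element
most_common = s.mode().iloc[0]
filled = s.fillna(most_common)
print(filled.dtype)  # still float64

as_int = filled.astype(int)
print(as_int.tolist())  # [1, 2, 2, 2]

# Nullable integer dtype: integers and missing values side by side
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)  # Int64
```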

-----

### 8\. Mini Summary

  * Missing data is represented by **`NaN`**.
  * **`.dropna()`** removes rows with `NaN`s (returns a smaller Series).
  * **`.fillna()`** replaces `NaN`s (returns a same-sized Series).
  * You can fill with a specific value (`.fillna(0)`), a statistic (`.fillna(s.mean())`), or a neighboring value (`.ffill()`, `.bfill()`; older code uses `method='ffill'` / `method='bfill'`).
  * By default, these methods return a **new copy**. Re-assign the result (`s = s.fillna(0)`), or use the discouraged `inplace=True`.

-----

### 10\. Practice Tasks

**Data for Tasks:**
`s1 = pd.Series([5, 10, np.nan, 20])`
`s2 = pd.Series([np.nan, 100, 102, np.nan, np.nan, 103])`

**Task 12 (Easy):**
Create a new Series `s1_clean` from `s1` by simply removing the missing value.

**Task 13 (Medium):**
Create a new Series `s1_imputed` from `s1` by filling the missing value with the *average* of the *other* values in `s1`.

**Task 14 (Hard):**
Create a new Series `s2_patched` from `s2` by:

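1.  Filling the first `NaN` (at index 0). It cannot be forward-filled, because no value comes before it, so fill it from the *next* valid value instead.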
2.  Backward-filling the *last* two `NaN`s (at index 3 and 4).
    (Hint: You may need to chain methods or use `limit`).

-----

### 11\. Recommended Next Topic

After handling missing values, the next common data cleaning task is to find and handle *duplicate* entries.

**Recommended:** **Handling Duplicates (`.duplicated()`, `.drop_duplicates()`)**

-----

### 12\. Quick Reference Card

| Method | Main Parameter | What It Does |
| :--- | :--- | :--- |
| **`.fillna()`** | `value=...` | Fills `NaN` with a specific scalar (e.g., `0`) or statistic (e.g., `s.mean()`). |
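| **`.ffill()`** (older: `.fillna(method='ffill')`) | (None) | Fills `NaN` with the *last* valid value. |
| **`.bfill()`** (older: `.fillna(method='bfill')`) | (None) | Fills `NaN` with the *next* valid value. |
| **`.fillna()`** / **`.ffill()`** / **`.bfill()`** | `limit=...` | When propagating: max *consecutive* `NaN`s filled per gap. With a plain `value`: max `NaN`s filled in total. |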
| **`.dropna()`** | (None) | Returns a new, smaller Series with all `NaN` entries removed. |
| **(Both)** | `inplace=True` | **Modifies the original Series** and returns `None`. (Use with caution\!) |

-----

### 13\. Common Interview Questions

1.  **How do you handle missing data in a Pandas Series?**
      * **Two main ways:**
      * **Imputation (Filling):** Use `.fillna()` to fill with a constant (`0`) or a statistic (`s.mean()` or `s.median()`). For time-series, propagate neighboring values with `.ffill()` (forward-fill).
      * **Removal (Dropping):** Use `.dropna()`. This removes all entries with `NaN` values.
2.  **I ran `s.fillna(0)` but my Series still has `NaN`s. What happened?**
      * You forgot to **re-assign** the result. `.fillna()` returns a *new copy* by default. The correct way is `s = s.fillna(0)` or, alternatively, `s.fillna(0, inplace=True)`.
3.  **When would you use `ffill` vs. filling with the mean?**
      * Use **`ffill`** when the data is **sequential** and *ordered*, like a stock price. The last known price is a better guess for today than the all-time average price.
      * Use the **mean** when the data is **not sequential**, like the ages of survey respondents. The average age is a good statistical substitute, whereas the "last person's age" (`ffill`) is meaningless.
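
On the `prices` data from earlier, the two approaches give different guesses for the same gap. This is a minimal sketch of that comparison.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [100, 102, np.nan, np.nan, 101],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

by_ffill = prices.ffill()                 # Wed, Thu -> 102.0 (Tuesday's price)
by_mean = prices.fillna(prices.mean())    # Wed, Thu -> 101.0 (mean of 100, 102, 101)
print(by_ffill["Wed"], by_mean["Wed"])    # 102.0 101.0
```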

-----

### 14\. Performance Considerations

  * **Time Complexity:** Both `.fillna()` and `.dropna()` are **O(n)**, where 'n' is the number of elements in the Series. They must iterate through the entire Series once.
  * **Memory Usage (Copy vs. View):**
      * By default (`inplace=False`), both methods return a **new copy**. This temporarily doubles memory usage (original + new). `.dropna()` will likely result in a *smaller* final object.
      * `inplace=True` skips returning a new object, but it does not reliably avoid the copy. Pandas often builds new data internally anyway, so the memory and speed gains are usually small. It also breaks the popular "method chaining" style, so it's generally discouraged in modern Pandas code.
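
`Series.memory_usage` gives a rough view of the sizes involved. This is a sketch on a made-up float Series with 100 missing values out of 1,000. Only the values are counted (`index=False`), since index storage can change after dropping.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000, dtype="float64"))
s.iloc[::10] = np.nan  # make every 10th value missing (100 NaNs)

filled = s.fillna(0)   # same length: a second full-size buffer while both exist
dropped = s.dropna()   # 900 values remain

# Bytes used by the values only (8 bytes per float64)
print(s.memory_usage(index=False))        # 8000
print(filled.memory_usage(index=False))   # 8000
print(dropped.memory_usage(index=False))  # 7200
```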

-----

### 15\. When NOT to Use This

  * **Don't use `.fillna(0)` blindly.** If `0` is a valid, real value (e.g., temperature in Celsius), filling `NaN`s with `0` will artificially skew your data. You might use `s.mean()` instead.
  * **Don't use `.dropna()` on small datasets.** If you have 100 rows and 30 have a `NaN`, dropping them means you've thrown away 30% of your data, which will likely bias your results. Imputation (`fillna`) is better here.
  * **Don't use `ffill` on unsorted data.** If your time-series data isn't sorted by date, `ffill` will fill `NaN`s with the wrong "last" value. Sort it first (e.g., `s = s.sort_index()`).
  * **Don't fill `NaN`s before you understand *why* they are `NaN`.** Does `NaN` mean "Not Applicable"? "Not Measured"? "Zero"? The correct "fix" depends entirely on the *meaning* of the missing data.
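
The sorting pitfall can be shown on a small, hypothetical price series whose rows arrive out of date order. `.ffill()` follows row order, not dates, so the fill only works after `sort_index()`.

```python
import numpy as np
import pandas as pd

# Hypothetical prices recorded out of order (Jan 3 arrives first)
s = pd.Series(
    [np.nan, 100.0, 102.0],
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
)

wrong = s.ffill()               # Jan 3 is the first ROW, so it stays NaN
right = s.sort_index().ffill()  # sorted by date, Jan 3 gets Jan 2's price

print(wrong.isna().sum())                     # 1
print(right.loc[pd.Timestamp("2024-01-03")])  # 102.0
```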