## ✅ 8. Handling Missing or Invalid Data in NumPy

Real-world data (sensor readings, Excel sheets, or datasets from Kaggle) often contains:
- **Missing values**
- **Invalid entries**
- **Corrupted or undefined numbers**

NumPy provides tools to detect, filter, and clean such data.

---

### 🔸 1. `np.nan` – *"Not a Number"*

- `np.nan` is used to **represent missing or undefined values** in float arrays.
- It’s a special **floating-point constant** defined in IEEE 754.



In [2]:
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])
print(arr)

[ 1.  2. nan  4.  5.]


<mark>Note</mark>: `nan` is not equal to anything, not even itself.


In [None]:
print(np.nan == np.nan)  


False
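Because of this, comparing an array against `np.nan` with `==` never finds anything. The short sketch below illustrates the pitfall and also shows that mixing `np.nan` with integers produces a float array, as the first bullet above implies. The variable name `eq_mask` is made up for illustration, and `np.isnan()` is the detection function covered in the next subsection.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

# Comparing with == never matches, because nan != nan
eq_mask = (arr == np.nan)   # hypothetical name, for illustration only
print(eq_mask.any())        # False

# np.isnan() is the reliable way to test for nan
print(np.isnan(arr).any())  # True

# Mixing np.nan with integers promotes the whole array to float
print(arr.dtype)            # float64
```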


### 🔸 2. `np.isnan()` – Detect Missing Values

- Returns a boolean array that is `True` wherever the input element is `nan`.

In [4]:
arr = np.array([1, 2, np.nan, 4])

mask = np.isnan(arr)
print(mask)
# Output: [False False  True False]

# Filter only non-nan values
print(arr[~mask])  # [1. 2. 4.]


[False False  True False]
[1. 2. 4.]
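The same boolean mask can also answer "how many values are missing, and where?". The following is a minimal sketch; the names `n_missing` and `missing_idx` are illustrative and not part of the original notebook.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
mask = np.isnan(arr)

# True counts as 1, so summing the mask counts the nan entries
n_missing = int(mask.sum())

# flatnonzero returns the positions where the mask is True
missing_idx = np.flatnonzero(mask)

print(n_missing)    # 2
print(missing_idx)  # [1 3]
```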


### 🔸 3. `np.isfinite()` – Check for Valid Numbers

- Returns `True` for every element that is not `inf`, `-inf`, or `nan`.

In [5]:
arr = np.array([1, 2, np.nan, np.inf, -np.inf, 4])

print(np.isfinite(arr))
# Output: [ True  True False False False  True]

# Filter only valid numbers
print(arr[np.isfinite(arr)])  # [1. 2. 4.]


[ True  True False False False  True]
[1. 2. 4.]
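To see how `np.isfinite()` relates to the other checks, the sketch below compares it with `np.isnan()` and `np.isinf()` on the same array. It is only an illustration; the mask variable names are assumptions.

```python
import numpy as np

arr = np.array([1, 2, np.nan, np.inf, -np.inf, 4])

nan_mask = np.isnan(arr)        # True only for nan
inf_mask = np.isinf(arr)        # True for +inf and -inf
finite_mask = np.isfinite(arr)  # True for ordinary numbers

print(nan_mask)  # [False False  True False False False]
print(inf_mask)  # [False False False  True  True False]

# An element is finite exactly when it is neither nan nor infinite
print(np.array_equal(finite_mask, ~(nan_mask | inf_mask)))  # True
```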


### 🔸 4. `np.where()` – Conditional Filtering

- `np.where(condition, value_if_true, value_if_false)` picks `value_if_true` where `condition` holds and `value_if_false` elsewhere.

- Useful for replacing or selecting values based on a condition.

In [6]:
# Example: replace nan with zero
arr = np.array([1, 2, np.nan, 4])

cleaned = np.where(np.isnan(arr), 0, arr)
print(cleaned)  # [1. 2. 0. 4.]


[1. 2. 0. 4.]


In [7]:
# Example: replace negative values with 0
a = np.array([1, -2, 3, -4, 5])

print(np.where(a < 0, 0, a))  # [1 0 3 0 5]


[1 0 3 0 5]
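Replacing with zero is not always appropriate. As one possible alternative, the sketch below fills missing entries with the mean of the valid values, combining `np.isfinite()` and `np.where()`. The names `valid`, `fill_value`, and `imputed` are illustrative, and mean imputation is just one assumed strategy, not a recommendation from the original text.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Mark the entries that are usable numbers
valid = np.isfinite(arr)

# Mean of the valid entries only: (1 + 2 + 4 + 5) / 4 = 3.0
fill_value = arr[valid].mean()

# Keep valid entries, substitute the mean everywhere else
imputed = np.where(valid, arr, fill_value)
print(imputed)  # [1. 2. 3. 4. 5.]
```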


### 🔸 5. Masked Arrays – Advanced Handling of Missing Data

- A masked array lets you hide (mask) specific values so that computations skip them.



In [10]:
import numpy.ma as ma

data = np.array([1, 2, np.nan, 4, 5])
masked = ma.masked_invalid(data)
print(masked)
# Output: [1.0 2.0 -- 4.0 5.0]


[1.0 2.0 -- 4.0 5.0]


In [11]:
# Computations ignore masked values
print(masked.mean())  # Output: 3.0 (ignores nan)



3.0


In [12]:
# Manually mask specific values
arr = np.array([1, 2, 3, 4, 5])
masked_arr = ma.masked_where(arr > 3, arr)
print(masked_arr)
# Output: [1 2 3 -- --]


[1 2 3 -- --]
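Once values are masked, a few attributes and methods of the masked array help to inspect the result or convert it back to a plain array. The sketch below reuses the `masked_invalid` example from above; it is a small illustration rather than a complete tour of `numpy.ma`.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1, 2, np.nan, 4, 5])
masked = ma.masked_invalid(data)

# Boolean mask: True marks a hidden (masked) entry
print(masked.mask)          # [False False  True False False]

# Number of unmasked entries
print(masked.count())       # 4

# Plain ndarray containing only the unmasked values
print(masked.compressed())  # [1. 2. 4. 5.]

# Plain ndarray with masked entries replaced by a fill value
print(masked.filled(0))     # [1. 2. 0. 4. 5.]
```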
