### Missingness Types

#### Missing Completely at Random (MCAR)
- **Definition**: The probability of missingness is the same for every case. Missingness is independent of every variable in the study, observed or unobserved, including the variable that has the missing values. The reasons for missingness are random and external to the data being studied.
- *Example*: 
  - Printer error causing some survey responses to be lost.
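
A minimal sketch of what MCAR looks like in simulated data. The column names, the 10% loss rate, and the seed are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility
n = 10_000

# Hypothetical survey data (illustrative columns, not from a real study).
df_mcar = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "weight": rng.normal(loc=70, scale=12, size=n),
})

# MCAR: every row has the same assumed 10% chance of losing its weight value,
# regardless of gender or of the weight itself.
lost = rng.random(n) < 0.10
df_mcar.loc[lost, "weight"] = np.nan

# Under MCAR the missing rate should be roughly equal in every group.
rate_by_gender = df_mcar["weight"].isna().groupby(df_mcar["gender"]).mean()
print(rate_by_gender)
```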

---

#### Missing at Random (MAR)
- **Definition**: Missingness is not completely random: it depends on other variables in the dataset that are themselves observed. Once those observed variables are accounted for, missingness does not depend on the missing values themselves.
- *Example*: 
  - Men are less likely to report weight. If we control for gender, and gender has no missing data, then the missingness of weight becomes random.
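
The weight example can be sketched as a simulation. The missing rates (40% for men, 10% for women), the column names, and the seed are assumed values chosen only to make the pattern visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary fixed seed
n = 10_000

# Hypothetical data: gender is fully observed, weight will receive gaps.
df_mar = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "weight": rng.normal(loc=70, scale=12, size=n),
})

# MAR: the chance of a missing weight depends only on the observed gender,
# not on the weight value itself (assumed rates: 40% for M, 10% for F).
p_missing = np.where(df_mar["gender"] == "M", 0.40, 0.10)
df_mar.loc[rng.random(n) < p_missing, "weight"] = np.nan

# The missing rate differs between groups but is constant within each group.
rate_by_gender = df_mar["weight"].isna().groupby(df_mar["gender"]).mean()
print(rate_by_gender)
```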

---

#### Missing Not at Random (MNAR)
- **Definition**: Missingness is related to the actual values that are missing. It depends on information that was not recorded, so the observed data alone cannot fully explain which values are missing.
- *Example*: 
  - People with higher incomes are less likely to disclose their income on a survey. Here, the missing data on income is related to the income value itself.
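
The income example can be sketched as follows. The log-normal income distribution and the logistic non-disclosure rule are assumptions for illustration, not a model taken from real survey data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary fixed seed
n = 10_000

# Hypothetical true incomes (assumed log-normal distribution).
true_income = rng.lognormal(mean=10.5, sigma=0.6, size=n)

# MNAR: the chance of non-disclosure grows with the income value itself.
# The logistic rule below is an assumed mechanism chosen for illustration.
z = (np.log(true_income) - 10.5) / 0.6
p_missing = 1 / (1 + np.exp(-(z - 1)))
observed_income = pd.Series(true_income).mask(rng.random(n) < p_missing)

# High earners drop out more often, so the observed mean underestimates
# the true mean.
print(true_income.mean(), observed_income.mean())
```

Because the cause of missingness is the unrecorded value, nothing in the observed column reveals the size of this bias.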

### Strategies to Deal with Missingness

#### Discard Rows or Columns with NaNs
- *Pros/Cons*: 
  - Pro: Simple to apply, and every remaining value is observed rather than estimated.
  - Con: Potential loss of valuable data, which can reduce the dataset's representativeness and model accuracy.
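
A small sketch of both variants using pandas `dropna`. The toy DataFrame, its column names, and the threshold of 3 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in every column.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50_000, 62_000, np.nan, 58_000],
    "notes": [np.nan, np.nan, np.nan, "ok"],
})

# Drop rows that have a NaN in any of the listed columns.
rows_dropped = df.dropna(subset=["age", "income"])

# Drop columns that have fewer than 3 non-missing values.
cols_dropped = df.dropna(axis="columns", thresh=3)

print(rows_dropped.shape)  # (2, 3)
print(cols_dropped.shape)  # (4, 2)
```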
---

#### Univariate vs. Multivariate Imputation
- **Univariate Imputation**:
  - *Description*: Imputes values in the i-th feature dimension using only non-missing values in that feature dimension.
  - *Example*: `SimpleImputer` in scikit-learn.
- **Multivariate Imputation**:
  - *Description*: Uses the entire set of available feature dimensions to estimate the missing values.
  - *Example*: `IterativeImputer` in scikit-learn.
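
The difference can be sketched on a toy array in which the second column is exactly twice the first. The data is made up, and `IterativeImputer` is still marked experimental in scikit-learn, so it has to be enabled explicitly before import.

```python
import numpy as np
# Required before importing IterativeImputer, which is still experimental.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

# Made-up data: column 1 is twice column 0, with one missing entry.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [5.0, 10.0],
])

# Univariate: fills the gap with the mean of the same column (5.5 here).
X_uni = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate: models column 1 from column 0, so the filled value should
# land near 8, the value implied by the relationship between the columns.
X_multi = IterativeImputer(random_state=0).fit_transform(X)

print(X_uni[3, 1], X_multi[3, 1])
```

The univariate fill ignores the value 4.0 in the same row, while the multivariate fill uses it.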

```python
# Disabled in the notebook; the subset column names are left blank.
# df.dropna(subset=[''])
```