Arbitrary value imputation is a method for handling missing data in a dataset. It fills every missing value with a fixed (arbitrary) number, usually chosen manually rather than derived from the data the way the mean, median, or mode is.

For example, filling every missing value in a column with -999, 0, or 100 is arbitrary value imputation:


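```python
import pandas as pd
df = pd.read_csv('x')  # 'x' is a placeholder file path
# Fill every missing Age with the sentinel -999; assigning back avoids the
# chained fillna(..., inplace=True) pattern that newer pandas versions warn about
df['Age'] = df['Age'].fillna(-999)
```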
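The same call works on an in-memory DataFrame, and `fillna` also accepts a dict mapping column names to their own fill values, so each column can get its own manually chosen number. A minimal sketch, assuming a hypothetical toy dataset with `Age` and `Fare` columns (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks the missing entries
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# One manually chosen value per column; df itself is left unchanged
filled = df.fillna({"Age": -999, "Fare": 0})

print(filled["Age"].tolist())   # [22.0, -999.0, 35.0, -999.0]
print(filled["Fare"].tolist())  # [7.25, 71.28, 0.0, 8.05]
```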


💡 Why use arbitrary values?
To flag missing data explicitly: a value such as -999 makes the originally missing entries easy to identify later.

Useful with tree-based models (such as Random Forests), which can split on the sentinel value and so treat those rows separately.

When you know the missing value should not be treated as a "typical" number from the distribution.
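Because a sentinel like -999 never occurs as a real age, the imputed rows can be recovered after filling. A short sketch on a hypothetical `Age` series; `SENTINEL` is an illustrative name, not part of any library:

```python
import numpy as np
import pandas as pd

SENTINEL = -999  # hypothetical constant: a value chosen to sit outside any realistic age

ages = pd.Series([22.0, np.nan, 35.0, np.nan, 58.0], name="Age")
filled = ages.fillna(SENTINEL)

# Boolean mask of the rows that were originally missing
was_missing = filled == SENTINEL
print(was_missing.tolist())  # [False, True, False, True, False]

# Because the sentinel cannot collide with real ages, the mask matches
# the original missingness exactly
print(was_missing.equals(ages.isna()))  # True
```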

⚠️ Caveats:
Often a poor fit for linear models: an extreme value such as -999 shifts the feature's mean and inflates its variance, which can bias the fitted coefficients.

Choose the value carefully, ideally one that cannot occur in the real data; otherwise imputed and genuine values become indistinguishable.
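Both caveats can be checked directly: compare the mean before and after filling, and test whether a candidate value falls outside the observed range. A minimal sketch on hypothetical ages; `sentinel_is_safe` is an illustrative helper, and "outside the observed min/max" is an assumed (conservative) definition of "doesn't overlap with real data":

```python
import numpy as np
import pandas as pd

ages = pd.Series([20.0, 30.0, np.nan, 40.0, np.nan])

observed_mean = ages.mean()  # NaN is skipped: (20 + 30 + 40) / 3 = 30.0
filled = ages.fillna(-999)
filled_mean = filled.mean()  # (20 + 30 + 40 - 999 - 999) / 5 = -381.6

print(observed_mean, filled_mean)


def sentinel_is_safe(series: pd.Series, value: float) -> bool:
    """Hypothetical helper: True if `value` lies outside the observed (non-missing) range."""
    observed = series.dropna()
    return bool(value < observed.min() or value > observed.max())


print(sentinel_is_safe(ages, -999))  # True
print(sentinel_is_safe(ages, 30))    # False: 30 is a real observed age
```

A linear model fitted on `filled` would see a feature whose mean has moved from 30 to about -381.6, which is the kind of distortion the caveat warns about.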



| Method                         | Description                                                                                                       |
| ------------------------------ | ----------------------------------------------------------------------------------------------------------------- |
| **Arbitrary Value Imputation** | Replace missing values with a **fixed, manually chosen value**, such as `-999`, `0`, or `100`.                    |
| **Min/Max Value Imputation**   | Replace missing values with the **minimum or maximum value** of the feature column (from the non-missing values). |
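Min/max imputation uses the same `fillna` call, with the fill value computed from the non-missing values of the column. A sketch on hypothetical data that also checks which strategy keeps the imputed values inside the observed range:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, np.nan, 35.0, 58.0, np.nan])

# Series.min()/max() skip NaN by default, so they use observed values only
min_filled = ages.fillna(ages.min())  # missing -> 22.0
max_filled = ages.fillna(ages.max())  # missing -> 58.0
arb_filled = ages.fillna(-999)        # missing -> -999.0

print(min_filled.tolist())  # [22.0, 22.0, 35.0, 58.0, 22.0]
print(max_filled.tolist())  # [22.0, 58.0, 35.0, 58.0, 58.0]

# Only the arbitrary fill leaves the observed range [22, 58]
lo, hi = ages.min(), ages.max()
for name, s in [("min", min_filled), ("max", max_filled), ("arbitrary", arb_filled)]:
    print(name, bool(s.between(lo, hi).all()))  # min True, max True, arbitrary False
```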


| Method    | Advantages                                                                    |
| --------- | ----------------------------------------------------------------------------- |
| Arbitrary | Helps **flag missing data** clearly; especially useful for tree-based models. |
| Min/Max   | Keeps imputed values **within the observed data range**, so no out-of-range outliers are introduced. |
