# Handling Missing Data Values
- As a data analyst or data scientist, you will deal with messy data in real-life business use cases.
- Knowing how to handle invalid or missing data is an important skill.
- An important real-world concept: sentinel values such as `-99999`, `-1`, `999999`, or `0` (in some contexts) that stand in for missing data.

**Why companies use values like -99999 as NaN**

In many real-world systems (especially older or high-performance ones):
- Databases did not support NULL / NaN properly
- Sensors or financial feeds had to send a number
- Missing data still had to be marked explicitly
- So companies used *sentinel values* such as `-99999`, `-1`, `999999`, or `0`

**Why this is dangerous if you don’t handle it**

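If you don’t clean it:
- Mean → dragged far from the true value; median → shifts too, and badly once sentinels are common
- Min values → -99999 in every column that contains the sentinel
- ML models → learn nonsense
- Visualizations → axes stretched so far that the real data is unreadable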

This is why data cleaning is critical.
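To make the distortion concrete, here is a minimal sketch using a hypothetical temperature series (the numbers are made up for illustration). It compares summary statistics before and after swapping the sentinel for NaN with `Series.replace`, which is covered later in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; -99999 marks a missing value.
temps = pd.Series([32, -99999, 28, -99999, 32, 31, 34])

# Statistics computed on the raw data treat the sentinel as a real reading.
raw_mean = temps.mean()      # roughly -28548.7
raw_min = temps.min()        # -99999
raw_median = temps.median()  # 31.0

# Swap the sentinel for NaN; pandas skips NaN in these statistics by default.
cleaned = temps.replace(-99999, np.nan)
clean_mean = cleaned.mean()      # 31.4
clean_min = cleaned.min()        # 28.0
clean_median = cleaned.median()  # 32.0

print("raw:    ", raw_mean, raw_min, raw_median)
print("cleaned:", clean_mean, clean_min, clean_median)
```

With only two sentinels in seven readings, the mean and minimum are already meaningless, while the median moves only slightly. The median degrades as the share of sentinel values grows.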

In [1]:
import pandas as pd
import numpy as np
import csv

In [3]:
df = pd.read_csv('weather_data.csv')
df

Unnamed: 0,day,temperature,windspeed,event
0,01/01/2017,32,6,Rain
1,01/02/2017,-99999,7,Sunny
2,01/03/2017,28,-99999,Snow
3,01/04/2017,-99999,7,no event
4,01/05/2017,32,-99999,Rain
5,01/06/2017,31,2,Sunny
6,01/06/2017,34,-88888,no event


### `replace()` function
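`replace()` substitutes specific values in your dataset with other values. It returns a new DataFrame; the original is unchanged unless you assign the result back or pass `inplace=True`.
- **Syntax:** `df.replace(to_replace, value)`
    1. `to_replace`: the value you want to change,
    2. `value`: what you want to replace it with.
    3. `inplace=False` (optional): set to `True` to modify `df` in place.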
- Most commonly used to:
    1. Fix sentinel values (like -99999)
    2. Standardize inconsistent data
    3. Clean categorical values
    4. Prepare data for analysis or ML
- `replace()` cleans known bad values, so apply it before computing statistics or training ML models.
- `to_replace` accepts:
    1. Numbers
    2. Strings
    3. Lists
    4. Dictionaries

Essential for real-world messy data: *Finance, Sensor data, Legacy databases*
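The list above mentions strings and dictionaries as well as numbers. As a rough sketch of those two forms, the example below uses a small hypothetical DataFrame (`df_demo` and its values are made up for illustration) to replace a sentinel in one column only and to standardize a categorical label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration.
df_demo = pd.DataFrame({
    "temperature": [32, -99999, 28],
    "windspeed": [6, 7, -99999],
    "event": ["Rain", "no event", "Snow"],
})

# Dict keyed by column name: replace -99999 only in "temperature".
# The -99999 in "windspeed" is deliberately left alone here.
temp_only = df_demo.replace({"temperature": -99999}, np.nan)

# String replacement on a single column: standardize a categorical label.
events = df_demo["event"].replace("no event", "No Event")

# replace() returned new objects, so df_demo itself is unchanged.
print(temp_only)
print(events.tolist())
```

The column-keyed dictionary is useful when the same number is a sentinel in one column but a legitimate value in another.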

In [None]:
df.replace(-99999, np.nan)  # replace every -99999 with NumPy's NaN; -88888 is left untouched

Unnamed: 0,day,temperature,windspeed,event
0,01/01/2017,32.0,6.0,Rain
1,01/02/2017,,7.0,Sunny
2,01/03/2017,28.0,,Snow
3,01/04/2017,,7.0,no event
4,01/05/2017,32.0,,Rain
5,01/06/2017,31.0,2.0,Sunny
6,01/06/2017,34.0,-88888.0,no event


#### Replace multiple values at once

In [6]:
df.replace([-99999, -88888], np.nan)  # pass a list to replace several sentinel values in one call

Unnamed: 0,day,temperature,windspeed,event
0,01/01/2017,32.0,6.0,Rain
1,01/02/2017,,7.0,Sunny
2,01/03/2017,28.0,,Snow
3,01/04/2017,,7.0,no event
4,01/05/2017,32.0,,Rain
5,01/06/2017,31.0,2.0,Sunny
6,01/06/2017,34.0,,no event
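After replacing several sentinels, it is worth checking that none were missed. The sketch below rebuilds the two numeric columns as a hypothetical in-memory DataFrame (`df_demo` is an illustrative name, not part of the notebook), then counts the resulting NaN values per column and confirms that no sentinel remains.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the numeric columns, for illustration only.
df_demo = pd.DataFrame({
    "temperature": [32, -99999, 28, -99999, 32, 31, 34],
    "windspeed": [6, 7, -99999, 7, -99999, 2, -88888],
})

sentinels = [-99999, -88888]
cleaned = df_demo.replace(sentinels, np.nan)

# Count missing values per column after cleaning.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)

# True if any sentinel value survived the cleaning step.
sentinel_left = bool(cleaned.isin(sentinels).any().any())
print("sentinel values remaining:", sentinel_left)
```

If the second check ever prints `True`, the sentinel list passed to `replace()` is incomplete for that dataset.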
