
# STEP 5: Fix Incorrect Data Types (Complete Pandas Guide)

This notebook demonstrates **practical ways to detect and fix incorrect data types**
in a Pandas DataFrame.

Focus: **numbers as strings, dates, booleans, categories, and safe conversions**.


In [None]:

import pandas as pd
import numpy as np


## 1. Sample Dataset with Incorrect Data Types

In [None]:

df = pd.DataFrame({
    "order_id": ["1001", "1002", "1003", "1004"],
    "amount": ["2500", "1800.50", "invalid", "3000"],
    "order_date": ["2024-01-01", "2024/01/05", "not_a_date", "2024-02-10"],
    "is_active": ["Yes", "No", "Yes", "No"],
    "city": ["Mumbai", "Delhi", "Mumbai", "Pune"]
})
df


## 2. Inspect Current Data Types

In [None]:

# A notebook cell only displays its last bare expression, so print dtypes explicitly
print(df.dtypes)
df.info()

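Beyond `dtypes` and `info()`, it can help to list the non-numeric columns and to count the Python types actually stored in a suspicious column. The following is a minimal sketch; `df_demo`, `suspect_cols`, and `value_types` are illustrative names, and the small frame is made up for the example.

```python
import pandas as pd

# Hypothetical frame: two columns hold numbers as strings, one is truly numeric
df_demo = pd.DataFrame({
    "order_id": ["1001", "1002"],
    "amount": ["2500", "invalid"],
    "qty": [1, 2],
})

# Non-numeric columns are the usual candidates for a wrong data type
suspect_cols = df_demo.select_dtypes(exclude="number").columns.tolist()

# Count the Python types of the values stored in one column
value_types = df_demo["amount"].map(type).value_counts()

print(suspect_cols)
print(value_types)
```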

## 3. Convert Numeric Columns Safely

In [None]:

df_numeric = df.copy()
df_numeric['order_id'] = pd.to_numeric(df_numeric['order_id'], errors='coerce')
df_numeric['amount'] = pd.to_numeric(df_numeric['amount'], errors='coerce')
df_numeric

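Numbers stored as strings often carry currency symbols, thousands separators, or stray spaces that make `pd.to_numeric` treat them as invalid. The sketch below strips such characters first; the sample values and the regular expression are assumptions for illustration and would need adjusting for other formats (for example, a decimal comma).

```python
import pandas as pd

# Hypothetical messy values: currency symbol, thousands separator, padding
raw = pd.Series(["₹2,500", " 1800.50 ", "invalid", "3,000"])

# Keep only digits, the decimal point, and a minus sign
cleaned = raw.str.replace(r"[^0-9.\-]", "", regex=True)

# Anything that still cannot be parsed (including empty strings) becomes NaN
amount = pd.to_numeric(cleaned, errors="coerce")
print(amount)
```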

## 4. Convert Date Columns Safely

In [None]:

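df_dates = df.copy()
# pandas 2.x infers one format from the first value, so "2024/01/05" would become NaT.
# Normalize the separator first, then let unparseable values become NaT.
normalized_dates = df_dates['order_date'].str.replace('/', '-', regex=False)
df_dates['order_date'] = pd.to_datetime(normalized_dates, errors='coerce')
df_dates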

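When a column mixes several known date formats, one option is to try each format in turn and keep the first successful parse. This is a sketch under the assumption that the formats are known in advance; the `formats` list is illustrative and not exhaustive.

```python
import pandas as pd

dates = pd.Series(["2024-01-01", "2024/01/05", "not_a_date", "2024-02-10"])

# Candidate formats, tried in order (assumed for this example)
formats = ["%Y-%m-%d", "%Y/%m/%d"]

# Parse with the first format, then fill remaining NaT values from later formats
parsed = pd.to_datetime(dates, format=formats[0], errors="coerce")
for fmt in formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(dates, format=fmt, errors="coerce"))

print(parsed)
```

Values that match none of the formats stay `NaT`, so they can be reviewed afterwards.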

## 5. Convert Boolean-like Columns

In [None]:

df_bool = df.copy()
df_bool['is_active'] = df_bool['is_active'].map({'Yes': True, 'No': False})
df_bool

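Real boolean-like columns rarely contain only `Yes` and `No`. The sketch below normalizes case and whitespace before mapping and stores the result in the nullable `boolean` dtype so that unknown values stay missing instead of turning the column into `object`. The `truthy` and `falsy` sets are assumptions for illustration.

```python
import pandas as pd

# Hypothetical inputs with inconsistent spelling, one unknown value, one missing
flags = pd.Series(["Yes", " no", "Y", "TRUE", "maybe", None])

truthy = {"yes", "y", "true", "1"}
falsy = {"no", "n", "false", "0"}
mapping = {**{v: True for v in truthy}, **{v: False for v in falsy}}

# Normalize, map, and use the nullable boolean dtype (missing values become <NA>)
normalized = flags.str.strip().str.lower()
is_active = normalized.map(mapping).astype("boolean")
print(is_active)
```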

## 6. Convert Columns to Category (Memory Optimization)

In [None]:

df_cat = df.copy()
df_cat['city'] = df_cat['city'].astype('category')
df_cat.dtypes

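To see whether `category` actually saves memory, compare `memory_usage(deep=True)` before and after the conversion. The sketch below repeats the four city values to simulate a larger column; the row count is arbitrary, and the saving depends on how few distinct values the column has.

```python
import pandas as pd

# Simulate a larger column with only three distinct values
city = pd.Series(["Mumbai", "Delhi", "Mumbai", "Pune"] * 25_000)

bytes_before = city.memory_usage(deep=True)
bytes_after = city.astype("category").memory_usage(deep=True)

print(f"before: {bytes_before} bytes, as category: {bytes_after} bytes")
```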

## 7. Force Casting After Validation

In [None]:

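df_force = df_numeric.copy()
# Keep only rows whose amount converted; .copy() avoids SettingWithCopyWarning
df_force = df_force[df_force['amount'].notna()].copy()
# With no NaN left the cast cannot fail (to_numeric already produced float64 here)
df_force['amount'] = df_force['amount'].astype(float)
df_force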

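Casting to an integer type needs an extra check, because `NaN` and fractional values cannot be stored in `int64`. The sketch below validates first and then casts, and also shows the nullable `Int64` dtype as an alternative that keeps the rows with missing values. The sample values are made up for the example.

```python
import pandas as pd

order_id = pd.to_numeric(pd.Series(["1001", "1002", "bad", "1004"]), errors="coerce")

# Option 1: drop missing values, confirm all remaining values are whole, then cast
valid = order_id.dropna()
if not (valid % 1 == 0).all():
    raise ValueError("order_id contains fractional values")
order_id_int = valid.astype("int64")

# Option 2: keep every row and represent missing values as <NA>
order_id_nullable = order_id.astype("Int64")

print(order_id_int)
print(order_id_nullable)
```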

## 8. Convert Multiple Columns at Once

In [None]:

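df_multi = df.copy()
# errors='ignore' is deprecated in recent pandas; coerce only the columns meant to be numeric
num_cols = ['order_id', 'amount']
df_multi[num_cols] = df_multi[num_cols].apply(pd.to_numeric, errors='coerce')
df_multi.dtypes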

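When the data is already clean, a dictionary passed to `astype` can set several column types in one call. This is a sketch with a made-up frame and a hypothetical `schema` dictionary; unlike `errors='coerce'`, `astype` raises on values it cannot convert, so it suits validated data.

```python
import pandas as pd

# Hypothetical clean data: every value can be converted
df_demo = pd.DataFrame({
    "order_id": ["1001", "1002"],
    "amount": ["2500", "1800.50"],
    "city": ["Mumbai", "Delhi"],
})

# Target dtype for each column (an assumed schema for this example)
schema = {"order_id": "int64", "amount": "float64", "city": "category"}
df_typed = df_demo.astype(schema)

print(df_typed.dtypes)
```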

## 9. Detect Conversion Errors

In [None]:

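# Rows where the original amount was present but the conversion produced NaN
df[df_numeric['amount'].isna() & df['amount'].notna()]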

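A null check alone cannot tell a failed conversion from a value that was already missing. The sketch below compares the data before and after conversion across several columns and counts only the newly missing values; the frame and column names are invented for the example.

```python
import pandas as pd

# Hypothetical raw data: one value is missing from the start, three are invalid
raw = pd.DataFrame({
    "amount": ["2500", "invalid", None, "3000"],
    "qty": ["1", "2", "x", "y"],
})
converted = raw.apply(pd.to_numeric, errors="coerce")

# A conversion failed where a value existed before and is missing after
failed_mask = converted.isna() & raw.notna()
failures_per_column = failed_mask.sum()

print(failures_per_column)
```

Indexing the raw frame with `failed_mask` for a single column, as in `raw.loc[failed_mask["qty"], "qty"]`, returns the offending original values.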

## 10. Best Practices & Interview Notes

In [None]:

# ✔ Always inspect types before and after conversion
# ✔ Use errors='coerce' to avoid crashes, then review the NaN/NaT values it creates
# ✔ Validate data ranges before force casting
# ✔ Convert low-cardinality object columns to category to save memory

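The first practice, inspecting types before and after conversion, can be turned into a small report. This is a minimal sketch with a made-up frame; `dtype_report` and its column names are illustrative.

```python
import pandas as pd

before = pd.DataFrame({"amount": ["2500", "invalid"], "city": ["Mumbai", "Pune"]})

after = before.copy()
after["amount"] = pd.to_numeric(after["amount"], errors="coerce")
after["city"] = after["city"].astype("category")

# Side-by-side dtypes plus the number of values that became missing
dtype_report = pd.DataFrame({
    "before": before.dtypes.astype(str),
    "after": after.dtypes.astype(str),
    "new_missing": after.isna().sum() - before.isna().sum(),
})
print(dtype_report)
```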


## ✔ Summary
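- Incorrect data types break analysis (sums, sorting, and date arithmetic give wrong results or errors)
- `pd.to_numeric()` and `pd.to_datetime()` with `errors='coerce'` convert without raising on bad values
- Booleans and categories need explicit handling
- Fix data types before starting any analysis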
