🟦 1. Import Libraries

In [1]:
import pandas as pd
import numpy as np

🟦 2. Create Sample Messy Dataset

In [2]:
data = {
    "id": ["1", "2", "3", "4"],
    "age": ["25", "30", "unknown", "40"],
    "salary": ["70000", "80000", "85000.5", "not available"],
    "hire_date": ["2021-01-05", "2020/03/12", "invalid_date", "2019-08-20"],
    "active": ["True", "False", "True", "False"]
}

df = pd.DataFrame(data)
df

  id      age         salary     hire_date  active
0  1       25          70000    2021-01-05    True
1  2       30          80000    2020/03/12   False
2  3  unknown        85000.5  invalid_date    True
3  4       40  not available    2019-08-20   False


🟦 3. View Current Data Types

In [3]:
df.dtypes

id           object
age          object
salary       object
hire_date    object
active       object
dtype: object

🟦 4. Convert Numeric Columns using astype()

In [4]:
df["id"] = df["id"].astype(int)
df["id"].dtype

dtype('int64')

In [5]:
# ❗ This fails because "unknown" is not numeric.
df["age"].astype(int)

ValueError: invalid literal for int() with base 10: 'unknown'
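
Before converting, it can help to see exactly which entries would trigger this error. The sketch below is one possible pre-check, not part of the original notebook: it rebuilds the "age" values as a standalone Series and uses a regular expression (an assumption about what counts as integer text) to flag anything that is not an optionally signed run of digits.

```python
import pandas as pd

# Same values as the "age" column above; object dtype is set explicitly.
age = pd.Series(["25", "30", "unknown", "40"], dtype=object)

# True where the whole string is an optional sign followed by digits.
is_integer_text = age.str.fullmatch(r"[+-]?\d+")

# Entries that astype(int) would reject.
bad_values = age[~is_integer_text].tolist()
print(bad_values)

# Converting only the clean entries succeeds.
clean_age = age[is_integer_text].astype(int)
print(clean_age.tolist())
```

This only inspects the data. The column itself is left unchanged, so the safe conversion in the next section still applies.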

🟦 5. Safe Numeric Conversion using pd.to_numeric()

In [6]:
# With errors="coerce", values that cannot be parsed become NaN instead of raising an error.

df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["salary"] = pd.to_numeric(df["salary"], errors="coerce")

df

   id   age   salary     hire_date  active
0   1  25.0  70000.0    2021-01-05    True
1   2  30.0  80000.0    2020/03/12   False
2   3   NaN  85000.5  invalid_date    True
3   4  40.0      NaN    2019-08-20   False


In [7]:
df.dtypes

id             int64
age          float64
salary       float64
hire_date     object
active        object
dtype: object
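
Coercion hides which raw values failed, so it is often worth comparing the original strings with the converted result. The following is a minimal sketch under the assumption that you still have the raw column available; `raw_salary`, `coerced`, and `failed` are illustrative names, not variables from the notebook.

```python
import pandas as pd

# Raw "salary" values as they appeared before conversion.
raw_salary = pd.Series(
    ["70000", "80000", "85000.5", "not available"], dtype=object
)
coerced = pd.to_numeric(raw_salary, errors="coerce")

# A conversion failed where the raw value was present but the result is NaN.
failed = raw_salary.notna() & coerced.isna()
failed_values = raw_salary[failed].tolist()
print(failed_values)
```

Checking `raw.notna()` as well as `coerced.isna()` separates genuinely missing inputs from values that were present but unparseable.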

🟦 6. Convert Date Columns using pd.to_datetime()

In [8]:
df["hire_date"] = pd.to_datetime(df["hire_date"], errors="coerce")
df

   id   age   salary   hire_date  active
0   1  25.0  70000.0  2021-01-05    True
1   2  30.0  80000.0         NaT   False
2   3   NaN  85000.5         NaT    True
3   4  40.0      NaN  2019-08-20   False


In [9]:
df.dtypes

id                    int64
age                 float64
salary              float64
hire_date    datetime64[ns]
active               object
dtype: object
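
Note that "2020/03/12" became NaT in the output above even though it is a real date. A likely cause is that recent pandas versions infer a single format from the first value and coerce anything that does not match it. One possible workaround, sketched here on a standalone copy of the column (the names `raw_dates`, `normalized`, and `parsed` are illustrative), is to normalize the separator before parsing.

```python
import pandas as pd

# Raw "hire_date" values as they appeared before conversion.
raw_dates = pd.Series(
    ["2021-01-05", "2020/03/12", "invalid_date", "2019-08-20"], dtype=object
)

# Make every date use "-" so all valid rows share one format.
normalized = raw_dates.str.replace("/", "-", regex=False)

# Only the truly invalid entry should now become NaT.
parsed = pd.to_datetime(normalized, errors="coerce")
print(parsed)
```

On pandas 2.0 or later, passing `format="mixed"` to `pd.to_datetime()` is another option, at the cost of parsing each value individually.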

🟦 7. Convert Boolean-like Column

In [12]:
df["active"] = df["active"].map({"True": True, "False": False})
df

   id   age   salary   hire_date  active
0   1  25.0  70000.0  2021-01-05    True
1   2  30.0  80000.0         NaT   False
2   3   NaN  85000.5         NaT    True
3   4  40.0      NaN  2019-08-20   False


In [13]:
df.dtypes

id                    int64
age                 float64
salary              float64
hire_date    datetime64[ns]
active                 bool
dtype: object

In [14]:
df = df.astype({
    "id": int,
    "active": bool
})

df.dtypes


id                    int64
age                 float64
salary              float64
hire_date    datetime64[ns]
active                 bool
dtype: object
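
The `astype(bool)` call above is safe only because "active" had already been mapped to real booleans. Applied directly to the original strings, it would test each string for truthiness, and every non-empty string counts as True. The sketch below illustrates the difference on a standalone Series (`flags`, `naive`, and `mapped` are illustrative names).

```python
import pandas as pd

# The original string values of the "active" column.
flags = pd.Series(["True", "False", "True", "False"], dtype=object)

# astype(bool) checks truthiness, so the text "False" still becomes True.
naive = flags.astype(bool)
print(naive.tolist())

# An explicit mapping gives the intended values.
mapped = flags.map({"True": True, "False": False})
print(mapped.tolist())
```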

🟦 8. Using convert_dtypes() (Modern Pandas)

In [15]:
df_auto = df.convert_dtypes()
df_auto.dtypes


id                    Int64
age                   Int64
salary              Float64
hire_date    datetime64[ns]
active              boolean
dtype: object

In [16]:
print("Original dtypes:\n", df.dtypes)
print("\nOptimized dtypes:\n", df_auto.dtypes)


Original dtypes:
 id                    int64
age                 float64
salary              float64
hire_date    datetime64[ns]
active                 bool
dtype: object

Optimized dtypes:
 id                    Int64
age                   Int64
salary              Float64
hire_date    datetime64[ns]
active              boolean
dtype: object
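
The change from float64 to Int64 for "age" is possible because the nullable integer dtype can hold a missing value, which plain int64 cannot. As a small sketch on a standalone Series (the variable names are illustrative), both `convert_dtypes()` and an explicit `astype("Int64")` keep the whole numbers as integers and store the gap as `pd.NA`.

```python
import numpy as np
import pandas as pd

# "age" after pd.to_numeric(): floats, with NaN for the failed value.
age = pd.Series([25.0, 30.0, np.nan, 40.0])

# Automatic conversion picks the nullable integer dtype.
nullable_age = age.convert_dtypes()
print(nullable_age.dtype)

# The same result can be requested explicitly.
explicit_age = age.astype("Int64")
print(explicit_age)
```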


## ✅ Summary: Changing Data Types

In this subsection, you learned how to:

✔ View column data types using `dtypes`  
✔ Convert numeric columns using `astype()` and `pd.to_numeric()`  
✔ Safely handle invalid values using `errors="coerce"`  
✔ Convert date columns using `pd.to_datetime()`  
✔ Convert string booleans into real boolean types  
✔ Convert multiple columns at once  
✔ Use `convert_dtypes()` for smart automatic conversion  
✔ Identify failed conversions using `NaN` and `NaT`  


