In [1]:
import pandas as pd

# Changing Data Types

In [2]:
# sample DataFrame
df = pd.DataFrame({
    "Age": ["21", "25", "23", "30"],
    "Salary": ["50000", "65000", "55000", "70000"], 
    "Joined": ["2023-01-01", "2022-05-10", "2023-07-15", "2024-02-20"]
})

df

Unnamed: 0,Age,Salary,Joined
0,21,50000,2023-01-01
1,25,65000,2022-05-10
2,23,55000,2023-07-15
3,30,70000,2024-02-20


In the sample DataFrame above, the "Age" and "Salary" columns hold numbers, but they are stored as strings.

## 1. Check Data Types

In [3]:
df.dtypes

Age       object
Salary    object
Joined    object
dtype: object

## 2. Convert data type with `.astype()`

`.astype(datatype)` returns the data converted to the given data type.  
In the example below, the Age column changes from object to int.

In [8]:
df["Age"] = df["Age"].astype(int)
df.dtypes

Age        int64
Salary    object
Joined    object
dtype: object
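Note that `.astype()` does not modify the column in place, which is why the cell above assigns the result back to `df["Age"]`. A minimal sketch of that behaviour, using a standalone Series (the names `ages` and `converted` are only for illustration):

```python
import pandas as pd

# Same values as the Age column, kept as plain Python strings
ages = pd.Series(["21", "25", "23", "30"], dtype=object)

# astype returns a new Series; the original is left untouched
converted = ages.astype(int)

print(ages.dtype)       # still the original dtype
print(converted.dtype)  # an integer dtype
```

If the result is not assigned back, the DataFrame keeps its old dtype.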

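## 3. Convert multiple columns at once

Pass a dictionary of column name to data type to `df.astype()`; columns not listed keep their current type.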

In [10]:
df = df.astype({
    "Age": int,
    "Salary": float
})

In [11]:
df.dtypes

Age         int64
Salary    float64
Joined     object
dtype: object

## 4. Convert to string

In [13]:
df["Age"] = df["Age"].astype(str)

In [14]:
df.dtypes

Age        object
Salary    float64
Joined     object
dtype: object

## 5. Convert to category type (memory saving)

The category type is useful when the values in a column repeat often, e.g. City or Gender.

In [15]:
df["Age"] = df["Age"].astype("category")

In [16]:
df.dtypes

Age       category
Salary     float64
Joined      object
dtype: object
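To see the memory saving, a rough sketch can compare `memory_usage(deep=True)` before and after the conversion. The city names and the repeat count here are made up for illustration; the actual saving depends on how many distinct values the column has.

```python
import pandas as pd

# Hypothetical column: three distinct values repeated many times
cities = pd.Series(["Pune", "Mumbai", "Delhi"] * 10_000)

as_object = cities.memory_usage(deep=True)
as_category = cities.astype("category").memory_usage(deep=True)

print(f"original: {as_object} bytes")
print(f"category: {as_category} bytes")
```

A column where almost every value is unique (like Age in a large dataset of distinct ages, or an ID) gains little or nothing from this conversion.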

## 6. Convert to bool

In [21]:
df["isTrue"] = ["True", "False", "True", "False"]

In [23]:
df.dtypes

Age       category
Salary     float64
Joined      object
isTrue      object
dtype: object

In [24]:
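# astype(bool) marks every non-empty string (even "False") as True, so map the text instead
df["isTrue"] = df["isTrue"].map({"True": True, "False": False})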

In [25]:
df.dtypes

Age       category
Salary     float64
Joined      object
isTrue        bool
dtype: object
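A caution about string inputs: `.astype(bool)` tests truthiness, so any non-empty string, including "False", converts to `True`. The dtype shown by `df.dtypes` looks right either way, so it is worth checking the values too. A small sketch of the difference, using a standalone Series named `flags` for illustration:

```python
import pandas as pd

flags = pd.Series(["True", "False", "True", "False"], dtype=object)

# Truthiness check: every non-empty string becomes True
naive = flags.astype(bool)

# Explicit mapping keeps the intended values
mapped = flags.map({"True": True, "False": False})

print(naive.tolist())   # [True, True, True, True]
print(mapped.tolist())  # [True, False, True, False]
```

Any string that is not a key in the mapping becomes NaN, which makes unexpected values easy to spot.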

## Retrying step 3 on a fresh DataFrame

In [26]:
df = pd.DataFrame({
    "Age": ["21", "25", "23", "30"],       # strings but should be numbers
    "Salary": ["50000", "65000", "55000", "70000"],  # also strings
    "Joined": ["2023-01-01", "2022-05-10", "2023-07-15", "2024-02-20"],
    "isTrue": ["True", "False", "True", "False"]
})

In [27]:
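df = df.astype({
    "Age": int,
    "Salary": float
})
# astype(bool) would turn every non-empty string (even "False") into True, so map instead
df["isTrue"] = df["isTrue"].map({"True": True, "False": False})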

In [28]:
df.dtypes

Age         int64
Salary    float64
Joined     object
isTrue       bool
dtype: object
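The Joined column is still object in the output above. One possible way to convert it, sketched here on a standalone Series and assuming every date follows the `YYYY-MM-DD` pattern of the sample data, is `pd.to_datetime`:

```python
import pandas as pd

joined = pd.Series(["2023-01-01", "2022-05-10", "2023-07-15", "2024-02-20"])

# Parse the strings into real datetime values
joined_dt = pd.to_datetime(joined, format="%Y-%m-%d")

print(joined_dt.dtype)
print(joined_dt.dt.year.tolist())  # [2023, 2022, 2023, 2024]
```

Once the column is a datetime type, the `.dt` accessor gives access to parts such as year and month.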

## 7. What if conversion fails? (errors)

What if the column has invalid values such as "?" or "NN"?

In [29]:
df["New"] = ["1", "2", "3", "?"]

In [32]:
# df["New"] = df["New"].astype(int)
# This raises a ValueError because "?" cannot be parsed as an integer,
# so instead use pd.to_numeric with errors="coerce":

df["New"] = pd.to_numeric(df["New"], errors="coerce")

In [33]:
df["New"]

0    1.0
1    2.0
2    3.0
3    NaN
Name: New, dtype: float64
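The result is float64 rather than int because the invalid value became NaN, and NaN is a float. As a possible follow-up, the sketch below lists which original values failed to parse and then converts to pandas' nullable `Int64` dtype, which can hold whole numbers alongside a missing value. The variable names are illustrative only.

```python
import pandas as pd

raw = pd.Series(["1", "2", "3", "?"], name="New")
numeric = pd.to_numeric(raw, errors="coerce")

# Rows that could not be parsed are NaN; look up what they originally were
bad_values = raw[numeric.isna()].tolist()
print(bad_values)  # ['?']

# Nullable integer dtype: whole numbers stay integers, the gap shows as <NA>
as_int = numeric.astype("Int64")
print(as_int.dtype)  # Int64
```

Checking `bad_values` before discarding them helps decide whether the invalid entries should be dropped, filled, or corrected at the source.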