# DataType Conversion
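Why convert data types:
- to save memory
- to have the correct type for analysis and modeling
- to make operations behave correctly (e.g. arithmetic on numbers rather than on strings)
- to provide numeric input for ML models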

Common pandas data types:
1. int64, float64 - numeric
2. object - strings or mixed types
3. bool
4. datetime64[ns]
5. category
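
As a rough illustration of the list above, here is a minimal sketch that builds one column per dtype and reads the dtypes back. The DataFrame `sample` and its column names are made up for this example and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical sample data: one column for each dtype listed above
sample = pd.DataFrame({
    'count': pd.Series([1, 2, 3], dtype='int64'),
    'price': pd.Series([1.5, 2.0, 3.25], dtype='float64'),
    'label': pd.Series(['a', 'b', 'c'], dtype='object'),
    'flag': pd.Series([True, False, True], dtype='bool'),
    'when': pd.to_datetime(['2020-01-15', '2019-07-23', '2021-03-10']),
    'group': pd.Series(['x', 'y', 'x'], dtype='category'),
})

# Map each column name to the string name of its dtype
dtype_names = {col: str(dtype) for col, dtype in sample.dtypes.items()}
print(dtype_names)
```

The exact datetime resolution (for example `[ns]`) can depend on the pandas version.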

#### Checking DataType


In [1]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': ['25', '30', '35', '40'],    # stored as strings
    'Salary': [50000, 60000, 70000, 80000],  # numeric
    'JoiningDate': ['2020-01-15', '2019-07-23', '2021-03-10', '2018-11-05'],  # should be datetime
    'Department': ['HR', 'Finance', 'IT', 'Finance'],  # can be category
}

df = pd.DataFrame(data)
df

      Name Age  Salary JoiningDate Department
0    Alice  25   50000  2020-01-15         HR
1      Bob  30   60000  2019-07-23    Finance
2  Charlie  35   70000  2021-03-10         IT
3    David  40   80000  2018-11-05    Finance


In [None]:
df.dtypes

Name           object
Age            object
Salary          int64
JoiningDate    object
Department     object
dtype: object

##### Conversions needed here
- ```Age``` should be an integer type (e.g. int64) for calculations
- ```JoiningDate``` should be datetime
- ```Department``` should be category to save memory and for category-based operations

---
## Converting Data Types using ```astype()```

In [None]:
# A) Convert 'Age' from object (string) to integer

df['Age'] = df['Age'].astype(int)
df.dtypes

Name           object
Age             int32
Salary          int64
JoiningDate    object
Department     object
dtype: object

In [8]:
# B) Convert 'Department' to category

df['Department'] = df['Department'].astype('category')

# deep=True includes the string data in the memory count;
# the saving from 'category' shows up on larger columns with many repeated values
print(df.memory_usage(deep=True))
df.dtypes

Index          132
Name           216
Age             16
Salary          32
JoiningDate    236
Department     270
dtype: int64

Name             object
Age               int32
Salary            int64
JoiningDate      object
Department     category
dtype: object
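
With only four rows the memory effect of `category` is hard to see. The sketch below, which assumes a hypothetical column of three department names repeated many times, compares the two representations with `memory_usage(deep=True)`.

```python
import pandas as pd

# Hypothetical column: 3 distinct department names repeated 10,000 times each
departments = pd.Series(['HR', 'Finance', 'IT'] * 10_000, dtype='object')

# Memory in bytes when every row stores its own Python string
as_object = departments.memory_usage(deep=True)

# Memory in bytes when rows store small integer codes pointing at 3 categories
as_category = departments.astype('category').memory_usage(deep=True)

print(f"object:   {as_object} bytes")
print(f"category: {as_category} bytes")
```

The fewer distinct values a column has relative to its length, the larger the saving tends to be.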

In [11]:
# C) Convert 'JoiningDate' to datetime, using pd.to_datetime() rather than astype()

df['JoiningDate'] = pd.to_datetime(df['JoiningDate'])

print(df['JoiningDate'].dt.year)
print(df['JoiningDate'].dt.month)
print(df['JoiningDate'].dt.day_name())


0    2020
1    2019
2    2021
3    2018
Name: JoiningDate, dtype: int32
0     1
1     7
2     3
3    11
Name: JoiningDate, dtype: int32
0    Wednesday
1      Tuesday
2    Wednesday
3       Monday
Name: JoiningDate, dtype: object
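
Once a column is a real datetime, date arithmetic and date comparisons work directly. A minimal sketch using the same joining dates; the reference date `2022-01-01` and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

joining = pd.to_datetime(
    pd.Series(['2020-01-15', '2019-07-23', '2021-03-10', '2018-11-05'])
)

# Assumed reference date, used only to make the example reproducible
reference = pd.Timestamp('2022-01-01')

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days
days_employed = (reference - joining).dt.days
print(days_employed)

# Datetime columns can be filtered by comparing against a date string
recent = joining[joining >= '2020-01-01']
print(recent)
```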


---
## Handling Conversion Errors

In [None]:
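df['Age'] = df['Age'].astype(int)  # raises ValueError if any value cannot be parsed as an integer

# errors='coerce' is a parameter of pd.to_numeric() and pd.to_datetime(), not of astype();
# it converts values that cannot be parsed to NaN (NaT for datetimes) instead of raising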
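
To make the difference concrete, here is a minimal sketch using a hypothetical `ages` Series with one bad value. It shows `astype(int)` failing and `pd.to_numeric(..., errors='coerce')` turning the bad value into NaN. The final step to the nullable `Int64` dtype is an optional extra, not something the notebook itself does.

```python
import pandas as pd

# Hypothetical 'Age' values where one entry is not a number
ages = pd.Series(['25', '30', 'unknown', '40'], dtype='object')

# astype(int) refuses to convert the bad value
try:
    ages.astype(int)
except ValueError as exc:
    print(f"astype failed: {exc}")

# pd.to_numeric with errors='coerce' replaces unparseable values with NaN
ages_numeric = pd.to_numeric(ages, errors='coerce')
print(ages_numeric)

# Optional: the nullable 'Int64' dtype keeps whole numbers and allows missing values
ages_int = ages_numeric.astype('Int64')
print(ages_int)
```

Because NaN is a float, the coerced column is float64 until it is cast to a nullable integer dtype or the missing values are filled.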

In [None]:
# infer_objects() tries to convert object columns to a more specific dtype such as int or float;
# it does not parse strings like '25' into numbers
df = df.infer_objects()
df.dtypes

Name                   object
Age                     int32
Salary                  int64
JoiningDate    datetime64[ns]
Department           category
dtype: object
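
The sketch below illustrates what `infer_objects()` does and does not convert, using a made-up DataFrame `mixed`. An object column that holds real Python integers is converted to an integer dtype, while a column of digit strings stays non-numeric.

```python
import pandas as pd

# Hypothetical object-dtype columns: real Python ints vs. strings of digits
mixed = pd.DataFrame({
    'real_ints': pd.Series([1, 2, 3], dtype='object'),
    'digit_strings': pd.Series(['1', '2', '3'], dtype='object'),
})

inferred = mixed.infer_objects()
print(inferred.dtypes)

# Strings need an explicit conversion such as pd.to_numeric()
parsed = pd.to_numeric(mixed['digit_strings'])
print(parsed.dtype)
```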