<a href="https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/Data%20Analysis/Level%201/data_types_and_conversions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Types & Conversions in Pandas
Understanding and managing data types correctly is essential for accurate computations, memory efficiency, and proper data analysis. Pandas supports various data types such as integers, floats, strings (stored with the `object` dtype), booleans, categorical data, and datetime formats.

## 1. Common Pandas Data Types

| Data Type    | Description                               |
| ------------ | ----------------------------------------- |
| `int64`      | Integer numbers                           |
| `float64`    | Decimal numbers                           |
| `object`     | Text/string data (general Python objects) |
| `bool`       | Boolean values (`True`/`False`)           |
| `category`   | Categorical (finite, repeated values)     |
| `datetime64` | Date and time values                      |


### Example

In [1]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 32, 40],
    'Score': [88.5, 92.0, 79.5],
    'Passed': ['Yes', 'No', 'Yes'],
    'ExamDate': ['2025-07-01', '2025-07-02', '2025-07-03']
})

df.dtypes

Out[1]:
Name         object
Age           int64
Score       float64
Passed       object
ExamDate     object
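
Once the dtypes are known, they can also be used to pick columns. The following is a minimal sketch, assuming a trimmed-down copy of the DataFrame above; `numeric_cols` is a name introduced only for this illustration.

```python
import pandas as pd

# Trimmed-down copy of the example DataFrame (illustration only)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 32, 40],
    'Score': [88.5, 92.0, 79.5],
})

# select_dtypes filters columns by dtype; 'number' matches int and float columns
numeric_cols = df.select_dtypes(include='number').columns.tolist()

print(numeric_cols)        # ['Age', 'Score']
print(df['Age'].dtype)     # int64
print(df['Score'].dtype)   # float64
```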


## 2. Type Conversions Using .astype()
### Convert to Integer or Float

In [3]:
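# Age and Score are already int64 and float64, so these calls change nothing here;
# the same pattern converts a column that starts out with a different dtype.
df['Age'] = df['Age'].astype('int64')
df['Score'] = df['Score'].astype('float64')
df['Age']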

Out[3]:
0    25
1    32
2    40


In [4]:
df['Score']

Out[4]:
0    88.5
1    92.0
2    79.5
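
The `Age` and `Score` columns in this DataFrame are numeric from the start, so the calls above do not change their dtype. As a minimal sketch of the string-to-number case, assume a hypothetical Series `raw_ages` that stores digits as text (it is not part of the DataFrame above):

```python
import pandas as pd

# Hypothetical column that stores numbers as text (illustration only)
raw_ages = pd.Series(['25', '32', '40'])

ages = raw_ages.astype('int64')          # text -> integer
ages_float = raw_ages.astype('float64')  # text -> float

print(ages.dtype)        # int64
print(ages_float.dtype)  # float64
print(ages.sum())        # 97
```

`.astype()` raises an error if any value cannot be parsed as a number, so it suits columns that are known to be clean.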


### Convert String to Categorical (saves memory)

In [5]:
df['Passed'] = df['Passed'].astype('category')
df['Passed']

Out[5]:
0    Yes
1     No
2    Yes
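
With only three rows the memory saving is not visible, and the fixed overhead of a categorical can even outweigh it. The sketch below assumes a hypothetical 10,000-row column of repeated labels, stored as Python objects, to show where the saving comes from; the names `labels`, `object_bytes`, and `category_bytes` are introduced only for this illustration.

```python
import pandas as pd

# Hypothetical column with many repeats of two labels, stored as Python objects
labels = pd.Series(['Yes', 'No'] * 5000, dtype=object)
as_category = labels.astype('category')

# deep=True counts the memory of the Python string objects themselves
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(object_bytes, category_bytes)
print(category_bytes < object_bytes)       # True
print(list(as_category.cat.categories))    # ['No', 'Yes']
```

A categorical stores each distinct label once and keeps a small integer code per row, which is why it pays off when few values repeat many times.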


### Convert to Boolean

In [6]:
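# 'Passed' is categorical at this point, so .map() alone returns a categorical result;
# .astype('bool') turns the mapped values into a real boolean column.
df['PassedBool'] = df['Passed'].map({'Yes': True, 'No': False}).astype('bool')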
df['PassedBool']

Out[6]:
0     True
1    False
2     True
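
It can be tempting to call `.astype('bool')` directly on a text column instead of mapping values explicitly. The following sketch, which assumes a hypothetical object-dtype Series `answers`, suggests why that is usually a mistake: every non-empty string is truthy, so `'No'` also becomes `True`.

```python
import pandas as pd

# Hypothetical text column stored as Python objects (illustration only)
answers = pd.Series(['Yes', 'No', 'Yes'], dtype=object)

naive = answers.astype('bool')                     # truthiness of each string
mapped = answers.map({'Yes': True, 'No': False})   # explicit mapping

print(naive.tolist())    # [True, True, True]
print(mapped.tolist())   # [True, False, True]
print(mapped.dtype)      # bool
```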


## 3. Converting to DateTime

Working with time-series or dates? Convert string columns to datetime:

In [7]:
df['ExamDate'] = pd.to_datetime(df['ExamDate'])
df['ExamDate'].dt.year  # Access datetime components

Out[7]:
0    2025
1    2025
2    2025
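
Beyond `.dt.year`, the `.dt` accessor exposes other components and helpers. Here is a minimal sketch using the same three dates in a standalone Series; passing `format='%Y-%m-%d'` is an optional choice made here so that parsing does not rely on format inference.

```python
import pandas as pd

# Same dates as the ExamDate column, in a standalone Series
dates = pd.to_datetime(
    pd.Series(['2025-07-01', '2025-07-02', '2025-07-03']),
    format='%Y-%m-%d',
)

print(dates.dt.month.tolist())        # [7, 7, 7]
print(dates.dt.day.tolist())          # [1, 2, 3]
print(dates.dt.day_name().tolist())   # ['Tuesday', 'Wednesday', 'Thursday']

# Datetime columns also support arithmetic
span_days = (dates.max() - dates.min()).days
print(span_days)                      # 2
```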


## 4. Error Handling in Conversion
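You can control how Pandas handles values it cannot convert. By default, `pd.to_datetime()` raises an error when it meets an unparseable value; with `errors='coerce'`, such values become `NaT` instead. Every value in this column is valid, so the result is unchanged:

In [8]:
# Coerce instead of raising: values that cannot be parsed become NaT
df['ExamDate'] = pd.to_datetime(df['ExamDate'], errors='coerce')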
df['ExamDate']

Out[8]:
0   2025-07-01
1   2025-07-02
2   2025-07-03
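
Because the column above contains only valid dates, the effect of `errors='coerce'` does not show. The sketch below assumes two hypothetical Series with one bad value each (`raw_dates` and `raw_scores`, not part of the DataFrame above). It also uses `pd.to_numeric()`, which this notebook does not otherwise cover, to show the same option for numbers.

```python
import pandas as pd

# Hypothetical date column with one unparseable entry
raw_dates = pd.Series(['2025-07-01', 'not a date', '2025-07-03'])

# Default behaviour (errors='raise'): the bad value raises a ValueError
try:
    pd.to_datetime(raw_dates, format='%Y-%m-%d')
    raised = False
except ValueError:
    raised = True

# errors='coerce': the bad value becomes NaT, the rest are parsed
coerced = pd.to_datetime(raw_dates, format='%Y-%m-%d', errors='coerce')

print(raised)                    # True
print(coerced.isna().tolist())   # [False, True, False]

# The same option exists for numeric parsing; bad values become NaN
raw_scores = pd.Series(['88.5', 'n/a', '79.5'])
scores = pd.to_numeric(raw_scores, errors='coerce')
print(scores.isna().tolist())    # [False, True, False]
```

Coercion hides bad input silently, so it is worth counting the resulting missing values (for example with `.isna().sum()`) before continuing.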


## Summary

| Conversion Task            | Method                               |
| -------------------------- | ------------------------------------ |
| String to int/float        | `.astype('int')`, `.astype('float')` |
| String to category         | `.astype('category')`                |
| String to boolean          | `.map({'Yes': True, 'No': False})`; avoid `.astype('bool')` on strings, since every non-empty string becomes `True` |
| String to datetime         | `pd.to_datetime()`                   |
| Handling conversion errors | `errors='coerce'`                    |
