# Understanding Data Types and Type Conversion in Pandas

### What Are Data Types in Pandas?

When we load a dataset using Pandas, each column automatically gets assigned a **data type** based on its contents. These data types (also called `dtypes`) define how Pandas stores and processes the values inside each column. The main types we encounter are:

- `int64`: for whole numbers (e.g., 1, 2, 100)
- `float64`: for decimal numbers (e.g., 3.14, 75.5)
- `object`: usually for text or mixed types (e.g., names, categories)
- `bool`: for `True` or `False` values
- `datetime64`: for date and time information
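The sketch below shows this inference on a small hand-made DataFrame. The column names and values are illustrative and do not come from the Titanic data. Depending on the pandas version, the text column may be reported as `object` or, in newer releases, as a dedicated string dtype.

```python
import pandas as pd

# Hand-made example: each column holds a different kind of value
sample = pd.DataFrame({
    "passengers": [1, 2, 100],            # whole numbers -> int64
    "fare": [3.14, 75.5, 10.0],           # decimals -> float64
    "name": ["Ann", "Bob", "Cy"],         # text -> object (or a string dtype in newer pandas)
    "survived": [True, False, True],      # True/False -> bool
    "boarded": pd.to_datetime(["1912-04-10", "1912-04-10", "1912-04-11"]),  # dates -> datetime64
})

# Show the dtype pandas inferred for each column
print(sample.dtypes)
```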

Understanding data types is **essential** in data science because they control how we clean, analyze, and transform our data. For example, if a column of numbers was read in as text (`object`), calling `.mean()` on it fails with an error in recent pandas versions, even though the values look like numbers. Most machine learning libraries (scikit-learn, for example) also expect numeric input, so text and categorical columns have to be converted or encoded before modeling. This makes type conversion a key step in data preparation.
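A minimal sketch of the "numbers stored as text" problem, assuming a hypothetical `fares` column that was read in as strings. It shows the failing `.mean()` call and the fix with `pd.to_numeric()`, which converts the text to real numbers first.

```python
import pandas as pd

# Illustrative fares stored as text (for example, after reading a messy CSV)
fares = pd.Series(["7.25", "71.28", "8.05"])
print(fares.dtype)

try:
    print(fares.mean())
except (TypeError, ValueError) as exc:
    # Recent pandas versions refuse to average strings
    print("mean() failed:", exc)

# Convert the text to numbers first, then aggregate
numeric_fares = pd.to_numeric(fares)
print(numeric_fares.dtype)              # float64
print(round(numeric_fares.mean(), 2))   # 28.86
```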

By learning how to **view**, **interpret**, and **convert** data types properly, we build cleaner, faster, and more reliable pipelines. The Titanic dataset is a great place to practice this, as it contains a mix of numeric, categorical, and missing values.

### Checking Data Types with `.dtypes`

We can use `.dtypes` to inspect the data type of each column in our DataFrame. This gives us a quick overview of how Pandas has interpreted the dataset. Let’s use it on the Titanic dataset:

```python
import pandas as pd

df = pd.read_csv("data/train.csv")
print(df.dtypes)
```

```text
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
```


### Converting Data Types with `.astype()`

Sometimes Pandas infers a data type we don't want, or we need a different type for the next step, for example turning a float into an integer, or a numeric code into a category. The `.astype()` method converts a column to the type we specify.

```python
# Convert 'Pclass' to string (useful for treating it like a category)
df['Pclass'] = df['Pclass'].astype(str)

# Convert 'Survived' to boolean (0 -> False, 1 -> True)
df['Survived'] = df['Survived'].astype(bool)

# Convert 'Age' to integer. astype(int) fails on NaN, so fill missing ages with 0 first.
# Caution: this treats unknown ages as 0 and truncates fractional ages (e.g., 0.42 -> 0).
df['Age'] = df['Age'].fillna(0).astype(int)

print(df.dtypes)
```

```text
PassengerId      int64
Survived          bool
Pclass          object
Name            object
Sex             object
Age              int64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
```
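`astype(str)` turns `Pclass` into plain text. pandas also has a dedicated `category` dtype for columns with a small set of repeated values, and `.astype()` accepts a dictionary so several columns can be converted in one call. The sketch below uses a tiny stand-in DataFrame with illustrative values rather than the full Titanic file.

```python
import pandas as pd

# Small stand-in for two Titanic columns (values are illustrative)
df = pd.DataFrame({"Pclass": [3, 1, 3, 2], "Survived": [0, 1, 1, 0]})

# astype() with a dict maps column names to target dtypes; it returns a new DataFrame
converted = df.astype({"Pclass": "category", "Survived": "bool"})
print(converted.dtypes)

# Categories are inferred from the data and sorted
print(converted["Pclass"].cat.categories.tolist())  # [1, 2, 3]
```

Unlike `str`, the `category` dtype remembers the set of allowed values, and it can use less memory when a column repeats the same few values many times.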


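This approach gives us full control over how columns are interpreted, but `.astype()` is strict. Casting a column that contains `NaN` to `int` raises a `ValueError`, and so does casting text such as `'abc'` to a number. That is why the `Age` conversion above fills missing values first. Keep in mind that `fillna(0)` also changes the data: unknown ages become 0, and `astype(int)` truncates fractional ages such as 0.42.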
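Here is a minimal sketch of these failure modes and two common workarounds, using a few illustrative ages and strings. The nullable `"Int64"` dtype (capital `I`) keeps missing values as `<NA>` instead of forcing a fill value, and `pd.to_numeric(..., errors="coerce")` turns unparseable text into `NaN` instead of raising.

```python
import numpy as np
import pandas as pd

# Illustrative ages: one missing value, two with fractional parts
ages = pd.Series([22.0, np.nan, 0.42, 28.5])

# 1) A plain int cast fails because NaN has no integer representation
try:
    ages.astype(int)
except ValueError as exc:
    print("astype(int) failed:", exc)

# 2) fillna(0) avoids the error, but unknown ages become 0 and decimals are truncated
filled = ages.fillna(0).astype(int)
print(filled.tolist())  # [22, 0, 0, 28]

# 3) Nullable "Int64" keeps the missing value as <NA>. Round first, because pandas
#    will not cast floats with fractional parts to Int64. round() rounds half to even,
#    so 28.5 becomes 28.
nullable = ages.round().astype("Int64")
print(nullable.tolist())

# 4) Text that is not a number: to_numeric with errors="coerce" yields NaN instead of raising
raw = pd.Series(["3", "two", "5"])
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.tolist())
```

Which option fits depends on the analysis: filling with 0 is simple but distorts statistics such as the mean age, while a nullable integer keeps "unknown" visible.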

### Converting to Datetime with `pd.to_datetime()`

When working with date columns (like timestamps, birthdates, or logs), we use the `pd.to_datetime()` function to convert a column to the `datetime64` type.

```python
# Add a made-up 'DOB' column of date strings (the Titanic data has no date column)
df['DOB'] = ['1980-01-01', '1975-05-23', '1990-07-15', '1985-12-30', '1978-03-10'] + ['1982-06-18'] * (len(df) - 5)

# Convert 'DOB' to datetime
df['DOB'] = pd.to_datetime(df['DOB'])

print(df[['Name', 'DOB']].head())
```

```text
                                                Name        DOB
0                            Braund, Mr. Owen Harris 1980-01-01
1  Cumings, Mrs. John Bradley (Florence Briggs Th... 1975-05-23
2                             Heikkinen, Miss. Laina 1990-07-15
3       Futrelle, Mrs. Jacques Heath (Lily May Peel) 1985-12-30
4                           Allen, Mr. William Henry 1978-03-10
```


This conversion unlocks new possibilities: through the `.dt` accessor we can extract the day, month, year, or weekday, and we can perform date arithmetic such as subtracting one date from another.
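A short sketch of the `.dt` accessor on a few of the sample birth dates. The reference date `2000-01-01` is an arbitrary choice for illustration.

```python
import pandas as pd

dob = pd.to_datetime(pd.Series(["1980-01-01", "1975-05-23", "1990-07-15"]))

# Pull individual date parts out of a datetime64 column
parts = pd.DataFrame({
    "year": dob.dt.year,
    "month": dob.dt.month,
    "day": dob.dt.day,
    "weekday": dob.dt.day_name(),
})
print(parts)

# Date arithmetic: subtracting datetimes gives timedeltas; .dt.days extracts whole days
reference = pd.Timestamp("2000-01-01")  # arbitrary reference date
days_old = (reference - dob).dt.days
print(days_old.tolist())
```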

### Detecting Columns by Type

Sometimes we want to know **how many columns are int, float, or object** types, or which columns have a particular type. `.dtypes.value_counts()` counts the columns per type, and `.select_dtypes()` selects the columns of the types we ask for:

```python
# Count how many columns are of each data type
print(df.dtypes.value_counts())

# Select only the columns with a specific dtype
int_cols = df.select_dtypes(include='int64')
print("Integer columns:\n", int_cols.columns)
```

```text
object            6
int64             4
bool              1
float64           1
datetime64[ns]    1
Name: count, dtype: int64
Integer columns:
 Index(['PassengerId', 'Age', 'SibSp', 'Parch'], dtype='object')
```
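`include='int64'` matches only that exact dtype. To grab every numeric column (integers and floats of any width) at once, `select_dtypes` also accepts the string `"number"`, and `exclude` does the opposite. The sketch uses a small illustrative DataFrame; note that boolean columns are not counted as numbers here.

```python
import pandas as pd

sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, 38.0, None],          # None becomes NaN, so this column is float64
    "Name": ["Braund", "Cumings", "Heikkinen"],
    "Survived": [False, True, True],
})

# All int and float columns
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything else (text, bool, ...)
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['PassengerId', 'Age']
print(other_cols)    # ['Name', 'Survived']
```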


### Exercises

Q1. Load the Titanic dataset and print the data types of all columns.

```python
data = pd.read_csv("data/train.csv")
print(data.dtypes)
```

```text
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
```


Q2. Convert the 'Pclass' column to a string type.

```python
data['Pclass'] = data['Pclass'].astype(str)
print(data['Pclass'].head())
```

```text
0    3
1    1
2    3
3    1
4    3
Name: Pclass, dtype: object
```


Q3. Convert the 'Survived' column to a boolean type.

```python
data['Survived'] = data['Survived'].astype(bool)
print(data['Survived'].head())
```

```text
0    False
1     True
2     True
3     True
4    False
Name: Survived, dtype: bool
```


Q4. Convert the 'Age' column to integers (first fill missing values with 0).

```python
data['Age'] = data['Age'].fillna(0).astype(int)
print(data['Age'].head())
```

```text
0    22
1    38
2    26
3    35
4    35
Name: Age, dtype: int64
```


Q5. Print the count of columns by data type.

```python
print(data.dtypes.value_counts())
```

```text
object     6
int64      4
bool       1
float64    1
Name: count, dtype: int64
```


Q6. Print the names of all float-type columns.

```python
float_columns = data.select_dtypes(include='float64').columns
print(float_columns)
```

```text
Index(['Fare'], dtype='object')
```


### Summary

In this topic, we explored one of the most fundamental yet often overlooked areas of working with Pandas: **data types**. Every column in a DataFrame has a `dtype`, and knowing what it is (and why it matters) helps us avoid unexpected bugs and incorrect results.

We learned that `.dtypes` gives us a full snapshot of our dataset’s structure. We used `.astype()` to convert columns to `str` (stored as `object`), `bool`, and `int`, filled missing values with `fillna()` before the integer cast (remembering that the fill value changes the data), and used `pd.to_datetime()` to turn date strings into `datetime64` values. We also analyzed our dataset by type using `.select_dtypes()` and `.dtypes.value_counts()`.

Getting types right prepares our data for the next steps, including encoding, scaling, modeling, and visualization. In real AI/ML projects, cleaning and converting data types is a routine task, and the better we get at it, the faster and cleaner our workflows become.