# 📓 Lesson 6: Changing Data Types and Using Categorical Data
📘 What you will learn:
1. How to check and change column data types
2. How to use astype() to convert types
3. How to work with category data to save memory
4. How to convert dates to datetime type

## 🧪 Step 1: Load the Dataset
We'll again use Sales_January_2019.csv from the data/ folder.


In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv('../data/Sales_January_2019.csv')

# Show data types
print(df.dtypes)

📌 This tells you what type each column currently is (e.g., object, int64, float64).
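A numeric-looking column can still show up as object: one non-numeric cell is enough. The sketch below uses a made-up toy frame (not the sales file) to show this, and uses select_dtypes() to list the columns that are not numeric.

```python
import pandas as pd

# Hypothetical toy data: one stray text value in an otherwise numeric column
toy = pd.DataFrame({
    'Quantity Ordered': [1, 2, 'Quantity Ordered', 3],
    'Price Each': [11.95, 99.99, 150.0, 3.84],
})

# The mixed column is reported as object, the clean one as float64
print(toy.dtypes)

# select_dtypes() picks columns by type; here, everything that is not numeric
non_numeric = toy.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)
```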

## 🔄 Step 2: Convert Strings to Numbers
If any cell in a column contains non-numeric text, pandas reads the whole column as strings. Columns like Quantity Ordered and Price Each may be affected, so we convert them explicitly:

In [None]:
# Convert to numeric, set errors='coerce' to handle bad values
df['Quantity Ordered'] = pd.to_numeric(df['Quantity Ordered'], errors='coerce')
df['Price Each'] = pd.to_numeric(df['Price Each'], errors='coerce')

# Drop rows where conversion failed and became NaN
df = df.dropna(subset=['Quantity Ordered', 'Price Each'])

# Check types
print(df.dtypes)

🧼 The dropna() call above already removed the rows where conversion failed. Confirm that no NaN values remain:


In [None]:
print(df[['Quantity Ordered', 'Price Each']].isna().sum())
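Here is how errors='coerce' behaves on a small, made-up Series, and how astype() can finish the job once the NaN rows are gone. The values are hypothetical and only stand in for what a messy CSV column might contain.

```python
import pandas as pd

# Hypothetical raw values stored as Python strings (object dtype), some invalid
raw = pd.Series(['2', '1', 'Quantity Ordered', '', '3'], dtype='object')

# errors='coerce' turns anything unparseable into NaN instead of raising
nums = pd.to_numeric(raw, errors='coerce')
print(nums.dtype)  # a float type, because NaN is a float

# With the NaN rows dropped, astype() can convert to a true integer type
clean = nums.dropna().astype('int64')
print(clean.tolist())
```

Without the dropna() step, astype('int64') would raise an error, because NaN cannot be stored in a plain integer column.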

## 🧭 Step 3: Convert to Categorical Data
If a column contains only a few distinct values that repeat many times (e.g., city names or products), you can use the category type to save memory:

In [None]:
# Before
print(df['Product'].memory_usage(deep=True))

# Convert to category
df['Product'] = df['Product'].astype('category')

# After
print(df['Product'].memory_usage(deep=True))
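To see why category saves memory, look at how it is stored: each distinct label is kept once, and every row holds only a small integer code. The product list below is made up for illustration.

```python
import pandas as pd

# Hypothetical product column: a few labels repeated many times
products = pd.Series(
    ['iPhone', 'MacBook', 'iPhone', 'AA Batteries'] * 1000, dtype='object'
)
as_cat = products.astype('category')

# The distinct labels, stored once
print(as_cat.cat.categories.tolist())

# Per-row integer codes that point at those labels
print(as_cat.cat.codes.head().tolist())

# Compare memory before and after conversion
before = products.memory_usage(deep=True)
after = as_cat.memory_usage(deep=True)
print(before, after)
```

The saving shrinks as the number of distinct values approaches the number of rows, so category is a poor fit for columns such as unique order IDs.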


🧠 What does deep=True do?

By default (deep=False), memory_usage() reports only the shallow size: for an object column, that is the array of references to the Python strings, not the strings themselves.

With deep=True, pandas also inspects each object and adds the memory its contents consume, which gives the true footprint.

This matters most for columns with object or string data types; for numeric columns the two figures are the same.

In [None]:
# Use a separate name so the sales DataFrame in df is not overwritten
demo = pd.DataFrame({
    'Product': ['iPhone', 'iPhone', 'MacBook', 'iPhone', 'MacBook']
})

print("Without deep:", demo['Product'].memory_usage(deep=False))
print("With deep:", demo['Product'].memory_usage(deep=True))


As you can see, the memory with deep=True is more accurate because it includes the content of each string, not just the references.
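The same option works on a whole DataFrame, which helps you find the columns worth converting. This is a sketch on made-up data: subtracting the shallow figures from the deep ones shows how much memory is hidden inside each column's objects.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column
toy = pd.DataFrame({
    'Product': pd.Series(['iPhone', 'MacBook'] * 500, dtype='object'),
    'Quantity Ordered': [1, 2] * 500,
})

shallow = toy.memory_usage(deep=False)
deep = toy.memory_usage(deep=True)

# Extra bytes that only deep=True accounts for, per column
hidden = deep - shallow
print(hidden)
```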

📌 You can also check how often each product appears:

In [None]:
print(df['Product'].value_counts())

## 🗓 Step 4: Convert Dates to datetime
To work with dates (e.g., filtering, sorting, grouping), convert them using pd.to_datetime():

In [None]:
df['Order Date'] = pd.to_datetime(df['Order Date'], errors='coerce')

# Check result
print(df['Order Date'].head())
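If you know how the dates are written, passing format= makes parsing explicit and usually faster. The sketch below assumes a month/day/two-digit-year layout with hours and minutes; that layout is an assumption, so check a few raw values in your own file first.

```python
import pandas as pd

# Hypothetical date strings, including one value that is not a date
raw_dates = pd.Series(['01/22/19 21:25', '01/28/19 14:15', 'Order Date'])

# Values that do not match the format become NaT (the datetime version of NaN)
parsed = pd.to_datetime(raw_dates, format='%m/%d/%y %H:%M', errors='coerce')
print(parsed)

# Count how many values failed to parse
failed = int(parsed.isna().sum())
print(failed)
```

Rows with NaT can be dropped with dropna(subset=['Order Date']), in the same way as the numeric columns in Step 2.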

You can now extract date parts:

In [None]:
df['Month'] = df['Order Date'].dt.month
df['Hour'] = df['Order Date'].dt.hour

# Show only the date columns for the first few rows
print(df[['Order Date', 'Month', 'Hour']].head())
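Once the column is a real datetime, the extracted parts can drive grouping and filtering. A small sketch on made-up orders (the values are hypothetical):

```python
import pandas as pd

# Hypothetical orders with an already-parsed datetime column
orders = pd.DataFrame({
    'Order Date': pd.to_datetime(
        ['2019-01-22 21:25', '2019-01-28 14:15', '2019-01-28 21:05']
    ),
    'Quantity Ordered': [1, 2, 3],
})
orders['Hour'] = orders['Order Date'].dt.hour

# Grouping: total quantity ordered per hour of the day
per_hour = orders.groupby('Hour')['Quantity Ordered'].sum()
print(per_hour)

# Filtering: rows from 28 January 2019 onward
late_january = orders[orders['Order Date'] >= '2019-01-28']
print(len(late_january))
```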