# 📌 Pandas `category` (Short Notes)

- `category` is a **pandas dtype** for categorical variables.
- It is **not a string** and **not one-hot encoded**.
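- Internally it stores:
  - the **unique categories** (each label stored once)
  - **integer codes**, one per row, giving the position of that row's label in the categories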
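Here is a minimal sketch of the first two points, using a made-up three-value Series. The dtype reports as `category` (a `pd.CategoricalDtype`). One-hot encoding is a separate, explicit step (`pd.get_dummies`) that produces one column per category.

```python
import pandas as pd

# Illustrative values, made up for this sketch
s = pd.Series(["Male", "Female", "Male"]).astype("category")

# The dtype is a CategoricalDtype, not a string dtype
print(s.dtype)                                   # category
print(isinstance(s.dtype, pd.CategoricalDtype))  # True

# One-hot encoding is a separate transformation: one 0/1 column per category
dummies = pd.get_dummies(s)
print(list(dummies.columns))  # ['Female', 'Male']
```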

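Example (values `['Male', 'Female', 'Male']`):
- Categories: `['Male', 'Female']`
- Codes: `[0, 1, 0]`

This category order has to be set explicitly. `astype("category")` sorts the categories by default, which gives `['Female', 'Male']` and codes `[1, 0, 1]`.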
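Here is the same example as a small sketch (the values are illustrative). When pandas infers the categories, it uses the sorted order. Passing a `pd.CategoricalDtype` with an explicit list reproduces the codes in the example.

```python
import pandas as pd

values = ["Male", "Female", "Male"]

# Inferred categories are sorted, so 'Female' gets code 0
inferred = pd.Series(values).astype("category")
print(list(inferred.cat.categories))  # ['Female', 'Male']
print(inferred.cat.codes.tolist())    # [1, 0, 1]

# An explicit category order reproduces the example: Male -> 0, Female -> 1
explicit = pd.Series(values).astype(pd.CategoricalDtype(categories=["Male", "Female"]))
print(list(explicit.cat.categories))  # ['Male', 'Female']
print(explicit.cat.codes.tolist())    # [0, 1, 0]
```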

### Why use `category`?
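- Uses **less memory** when a column has few distinct values relative to its length (each label is stored once, and each row holds only a small integer code)
- Often **faster operations** such as grouping and counting, which can work on the integer codes
- **Keeps the labels** readable, unlike one-hot or manual integer encoding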
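This is a rough way to check the memory claim. The 3000-row column below is synthetic, not taken from the dataset. `memory_usage(deep=True)` is used so that the string objects themselves are counted. The saving comes from storing two labels once plus one small integer code per row. For a column such as `Name`, where nearly every value is distinct, expect little or no saving.

```python
import pandas as pd

n = 3000
# Synthetic low-cardinality column: two labels repeated, stored as plain objects
low = pd.Series(["male", "female"] * (n // 2), dtype=object)
low_cat = low.astype("category")

def mem(s: pd.Series) -> int:
    # deep=True also counts the Python string objects, not just the pointers
    return int(s.memory_usage(deep=True))

print("object:  ", mem(low), "bytes")
print("category:", mem(low_cat), "bytes")

# Counting on the categorical column: one entry per category
print(low_cat.value_counts())
```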

### Convert to category:
```python
df["Gender"] = df["Gender"].astype("category")
```

Example on the Titanic dataset, loaded with a `Loader` helper from a local `EDA-arsenal` directory:

```python
import sys
from pathlib import Path

import pandas as pd

# Make the local EDA-arsenal project importable. insert(0, ...) puts it ahead
# of installed packages, so an installed package also named `datasets`
# cannot shadow the project's module (append() would put it last).
sys.path.insert(0, str(Path("/home/abdullah/Desktop/EDA-arsenal")))

from datasets import Loader

df = Loader().load_titanicDataset()
```

Before conversion, `Sex` is stored as `object`:

```python
df.info()
df["Sex"]
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

0        male
1      female
2      female
3      female
4        male
        ...
886      male
887    female
888    female
889      male
890      male
Name: Sex, Length: 891, dtype: object
```

After conversion, `Sex` is listed as `category`, and the reported memory usage drops from 83.7+ KB to 77.7+ KB:

```python
df["Sex"] = df["Sex"].astype("category")
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    category
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: category(1), float64(2), int64(5), object(4)
memory usage: 77.7+ KB
```
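Both `info()` outputs end in `+`. That means pandas did not count the string contents of `object` columns, only their pointers. As a result, the drop from 83.7+ KB to 77.7+ KB understates what converting `Sex` saves; `df.info(memory_usage="deep")` measures the strings as well.

The dtype can also be requested at read time. The internals of the notes' `Loader` are not shown, so this sketch uses `pd.read_csv` on a tiny inline CSV. That CSV only mimics a few Titanic columns and is an assumption, not the real file. It also shows that a missing value stays missing (code `-1`) rather than becoming a category.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for the Titanic data; row 4 has no Embarked value
csv_text = """PassengerId,Sex,Embarked
1,male,S
2,female,C
3,female,S
4,male,
"""

# Request the category dtype directly instead of converting after loading
df_small = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Sex": "category", "Embarked": "category"},
)

print(df_small.dtypes)
print(list(df_small["Embarked"].cat.categories))  # ['C', 'S']
print(df_small["Embarked"].cat.codes.tolist())    # [1, 0, 1, -1]

# Full memory accounting, including string contents (no trailing '+')
df_small.info(memory_usage="deep")
```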
