In practice, datasets often include text columns whose values repeat. Features such as gender, country, and codes typically take the same few values over and over. These are examples of categorical data.

Categorical variables can take on only a limited, and usually fixed, number of possible values. Categorical data may have an order, but numerical operations such as addition or division cannot be performed on it. Categorical is a pandas data type.

The categorical data type is useful in the following cases −

* A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory.

* The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order.

* As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
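The memory saving mentioned in the first case can be checked directly. The sketch below uses made-up data (a hypothetical `countries` Series with only three distinct strings) and compares the footprint of the same values stored as plain Python objects and as a categorical. The exact byte counts depend on the pandas version and platform, so only the comparison is printed as a claim.

```python
import pandas as pd

# Hypothetical data: 30,000 rows drawn from only three distinct strings.
countries = pd.Series(["Vietnam", "France", "Japan"] * 10_000)

as_object = countries.astype("object")      # one Python string per row
as_category = countries.astype("category")  # small integer codes + 3 categories

object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(object_bytes, category_bytes)
print(category_bytes < object_bytes)
```

The saving shrinks as the number of distinct values approaches the number of rows, since every distinct value still has to be stored once in the categories.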

In [2]:
import pandas as pd

# Object Creation

```python
pd.Categorical(
    values,
    categories=None,
    ordered=None,
    dtype=None,
    fastpath=False,
)
```

In [8]:
ranks = pd.Categorical(['Dau De', 'Dau De', 'Dau Ton', 'Dau Thanh'], 
                       categories = ['Dau Vuong', 'Dau Tong', 'Dau Ton', 'Dau Thanh', 'Dau De'],
                       ordered = True)
ranks

[Dau De, Dau De, Dau Ton, Dau Thanh]
Categories (5, object): [Dau Vuong < Dau Tong < Dau Ton < Dau Thanh < Dau De]

In [9]:
nums = pd.Categorical(['Three', 'Fourth', 'Fifth', 'Two'], categories = ['One', 'Two', 'Three'], ordered = True)
nums

[Three, NaN, NaN, Two]
Categories (3, object): [One < Two < Three]
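As the output above shows, values that are not listed in `categories` become `NaN`. A minimal sketch of how to detect them, rebuilding the same `nums` object so the snippet runs on its own:

```python
import pandas as pd

nums = pd.Categorical(
    ["Three", "Fourth", "Fifth", "Two"],
    categories=["One", "Two", "Three"],
    ordered=True,
)

# Values missing from `categories` are stored with the code -1 ...
print(nums.codes.tolist())   # [2, -1, -1, 1]
# ... and are reported as missing by isna().
print(nums.isna().tolist())  # [False, True, True, False]
```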

In [14]:
# Series with category dtype
nums_series = pd.Series(['Three', 'Fourth', 'Fifth', 'Two'], dtype = 'category')
nums_series

0     Three
1    Fourth
2     Fifth
3       Two
dtype: category
Categories (4, object): [Fifth, Fourth, Three, Two]
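With `dtype='category'` the categories are inferred from the data, sorted lexically, and left unordered. If a logical order is needed, one option is to pass a `pd.CategoricalDtype` instead. The category order chosen below is an assumption made for this illustration.

```python
import pandas as pd

# Assumed logical order for this illustration.
number_dtype = pd.CategoricalDtype(
    categories=["Two", "Three", "Fourth", "Fifth"], ordered=True
)
ordered_series = pd.Series(["Three", "Fourth", "Fifth", "Two"], dtype=number_dtype)

print(ordered_series.cat.categories.tolist())  # ['Two', 'Three', 'Fourth', 'Fifth']
print(ordered_series.cat.ordered)              # True
# Sorting follows the category order, not the alphabet.
print(ordered_series.sort_values().tolist())   # ['Two', 'Three', 'Fourth', 'Fifth']
```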

# Properties

```python
categories : Index
    The categories of this categorical
codes : ndarray
    The codes (integer positions, which point to the categories) of this
    categorical, read only.
ordered : boolean
    Whether or not this Categorical is ordered.
dtype : CategoricalDtype
    The instance of ``CategoricalDtype`` storing the ``categories``
    and ``ordered``.
```


## categories

In [19]:
ranks.categories

Index(['Dau Vuong', 'Dau Tong', 'Dau Ton', 'Dau Thanh', 'Dau De'], dtype='object')

In [22]:
# Use the Series.cat accessor to reach the categorical properties
nums_series.cat.categories

Index(['Fifth', 'Fourth', 'Three', 'Two'], dtype='object')

## ordered

In [23]:
ranks.ordered

True

In [25]:
nums_series.cat.ordered

False
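The `ordered` flag decides whether order-dependent operations such as `min` and `max` are allowed. The sketch below uses a hypothetical `sizes` categorical, not one defined earlier in this notebook: the unordered version rejects `min()`, while the result of `as_ordered()` uses the category order.

```python
import pandas as pd

# Hypothetical example data; the categories are listed in their logical order.
sizes = pd.Categorical(
    ["medium", "small", "large"], categories=["small", "medium", "large"]
)
print(sizes.ordered)  # False

min_rejected = False
try:
    sizes.min()  # not defined for an unordered categorical
except TypeError as exc:
    min_rejected = True
    print("unordered:", exc)

# as_ordered() returns an ordered copy that keeps the same categories.
ordered_sizes = sizes.as_ordered()
print(ordered_sizes.min(), ordered_sizes.max())  # small large
```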

## dtype

In [26]:
ranks.dtype

CategoricalDtype(categories=['Dau Vuong', 'Dau Tong', 'Dau Ton', 'Dau Thanh', 'Dau De'], ordered=True)

## codes

In [28]:
ranks.codes

array([4, 4, 2, 3], dtype=int8)
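Each code is a position in `categories`, so the original values can be recovered by indexing. The sketch below rebuilds `ranks` so it runs on its own. It relies on there being no missing values, because a code of `-1` would otherwise index the last category.

```python
import pandas as pd

ranks = pd.Categorical(
    ["Dau De", "Dau De", "Dau Ton", "Dau Thanh"],
    categories=["Dau Vuong", "Dau Tong", "Dau Ton", "Dau Thanh", "Dau De"],
    ordered=True,
)

# Index the categories with the codes to recover the values.
decoded = ranks.categories[ranks.codes]
print(decoded.tolist())  # ['Dau De', 'Dau De', 'Dau Ton', 'Dau Thanh']

# from_codes builds a Categorical directly from codes plus categories.
rebuilt = pd.Categorical.from_codes(
    ranks.codes, categories=ranks.categories, ordered=True
)
print(rebuilt.equals(ranks))  # True
```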

# Adding a new category

```python
Categorical.add_categories(new_categories)
```

In [31]:
# add_categories returns a new Categorical; the inplace argument was removed in pandas 2.0
ranks = ranks.add_categories(['Dau Linh', 'Dau Hoang'])
ranks

[Dau De, Dau De, Dau Ton, Dau Thanh]
Categories (7, object): [Dau Vuong < Dau Tong < Dau Ton < Dau Thanh < Dau De < Dau Linh < Dau Hoang]
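One reason to add a category is that a categorical refuses assignments of values it does not know about. A minimal sketch with a hypothetical `grades` categorical; the exception type has varied between pandas versions, so both `TypeError` and `ValueError` are caught:

```python
import pandas as pd

# Hypothetical example data.
grades = pd.Categorical(["a", "b", "a"], categories=["a", "b"])

rejected = False
try:
    grades[0] = "c"  # "c" is not a category yet
except (TypeError, ValueError) as exc:
    rejected = True
    print("rejected:", exc)

# After registering the new category, the same assignment succeeds.
grades = grades.add_categories(["c"])
grades[0] = "c"
print(grades.tolist())  # ['c', 'b', 'a']
```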

# Removing categories

```python
Categorical.remove_categories(removals)
```

In [35]:
# remove_categories returns a new Categorical; values in removed categories become NaN
ranks = ranks.remove_categories(['Dau Thanh', 'Dau Linh'])
ranks

[Dau De, Dau De, Dau Ton, NaN]
Categories (5, object): [Dau Vuong < Dau Tong < Dau Ton < Dau De < Dau Hoang]

In [36]:
ranks.categories

Index(['Dau Vuong', 'Dau Tong', 'Dau Ton', 'Dau De', 'Dau Hoang'], dtype='object')
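When the goal is only to drop categories that no value uses, `remove_unused_categories` avoids listing them by hand. This is a related method rather than something the cells above use. The sketch starts from a standalone categorical that mirrors the non-missing values shown above.

```python
import pandas as pd

ranks_small = pd.Categorical(
    ["Dau De", "Dau De", "Dau Ton"],
    categories=["Dau Vuong", "Dau Tong", "Dau Ton", "Dau De", "Dau Hoang"],
    ordered=True,
)

# Only the categories that actually occur are kept, in their original order.
trimmed = ranks_small.remove_unused_categories()
print(trimmed.categories.tolist())  # ['Dau Ton', 'Dau De']
print(trimmed.tolist())             # ['Dau De', 'Dau De', 'Dau Ton']
```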

# Renaming categories

In [37]:
nums

[Three, NaN, NaN, Two]
Categories (3, object): [One < Two < Three]

In [38]:
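# Assigning to .categories no longer works in pandas 2.0+; rename_categories returns a new object
nums = nums.rename_categories(['Mot', 'Hai', 'Ba'])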
nums

[Ba, NaN, NaN, Hai]
Categories (3, object): [Mot < Hai < Ba]

In [39]:
nums_series

0     Three
1    Fourth
2     Fifth
3       Two
dtype: category
Categories (4, object): [Fifth, Fourth, Three, Two]

In [40]:
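# Assigning to .cat.categories no longer works in pandas 2.0+; use rename_categories instead
nums_series = nums_series.cat.rename_categories(['5', '4', '3', '2'])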
nums_series

0    3
1    4
2    5
3    2
dtype: category
Categories (4, object): [5, 4, 3, 2]
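Renaming by position depends on knowing the current category order, which is easy to get wrong when categories are sorted lexically. A sketch of a less error-prone alternative: `rename_categories` also accepts a dict that maps old names to new ones and leaves unmapped categories untouched. The Series is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

nums_series = pd.Series(["Three", "Fourth", "Fifth", "Two"], dtype="category")

# Only the categories named in the mapping are renamed.
renamed = nums_series.cat.rename_categories({"Three": "3", "Two": "2"})

print(renamed.tolist())                 # ['3', 'Fourth', 'Fifth', '2']
print(renamed.cat.categories.tolist())  # ['Fifth', 'Fourth', '3', '2']
```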

# Comparison of Categorical Data

In [41]:
cats = ['One', 'Two', 'Three', 'Four']
values_1 = pd.Categorical(['One', 'Three', 'Two'], categories=cats, ordered = True)
values_2 = pd.Categorical(['Four', 'One', 'One'], categories=cats, ordered=True)

values_1 < values_2

array([ True, False, False])
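The comparison above works because both operands are ordered and share the same categories. The sketch below shows two related cases: comparing against a scalar that is one of the categories, and the `TypeError` raised when the two categoricals have different categories. The `other` object is a hypothetical example.

```python
import pandas as pd

cats = ["One", "Two", "Three", "Four"]
values_1 = pd.Categorical(["One", "Three", "Two"], categories=cats, ordered=True)

# Comparing with a scalar uses the category order: One < Two < Three < Four.
print((values_1 > "One").tolist())  # [False, True, True]

# Hypothetical categorical with a different set of categories.
other = pd.Categorical(["One", "Two", "One"], categories=["One", "Two"], ordered=True)

mismatch_rejected = False
try:
    values_1 < other
except TypeError as exc:
    mismatch_rejected = True
    print("cannot compare:", exc)
```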