## Understanding Categorical Data in Pandas  
**Name:** Taskeen Hussain  
**Email Address:** taskeenuaf@gmail.com

Categorical data represents values drawn from a fixed, usually small, set of discrete categories or labels. In pandas, such data is stored with the `category` dtype (backed by `pd.Categorical`), which can reduce memory usage and speed up some operations when the same values repeat often.
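The memory claim can be checked with a small sketch. The data here is hypothetical (three labels repeated many times), and the exact byte counts depend on the pandas version and platform, so only the comparison between the two is meaningful.

```python
import pandas as pd

# Hypothetical data: three distinct labels repeated many times
labels = ["low", "medium", "high"] * 10_000

as_object = pd.Series(labels, dtype="object")
as_category = as_object.astype("category")

# deep=True includes the memory held by the Python string objects
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object dtype:   {object_bytes} bytes")
print(f"category dtype: {category_bytes} bytes")
```

The saving comes from storing each distinct label once and keeping a small integer code per row, so it shrinks as the number of distinct values approaches the number of rows.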

In [1]:
# ### Object Creation

import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")
print(s)

0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']


In [2]:
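# ### pd.Categorical
import pandas as pd

# Categories are inferred from the data when none are given
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
print(cat)

# With explicit categories, values not listed ('d' here) become NaN
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], categories=['c', 'b', 'a'])
print(cat)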


['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
['a', 'b', 'c', 'a', 'b', 'c', NaN]
Categories (3, object): ['c', 'b', 'a']
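The `NaN` in the second output can be traced to how a `Categorical` stores its values: as integer codes that index into the category list, with `-1` marking a value that is not among the categories. A minimal sketch using the same categories as above:

```python
import pandas as pd

cat = pd.Categorical(['a', 'b', 'c', 'd'], categories=['c', 'b', 'a'])

# Each code is the position of the value in cat.categories;
# 'd' is not a declared category, so it is stored as -1 (missing)
codes = cat.codes.tolist()
missing = cat.isna().tolist()

print(codes)
print(missing)
```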


In [3]:

# ordered=True makes the category order meaningful: 'c' < 'b' < 'a'
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], categories=['c', 'b', 'a'], ordered=True)
print(cat)

['a', 'b', 'c', 'a', 'b', 'c', NaN]
Categories (3, object): ['c' < 'b' < 'a']
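Once a categorical is ordered, sorting, `min`/`max`, and comparisons follow the category order rather than alphabetical order. The sketch below reuses the `'c' < 'b' < 'a'` ordering from the cell above; wrapping the values in a `Series` is a choice made here for illustration.

```python
import pandas as pd

s = pd.Series(
    pd.Categorical(['a', 'b', 'c', 'a'], categories=['c', 'b', 'a'], ordered=True)
)

# min/max use the declared order, so 'c' is the smallest and 'a' the largest
smallest, largest = s.min(), s.max()

# Sorting also follows the category order, not the alphabet
sorted_values = s.sort_values().tolist()

# Comparison against a value that is one of the categories
above_c = (s > 'c').tolist()

print(smallest, largest)
print(sorted_values)
print(above_c)
```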


In [4]:
# ### Description
import pandas as pd
import numpy as np

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
df = pd.DataFrame({"cat": cat, "s": ["a", "c", "c", np.nan]})

# describe() counts only non-missing values; 'b' is a declared but unused category
print(df.describe())
print(df["cat"].describe())

       cat  s
count    3  3
unique   2  2
top      c  c
freq     2  2
count     3
unique    2
top       c
freq      2
Name: cat, dtype: object
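The `unique` count of 2 reflects the values actually observed, not the three declared categories. A minimal sketch, assuming that `value_counts()` on a categorical Series reports unobserved categories with a count of zero (this appears to be the behavior in recent pandas releases):

```python
import pandas as pd
import numpy as np

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
counts = pd.Series(cat).value_counts()

# 'b' never occurs in the data but is still a declared category
print(counts)
```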


In [5]:
# ### Get the Properties of the Category
import pandas as pd
import numpy as np

s = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
print(s.categories)

Index(['b', 'a', 'c'], dtype='object')


In [6]:
cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
print(cat.ordered)

False


In [9]:
import pandas as pd

# Create a Series with categorical data
s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Rename the categories
s = s.cat.rename_categories(["Group %s" % g for g in s.cat.categories])

# Print the updated categories
print(s.cat.categories)


Index(['Group a', 'Group b', 'Group c'], dtype='object')


In [10]:
# ### Appending New Categories
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")
# New categories are appended at the end of the existing ones
s = s.cat.add_categories([4])
print(s.cat.categories)

Index(['a', 'b', 'c', 4], dtype='object')
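Adding a category matters because a categorical rejects assignment of a value that is not already one of its categories. The sketch below illustrates this with a hypothetical new label `"d"`; the exception type has varied between pandas versions, so both `TypeError` and `ValueError` are caught here as an assumption.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Assigning an undeclared value is expected to fail
try:
    s.iloc[0] = "d"
    rejected = False
except (TypeError, ValueError):
    rejected = True

# After declaring the category, the same assignment succeeds
s = s.cat.add_categories(["d"])
s.iloc[0] = "d"

print(rejected)
print(s.tolist())
```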


In [11]:
# ### Removing Categories
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")
print("Original object:")
print(s)

# Values belonging to a removed category become NaN
print("After removal:")
print(s.cat.remove_categories("a"))


Original object:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
After removal:
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (2, object): ['b', 'c']
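A related case is filtering rows instead of removing a category: the filtered Series keeps the full category list even if some categories no longer occur. A minimal sketch using `remove_unused_categories()` to drop them (the variable names are chosen here for illustration):

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Filtering out the 'a' rows does not shrink the category list
subset = s[s != "a"]
before = subset.cat.categories.tolist()

# Drop categories that no longer appear in the data
trimmed = subset.cat.remove_unused_categories()
after = trimmed.cat.categories.tolist()

print(before)
print(after)
```

Unlike `remove_categories("a")`, this never turns existing values into `NaN`, since only categories with no remaining rows are dropped.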
