### Categorical variables 
Pandas represents categorical data with the **Categorical** type. A categorical variable can take only values from a limited, fixed set (its categories).

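Internally, each value is stored as an integer **code**, which is its position in the list of categories. Sorting uses these **codes**, not the **values** themselves, so it is recommended to **provide the order of the categories beforehand**. Otherwise pandas infers the categories and sorts them lexically.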
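A minimal sketch of how the codes drive ordering, assuming a recent pandas version. The codes are positions in the `categories` list, so sorting follows that list rather than alphabetical order. The `ordered=True` flag is an addition not used elsewhere in this notebook. It additionally enables `<`/`>` comparisons and `min()`/`max()`. The variable names are illustrative only.

```python
import pandas as pd

values = ['slow', 'fast', 'fast', 'medium', 'slow']

# Without an explicit order, pandas infers the categories and sorts them lexically
inferred = pd.Categorical(values)
# With an explicit order, each value's code is its position in this list
explicit = pd.Categorical(values, categories=['slow', 'medium', 'fast'], ordered=True)

print(list(inferred.categories))   # ['fast', 'medium', 'slow']
print(inferred.codes.tolist())     # [2, 0, 0, 1, 2]
print(explicit.codes.tolist())     # [0, 2, 2, 1, 0]

# Sorting follows the codes (the category order), not the alphabet
print(list(explicit.sort_values()))  # ['slow', 'slow', 'medium', 'fast', 'fast']

# ordered=True (an extra flag) also makes comparisons and min/max meaningful
print(explicit.max())                # fast
print((explicit > 'slow').tolist())  # [False, True, True, True, False]
```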
### Creating a Categorical type
- **pd.Categorical( [ values ], categories = [ categories ] )** - creates a Categorical from a list of values. The optional `categories` argument fixes the allowed categories and their order. If it is omitted, pandas finds all **unique categories** automatically and sorts them where possible.
- **pd.Series( [ values ], dtype = 'category')** - creates a categorical **Series** directly
- **s = pd.Series( [ values ] ); s = s.astype( 'category' )** - converts an existing **Series** to the categorical dtype with **astype()**
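The three creation routes can be compared side by side in a short sketch. It assumes a recent pandas version. `pd.CategoricalDtype` is not mentioned above. It is used here as one way to keep an explicit category order when converting with `astype()`, since the plain string `'category'` infers a lexical order.

```python
import pandas as pd

values = ['slow', 'fast', 'fast', 'medium', 'slow']
order = ['slow', 'medium', 'fast']

# 1. Directly from a list, with an explicit category order
c1 = pd.Categorical(values, categories=order)

# 2. A Series created with the 'category' dtype (categories inferred, sorted lexically)
s2 = pd.Series(values, dtype='category')

# 3. An existing Series converted with astype(); passing a CategoricalDtype
#    instead of the string 'category' keeps the explicit order
s3 = pd.Series(values).astype(pd.CategoricalDtype(categories=order))

print(list(c1.categories))         # ['slow', 'medium', 'fast']
print(list(s2.cat.categories))     # ['fast', 'medium', 'slow']
print(list(s3.cat.categories))     # ['slow', 'medium', 'fast']

# All three hold the same values; only the category order (and thus the codes) differs
print(list(c1) == s2.tolist() == s3.tolist())  # True
```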
### Methods
- **v.categories** - returns the (unique) categories
- **np.asarray(v)** - returns all values of the Categorical as a NumPy array (older pandas offered **v.get_values()**, which was removed in pandas 1.0)
- **v.sort_values()** - returns the values sorted by category order, i.e. by code
- **v.codes** - returns the integer code of each value (-1 marks a missing value)
- **s.cat.categories** - returns the categories of a **Series** that has the categorical dtype
- **v.rename_categories( [ new_categories ] )** - renames the categories and returns a **new object**, which must be assigned to keep the change
- **v.add_categories( [ new_categories ] )** - adds new categories (returns a **copy**)
- **v.remove_categories( [ deleting_categories ] )** - removes the given categories; values in those categories become NaN (returns a **copy**)
- **v.set_categories( [ new_categories ] )** - sets the categories to exactly the given list, adding new ones and dropping absent ones; values whose category is dropped become NaN (returns a **copy**)
- **v.describe( )** - returns the count and relative frequency of each category
- **v.value_counts( )** - counts the number of observations in each category
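Some methods in the list are not exercised in the cells below, namely `describe()`, the `-1` code for missing values, and `set_categories()` adding and dropping in one call. A small sketch covering them, assuming a recent pandas version. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

v = pd.Categorical(['a', 'b', 'a', None], categories=['a', 'b', 'c'])

# Missing values are stored with code -1
print(v.codes.tolist())            # [0, 1, 0, -1]

# The plain values as a NumPy array (in place of the removed get_values())
print(np.asarray(v))

# describe(): count and relative frequency of each category (NaN row included)
d = v.describe()
print(d)

# value_counts() also lists categories that never occur ('c' -> 0)
print(v.value_counts().tolist())   # [2, 1, 0]

# set_categories() can add ('d') and drop ('b') categories in one call;
# values whose category is dropped become NaN
w = v.set_categories(['a', 'c', 'd'])
print(w)
```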


```python
import numpy as np
import pandas as pd

# Categorical variable creation with an explicit category order
values = ['slow','fast','fast','medium','slow']
speed = pd.Categorical(values, categories=['slow','medium','fast'])
print(speed)

# list categories
print('\n' + str(speed.categories))

# list values (np.asarray replaces get_values(), which was removed in pandas 1.0)
print('\n' + str(np.asarray(speed)))

# list codes of the categories
print('\n' + str(speed.codes))

# sorted values (ordered by category code, not alphabetically)
print('\n' + str(speed.sort_values()))
```

```
[slow, fast, fast, medium, slow]
Categories (3, object): [slow, medium, fast]

Index(['slow', 'medium', 'fast'], dtype='object')

['slow' 'fast' 'fast' 'medium' 'slow']

[0 2 2 1 0]

[slow, slow, medium, fast, fast]
Categories (3, object): [slow, medium, fast]
```




```python
# Categorical variable creation using Series
s = pd.Series(['slow','fast','fast','medium','slow'], dtype='category')
print(s)

# Categorical variable creation using Series and .astype()
s = pd.Series(['slow','fast','fast','medium','slow'])
s = s.astype('category')
print('\n' + str(s))

# List the categories
print('\n' + str(s.cat.categories))
```

```
0      slow
1      fast
2      fast
3    medium
4      slow
dtype: category
Categories (3, object): [fast, medium, slow]

0      slow
1      fast
2      fast
3    medium
4      slow
dtype: category
Categories (3, object): [fast, medium, slow]

Index(['fast', 'medium', 'slow'], dtype='object')
```
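Categories inferred from a Series come out in alphabetical order (`fast`, `medium`, `slow`), which is rarely the intended one. A minimal sketch, assuming a recent pandas version, shows the codes this produces. It then uses the `.cat.reorder_categories()` accessor method, not covered in the list above, to restore the intended order without changing the set of categories.

```python
import pandas as pd

s = pd.Series(['slow', 'fast', 'fast', 'medium', 'slow'], dtype='category')

# Inferred (alphabetical) order: fast=0, medium=1, slow=2
print(s.cat.codes.tolist())          # [2, 0, 0, 1, 2]

# reorder_categories() keeps the same categories but changes their order,
# which changes the codes and therefore the sort order
s = s.cat.reorder_categories(['slow', 'medium', 'fast'])
print(list(s.cat.categories))        # ['slow', 'medium', 'fast']
print(s.cat.codes.tolist())          # [0, 2, 2, 1, 0]
print(s.sort_values().tolist())      # ['slow', 'slow', 'medium', 'fast', 'fast']
```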


```python
# Rename the categories, first way: a dict mapping old -> new names
cat = pd.Categorical(['a','b','c','a'], categories=['a','b','c'])
cat = cat.rename_categories({'a': 'bronze', 'b': 'silver', 'c': 'gold'})
print(cat)

# Rename the categories, second way: a list of new names
print('\n' + str(cat.rename_categories(['x','y','z'])))  # returns a copy!

# Add a new category
with_new_category = cat.add_categories(['platinum'])
print('\n' + str(with_new_category))

# Delete a category (its values become NaN)
deleted = cat.remove_categories(['bronze'])
print('\n' + str(deleted))

# Set the categories (values outside the new list become NaN)
s = pd.Series(['one','two','three','four'], dtype='category')
s = s.cat.set_categories(['one','three'])
print('\n' + str(s))
```

```
[bronze, silver, gold, bronze]
Categories (3, object): [bronze, silver, gold]

[x, y, z, x]
Categories (3, object): [x, y, z]

[bronze, silver, gold, bronze]
Categories (4, object): [bronze, silver, gold, platinum]

[NaN, silver, gold, NaN]
Categories (2, object): [silver, gold]

0      one
1      NaN
2    three
3      NaN
dtype: category
Categories (2, object): [one, three]
```



```python
# Count number of observations in each category
cat.value_counts()
```

```
bronze    2
silver    1
gold      1
dtype: int64
```
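`value_counts()` on a Categorical reports every category, including ones that no value uses. A minimal sketch, assuming a recent pandas version, demonstrates this. It then applies `remove_unused_categories()`, a method not listed above, to drop the empty ones.

```python
import pandas as pd

cat = pd.Categorical(['bronze', 'silver', 'gold', 'bronze'],
                     categories=['bronze', 'silver', 'gold', 'platinum'])

# Unused categories are still reported, with a count of 0
counts = cat.value_counts()
print(counts.tolist())                        # [2, 1, 1, 0]

# remove_unused_categories() drops categories that no value uses
trimmed = cat.remove_unused_categories()
print(list(trimmed.categories))               # ['bronze', 'silver', 'gold']
print(trimmed.value_counts().tolist())        # [2, 1, 1]
```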