# Pandas `value_counts` tricks

This is a notebook for the Medium article [9 Pandas value_counts() tricks to improve your data analysis](https://bindichen.medium.com/9-pandas-value-counts-tricks-to-improve-your-data-analysis-7980a2b46536).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv(
    'titanic_train.csv'
)
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## 1. Default parameters

In [3]:
df['Embarked'].value_counts()

S    644
C    168
Q     77
Name: Embarked, dtype: int64
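
By default, `value_counts()` leaves out missing values and sorts the counts from highest to lowest. As a minimal sketch of that behavior, the example below uses a small made-up Series `s` in place of the Titanic `Embarked` column (the data is an assumption, not taken from the dataset). It compares the result with `collections.Counter.most_common()` from the standard library:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical stand-in for df['Embarked']: 3 x S, 2 x C, 1 x Q and one missing value
s = pd.Series(['S', 'S', 'S', 'C', 'C', 'Q', np.nan])

counts = s.value_counts()  # NaN is excluded, highest count first

# The same tally computed by hand on the non-missing values
manual = Counter(s.dropna()).most_common()

print(counts)
print(manual)  # [('S', 3), ('C', 2), ('Q', 1)]
```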

## 2. Sort results in ascending order

In [4]:
df['Embarked'].value_counts(ascending=True)

Q     77
C    168
S    644
Name: Embarked, dtype: int64
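
`ascending=True` only flips the direction in which the counts are sorted. As a sketch on made-up data with distinct counts (an assumption, so the order is unambiguous), it gives the same order as sorting the default result with `Series.sort_values()`:

```python
import pandas as pd

# Hypothetical data with distinct counts: 3 x S, 2 x C, 1 x Q
s = pd.Series(['S', 'S', 'S', 'C', 'C', 'Q'])

asc = s.value_counts(ascending=True)       # smallest count first
resorted = s.value_counts().sort_values()  # same order, obtained in two steps

print(asc)
print(resorted)
```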

## 3. Sort results alphabetically

In [5]:
df['Embarked'].value_counts().sort_index(ascending=True)

C    168
Q     77
S    644
Name: Embarked, dtype: int64

In [6]:
df['Embarked'].value_counts().sort_index(ascending=False)

S    644
Q     77
C    168
Name: Embarked, dtype: int64
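
`sort_index()` orders the labels alphabetically. When you need a different fixed order, one option is `Series.reindex()`. The sketch below uses the Titanic's route (Southampton, then Cherbourg, then Queenstown) as the desired order. The data is made up, and `fill_value=0` covers a port that does not appear in it:

```python
import pandas as pd

# Hypothetical data: no passenger from 'S' in this sample
s = pd.Series(['C', 'C', 'C', 'Q', 'Q'])

route_order = ['S', 'C', 'Q']  # Southampton -> Cherbourg -> Queenstown
by_route = s.value_counts().reindex(route_order, fill_value=0)

print(by_route)
```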

## 4. Include `NA` in the result

In [7]:
df['Embarked'].value_counts(dropna=False)

S      644
C      168
Q       77
NaN      2
Name: Embarked, dtype: int64
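
With `dropna=False`, the missing values get their own row, so the counts add up to the full length of the column (644 + 168 + 77 + 2 = 891 rows above). The sketch below uses made-up data to check that this extra row equals `Series.isna().sum()`. It selects the row by testing the index for missing labels, so it does not depend on how the NaN label is stored:

```python
import numpy as np
import pandas as pd

s = pd.Series(['S', 'S', 'S', 'C', 'C', 'Q', np.nan])  # hypothetical data

with_na = s.value_counts(dropna=False)

# Pick the row(s) whose label is missing and add up their counts
na_count = with_na[with_na.index.isna()].sum()

print(with_na)
print(na_count, s.isna().sum())
```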

## 5. Show results as percentages

In [8]:
df['Embarked'].value_counts(normalize=True)

S    0.724409
C    0.188976
Q    0.086614
Name: Embarked, dtype: float64
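
`normalize=True` divides by the number of non-missing values. For example, 644 / 889 ≈ 0.7244, where 889 = 644 + 168 + 77 leaves out the two missing entries. The sketch below uses made-up data to show that the result matches dividing the counts by their sum, and that adding `dropna=False` changes the denominator:

```python
import numpy as np
import pandas as pd

s = pd.Series(['S', 'S', 'S', 'C', 'C', 'Q', np.nan])  # hypothetical data

counts = s.value_counts()
props = s.value_counts(normalize=True)                    # denominator: 6 non-missing values
props_all = s.value_counts(normalize=True, dropna=False)  # denominator: all 7 values

print(props)
print(props_all)
```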

In [9]:
pd.set_option('display.float_format', '{:.2%}'.format)

# Note: this call changes pandas' global display settings, so the format
# applies to every float value displayed afterwards. To restore the default, call
#      pd.reset_option('display.float_format')

In [10]:
df['Embarked'].value_counts(normalize=True)

S   72.44%
C   18.90%
Q    8.66%
Name: Embarked, dtype: float64

In [11]:
pd.reset_option('display.float_format')
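
Instead of setting the option and then resetting it by hand, you can use `pd.option_context()`. It applies the option only inside a `with` block and restores the previous value afterwards. A sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['S', 'S', 'S', 'C', 'C', 'Q'])  # hypothetical data
props = s.value_counts(normalize=True)

# The percentage format is active only inside the with-block
with pd.option_context('display.float_format', '{:.2%}'.format):
    inside = props.to_string()

outside = props.to_string()  # default float formatting again

print(inside)
print(outside)
```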

In [12]:
# Thanks to David B Rosen (https://dabruro.medium.com/) for this tip.
#
# Instead of the pandas display option, which changes how every float value is displayed,
# you can format only this result (Styler needs the optional jinja2 dependency):
df['Embarked'].value_counts(normalize=True).to_frame().style.format('{:.2%}')

|   | Embarked |
|---|---|
| S | 72.44% |
| C | 18.90% |
| Q | 8.66% |
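
`Styler.format()` only changes how the table is rendered, and the underlying values stay numeric. If you need the percentages as text, for example to write them to a CSV file, one option is to map a format string over the values. A sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['S', 'S', 'S', 'C', 'C', 'Q'])  # hypothetical data
props = s.value_counts(normalize=True)

# Produces strings, so the result can no longer be used in arithmetic
as_text = props.map('{:.2%}'.format)

print(as_text)
```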


## 6. Bin continuous data into discrete intervals

In [13]:
df['Fare'].value_counts(bins=3)

(-0.513, 170.776]     871
(170.776, 341.553]     17
(341.553, 512.329]      3
Name: Fare, dtype: int64

In [14]:
df['Fare'].value_counts(bins=[-1, 20, 100, 550])

(-1.001, 20.0]    515
(20.0, 100.0]     323
(100.0, 550.0]     53
Name: Fare, dtype: int64
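
pandas implements `value_counts(bins=...)` by cutting the values with `pd.cut(..., include_lowest=True)`, so both approaches give the same counts. This is also why the first interval above starts at -1.001 rather than -1: the lowest edge is shifted down slightly so that the interval also includes the edge value itself. A sketch on a few made-up fares (an assumption, not the full column) follows. It uses `sort=False` so both results keep their intervals in order:

```python
import pandas as pd

# Hypothetical fares, not the full Titanic column
fares = pd.Series([0.0, 7.25, 8.05, 53.1, 71.2833, 512.3292])
edges = [-1, 20, 100, 550]

binned = fares.value_counts(bins=edges, sort=False)  # keep bins in interval order
via_cut = pd.cut(fares, edges, include_lowest=True).value_counts(sort=False)

print(binned)
print(via_cut)
```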

## 7. Group by and perform `value_counts()`

In [15]:
df.groupby('Embarked')['Sex'].value_counts()

Embarked  Sex   
C         male       95
          female     73
Q         male       41
          female     36
S         male      441
          female    203
Name: Sex, dtype: int64
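
The grouped result is a Series with a two-level index. To see it as a grid of ports by sex, one option is `unstack()`, which should match `pd.crosstab()` on the same two columns. A sketch with a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical passengers
df = pd.DataFrame({
    'Embarked': ['S', 'S', 'S', 'C', 'C', 'Q'],
    'Sex': ['male', 'female', 'male', 'female', 'female', 'male'],
})

grouped = df.groupby('Embarked')['Sex'].value_counts()

# Move the 'Sex' level into columns; combinations that never occur become 0
wide = grouped.unstack(fill_value=0).sort_index(axis=1)
table = pd.crosstab(df['Embarked'], df['Sex']).sort_index(axis=1)

print(wide)
print(table)
```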

## 8. Convert the resulting Series into a DataFrame

In [16]:
df.groupby('Embarked')['Sex'].value_counts().to_frame()

|   |   | Sex |
|---|---|---|
| **Embarked** | **Sex** |   |
| C | male | 95 |
| C | female | 73 |
| Q | male | 41 |
| Q | female | 36 |
| S | male | 441 |
| S | female | 203 |
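
In the output above, the count column is named `Sex`, the same as the second index level. Newer pandas versions name the counts `count` instead. Calling `reset_index()` on such a Series can fail because a `Sex` column would be created twice. Passing `name=` avoids the clash and gives a flat DataFrame. A sketch with made-up data, where the column name `'count'` is an arbitrary choice:

```python
import pandas as pd

# Hypothetical passengers
df = pd.DataFrame({
    'Embarked': ['S', 'S', 'S', 'C', 'C', 'Q'],
    'Sex': ['male', 'female', 'male', 'female', 'female', 'male'],
})

# name= sets the column that holds the counts, avoiding a clash with the 'Sex' level
flat = df.groupby('Embarked')['Sex'].value_counts().reset_index(name='count')

print(flat)
```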


## 9. Apply to a DataFrame

In [17]:
df = pd.DataFrame({
    'num_legs': [2, 4, 4, 6],
    'num_wings': [2, 0, 0, 0]},
    index=['falcon', 'dog', 'cat', 'ant']
)
df

|   | num_legs | num_wings |
|---|---|---|
| falcon | 2 | 2 |
| dog | 4 | 0 |
| cat | 4 | 0 |
| ant | 6 | 0 |


In [18]:
df.value_counts()

num_legs  num_wings
4         0            2
6         0            1
2         2            1
dtype: int64

In [19]:
df.value_counts().to_frame()

|   |   | 0 |
|---|---|---|
| **num_legs** | **num_wings** |   |
| 4 | 0 | 2 |
| 6 | 0 | 1 |
| 2 | 2 | 1 |
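
`DataFrame.value_counts()` counts unique rows, that is, unique combinations of values across all columns. The sketch below uses the same animal DataFrame. The counts match `groupby(...).size()` over all columns, which orders the result by key instead of by count. The `subset` parameter restricts the combination to the selected columns:

```python
import pandas as pd

df = pd.DataFrame(
    {'num_legs': [2, 4, 4, 6], 'num_wings': [2, 0, 0, 0]},
    index=['falcon', 'dog', 'cat', 'ant'],
)

row_counts = df.value_counts()
by_groupby = df.groupby(list(df.columns)).size()  # same counts, ordered by key

wings_only = df.value_counts(subset=['num_wings'])  # count values of this column only

print(row_counts)
print(wings_only)
```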


### Thanks for reading
