# Unique Values, Value Counts, and Membership

In [1]:
import pandas as pd
from pandas import DataFrame, Series
import numpy as np

Another class of related methods extracts information about the values contained in a
one-dimensional Series. To illustrate these, consider this example:

In [2]:
obj = Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

In [4]:
obj

0    c
1    a
2    d
3    a
4    a
5    b
6    b
7    c
8    c
dtype: object

The first function is unique, which gives you an array of the unique values in a Series:

In [8]:
uniques = obj.unique()
uniques.sort()
uniques

array(['a', 'b', 'c', 'd'], dtype=object)
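Because the cell above sorts the array in place before displaying it, the original ordering is not visible in the output. The following minimal sketch rebuilds the same Series and shows the order that unique reports before any sorting; the variable names `first_seen` and `sorted_uniques` are illustrative and not part of the original example.

```python
import pandas as pd

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

# unique() reports each value in the order it first appears in the Series
first_seen = obj.unique()
print(list(first_seen))

# sorted() builds a new sorted list and leaves first_seen unchanged,
# whereas first_seen.sort() would reorder the array in place
sorted_uniques = sorted(first_seen)
print(sorted_uniques)
```

Using sorted here is only one option; it is convenient when the unsorted result is still needed afterwards.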

The unique values are not necessarily returned in sorted order, but could be sorted after
the fact if needed (uniques.sort()). Relatedly, value_counts computes a Series containing
value frequencies:

In [9]:
obj.value_counts()

a    3
c    3
b    2
d    1
dtype: int64

The Series is sorted by count in descending order as a convenience. value_counts is also
available as a top-level pandas function that can be used with any array or sequence:

In [10]:
pd.value_counts(obj.values, sort=False)

c    3
b    2
a    3
d    1
dtype: int64
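As a hedged sketch of a few variations on the calls above: the `normalize` parameter of Series.value_counts returns relative frequencies instead of raw counts, and wrapping a raw array in a Series gives the same counts through the method form. Newer pandas releases mark the top-level `pd.value_counts` as deprecated, so the method form may be the safer choice there. The names `counts`, `shares`, and `arr_counts` are illustrative.

```python
import pandas as pd

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

# Raw counts, sorted by count in descending order by default
counts = obj.value_counts()

# Relative frequencies: each count divided by the total number of values
shares = obj.value_counts(normalize=True)

# Counting a plain array by wrapping it in a Series first
arr_counts = pd.Series(obj.to_numpy()).value_counts()

print(counts)
print(shares)
```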

Lastly, isin performs a vectorized set membership check and can be very useful in
filtering a data set down to a subset of values in a Series or column in a DataFrame:

In [12]:
mask = obj.isin(['b', 'c'])
mask

0     True
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8     True
dtype: bool

In [13]:
obj[mask]

0    c
5    b
6    b
7    c
8    c
dtype: object
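The same pattern applies to a DataFrame column. The sketch below uses a small hypothetical frame (the `key` and `value` columns and their contents are made up for illustration) and also shows that the boolean mask can be inverted with `~` to select the rows that are not in the set.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
frame = pd.DataFrame({'key': ['c', 'a', 'd', 'b'],
                      'value': [10, 20, 30, 40]})

# Boolean mask: True where the key is one of the listed values
keep = frame['key'].isin(['b', 'c'])

# Rows whose key is in the set
subset = frame[keep]

# Rows whose key is NOT in the set, using the inverted mask
others = frame[~keep]

print(subset)
print(others)
```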

Table 5-11. Unique, value counts, and binning methods

| Method | Description |
| --- | --- |
| isin | Compute boolean array indicating whether each Series value is contained in the passed sequence of values. |
| unique | Compute array of unique values in a Series, returned in the order observed. |
| value_counts | Return a Series containing unique values as its index and frequencies as its values, ordered by count in descending order. |

In some cases, you may want to compute a histogram on multiple related columns in
a DataFrame. Here’s an example:

In [15]:
data = DataFrame({'Qu1': [1, 3, 4, 3, 4],
                  'Qu2': [2, 3, 1, 2, 3],
                  'Qu3': [1, 5, 2, 4, 4]})
data

   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4


Passing pandas.value_counts to this DataFrame’s apply function gives:

In [20]:
result = data.apply(pd.value_counts).fillna(0)
result

   Qu1  Qu2  Qu3
1  1.0  1.0  1.0
2  0.0  2.0  1.0
3  2.0  2.0  0.0
4  2.0  0.0  2.0
5  0.0  0.0  1.0
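In the result, the row labels are the distinct values found across all columns and each cell is the number of times that value occurs in the column; fillna(0) fills in values that never appear in a column. As a minimal sketch of an equivalent formulation, assuming the method form Series.value_counts is preferred over the top-level function, each column can be counted with a lambda. The trailing `sort_index` and `astype(int)` calls are optional additions here, not part of the original example.

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# Count the values in each column separately; apply aligns the per-column
# results on the union of observed values, leaving NaN where a value
# does not occur in a column
counted = data.apply(lambda col: col.value_counts())

# Replace the missing counts with 0, order the rows by value,
# and convert the counts back to integers
result = counted.fillna(0).sort_index().astype(int)
print(result)
```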
