In [1]:
import pandas as pd

### Pandas Value Counts

Pandas Value Counts will count the frequency of the unique values in your series. Or simply, "count how many times each value occurs."

We will run through 3 examples:
1. Counting frequency of unique values in a series
2. Counting *relative* frequency of unique values in a series (normalizing)
3. Counting a continuous series using bins.

First, let's create our DataFrame.

In [2]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30),
                   ('Liho Liho', 'Restaurant', 124.0),
                   ('The Square', 'bar', 53.30),
                   ('Liho Liho', 'Restaurant', 324.0),
                   ('500 Club', 'bar', 40.5),
                   ('Salzburg', 'bar', 123.5)],
                  columns=('name', 'type', 'AvgBill'))
df

             name        type  AvgBill
0  Foreign Cinema  Restaurant    289.0
1       Liho Liho  Restaurant    224.0
2        500 Club         bar     80.5
3      The Square         bar     25.3
4       Liho Liho  Restaurant    124.0
5      The Square         bar     53.3
6       Liho Liho  Restaurant    324.0
7        500 Club         bar     40.5
8        Salzburg         bar    123.5


### Counting frequency of unique values in a series
Now let's call value_counts on our "name" column. This will look at the distinct values within that column and count how many times each one appears.

In [3]:
df['name'].value_counts()

Liho Liho         3
500 Club          2
The Square        2
Foreign Cinema    1
Salzburg          1
Name: name, dtype: int64
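If you want to convince yourself of what value_counts is doing, here is a minimal sketch that tallies the same names by hand with a plain dictionary. The `names` Series and the `manual` dictionary are names I made up for this illustration; the values are copied from the "name" column above.

```python
import pandas as pd

# The "name" column from the DataFrame above, rebuilt as a standalone Series
names = pd.Series(['Foreign Cinema', 'Liho Liho', '500 Club', 'The Square',
                   'Liho Liho', 'The Square', 'Liho Liho', '500 Club', 'Salzburg'])

counts = names.value_counts()

# Tally each name by hand: start at 0 and add 1 every time a name shows up
manual = {}
for name in names:
    manual[name] = manual.get(name, 0) + 1

# Both approaches should agree on every name
print(counts.to_dict() == manual)
```

The dictionary loses the sorting that value_counts gives you for free, which is one reason to prefer the built-in method.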

We could also have the series returned in reverse order (lowest counts first) by setting ascending=True. Remember, ascending means to go up, so you'll start low and go up to the highest values.

In [4]:
df['name'].value_counts(ascending=True)

Salzburg          1
Foreign Cinema    1
The Square        2
500 Club          2
Liho Liho         3
Name: name, dtype: int64

### Counting relative frequency of unique values in a series (normalizing)
Say you didn't want the count of each unique value, but rather wanted to see how frequently each value appears compared to the *whole series.* To do this, set normalize=True.

In [5]:
df['name'].value_counts(normalize=True)

Liho Liho         0.333333
500 Club          0.222222
The Square        0.222222
Foreign Cinema    0.111111
Salzburg          0.111111
Name: name, dtype: float64

Let's break this down quickly. There are a total of 9 items in the Series (run "len(df)" if you don't believe me.)

From value_counts above, we saw that "Liho Liho" appeared 3 times. Since it appears 3 times out of 9 rows, we can do 3 / 9, which equals roughly 0.333. This is the relative frequency of "Liho Liho" in this series.
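The arithmetic above can be checked directly. This is a small sketch that divides the raw counts by the number of rows and compares the result with normalize=True. The variable names (`names`, `relative`, `normalized`) are just for this example.

```python
import pandas as pd

# The "name" column from the DataFrame above
names = pd.Series(['Foreign Cinema', 'Liho Liho', '500 Club', 'The Square',
                   'Liho Liho', 'The Square', 'Liho Liho', '500 Club', 'Salzburg'])

# Raw counts divided by the total number of rows (9)
relative = names.value_counts() / len(names)

# The same thing computed by pandas
normalized = names.value_counts(normalize=True)

# The largest difference between the two should be zero (or floating point noise)
print((relative - normalized).abs().max())
```

Relative frequencies always add up to 1, which is a handy sanity check.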

### Counting a continuous series using bins

Now let's say we have a longer series of continuous values. Think of continuous values as a list of numbers that don't serve as labels. For example: [.2, .23, .43, .85, .13]. Say we thought that .2 and .23 were close enough and wanted to count them together. Unfortunately, if we ran value_counts as usual, it would count .2 and .23 as separate values.
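To see the problem, here is a quick sketch using the five example numbers. The Series name `small` is made up for this illustration.

```python
import pandas as pd

# The five example values from the paragraph above
small = pd.Series([.2, .23, .43, .85, .13])

# Every value is distinct, so each one gets its own row with a count of 1
plain_counts = small.value_counts()
print(plain_counts)
```

Five values in, five rows out. That tells us nothing about which values sit close together.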

If you want to group them together, this is where *bins* comes in. To create a list of random continuous numbers, I'm going to use numpy.

In [14]:
import numpy as np
np.random.seed(seed=42) # To make sure the same values appear each time

random_numbers = np.random.random(size=(10, 1))
random_numbers = pd.DataFrame(random_numbers, columns=['rand_num'])
random_numbers

   rand_num
0  0.374540
1  0.950714
2  0.731994
3  0.598658
4  0.156019
5  0.155995
6  0.058084
7  0.866176
8  0.601115
9  0.708073


Now I want to split my data into 3 bins and count how many values fall into each bin.

In [15]:
random_numbers['rand_num'].value_counts(bins=3)

(0.653, 0.951]     4
(0.356, 0.653]     3
(0.0562, 0.356]    3
Name: rand_num, dtype: int64
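As far as I understand it, the bins argument is a convenience wrapper around pd.cut. The sketch below does the binning explicitly and then counts the bins, assuming value_counts uses include_lowest=True under the hood. I typed in the rounded values from the table above rather than regenerating them, so the bin edges may differ from the real ones in the last decimal places.

```python
import pandas as pd

# Rounded values copied from the rand_num column above
rand_num = pd.Series([0.374540, 0.950714, 0.731994, 0.598658, 0.156019,
                      0.155995, 0.058084, 0.866176, 0.601115, 0.708073])

# Step 1: assign every value to one of 3 equal-width bins
binned = pd.cut(rand_num, bins=3, include_lowest=True)

# Step 2: count how many values landed in each bin, ordered from lowest bin to highest
via_cut = binned.value_counts().sort_index()

# The one-step version from the cell above, put in the same order for comparison
via_bins = rand_num.value_counts(bins=3).sort_index()

print(via_cut)
print(via_bins)
```

Doing it in two steps is useful when you also want to keep the bin assignment for each row, for example as a new column.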

In this case, bins returns buckets that are evenly spaced. But what if you wanted to create your own buckets? No problem, just pass a list of bin edges that describe your buckets.

In [17]:
random_numbers['rand_num'].value_counts(bins=[0,.2,.6, 1])

(0.6, 1.0]       5
(-0.001, 0.2]    3
(0.2, 0.6]       2
Name: rand_num, dtype: int64
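Interval labels like (-0.001, 0.2] are not the friendliest to read. The lowest edge appears to be nudged slightly below 0 so that a value of exactly 0 would still be counted. If you would rather have named buckets, one option is to go through pd.cut and pass labels. The label names 'low', 'mid', and 'high' are purely my own choice for this sketch, and the values are again the rounded ones from the table above.

```python
import pandas as pd

# Rounded values copied from the rand_num column above
rand_num = pd.Series([0.374540, 0.950714, 0.731994, 0.598658, 0.156019,
                      0.155995, 0.058084, 0.866176, 0.601115, 0.708073])

# Same edges as before, but each bucket gets a readable (hypothetical) name
labeled = pd.cut(rand_num,
                 bins=[0, .2, .6, 1],
                 labels=['low', 'mid', 'high'],
                 include_lowest=True)

labeled_counts = labeled.value_counts()
print(labeled_counts)
```

The counts should match the custom-bin result above: 3 values up to 0.2, 2 values between 0.2 and 0.6, and 5 values above 0.6.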