In [27]:
import pandas as pd
import numpy as np

### Pandas Rank

Pandas rank is a simple but helpful function that ranks your data points relative to each other. It works on an entire Series, and you can also call it on a group by to rank values within each group.

We will run through 5 examples:
1. "Hello World" of Pandas Rank
2. Ranking Ascending True/False
3. Ranking with different methods
4. Ranking via pct
5. Ranking with Group By

But first, let's create our DataFrame.

In [36]:
np.random.seed(seed=42)

df = pd.DataFrame(data=np.random.normal(loc=100, scale=50, size=(8,2)),
                  columns=('Parks', 'Schools'),
                  index=['San Francisco', 'San Diego', 'Los Angeles',
                         'New York', 'Chicago', 'Denver', 'Seattle', 'Portland']
                 )
df = df.astype(int)
df

,Parks,Schools
San Francisco,124,93
San Diego,132,176
Los Angeles,88,88
New York,178,138
Chicago,76,127
Denver,76,76
Seattle,112,4
Portland,13,71


### 1. "Hello World" of Pandas Rank

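Let's start off with a simple example to see how rank works. You will generally call .rank() on a Series. Ranking every column of a DataFrame at once is less common, but it is possible.

To demonstrate, I'll copy my original DataFrame, then attach a rank column. By default, tied values share the average of the ranks they span, which is why Chicago and Denver (both 76) each get 2.5 below.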

In [38]:
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank()
df_copy

,Parks,Schools,park_rank
San Francisco,124,93,6.0
San Diego,132,176,7.0
Los Angeles,88,88,4.0
New York,178,138,8.0
Chicago,76,127,2.5
Denver,76,76,2.5
Seattle,112,4,5.0
Portland,13,71,1.0


In [40]:
df_copy = df.copy()
df_copy.rank()

,Parks,Schools
San Francisco,6.0,5.0
San Diego,7.0,8.0
Los Angeles,4.0,4.0
New York,8.0,7.0
Chicago,2.5,6.0
Denver,2.5,3.0
Seattle,5.0,1.0
Portland,1.0,2.0
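
Calling .rank() on a whole DataFrame ranks each column independently, so the result should match ranking every column on its own. Here is a minimal sketch of that check, using a small hypothetical frame (`toy` and its values are illustrative, not the data above):

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({'Parks': [124, 88, 76, 76],
                    'Schools': [93, 88, 127, 76]})

# Rank every column in one call
all_ranks = toy.rank()

# Rank each column separately, then compare the two results
per_column = toy.apply(lambda col: col.rank())

print(all_ranks.equals(per_column))
```

If you only care about one column, ranking that Series directly (as in the first cell of this section) keeps the intent clearer.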


### 2. Ranking Ascending True/False

Notice how the lowest numbers have the lowest ranks? That's not usually how my brain works. It is more intuitive to me for the highest numbers to have the lowest ranks (e.g., the highest number is ranked #1). To do this, set ascending=False.

In [41]:
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False)
df_copy

,Parks,Schools,park_rank
San Francisco,124,93,3.0
San Diego,132,176,2.0
Los Angeles,88,88,5.0
New York,178,138,1.0
Chicago,76,127,6.5
Denver,76,76,6.5
Seattle,112,4,4.0
Portland,13,71,8.0
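
One way to sanity-check the two directions: with the default method='average', a row's ascending rank and descending rank should add up to n + 1, where n is the number of values. A small sketch, re-typing the Parks values from the table above into a standalone Series (the names `parks`, `asc`, and `desc` are just for this example):

```python
import pandas as pd

# Parks values copied from the DataFrame above
parks = pd.Series([124, 132, 88, 178, 76, 76, 112, 13])

asc = parks.rank()                   # lowest value gets rank 1
desc = parks.rank(ascending=False)   # highest value gets rank 1

# Every row sums to n + 1 = 9, including the tied pair (2.5 + 6.5)
print((asc + desc).unique())
```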


### 3. Ranking With Different Methods

Let's say that you had a group of identical values. How would you want to rank them? Let's explore a few different methods we can choose.

To see a list of methods and how they affect ranks, check out [our post](https://dataindependent.com/pandas/pandas-rank).

First I need a DataFrame with repeated values.

In [55]:
df2 = pd.DataFrame([1,2,3,4,5,3,5,6,7,7], columns=['Sample']).sort_values(by='Sample')
df2

,Sample
0,1
1,2
2,3
5,3
3,4
4,5
6,5
7,6
8,7
9,7


In [57]:
df2['average_rank'] = df2['Sample'].rank(method='average')
df2['min_rank'] = df2['Sample'].rank(method='min')
df2['max_rank'] = df2['Sample'].rank(method='max')
df2['first_rank'] = df2['Sample'].rank(method='first')
df2['dense_rank'] = df2['Sample'].rank(method='dense')
df2

,Sample,average_rank,min_rank,max_rank,first_rank,dense_rank
0,1,1.0,1.0,1.0,1.0,1.0
1,2,2.0,2.0,2.0,2.0,2.0
2,3,3.5,3.0,4.0,3.0,3.0
5,3,3.5,3.0,4.0,4.0,3.0
3,4,5.0,5.0,5.0,5.0,4.0
4,5,6.5,6.0,7.0,6.0,5.0
6,5,6.5,6.0,7.0,7.0,5.0
7,6,8.0,8.0,8.0,8.0,6.0
8,7,9.5,9.0,10.0,9.0,7.0
9,7,9.5,9.0,10.0,10.0,7.0
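
To make the relationship between the methods concrete: for a group of ties, 'min' gives the first rank the group occupies, 'max' gives the last, and 'average' is the midpoint of the two. The sketch below checks that on the ten Sample values shown in the table above (the Series name `s` is just for this example). It also casts the dense ranks to integers, since .rank() returns floats.

```python
import pandas as pd

# The ten Sample values from the table above, already sorted
s = pd.Series([1, 2, 3, 3, 4, 5, 5, 6, 7, 7])

min_rank = s.rank(method='min')
max_rank = s.rank(method='max')
avg_rank = s.rank(method='average')

# 'average' is the midpoint of 'min' and 'max' for every row
print(((min_rank + max_rank) / 2).equals(avg_rank))

# Dense ranks are whole numbers with no gaps, so an integer cast is safe here
dense_int = s.rank(method='dense').astype(int)
print(dense_int.tolist())
```

'dense' is often the one to reach for when you want "1st, 2nd, 3rd" style labels, because ties do not leave gaps in the sequence.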


### 4. Ranking Via PCT

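You can also express your ranks as percentages by setting pct=True. Each rank is divided by the number of values, so the results are greater than 0 and at most 1. Combined with ascending=False, the largest value gets the smallest percentage.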

In [58]:
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False, pct=True)
df_copy

,Parks,Schools,park_rank
San Francisco,124,93,0.375
San Diego,132,176,0.25
Los Angeles,88,88,0.625
New York,178,138,0.125
Chicago,76,127,0.8125
Denver,76,76,0.8125
Seattle,112,4,0.5
Portland,13,71,1.0
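
To see where these numbers come from, the sketch below compares pct=True against dividing the plain ranks by the number of values. It re-types the Parks values into a standalone Series, and the names `parks`, `pct_rank`, and `manual` are only for this example.

```python
import numpy as np
import pandas as pd

# Parks values copied from the DataFrame above
parks = pd.Series([124, 132, 88, 178, 76, 76, 112, 13])

pct_rank = parks.rank(ascending=False, pct=True)

# Same thing by hand: ordinary rank divided by the number of values
manual = parks.rank(ascending=False) / parks.count()

print(np.allclose(pct_rank, manual))

# The smallest value is 1/8 = 0.125 (not 0) and the largest is 1.0
print(pct_rank.min(), pct_rank.max())
```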


### 5. Ranking with Group By

Finally, let's check out ranking within subgroups. You can call .rank() on a group by as well, and each row is then ranked only against the other rows in its group.

Let's create a DataFrame that will play nicely for this example.

In [62]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30),
                   ('Chambers', 'bar', 35.89)],
                  columns=('name', 'type', 'AvgBill'))
df

,name,type,AvgBill
0,Foreign Cinema,Restaurant,289.0
1,Liho Liho,Restaurant,224.0
2,500 Club,bar,80.5
3,The Square,bar,25.3
4,Chambers,bar,35.89


In [65]:
df['sub_group_rank'] = df.groupby('type')['AvgBill'].rank(ascending=False)
df

,name,type,AvgBill,sub_group_rank
0,Foreign Cinema,Restaurant,289.0,1.0
1,Liho Liho,Restaurant,224.0,2.0
2,500 Club,bar,80.5,1.0
3,The Square,bar,25.3,3.0
4,Chambers,bar,35.89,2.0
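
A common follow-up is to use the group rank as a filter, for example to keep only the top row of each group. Here is a minimal sketch that rebuilds the same data under a separate, hypothetical name (`venues`) so it can run on its own:

```python
import pandas as pd

# Same rows as the DataFrame above, rebuilt so this example is self-contained
venues = pd.DataFrame({
    'name': ['Foreign Cinema', 'Liho Liho', '500 Club', 'The Square', 'Chambers'],
    'type': ['Restaurant', 'Restaurant', 'bar', 'bar', 'bar'],
    'AvgBill': [289.0, 224.0, 80.5, 25.30, 35.89],
})

# Rank AvgBill within each type, highest bill first
venues['sub_group_rank'] = venues.groupby('type')['AvgBill'].rank(ascending=False)

# Keep the highest-bill venue of each type
top_per_type = venues[venues['sub_group_rank'] == 1]
print(top_per_type['name'].tolist())
```

If a group could contain ties for first place, the default method='average' would give them a rank like 1.5, and the `== 1` filter would miss them. Passing method='min' to .rank() is one way to handle that case.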
