In [1]:
import pandas as pd
import numpy as np

### Pandas idxmax

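Pandas idxmax returns the label at which the maximum value occurs along an axis. For a DataFrame, that means it reports where each row's (or each column's) highest value sits on the *other axis*. It is especially useful when your columns are observation points (as opposed to categorical variables).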

We will run through 3 examples:
1. Find which column holds each row's highest value
2. Find which row holds each column's highest value
3. Using a larger dataframe, find which students scored highest on each test.

But first, let's create our DataFrame.

In [2]:
np.random.seed(seed=42)

df = pd.DataFrame(data=np.random.randint(0, 100, (4, 3)),
                  columns=['Test1', 'Test2', 'Test3'],
                  index=['Bob', 'Sally', 'Frank', 'Patty'])
df

       Test1  Test2  Test3
Bob       51     92     14
Sally     71     60     20
Frank     82     86     74
Patty     74     87     99


### 1. Find which column holds each row's highest value

To find out which column has the highest value in each row, call idxmax(axis=1). The resulting Series is indexed by row label, and each entry is the name of the column where that row's maximum sits.

Notice below how the Series tells us which test was highest for each of our students.

In [3]:
df.idxmax(axis=1)

Bob      Test2
Sally    Test1
Frank    Test2
Patty    Test3
dtype: object
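
It is often handy to see the winning score next to the winning test. The following is a minimal sketch that pairs idxmax with max; it rebuilds the four-student frame from the values shown in the table above so it runs on its own, and the `best` frame and its column names are illustrative additions, not part of the original walkthrough.

```python
import pandas as pd

# Same scores as the four-student table above, typed in literally
df = pd.DataFrame(
    data=[[51, 92, 14], [71, 60, 20], [82, 86, 74], [74, 87, 99]],
    columns=['Test1', 'Test2', 'Test3'],
    index=['Bob', 'Sally', 'Frank', 'Patty'],
)

# idxmax gives the column label of each row's maximum;
# max gives the maximum value itself.
best = pd.DataFrame({
    'best_test': df.idxmax(axis=1),
    'best_score': df.max(axis=1),
})
print(best)
```

Both results share the same row index, so they line up student by student without any extra merging.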

### 2. Find which row holds each column's highest value

Say we wanted the inverse: which row has the highest value for each column? Put another way, which student scored highest on each test? To do this, set axis=0 (the default) to work column by column.

Now we only have 3 items in our resulting Series, one for each test, each naming the top student.

In [4]:
df.idxmax(axis=0)

Test1    Frank
Test2      Bob
Test3    Patty
dtype: object
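
One detail worth knowing: when several rows tie for the maximum, idxmax returns only the first matching label in index order. The sketch below uses made-up scores (the `tied` frame is hypothetical) to show the tie behaviour and one possible way to list every co-winner.

```python
import pandas as pd

# Hypothetical scores where Bob and Sally tie for the top mark
tied = pd.DataFrame({'Test1': [90, 90, 75]},
                    index=['Bob', 'Sally', 'Frank'])

# idxmax reports only the first label that reaches the maximum
top = tied.idxmax(axis=0)
print(top)

# To see everyone who reached the top score, compare against the max directly
co_winners = tied.index[tied['Test1'] == tied['Test1'].max()].tolist()
print(co_winners)
```

If ties matter for your analysis, the boolean comparison is the safer choice, since idxmax alone silently drops the other winners.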

### 3. Using a larger dataframe, find which students scored highest on each test.

Let's expand our student base. We will create a dataframe with 100 students and 10 tests to see who did best.

That's a lot of students and test scores, so let's also count how many scores there are.

In [5]:
np.random.seed(seed=42)
num_students = 100
num_tests = 10

df = pd.DataFrame(data=np.random.randint(0, 100, (num_students, num_tests)),
                  columns=["Test{}".format(x) for x in range(1, num_tests + 1)],
                  index=["Student{}".format(x) for x in range(1, num_students + 1)])

print("There are {:,} test scores".format(len(df) * len(df.columns)))
df

There are 1,000 test scores


           Test1  Test2  Test3  Test4  Test5  Test6  Test7  Test8  Test9  Test10
Student1      51     92     14     71     60     20     82     86     74      74
Student2      87     99     23      2     21     52      1     87     29      37
Student3       1     63     59     20     32     75     57     21     88      48
Student4      90     58     41     91     59     79     14     61     61      46
Student5      61     50     54     63      2     50      6     20     72      38
...          ...    ...    ...    ...    ...    ...    ...    ...    ...     ...
Student96     80     25     35      0      7     98     51     78     46      55
Student97     85     13     89     27     86     77     87      1     25      13
Student98     58     55      6      2     22     17     37     98     14      63
Student99     88     27     73     38     56     16     85     89     43      24


In [6]:
df.idxmax(axis=1)

Student1      Test2
Student2      Test2
Student3      Test9
Student4      Test4
Student5      Test9
              ...  
Student96     Test6
Student97     Test3
Student98     Test8
Student99     Test8
Student100    Test9
Length: 100, dtype: object
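
Note that the call above uses axis=1, so it lists each student's best test. To answer the question in the heading (which student scored highest on each test), the same frame can be reduced along axis=0 instead. A minimal sketch, rebuilding the frame so it runs on its own; the name `top_students` is an illustrative choice:

```python
import numpy as np
import pandas as pd

np.random.seed(seed=42)
num_students = 100
num_tests = 10

df = pd.DataFrame(data=np.random.randint(0, 100, (num_students, num_tests)),
                  columns=["Test{}".format(x) for x in range(1, num_tests + 1)],
                  index=["Student{}".format(x) for x in range(1, num_students + 1)])

# One entry per test: the label of the student with the highest score on it
top_students = df.idxmax(axis=0)
print(top_students)
```

The result has 10 entries, one per test. With 100 students and scores limited to 0 through 99, ties for the top score are plausible, in which case only the first tied student is reported.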