# Python Explainer - How does mode() work, and how to extract useful

author: Pieter Overdevest  
date: 2022-11-28

#### Aim

Determine the mode of each categorical feature in a data frame and
infer information from the results. The mode is the most frequently
occurring value in a feature.

#### Initialization

We start by importing the Pandas package,

In [16]:
import pandas as pd

and create a data frame with some example data.

In [17]:
df = pd.DataFrame({
    
    'animal': ["dog", "cat", "dog", "cat", "dog", "cat", "horse"],
    "id": ["a7", "a5", "a3", "a4", "a1", "a6", "a2"],
    "color": ["red", "red", "red", "red", "red", "red", "blue"]
})

#### Step-by-Step

Applying the `mode()` method, we get a data frame with the most
frequently occurring values in each column. In `df.animal`, cat and dog
occur equally often, i.e., three times each. Both are listed in
`df_mode` in alphabetical order: cat comes first even though dog appears
first in `df.animal`. In `df.id` all values are unique, so they all show
up, alphabetically ordered, in the data frame below. Columns with fewer
modes than the longest column are padded with `NaN`. A clear-cut case,
with just one winner, is `df.color`, where red occurs most often.

In [18]:
df_mode = df.mode()

df_mode
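As a quick check of the description above, the following minimal sketch rebuilds the example data and inspects the shape, ordering, and padding of the result. It assumes only the data frame defined earlier and is meant as an illustration, not as part of the original notebook.

```python
import pandas as pd

# Same example data as defined in the initialization step.
df = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "cat", "dog", "cat", "horse"],
    "id": ["a7", "a5", "a3", "a4", "a1", "a6", "a2"],
    "color": ["red", "red", "red", "red", "red", "red", "blue"],
})

df_mode = df.mode()

# The number of rows equals the largest number of modes found in any column
# (here `id`, where all seven values are unique).
print(df_mode.shape)

# The modes of a column are returned in sorted order.
print(df_mode["animal"].dropna().tolist())

# Columns with fewer modes are padded with missing values.
print(df_mode["color"].isna().sum())
```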


To obtain one candidate for the most frequent value in each column, we
simply take the first row.

In [19]:
df_mode.iloc[0]

animal    cat
id         a1
color     red
Name: 0, dtype: object

However, we should be aware that other values may occur at the same
maximum frequency. Below, we count the number of features that have more
than one value at the maximum frequency by evaluating the second row of
`df_mode`: a non-missing entry there means the column has at least two
modes.

In [20]:
df_mode.notna().iloc[1].sum()

2
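A related sketch, under the same assumptions, counts the modes in every column rather than looking only at the second row. The names `n_modes` and `n_tied` are introduced here for illustration and do not appear in the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "cat", "dog", "cat", "horse"],
    "id": ["a7", "a5", "a3", "a4", "a1", "a6", "a2"],
    "color": ["red", "red", "red", "red", "red", "red", "blue"],
})

# Number of non-missing entries per column = number of modes per column.
n_modes = df.mode().notna().sum()

# Number of columns in which more than one value shares the top frequency.
n_tied = int((n_modes > 1).sum())

print(n_modes.to_dict())
print(n_tied)
```

Counting per column also shows how strong the tie is: a column with as many modes as rows, such as `id`, contains only unique values.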

A so-called list comprehension is an elegant way to identify the
columns in which two or more values occur at the same maximum
frequency. See also intermezzo ‘list-comprehensions’.

In [21]:
v_col = [
    
    df.columns[i]
    
    for i in range(len(df_mode.columns))
    
    if df_mode.notna().iloc[1,i]
]

v_col

['animal', 'id']
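The same selection can also be sketched with boolean indexing instead of positional access. One possible advantage, stated here as an assumption about how the code might be reused: if every column has a single mode, `df.mode()` has only one row and `iloc[1]` raises an `IndexError`, whereas counting non-missing entries still works. The name `has_tie` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "cat", "dog", "cat", "horse"],
    "id": ["a7", "a5", "a3", "a4", "a1", "a6", "a2"],
    "color": ["red", "red", "red", "red", "red", "red", "blue"],
})

# Boolean Series indexed by column name: True where a column has several modes.
has_tie = df.mode().notna().sum() > 1

# Keep only the column names flagged True.
v_col = has_tie[has_tie].index.tolist()

print(v_col)

# Edge case: a data frame in which every column has exactly one mode.
df_single = pd.DataFrame({"x": ["a", "a", "b"]})
has_tie_single = df_single.mode().notna().sum() > 1
print(has_tie_single[has_tie_single].index.tolist())
```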

To see which values in these columns occur at the same maximum
frequency, we can again use a list comprehension, this time together
with the `value_counts()` method. A simple list comprehension fits on a
single line. For a more complex one, spreading its elements over several
lines makes the code easier to read, both for others and for yourself
when you revisit it after, say, a month. For running the code it makes
no difference whether it is written on one line or several. Note that
you cannot break a line at just any point. You can, however, break
freely between `(` and `)`, as shown below.

In [22]:
[
    # Output.
    pd.DataFrame(
        
        df[c_col].value_counts()

    ).sort_values(
        
        by        = c_col,
        ascending = False
    )
    
    # Iteration.
    for c_col in v_col
]

[       animal
 dog         3
 cat         3
 horse       1,
     id
 a7   1
 a5   1
 a3   1
 a4   1
 a1   1
 a6   1
 a2   1]
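To list only the tied values rather than the full frequency table, one option is to keep the entries of `value_counts()` that equal the maximum count. This is a minimal sketch; the helper `tied_modes` and the dictionary `d_tied` are hypothetical names introduced for illustration. Note also that from pandas 2.0 onward `value_counts()` labels its result `count` instead of using the column name, so the layout of the output above may differ between versions; the sketch below avoids relying on that label.

```python
import pandas as pd

df = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "cat", "dog", "cat", "horse"],
    "id": ["a7", "a5", "a3", "a4", "a1", "a6", "a2"],
    "color": ["red", "red", "red", "red", "red", "red", "blue"],
})

def tied_modes(series):
    """Return, sorted, the values that occur at the maximum frequency."""
    counts = series.value_counts()
    return sorted(counts[counts == counts.max()].index)

# Columns previously identified as having more than one mode.
v_col = ["animal", "id"]

d_tied = {c_col: tied_modes(df[c_col]) for c_col in v_col}

print(d_tied["animal"])
print(len(d_tied["id"]))
```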

You can also use the `\` character to continue a statement on the next
line. Nothing, not even a space, may follow the backslash on its line.

In [23]:
[
    # Output.
    pd.DataFrame(df[c_col].value_counts())\
        .sort_values(by = c_col, ascending = False)
    
    # Iteration.
    for c_col in v_col
]

[       animal
 dog         3
 cat         3
 horse       1,
     id
 a7   1
 a5   1
 a3   1
 a4   1
 a1   1
 a6   1
 a2   1]
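The two continuation styles can be compared outside pandas with a small, self-contained sketch. The variable names are illustrative only. PEP 8 generally prefers implied continuation inside brackets over backslashes, which is worth keeping in mind when choosing between them.

```python
# Implied continuation: line breaks are allowed anywhere between ( and ).
total_parens = (
    1
    + 2
    + 3
)

# Explicit continuation: each backslash must be the last character on its line.
total_backslash = 1 \
    + 2 \
    + 3

print(total_parens == total_backslash)
```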