In [1]:
import pandas as pd
import numpy as np

# Build a small sample DataFrame of people to work with
def dataframe():
    data = {
        'Name': ['John', 'Anna', 'Peter', 'Linda', 'Paul', 'Patrick', 'Steve', 'Linda', 'Newt', 'Anne'],
        'Age': [27, 43, 35, 41, 16, 46, 16, 11, 35, 49],
        'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney', 'Toronto', 'Berlin', 'Mumbai', 'Cape Town', 'Sao Paulo'],
        'Birthday': ['1987-05-12', '1993-11-03', '1978-07-26', '2001-09-15', '1984-01-30',
                     '1990-06-21', '1969-12-08', '2005-03-10', '1975-08-19', '1998-10-04'],
        'User ID': ['USR1001', 'USR1002', 'USR1003', 'USR1004', 'USR1005', 'USR1006', 'USR1007', 'USR1008', 'USR1009', 'USR1010']
    }

    df = pd.DataFrame(data)

    return df

In [2]:
"""
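But np.where has an obvious limitation: each call tests only one condition, so handling more than two outcomes means nesting np.where calls inside one another.
It's like being limited to Excel's IF, where every extra condition means nesting another IF inside the last one. That gets unwieldy fast.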

So let's explore a different function that allows for multiple conditions: np.select.

First, let's load some data.
"""

df = dataframe()
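
"""
To see the limitation concretely: each np.where call tests a single condition, so three outcomes mean nesting one call inside another. The sketch below produces the same labels from a nested np.where and from a single np.select call. The names demo_ages, nested and flat are illustrative, and the toy ages are made up.
"""

import numpy as np

# Illustrative toy ages, separate from df
demo_ages = np.array([10, 25, 60])

# Nested np.where: the inner call supplies the value wherever the outer condition is False
nested = np.where(demo_ages <= 18, 'Child',
                  np.where(demo_ages <= 35, 'Adult', 'Senior'))

# np.select: one flat list of conditions and a matching list of choices
flat = np.select([demo_ages <= 18, demo_ages <= 35], ['Child', 'Adult'], default='Senior')

print(nested.tolist())  # ['Child', 'Adult', 'Senior']
print(flat.tolist())    # ['Child', 'Adult', 'Senior']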

"""
Imagine we want to create a new conditional column called "Age Group" based on the age of the person. We'll create three groups: Child, Adult, and Senior.

To accomplish this, we'll use the np.select function.

np.select takes a list of conditions (the condlist parameter), a list of choices (choicelist), and an optional default, which is 0 if you leave it out. It looks like this:

df['New Column Name'] = np.select(conditions, choices, default='Some Value')

First, we'll create a list of conditions and store it in a variable. This is equivalent to the "if" part of an if-then-else statement.
"""

conditions = [
    df['Age'] <= 18,
    df['Age'] <= 35,
    df['Age'] > 35
]
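
"""
These conditions overlap: an age of 16 satisfies both <= 18 and <= 35. That is fine for np.select, which uses the first condition that is True. If you'd rather each condition stand on its own, combine comparisons with & (and) or | (or). The parentheses are required because & binds more tightly than <= and >.

Each condition is just a boolean Series, as this minimal sketch shows. The names demo_series and exclusive_conditions are illustrative.
"""

import pandas as pd

demo_series = pd.Series([16, 27, 35, 43])  # illustrative toy ages

exclusive_conditions = [
    demo_series <= 18,
    (demo_series > 18) & (demo_series <= 35),  # parentheses are required around each comparison
    demo_series > 35,
]

# Each condition is a boolean Series aligned with demo_series
for cond in exclusive_conditions:
    print(cond.tolist())
# [True, False, False, False]
# [False, True, True, False]
# [False, False, False, True]

# With mutually exclusive conditions, every row matches exactly one
print(sum(exclusive_conditions).tolist())  # [1, 1, 1, 1]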

"""
Then we'll create a list of choices, one per condition and in the same order. This is equivalent to the "then" part of the if-then-else statement.

We'll call the variable "choices", but you can name it whatever you want.
"""

choices = [
    'Child',
    'Adult',
    'Senior'
]
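
"""
Choices are matched to conditions by position, so the two lists must be the same length. In this minimal check, dropping 'Senior' from the choices makes np.select raise a ValueError rather than silently filling the gap. The names check_ages and mismatch_raised are illustrative.
"""

import numpy as np

check_ages = np.array([16, 27, 43])  # illustrative toy ages

try:
    np.select(
        [check_ages <= 18, check_ages <= 35, check_ages > 35],  # 3 conditions
        ['Child', 'Adult'],                                     # only 2 choices
        default='Unknown',
    )
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)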

"""
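To break it down piece by piece: np.select checks the conditions in order and, for each row, uses the choice that belongs to the first condition that is True.
if age is less than or equal to 18, then the value is 'Child'
else, if age is less than or equal to 35, then the value is 'Adult'
else, if age is greater than 35, then the value is 'Senior'

Because the first match wins, a 16-year-old gets 'Child' even though 16 is also less than or equal to 35.

The default value is not part of the choices list; it is passed separately through the default argument. If none of the conditions are met for a row, that row gets 'Unknown'. With these three conditions every age matches something, so 'Unknown' would only show up for a missing (NaN) age.

Now let's apply the conditions and choices to the DataFrame.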
"""

df['Age Group'] = np.select(conditions, choices, default='Unknown')

df.head()

    Name  Age      City    Birthday  User ID  Age Group
0   John   27  New York  1987-05-12  USR1001      Adult
1   Anna   43    London  1993-11-03  USR1002     Senior
2  Peter   35     Tokyo  1978-07-26  USR1003      Adult
3  Linda   41     Paris  2001-09-15  USR1004     Senior
4   Paul   16    Sydney  1984-01-30  USR1005      Child
