In [1]:
import pandas as pd
import numpy as np

def dataframe():
    """Return a small sample DataFrame of user records for the examples."""
    data = {
        'Name': ['John', 'Anna', 'Peter', 'Linda', 'Paul', 'Patrick', 'Steve', 'Linda', 'Newt', 'Anne'],
        'Age': [27, 43, 35, 41, 16, 46, 16, 11, 35, 49],
        'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney', 'Toronto', 'Berlin', 'Mumbai', 'Cape Town', 'Sao Paulo'],
        'Birthday': ['1987-05-12', '1993-11-03', '1978-07-26', '2001-09-15', '1984-01-30',
             '1990-06-21', '1969-12-08', '2005-03-10', '1975-08-19', '1998-10-04'],
        'User ID': ['USR1001', 'USR1002', 'USR1003', 'USR1004', 'USR1005', 'USR1006', 'USR1007', 'USR1008', 'USR1009', 'USR1010']
    }

    df = pd.DataFrame(data)

    return df

In [2]:
"""
One of the most common structures in analytics is the if-then-else statement, also called a conditional statement.
Imagine we have a column like the one below:

+-----------+
|    Age    |
+-----------+
|     1     |
+-----------+
|     8     |
+-----------+
|     13    |
+-----------+
|     9     |
+-----------+
|     25    |
+-----------+
|     3     |
+-----------+

Suppose we want to add a column that categorizes the age into two groups: 
"Yes" if the age is greater than 10, and "No" if the age is less than or equal to 10.

In Excel, you would use the IF function like this:

+----------------------------+
|           Age Group        |
+----------------------------+
|  =IF(A1 > 10, "Yes", "No") |
+----------------------------+

Or in SQL:

SELECT
    "Age",
    CASE
        WHEN "Age" > 10 THEN 'Yes'
        ELSE 'No'
    END AS "Age Group"

In Python, you can accomplish the same thing using the np.where function.

Let's load the sample data using the dataframe() function defined earlier.
"""

df = dataframe()
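
Before moving to np.where, it may help to see the same rule written as an ordinary Python conditional applied to one value at a time, which is what the Excel formula does for each cell. This is only a minimal sketch: the names `ages` and `age_group` are made up for illustration, and the values are the six ages from the example column above.

```python
# Illustrative names only: `ages` and `age_group` are not part of the notebook.
ages = [1, 8, 13, 9, 25, 3]  # the example Age column from the table above

# A conditional expression reads: value_if_true if condition else value_if_false.
# The list comprehension applies it to each age in turn.
age_group = ['Yes' if age > 10 else 'No' for age in ages]

print(age_group)  # ['No', 'No', 'Yes', 'No', 'Yes', 'No']
```

np.where expresses the same rule, but tests the whole column in one call instead of looping over the values.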

In [3]:
"""
Now let's use np.where to add a column that categorizes the age into two groups:
'Yes' if the age is greater than 10, and 'No' if the age is less than or equal to 10.

It looks almost identical to an IF statement in Excel:

df['Age Group'] = np.where(df['Age'] > 10, 'Yes', 'No')

Let's break it down into its components:

np.where: the where function from the NumPy library, which we imported as np
df['Age'] > 10: the condition we are testing; the first comma after the condition acts like "then"
'Yes': the value returned where the condition is true; the comma after 'Yes' acts like "else"
'No': the value returned where the condition is false
"""

df['Age Group'] = np.where(df['Age'] > 10, 'Yes', 'No')

df.head()

    Name  Age      City    Birthday  User ID Age Group
0   John   27  New York  1987-05-12  USR1001       Yes
1   Anna   43    London  1993-11-03  USR1002       Yes
2  Peter   35     Tokyo  1978-07-26  USR1003       Yes
3  Linda   41     Paris  2001-09-15  USR1004       Yes
4   Paul   16    Sydney  1984-01-30  USR1005       Yes
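
Every age returned by dataframe() is above 10, so the output above never shows the 'No' branch. The sketch below applies the same np.where call to a small hypothetical frame, named `example` here for illustration only, built from the six ages in the earlier example column plus the value 10 to show how the boundary is handled.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration: the six example ages, plus 10 as a boundary case.
example = pd.DataFrame({'Age': [1, 8, 13, 9, 25, 3, 10]})

# Same call as in the notebook: 'Yes' where Age > 10, otherwise 'No'.
example['Age Group'] = np.where(example['Age'] > 10, 'Yes', 'No')

print(example)

# Count how many rows fell into each group.
print(example['Age Group'].value_counts())
```

Only 13 and 25 are labeled 'Yes'. An age of exactly 10 is labeled 'No' because the condition uses a strict greater-than comparison.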
