#### Topics

Expressions: CASE, COALESCE, NULLIF

In [1]:
import pyodbc
import os
import pandas as pd

# Check which SQL Server ODBC drivers are installed
# [x for x in pyodbc.drivers() if "SQL Server" in x]

# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=BikeStores;'
    r'Trusted_Connection=yes;'
)

# Establish the connection
conn = pyodbc.connect(conn_str, autocommit=True)

# Create a cursor
cursor = conn.cursor()
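
The query cells in this notebook all repeat the same steps: run the query, fetch the rows, and read the column names from `cursor.description`. A small helper can wrap those steps. The sketch below is an assumption and not part of the original notebook: the name `fetch_records` is hypothetical, and it uses Python's built-in `sqlite3` module as a stand-in so that it runs without a SQL Server instance. Both `sqlite3` and `pyodbc` cursors follow the Python DB-API, so the same function should work with the `cursor` created above.

```python
import sqlite3


def fetch_records(cursor, sql):
    """Run a query and return its rows as a list of dicts keyed by column name."""
    cursor.execute(sql)
    # cursor.description holds one tuple per result column; item 0 is the name
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in in-memory database so the sketch is self-contained (made-up values)
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
records = fetch_records(demo_cursor, "SELECT 1 AS order_status, 62 AS order_count")
print(records)
```

Passing the result to `pd.DataFrame(records)` would give the same kind of DataFrame that the cells build by hand.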

### CASE

The SQL Server CASE expression evaluates a list of conditions and returns one of several possible results. It has two formats: the simple CASE expression and the searched CASE expression. Both formats support an optional ELSE clause.

Because CASE is an expression, you can use it in any clause that accepts an expression such as SELECT, WHERE, GROUP BY, and HAVING.

    CASE input   
        WHEN e1 THEN r1
        WHEN e2 THEN r2
        ...
        WHEN en THEN rn
        [ ELSE re ]   
    END  
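
The syntax above is the simple form, which compares one input against each WHEN value. The searched form instead puts a full Boolean condition after each WHEN. The sketch below runs both forms side by side. It assumes Python's built-in `sqlite3` module as a stand-in for SQL Server (both accept this CASE syntax), and the inline status values are made up for illustration.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

sql = """
WITH orders(order_status) AS (VALUES (1), (4), (9))
SELECT
    order_status,
    -- simple CASE: compares the input to each WHEN value
    CASE order_status
        WHEN 1 THEN 'Pending'
        WHEN 4 THEN 'Completed'
        ELSE 'Unknown'
    END AS simple_form,
    -- searched CASE: each WHEN carries its own condition, no ELSE here
    CASE
        WHEN order_status = 1 THEN 'Pending'
        WHEN order_status = 4 THEN 'Completed'
    END AS searched_form
FROM orders
ORDER BY order_status;
"""
case_rows = conn_demo.execute(sql).fetchall()
for row in case_rows:
    print(row)
```

The last row has status 9, which matches no WHEN branch. The first CASE falls through to its ELSE, while the second has no ELSE and therefore returns NULL (`None` in Python).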


In [2]:
cursor.execute('''
SELECT    
    order_status, 
    COUNT(order_id) order_count
FROM    
    sales.orders
WHERE 
    YEAR(order_date) = 2018
GROUP BY 
    order_status;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,order_status,order_count
0,1,62
1,2,63
2,3,13
3,4,154


The values in the order_status column are numeric codes, which are not meaningful on their own. To make the output more readable, you can use a simple CASE expression, as shown in the following query:

In [3]:
cursor.execute('''
SELECT    
    CASE order_status
        WHEN 1 THEN 'Pending'
        WHEN 2 THEN 'Processing'
        WHEN 3 THEN 'Rejected'
        WHEN 4 THEN 'Completed'
    END AS order_status, 
    COUNT(order_id) order_count
FROM    
    sales.orders
WHERE 
    YEAR(order_date) = 2018
GROUP BY 
    order_status;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,order_status,order_count
0,Pending,62
1,Processing,63
2,Rejected,13
3,Completed,154


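The following query uses searched CASE expressions inside the SUM() aggregate function to count the 2018 orders in each status:

In [4]: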
cursor.execute('''
SELECT    
    SUM(CASE
            WHEN order_status = 1
            THEN 1
            ELSE 0
        END) AS 'Pending', 
    SUM(CASE
            WHEN order_status = 2
            THEN 1
            ELSE 0
        END) AS 'Processing', 
    SUM(CASE
            WHEN order_status = 3
            THEN 1
            ELSE 0
        END) AS 'Rejected', 
    SUM(CASE
            WHEN order_status = 4
            THEN 1
            ELSE 0
        END) AS 'Completed', 
    COUNT(*) AS Total
FROM    
    sales.orders
WHERE 
    YEAR(order_date) = 2018;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,Pending,Processing,Rejected,Completed,Total
0,62,63,13,154,292


In this example:

    First, the WHERE clause restricts the rows to sales orders placed in 2018.
    Second, each CASE expression returns 1 or 0 depending on the order status.
    Third, the SUM() function adds up those values to give the number of orders for each order status.
    Fourth, the COUNT() function returns the total number of orders.
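
The four steps above amount to counting rows per condition in a single pass over the table. A minimal sketch of the same `SUM(CASE ...)` pattern follows, assuming `sqlite3` as a stand-in for SQL Server and a small made-up `orders` table rather than the BikeStores data:

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE orders (order_id INTEGER, order_status INTEGER)")
# Six made-up orders: two pending (1), one processing (2), three completed (4)
conn_demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 4), (5, 4), (6, 4)],
)

pivot = conn_demo.execute("""
SELECT
    SUM(CASE WHEN order_status = 1 THEN 1 ELSE 0 END) AS pending,
    SUM(CASE WHEN order_status = 2 THEN 1 ELSE 0 END) AS processing,
    SUM(CASE WHEN order_status = 3 THEN 1 ELSE 0 END) AS rejected,
    SUM(CASE WHEN order_status = 4 THEN 1 ELSE 0 END) AS completed,
    COUNT(*) AS total
FROM orders;
""").fetchone()
print(pivot)
```

Each row contributes 1 to exactly one of the four sums, so the per-status counts add up to the total.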

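The following query uses a searched CASE expression to classify each 2018 order by its total value:

In [5]: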
cursor.execute('''
SELECT    
    o.order_id, 
    SUM(quantity * list_price) order_value,
    CASE
        WHEN SUM(quantity * list_price) <= 500 
            THEN 'Very Low'
        WHEN SUM(quantity * list_price) > 500 AND 
            SUM(quantity * list_price) <= 1000 
            THEN 'Low'
        WHEN SUM(quantity * list_price) > 1000 AND 
            SUM(quantity * list_price) <= 5000 
            THEN 'Medium'
        WHEN SUM(quantity * list_price) > 5000 AND 
            SUM(quantity * list_price) <= 10000 
            THEN 'High'
        WHEN SUM(quantity * list_price) > 10000 
            THEN 'Very High'
    END order_priority
FROM    
    sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE 
    YEAR(order_date) = 2018
GROUP BY 
    o.order_id;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,order_id,order_value,order_priority
0,1324,7150.95,High
1,1325,9399.96,High
2,1326,5999.96,High
3,1327,8819.93,High
4,1328,4259.94,Medium
5,1329,5126.94,High
6,1330,3959.92,Medium
7,1331,7369.95,High
8,1332,2909.94,Medium
9,1333,13157.92,Very High
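
Because CASE checks its WHEN conditions in order and stops at the first one that is true, each branch in the query above could drop its lower bound: by the time a branch is reached, every earlier branch has already failed. A sketch of that shorter form follows, assuming `sqlite3` as a stand-in for SQL Server and a few made-up order values chosen to sit on or near the boundaries:

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

priority_rows = conn_demo.execute("""
WITH order_totals(order_value) AS (
    VALUES (500), (500.01), (1000), (5000), (7150.95), (13157.92)
)
SELECT
    order_value,
    CASE
        WHEN order_value <= 500   THEN 'Very Low'
        WHEN order_value <= 1000  THEN 'Low'        -- reached only if > 500
        WHEN order_value <= 5000  THEN 'Medium'     -- reached only if > 1000
        WHEN order_value <= 10000 THEN 'High'       -- reached only if > 5000
        ELSE 'Very High'
    END AS order_priority
FROM order_totals
ORDER BY order_value;
""").fetchall()
for row in priority_rows:
    print(row)
```

One difference from the original query: the ELSE branch also catches a NULL order value, which the explicit `> 10000` condition would leave as NULL.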


### COALESCE

The SQL Server COALESCE expression accepts a number of arguments, evaluates them in sequence, and returns the first non-null argument.

The following illustrates the syntax of the COALESCE expression:

COALESCE(e1,[e2,...,en])

In this syntax, e1, e2, … en are scalar expressions that evaluate to scalar values. The COALESCE expression returns the first non-null expression. If all expressions evaluate to NULL, the COALESCE expression returns NULL.

Because COALESCE is an expression, you can use it in any clause that accepts an expression, such as SELECT, WHERE, GROUP BY, and HAVING.
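
One way to read COALESCE(e1, e2, e3) is as a searched CASE that tests each argument for NULL in turn. The sketch below checks that reading on a few made-up rows, assuming `sqlite3` as a stand-in for SQL Server; the table `t` and its columns are hypothetical.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

coalesce_rows = conn_demo.execute("""
WITH t(n, a, b, c) AS (
    VALUES (1, 10,   20,   30),
           (2, NULL, 20,   30),
           (3, NULL, NULL, 30),
           (4, NULL, NULL, NULL)
)
SELECT
    COALESCE(a, b, c) AS with_coalesce,
    CASE
        WHEN a IS NOT NULL THEN a
        WHEN b IS NOT NULL THEN b
        ELSE c
    END AS with_case
FROM t
ORDER BY n;
""").fetchall()
for row in coalesce_rows:
    print(row)
```

Both columns agree on every row, including the last one, where all three arguments are NULL and the result is NULL.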

#### A) Using the SQL Server COALESCE expression with character string data
The following example uses the COALESCE expression to return the string 'Hi' because it is the first non-null argument:

    SELECT 
        COALESCE(NULL, 'Hi', 'Hello', NULL) result;

In [7]:
cursor.execute('''
SELECT 
        COALESCE(NULL, 'Hi', 'Hello', NULL) result;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,result
0,Hi


#### B) Using the SQL Server COALESCE expression with numeric data
This example uses the COALESCE expression to evaluate a list of arguments and return the first non-null number:

    SELECT 
        COALESCE(NULL, NULL, 100, 200) result;

In [8]:
cursor.execute('''
SELECT 
        COALESCE(NULL, NULL, 100, 200) result;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,result
0,100


#### C) Using the SQL Server COALESCE expression to substitute new values for NULL
    
    SELECT 
        first_name, 
        last_name, 
        phone, 
        email
    FROM 
        sales.customers
    ORDER BY 
        first_name, 
        last_name;


In [9]:
cursor.execute('''
SELECT 
        first_name, 
        last_name, 
        phone, 
        email
    FROM 
        sales.customers
    ORDER BY 
        first_name, 
        last_name;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,first_name,last_name,phone,email
0,Aaron,Knapp,(914) 402-4335,aaron.knapp@yahoo.com
1,Aaron,Knapp,(914) 402-4335,aaron.knapp@yahoo.com
2,Abbey,Pugh,,abbey.pugh@gmail.com
3,Abbey,Pugh,,abbey.pugh@gmail.com
4,Abby,Gamble,,abby.gamble@aol.com
5,Abby,Gamble,,abby.gamble@aol.com
6,Abram,Copeland,,abram.copeland@gmail.com
7,Abram,Copeland,,abram.copeland@gmail.com
8,Adam,Henderson,,adam.henderson@hotmail.com
9,Adam,Henderson,,adam.henderson@hotmail.com


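To make the output easier to read, the following query uses COALESCE to replace a NULL phone number with the string 'N/A'. DISTINCT removes the duplicate customer rows seen in the previous output:

In [14]: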
cursor.execute('''
SELECT 
    DISTINCT first_name, 
    last_name, 
    COALESCE(phone,'N/A') phone, 
    email
FROM 
    sales.customers
ORDER BY 
    first_name, 
    last_name;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,first_name,last_name,phone,email
0,Aaron,Knapp,(914) 402-4335,aaron.knapp@yahoo.com
1,Abbey,Pugh,N/A,abbey.pugh@gmail.com
2,Abby,Gamble,N/A,abby.gamble@aol.com
3,Abram,Copeland,N/A,abram.copeland@gmail.com
4,Adam,Henderson,N/A,adam.henderson@hotmail.com
5,Adam,Thornton,N/A,adam.thornton@hotmail.com
6,Addie,Hahn,N/A,addie.hahn@hotmail.com
7,Adelaida,Hancock,N/A,adelaida.hancock@aol.com
8,Adelle,Larsen,N/A,adelle.larsen@gmail.com
9,Adena,Blake,N/A,adena.blake@hotmail.com


#### D) Using SQL Server COALESCE expression to use the available data

In [15]:
cursor.execute('''
CREATE TABLE salaries (
    staff_id INT PRIMARY KEY,
    hourly_rate decimal,
    weekly_rate decimal,
    monthly_rate decimal,
    CHECK(
        hourly_rate IS NOT NULL OR 
        weekly_rate IS NOT NULL OR 
        monthly_rate IS NOT NULL)
);
''')

<pyodbc.Cursor at 0x2605e066130>

In [18]:
cursor.execute('''
INSERT INTO 
    salaries(
        staff_id, 
        hourly_rate, 
        weekly_rate, 
        monthly_rate
    )
VALUES
    (1,20, NULL,NULL),
    (2,30, NULL,NULL),
    (3,NULL, 1000,NULL),
    (4,NULL, NULL,6000),
    (5,NULL, NULL,6500)

''')

<pyodbc.Cursor at 0x2605e066130>

In [19]:
cursor.execute('''
SELECT
    staff_id, 
    hourly_rate, 
    weekly_rate, 
    monthly_rate
FROM
    salaries
ORDER BY
    staff_id;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,staff_id,hourly_rate,weekly_rate,monthly_rate
0,1,20.0,,
1,2,30.0,,
2,3,,1000.0,
3,4,,,6000.0
4,5,,,6500.0


To calculate the monthly salary for each staff member, use the COALESCE expression as shown in the following query:

In [20]:
cursor.execute('''
SELECT
    staff_id,
    COALESCE(
        hourly_rate*22*8, 
        weekly_rate*4, 
        monthly_rate
    ) monthly_salary
FROM
    salaries;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,staff_id,monthly_salary
0,1,3520
1,2,5280
2,3,4000
3,4,6000
4,5,6500
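
The conversion factors in the query imply 22 working days of 8 hours each per month and 4 weeks per month. In SQL, multiplying NULL by a number gives NULL, so COALESCE skips any rate that is missing. A plain-Python sketch of the same fallback logic reproduces the figures in the output; the function name `monthly_salary` is hypothetical.

```python
def monthly_salary(hourly_rate, weekly_rate, monthly_rate):
    """Mirror COALESCE(hourly_rate*22*8, weekly_rate*4, monthly_rate)."""
    if hourly_rate is not None:
        return hourly_rate * 22 * 8  # 22 working days x 8 hours
    if weekly_rate is not None:
        return weekly_rate * 4       # 4 weeks per month
    return monthly_rate              # None if all three rates are missing


# Same five rows as the salaries table: (hourly, weekly, monthly)
staff = {
    1: (20, None, None),
    2: (30, None, None),
    3: (None, 1000, None),
    4: (None, None, 6000),
    5: (None, None, 6500),
}
computed = {staff_id: monthly_salary(*rates) for staff_id, rates in staff.items()}
print(computed)
```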


### NULLIF

The NULLIF expression accepts two arguments and returns NULL if the two arguments are equal. Otherwise, it returns the first argument.
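
Put another way, NULLIF(a, b) gives the same result as CASE WHEN a = b THEN NULL ELSE a END. The sketch below compares the two for four pairs of values, assuming `sqlite3` as a stand-in for SQL Server; the `?` placeholders are filled from Python.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

pairs = [(10, 10), (20, 10), ("Hello", "Hello"), ("Hello", "Hi")]
# First column uses NULLIF, second spells out the equivalent CASE expression
sql = "SELECT NULLIF(?, ?), CASE WHEN ? = ? THEN NULL ELSE ? END"

nullif_results = [
    conn_demo.execute(sql, (a, b, a, b, a)).fetchone() for a, b in pairs
]
for pair, result in zip(pairs, nullif_results):
    print(pair, "->", result)
```

Equal pairs give NULL (`None` in Python) in both columns; unequal pairs give back the first value.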

In [22]:
cursor.execute('''
SELECT 
    NULLIF(10, 10) result;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,result
0,


In [23]:
cursor.execute('''
SELECT 
    NULLIF(20, 10) result;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,result
0,20


In [24]:
cursor.execute('''
SELECT 
    NULLIF('Hello', 'Hello') result;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,result
0,


In [25]:
cursor.execute('''
SELECT 
    NULLIF('Hello', 'Hi') result;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,result
0,Hello


#### A) Using NULLIF expression to translate a blank string to NULL

In [26]:
cursor.execute('''
CREATE TABLE sales.leads
(
    lead_id    INT	PRIMARY KEY IDENTITY, 
    first_name VARCHAR(100) NOT NULL, 
    last_name  VARCHAR(100) NOT NULL, 
    phone      VARCHAR(20), 
    email      VARCHAR(255) NOT NULL
);

''')

<pyodbc.Cursor at 0x2605e066130>

In [27]:
cursor.execute('''
INSERT INTO sales.leads
(
    first_name, 
    last_name, 
    phone, 
    email
)
VALUES
(
    'John', 
    'Doe', 
    '(408)-987-2345', 
    'john.doe@example.com'
),
(
    'Jane', 
    'Doe', 
    '', 
    'jane.doe@example.com'
),
(
    'David', 
    'Doe', 
    NULL, 
    'david.doe@example.com'
);

''')

<pyodbc.Cursor at 0x2605e066130>

In [29]:
cursor.execute('''
SELECT 
    lead_id, 
    first_name, 
    last_name, 
    phone, 
    email
FROM 
    sales.leads
ORDER BY
    lead_id;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,lead_id,first_name,last_name,phone,email
0,1,John,Doe,(408)-987-2345,john.doe@example.com
1,2,Jane,Doe,,jane.doe@example.com
2,3,David,Doe,,david.doe@example.com


To find the leads who do not have a phone number, you might start with the following query:

In [30]:
cursor.execute('''
SELECT    
    lead_id, 
    first_name, 
    last_name, 
    phone, 
    email
FROM    
    sales.leads
WHERE 
    phone IS NULL;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,lead_id,first_name,last_name,phone,email
0,3,David,Doe,,david.doe@example.com


The output misses the row for Jane Doe, whose phone column holds an empty string rather than NULL. To fix this, you can use the NULLIF expression to convert empty strings to NULL before the test:

In [32]:
cursor.execute('''
SELECT    
    lead_id, 
    first_name, 
    last_name, 
    phone, 
    email
FROM    
    sales.leads
WHERE 
    NULLIF(phone,'') IS NULL;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

Unnamed: 0,lead_id,first_name,last_name,phone,email
0,2,Jane,Doe,,jane.doe@example.com
1,3,David,Doe,,david.doe@example.com
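
The filter `NULLIF(phone, '') IS NULL` treats empty strings and NULLs alike, so it should select the same rows as `phone IS NULL OR phone = ''`. The sketch below checks this on a copy of the three sample leads. It assumes `sqlite3` as a stand-in for SQL Server, and the table is named `leads` here, without the `sales` schema and with fewer columns.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, first_name TEXT, phone TEXT)"
)
# One real number, one empty string, one NULL (None in Python)
conn_demo.executemany(
    "INSERT INTO leads (first_name, phone) VALUES (?, ?)",
    [("John", "(408)-987-2345"), ("Jane", ""), ("David", None)],
)

with_nullif = conn_demo.execute(
    "SELECT first_name FROM leads WHERE NULLIF(phone, '') IS NULL ORDER BY lead_id"
).fetchall()
with_or = conn_demo.execute(
    "SELECT first_name FROM leads WHERE phone IS NULL OR phone = '' ORDER BY lead_id"
).fetchall()
print(with_nullif)
print(with_or)
```

Both filters return Jane and David. The NULLIF form is shorter and keeps the "empty means missing" rule in one place.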
