# Working with Null Values Lab

### Introduction

In this lab, we'll practice working with the `COALESCE` and `NULLIF` functions.  Let's get started.

### Loading our Data

In [2]:
import sqlite3
conn = sqlite3.connect('crm.db')

In [4]:
import pandas as pd

pd.read_sql("SELECT * FROM persons LIMIT 1;", conn)

Unnamed: 0,id,first_name,last_name,cell_phone,home_phone,work_phone
0,1,Ynes,Durrett,273-102-2043,850-519-2573,152-593-2967


### Querying Null Values

Let's begin by selecting some of the rows with null values.  Find the number of rows where `home_phone` is null.

In [20]:
query = """
SELECT COUNT(*) AS home_phone_null FROM persons WHERE home_phone IS NULL;
"""

pd.read_sql(query, conn)

# 	home_phone_null
# 0	224

Unnamed: 0,home_phone_null
0,224


Now it turns out that `COUNT(column)` counts only the rows where that column is not null, whereas `COUNT(*)` counts every row.  Let's count the non-null values in `home_phone`.

In [15]:
query = """
SELECT COUNT(home_phone) as non_null_home_phone FROM persons;
"""

pd.read_sql(query, conn)

# 	non_null_home_phone
# 0	776

Unnamed: 0,non_null_home_phone
0,776


So we can see that 776 of our rows had home phone numbers that were not null.
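
The difference between `COUNT(*)` and `COUNT(column)` can be checked on a tiny table. The sketch below uses a hypothetical in-memory `people` table with made-up numbers (not the lab's `crm.db`), so the names and values are assumptions for illustration only.

```python
import sqlite3

# Hypothetical in-memory table; not the lab's crm.db.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (id INTEGER, home_phone TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "555-0100"), (2, None), (3, "555-0102"), (4, None), (5, "")],
)

# COUNT(*) counts every row; COUNT(home_phone) skips rows where it is NULL.
total, non_null = demo.execute(
    "SELECT COUNT(*), COUNT(home_phone) FROM people"
).fetchone()

# Counting with IS NULL gives the complement of COUNT(home_phone).
null_rows = demo.execute(
    "SELECT COUNT(*) FROM people WHERE home_phone IS NULL"
).fetchone()[0]

print(total, non_null, null_rows)
```

Note that the empty string in row 5 still counts as non-null, which is why the two counts always add up to the total number of rows.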

Now write a select statement that returns the number of null values in the `home_phone`, `cell_phone` and `work_phone` columns.  Name these columns `missing_phone`, `missing_cell` and `missing_work`.

In [30]:
query = """
SELECT (COUNT(*) - COUNT(home_phone)) as missing_phone, 
COUNT(*) - COUNT(cell_phone) as missing_cell, 
COUNT(*) - COUNT(work_phone) as missing_work FROM persons;
"""

pd.read_sql(query, conn)

Unnamed: 0,missing_phone,missing_cell,missing_work
0,224,0,224
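
Subtracting `COUNT(column)` from `COUNT(*)` is one way to count nulls. Another possible formulation sums a `CASE` expression. The sketch below compares the two on a hypothetical in-memory table (the table and its values are assumptions, not the lab data), and also shows that empty strings are not counted as null by either approach.

```python
import sqlite3

# Hypothetical in-memory table; not the lab's crm.db.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE people (id INTEGER, home_phone TEXT, cell_phone TEXT)"
)
demo.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "555-0100", "555-0200"), (2, None, ""), (3, None, "555-0202")],
)

# Approach used in the lab: total rows minus non-null rows.
by_subtraction = demo.execute("""
    SELECT COUNT(*) - COUNT(home_phone), COUNT(*) - COUNT(cell_phone)
    FROM people
""").fetchone()

# Alternative: add 1 for each row where the column is NULL.
by_case = demo.execute("""
    SELECT SUM(CASE WHEN home_phone IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN cell_phone IS NULL THEN 1 ELSE 0 END)
    FROM people
""").fetchone()

print(by_subtraction, by_case)
```

Both queries report zero missing `cell_phone` values here even though one row holds an empty string, since `''` is a value rather than null.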


### Coercing Null Values 

Now let's start coercing some of our null values.  Begin by selecting the `id` and `home_phone` from the first 8 rows, replacing any null `home_phone` with the string `'Unknown'`.

In [23]:
query = """
SELECT id, COALESCE(home_phone, 'Unknown') as home_phone, cell_phone, work_phone
FROM persons LIMIT 8;
"""

pd.read_sql(query, conn)

# 	id	home_phone
# 0	1	850-519-2573
# 1	2	310-959-7139
# 2	3	474-798-1579
# 3	4	Unknown
# 4	5	Unknown
# 5	6	Unknown
# 6	7	807-792-5705
# 7	8	928-375-5660

Unnamed: 0,id,home_phone,cell_phone,work_phone
0,1,850-519-2573,273-102-2043,152-593-2967
1,2,310-959-7139,301-931-3773,741-504-0114
2,3,474-798-1579,324-169-8178,447-945-4760
3,4,Unknown,439-258-2695,
4,5,Unknown,254-692-3658,
5,6,Unknown,505-447-9193,
6,7,807-792-5705,471-468-9219,456-531-6413
7,8,928-375-5660,514-287-9273,798-832-6223


Next, select the `id` and `home_phone` from the first eight rows of the persons table, but display the `cell_phone` whenever the `home_phone` is null, and alias the column to `phone`.

In [24]:
query = """
SELECT id, COALESCE(home_phone, cell_phone) as phone 
FROM persons LIMIT 8;
"""

pd.read_sql(query, conn)

# 	id	phone
# 0	1	850-519-2573
# 1	2	310-959-7139
# 2	3	474-798-1579
# 3	4	439-258-2695
# 4	5	254-692-3658
# 5	6	505-447-9193
# 6	7	807-792-5705
# 7	8	928-375-5660

Unnamed: 0,id,phone
0,1,850-519-2573
1,2,310-959-7139
2,3,474-798-1579
3,4,439-258-2695
4,5,254-692-3658
5,6,505-447-9193
6,7,807-792-5705
7,8,928-375-5660
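
`COALESCE` accepts more than two arguments and returns the first one that is not null, so the two exercises above can be combined into a single fallback chain. Here is a minimal sketch on a hypothetical in-memory table (names and numbers are assumptions for illustration).

```python
import sqlite3

# Hypothetical in-memory table; not the lab's crm.db.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE people (id INTEGER, home_phone TEXT, cell_phone TEXT)"
)
demo.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "555-0100", "555-0200"), (2, None, "555-0201"), (3, None, None)],
)

# Try home_phone first, then cell_phone, then fall back to a literal.
phones = demo.execute("""
    SELECT id, COALESCE(home_phone, cell_phone, 'Unknown') AS phone
    FROM people
    ORDER BY id
""").fetchall()

for person_id, phone in phones:
    print(person_id, phone)
```

Row 1 keeps its home number, row 2 falls back to the cell number, and row 3 ends up as `'Unknown'` because both columns are null.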


### Working with Missing Values

Let's begin by counting the number of rows where the `cell_phone` is an empty string, and name the result `missing_cell_phone_nums`.

In [34]:
query = """SELECT COUNT(*) AS missing_cell_phone_nums
FROM persons WHERE cell_phone = '';
"""
pd.read_sql(query, conn)

# 	missing_cell_phone_nums
# 0	40

Unnamed: 0,missing_cell_phone_nums
0,40


Next, let's coerce these empty strings to be null, and then select the first 10 rows with null values.

> Display the id and the phone columns.

In [40]:
query = """SELECT id, NULLIF(cell_phone, '') as phone, cell_phone 
FROM persons WHERE phone is NULL LIMIT 10;
"""
pd.read_sql(query, conn)

# 	id	phone
# 0	16	None
# 1	42	None
# 2	96	None
# 3	110	None
# 4	133	None
# 5	154	None
# 6	210	None
# 7	216	None
# 8	231	None
# 9	241	None

Unnamed: 0,id,phone,cell_phone
0,16,,
1,42,,
2,96,,
3,110,,
4,133,,
5,154,,
6,210,,
7,216,,
8,231,,
9,241,,
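
To see what `NULLIF` does in isolation, the sketch below evaluates it on a few literal values and then filters a hypothetical in-memory table (an assumption for illustration, not the lab data). The filter repeats the `NULLIF` expression in the `WHERE` clause instead of using the column alias. SQLite accepts the alias there, but that behavior is not guaranteed by standard SQL, so repeating the expression is the more portable choice.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# NULLIF(a, b) returns NULL when a equals b, otherwise it returns a.
nullif_results = demo.execute(
    "SELECT NULLIF('', ''), NULLIF('555-0200', ''), NULLIF(NULL, '')"
).fetchone()
print(nullif_results)

# Hypothetical table with a real number, an empty string, and a NULL.
demo.execute("CREATE TABLE people (id INTEGER, cell_phone TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "555-0200"), (2, ""), (3, None)],
)

# Rows whose cell_phone is either an empty string or already NULL.
missing_ids = [
    row[0]
    for row in demo.execute(
        "SELECT id FROM people WHERE NULLIF(cell_phone, '') IS NULL ORDER BY id"
    )
]
print(missing_ids)
```

Python's `sqlite3` module returns SQL null as `None`, which is why the expected output comments in this lab show `None` in the `phone` column.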


Next, let's display the `work_phone` number wherever the `cell_phone` is an empty string. Then select the rows whose phone number is still null.

There should be only seven such rows.

In [49]:
query = """SELECT id, COALESCE(NULLIF(cell_phone, ''), work_phone) as phone
FROM persons WHERE phone IS null;
"""
pd.read_sql(query, conn)

# 	id	phone
# 0	16	None
# 1	216	None
# 2	396	None
# 3	404	None
# 4	624	None
# 5	682	None
# 6	781	None

Unnamed: 0,id,phone
0,16,
1,216,
2,396,
3,404,
4,624,
5,682,
6,781,
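
The nesting order matters: `NULLIF` first turns empty strings into null, and `COALESCE` then fills those nulls from the fallback column. The sketch below traces this on a hypothetical in-memory table (an assumption for illustration) and wraps the query in a subquery so that the filter on `phone` does not depend on SQLite's handling of aliases in `WHERE`.

```python
import sqlite3

# Hypothetical in-memory table; not the lab's crm.db.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE people (id INTEGER, cell_phone TEXT, work_phone TEXT)"
)
demo.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "555-0200", "555-0300"),  # cell_phone present
        (2, "", "555-0301"),          # empty cell_phone, work_phone present
        (3, "", None),                # empty cell_phone, no work_phone
        (4, None, "555-0303"),        # NULL cell_phone, work_phone present
    ],
)

combined = demo.execute("""
    SELECT id, COALESCE(NULLIF(cell_phone, ''), work_phone) AS phone
    FROM people
    ORDER BY id
""").fetchall()

# Filter on the computed column through a subquery.
still_null = demo.execute("""
    SELECT id FROM (
        SELECT id, COALESCE(NULLIF(cell_phone, ''), work_phone) AS phone
        FROM people
    )
    WHERE phone IS NULL
""").fetchall()

print(combined)
print(still_null)
```

Only row 3 remains null, because it has neither a usable `cell_phone` nor a `work_phone`.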


### Summary

In this lab, we practiced working with null values in SQL.  We did so with the `COALESCE` function, which replaces null values with a hardcoded value or the value from another column.

We also practiced working with the `NULLIF` function, which converts specified values to null.  Finally, we combined `COALESCE` and `NULLIF`, first converting specific values to null and then applying `COALESCE` to all of the resulting null values.