# Working with Null Values

### Introduction

In this lesson, we'll learn a couple of useful functions for coercing null and missing values.  The first is `COALESCE`, which replaces null values with other specified values.  The second is `NULLIF`, which converts chosen values, such as empty strings standing in for missing data, to null.

### Loading our Data

Let's create and load our data into the CRM database with the following.

In [2]:
import sqlite3
conn = sqlite3.connect('crm.db')
cursor = conn.cursor()

In [5]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-eng-10-21/case-when/main/coerced_persons.csv"
df = pd.read_csv(url)
df[:2]

Unnamed: 0,id,first_name,last_name,cell_phone,home_phone,work_phone
0,1,Ynes,Durrett,273-102-2043,850-519-2573,152-593-2967
1,2,,Gascoyne,301-931-3773,310-959-7139,741-504-0114


In [3]:
df.to_sql('persons', conn, index = False)

Now if we look at the data, we can see that some of the values are null.

In [4]:
pd.read_sql("SELECT * FROM persons WHERE first_name IS NULL LIMIT 3;", conn)

Unnamed: 0,id,first_name,last_name,cell_phone,home_phone,work_phone
0,2,,Gascoyne,301-931-3773,310-959-7139,741-504-0114
1,4,,Brashier,439-258-2695,,
2,5,,Gillis,254-692-3658,,


In this lesson, we'll see how we can work with those null values.

### Converting from Null Values

One way to work with our null values is with the `COALESCE` function.  `COALESCE` returns the first non-null value among its arguments.  For example, let's say that we want to replace any null values in `first_name` with the string `'Unknown'`.  We can do so with the following.

In [62]:
query = """SELECT COALESCE(first_name, 'Unknown') as first_name, last_name
FROM persons LIMIT 6;"""

pd.read_sql(query, conn)

Unnamed: 0,first_name,last_name
0,Ynes,Durrett
1,,Gascoyne
2,Candis,Langthorne
3,Unknown,Brashier
4,Unknown,Gillis
5,Unknown,Huddy


We can see that every time we had a null value (rows 3 through 5 of the output), the result now shows the word "Unknown".  Row 1 is still blank because its first name is an empty string rather than null.  So one way to think about the coalesce function is like a default argument in SQL.  When the first value is null, it uses the second value.
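To see the default-value behavior in isolation, here is a minimal sketch using an in-memory SQLite database.  The `demo` table and its three rows are made up for illustration and are not part of the CRM data.

```python
import sqlite3

# Hypothetical in-memory table, kept separate from crm.db.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (first_name TEXT, last_name TEXT)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("Ynes", "Durrett"), ("", "Gascoyne"), (None, "Brashier")],
)

# COALESCE replaces only the NULL; the empty string passes through unchanged.
coalesce_rows = demo_conn.execute(
    "SELECT COALESCE(first_name, 'Unknown'), last_name FROM demo ORDER BY rowid"
).fetchall()
print(coalesce_rows)
```

Fetching rows with plain `sqlite3` returns Python values directly, so a null shows up as `None` and is easy to tell apart from an empty string.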

With the coalesce function, we are not limited to hardcoded values; we can also fall back to another column's value.  For example, let's say that we want to use a person's last name when the first name is not present.

Again we can do so with the COALESCE function.

In [65]:
query = """SELECT COALESCE(first_name, last_name) as name, last_name
FROM persons LIMIT 6;"""

pd.read_sql(query, conn)

Unnamed: 0,name,last_name
0,Ynes,Durrett
1,,Gascoyne
2,Candis,Langthorne
3,Brashier,Brashier
4,Gillis,Gillis
5,Huddy,Huddy


So this time we can see that when the first name is null, the query falls back to the `last_name`.

> Finally, the coalesce function can take as many arguments as we want, so if the first and last name are not available, we can use the word `unknown`.

```sql
SELECT COALESCE(first_name, last_name, 'unknown') as name, last_name
FROM persons LIMIT 3
```
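As a rough check of the multi-argument form, the sketch below runs the same kind of query against a small in-memory table.  The table name `demo` and its rows are hypothetical, chosen so that each fallback level is exercised once.

```python
import sqlite3

# Hypothetical rows: both names present, only last name, neither name.
fallback_conn = sqlite3.connect(":memory:")
fallback_conn.execute("CREATE TABLE demo (first_name TEXT, last_name TEXT)")
fallback_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("Ynes", "Durrett"), (None, "Brashier"), (None, None)],
)

# COALESCE walks its arguments left to right and returns the first non-null one.
fallback_rows = fallback_conn.execute(
    "SELECT COALESCE(first_name, last_name, 'unknown') AS name "
    "FROM demo ORDER BY rowid"
).fetchall()
print(fallback_rows)
```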

### Converting to Null Values

Sometimes, we would prefer to represent our data as null values.  For example, if we take a look at some of the records in our persons table, we can see that some of the information is represented as empty strings.

In [77]:
pd.read_sql("SELECT * FROM persons LIMIT 3", conn)

Unnamed: 0,id,first_name,last_name,cell_phone,home_phone,work_phone
0,1,Ynes,Durrett,273-102-2043,850-519-2573,152-593-2967
1,2,,Gascoyne,301-931-3773,310-959-7139,741-504-0114
2,3,Candis,Langthorne,324-169-8178,474-798-1579,447-945-4760


So above, our second row of data has first_name stored as an empty string.  And as we know, we also have some first_name data stored as a null value.  We can make our data more consistent by converting the empty strings to null values.  

We perform this conversion with the `NULLIF` function.  `NULLIF(a, b)` returns null when `a` equals `b`, and otherwise returns `a`.

In [89]:
pd.read_sql("SELECT id, first_name, last_name FROM persons LIMIT 10", conn)

Unnamed: 0,id,first_name,last_name
0,1,Ynes,Durrett
1,2,,Gascoyne
2,3,Candis,Langthorne
3,4,,Brashier
4,5,,Gillis
5,6,,Huddy
6,7,Blakelee,Jorck
7,8,Ezra,Walcot
8,9,Page,Pardon
9,10,,Padwick


In [93]:
query = """SELECT id, NULLIF(first_name, '') as first_name, last_name 
FROM persons LIMIT 3"""
pd.read_sql(query, conn)

Unnamed: 0,id,first_name,last_name
0,1,Ynes,Durrett
1,2,,Gascoyne
2,3,Candis,Langthorne


So now we can see that our SELECT statement is returning the empty string as null, which pandas represents as None.
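Because blank cells in a printed table can hide the difference between an empty string and a null, here is a small sketch that evaluates `NULLIF` on literal values and inspects the raw Python results.  It uses a throwaway in-memory connection, which is an assumption made for the example.

```python
import sqlite3

nullif_conn = sqlite3.connect(":memory:")

# Three cases: value equals the target, value differs, value is already NULL.
nullif_row = nullif_conn.execute(
    "SELECT NULLIF('', ''), NULLIF('Ynes', ''), NULLIF(NULL, '')"
).fetchone()
print(nullif_row)
```

Only the first case is actually converted; the third is null simply because its first argument was already null.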

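> We can also filter on this coerced value as if it were null.  Here we give the expression a new alias, `name`, and filter on that alias in the `WHERE` clause.  Referencing a column alias in `WHERE` works in SQLite, but many other databases require repeating the expression instead.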

In [94]:
query = """SELECT id, NULLIF(first_name, '') as name, last_name 
FROM persons WHERE name IS NULL LIMIT 3"""
pd.read_sql(query, conn)

Unnamed: 0,id,name,last_name
0,2,,Gascoyne
1,4,,Brashier
2,5,,Gillis
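An alternative that does not rely on an alias is to repeat the `NULLIF` expression inside the `WHERE` clause.  The following is a minimal sketch on a made-up in-memory table (`demo`, with hypothetical ids 1 through 3), not the lesson's `persons` table.

```python
import sqlite3

filter_conn = sqlite3.connect(":memory:")
filter_conn.execute("CREATE TABLE demo (id INTEGER, first_name TEXT)")
filter_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "Ynes"), (2, ""), (3, None)],
)

# Filter directly on the expression: matches both empty strings and nulls.
missing_ids = [
    row[0]
    for row in filter_conn.execute(
        "SELECT id FROM demo WHERE NULLIF(first_name, '') IS NULL ORDER BY id"
    )
]
print(missing_ids)
```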


### Combining NullIf and Coalesce

Sometimes we'll see nullif and coalesce used together.  For example, let's take another look at how we first used the coalesce function.

In [95]:
query = """SELECT COALESCE(first_name, 'Unknown') as first_name, last_name
FROM persons LIMIT 6;"""

pd.read_sql(query, conn)

Unnamed: 0,first_name,last_name
0,Ynes,Durrett
1,,Gascoyne
2,Candis,Langthorne
3,Unknown,Brashier
4,Unknown,Gillis
5,Unknown,Huddy


It successfully changed the null values to 'Unknown', but did not convert the empty string in row 1, because an empty string is not the same as null.

So we can change this to capture both null and empty string values, by first converting the empty strings to null and then using coalesce.

In [97]:
query = """SELECT COALESCE(NULLIF(first_name, ''), 'Unknown') as first_name,
last_name
FROM persons LIMIT 6;"""

pd.read_sql(query, conn)

Unnamed: 0,first_name,last_name
0,Ynes,Durrett
1,Unknown,Gascoyne
2,Candis,Langthorne
3,Unknown,Brashier
4,Unknown,Gillis
5,Unknown,Huddy


Let's break down the above.  `NULLIF(first_name, '')` converts empty strings to null values, and `COALESCE` then converts any null values to `'Unknown'`.

```sql
COALESCE(NULLIF(first_name, ''), 'Unknown')
```
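One way to convince ourselves of what the combined expression does is to compare it with a longer `CASE` expression that spells out the same two conditions.  This is only a sketch: the `demo` table, its rows, and the column aliases `combined` and `spelled_out` are assumptions made for the example.

```python
import sqlite3

case_conn = sqlite3.connect(":memory:")
case_conn.execute("CREATE TABLE demo (first_name TEXT)")
case_conn.executemany(
    "INSERT INTO demo VALUES (?)", [("Ynes",), ("",), (None,)]
)

# Both columns should agree for a regular value, an empty string, and a null.
case_rows = case_conn.execute(
    """
    SELECT COALESCE(NULLIF(first_name, ''), 'Unknown') AS combined,
           CASE WHEN first_name IS NULL OR first_name = '' THEN 'Unknown'
                ELSE first_name END AS spelled_out
    FROM demo ORDER BY rowid
    """
).fetchall()
print(case_rows)
```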

### Summary

In this lesson, we learned a couple of techniques for working with null values.  The first is the COALESCE function, which returns the first non-null argument.  

We can use the COALESCE function like a default argument.  For example, the below statement will replace null first name values with `'Unknown'`.

```sql
SELECT COALESCE(first_name, 'Unknown') as first_name
```

And we can also have coalesce use the first non-null column value like so. 
```sql
SELECT COALESCE(first_name, last_name) as name
```

Then we saw that we can use NULLIF to convert chosen values, like empty strings, to null.

```sql
SELECT id, NULLIF(first_name, '') as name
```

And finally, we can combine the two, to first convert null-like values to null, and then coalesce those values.

```sql
COALESCE(NULLIF(first_name, ''), 'Unknown')
```