***Note:*** *Each cell demonstrating SQL code in Jupyter Notebook needs to begin with `%%sql` in order for the interpreter to treat the code as SQL statements/queries. The `%%sql` prefix is not needed in other SQL/database-client tools.*

*Initial setup: load the `sql` extension and connect to the database so that SQL statements can be run in this notebook:*

In [1]:
%load_ext sql
%sql postgresql://postgres:password@localhost/dvdrental

# Filtering Data

Most of the time, when working with databases, you will want to narrow your focus to a subset of a table's rows. Therefore, the `where` clause is used in most cases to restrict the number of rows acted on by the SQL statement. This chapter explores the various types of filter conditions that you can employ in the `where` clauses of `select` as well as `update` and `delete` statements (I'll delve into these later!).

## Condition Evaluation

   A `where` clause may contain one or more *conditions*, separated by the operators `and` and `or`. If multiple conditions are separated only by the `and` operator, then all the conditions must evaluate to `true` for the row to be included in the result set. Consider the following:
   
   > `WHERE staff_id = 1 AND payment_date < '2007-02-16'`
   
Given these two conditions, only payment records made under staff_id 1 **and** prior to 16 Feb 2007 will be retrieved:

In [2]:
%%sql 

select * from payment
WHERE staff_id = 1 AND payment_date < '2007-02-16'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


payment_id,customer_id,staff_id,rental_id,amount,payment_date
17519,344,1,1341,3.99,2007-02-15 10:54:44.996577
17523,345,1,1457,4.99,2007-02-15 18:34:15.996577
17537,349,1,1197,2.99,2007-02-15 00:11:12.996577
17538,349,1,1523,0.99,2007-02-15 22:47:06.996577
17548,352,1,1498,0.99,2007-02-15 20:26:26.996577
17556,354,1,1491,0.99,2007-02-15 20:16:44.996577
17562,356,1,1410,0.99,2007-02-15 15:28:12.996577
17569,358,1,1455,2.99,2007-02-15 18:19:32.996577
17587,362,1,1429,2.99,2007-02-15 16:52:36.996577
17588,362,1,1529,2.99,2007-02-15 23:06:01.996577


If all conditions in the `where` clause are separated by the `or` operator, however, only *one* of the conditions must evaluate to `true` for the row to be included in the result set. Consider the following two conditions:

   > `WHERE staff_id = 1 OR payment_date < '2007-02-16'`

When at least one of the conditions is true (i.e. the payment was made under staff_id 1 **or** it was made prior to 16 Feb 2007), the record will be retrieved from the database.

The truth table below shows the possible outcomes for a `where` clause containing two conditions separated by the `or` operator:

**Intermediate result** | **Final result**
--- | ---
`WHERE true OR true` | `True`
`WHERE true OR false` | `True`
`WHERE false OR true` | `True`
`WHERE false OR false` | `False`

In [3]:
%%sql 

select * from payment
WHERE staff_id = 1 OR payment_date < '2007-02-16'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


payment_id,customer_id,staff_id,rental_id,amount,payment_date
17503,341,2,1520,7.99,2007-02-15 22:25:46.996577
17504,341,1,1778,1.99,2007-02-16 17:23:14.996577
17505,341,1,1849,7.99,2007-02-16 22:41:45.996577
17508,341,1,3382,5.99,2007-02-21 12:33:49.996577
17510,342,1,2914,5.99,2007-02-20 02:11:44.996577
17511,342,1,3081,2.99,2007-02-20 13:57:39.996577
17513,343,1,1564,6.99,2007-02-16 01:15:33.996577
17517,343,1,2980,8.99,2007-02-20 07:03:29.996577
17518,343,1,3407,0.99,2007-02-21 14:42:28.996577
17519,344,1,1341,3.99,2007-02-15 10:54:44.996577


### Using Parentheses

If your `where` clause includes three or more conditions using both the `and` and `or` operators, you should use parentheses to make your intent clear, both to the database server and to anyone else reading your code. Here's a `where` clause that extends the previous example:

   > `WHERE amount > 4.0 AND (staff_id = 1 OR payment_date < '2007-02-16')`

There are now three conditions in the clause above. The truth table below shows the possible outcomes for this `where` clause:

**Intermediate result** | **Final result**
--- | ---
`WHERE true AND (true OR true)` | `True`
`WHERE true AND (true OR false)` | `True`
`WHERE true AND (false OR true)` | `True`
`WHERE true AND (false OR false)` | `False`
`WHERE false AND (true OR true)` | `False`
`WHERE false AND (true OR false)` | `False`
`WHERE false AND (false OR true)` | `False`
`WHERE false AND (false OR false)` | `False`

Thus having more conditions in your `where` clause yields more combinations for the server to evaluate.

In [4]:
%%sql 

select * from payment
WHERE amount > 4.0 AND (staff_id = 1 OR payment_date < '2007-02-16')
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


payment_id,customer_id,staff_id,rental_id,amount,payment_date
17503,341,2,1520,7.99,2007-02-15 22:25:46.996577
17505,341,1,1849,7.99,2007-02-16 22:41:45.996577
17508,341,1,3382,5.99,2007-02-21 12:33:49.996577
17510,342,1,2914,5.99,2007-02-20 02:11:44.996577
17513,343,1,1564,6.99,2007-02-16 01:15:33.996577
17517,343,1,2980,8.99,2007-02-20 07:03:29.996577
17520,344,2,1475,4.99,2007-02-15 19:36:27.996577
17523,345,1,1457,4.99,2007-02-15 18:34:15.996577
17526,346,1,1994,5.99,2007-02-17 09:35:32.996577
17531,347,1,3026,4.99,2007-02-20 10:16:26.996577
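
Parentheses matter here because `and` binds more tightly than `or`. The sketch below is not part of the dvdrental walkthrough: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption made so the example is self-contained), with `1` and `0` standing in for true and false, and lists the combinations where the parenthesised and unparenthesised forms of a three-condition clause disagree:

```python
import itertools
import sqlite3

# Throwaway in-memory SQLite database; an assumed stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")


def evaluate(expression, a, b, c):
    """Evaluate a boolean SQL expression containing three 0/1 placeholders."""
    return conn.execute(f"SELECT {expression}", (a, b, c)).fetchone()[0]


with_parens = "? AND (? OR ?)"
without_parens = "? AND ? OR ?"  # parsed as (? AND ?) OR ?

# Collect every (a, b, c) combination for which the two forms differ.
differing = [
    combo
    for combo in itertools.product([0, 1], repeat=3)
    if evaluate(with_parens, *combo) != evaluate(without_parens, *combo)
]

print(differing)  # [(0, 0, 1), (0, 1, 1)]
```

Both combinations have the first condition false and the third true: without parentheses the clause is read as `(a AND b) OR c`, so the third condition alone is enough to include the row.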


### Using the `not` Operator

Consider the following condition:

   > `WHERE amount > 4.0 AND NOT(staff_id = 1 OR payment_date < '2007-02-16')`
   
I added the `not` operator after the `and` operator from the previous example. The truth table below lists all possible outcomes:

**Intermediate result** | **Final result**
--- | ---
`WHERE true AND NOT(true OR true)` | `False`
`WHERE true AND NOT(true OR false)` | `False`
`WHERE true AND NOT(false OR true)` | `False`
`WHERE true AND NOT(false OR false)` | `True`
`WHERE false AND NOT(true OR true)` | `False`
`WHERE false AND NOT(true OR false)` | `False`
`WHERE false AND NOT(false OR true)` | `False`
`WHERE false AND NOT(false OR false)` | `False`

In [5]:
%%sql 

select * from payment
WHERE amount > 4.0 AND NOT(staff_id = 1 OR payment_date < '2007-02-16')
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


payment_id,customer_id,staff_id,rental_id,amount,payment_date
17507,341,2,3130,7.99,2007-02-20 17:31:48.996577
17509,342,2,2190,5.99,2007-02-17 23:58:17.996577
17512,343,2,1547,4.99,2007-02-16 00:10:50.996577
17516,343,2,2461,6.99,2007-02-18 18:26:38.996577
17525,345,2,2766,4.99,2007-02-19 16:13:41.996577
17529,347,2,1711,8.99,2007-02-16 12:40:18.996577
17539,349,2,2987,6.99,2007-02-20 07:24:16.996577
17545,351,2,1792,5.99,2007-02-16 18:33:16.996577
17552,352,2,3331,4.99,2007-02-21 08:06:19.996577
17554,353,2,1928,7.99,2007-02-17 05:16:57.996577


Although the database server can interpret this statement, it is typically difficult for a person to evaluate a `where` clause that includes the `not` operator. In this case, you can apply De Morgan's laws to rewrite the `where` clause to avoid using the `not` operator:

   > `WHERE amount > 4.0 AND staff_id != 1 AND payment_date >= '2007-02-16'`

In [6]:
%%sql 

select * from payment
WHERE amount > 4.0 AND staff_id != 1 AND payment_date >= '2007-02-16'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


payment_id,customer_id,staff_id,rental_id,amount,payment_date
17507,341,2,3130,7.99,2007-02-20 17:31:48.996577
17509,342,2,2190,5.99,2007-02-17 23:58:17.996577
17512,343,2,1547,4.99,2007-02-16 00:10:50.996577
17516,343,2,2461,6.99,2007-02-18 18:26:38.996577
17525,345,2,2766,4.99,2007-02-19 16:13:41.996577
17529,347,2,1711,8.99,2007-02-16 12:40:18.996577
17539,349,2,2987,6.99,2007-02-20 07:24:16.996577
17545,351,2,1792,5.99,2007-02-16 18:33:16.996577
17552,352,2,3331,4.99,2007-02-21 08:06:19.996577
17554,353,2,1928,7.99,2007-02-17 05:16:57.996577
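
The equivalence of the two forms can also be checked on a small scale. The following is a minimal sketch, assuming an in-memory SQLite database (via Python's `sqlite3` module) in place of PostgreSQL and five made-up payment rows that cover the interesting combinations; none of these rows come from the dvdrental data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed stand-in for the dvdrental database
conn.execute(
    "CREATE TABLE payment "
    "(payment_id INTEGER, staff_id INTEGER, amount REAL, payment_date TEXT)"
)
# Hypothetical sample rows, chosen to exercise each condition.
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?)",
    [
        (1, 1, 5.99, "2007-02-15"),
        (2, 1, 5.99, "2007-02-17"),
        (3, 2, 5.99, "2007-02-15"),
        (4, 2, 5.99, "2007-02-17"),
        (5, 2, 0.99, "2007-02-17"),
    ],
)


def payment_ids(where):
    """Return the payment_ids selected by the given where clause."""
    sql = f"SELECT payment_id FROM payment WHERE {where} ORDER BY payment_id"
    return [row[0] for row in conn.execute(sql)]


ids_with_not = payment_ids(
    "amount > 4.0 AND NOT (staff_id = 1 OR payment_date < '2007-02-16')"
)
ids_rewritten = payment_ids(
    "amount > 4.0 AND staff_id != 1 AND payment_date >= '2007-02-16'"
)

print(ids_with_not, ids_rewritten)  # [4] [4]
```

Negating `staff_id = 1 OR payment_date < '2007-02-16'` flips each comparison and turns the `or` into an `and`, which is why only the row with staff_id 2, a date on or after 16 Feb 2007, and an amount above 4.0 survives in both versions.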


## Building a Condition

A condition is made up of one or more *expressions* coupled with one or more *operators*. An expression can be any of the following:

+ A number
+ A column in a table or view
+ A string literal
+ A built-in function, such as `concat('John', ' ', 'Doe')`
+ A subquery
+ A list of expressions, such as `('Action', 'Animation', 'Children')`

The operators used within conditions include:
+ Comparison operators, such as `=`, `!=`, `<`, `>`, `<>`, `LIKE`, `IN`, and `BETWEEN`
+ Arithmetic operators, such as `+`, `-`, `*`, and `/`

The following sections demonstrate how you can combine these expressions and operators to build the various types of conditions. These condition types are grouped into four key subsections:

1. Equality Conditions
2. Range Conditions
3. Membership Conditions
4. Matching Conditions

### Equality Conditions

A large percentage of the filter conditions that you come across will be of the form `column = expression`, as in:

 > `name = 'Action'`
 >
 > `staff_id = 1`
 >
 > `category_id = (SELECT category_id FROM category WHERE name = 'Action')`

Conditions such as these are called *equality conditions* because they equate one expression to another. The first two equate a column to a literal (one a string, the other a number), and the third example equates a column to the value returned from a subquery. The following query uses two equality conditions: one in the `on` clause (a join condition), and the other in the `where` clause (a filter condition):

In [7]:
%%sql 

SELECT c.customer_id, c.first_name, c.last_name, a.district 
FROM customer c INNER JOIN address a
ON c.address_id = a.address_id
WHERE a.district = 'Buenos Aires';

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


customer_id,first_name,last_name,district
89,Julia,Flores,Buenos Aires
107,Florence,Woods,Buenos Aires
219,Willie,Howell,Buenos Aires
322,Jason,Morrissey,Buenos Aires
359,Willie,Markham,Buenos Aires
405,Leonard,Schofield,Buenos Aires
445,Micheal,Forman,Buenos Aires
530,Darryl,Ashcraft,Buenos Aires
560,Jordan,Archuleta,Buenos Aires
585,Perry,Swafford,Buenos Aires


#### Inequality conditions

Another fairly common type of condition is the *inequality condition*, which asserts that two expressions are not equal. Here's the previous query with the filter condition in the `where` clause changed to an inequality condition:

In [8]:
%%sql 

SELECT c.customer_id, c.first_name, c.last_name, a.district 
FROM customer c INNER JOIN address a
ON c.address_id = a.address_id
WHERE a.district <> 'Buenos Aires'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


customer_id,first_name,last_name,district
1,Mary,Smith,Nagasaki
2,Patricia,Johnson,California
3,Linda,Williams,Attika
4,Barbara,Jones,Mandalay
5,Elizabeth,Brown,Nantou
6,Jennifer,Davis,Texas
7,Maria,Miller,Central Serbia
8,Susan,Wilson,Hamilton
9,Margaret,Moore,Masqat
10,Dorothy,Taylor,Esfahan


#### Data modification using equality conditions

Equality/inequality conditions are commonly used when modifying data. For example, we can delete rows from a table using equality/inequality conditions like this:

> `DELETE FROM customer WHERE active <> 1;`

This statement will remove all inactive customer records from the `customer` table.

### Range Conditions

Along with checking that an expression is equal to (or not equal to) another expression, you can build conditions that check whether an expression falls within a certain range. This type of condition is common when working with numeric or temporal data. Consider the following:

In [9]:
%%sql

SELECT rental_id, customer_id, inventory_id, rental_date
FROM rental
WHERE rental_date < '2005-05-26'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


rental_id,customer_id,inventory_id,rental_date
1,130,367,2005-05-24 22:53:30
2,459,1525,2005-05-24 22:54:33
3,408,1711,2005-05-24 23:03:39
4,333,2452,2005-05-24 23:04:41
5,222,2079,2005-05-24 23:05:21
6,549,2792,2005-05-24 23:08:07
7,269,3995,2005-05-24 23:11:53
8,239,2346,2005-05-24 23:31:46
9,126,2580,2005-05-25 00:00:40
10,399,1824,2005-05-25 00:02:21


You could also specify both a lower and an upper bound for the date column like this:

In [10]:
%%sql

SELECT rental_id, customer_id, inventory_id, rental_date
FROM rental
WHERE rental_date >= '2005-05-24' 
AND rental_date < '2005-05-26'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


rental_id,customer_id,inventory_id,rental_date
1,130,367,2005-05-24 22:53:30
2,459,1525,2005-05-24 22:54:33
3,408,1711,2005-05-24 23:03:39
4,333,2452,2005-05-24 23:04:41
5,222,2079,2005-05-24 23:05:21
6,549,2792,2005-05-24 23:08:07
7,269,3995,2005-05-24 23:11:53
8,239,2346,2005-05-24 23:31:46
9,126,2580,2005-05-25 00:00:40
10,399,1824,2005-05-25 00:02:21


#### The `between` operator

When you have both an upper and lower limit for your range, you may choose to use a single condition that utilises the `between` operator rather than using two separate conditions, as in:

In [11]:
%%sql

SELECT rental_id, customer_id, inventory_id, rental_date
FROM rental
WHERE rental_date BETWEEN '2005-05-24' AND '2005-05-26'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


rental_id,customer_id,inventory_id,rental_date
1,130,367,2005-05-24 22:53:30
2,459,1525,2005-05-24 22:54:33
3,408,1711,2005-05-24 23:03:39
4,333,2452,2005-05-24 23:04:41
5,222,2079,2005-05-24 23:05:21
6,549,2792,2005-05-24 23:08:07
7,269,3995,2005-05-24 23:11:53
8,239,2346,2005-05-24 23:31:46
9,126,2580,2005-05-25 00:00:40
10,399,1824,2005-05-25 00:02:21


**Note:** When using the `between` operator, there are a couple of things to keep in mind. You should always specify the lower limit of the range first (after `between`) and the upper limit of the range second (after `and`). See what happens if you mistakenly specify the upper limit first:

In [12]:
%%sql

SELECT rental_id, customer_id, inventory_id, rental_date
FROM rental
WHERE rental_date BETWEEN '2005-05-26' AND '2005-05-24';

 * postgresql://postgres:***@localhost/dvdrental
0 rows affected.


rental_id,customer_id,inventory_id,rental_date


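No data is returned from the query. This is because the server is, in effect, generating two conditions from your single condition using the `<=` and `>=` operators. The second thing to keep in mind follows from the same expansion: both limits are inclusive, so `BETWEEN '2005-05-24' AND '2005-05-26'` also matches a rental made at exactly midnight on 2005-05-26, which `rental_date < '2005-05-26'` would exclude. The reversed range is equivalent to: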

In [13]:
%%sql

SELECT rental_id, customer_id, inventory_id, rental_date
FROM rental
WHERE rental_date >= '2005-05-26' 
AND rental_date <= '2005-05-24';

 * postgresql://postgres:***@localhost/dvdrental
0 rows affected.


rental_id,customer_id,inventory_id,rental_date
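
The behaviour of `between` can be reproduced without the dvdrental database. This is a minimal sketch, assuming an in-memory SQLite database (Python's `sqlite3` module) and five made-up rentals whose dates are stored as date-only text; the table layout and rows are illustrative assumptions, not the real `rental` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed stand-in for PostgreSQL
conn.execute("CREATE TABLE rental (rental_id INTEGER, rental_date TEXT)")
# Hypothetical rows: one rental per day, ISO dates so text order is date order.
conn.executemany(
    "INSERT INTO rental VALUES (?, ?)",
    [
        (1, "2005-05-23"),
        (2, "2005-05-24"),
        (3, "2005-05-25"),
        (4, "2005-05-26"),
        (5, "2005-05-27"),
    ],
)


def rental_ids(where):
    """Return the rental_ids selected by the given where clause."""
    sql = f"SELECT rental_id FROM rental WHERE {where} ORDER BY rental_id"
    return [row[0] for row in conn.execute(sql)]


inclusive = rental_ids("rental_date BETWEEN '2005-05-24' AND '2005-05-26'")
expanded = rental_ids("rental_date >= '2005-05-24' AND rental_date <= '2005-05-26'")
half_open = rental_ids("rental_date >= '2005-05-24' AND rental_date < '2005-05-26'")
reversed_bounds = rental_ids("rental_date BETWEEN '2005-05-26' AND '2005-05-24'")

print(inclusive)        # [2, 3, 4]  both limits are included
print(expanded)         # [2, 3, 4]  same as the >= / <= pair
print(half_open)        # [2, 3]     the < comparison drops the upper limit
print(reversed_bounds)  # []         upper limit first: nothing can match
```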


#### String ranges

While ranges of dates and numbers are easy to understand, you can also build conditions that search for ranges of strings, which are a bit harder to visualise. To work with string ranges, you need to know the order of the characters within your character set (e.g. in ASCII order, 'a' comes before 'b' but after capital 'A'). For example, say you are searching for customers having a Social Security number that falls within a certain range. You would write a query like this:

In [14]:
%%sql

SELECT customer_id, fed_id
FROM customer_detail
WHERE fed_id BETWEEN '500-00-0000' AND '999-99-9999';

 * postgresql://postgres:***@localhost/dvdrental
5 rows affected.


customer_id,fed_id
5,555-55-5555
6,666-66-6666
7,777-77-7777
8,888-88-8888
9,999-99-9999
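
String ranges are compared character by character, which can be checked directly. The sketch below assumes SQLite's default byte-wise text comparison (through Python's `sqlite3` module) as a stand-in; a PostgreSQL database may order letters differently depending on its collation, so treat the letter comparisons as illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed stand-in; collation may differ in PostgreSQL


def sql_true(condition):
    """Evaluate a SQL condition on literals and return it as a Python bool."""
    return bool(conn.execute(f"SELECT {condition}").fetchone()[0])


upper_before_lower = sql_true("'A' < 'a'")
a_before_b = sql_true("'a' < 'b'")
in_range = sql_true("'555-55-5555' BETWEEN '500-00-0000' AND '999-99-9999'")
below_range = sql_true("'123-45-6789' BETWEEN '500-00-0000' AND '999-99-9999'")
text_compare = sql_true("'1000' < '500'")  # strings: '1' sorts before '5'
number_compare = sql_true("1000 < 500")    # numbers: ordinary arithmetic order

print(upper_before_lower, a_before_b)  # True True
print(in_range, below_range)           # True False
print(text_compare, number_compare)    # True False
```

The last pair shows why string ranges need care: digits stored as text are ordered by their first differing character, not by numeric value.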


### Membership Conditions

In some cases, you will not be restricting an expression to a single value or range of values, but rather to a finite set of values. For example, you might want to locate all films whose rating is either 'G', 'PG', or 'PG-13':

In [15]:
%%sql

SELECT film_id, title, rating
FROM film
WHERE rating = 'G' OR rating = 'PG' OR rating = 'PG-13'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


film_id,title,rating
98,Bright Encounters,PG-13
1,Academy Dinosaur,PG
2,Ace Goldfinger,G
4,Affair Prejudice,G
5,African Egg,G
6,Agent Truman,PG
7,Airplane Sierra,PG-13
9,Alabama Devil,PG-13
11,Alamo Videotape,G
12,Alaska Phantom,PG


However, writing the `where` clause this way becomes tedious when the set of expressions contains 10 or 20 members. For such situations, you can use the `in` operator instead. With the `in` operator, you can write a single condition regardless of how many expressions are in the set, like this:

In [16]:
%%sql

SELECT film_id, title, description, rating
FROM film
WHERE rating IN ('G', 'PG', 'PG-13')
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


film_id,title,description,rating
98,Bright Encounters,A Fateful Yarn of a Lumberjack And a Feminist who must Conquer a Student in A Jet Boat,PG-13
1,Academy Dinosaur,A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies,PG
2,Ace Goldfinger,A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China,G
4,Affair Prejudice,A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank,G
5,African Egg,A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico,G
6,Agent Truman,A Intrepid Panorama of a Robot And a Boy who must Escape a Sumo Wrestler in Ancient China,PG
7,Airplane Sierra,A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat,PG-13
9,Alabama Devil,A Thoughtful Panorama of a Database Administrator And a Mad Scientist who must Outgun a Mad Scientist in A Jet Boat,PG-13
11,Alamo Videotape,A Boring Epistle of a Butler And a Cat who must Fight a Pastry Chef in A MySQL Convention,G
12,Alaska Phantom,A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia,PG


#### Using `not in`

Sometimes, instead of checking whether a particular expression exists within a set of expressions, you want to see whether the expression does *not* exist. For such cases, you can use the `not in` operator:

In [17]:
%%sql

SELECT film_id, title, description, rating
FROM film
WHERE rating NOT IN ('G', 'PG', 'PG-13')
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


film_id,title,description,rating
133,Chamber Italian,A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria,NC-17
384,Grosse Wonderful,A Epic Drama of a Cat And a Explorer who must Redeem a Moose in Australia,R
8,Airport Pollock,A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India,R
3,Adaptation Holes,A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory,NC-17
10,Aladdin Calendar,A Action-Packed Tale of a Man And a Lumberjack who must Reach a Feminist in Ancient China,NC-17
213,Date Speed,A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention,R
14,Alice Fantasia,A Emotional Drama of a A Shark And a Database Administrator who must Vanquish a Pioneer in Soviet Georgia,NC-17
15,Alien Center,A Brilliant Drama of a Cat And a Mad Scientist who must Battle a Feminist in A MySQL Convention,NC-17
16,Alley Evolution,A Fast-Paced Drama of a Robot And a Composer who must Battle a Astronaut in New Orleans,NC-17
17,Alone Trip,A Fast-Paced Character Study of a Composer And a Dog who must Outgun a Boat in An Abandoned Fun House,R


#### Using subqueries

Along with writing your own set of expressions, you can also use a subquery to generate a set for you on the fly like this:

In [18]:
%%sql

SELECT film_id, title, description, rental_rate
FROM film
WHERE film_id IN (SELECT film_id FROM film
   WHERE rental_rate = 4.99)
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


film_id,title,description,rental_rate
2,Ace Goldfinger,A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China,4.99
7,Airplane Sierra,A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat,4.99
8,Airport Pollock,A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India,4.99
10,Aladdin Calendar,A Action-Packed Tale of a Man And a Lumberjack who must Reach a Feminist in Ancient China,4.99
13,Ali Forever,A Action-Packed Drama of a Dentist And a Crocodile who must Battle a Feminist in The Canadian Rockies,4.99
20,Amelie Hellfighters,A Boring Drama of a Woman And a Squirrel who must Conquer a Student in A Baloon,4.99
21,American Circus,A Insightful Drama of a Girl And a Astronaut who must Face a Database Administrator in A Shark Tank,4.99
28,Anthem Luke,A Touching Panorama of a Waitress And a Woman who must Outrace a Dog in An Abandoned Amusement Park,4.99
31,Apache Divine,A Awe-Inspiring Reflection of a Pastry Chef And a Teacher who must Overcome a Sumo Wrestler in A U-Boat,4.99
32,Apocalypse Flamingos,A Astounding Story of a Dog And a Squirrel who must Defeat a Woman in An Abandoned Amusement Park,4.99


The subquery fetches a set of rows, and the main query checks to see whether the value of the `film_id` can be found in the set returned from the subquery.
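
The relationship between the `or` chain, `in`, `not in`, and a subquery-generated set can be seen side by side. This is a minimal sketch, assuming an in-memory SQLite database (Python's `sqlite3` module) as a stand-in for PostgreSQL and a cut-down, hypothetical `film` table holding five of the titles and ratings shown in the outputs above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed stand-in for the dvdrental database
conn.execute("CREATE TABLE film (film_id INTEGER, title TEXT, rating TEXT)")
# Five rows taken from the query outputs earlier in this section.
conn.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [
        (1, "Academy Dinosaur", "PG"),
        (2, "Ace Goldfinger", "G"),
        (3, "Adaptation Holes", "NC-17"),
        (7, "Airplane Sierra", "PG-13"),
        (8, "Airport Pollock", "R"),
    ],
)


def film_ids(where):
    """Return the film_ids selected by the given where clause."""
    sql = f"SELECT film_id FROM film WHERE {where} ORDER BY film_id"
    return [row[0] for row in conn.execute(sql)]


or_chain = film_ids("rating = 'G' OR rating = 'PG' OR rating = 'PG-13'")
in_list = film_ids("rating IN ('G', 'PG', 'PG-13')")
not_in_list = film_ids("rating NOT IN ('G', 'PG', 'PG-13')")
from_subquery = film_ids(
    "film_id IN (SELECT film_id FROM film WHERE rating = 'R')"
)

print(or_chain)       # [1, 2, 7]
print(in_list)        # [1, 2, 7]  same rows as the or chain
print(not_in_list)    # [3, 8]     the remaining rows
print(from_subquery)  # [8]        set produced by the subquery
```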

### Matching Conditions

So far, I have introduced you to conditions that identify an exact string, a range of strings, or a set of strings; the final condition type deals with partial string matches. You may, for example, want to find all customers whose last name begins with *T*. You could use a built-in function to extract the first letter of the `last_name` column, as in:

In [19]:
%%sql

SELECT customer_id, first_name, last_name
FROM customer
WHERE LEFT(last_name, 1) = 'T'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


customer_id,first_name,last_name
10,Dorothy,Taylor
12,Nancy,Thomas
17,Donna,Thompson
44,Marie,Turner
67,Kelly,Torres
128,Marjorie,Tucker
265,Jennie,Terry
311,Paul,Trout
327,Larry,Thrasher
370,Wayne,Truong


While the built-in function `left()` does the job, it doesn't give you much flexibility. Instead, you can use wildcard characters to build search expressions, as demonstrated in the following section.

#### Using wildcards

When searching for partial string matches, you might be interested in:

+ Strings beginning/ending with a certain character
+ Strings beginning/ending with a substring
+ Strings containing a certain character anywhere within the string
+ Strings containing a substring anywhere within the string
+ Strings with a specific format, regardless of individual characters

You can build search expressions to identify these and many other partial string matches by using the wildcard characters as shown:

**Wildcard character** | **Matches**
--- | ---
`_` | Exactly one character
`%` | Any number of characters (including 0)

The underscore character takes the place of a single character, while the percent sign can take the place of a variable number of characters. When building conditions that utilise search expressions, you can use the `like` operator, as in:

In [20]:
%%sql

SELECT last_name
FROM customer
WHERE last_name LIKE '_a%t';

 * postgresql://postgres:***@localhost/dvdrental
7 rows affected.


last_name
Hart
Garrett
Barnett
Barrett
Lambert
East
Talbert


In the above example, the search expression specifies strings containing an *a* in the second position and ending with a *t*, with any number of characters (including none) in between. The table below shows some more search expressions and their interpretations:

**Search Expression** | **Interpretation**
--- | ---
`F%` | Strings beginning with F
`%t` | Strings ending with t
`%bas%` | Strings containing the substring `'bas'`
`__t_` | Four-character strings with a `t` in the third position
`___-__-____` | 11-character strings with dashes in the fourth and seventh positions

The wildcard characters work fine for building simple search expressions; if your needs are a bit more sophisticated, however, you can use multiple search expressions, as demonstrated by the following:

In [21]:
%%sql

SELECT customer_id, first_name, last_name
FROM customer
WHERE last_name LIKE 'F%' OR last_name LIKE 'G%'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


customer_id,first_name,last_name
18,Carol,Garcia
35,Virginia,Green
38,Martha,Gonzalez
69,Judy,Gray
89,Julia,Flores
93,Phyllis,Foster
94,Norma,Gonzales
98,Lillian,Griffin
102,Crystal,Ford
104,Rita,Graham


#### Using regular expressions

If you find that the wildcard characters don't provide enough flexibility for your needs, you can use regular expressions to build search expressions. PostgreSQL uses the tilde (~) operator to match a string against a regular expression. Here's what the previous query would look like using PostgreSQL's implementation of regular expressions:

In [22]:
%%sql

SELECT customer_id, first_name, last_name
FROM customer
WHERE last_name ~ '^[FG]'
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


customer_id,first_name,last_name
18,Carol,Garcia
35,Virginia,Green
38,Martha,Gonzalez
69,Judy,Gray
89,Julia,Flores
93,Phyllis,Foster
94,Norma,Gonzales
98,Lillian,Griffin
102,Crystal,Ford
104,Rita,Graham


***Note:*** *Many other database systems do not use the tilde (~) operator, though they would have their own form of built-in functions to support regular expressions.*

The tilde regular expression operator provides a more powerful means of pattern matching than the `like` operator. The table below lists the available operators for pattern matching using regular expressions in PostgreSQL:

**Operator** | **Description** | **Example**
--- | --- | ---
`~` | Matches regular expression, case sensitive | `'thomas' ~ '.*thomas.*'`
`~*` | Matches regular expression, case insensitive | `'thomas' ~* '.*Thomas.*'`
`!~` | Does not match regular expression, case sensitive | `'thomas' !~ '.*Thomas.*'`
`!~*` | Does not match regular expression, case insensitive | `'thomas' !~* '.*vadim.*'`
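
The four operators can be mimicked outside the database to check that each example in the table evaluates to true. The sketch below uses Python's `re` module as an approximation; PostgreSQL's regular expression dialect differs from Python's in its more advanced features, so this is only an assumption that holds for simple patterns like these, and the helper name `pg_match` is made up for illustration:

```python
import re


def pg_match(string, pattern, case_insensitive=False):
    """Approximate PostgreSQL's ~ (or ~* when case_insensitive) with re.search."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, string, flags) is not None


examples = {
    "'thomas' ~ '.*thomas.*'": pg_match("thomas", ".*thomas.*"),
    "'thomas' ~* '.*Thomas.*'": pg_match("thomas", ".*Thomas.*", case_insensitive=True),
    "'thomas' !~ '.*Thomas.*'": not pg_match("thomas", ".*Thomas.*"),
    "'thomas' !~* '.*vadim.*'": not pg_match("thomas", ".*vadim.*", case_insensitive=True),
}
for expression, result in examples.items():
    print(expression, "->", result)  # every line ends with True

# The ^[FG] pattern from the query above: last names starting with F or G.
last_names = ["Garcia", "Flores", "Smith", "Ford", "green"]
starts_with_f_or_g = [name for name in last_names if pg_match(name, "^[FG]")]
print(starts_with_f_or_g)  # ['Garcia', 'Flores', 'Ford']
```

Unlike `like`, a regular expression matches anywhere in the string unless it is anchored, which is why the pattern needs `^` to mean "begins with".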

### The `null` value

`null` is the absence of a value in the dataset. There can be various reasons for a `null` value in the database, but the following are the most typical:

*Not applicable*: Such as the return date column for DVD rentals which have yet to be returned to the store.

*Value not yet known*: For example, the federal ID is not known at the time a customer row is created.

*Value undefined*: Such as when an account is created for a product that has not yet been added to the database.

When working with `null`, you should remember:

+ An expression can be null, but it can never *equal* null
+ Two nulls are never equal to each other

To test whether an expression is null, you need to use the `is null` operator, as demonstrated below:

In [23]:
%%sql

SELECT rental_id, inventory_id, customer_id, rental_date, return_date 
FROM rental
WHERE return_date IS NULL
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


rental_id,inventory_id,customer_id,rental_date,return_date
11496,2047,155,2006-02-14 15:16:03,
11541,2026,335,2006-02-14 15:16:03,
12101,1556,479,2006-02-14 15:16:03,
11563,1545,83,2006-02-14 15:16:03,
11577,4106,219,2006-02-14 15:16:03,
11593,817,99,2006-02-14 15:16:03,
11611,1857,192,2006-02-14 15:16:03,
11646,478,11,2006-02-14 15:16:03,
11652,1622,597,2006-02-14 15:16:03,
11657,3043,53,2006-02-14 15:16:03,


***Note:*** *pgAdmin 4 and other standard SQL client tools may display null values as `[null]` in the result set; in Jupyter Notebooks, however, null values are displayed as `None`.*

Be careful not to make the following mistake of testing for `null` with the `=` operator:

In [24]:
%%sql

SELECT rental_id, inventory_id, customer_id, rental_date, return_date 
FROM rental
WHERE return_date = NULL
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
0 rows affected.


rental_id,inventory_id,customer_id,rental_date,return_date


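As you can see, the query parses and executes but does not return any rows, because the comparison `return_date = NULL` never evaluates to `true`: any comparison with `null` yields `null`. The database server will not alert you to your error, so be careful when constructing conditions that test for `null`.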

To see whether a value has been assigned to a column, you can use the `is not null` operator, as in:

In [25]:
%%sql

SELECT rental_id, inventory_id, customer_id, rental_date, return_date 
FROM rental
WHERE return_date IS NOT NULL
LIMIT 10;

 * postgresql://postgres:***@localhost/dvdrental
10 rows affected.


rental_id,inventory_id,customer_id,rental_date,return_date
2,1525,459,2005-05-24 22:54:33,2005-05-28 19:40:33
3,1711,408,2005-05-24 23:03:39,2005-06-01 22:12:39
4,2452,333,2005-05-24 23:04:41,2005-06-03 01:43:41
5,2079,222,2005-05-24 23:05:21,2005-06-02 04:33:21
6,2792,549,2005-05-24 23:08:07,2005-05-27 01:32:07
7,3995,269,2005-05-24 23:11:53,2005-05-29 20:34:53
8,2346,239,2005-05-24 23:31:46,2005-05-27 23:33:46
9,2580,126,2005-05-25 00:00:40,2005-05-28 00:22:40
10,1824,399,2005-05-25 00:02:21,2005-05-31 22:44:21
11,4443,142,2005-05-25 00:09:02,2005-06-02 20:56:02
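
The three `null` tests above can be reproduced on a tiny table. This is a minimal sketch, assuming an in-memory SQLite database (Python's `sqlite3` module) as a stand-in for PostgreSQL and three made-up rentals, two of which have not been returned; Python's `None` plays the role of `null`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed stand-in for the dvdrental database
conn.execute("CREATE TABLE rental (rental_id INTEGER, return_date TEXT)")
# Hypothetical rows: rentals 2 and 3 have no return date yet.
conn.executemany(
    "INSERT INTO rental VALUES (?, ?)",
    [(1, "2005-05-28"), (2, None), (3, None)],
)


def rental_ids(where):
    """Return the rental_ids selected by the given where clause."""
    sql = f"SELECT rental_id FROM rental WHERE {where} ORDER BY rental_id"
    return [row[0] for row in conn.execute(sql)]


# Comparing null with null yields null (None), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

equals_null = rental_ids("return_date = NULL")
is_null = rental_ids("return_date IS NULL")
is_not_null = rental_ids("return_date IS NOT NULL")

print(null_equals_null)  # None
print(equals_null)       # []      the comparison is never true
print(is_null)           # [2, 3]
print(is_not_null)       # [1]
```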
