# <img src="./images/SQLIcon.png?modified=23223" width=80px height=80px style="vertical-align: middle;"> The `WHERE` Clause

In [None]:
#@title ### Run the following cell to download the necessary files for this lesson { display-mode: "form" } 
#@markdown Don't worry about what's in this collapsed cell

!pip install -q ipython
print('Downloading video_player.py...')
!wget https://s3-eu-west-1.amazonaws.com/aicore-portal-public-prod-307050600709/lesson_files/193a87e2-127f-4c60-a164-1c342ab0a2d7/video_player.py -q -O video_player.py
import video_player
print('Downloading AND_chaining.mp4...')
!wget https://s3-eu-west-1.amazonaws.com/aicore-portal-public-prod-307050600709/lesson_files/193a87e2-127f-4c60-a164-1c342ab0a2d7/AND_chaining.mp4 -q -O AND_chaining.mp4


The `WHERE` clause lets you specify the conditions that records in the database must meet to be included in the result set. The common comparison operators are:

- `>, >=`: greater than, and greater than or equal to
- `<, <=`: less than, and less than or equal to
- `=`: equal to 
- `!=`: not equal to

>Each condition evaluates to a boolean, which tells SQL whether to include the record in the result set. If the condition is `True` the record is included; otherwise it is omitted.
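As a minimal sketch of how each comparison operator changes which rows are kept, the snippet below runs the operators against a tiny in-memory SQLite table through Python's built-in `sqlite3` module. The `film` rows and the `titles_where` helper are made up for illustration and are not the lesson's dataset.

```python
import sqlite3

# In-memory SQLite database standing in for the lesson's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rental_rate REAL)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("Film A", 0.99), ("Film B", 2.99), ("Film C", 4.99)],
)

ALLOWED_OPERATORS = {">", ">=", "<", "<=", "=", "!="}

def titles_where(operator, value):
    """Return the titles whose rental_rate satisfies `operator value`."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    query = f"SELECT title FROM film WHERE rental_rate {operator} ? ORDER BY title"
    return [row[0] for row in conn.execute(query, (value,))]

# Only rows for which the condition is True end up in the result.
for op in (">", ">=", "<", "<=", "=", "!="):
    print(op, titles_where(op, 2.99))
```

Comparing the output for `>` and `>=` shows that the only difference is whether the row sitting exactly on the boundary value is kept.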


In [None]:
SELECT title, 
       rental_rate,
       ROUND((rental_rate / rental_duration), 2) AS rental_rate_per_day, 
       (rental_rate * rental_duration) AS total_rental_cost, 
       (rental_rate * rental_duration) + replacement_cost AS total_replacement_cost
FROM 
    film
WHERE
     rental_rate > 2.49;

This query will only return records from the `film` table where the `rental_rate` of the film is greater than `2.49`. Here the `rental_rate` column is also selected for display, but that isn't mandatory: we don't need to `SELECT` the column we're filtering on. We could run the query as follows:

In [None]:
SELECT title, 
       ROUND((rental_rate / rental_duration), 2) AS rental_rate_per_day, 
       (rental_rate * rental_duration) AS total_rental_cost, 
       (rental_rate * rental_duration) + replacement_cost AS total_replacement_cost
FROM 
    film
WHERE
     rental_rate > 2.49;

The query above will return the same data as the first query minus the `rental_rate` column. In the SQL **order of execution**, filtering is applied before the selection of columns, so we're free to cherry-pick the columns we want after the filtering has been applied.
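The point about filtering on a column that is not selected can be checked with a small sketch. It assumes an in-memory SQLite table created through Python's built-in `sqlite3` module, with made-up rows rather than the lesson's `film` data.

```python
import sqlite3

# In-memory SQLite database with a made-up, cut-down film table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE film (title TEXT, rental_rate REAL, rental_duration INTEGER)"
)
conn.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [("Film A", 0.99, 3), ("Film B", 2.99, 5), ("Film C", 4.99, 7)],
)

# rental_rate is used in WHERE but does not appear in the SELECT list.
cursor = conn.execute(
    """
    SELECT title,
           rental_duration
    FROM
        film
    WHERE
        rental_rate > 2.49
    ORDER BY
        title;
    """
)
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(column_names)  # columns that were selected
print(rows)          # rows that passed the filter
```

The filter still removes the cheapest film even though `rental_rate` never appears in the output columns.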

There are other operations you can apply in the conditional statement allowing you to filter the data in more complex ways:

- `LIKE`: allows you to search for a specific pattern in the data
- `IN`: allows you to check if a value is one of multiple choices
- `NOT`: used to *negate* a condition. If the condition would normally return `True`, it instead returns `False`
- `AND`: combines conditions. Both need to be `True` for the combined condition to be `True`
- `BETWEEN`: selects from a range of values. `BETWEEN` is inclusive, so `BETWEEN 2 AND 3` includes both `2` and `3`
- `OR`: returns `True` if one or both of the conditions are `True`

### `LIKE`

The `LIKE` operator allows you to match patterns of data which can be useful when you don't know the exact value you're looking for. It does this with the use of two wildcards:

- `%`: represents **zero or more** of any character
- `_`: represents exactly **one** character

Let's see how these can be used. Say we wanted to find all actors in the `actor` table whose `first_name` starts with `A`. We could do so with the following query:

In [None]:
SELECT first_name
FROM
    actor
WHERE
    first_name LIKE 'A%';

The syntax of the `LIKE` operator is `value LIKE pattern`, where `pattern` is a *string pattern* to match against. In this case we have `first_name` as the value and `'A%'` as the string pattern. The string pattern should always be wrapped in **single quotes**, because standard SQL reserves **double quotes** for identifiers such as column names. The pattern `A%` means the first character must be `A` and it can be followed by any number of any characters.

#### Pattern matching

Let's take a look at other common ways of matching patterns using `%` and `_`:

- `%er%`: Will match any value that has `er` at any position in the word
- `%r`: Checks that the last letter of a word is `r`
- `%r_`: Checks that the second-to-last letter of a word is `r`. Here `_` represents exactly one character, and since it is placed directly after `r` it stands for the last character of the word.
- `___`: Finds words containing exactly three characters, using three underscores

We can also use the keyword `NOT` to find the inverse of these matches:

- `NOT LIKE '%er%'`: finds all words that don't contain `er` at any position
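The patterns listed above can be tried out with a small sketch. It assumes an in-memory SQLite table built with Python's built-in `sqlite3` module; the names and the `matching_names` helper are made up for illustration. One caveat: SQLite's `LIKE` ignores case for ASCII letters by default, which some other databases do not, so the sample names are chosen so that case makes no difference here.

```python
import sqlite3

# In-memory SQLite database with a made-up actor table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (first_name TEXT)")
names = ["Albert", "Amber", "Bart", "Bert", "Bo", "Kim", "Roger"]
conn.executemany("INSERT INTO actor VALUES (?)", [(name,) for name in names])

def matching_names(pattern, negate=False):
    """Return first names that match (or, with negate=True, don't match) `pattern`."""
    operator = "NOT LIKE" if negate else "LIKE"
    query = (
        f"SELECT first_name FROM actor WHERE first_name {operator} ? "
        "ORDER BY first_name"
    )
    return [row[0] for row in conn.execute(query, (pattern,))]

print(matching_names("A%"))                 # starts with A
print(matching_names("%er%"))               # contains er anywhere
print(matching_names("%r"))                 # ends with r
print(matching_names("%r_"))                # second-to-last letter is r
print(matching_names("___"))                # exactly three characters
print(matching_names("%er%", negate=True))  # does not contain er
```

Passing the pattern as a bound parameter (`?`) rather than pasting it into the query string is a design choice of this sketch; the pattern itself behaves the same either way.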

## `AND`, `OR`, `BETWEEN` and `IN`

The `AND`, `OR`, `BETWEEN` and `IN` keywords let you build more complex filters in the `WHERE` clause.

### `AND`

The `AND` keyword will only return rows where both conditions specified by the `AND` clause are `True`:

In [None]:
SELECT first_name
FROM 
    actor
WHERE
    first_name LIKE 'B%'
    AND first_name LIKE '%r_';

This query only returns rows where the actor's `first_name` starts with `B` (from the first condition, `'B%'`) and where the **second-to-last** letter of the `first_name` is `r` (from the second condition, `'%r_'`).

You can also chain `AND` conditions together to create more complex filtering.

In [None]:
SELECT address,
       district,
       phone
FROM
    address
WHERE 
    district LIKE '%Cal%'
    AND phone LIKE '%49'
    AND address LIKE '1%';

The above query filters the `address` table with three separate conditions that all must be `True` for the record to be returned:

- `'%Cal%'`: Looks for any `district` which has `Cal` at any place in the word
- `'%49'`: Checks for `phone` values ending in the digits `49`
- `'1%'`: Looks for `address` values which start with the digit `1`

This query filters the `address` table of `603` records down to just one record that meets all three conditions. You can see how powerful chaining these conditions together can be to filter to a very specific set of data.
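To see how each extra `AND` condition narrows the result, here is a minimal sketch that counts the matching rows as the three conditions are added one at a time. It assumes an in-memory SQLite table built with Python's built-in `sqlite3` module; the four rows and the `count_matches` helper are made up and are not the lesson's `address` data.

```python
import sqlite3

# In-memory SQLite database with a made-up address table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (address TEXT, district TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [
        ("1 Example Street", "California", "555000149"),
        ("2 Example Street", "California", "555000249"),
        ("10 Sample Road", "Alberta", "555000349"),
        ("15 Sample Road", "California", "555000150"),
    ],
)

def count_matches(conditions):
    """Count the rows that satisfy every condition in `conditions`."""
    where_clause = " AND ".join(conditions)
    query = f"SELECT COUNT(*) FROM address WHERE {where_clause}"
    return conn.execute(query).fetchone()[0]

conditions = [
    "district LIKE '%Cal%'",
    "phone LIKE '%49'",
    "address LIKE '1%'",
]

# Apply the first condition, then the first two, then all three.
counts = [count_matches(conditions[:n]) for n in (1, 2, 3)]
print(counts)
```

Each added condition can only keep the count the same or shrink it, because a row has to pass every condition to survive.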

<h3 style="color: rgb(241, 90, 36)">Watch it in action</h3>


In [None]:
#@title ### Run the cell to play the video{ display-mode: "form" } 
video_player.play_video("AND_chaining.mp4")

### `OR`

The `OR` keyword will return `True` if either condition is matched and supports the chaining of multiple `OR` statements. Let's find all districts in the `address` table starting with the letter `A` or `C`:

In [None]:
SELECT address,
       district,
       phone
FROM
    address
WHERE 
    district LIKE 'A%'
    OR district LIKE 'C%';

The result is as expected: all rows where the `district` starts with `C` or `A` are returned.

>Note that `OR` is not an exclusive `OR`, which would return `True` only when exactly one of the conditions is met. Several of your `OR` conditions can be `True` for the same row of data. The row is still returned only once: the `WHERE` clause is evaluated once per row, and the row is included as soon as any one of the `OR` conditions is `True`. It won't duplicate data when two of the `OR` conditions are both met, for example:

In [None]:
SELECT address,
       district,
       phone
FROM
    address
WHERE 
    district LIKE 'A%'
    OR district LIKE 'California%'
    OR district LIKE 'C%';

In the example above, for rows where the `district` is `California`, both the 2nd and 3rd conditions are `True`. The row is returned once, no matter how many of the conditions it satisfies. Both queries return 75 rows in this case, i.e. the rows with `California` as a value aren't duplicated.
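The claim that overlapping `OR` conditions don't duplicate rows can be checked with a small sketch. It assumes an in-memory SQLite table built with Python's built-in `sqlite3` module; the district values and the `districts_where` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with a made-up, single-column address table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (district TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?)",
    [("Alberta",), ("California",), ("California",), ("Chiba",), ("Texas",)],
)

def districts_where(condition):
    """Return the district of every row that satisfies `condition`."""
    query = f"SELECT district FROM address WHERE {condition} ORDER BY district"
    return [row[0] for row in conn.execute(query)]

two_conditions = districts_where("district LIKE 'A%' OR district LIKE 'C%'")
three_conditions = districts_where(
    "district LIKE 'A%' OR district LIKE 'California%' OR district LIKE 'C%'"
)

print(two_conditions)
print(three_conditions)
```

The two `California` rows appear twice in both results because the table holds two such rows, not because two conditions matched them.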

### `IN`

`IN` works like a shorthand for multiple `OR` conditions: it checks whether a value exists in a list of values. If it does, the condition returns `True` and the row is returned.

In [None]:
SELECT address,
       district,
       phone
FROM
    address
WHERE 
    district IN ('Alberta', 'California', 'Hamilton');

The query above will return all rows where the `district` is equal to `Alberta`, `California` or `Hamilton`. This can be less verbose than using many different `OR` statements. The equivalent query with `OR` statements would be:

In [None]:
SELECT address,
       district,
       phone
FROM
    address
WHERE 
    district = 'Alberta'
    OR district = 'California'
    OR district = 'Hamilton';

Both will work, but in most cases `IN` is preferable for readability, especially if we want to check for more values. However, `IN` only tests for exact matches, whereas chained `OR` conditions can combine several `LIKE` patterns to retrieve data.
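A minimal sketch of the comparison between `IN` and chained `OR` conditions follows. It assumes an in-memory SQLite table built with Python's built-in `sqlite3` module; the district values (including `Hamburg`) and the `districts_where` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with a made-up, single-column address table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (district TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?)",
    [("Alberta",), ("California",), ("Hamburg",), ("Hamilton",), ("Texas",)],
)

def districts_where(condition):
    """Return the district of every row that satisfies `condition`."""
    query = f"SELECT district FROM address WHERE {condition} ORDER BY district"
    return [row[0] for row in conn.execute(query)]

# IN and the equivalent chain of equality checks joined by OR.
in_rows = districts_where("district IN ('Alberta', 'California', 'Hamilton')")
or_rows = districts_where(
    "district = 'Alberta' OR district = 'California' OR district = 'Hamilton'"
)

# OR can also join LIKE patterns, which a plain IN list cannot express.
like_rows = districts_where("district LIKE 'Al%' OR district LIKE 'Ham%'")

print(in_rows)
print(or_rows)
print(like_rows)
```

The first two results are identical, while the pattern-based query also picks up `Hamburg`, which the exact-match `IN` list does not contain.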

### `BETWEEN`

`BETWEEN` allows you to filter by a range of values in the `WHERE` statement. For instance let's get the names of all films in the `film` table where the `rental_rate` is between `0.99` and `3.99`:


In [None]:
SELECT title AS film_title,
       rental_rate
FROM 
    film
WHERE
    rental_rate BETWEEN 0.99 AND 3.99
ORDER BY
    rental_rate DESC
LIMIT
    5;


You can also apply `BETWEEN` to date ranges. For example, let's retrieve all rentals taken out on the 25th, 26th and 27th of May 2005.

In [None]:
SELECT customer_id,
       rental_date
FROM 
    rental
WHERE
    rental_date BETWEEN '2005-05-25 00:00:00' AND '2005-05-28 00:00:00'
ORDER BY
    customer_id, rental_date;

The range `BETWEEN '2005-05-25 00:00:00' AND '2005-05-28 00:00:00'` starts at the beginning of the 25th of May and ends at the beginning of the 28th, so it covers every rental made on the 25th, 26th and 27th of May. Because `BETWEEN` is inclusive, a rental timestamped at exactly `2005-05-28 00:00:00` would also be returned.
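The behaviour at the edges of a `BETWEEN` date range can be illustrated with a small sketch. It assumes an in-memory SQLite table built with Python's built-in `sqlite3` module, with made-up rentals placed right around the boundaries. SQLite has no dedicated timestamp type, so the dates are stored as text here; in this `YYYY-MM-DD HH:MM:SS` format, text ordering matches chronological ordering.

```python
import sqlite3

# In-memory SQLite database with made-up rentals around the range boundaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (customer_id INTEGER, rental_date TEXT)")
conn.executemany(
    "INSERT INTO rental VALUES (?, ?)",
    [
        (1, "2005-05-24 23:59:59"),  # just before the range
        (2, "2005-05-25 00:00:00"),  # exactly on the lower bound
        (3, "2005-05-27 23:59:59"),  # last second of the 27th
        (4, "2005-05-28 00:00:00"),  # exactly on the upper bound
        (5, "2005-05-28 00:00:01"),  # just after the range
    ],
)

def customers_where(condition):
    """Return the customer_id of every rental that satisfies `condition`."""
    query = f"SELECT customer_id FROM rental WHERE {condition} ORDER BY customer_id"
    return [row[0] for row in conn.execute(query)]

# BETWEEN includes both bounds.
between_ids = customers_where(
    "rental_date BETWEEN '2005-05-25 00:00:00' AND '2005-05-28 00:00:00'"
)

# A half-open range keeps the lower bound but excludes the upper bound.
half_open_ids = customers_where(
    "rental_date >= '2005-05-25 00:00:00' AND rental_date < '2005-05-28 00:00:00'"
)

print(between_ids)
print(half_open_ids)
```

If a rental at exactly midnight on the 28th should not count, the half-open form with `>=` and `<` is one way to leave it out.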

## Key Takeaways

- The `WHERE` clause filters records using comparison operators, pattern matching and logical operators
- If you need to filter data by text strings, the `LIKE` keyword with the `%` and `_` wildcards lets you match patterns
- Combining conditions with the **logical operators** (`AND`, `OR`, `NOT`) allows you to filter for very specific records
- If you need to filter by a range of values in your table use the `BETWEEN` keyword