# Introduction to SQL - Part B

In this course, we will learn powerful queries that allow us to filter and sort our data. This time we will use *Google BigQuery*'s SQL editor to access a ready-made dataset called **IMDB**. Its `movies` table lists movies 🎥 rated on IMDB: it contains 5,043 rows, each representing a different movie.

## What you will learn in this course 🧐🧐

* Filter your results with `WHERE`
* Add conditions with `AND` and `OR`
* Get unique results with `DISTINCT`
* Create intervals with `BETWEEN`
* Detect specific patterns with `LIKE` (and `ILIKE`)
* Sort your results with `ORDER BY` and limit them with `LIMIT`
* Rename columns with `AS`


## WHERE


```sql
SELECT movie_title

FROM IMDB.movies

WHERE imdb_score > 8.5;
```

This clause filters your results. In the example above, it only outputs movies that have an IMDB score **greater than 8.5** 💯. Let's look at it line by line.

1. `WHERE` is a clause that indicates that you want to filter your results according to certain conditions. Only rows that meet these conditions will be shown.

2. `imdb_score > 8.5` is the condition applied to the `imdb_score` column. Only rows where this value is greater than 8.5 will be returned.

3. The symbol `>` is a comparison operator. It performs a logical comparison whose result is either TRUE or FALSE. Here are the most common ones:

* `=` equal to

* `!=` not equal to

* `>` strictly greater than

* `<` strictly less than

* `>=` greater than or equal to

* `<=` less than or equal to
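To try these operators without the IMDB dataset, here is a minimal sketch. The `WITH movies AS (...)` part builds a tiny table with made-up titles and scores (hypothetical values, not IMDB data), and the query keeps the rows whose score is greater than or equal to 8.5.

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 8.6 AS imdb_score
  UNION ALL SELECT 'Bravo', 8.5
  UNION ALL SELECT 'Charlie', 6.0
)

SELECT movie_title

FROM movies

-- >= keeps Alpha and Bravo; with > 8.5, Bravo (exactly 8.5) would be left out
WHERE imdb_score >= 8.5;
```

Swapping the operator (for example `!=` or `<`) is a quick way to see how each comparison changes the output.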

*Exercise*: Find all the movies that have an `imdb_score` less than or equal to 1.5 👎.

## AND

`AND` is a logical operator that lets you combine several conditions: a row is returned only if all of them are true. For example, if you want to see only movies that were released in 1984 and have a score higher than 8, here is the code:

```sql
SELECT movie_title 

FROM IMDB.movies

WHERE imdb_score > 8

AND title_year = 1984;
```

*Exercise*: Find all movies with a `budget` of more than *$50,000,000*, `country` equal to `"USA"` and `imdb_score` less than *3*.


## OR

`OR` is a logical operator that returns every row that meets at least one of the conditions in your statement. It is therefore less strict than `AND`.


```sql
SELECT movie_title

FROM IMDB.movies

WHERE genres = 'Comedy'

OR title_year < 1980;
```

In the example above, `OR` evaluates each condition separately. If at least one of them is true, the corresponding row appears in your result.
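To compare `OR` with `AND`, here is a minimal sketch that applies the two conditions above to a made-up table (hypothetical titles, genres and years built with `WITH movies AS (...)`, not IMDB data).

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 'Comedy' AS genres, 1975 AS title_year
  UNION ALL SELECT 'Bravo', 'Comedy', 1999
  UNION ALL SELECT 'Charlie', 'Drama', 1970
  UNION ALL SELECT 'Delta', 'Drama', 2005
)

SELECT movie_title

FROM movies

-- OR returns Alpha, Bravo and Charlie (at least one condition is true);
-- replacing OR with AND would return only Alpha (both conditions are true)
WHERE genres = 'Comedy'

OR title_year < 1980;
```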

You can use several `AND`s and `OR`s in the same statement. Since `AND` is evaluated before `OR`, group your conditions in parentheses to make sure they are combined the way you intend.


```sql
SELECT movie_title

FROM IMDB.movies

WHERE genres = 'Horror'

OR (title_year > 1980 AND title_year < 1990);
```
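Here is a minimal sketch of why parentheses matter, on a made-up table (hypothetical values built with `WITH movies AS (...)`, not IMDB data). Each output column, named with `AS`, shows whether a row passes the same three conditions grouped two different ways. Because `AND` is evaluated before `OR`, the version without parentheses reads as `genres = 'Horror' OR (genres = 'Comedy' AND title_year > 1980)`.

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 'Horror' AS genres, 1975 AS title_year
  UNION ALL SELECT 'Bravo', 'Comedy', 1995
  UNION ALL SELECT 'Charlie', 'Comedy', 1970
)

SELECT
  movie_title,
  -- Horror OR (Comedy AND after 1980): Alpha true, Bravo true, Charlie false
  genres = 'Horror' OR genres = 'Comedy' AND title_year > 1980 AS without_parentheses,
  -- (Horror OR Comedy) AND after 1980: Alpha false, Bravo true, Charlie false
  (genres = 'Horror' OR genres = 'Comedy') AND title_year > 1980 AS with_parentheses

FROM movies;
```

Alpha is the row that changes: the parentheses decide whether the year condition also applies to horror movies.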

*Exercise*: Using `OR` and `AND`, find all the movies that were produced in North Korea that have a score above 7 but below 8.

## DISTINCT

**DISTINCT** removes duplicates from your results: each value (or each combination of values, if you select several columns) appears only once in the output. `DISTINCT` is always used with the `SELECT` command, right after `SELECT`.

```sql
SELECT DISTINCT genres

FROM IMDB.movies

WHERE budget > 1000000000;
```

1. `SELECT DISTINCT genres` returns each genre only once.

2. `WHERE budget > 1000000000;` is the condition that filters on the budget, here 1 billion.

3. The output lists the unique genres of the movies whose budget is strictly greater than $1 billion.

4. As you can see, you can add conditions; in this example, we used a `WHERE` clause. But you can also use `SELECT DISTINCT` on its own.
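To see exactly what `DISTINCT` removes, here is a minimal sketch on a made-up table where two movies share the same genre (hypothetical titles and genres built with `WITH movies AS (...)`, not IMDB data).

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 'Comedy' AS genres
  UNION ALL SELECT 'Bravo', 'Comedy'
  UNION ALL SELECT 'Charlie', 'Drama'
)

-- Returns two rows, Comedy and Drama; without DISTINCT, Comedy would appear twice
SELECT DISTINCT genres

FROM movies;
```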


*Exercise*: Using another table (`IMDB.directors`), find all the directors who have a film with **director_facebook_likes > 100**.


## BETWEEN

**BETWEEN** is a special operator used in the `WHERE` clause. It lets you filter on ranges of numbers, strings, or dates. Both bounds are included in the range.

```sql
SELECT movie_title

FROM IMDB.movies

WHERE movie_title BETWEEN 'A' AND 'J';
```

```sql
SELECT movie_title

FROM IMDB.movies

WHERE title_year BETWEEN 1990 AND 2000;
```

1. `WHERE movie_title BETWEEN 'A' AND 'J'` outputs all movies whose title sorts alphabetically between *A* and *J*. Be careful: a title such as *Jaws* sorts after the single letter *J*, so titles starting with *J* are not included.

2. Similarly, `WHERE title_year BETWEEN 1990 AND 2000` outputs all movies released between *1990* and *2000*, both years included.
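The sketch below checks both points on a made-up table (hypothetical titles and years built with `WITH movies AS (...)`, not IMDB data). Each output column, named with `AS`, shows whether a row passes a condition. The last column uses `>= 'A' AND < 'K'`, one common way to include every title that starts with an uppercase *A* to *J*.

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 1990 AS title_year
  UNION ALL SELECT 'Hotel', 2000
  UNION ALL SELECT 'Juliet', 1995
  UNION ALL SELECT 'Zulu', 2001
)

SELECT
  movie_title,
  -- Both bounds are included: true for Alpha (1990), Hotel (2000) and Juliet (1995)
  title_year BETWEEN 1990 AND 2000 AS year_1990_to_2000,
  -- 'Juliet' sorts after 'J', so this is true only for Alpha and Hotel
  movie_title BETWEEN 'A' AND 'J' AS title_between_a_and_j,
  -- From 'A' up to (but not including) 'K': true for Alpha, Hotel and Juliet
  movie_title >= 'A' AND movie_title < 'K' AS title_starts_a_to_j

FROM movies;
```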

*Exercise*: Find all movies that were released between *1988* and *1990* AND have an `imdb_score` between *4* and *5*.


## LIKE (ILIKE)

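`LIKE` is used to compare values with a pattern, mostly in *text* columns. `ILIKE` is a case-insensitive variant offered by some DBMS, such as PostgreSQL; Google BigQuery does not support it, so use `LIKE` in the BigQuery editor.

`LIKE` checks each value of the column you specify against the pattern. In the pattern, the `%` sign stands for any sequence of characters, including none.

For example, the following statement keeps only the movies whose title starts with the letter Z: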

```sql
SELECT movie_title

FROM IMDB.movies

WHERE movie_title LIKE 'Z%';
```

1. Here, `Z%` selects all the strings in the `movie_title` column that begin with *Z*, that is, all the movies whose title begins with the letter *Z*, regardless of the rest of the title.
2. Conversely, `%a` selects all the strings in the chosen column that end with *a*; here, all the movies whose title ends with the letter *a*, regardless of the beginning.
3. Finally, `%b%` selects all the strings in the chosen column that contain a *b* anywhere in the title. Here is an example:


```sql
SELECT movie_title

FROM IMDB.movies

WHERE movie_title LIKE '%woman%';
```

Here, the console will return a list of movies whose name contains `"woman"`, for example `"catwoman"` 🐈 or `"wonderwoman"` 🛡️.

4. Note: **in BigQuery, `LIKE` is case-sensitive**: `'%WOMAN%'` in the same query would return different results.
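Here is a minimal sketch that tests each pattern on a made-up table (hypothetical titles built with `WITH movies AS (...)`, not IMDB data). Each output column, named with `AS`, shows whether a title matches. The last column lowercases the title with `LOWER` before matching, a common way to get a case-insensitive comparison in BigQuery, where `ILIKE` is not available. The comments describe BigQuery's case-sensitive `LIKE`; some other DBMS, such as MySQL or SQLite, compare case-insensitively by default, so results may differ there.

```sql
-- Hypothetical sample table (invented titles) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Zebra' AS movie_title
  UNION ALL SELECT 'Catwoman'
  UNION ALL SELECT 'SUPERWOMAN'
  UNION ALL SELECT 'Bravo'
)

SELECT
  movie_title,
  movie_title LIKE 'Z%' AS starts_with_z,        -- true only for Zebra
  movie_title LIKE '%a' AS ends_with_a,          -- true only for Zebra
  movie_title LIKE '%b%' AS contains_b,          -- true only for Zebra ('Bravo' has an uppercase B)
  movie_title LIKE '%woman%' AS contains_woman,  -- true only for Catwoman
  LOWER(movie_title) LIKE '%woman%' AS contains_woman_any_case  -- Catwoman and SUPERWOMAN

FROM movies;
```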

*Exercise*: Using `LIKE`, find all the movies whose title begins with the letter `"j"` OR contains the word `"man"`.


## ORDER BY

`ORDER BY` is a clause that sorts the results of a query, which often makes them quicker and easier to analyse. Let's look at the statement below:

```sql
SELECT movie_title

FROM IMDB.movies

ORDER BY imdb_score DESC;
```



1. `ORDER BY` sorts the results alphabetically or numerically, depending on the type of the column.
2. `imdb_score` is the name of the column you want to sort on.
3. `DESC` stands for *descending*: the sorting is done in descending order (from largest to smallest, or from Z to A). In our example, the highest scores come first.
4. You can also sort in ascending order with `ASC` (*ascending*), which is the default when you specify neither. This time, your results are sorted from smallest to largest for numeric data, or from A to Z for strings.
5. You can also limit the number of results returned with the `LIMIT` clause. Here is an example:


```sql
SELECT movie_title 

FROM IMDB.movies

ORDER BY imdb_score ASC

LIMIT 3;
```

This query returns the titles of the three movies with the lowest scores in our database.
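`WHERE`, `ORDER BY` and `LIMIT` can be combined in a single query, written in that order after `FROM`. Here is a minimal sketch on a made-up table (hypothetical titles, years and scores built with `WITH movies AS (...)`, not IMDB data).

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 1995 AS title_year, 6.1 AS imdb_score
  UNION ALL SELECT 'Bravo', 1998, 8.2
  UNION ALL SELECT 'Charlie', 2003, 9.0
  UNION ALL SELECT 'Delta', 1992, 7.4
)

SELECT movie_title, imdb_score

FROM movies

-- 1. keep the movies from 1990 to 2000 (Charlie, from 2003, is filtered out)
WHERE title_year BETWEEN 1990 AND 2000

-- 2. sort the remaining rows from highest to lowest score
ORDER BY imdb_score DESC

-- 3. keep only the first two rows: Bravo, then Delta
LIMIT 2;
```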

*Exercise*: Using `ORDER BY`, find the movie with the lowest `budget`.


## AS

Finally, `AS` lets you rename a column in your results (the new name is called an alias). This can be useful when your columns have names that are not very explicit or too technical.


```sql
SELECT movie_title AS Title, title_year AS Year

FROM IMDB.movies

WHERE movie_title LIKE '%woman%'

AND title_year > 1935;
```

In the example above, we renamed the `movie_title` column `Title` and the `title_year` column `Year`.

Aliases are also very often used in Google BigQuery to join tables (`JOIN`) and to refer to aggregation functions in clauses such as `ORDER BY`, `GROUP BY` and `HAVING`. We will see this in the next courses.

⚠️ `AS` does not permanently rename your columns: they keep their original names, _movie_title_ and _title_year_, in your database. The new names only appear in the results of your query. ⚠️
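As a small preview of aliases in other clauses, here is a minimal sketch that sorts on the alias `Year` in `ORDER BY`, using a made-up table (hypothetical rows built with `WITH movies AS (...)`, not IMDB data). BigQuery accepts an alias from the `SELECT` list in `ORDER BY`; the underlying column is still called `title_year`.

```sql
-- Hypothetical sample table (invented values) standing in for IMDB.movies
WITH movies AS (
  SELECT 'Alpha' AS movie_title, 1999 AS title_year
  UNION ALL SELECT 'Bravo', 2004
  UNION ALL SELECT 'Charlie', 1985
)

SELECT movie_title AS Title, title_year AS Year

FROM movies

-- The alias Year can be used to sort: Bravo (2004), Alpha (1999), Charlie (1985)
ORDER BY Year DESC;
```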

## Resources 📚📚

Standard SQL Reference for Google BigQuery - [https://bit.ly/2JltrZ3](https://bit.ly/2JltrZ3)

Working with Aliases - [https://bit.ly/2JxRQ0Q](https://bit.ly/2JxRQ0Q)

More Complex Queries with AND/OR - [http://bit.ly/2gX7BBy](https://www.khanacademy.org/computing/computer-programming/sql/more-advanced-sql-queries/p/more-complex-queries-with-andor)

Using Booleans and Relational Operators - [http://bit.ly/2hkjsqp](http://bit.ly/2hkjsqp)