# SQL querying and selecting data

## Preparation

For this section you need the `chinook.db` database file and a working `%sql` magic.  
If you don't have them, please go back to the [previous section](connect_to_database.ipynb) and follow the instructions.  
The following code should not produce any errors:

In [None]:
%load_ext sql
%sql sqlite:///chinook.db

## `SELECT` - querying the database

### Selecting some/all columns and their order

Write a first query that retrieves some data for the first five customers in this database:
- after `SELECT`, write the name(s) of the column(s) you want to retrieve
- after `FROM`, write the name of the table
- the optional `LIMIT` clause specifies how many rows to return

In [None]:
%%sql
SELECT FirstName, LastName
  FROM customers 
  LIMIT 5

We can retrieve all columns from the table using `*` instead of the column names:

In [None]:
%%sql
SELECT * 
  FROM customers 
  LIMIT 5

The order of the column names after `SELECT` defines the order of the columns in the result. It is also possible to perform arithmetic operations directly on columns:

In [None]:
%%sql
SELECT TrackId, Name, UnitPrice + 10
  FROM tracks
  LIMIT 5

### `LIMIT` - limiting the number of returned rows

`LIMIT` caps the number of rows returned by the query:

In [None]:
%%sql
SELECT TrackId, Name
  FROM tracks
  LIMIT 3

*Note:* to skip rows at the start of the result set, add the `OFFSET` keyword. For example, `LIMIT 3 OFFSET 10` skips the first 10 rows and returns the next 3:

In [None]:
%%sql
SELECT TrackId, Name
  FROM tracks
  LIMIT 3 OFFSET 10
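
To see exactly which rows `LIMIT 3 OFFSET 10` returns, the following minimal sketch runs the same clause against a small in-memory table. The table contents are hypothetical stand-ins, not the real `chinook.db` data, and the `ORDER BY` is added here so that the row order is well defined.

```python
import sqlite3

# Toy stand-in for the tracks table (hypothetical data, not chinook.db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO tracks (TrackId, Name) VALUES (?, ?)",
    [(i, f"Track {i}") for i in range(1, 21)],
)

# Skip the first 10 rows, then return the next 3.
rows = conn.execute(
    "SELECT TrackId, Name FROM tracks ORDER BY TrackId LIMIT 3 OFFSET 10"
).fetchall()
print(rows)  # [(11, 'Track 11'), (12, 'Track 12'), (13, 'Track 13')]
conn.close()
```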

### `AS` - renaming columns

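To give a column your own name (an alias), use the `AS` keyword. The alias is written in quotes here; quotes are only required when the alias contains spaces or other special characters: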

In [None]:
%%sql
SELECT TrackId, Name, UnitPrice, UnitPrice + 10 AS 'NewUnitPrice'
  FROM tracks
  LIMIT 5

### `ORDER BY` - sorting rows

With `ORDER BY` you define the sort order. Additional keywords:
- The `ASC` keyword means ascending (the default when you specify neither keyword).
- The `DESC` keyword means descending.

In [None]:
%%sql
SELECT Name, Milliseconds, AlbumId
  FROM tracks
  ORDER BY AlbumId DESC
  LIMIT 10
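
As a small illustration of the default direction, the sketch below sorts a hypothetical four-row table (not the real `chinook.db` data) with and without `ASC`. The last query sorts by two columns, which goes slightly beyond the example above and is included only as an assumption about what you might want next.

```python
import sqlite3

# Toy stand-in for the tracks table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT, Milliseconds INTEGER, AlbumId INTEGER)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [("A", 200, 1), ("B", 100, 2), ("C", 300, 2), ("D", 150, 1)],
)

def names(query):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(query)]

default_order = names("SELECT Name FROM tracks ORDER BY Milliseconds")
asc_order = names("SELECT Name FROM tracks ORDER BY Milliseconds ASC")
desc_order = names("SELECT Name FROM tracks ORDER BY Milliseconds DESC")
# Sort by album (descending) first, then by duration within each album.
two_keys = names("SELECT Name FROM tracks ORDER BY AlbumId DESC, Milliseconds ASC")

print(default_order)  # ['B', 'D', 'A', 'C']
print(desc_order)     # ['C', 'A', 'D', 'B']
print(two_keys)       # ['B', 'C', 'D', 'A']
conn.close()
```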

### `DISTINCT` - select unique rows (remove duplicated rows)

With `DISTINCT` you force duplicate rows to be removed from the query result. Compare the following two queries:

In [None]:
%%sql 
SELECT City
  FROM customers
  LIMIT 10

In [None]:
%%sql 
SELECT DISTINCT City
  FROM customers
  LIMIT 10
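
The effect of `DISTINCT` is easiest to see on a table with obvious duplicates. This is a minimal sketch with hypothetical customer rows rather than the real `chinook.db` data; the `ORDER BY` clauses are added only to make the output predictable.

```python
import sqlite3

# Toy stand-in for the customers table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, City TEXT)")
conn.executemany(
    "INSERT INTO customers (City) VALUES (?)",
    [("Prague",), ("Berlin",), ("Prague",), ("Oslo",), ("Berlin",)],
)

# Without DISTINCT every row is returned, duplicates included.
all_cities = [r[0] for r in conn.execute(
    "SELECT City FROM customers ORDER BY CustomerId")]
# With DISTINCT each city appears only once.
unique_cities = [r[0] for r in conn.execute(
    "SELECT DISTINCT City FROM customers ORDER BY City")]

print(all_cities)     # ['Prague', 'Berlin', 'Prague', 'Oslo', 'Berlin']
print(unique_cities)  # ['Berlin', 'Oslo', 'Prague']
conn.close()
```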

### `WHERE` - selecting rows by a condition

#### Relational operators

Let's select all tracks for which `Milliseconds > 300000`:

In [None]:
%%sql
SELECT TrackId, Milliseconds
  FROM tracks
  WHERE Milliseconds > 300000
  LIMIT 5

SQL uses the following relational operators: `>`, `>=`, `<`, `<=`, `=` (equality), `!=` or `<>` (both inequality).  
Let's find customers from Prague:

In [None]:
%%sql
SELECT FirstName, LastName, City 
  FROM customers
  WHERE City = 'Prague'
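
The two inequality spellings listed above behave identically. The following sketch checks this on a hypothetical three-row table (not the real `chinook.db` data).

```python
import sqlite3

# Toy stand-in for the customers table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (FirstName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Anna", "Prague"), ("Ben", "Berlin"), ("Carla", "Prague")],
)

def first_names(condition):
    """Return the first names of customers matching a WHERE condition."""
    query = f"SELECT FirstName FROM customers WHERE {condition} ORDER BY FirstName"
    return [row[0] for row in conn.execute(query)]

equal = first_names("City = 'Prague'")
not_equal_bang = first_names("City != 'Prague'")
not_equal_angle = first_names("City <> 'Prague'")

print(equal)            # ['Anna', 'Carla']
print(not_equal_bang)   # ['Ben']
print(not_equal_angle)  # ['Ben']
conn.close()
```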

#### `OR`, `AND`, `NOT` - Logical operators

Study the following examples of `OR`, `AND`, and `NOT`:

In [None]:
%%sql
SELECT FirstName, Country
  FROM customers 
  WHERE Country = 'Netherlands' OR Country = 'Germany'
  LIMIT 5

In [None]:
%%sql
SELECT FirstName, Country
  FROM customers 
  WHERE NOT (Country = 'Netherlands' OR Country = 'Germany')
  LIMIT 5

In [None]:
%%sql
SELECT *
  FROM invoice_items
  WHERE InvoiceId = 26 AND TrackId > 850
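
A negated `OR` can also be written as an `AND` of two negated conditions. The sketch below compares both forms on a hypothetical customer table (not the real `chinook.db` data) to show that they select the same rows.

```python
import sqlite3

# Toy stand-in for the customers table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (FirstName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Anna", "Netherlands"), ("Ben", "Germany"), ("Carla", "Brazil"),
     ("Dan", "Germany"), ("Eva", "Spain")],
)

def first_names(condition):
    """Return the first names of customers matching a WHERE condition."""
    query = f"SELECT FirstName FROM customers WHERE {condition} ORDER BY FirstName"
    return [row[0] for row in conn.execute(query)]

either = first_names("Country = 'Netherlands' OR Country = 'Germany'")
not_either = first_names("NOT (Country = 'Netherlands' OR Country = 'Germany')")
# Equivalent formulation: neither condition may hold.
neither = first_names("Country <> 'Netherlands' AND Country <> 'Germany'")

print(either)      # ['Anna', 'Ben', 'Dan']
print(not_either)  # ['Carla', 'Eva']
print(neither)     # ['Carla', 'Eva']
conn.close()
```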

#### `IS NULL` - Value is missing

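A missing value is represented by `NULL`. To find the tracks whose composer is missing, use `IS NULL`. A comparison such as `Composer = NULL` never matches, because comparing anything with `NULL` yields `NULL` rather than true.  

To find the tracks whose composer is not `NULL`, use `IS NOT NULL`.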

In [None]:
%%sql
SELECT Name, Composer
  FROM tracks
  WHERE Composer IS NULL
  LIMIT 5
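
The difference between `= NULL` and `IS NULL` is a common stumbling block, so here is a minimal sketch on a hypothetical three-row table (not the real `chinook.db` data). Python's `None` is stored as SQL `NULL` by the `sqlite3` module.

```python
import sqlite3

# Toy stand-in for the tracks table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT, Composer TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("Song A", "Composer X"), ("Song B", None), ("Song C", None)],
)

def track_names(condition):
    """Return the names of tracks matching a WHERE condition."""
    query = f"SELECT Name FROM tracks WHERE {condition} ORDER BY Name"
    return [row[0] for row in conn.execute(query)]

equals_null = track_names("Composer = NULL")    # comparison yields NULL, never true
is_null = track_names("Composer IS NULL")
is_not_null = track_names("Composer IS NOT NULL")

print(equals_null)  # []
print(is_null)      # ['Song B', 'Song C']
print(is_not_null)  # ['Song A']
conn.close()
```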

#### `IN` - Set membership (for categorical variables)

Compare the following two notations to test whether a value belongs to a set.  
The `OR` notation works only with a fixed set of values and does not scale well:

In [None]:
%%sql
SELECT *
  FROM customers
  WHERE Country = 'Brazil' OR Country = 'Finland' OR Country = 'Poland' OR Country = 'Spain'

The `IN` notation is more compact. It accepts an explicitly written list of values, as in the query below, or the result of a subquery.

In [None]:
%%sql
SELECT *
  FROM customers
  WHERE Country IN ('Brazil', 'Finland', 'Poland', 'Spain')
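
The subquery form of `IN` could look roughly like the following sketch. The two tables and their contents are hypothetical stand-ins with simplified columns, not the real `chinook.db` schema; the idea is to select customers who have at least one invoice above a given total.

```python
import sqlite3

# Toy stand-ins for customers and invoices (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT)")
conn.execute("CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Anna"), (2, "Ben"), (3, "Carla")])
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                 [(1, 1, 5.0), (2, 2, 15.0), (3, 3, 20.0), (4, 2, 3.0)])

# The inner SELECT produces the set of customer ids that IN tests against.
big_spenders = [row[0] for row in conn.execute("""
    SELECT FirstName
      FROM customers
      WHERE CustomerId IN (SELECT CustomerId FROM invoices WHERE Total > 10)
      ORDER BY FirstName
""")]

print(big_spenders)  # ['Ben', 'Carla']
conn.close()
```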

#### `BETWEEN` - Value in range (for numerical variables)

Use `BETWEEN` (and `NOT BETWEEN`) to test whether a value lies inside (or outside) a certain range. Both bounds are inclusive.

How do we find invoices whose invoice dates fall between January 1, 2010 and January 31, 2010?

In [None]:
%%sql
SELECT InvoiceId, BillingAddress, InvoiceDate, Total
  FROM invoices
  WHERE InvoiceDate BETWEEN '2010-01-01' AND '2010-01-31'
  ORDER BY InvoiceDate
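
One caveat worth checking against your own data: if dates are stored as text that includes a time part (an assumption in this sketch, with hypothetical rows rather than the real `chinook.db` data), the comparison is done on strings, and a date-only upper bound excludes the last day. A half-open range with `>=` and `<` is one possible way around that.

```python
import sqlite3

# Toy stand-in for the invoices table; dates are text with a time part (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, InvoiceDate TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "2010-01-01 00:00:00"), (2, "2010-01-15 00:00:00"),
     (3, "2010-01-31 00:00:00"), (4, "2010-02-01 00:00:00")],
)

# '2010-01-31 00:00:00' sorts after '2010-01-31' as a string, so row 3 is missed.
with_between = [row[0] for row in conn.execute("""
    SELECT InvoiceId FROM invoices
      WHERE InvoiceDate BETWEEN '2010-01-01' AND '2010-01-31'
      ORDER BY InvoiceId
""")]

# Half-open range: from January 1 up to, but not including, February 1.
half_open = [row[0] for row in conn.execute("""
    SELECT InvoiceId FROM invoices
      WHERE InvoiceDate >= '2010-01-01' AND InvoiceDate < '2010-02-01'
      ORDER BY InvoiceId
""")]

print(with_between)  # [1, 2]
print(half_open)     # [1, 2, 3]
conn.close()
```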

#### `LIKE` - Value matches a pattern (for text variables)

Sometimes you don't know the exact value you are looking for. For example, you may know that your favorite song contains the word `elevator`, but you don't remember its full name.

1) To find the tracks whose names start with the string `Wild`, put the percent sign `%` wildcard at the end of the pattern: `'Wild%'`.

2) To find the tracks whose names end with the word `Wild`, put the `%` wildcard at the beginning of the pattern: `'%Wild'`.

3) To find the tracks whose names contain the string `Wild` anywhere, put the `%` wildcard at both the beginning and the end of the pattern: `'%Wild%'`.

The following query uses the first pattern:

In [None]:
%%sql
SELECT TrackId, Name
  FROM tracks
  WHERE Name LIKE 'Wild%'
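
The three `%` patterns listed above can be compared side by side. This is a minimal sketch on a hypothetical four-row table (not the real `chinook.db` data). The last query relies on SQLite's default behavior, in which `LIKE` ignores case for ASCII letters.

```python
import sqlite3

# Toy stand-in for the tracks table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?)",
    [("Wild Side",), ("Born To Be Wild",), ("Into The Wild Blue",), ("Mild Weather",)],
)

def matching(pattern):
    """Return the track names matching a LIKE pattern, sorted by name."""
    query = "SELECT Name FROM tracks WHERE Name LIKE ? ORDER BY Name"
    return [row[0] for row in conn.execute(query, (pattern,))]

starts_with = matching("Wild%")
ends_with = matching("%Wild")
contains = matching("%Wild%")
lower_case = matching("wild%")  # default LIKE is case-insensitive for ASCII

print(starts_with)  # ['Wild Side']
print(ends_with)    # ['Born To Be Wild']
print(contains)     # ['Born To Be Wild', 'Into The Wild Blue', 'Wild Side']
print(lower_case)   # ['Wild Side']
conn.close()
```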

The underscore `_` wildcard matches exactly one character. To find the tracks whose names are exactly four characters long and end with `y`:

In [None]:
%%sql
SELECT TrackId, Name
  FROM tracks
  WHERE Name LIKE '___y'
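
To check which names the pattern `'___y'` accepts, the sketch below applies it to a handful of hypothetical track names (not the real `chinook.db` data). Only names with exactly three characters before the final `y` match.

```python
import sqlite3

# Toy stand-in for the tracks table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?)",
    [("Envy",), ("Lady",), ("Happy",), ("Sky",), ("Easy",)],
)

# Three underscores = exactly three arbitrary characters, followed by 'y'.
four_letters = [row[0] for row in conn.execute(
    "SELECT Name FROM tracks WHERE Name LIKE '___y' ORDER BY Name")]

print(four_letters)  # ['Easy', 'Envy', 'Lady']
conn.close()
```

`Happy` (five characters) and `Sky` (three characters) are rejected because the pattern fixes the total length at four.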