## Lecture objective:

The **goal** of this lecture is to get to know the following basic SQL commands:

- `SELECT` columns `FROM` table
- `LIMIT`
- `DISTINCT`
- `COUNT`
- `WHERE`
- `AND`, `OR` and `NOT`
- `ORDER BY`
- `BETWEEN`
- `IN`
- `LIKE` and `ILIKE`

and practice with **comparison operators**:
- = equal
- \> greater than
- < less than
- \>= greater than or equal to
- <= less than or equal to
- <> or != not equal to
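
To make the operators above concrete, here is a minimal sketch that applies each of them inside a `WHERE` clause. The three-row `people` table and the `names_where` helper are hypothetical and exist only for this illustration. It uses plain `sqlite3` cursors on an in-memory database so that it runs without any data file.

```python
import sqlite3

# Hypothetical three-row table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 25), ("Ben", 30), ("Cleo", 35)],
)


def names_where(condition):
    """Return the names matching a WHERE condition, sorted alphabetically."""
    rows = conn.execute(
        f"SELECT Name FROM people WHERE {condition} ORDER BY Name"
    )
    return [name for (name,) in rows]


equal = names_where("Age = 30")            # ['Ben']
greater = names_where("Age > 30")          # ['Cleo']
less_or_equal = names_where("Age <= 30")   # ['Ana', 'Ben']
not_equal = names_where("Age <> 30")       # ['Ana', 'Cleo']
not_equal_alt = names_where("Age != 30")   # same rows as <>
```

Both spellings of "not equal" return the same rows, so which one to use is mostly a matter of style.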

### Libraries and function setup to perform queries

In [28]:
# Libraries
import pandas as pd
import sqlite3



cnx = sqlite3.connect('./data/jobs.db')

# Define a helper function to run queries.
def sql_query(query):
    return pd.read_sql(query, cnx)



### `SELECT` columns `FROM` table

Let's take a look at our table *jobs* to see what information it contains.

In [29]:
query = """
SELECT * FROM jobs
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Jennifer,Gomez,Benin,"Engineer, mining",27,9,79156
1,Julia,Garrett,Italy,Bookseller,58,1,67262
2,Courtney,Freeman,Niue,"Pharmacist, hospital",44,15,44105
3,Sheena,Faulkner,Peru,Estate manager/land agent,56,6,71914
4,Cheryl,Arnold,Pakistan,Colour technologist,27,2,90405
...,...,...,...,...,...,...,...
9995,Makayla,Miller,Kuwait,Stage manager,26,3,41031
9996,William,Underwood,Mali,Training and development officer,31,3,77919
9997,Scott,Lewis,Montenegro,International aid/development worker,40,3,84261
9998,Jacqueline,Diaz,Venezuela,Contracting civil engineer,50,10,94421


### `LIMIT`

That is great, but suppose we only want to see the first 5 rows. We can do that with `LIMIT`.

In [30]:
query = """
SELECT * FROM jobs
LIMIT 5
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Jennifer,Gomez,Benin,"Engineer, mining",27,9,79156
1,Julia,Garrett,Italy,Bookseller,58,1,67262
2,Courtney,Freeman,Niue,"Pharmacist, hospital",44,15,44105
3,Sheena,Faulkner,Peru,Estate manager/land agent,56,6,71914
4,Cheryl,Arnold,Pakistan,Colour technologist,27,2,90405
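
So far every query has used `SELECT *`, which returns all columns. As a small sketch of the `SELECT` columns `FROM` table form, the example below names two columns and combines them with `LIMIT`. The tiny `jobs` table here is a hypothetical stand-in with invented rows, not the lecture's database.

```python
import sqlite3

# Hypothetical stand-in for the jobs table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (Name TEXT, Country TEXT, Job TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Spain", "Astronomer", 60000),
        ("Ben", "France", "Bookseller", 45000),
        ("Cleo", "Italy", "Chiropractor", 70000),
    ],
)

# Ask only for the columns we need, and cap the result at two rows.
cursor = conn.execute("SELECT Name, Salary FROM jobs LIMIT 2")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)    # ['Name', 'Salary']
print(len(rows))  # 2
```

Without an `ORDER BY`, SQL does not guarantee which rows a `LIMIT` returns, only how many.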


### `DISTINCT`

Let us see which unique job positions the table contains.

In [31]:
query = """
SELECT DISTINCT Job FROM jobs
"""
sql_query(query)

Unnamed: 0,Job
0,"Engineer, mining"
1,Bookseller
2,"Pharmacist, hospital"
3,Estate manager/land agent
4,Colour technologist
...,...
634,Newspaper journalist
635,"Education officer, community"
636,Horticultural consultant
637,"Lecturer, further education"


### `COUNT`

That is great, but we only want to know how many different jobs there are. Wrapping `DISTINCT Job` in `COUNT` returns that number directly.

In [32]:
query = """
SELECT COUNT( DISTINCT Job) FROM jobs
"""
sql_query(query)

Unnamed: 0,COUNT( DISTINCT Job)
0,639
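
The difference between counting rows and counting distinct values can be sketched as follows. The five-row table is hypothetical, and the alias `n_jobs` is an illustrative name chosen here to give the result column a more readable label than `COUNT( DISTINCT Job)`.

```python
import sqlite3

# Hypothetical table in which some jobs appear more than once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (Name TEXT, Job TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [
        ("Ana", "Bookseller"),
        ("Ben", "Astronomer"),
        ("Cleo", "Bookseller"),
        ("Dan", "Chiropractor"),
        ("Eve", "Astronomer"),
    ],
)

# DISTINCT keeps one row per unique value.
distinct_jobs = [
    job for (job,) in conn.execute("SELECT DISTINCT Job FROM jobs ORDER BY Job")
]

# COUNT(*) counts rows; COUNT(DISTINCT Job) counts unique values.
total_rows = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
cursor = conn.execute("SELECT COUNT(DISTINCT Job) AS n_jobs FROM jobs")
column_name = cursor.description[0][0]
distinct_count = cursor.fetchone()[0]

print(distinct_jobs)                # ['Astronomer', 'Bookseller', 'Chiropractor']
print(total_rows, distinct_count)   # 5 3
print(column_name)                  # n_jobs
```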


### `WHERE`

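Let us find the people who are from a particular country, e.g. Spain. (Standard SQL writes string literals in single quotes, as in 'Spain'; SQLite by default also accepts the double quotes used in this notebook.)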

In [35]:
query = """
SELECT * FROM jobs
WHERE Country = "Spain"
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Patrick,Rogers,Spain,"Scientist, forensic",55,11,38447
1,Thomas,Garcia,Spain,Computer games developer,54,2,51497
2,Bob,Brown,Spain,Multimedia specialist,32,14,95244
3,Mary,Little,Spain,Retail manager,32,8,46793
4,Jennifer,Morgan,Spain,Astronomer,43,1,63334
5,Richard,Hampton,Spain,Operational researcher,30,4,66890
6,Jason,Boone,Spain,Local government officer,39,1,91562
7,Patricia,Thompson,Spain,Clinical research associate,60,6,69307
8,Kathleen,Fritz,Spain,"Psychotherapist, child",40,9,35535
9,Haley,Warren,Spain,Occupational psychologist,26,5,74294


### `AND`
What if we want to find only Spanish citizens who are older than 50 and whose salary is under 50k? We can chain as many `AND` conditions as we like.

In [38]:
query = """
SELECT * FROM jobs
WHERE Country = "Spain" AND Age > 50 AND Salary < 50000
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Patrick,Rogers,Spain,"Scientist, forensic",55,11,38447
1,Dustin,Pitts,Spain,Agricultural consultant,55,5,49160


Imagine we wanted to find both Spanish and French citizens. Initially we might try something like this.

In [44]:
query = """
SELECT * FROM jobs
WHERE Country = "Spain" AND Country = "France"
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary


We can see that the query does not return any results. This is because a column cannot have two different values in the same row.

### `OR`

Let us see if we can find an alternative.

In [46]:
query = """
SELECT * FROM jobs
WHERE Country = "Spain" OR Country = "France"
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Charles,Medina,France,Bookseller,60,2,82257
1,Jodi,Bradford,France,Public affairs consultant,49,15,63459
2,Mary,Boyer,France,Cytogeneticist,54,3,73941
3,Patrick,Rogers,Spain,"Scientist, forensic",55,11,38447
4,Alec,Lynch,France,Sales promotion account executive,32,7,50754
...,...,...,...,...,...,...,...
65,Kristi,Juarez,France,Futures trader,39,3,47727
66,Timothy,Pittman,France,Information officer,39,13,76922
67,Erik,Everett,Spain,"Engineer, agricultural",54,8,70546
68,Angela,Robles,France,Commercial art gallery manager,41,1,70046


### `IN`

However, the code above seems a bit repetitive. Is there something cleaner?

In [47]:
query = """
SELECT * FROM jobs
WHERE Country IN("Spain","France")
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Charles,Medina,France,Bookseller,60,2,82257
1,Jodi,Bradford,France,Public affairs consultant,49,15,63459
2,Mary,Boyer,France,Cytogeneticist,54,3,73941
3,Patrick,Rogers,Spain,"Scientist, forensic",55,11,38447
4,Alec,Lynch,France,Sales promotion account executive,32,7,50754
...,...,...,...,...,...,...,...
65,Kristi,Juarez,France,Futures trader,39,3,47727
66,Timothy,Pittman,France,Information officer,39,13,76922
67,Erik,Everett,Spain,"Engineer, agricultural",54,8,70546
68,Angela,Robles,France,Commercial art gallery manager,41,1,70046
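
The relationship between `AND`, `OR` and `IN` shown in the last three queries can be checked on a small example. This is only a sketch: the four-row `people` table and the `names_where` helper are hypothetical. It also shows `NOT IN`, which selects the rows that `IN` leaves out.

```python
import sqlite3

# Hypothetical four-row table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", "Spain"), ("Ben", "France"), ("Cleo", "Italy"), ("Dan", "Spain")],
)


def names_where(condition):
    """Return the names matching a WHERE condition, sorted alphabetically."""
    rows = conn.execute(
        f"SELECT Name FROM people WHERE {condition} ORDER BY Name"
    )
    return [name for (name,) in rows]


# A single row cannot hold two countries at once, so AND matches nothing.
both = names_where("Country = 'Spain' AND Country = 'France'")

# OR and IN select the same rows; IN is simply more compact.
either_or = names_where("Country = 'Spain' OR Country = 'France'")
either_in = names_where("Country IN ('Spain', 'France')")

# NOT IN returns everything else.
neither = names_where("Country NOT IN ('Spain', 'France')")

print(both)       # []
print(either_in)  # ['Ana', 'Ben', 'Dan']
print(neither)    # ['Cleo']
```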


### `ORDER BY`

`ORDER BY` sorts rows based on a column value, in either ascending or descending order.

- use `ASC` to sort in ascending order
- use `DESC` to sort in descending order
- If you leave it blank, `ORDER BY` uses `ASC` by default.

It is placed towards the end of the query. We want to do any selection and filtering first, before finally sorting.


Let us see who is best paid by sorting by salary in descending order. We can also add a second sort key, age, which decides the order among rows that share the same salary.

In [53]:
query = """
SELECT * FROM jobs
WHERE Country IN("Spain","France")
ORDER BY salary DESC, Age ASC
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Tony,White,Spain,Air traffic controller,28,7,99503
1,Bob,Brown,Spain,Multimedia specialist,32,14,95244
2,Emily,Lee,France,"Engineer, petroleum",40,8,92922
3,Jason,Boone,Spain,Local government officer,39,1,91562
4,John,Crawford,France,Chiropractor,42,9,87304
...,...,...,...,...,...,...,...
65,Suzanne,Anderson,France,Chief Strategy Officer,51,5,32339
66,Christopher,Robinson,France,Geophysicist/field seismologist,34,1,30672
67,Frank,Chavez,France,Applications developer,43,6,30616
68,Deanna,Bond,France,"Engineer, biomedical",41,11,30487
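
In the output above no two salaries tie, so the effect of the second sort key is not visible. The sketch below uses a hypothetical three-row table where two people earn the same salary, and also illustrates the default `ASC` direction and the combination of `ORDER BY` with `LIMIT`.

```python
import sqlite3

# Hypothetical table in which Ana and Ben share the same salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ana", 40, 90000), ("Ben", 30, 90000), ("Cleo", 35, 50000)],
)


def names(query):
    """Run a query that selects Name and return the names in result order."""
    return [name for (name,) in conn.execute(query)]


# Highest salary first; among equal salaries, the youngest comes first.
by_salary_then_age = names(
    "SELECT Name FROM people ORDER BY Salary DESC, Age ASC"
)

# No direction given, so ORDER BY falls back to ASC.
by_age_default = names("SELECT Name FROM people ORDER BY Age")

# Sorting before LIMIT gives a well-defined "top 1".
top_paid = names("SELECT Name FROM people ORDER BY Salary DESC, Age ASC LIMIT 1")

print(by_salary_then_age)  # ['Ben', 'Ana', 'Cleo']
print(by_age_default)      # ['Ben', 'Cleo', 'Ana']
print(top_paid)            # ['Ben']
```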


### `BETWEEN`

The `BETWEEN` operator can be used to match a value against a range of values:
- value BETWEEN low AND high

It is the same as saying:
- value \>= low AND value \<= high

It can be negated with `NOT BETWEEN`:
- value NOT BETWEEN low AND high

which is the same as saying:
- value \< low OR value \> high

`BETWEEN` can also be used with dates. Note that dates need to be written in the ISO 8601 standard format, which is YYYY-MM-DD:
- date BETWEEN "2007-01-01" AND "2007-02-01"

Let us look at an example. We will select people aged between 30 and 33 (inclusive) whose salaries are not between 50k and 70k.

In [56]:
query = """
SELECT * FROM jobs
WHERE Age BETWEEN 30 AND 33
AND Salary NOT BETWEEN 50000 AND 70000
"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Michael,Pierce,Morocco,"Therapist, nutritional",30,6,97697
1,Mary,Foster,United States Minor Outlying Islands,Trading standards officer,30,4,80071
2,Sarah,Ashley,Latvia,Geophysicist/field seismologist,30,9,31180
3,Martin,Pacheco,Holy See (Vatican City State),Tourist information centre manager,32,3,80653
4,Curtis,French,Wallis and Futuna,"Psychotherapist, child",33,1,96881
...,...,...,...,...,...,...,...
794,Patrick,Robinson,Belgium,"Administrator, Civil Service",32,4,34669
795,Ryan,Barker,Lebanon,Graphic designer,30,14,49835
796,Nicholas,Kennedy,Dominican Republic,Materials engineer,30,7,40947
797,Amanda,Shaw,San Marino,Agricultural consultant,32,4,78405
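
The equivalences listed at the start of this section can be verified on a small example. The sketch below assumes a hypothetical `people` table with a `Joined` column that stores dates as ISO 8601 text; neither exists in the lecture's database. The ages sit exactly on and just outside the range boundaries to show that `BETWEEN` is inclusive.

```python
import sqlite3

# Hypothetical table; Joined holds ISO 8601 dates stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER, Joined TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Ana", 29, "2006-12-31"),
        ("Ben", 30, "2007-01-01"),
        ("Cleo", 33, "2007-02-01"),
        ("Dan", 34, "2007-02-02"),
    ],
)


def names_where(condition):
    """Return the names matching a WHERE condition, sorted alphabetically."""
    rows = conn.execute(
        f"SELECT Name FROM people WHERE {condition} ORDER BY Name"
    )
    return [name for (name,) in rows]


# BETWEEN includes both endpoints, exactly like >= AND <=.
between = names_where("Age BETWEEN 30 AND 33")
explicit = names_where("Age >= 30 AND Age <= 33")

# NOT BETWEEN selects everything outside the range.
not_between = names_where("Age NOT BETWEEN 30 AND 33")

# ISO 8601 text sorts chronologically, so BETWEEN also works on these dates.
by_date = names_where("Joined BETWEEN '2007-01-01' AND '2007-02-01'")

print(between)      # ['Ben', 'Cleo']
print(not_between)  # ['Ana', 'Dan']
print(by_date)      # ['Ben', 'Cleo']
```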


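### `LIKE` and `ILIKE`

The `LIKE` and `ILIKE` operators allow us to perform pattern matching against string data with the use of wildcard characters. In PostgreSQL, `LIKE` is case-sensitive and `ILIKE` is case-insensitive. SQLite, which this notebook uses, has no `ILIKE`, and its `LIKE` is case-insensitive for ASCII characters by default.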

- Percent %
    - Matches any sequence of characters
- Underscore _
    - Matches any single character

Examples with %:
- All names that begin with an 'A'
    - `WHERE` name `LIKE` 'A%'
- All names that end with an 'a'
    - `WHERE` name `LIKE` '%a'

Examples with _:
- The underscore stands in for exactly one character.
    - Get all names made of 'Char' followed by exactly one more character
    - `WHERE` name `LIKE` 'Char_'

- You can use multiple underscores
- Imagine we had version string codes in the format 'Version#A4', 'Version#B7', etc.
    - `WHERE` value `LIKE` 'Version#__'
- We can also combine pattern matching operators to create more complex patterns
    - `WHERE` name `LIKE` '_her%'
        - `C`her`yl`
        - `T`her`esa`
        - `S`her`ri`


In [59]:
query = """
SELECT * FROM jobs
WHERE Name LIKE "_her%"

"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Cheryl,Arnold,Pakistan,Colour technologist,27,2,90405
1,Sherri,Garrison,Cayman Islands,Planning and development surveyor,46,11,66866
2,Cheryl,Richardson,Gambia,"Education officer, environmental",25,13,54225
3,Theresa,Sherman,Zimbabwe,Training and development officer,51,2,46474
4,Cheryl,Peterson,Vietnam,Paramedic,46,10,47632
...,...,...,...,...,...,...,...
58,Sherry,Smith,Paraguay,Hospital doctor,45,6,37900
59,Cheryl,Cooley,Netherlands,Comptroller,39,10,92207
60,Sherri,Herrera,Gabon,Dealer,35,2,57239
61,Sherri,James,Cote d'Ivoire,IT sales professional,47,5,62233
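
The wildcard rules can be tried out on a handful of names. The sketch below uses a hypothetical `people` table and a hypothetical `names_like` helper. It also checks two SQLite-specific behaviours: `LIKE` ignores ASCII case by default, and `ILIKE` is not part of SQLite's syntax.

```python
import sqlite3

# Hypothetical list of names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Ana",), ("Cheryl",), ("Christopher",), ("Herbert",),
     ("Katherine",), ("Sherri",), ("Theresa",)],
)


def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT Name FROM people WHERE Name LIKE ? ORDER BY Name", (pattern,)
    )
    return [name for (name,) in rows]


# '_' is exactly one character, '%' is any sequence (possibly empty).
one_then_her = names_like("_her%")  # ['Cheryl', 'Sherri', 'Theresa']

# 'her' anywhere: 'Christopher' ends in it, 'Herbert' starts with it.
contains_her = names_like("%her%")

# SQLite's LIKE ignores ASCII case by default, so 'a%' still finds 'Ana'.
lower_a = names_like("a%")          # ['Ana']

# ILIKE is PostgreSQL syntax; SQLite rejects it as a syntax error.
try:
    conn.execute("SELECT Name FROM people WHERE Name ILIKE 'a%'")
    ilike_supported = True
except sqlite3.OperationalError:
    ilike_supported = False

print(contains_her)
# ['Cheryl', 'Christopher', 'Herbert', 'Katherine', 'Sherri', 'Theresa']
print(ilike_supported)  # False
```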


#### Notice that `%` can also match nothing, so a name may end right after "her" (e.g. Christopher)

In [60]:
query = """
SELECT * FROM jobs
WHERE Name LIKE "%her%"

"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Cheryl,Arnold,Pakistan,Colour technologist,27,2,90405
1,Katherine,Rodriguez,Netherlands Antilles,Contractor,42,5,88961
2,Sherri,Garrison,Cayman Islands,Planning and development surveyor,46,11,66866
3,Katherine,Gonzalez,Burkina Faso,Furniture conservator/restorer,29,2,93964
4,Katherine,White,Nigeria,Production manager,33,6,76129
...,...,...,...,...,...,...,...
373,Christopher,Turner,Gambia,Television/film/video producer,46,2,82636
374,Christopher,Roberts,Isle of Man,"Psychotherapist, child",39,8,76534
375,Christopher,Garcia,Armenia,Meteorologist,48,10,41000
376,Christopher,Dennis,Egypt,Graphic designer,31,5,39089


#### We can add a `NOT` in front to invert the match

In [61]:
query = """
SELECT * FROM jobs
WHERE Name NOT LIKE "%her%"

"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Jennifer,Gomez,Benin,"Engineer, mining",27,9,79156
1,Julia,Garrett,Italy,Bookseller,58,1,67262
2,Courtney,Freeman,Niue,"Pharmacist, hospital",44,15,44105
3,Sheena,Faulkner,Peru,Estate manager/land agent,56,6,71914
4,Stephen,Gonzalez,Gambia,Merchant navy officer,26,4,50366
...,...,...,...,...,...,...,...
9617,Makayla,Miller,Kuwait,Stage manager,26,3,41031
9618,William,Underwood,Mali,Training and development officer,31,3,77919
9619,Scott,Lewis,Montenegro,International aid/development worker,40,3,84261
9620,Jacqueline,Diaz,Venezuela,Contracting civil engineer,50,10,94421


In [69]:
query = """
SELECT * FROM jobs
WHERE Name LIKE "B%" AND Surname NOT LIKE "M%"

"""
sql_query(query)

Unnamed: 0,Name,Surname,Country,Job,Age,Experience,Salary
0,Barbara,Hodges,Seychelles,TEFL teacher,57,8,34038
1,Brittney,Burnett,Luxembourg,Illustrator,38,9,55960
2,Brent,Hardy,Tuvalu,Consulting civil engineer,29,15,60984
3,Bradley,Page,Guyana,Optometrist,49,4,89551
4,Benjamin,Kane,Central African Republic,"Nurse, mental health",30,10,62991
...,...,...,...,...,...,...,...
446,Brandon,Harris,Kyrgyz Republic,Public relations account executive,53,9,68037
447,Brandon,Anderson,Kazakhstan,"Scientist, research (medical)",60,6,92857
448,Brian,Jones,Brazil,Office manager,47,4,97255
449,Bruce,Evans,New Caledonia,Museum education officer,39,3,90157
