# SQL querying and selecting data (exercises)

## Preparation

For this section you need the `chinook.db` database file and a working `%sql` magic.  
If you don't have them, please go back to the [previous section](connect_to_database.ipynb) and follow the instructions.  
The following code should not produce any errors:

In [5]:
%load_ext sql
%sql sqlite:///chinook.db
%sql SELECT * FROM tracks LIMIT 5

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice
1,For Those About To Rock (We Salute You),1,1,1,"Angus Young, Malcolm Young, Brian Johnson",343719,11170334,0.99
2,Balls to the Wall,2,2,1,,342562,5510424,0.99
3,Fast As a Shark,3,2,1,"F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman",230619,3990994,0.99
4,Restless and Wild,3,2,1,"F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman",252051,4331779,0.99
5,Princess of the Dawn,3,2,1,Deaffy & R.A. Smith-Diesel,375418,6290521,0.99


In [19]:
%%sql
SELECT 
    name
FROM 
    sqlite_master 
WHERE 
    type ='table' AND 
    name NOT LIKE 'sqlite_%';

name
albums
artists
customers
employees
genres
invoices
invoice_items
media_types
playlists
playlist_track
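
The same catalog query can be tried outside the notebook with Python's built-in `sqlite3` module. This is only a minimal sketch: it assumes a throwaway in-memory database with two made-up tables instead of `chinook.db`, and the variable names are illustrative.

```python
import sqlite3

# Hypothetical in-memory database standing in for chinook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT)")

# sqlite_master has one row per schema object (tables, indexes, views, triggers);
# the filter keeps user tables and skips SQLite's internal sqlite_* tables.
rows = conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
    "ORDER BY name"
).fetchall()

# Each row is a 1-tuple, so unpack the single column.
table_names = [name for (name,) in rows]
print(table_names)
conn.close()
```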


In [18]:
# sys.tables is a SQL Server catalog view and does not exist in SQLite, so this query fails:
# %sql SELECT name FROM sys.tables

RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument


Original error message from DB driver:
(sqlite3.OperationalError) no such table: sys.tables
[SQL: SELECT name FROM sys.tables]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

If you need help solving this issue, send us a message: https://ploomber.io/community


## Exercise: biggest tracks

Select the 10 largest `tracks` by size, as given in the `Bytes` column.

In [9]:
%%sql
SELECT *
  FROM tracks
  ORDER BY Bytes DESC
  LIMIT 10
  

TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice
3224,Through a Looking Glass,229,3,21,,5088838,1059546140,1.99
2820,Occupation / Precipice,227,3,19,,5286953,1054423946,1.99
3236,The Young Lords,253,3,20,,2863571,587051735,1.99
3242,The Man With Nine Lives,253,3,20,,2956998,577829804,1.99
2910,Dave,231,3,19,,2825166,574325829,1.99
3235,The Magnificent Warriors,253,3,20,,2924716,570152232,1.99
3231,The Lost Warrior,253,3,20,,2920045,558872190,1.99
2902,Maternity Leave,231,3,21,,2780416,555244214,1.99
3228,"Battlestar Galactica, Pt. 3",253,3,20,,2927802,554509033,1.99
2832,The Woman King,227,3,18,,2626376,552893447,1.99
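
A "top N" query is the combination of a descending sort and a row limit. The sketch below illustrates this on a tiny in-memory table; the table contents are made up and only the column names mirror `tracks`.

```python
import sqlite3

# Hypothetical miniature of the tracks table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT, Bytes INTEGER)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("a", 300), ("b", 100), ("c", 500), ("d", 200)],
)

# ORDER BY ... DESC puts the largest values first; LIMIT then keeps the first N rows.
top2 = conn.execute(
    "SELECT Name, Bytes FROM tracks ORDER BY Bytes DESC LIMIT 2"
).fetchall()
print(top2)
conn.close()
```

Without the `ORDER BY` clause, `LIMIT` would still return two rows, but which two is not defined by the query.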


## Exercise: simple filtering

Write a statement to get the `tracks` with `AlbumId` equal to `1` and a length, in the `Milliseconds` column, greater than 200,000 milliseconds.

In [24]:
%%sql
SELECT *
  FROM tracks
  WHERE Milliseconds > 200000 AND AlbumId = 1
  LIMIT 5

TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice
1,For Those About To Rock (We Salute You),1,1,1,"Angus Young, Malcolm Young, Brian Johnson",343719,11170334,0.99
6,Put The Finger On You,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",205662,6713451,0.99
7,Let's Get It Up,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",233926,7636561,0.99
8,Inject The Venom,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",210834,6852860,0.99
9,Snowballed,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",203102,6599424,0.99


## Exercise: filter with `IN`

Return the `customers` whose `State` is `FL` (Florida), `WA` (Washington), or `CA` (California).  
Use `IN` rather than several comparisons joined with `OR`.

In [22]:
%%sql 
SELECT * 
FROM customers 
WHERE State IN ('FL', 'WA', 'CA')


CustomerId,FirstName,LastName,Company,Address,City,State,Country,PostalCode,Phone,Fax,Email,SupportRepId
16,Frank,Harris,Google Inc.,1600 Amphitheatre Parkway,Mountain View,CA,USA,94043-1351,+1 (650) 253-0000,+1 (650) 253-0000,fharris@google.com,4
17,Jack,Smith,Microsoft Corporation,1 Microsoft Way,Redmond,WA,USA,98052-8300,+1 (425) 882-8080,+1 (425) 882-8081,jacksmith@microsoft.com,5
19,Tim,Goyer,Apple Inc.,1 Infinite Loop,Cupertino,CA,USA,95014,+1 (408) 996-1010,+1 (408) 996-1011,tgoyer@apple.com,3
20,Dan,Miller,,541 Del Medio Avenue,Mountain View,CA,USA,94040-111,+1 (650) 644-3358,,dmiller@comcast.com,4
22,Heather,Leacock,,120 S Orange Ave,Orlando,FL,USA,32801,+1 (407) 999-7788,,hleacock@gmail.com,4
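
`IN` is shorthand for a chain of equality tests joined with `OR`. The following sketch checks that equivalence on a small in-memory table; the rows are invented for illustration and only the column names follow `customers`.

```python
import sqlite3

# Hypothetical miniature of the customers table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (FirstName TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "CA"), ("Bob", "NY"), ("Cyd", "FL"), ("Dee", None), ("Eve", "WA")],
)

# Membership test with IN.
with_in = conn.execute(
    "SELECT FirstName FROM customers "
    "WHERE State IN ('FL', 'WA', 'CA') ORDER BY FirstName"
).fetchall()

# The same filter spelled out with OR.
with_or = conn.execute(
    "SELECT FirstName FROM customers "
    "WHERE State = 'FL' OR State = 'WA' OR State = 'CA' ORDER BY FirstName"
).fetchall()

print(with_in)
print(with_in == with_or)
conn.close()
```

Rows whose `State` is `NULL` are returned by neither form.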


## Exercise: filter for numbers in range

Find `invoices` whose `Total` is between 14.96 and 18.86. Use `BETWEEN`.  
Sort the output by increasing `Total`. Show only these columns: `InvoiceId`, `BillingAddress`, `Total`.

In [31]:
%%sql
SELECT InvoiceId, BillingAddress, Total
FROM invoices
WHERE Total BETWEEN 14.96 AND 18.86
ORDER BY Total, InvoiceId

InvoiceId,BillingAddress,Total
103,162 E Superior Street,15.86
208,Ullevålsveien 14,15.86
306,Klanova 9/506,16.86
313,"68, Rue Jouvence",16.86
88,"Calle Lira, 198",17.91
89,"Rotenturmstraße 4, 1010 Innere Stadt",18.86
201,319 N. Frances Street,18.86

## Exercise: filter partially matching words

Find the `tracks` whose `Name` contains the following substring: `Br` (two letters), then any one letter, then `wn` (two letters).

In [36]:

%%sql
SELECT *
FROM tracks
WHERE Name LIKE '%Br_wn%'
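
The two `LIKE` wildcards behave differently, which matters for this exercise: `_` stands for exactly one character, while `%` stands for any run of characters, including none. Here is a small sketch with invented track names (not taken from `chinook.db`) that compares the two patterns.

```python
import sqlite3

# Hypothetical track names chosen to show the difference between _ and %.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?)",
    [("Brown Sugar",), ("Brawn",), ("Brwn",), ("Broken Town",), ("my brown dog",)],
)

# Exactly one character between "Br" and "wn".
one_char = {
    name for (name,) in conn.execute("SELECT Name FROM tracks WHERE Name LIKE '%Br_wn%'")
}

# Any number of characters (including zero) between "Br" and "wn".
any_run = {
    name for (name,) in conn.execute("SELECT Name FROM tracks WHERE Name LIKE '%Br%wn%'")
}

print(sorted(one_char))
print(sorted(any_run))
conn.close()
```

SQLite's `LIKE` ignores case for ASCII letters by default, which is why the lowercase `brown` also matches `Br_wn`.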


## Exercise: filtering missing values

Find the `customers` who do not have phone numbers. In the result show only the name and the (missing) phone number.

In [38]:
%%sql 
SELECT FirstName,LastName,Phone
FROM customers 
WHERE Phone IS NULL

FirstName,LastName,Phone
Ladislav,Kovács,


## Exercise: from the database to a Python list

Create a Python variable `bs` holding a list of all `tracks` sizes from the `Bytes` column.  
Print the `type` of the `bs` variable. Print the first 10 elements of `bs`.

In [56]:
res = %sql SELECT Bytes FROM tracks
# Each result row is a 1-tuple such as (11170334,), so take its single element.
bs = [row[0] for row in res]
print(type(bs))
print(bs[:10])

<class 'list'>
[11170334, 5510424, 3990994, 4331779, 6290521, 6713451, 7636561, 6852860, 6599424, 8611245]
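
Query results arrive as one tuple per row, even when only a single column is selected, so a flat list requires an unpacking step. As an alternative sketch using only the standard `sqlite3` module (not the `%sql` magic), a `row_factory` can do the unpacking at fetch time; the table contents here are made up.

```python
import sqlite3

# Hypothetical miniature of the tracks table (made-up sizes).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (TrackId INTEGER, Bytes INTEGER)")
conn.executemany("INSERT INTO tracks VALUES (?, ?)", [(1, 111), (2, 222), (3, 333)])

# Default behaviour: every row is a tuple.
raw_rows = conn.execute("SELECT Bytes FROM tracks ORDER BY TrackId").fetchall()

# A row factory receives (cursor, row) and returns whatever should represent the row;
# here it returns just the first column.
conn.row_factory = lambda cursor, row: row[0]
bs = conn.execute("SELECT Bytes FROM tracks ORDER BY TrackId").fetchall()

print(raw_rows)
print(bs)
conn.close()
```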

## Exercise: from the database to a Pandas data frame

Create a Python variable `df` to be a Pandas `DataFrame` with two columns corresponding to `Milliseconds` and `Bytes` columns of the `tracks` table. Print `df`.  
You will likely need to:
- Import the `pandas` package.
- Use the `read_sql` function from `pandas`.
- Create a separate connection `engine` with `create_engine` from `sqlalchemy`.

In [59]:
# %sql SELECT Milliseconds, Bytes FROM tracks

In [63]:
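import pandas as pd
import sqlalchemy as sa

# A separate SQLAlchemy engine, independent of the connection used by the %sql magic.
engine = sa.create_engine("sqlite:///chinook.db")
# The context manager closes the connection once the query has been read.
with engine.connect() as conn:
    df = pd.read_sql(sa.text("SELECT Milliseconds, Bytes FROM tracks"), conn)
df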

Unnamed: 0,Milliseconds,Bytes
0,343719,11170334
1,342562,5510424
2,230619,3990994
3,252051,4331779
4,375418,6290521
...,...,...
3498,286741,4718950
3499,139200,2283131
3500,66639,1189062
3501,221331,3665114
