<a href="https://colab.research.google.com/github/CompPsychology/psych290_colab_public/blob/main/notebooks/week-01/W1_Tutorial_01A_SQL_Intro_(album).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# W1 Tutorial 1A -- Introduction to SQL (DB: album) (2025-03)

(c) Johannes Eichstaedt & the World Well-Being Project, 2023.

✋🏻✋🏻 NOTE - You need to create a copy of this notebook before you work through it. Click the "Save a copy in Drive" option in the File menu, and save it to your Google Drive.

Welcome to our first tutorial. Here we will learn the basics of SQL, the Structured Query Language, which is used to read, manipulate, summarize, and write data in databases. Below is a table of contents.

**FYI:** you can execute a cell by hitting `CTRL+Enter` (Win) or `Command+Enter` (Mac).   
`Shift+Enter` will execute + advance to the cell below.

Please execute every cell as you go along.

**FYI:**
* 🤓🤓🤓 comparisons with the tidyverse are flagged with the triple nerd  
* 🐬🐬🐬 when there is code that runs in MySQL but not in SQLite, this is marked with the triple dolphin

## 1) Setting up Colab with DLATK and SQLite

This tutorial begins by setting up DLATK in the Colab environment. The next couple of subsections do this for you.

You don't need to understand or follow along with the code -- it uses git (as in GitHub) to pull the code from the cloud and install it (with `pip`).

This will take ~1.5 to 2 minutes. If Colab asks you about this not being authored by Google, say "Run anyway."

### 1a) Install packages

In [None]:
#We first install the necessary packages and then download the dataset.

#This cell does it for you.

# installing DLATK and necessary packages
!git clone -b psych290 https://github.com/dlatk/dlatk.git
!pip install dlatk/
!pip install jupysql

!git clone https://github.com/CompPsychology/album.git

Cloning into 'dlatk'...
remote: Enumerating objects: 6975, done.
remote: Counting objects: 100% (1063/1063), done.
remote: Compressing objects: 100% (138/138), done.
remote: Total 6975 (delta 987), reused 930 (delta 925), pack-reused 5912 (from 1)
Receiving objects: 100% (6975/6975), 62.36 MiB | 18.09 MiB/s, done.
Resolving deltas: 100% (4940/4940), done.
Processing ./dlatk
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: dlatk
  Building wheel for dlatk (setup.py) ... done
  Created wheel for dlatk: filename=dlatk-1.3.1-py3-none-any.whl size=35635829 sha256=a2e740f9c363d6361f33b873888db9ba0d632e1c38be621c90f398a1ff30a314
  Stored in directory: /tmp/pip-ephem-wheel-cache-a8j7whex/wheels/cc/c9/65/e1ecc64bac68518c07b286fe86921aa938e11a0c3a87d8ff93
Successfully built dlatk
Installing collected packages: dlatk
Successfully installed dlatk-1.3.1
Collecting jupysql
  Downloading jupysql-0.11.1-py3-none-any.whl.metadata (5.9 

### 1b) Download data and insert into SQLite database

In [None]:
# this downloads the album CSVs we need for this tutorial
!git clone https://github.com/CompPsychology/album.git

fatal: destination path 'album' already exists and is not an empty directory.


Now that you have set up Colab, create a `username` variable which we use to name the database for the rest of the tutorial.

In [None]:
username = "your_name"

If you want to read out any Python, you can just enter it in a code cell -- for example:

In [None]:
username

'your_name'

We then load the downloaded data into a database named [username].db in the sqlite_data folder.

In [None]:
# load the required package -- similar to library() function in R
import os
from dlatk.tools.importmethods import csvToSQLite

# store the complete path to the database -- sqlite_data/[username].db
database = os.path.join("sqlite_data", username)

# import CSVs into tables in this database
csvToSQLite(
    "album/data/album.csv",
    database,
    "album"
)

csvToSQLite(
    "album/data/track.csv",
    database,
    "track"
)

Importing data, reading album/data/album.csv file
Reading remaining 7 rows into the table...
Importing data, reading album/data/track.csv file
Reading remaining 63 rows into the table...


SQL Query: CREATE TABLE album (id INT, title VARCHAR(31), artist VARCHAR(63), label VARCHAR(15), released VARCHAR(15));
SQL Query: CREATE TABLE track (id INT, album_id INT, title VARCHAR(63), track_number INT, duration INT);


### 1c) Setup database connection

Finally, we establish a connection with the (SQLite) database using the `%sql` extension, which comes from the `jupysql` package installed above.

In [None]:
# loads the %%sql extension
%load_ext sql

# connects the extension to the database
from sqlalchemy import create_engine
engine = create_engine(f"sqlite:///sqlite_data/{username}.db?charset=utf8mb4")
%sql engine

#set the output limit to 50
%config SqlMagic.displaylimit = 50

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## 2) Introduction to SQL <a class="anchor" id="intro"></a>

**CRUD** refers to the four necessary functions to implement a storage application: **create, read, update and delete**.

In fact, **Read** in **C.R.U.D** is the most common set of operations performed on a database in general. That's what we'll start with.

Let's learn how to read out meta-data about our tables and such.

### 2a) Listing tables (MySQL: `SHOW` statement)

Once we are set up in a database, our next step is usually to list the tables inside. The 🐬🐬🐬 **SHOW** statement does that for us in MySQL.

```
🐬🐬🐬
SHOW tables
🐬🐬🐬
```

Since we are using SQLite, we can do the same thing with the `tables` command. Let's list the tables in the `username` database that we are connected to.

**NOTE:** `%sqlcmd` is a magic command from the `jupysql` extension that allows us to run such meta-data commands.

In [None]:
%sqlcmd tables

Name
album
track


For slightly neater output:

In [None]:
result = %sqlcmd tables
# the print function in Python will make the output prettier
print(result)

+-------+
|  Name |
+-------+
| album |
| track |
+-------+
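
Outside of `%sqlcmd`, the same listing can be obtained in plain SQLite by querying its built-in `sqlite_master` table. The sketch below uses Python's standard `sqlite3` module and a throwaway in-memory database with two simplified stand-in tables, so it illustrates the idea under those assumptions rather than reproducing the tutorial's setup.

```python
import sqlite3

# Throwaway in-memory database (assumption: stand-in for sqlite_data/[username].db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (id INT, title VARCHAR(31))")
conn.execute("CREATE TABLE track (id INT, album_id INT)")

# sqlite_master holds one row per table, index, view, etc.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```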


### 2b) FYI only: showing and selecting databases

In this notebook, we use SQLite to connect to a specified database -- as we do at the very top of the notebook, in this command:

```
engine = create_engine(f"sqlite:///sqlite_data/{username}.db?charset=utf8mb4")
```

However, in other flavours of SQL, say MySQL, you can specify one of potentially many databases on the fly.

The **USE** statement specifies the database you intend to use.

```
🐬🐬🐬
USE [database]
🐬🐬🐬
```

In MySQL, we can use **SHOW** to show all databases we have access to.

```
🐬🐬🐬
SHOW databases
🐬🐬🐬
```

### 2c) Reading table meta-data

The goal is to see all the columns in a table, and their datatypes. In MySQL, the 🐬🐬🐬 **DESCRIBE** statement does this:

 ```
🐬🐬🐬
DESCRIBE [table]
🐬🐬🐬
```


SQLite requires us to use a workaround. The command is called

```
PRAGMA table_info([TABLE_NAME])
```

`PRAGMA` is a SQLite statement that allows us to run meta commands like `table_info()`.

**NOTE**
* SQL keywords in SQLite and MySQL are case insensitive. However, we write them in uppercase throughout this tutorial to make a habit of it: case insensitivity is not consistent across flavours of SQL (e.g., MySQL vs. Microsoft's T-SQL), and uppercase keywords also make the code easier to read.

Let's run the PRAGMA command within a `%%sql` block:

In [None]:
%%sql

PRAGMA table_info(album);

cid,name,type,notnull,dflt_value,pk
0,id,INT,0,,0
1,title,VARCHAR(31),0,,0
2,artist,VARCHAR(63),0,,0
3,label,VARCHAR(15),0,,0
4,released,VARCHAR(15),0,,0
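
The same metadata can also be read from Python code. This is a minimal sketch using the standard `sqlite3` module and an in-memory table that copies the `CREATE TABLE album` statement printed during the import above; the in-memory database is an assumption made so the example runs on its own.

```python
import sqlite3

# In-memory stand-in with the same schema as the album table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE album (id INT, title VARCHAR(31), artist VARCHAR(63), "
    "label VARCHAR(15), released VARCHAR(15))"
)

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(album)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(cid, name, col_type)

# Handy lookup: column name -> declared type.
column_types = {name: col_type for _, name, col_type, *_ in columns}
```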


#### 👩‍🔬💻 Exercise

In the field below, can you show the columns from the track table?

In [None]:
%%sql



cid,name,type,notnull,dflt_value,pk
0,id,INT,0,,0
1,album_id,INT,0,,0
2,title,VARCHAR(63),0,,0
3,track_number,INT,0,,0
4,duration,INT,0,,0


#### The tables for this tutorial: `album` and `track`

Ok, so we will work with two main tables in the rest of this tutorial -- an `album` table that describes albums, and a `track` table that shows the tracks in those albums.

As shown in the results of the above queries, `album` and `track` are the two tables within the `album` database. These tables are connected through a primary key - foreign key relationship, i.e., both tables have `id` as their "primary key" (these ids are different), and `album_id` is the "foreign key" in `track` (i.e., it corresponds to `id` in the `album` table).

This is database jargon you don't totally have to know.

Basically, what we are saying is that `track.album_id` = `album.id`. If you wanted to merge the tables, this would be how.
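
As a rough illustration of that link, the sketch below matches each track to its album via `track.album_id = album.id`. It uses Python's standard `sqlite3` module, an in-memory database, and only a handful of rows and columns copied from the album data, so treat it as a simplified stand-in rather than the full tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (id INT, title VARCHAR(31))")
conn.execute("CREATE TABLE track (id INT, album_id INT, title VARCHAR(63))")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(17, "Apostrophe"), (18, "Kind of Blue")],
)
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?)",
    [(60, 17, "Cosmik Debris"), (65, 18, "So What"), (68, 18, "All Blues")],
)

# Pair every track with the album whose id equals the track's album_id.
rows = conn.execute(
    "SELECT album.title, track.title "
    "FROM album, track "
    "WHERE track.album_id = album.id "
    "ORDER BY track.id"
).fetchall()
for album_title, track_title in rows:
    print(album_title, "|", track_title)
```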

But now is a good moment to talk about the **type** column. This column encodes the data type of each column.

## 3) Data Types

🤓🤓🤓 R equivalent: character, numeric, date, etc

MySQL supports numerous data types. However, we will discuss only the ones that we need for this course. You can always explore the additional data types supported by MySQL at https://dev.mysql.com/doc/refman/8.0/en/data-types.html.

### 3a) Numeric Data Types: INT and DOUBLE <a class="anchor" id="ndt"></a>

MySQL supports integer data types like **INTEGER** (or **INT**) that occupies **4** bytes, and floating point data type **DOUBLE** that stores real numbers in **8** bytes.

Integers are just whole numbers like 0, 1, 2, ...; doubles are real numbers stored to some precision (e.g., 10256.4586576).

Basically, if you just want to save whole numbers, you use **INT**; if you want to record something like 3.14, use **DOUBLE**.

### 3b) String Data Types: VARCHAR and TEXT

**VARCHAR** is the most used string data type in SQL, declared as VARCHAR(n) where n is the maximum number of characters you want to store.

**TEXT** -- SQLite uses the **TEXT** type, which can hold strings from 1 byte to even GBs. With **TEXT**, we need not specify a maximum length, and it also offers additional features when compared to 🐬🐬🐬 **VARCHAR**.

In 🐬🐬🐬 MySQL, **VARCHAR**s can have normal indices on them, whereas **TEXT** fields can only have **FULLTEXT** indices or indices on a prefix of the text.

This makes a difference when you have big datasets. Basically, if at all possible, you want your text to be stored in **VARCHAR** fields of a certain maximum length, if you know what that is. But don't fret about it.  

**Note:** SQLite supports all the above datatypes. You can read more about them at [https://www.sqlite.org/datatype3.html](https://www.sqlite.org/datatype3.html).
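
A small sketch of how SQLite treats these declared types, using Python's standard `sqlite3` module and a hypothetical `demo` table (it is not part of the album data). SQLite maps declared types to a few broad storage classes and does not enforce the `n` in `VARCHAR(n)`, which the example makes visible with `typeof()` and `length()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, made up for this illustration only.
conn.execute("CREATE TABLE demo (n INT, x DOUBLE, code VARCHAR(5), note TEXT)")
conn.execute(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    (42, 3.14, "longer than five", "any length"),
)

# typeof() reports the storage class SQLite actually used for each value;
# length() shows that the 16-character string was stored in full.
row = conn.execute(
    "SELECT typeof(n), typeof(x), typeof(code), length(code) FROM demo"
).fetchone()
print(row)
```

If you rely on a maximum length, it is therefore worth checking it yourself when working in SQLite.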

Let's look at what is contained IN the columns, not just AT the column meta data.


## 4) SELECT Statement <a class="anchor" id="select"></a>

**SELECT** forms the backbone of the commands that we will execute in the course. **SELECT** is used to access particular columns, and then you can add all sorts of extras to filter and process the output further.

80%+ of all SQL commands start with SELECT.

The basic structure of **SELECT** statement is as shown below.

> **SELECT** **[** COLUMN1 **[** **AS** C1**]** **]**, **[** COLUMN2 **[** **AS** C2**]** **]**, ... <br/>
> **FROM** **[** TABLE1 **[** **AS** T1**]** **]**, **[** TABLE2 **[** **AS** T2**]** **]**, ... <br/>
> **WHERE** **[** CONDITION1 **]** **[** AND **|** OR **]** **[** CONDITION2 **]** ... <br/>

We can retrieve certain columns of a table by simply listing the required column names after the **SELECT** keyword. Let's try listing all the labels that produce albums from the *album* table.

**NOTE:** The query below can be written in a single line. However, we separate the logical units of the query by dividing it into multiple lines so it's more readable.


In [None]:
%%sql

SELECT label
FROM album;

label
Blue Note
Polydor
Parlophone
Columbia
Columbia
DiscReet
Columbia


🤓🤓🤓 in the tidyverse, this corresponds to

`dataframe %>% select(column_name)`

here:

`album %>% select(label)`

Finally, to retrieve all columns of a table we need not list all their names. We can simply use the asterisk **\*** as shown below. If you are familiar with wildcards in file names (e.g., `*.csv`), `*` is used to match everything. The same idea applies here.

👉 remember the asterisk (`*`) in SQL, it means "all." It's a close buddy of **SELECT**. They hang out a lot.

In [None]:
%%sql

SELECT *
FROM album;

id,title,artist,label,released
1,Two Men with the Blues,Willie Nelson and Wynton Marsalis,Blue Note,2008-07-08
11,Hendrix in the West,Jimi Hendrix,Polydor,1972-01-00
12,Rubber Soul,The Beatles,Parlophone,1965-12-03
13,Birds of Fire,Mahavishnu Orchestra,Columbia,1973-03-00
16,Live And,Johnny Winter,Columbia,1971-05-00
17,Apostrophe,Frank Zappa,DiscReet,1974-04-22
18,Kind of Blue,Miles Davis,Columbia,1959-08-17
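
The **WHERE** and **AS** parts of the basic structure can be sketched as follows. The example uses Python's standard `sqlite3` module with an in-memory copy of the seven album rows printed above; the particular filter (Columbia albums released before 1972) and the alias `album_title` are just illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE album (id INT, title VARCHAR(31), artist VARCHAR(63), "
    "label VARCHAR(15), released VARCHAR(15))"
)
# The seven rows shown in the SELECT * output above.
conn.executemany(
    "INSERT INTO album VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Two Men with the Blues", "Willie Nelson and Wynton Marsalis", "Blue Note", "2008-07-08"),
        (11, "Hendrix in the West", "Jimi Hendrix", "Polydor", "1972-01-00"),
        (12, "Rubber Soul", "The Beatles", "Parlophone", "1965-12-03"),
        (13, "Birds of Fire", "Mahavishnu Orchestra", "Columbia", "1973-03-00"),
        (16, "Live And", "Johnny Winter", "Columbia", "1971-05-00"),
        (17, "Apostrophe", "Frank Zappa", "DiscReet", "1974-04-22"),
        (18, "Kind of Blue", "Miles Davis", "Columbia", "1959-08-17"),
    ],
)

# WHERE keeps only the rows for which both conditions hold (AND).
# `released` is stored as text, so '<' compares strings; that works here
# because the dates are written year-first.
rows = conn.execute(
    "SELECT title AS album_title, artist "
    "FROM album "
    "WHERE label = 'Columbia' AND released < '1972'"
).fetchall()
print(rows)
```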


#### 👩‍🔬💻 Exercise

Please select only the titles from the table.

In [None]:
%%sql



title
Two Men with the Blues
Hendrix in the West
Rubber Soul
Birds of Fire
Live And
Apostrophe
Kind of Blue


### ORDER BY <a class="anchor" id="oby"></a>

By default, the results of a **SELECT** come back in whatever order they happen to have been written to the table. The **ORDER BY** clause is used to order the results based on the values of a column.

We can sort the results in increasing order of a column's values using the **ASC** qualifier, or in decreasing order using **DESC**. Below is an example which displays the tracks in the `track` table in decreasing order of their duration.

This works for numbers, characters (A-Z), dates, etc.

In [None]:
%%sql

SELECT *
FROM track
ORDER BY duration DESC;

id,album_id,title,track_number,duration
21,11,Red House,8,786
51,16,It's My Own Fault,2,734
68,18,All Blues,4,696
43,13,One Word,7,597
66,18,Freddy Freeloader,2,589
69,18,Flamenco Sketches,5,566
65,18,So What,1,565
70,11,Fake Track,9,549
54,16,Mean Town Blues,5,539
17,11,Voodoo Chile,4,469


#### 👩‍🔬💻 Exercise

Can you output the tracks sorted by ascending track number?

In [None]:
%%sql



id,album_id,title,track_number,duration
1,1,Bright Lights Big City,1,320
14,11,Johnny B. Goode,1,285
22,12,Drive My Car,1,150
37,13,Birds of Fire,1,350
50,16,Good Morning Little Schoolgirl,1,285
56,17,Don't Eat the Yellow Snow,1,127
65,18,So What,1,565
2,1,Night Life,2,344
15,11,Lover Man,2,185
23,12,Norwegian Wood (This Bird Has Flown),2,125


🤓🤓🤓 in the tidyverse, order corresponds to arrange(desc()) e.g.

`dataframe %>% arrange(desc(column_name))`

🤯🤯 If we want to order by TWO columns, we can just list them in order. That's what we want here -- first album_id, then by track_number. Nice and tidy, very German!

🤓🤓🤓 in the tidyverse, this would be

`track %>% arrange(album_id, track_number)`

In [None]:
%%sql

SELECT *
FROM track
ORDER BY album_id ASC, track_number ASC;

id,album_id,title,track_number,duration
1,1,Bright Lights Big City,1,320
2,1,Night Life,2,344
4,1,Caldonia,3,205
5,1,Stardust,4,308
3,1,Basin Street Blues,5,296
6,1,Georgia On My Mind,6,280
7,1,Rainy Day Blues,7,343
8,1,My Bucket's Got A Hole In It,8,296
9,1,Ain't Nobody's Business,9,447
10,1,That's All,10,368


### RAND( ) / RANDOM( )  <a class="anchor" id="rand"></a>

We can also randomly order (permute) the rows of a table using the 🐬🐬🐬 **RAND( )** function of MySQL, which is called **RANDOM( )** in SQLite. But how is that different from a vanilla **SELECT** statement? A plain **SELECT** returns the results in the order they were stored, which may or may not be a meaningful order (it depends on the order of insertion).

**ORDER BY RANDOM( )** gives us the results in random order. This is particularly useful when obtaining a sample of the rows, which we will discuss below. You don't want to base everything you learn from a table on its first 10 rows, say. That's a recipe for BAD SCIENCE 🙅🙅‍♂️

For now, let's shuffle the above results. Execute the below cell a few times to see it shuffle away!

In [None]:
%%sql

SELECT *
FROM track
ORDER BY RANDOM();

id,album_id,title,track_number,duration
46,13,Resolution,10,129
2,1,Night Life,2,344
50,16,Good Morning Little Schoolgirl,1,285
62,17,Apostrophe,7,350
34,12,If I Needed Someone,13,143
63,17,Uncle Remus,8,164
60,17,Cosmik Debris,5,254
51,16,It's My Own Fault,2,734
39,13,Celestial Terrestrial Commuters,3,174
59,17,Father O'Blivion,4,138


### LIMIT <a class="anchor" id="limit"></a>

As you can see, the above output is already getting long and annoying. We don't have time for that.

We may simply want the first **N** rows of a table, which we can get with the **LIMIT** clause as shown below.

In [None]:
%%sql

SELECT title, artist
FROM album
LIMIT 5;

title,artist
Two Men with the Blues,Willie Nelson and Wynton Marsalis
Hendrix in the West,Jimi Hendrix
Rubber Soul,The Beatles
Birds of Fire,Mahavishnu Orchestra
Live And,Johnny Winter


**LIMIT** combined with **ORDER BY** **RANDOM( )** provides a method to obtain a sample without replacement where the sample size is **N**. To do this, we simply randomly order the rows of a table (as discussed above) and then limit the result size to **N**. For example, let's take a sample of size 5 from the table `track`.

🐬🐬🐬 FYI: in MySQL `RANDOM()` is called `RAND()`

In [None]:
%%sql

SELECT *
FROM track
ORDER BY RANDOM()
LIMIT 5;

id,album_id,title,track_number,duration
32,12,In My Life,11,147
60,17,Cosmik Debris,5,254
24,12,You Won't See Me,3,202
16,11,Blue Suede xShoes,3,266
10,1,That's All,10,368
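
To make the "without replacement" property concrete, here is a hedged sketch using Python's standard `sqlite3` module. The 20 rows in this in-memory `track` table are made up for illustration (they are not the tutorial's data); the point is only that no row can appear twice in the sample, because each row occurs once in the shuffled result before **LIMIT** cuts it off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (id INT, duration INT)")
# Hypothetical rows: ids 1..20 with made-up durations.
conn.executemany(
    "INSERT INTO track VALUES (?, ?)",
    [(i, 100 + 10 * i) for i in range(1, 21)],
)

# Shuffle all rows, then keep the first five.
sample = conn.execute(
    "SELECT id, duration FROM track ORDER BY RANDOM() LIMIT 5"
).fetchall()
sample_ids = [row[0] for row in sample]
print(sample_ids)  # differs from run to run
```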


#### 👩‍🔬💻 Exercise

Can you output the 5 shortest tracks?

In [None]:
%%sql



id,album_id,title,track_number,duration
40,13,Sapphire Bullets of Pure Love,4,24
19,11,Sgt. Pepper's Lonely Hearts Club Band,6,76
61,17,Excentrifugal Forz,6,93
58,17,St. Alfonzo's Pancake Breakfast,3,110
42,13,Hope,6,119


### COUNT

We can also count the number of values in a column, and we use the **COUNT( COLUMN )** function to do that. For example, to count the number of artist entries in the *album* table, we can use the query below.

In [None]:
%%sql

SELECT COUNT(artist)
FROM album;

COUNT(artist)
7


To count the number of records in the table, we do **COUNT(** * **)**. You remember the asterisk, don't you?

In [None]:
%%sql

SELECT COUNT(*)
FROM album;

COUNT(*)
7


When will the above queries differ in their results? To be specific, when will **COUNT(COLUMN)** be less than **COUNT( * )**?

Well, when an entry in the column is empty -- in SQL, that's a `NULL` entry -- equivalent to a **NA** in R. We don't have such entries yet.
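
A minimal sketch of that difference, using Python's standard `sqlite3` module and a tiny in-memory `album` table. The row with a missing artist is hypothetical and added only to show the effect; Python's `None` is stored as SQL `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (id INT, artist VARCHAR(63))")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [
        (1, "Miles Davis"),
        (2, None),  # hypothetical album with an unknown artist -> NULL
        (3, "Frank Zappa"),
    ],
)

# COUNT(*) counts rows; COUNT(artist) skips rows where artist is NULL.
counts = conn.execute("SELECT COUNT(*), COUNT(artist) FROM album").fetchone()
print(counts)
```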

#### 👩‍🔬💻 Exercise

How many rows are there in the track table?

In [None]:
%%sql



COUNT(*)
63


🤓🤓🤓 in the tidyverse, you would get the same result with

`dataframe %>% count()`

### DISTINCT <a class="anchor" id="distinct"></a>

What if we only want the unique entries in a column?

We can get a list of unique labels by using  **DISTINCT** with the column name.

Coming back to our first query under **SELECT** statement, we now add the **DISTINCT** qualifier to obtain the list of unique labels as shown below.

In [None]:
%%sql

SELECT DISTINCT label
FROM album;

label
Blue Note
Polydor
Parlophone
Columbia
DiscReet


#### 👩‍🔬💻 Exercise

Can you list the distinct track numbers?

In [None]:
%%sql



track_number
1
2
5
3
4
6
7
8
9
10


🤓🤓🤓 in the tidyverse, you would get the same result with

`dataframe %>% distinct(column_name)`

**DISTINCT** is regularly combined with the **COUNT** function to count the unique items in a list. For example, the number of unique tokens in a sentence, the number of unique Twitter users, etc.

Now, let's count the number of unique labels using **COUNT** over the results from **DISTINCT** qualifier.

In [None]:
%%sql

SELECT COUNT(DISTINCT label)
FROM album;

COUNT(DISTINCT label)
5


#### 👩‍🔬💻 Exercise

How many distinct track numbers are there? (Verify against the list above a few cells ago.)

In [None]:
%%sql



COUNT(DISTINCT track_number)
14


### Aliasing (AS) <a class="anchor" id="as"></a>

We can also rename the extracted columns using the **AS** qualifier as shown below. Aliasing is also frequently used to rename tables used as subqueries (these will be explained later).

In [None]:
%%sql

SELECT COUNT(DISTINCT label) AS number_of_distinct_labels
FROM album;

number_of_distinct_labels
5


😍 waaaay prettier. we always have time for THAT!

### Math Operations <a class="anchor" id="mo"></a>

Here's something trivial, yet useful, when combined with other features of MySQL. We can also perform math operations like addition, subtraction, etc. over numbers, results of a table, or results of aggregate functions explained below. We can also perform other operations like square root **SQRT( )**, logarithm **LOG( )**, etc.

**NOTE:** Use aliasing to rename the columns appropriately.

In [None]:
%%sql

SELECT 7*7 AS Product;

Product
49


#### 👩‍🔬💻 Exercise

Now what's the square root of 49?

In [None]:
%%sql



square_root
7.0


### Aggregation Functions

Remember the **COUNT** function from above? **COUNT** essentially summarizes (or aggregates) the values of a column. There are similar functions, called aggregate functions, such as **SUM**, **MIN**, **MAX**, **AVG**, and (in 🐬🐬🐬 MySQL) **STD**, each of which summarizes the values of a column.

Summarize = as in you only get one row back.

🤓🤓🤓 In the tidyverse, ... %>% summarize(...) does that too!

Let's say we have to find the minimum duration of any track. We can do that as follows.

In [None]:
%%sql

SELECT MIN(duration) AS minimum
FROM track;

minimum
24


🤓🤓🤓 I hope you are getting excited now... in the tidyverse, you would get the same result with

`track %>% summarize(minimum = min(duration))`

Then we can combine the math operations mentioned above with aggregate functions to get more kewl output. For example, we can find the range of track durations.

In [None]:
%%sql

SELECT MAX(duration) - MIN(duration)
FROM track;

MAX(duration) - MIN(duration)
762


#### 👩‍🔬💻 Exercise

Can you write the tidyverse code for this? If you are not R level 7, feel free to skip.

```

```

#### 👩‍🔬💻 Exercise

Find the total duration of all tracks in minutes.

In [None]:
%%sql



duration_in_minutes
297


#### 👩‍🔬💻 Exercise

Using the **AVG** function, find the mean duration.

In [None]:
%%sql



AVG(duration)
283.06349206349205


In [None]:
%%sql

SELECT STD(duration)
FROM track;

RuntimeError: (sqlite3.OperationalError) no such function: STD
[SQL: SELECT STD(duration)
FROM track;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
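
The error above is expected: `STD` is a 🐬🐬🐬 MySQL function (the population standard deviation) and SQLite does not ship it. One possible workaround, sketched here with Python's standard `sqlite3` module and toy durations made up for illustration, is to compute the variance from two averages in SQL (mean of squares minus squared mean) and take the square root in Python.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (id INT, duration INT)")
# Toy durations (hypothetical), chosen so the result is easy to check by hand.
toy_durations = [2, 4, 4, 4, 5, 5, 7, 9]
conn.executemany(
    "INSERT INTO track VALUES (?, ?)",
    list(enumerate(toy_durations, start=1)),
)

# Population variance = AVG(x^2) - AVG(x)^2, using only built-in aggregates.
(variance,) = conn.execute(
    "SELECT AVG(duration * duration) - AVG(duration) * AVG(duration) FROM track"
).fetchone()
std = math.sqrt(variance)
print(std)
```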


In [None]:
%%sql

SELECT (duration - 283.0635) / 170.3688 AS z_score
FROM track
LIMIT 5;

z_score
0.2168031940120493
0.3576740576913145
0.075932330332784
-0.4582030277844299
0.1463677621724166


### GROUP BY <a class="anchor" id="gby"></a>

If you haven't taken a break, now might be a good moment to do so.

Shit's getting real!

Summarizing the table into groups is one of the coolest parts of SQL, and core to the "split-apply-combine" framework. That's also why the tidyverse is so helpful. **GROUP BY** does the splitting.

**GROUP BY** is almost always used together with an aggregate function (explained above) that summarizes each group. With its default settings, 🐬🐬🐬 MySQL rejects queries that select columns which are neither aggregated nor listed in **GROUP BY**. You should also include the column used to group the table in the select list; otherwise you cannot tell which output row belongs to which group.

🤓🤓🤓 In tidyverse jargon, there are no grouped tibbles in SQL. If you `group_by`, you have to `summarize` or do something else with it.

For example, let's count the number of albums per *label*.

In [None]:
%%sql

SELECT label, COUNT(*) AS num_albums
FROM album
GROUP BY label;

label,num_albums
Blue Note,1
Columbia,3
DiscReet,1
Parlophone,1
Polydor,1
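
To see the split-apply-combine idea from the Python side, the sketch below runs the same grouped count on an in-memory table and compares it with `collections.Counter`, which does the splitting and counting by hand. The label values are the seven shown in the `SELECT label` output earlier; the single-column table is a simplification for this example.

```python
import sqlite3
from collections import Counter

# The seven labels from the album table, in the order shown earlier.
labels = [
    "Blue Note", "Polydor", "Parlophone", "Columbia",
    "Columbia", "DiscReet", "Columbia",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (label VARCHAR(15))")
conn.executemany("INSERT INTO album VALUES (?)", [(label,) for label in labels])

# SQL: split by label, apply COUNT(*), combine into one row per label.
sql_counts = dict(
    conn.execute(
        "SELECT label, COUNT(*) AS num_albums FROM album GROUP BY label"
    ).fetchall()
)

# Python: the same grouping and counting done with a Counter.
python_counts = dict(Counter(labels))
print(sql_counts)
```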


🤓🤓🤓 We hope you are getting excited! Do I even have to give you the tidyverse code at this point? Aren't you already seeing through the split-apply-combine matrix?

`album %>% group_by(label) %>% count()`

knock knock, Neo 🐇

Anyway...now let's find the average duration of tracks for each album (i.e., grouped by *album_id*).

In [None]:
%%sql

SELECT album_id, AVG(duration) AS average
FROM track
GROUP BY album_id;

album_id,average
1,320.7
11,330.0
12,152.78571428571428
13,242.4
16,405.3333333333333
17,211.88888888888889
18,550.8


#### 👩‍🔬💻 Exercise

Now you find the average duration for a given track number:

In [None]:
%%sql



track_number,average_per_track_number
1,297.42857142857144
2,363.1428571428572
3,223.0
4,315.0
5,308.1428571428572
6,155.5
7,329.2
8,344.2
9,355.6
10,214.66666666666663


#### 👩‍🔬💻 Exercise

And now find the highest track number for a given album:

In [None]:
%%sql



album_id,max_track_number
1,10
11,9
12,14
13,10
16,6
17,9
18,5


#### 👩‍🔬💻 Exercise

Ok, level 102, putting a few ideas together. Let's say you are very particular about your record keeping -- some would say intense -- and decide, for each album, to subtract the NUMBER of tracks you have in the database FROM the LARGEST track number. What does the output tell you? Think about that for a sec.

In [None]:
%%sql



album_id,MAX(track_number) - COUNT(*)
1,0
11,0
12,0
13,0
16,0
17,0
18,0


The output tells you that track numbers are incremental indices on the tracks per album, and that you didn't lose a track by accident.
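
To see what a lost track would look like, here is a hedged sketch using Python's standard `sqlite3` module. The two tiny albums are made up for illustration, and track 3 of album 2 is deliberately left out; the check then reports a non-zero value for that album.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (album_id INT, track_number INT)")
conn.executemany(
    "INSERT INTO track VALUES (?, ?)",
    [
        (1, 1), (1, 2), (1, 3),  # hypothetical album 1: complete
        (2, 1), (2, 2), (2, 4),  # hypothetical album 2: track 3 is missing
    ],
)

# Largest track number minus number of stored tracks, per album.
# 0 means no gaps; a positive value is the number of missing tracks.
gaps = conn.execute(
    "SELECT album_id, MAX(track_number) - COUNT(*) AS missing_tracks "
    "FROM track "
    "GROUP BY album_id "
    "ORDER BY album_id"
).fetchall()
print(gaps)
```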

Now move on to Tutorial 1B, Neo.