In [1]:
%load_ext sql

# Connect to CHINOOK database

Connect directly from the sqlite folder.

In [2]:
%sql sqlite:///C:/sqlite/chinook.db

In [3]:
%%sql
PRAGMA table_info(track);

 * sqlite:///C:/sqlite/chinook.db
Done.


cid,name,type,notnull,dflt_value,pk
0,track_id,INTEGER,1,,1
1,name,NVARCHAR(200),1,,0
2,album_id,INTEGER,0,,0
3,media_type_id,INTEGER,1,,0
4,genre_id,INTEGER,0,,0
5,composer,NVARCHAR(220),0,,0
6,milliseconds,INTEGER,1,,0
7,bytes,INTEGER,0,,0
8,unit_price,"NUMERIC(10,2)",1,,0
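
The `notnull` column in this output is worth a second look before aggregating: a column with `notnull = 0` (such as `bytes`) may contain `NULL`, and aggregate functions such as `AVG()` skip `NULL` values. As a minimal sketch, the same check can be done from Python with the standard `sqlite3` module. The in-memory table below is a hypothetical stand-in that mirrors only a few columns of `track`; it is not the Chinook database itself.

```python
import sqlite3

# Hypothetical in-memory table mirroring a few columns of `track` (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE track (
        track_id     INTEGER PRIMARY KEY NOT NULL,
        name         NVARCHAR(200) NOT NULL,
        milliseconds INTEGER NOT NULL,
        bytes        INTEGER
    )
    """
)

# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(track)").fetchall()

# Keep the names of the columns that are allowed to hold NULL.
nullable = [name for _cid, name, _type, notnull, _default, _pk in columns if notnull == 0]
print(nullable)
```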


# 1) Introduction

In this lesson, we'll learn how to apply **summary statistics** on several columns and understand **how aggregate functions behave under conditions.**

## Instructions

Write a query that displays the track's name, album identifier, and runtime in milliseconds where the track's unit price is greater than or equal to **one dollar**.

1. Use the following column names: `name`, `album_id`, and `milliseconds`.

1. Use the `WHERE` clause to specify the condition.

In [5]:
%%sql

SELECT name,
       album_id,
       milliseconds
FROM track
WHERE unit_price >= 1
LIMIT 5;

 * sqlite:///C:/sqlite/chinook.db
Done.


name,album_id,milliseconds
Battlestar Galactica: The Story So Far,226,2622250
Occupation / Precipice,227,5286953
"Exodus, Pt. 1",227,2621708
"Exodus, Pt. 2",227,2618000
Collaborators,227,2626626


# 2) Using Several Statistics in a Column

We can also compute several statistics simultaneously with the `SELECT` clause using a similar query template. 

For example, let's compute the **sum** and **average** of the tracks' runtimes and the **number of rows** in the `track` table at the same time.

In [6]:
%%sql
SELECT SUM(milliseconds) AS total_runtime, AVG(milliseconds) AS avg_runtime, COUNT(*) AS num_row 
  FROM track;


 * sqlite:///C:/sqlite/chinook.db
Done.


total_runtime,avg_runtime,num_row
1378778040,393599.2121039109,3503


Listening to every track available for purchase from the store would take `1378778040` milliseconds (just under 16 days straight). The store has `3503` tracks, and the average track is `393599.212104` milliseconds long (around 6.5 minutes).
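
The unit conversions behind those figures can be checked with a few lines of arithmetic. This is only a sanity check on the two values returned by the query above; the constant and variable names are chosen here for illustration.

```python
# Values copied from the query output above.
total_ms = 1378778040
avg_ms = 393599.2121039109

MS_PER_MINUTE = 60 * 1000
MS_PER_DAY = 24 * 60 * MS_PER_MINUTE

total_days = total_ms / MS_PER_DAY      # total runtime expressed in days
avg_minutes = avg_ms / MS_PER_MINUTE    # average runtime expressed in minutes

print(f"{total_days:.2f} days, {avg_minutes:.2f} minutes per track")
# prints "15.96 days, 6.56 minutes per track"
```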

## Instructions

1. Write a query that computes the shortest and longest track runtimes from the `track` table. Remember that the runtime is stored in the `milliseconds` column.

Rename the lowest value as `min_runtime`.

Rename the highest value as `max_runtime`.

In [7]:
%%sql
SELECT MIN(milliseconds) AS min_runtime,
       MAX(milliseconds) AS max_runtime
  FROM track;

 * sqlite:///C:/sqlite/chinook.db
Done.


min_runtime,max_runtime
1071,5286953


# 3) Using Aggregate Functions with Computed Columns

As we learned previously, we can do mathematical calculations with columns:

 * Columns with values (e.g., column * 3.14).

 * Columns with other columns (e.g., column_1 + column_2).

For example, we can convert the `milliseconds` column to seconds by dividing it by `1000.0`.

> Note that we use `1000.0` here instead of `1000` to force floating-point division. This is equivalent to using `CAST` as we did previously.

Building on the previous query, we can calculate the average track runtime in seconds directly.

```sql
SELECT AVG(milliseconds / 1000.0) AS avg_runtime_seconds 
  FROM track;
```
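
To see why the note about `1000.0` matters, the sketch below compares integer division, division by a float literal, and an explicit `CAST` in SQLite. It uses Python's standard `sqlite3` module with an in-memory database, and the value `1500` is an arbitrary illustration rather than data from `track`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same dividend, three ways of dividing by one thousand.
int_div, float_div, cast_div = conn.execute(
    """
    SELECT 1500 / 1000,                 -- integer / integer truncates
           1500 / 1000.0,               -- a float operand gives a float result
           CAST(1500 AS REAL) / 1000    -- explicit cast has the same effect
    """
).fetchone()

print(int_div, float_div, cast_div)  # prints "1 1.5 1.5"
```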

## Instructions

1. Write a query that aggregates the following computed columns from the `track` table and displays them.

* `AVG(milliseconds / 1000.0 / 60)` as `avg_runtime_minutes`.

* `AVG(bytes / 1024.0 / 1024)` as `avg_size_megabyte`.

In [8]:
%%sql

SELECT AVG(milliseconds / 1000.0 / 60) AS avg_runtime_minutes,
       AVG(bytes / 1024.0 / 1024) AS avg_size_megabyte
  FROM track;

 * sqlite:///C:/sqlite/chinook.db
Done.


avg_runtime_minutes,avg_size_megabyte
6.559986868398516,31.957823815701044


# 4) Combining Aggregate and Scalar Functions

The number `6.559987` is a decimal with several digits after the decimal point. With the scalar function `ROUND()`, we can reduce the number of digits after the decimal point to `1` (i.e., rounded to the nearest tenth). We can then write the following query.

```sql
SELECT ROUND(AVG(milliseconds / 1000.0 / 60), 1) AS avg_runtime_minutes_rounded
  FROM track;
```
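
As a quick illustration of the second argument of `ROUND()`, the sketch below rounds the average reported earlier to one and to two decimal places. It runs against an in-memory SQLite database through Python's `sqlite3` module, and the literal is simply the value copied from the previous output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ROUND(value, n) keeps n digits after the decimal point.
one_digit, two_digits = conn.execute(
    "SELECT ROUND(6.559986868398516, 1), ROUND(6.559986868398516, 2)"
).fetchone()

print(one_digit, two_digits)  # prints "6.6 6.56"
```

Note that both results are larger than the original value: `ROUND()` rounds to the nearest value at the requested precision rather than truncating.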

## Instructions

1. Write a query that computes the average track runtime in minutes in two ways from the `track` table.

  * Use only the `AVG()` function.

  * Rename the result as `avg_runtime_minutes`.

1. Combine the `ROUND()` and `AVG()` functions.

  * Compute the average of the runtimes in minutes.

  * Round the result to the nearest hundredth (i.e., two digits after the decimal point).

  * Rename the result as `avg_runtime_minutes_rounded`.

In [9]:
%%sql
SELECT AVG(milliseconds / 1000.0 / 60) AS avg_runtime_minutes,
       ROUND(AVG(milliseconds / 1000.0 / 60), 2) AS avg_runtime_minutes_rounded
  FROM track;

 * sqlite:///C:/sqlite/chinook.db
Done.


avg_runtime_minutes,avg_runtime_minutes_rounded
6.559986868398516,6.56


# 5) Combining Aggregate Functions with Arithmetic Operators

Instead of using the `AVG` aggregate function, we can compute the same average with the aggregate functions `SUM` and `COUNT` by dividing the sum of the values by the number of values. We can write the following query:

```sql
SELECT SUM(total) / COUNT(total) AS avg_total
  FROM example;
```
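
Two details are easy to miss with this approach: dividing two integers truncates the result, and `COUNT(total)` skips `NULL` values while `COUNT(*)` counts every row. The sketch below illustrates both on a small `example` table built in memory; the four rows are made-up values chosen for illustration, not data from the store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (total INTEGER)")

# Made-up sample data (assumption): three integers and one NULL.
conn.executemany(
    "INSERT INTO example (total) VALUES (?)",
    [(1,), (2,), (None,), (4,)],
)

avg, int_ratio, float_ratio, per_row = conn.execute(
    """
    SELECT AVG(total),                      -- ignores the NULL row
           SUM(total) / COUNT(total),       -- integer division: 7 / 3
           SUM(total) * 1.0 / COUNT(total), -- float division, matches AVG
           SUM(total) * 1.0 / COUNT(*)      -- divides by all four rows
      FROM example
    """
).fetchone()

print(avg, int_ratio, float_ratio, per_row)
```

Here `int_ratio` is `2`, while `avg` and `float_ratio` are both about `2.33`; `per_row` is `1.75` because the `NULL` row is counted in the denominator.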

## Instructions

1. Write a query that computes the average track runtime in two ways from the `track` table.

    Use only the `AVG()` function.

    Rename the result as `avg_runtime`.

1. Multiply `SUM(milliseconds)` by `1.0`, and divide it by `COUNT(milliseconds)`.

    Rename the result as `another_avg_runtime`.

In [10]:
%%sql
SELECT AVG(milliseconds) AS avg_runtime,
       (SUM(milliseconds) * 1.0) / COUNT(milliseconds) AS another_avg_runtime
  FROM track;

 * sqlite:///C:/sqlite/chinook.db
Done.


avg_runtime,another_avg_runtime
393599.2121039109,393599.2121039109


# 6) Summary Statistics Under a Condition

Using the following query, we see that only **two** unit prices are available for tracks.

```sql
SELECT DISTINCT unit_price AS prices
  FROM track;
```

Now, we want to determine how long it would take to listen, without a break, to only the tracks that cost **$1.99**.
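
The key point is that `WHERE` filters rows before any aggregate function runs, so the aggregate only sees the rows that satisfy the condition. The sketch below shows this on a tiny in-memory table; the four tracks and their runtimes are invented for illustration and are not rows from the Chinook database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track (name TEXT, milliseconds INTEGER, unit_price NUMERIC(10,2))"
)

# Invented sample rows (assumption): two tracks at each price.
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?)",
    [
        ("Song A", 180000, 0.99),
        ("Song B", 240000, 0.99),
        ("Episode 1", 2400000, 1.99),
        ("Episode 2", 2700000, 1.99),
    ],
)

# Aggregates over every row.
all_count, all_minutes = conn.execute(
    "SELECT COUNT(*), SUM(milliseconds / 1000.0 / 60) FROM track"
).fetchone()

# Same aggregates, but only over the rows that pass the WHERE condition.
pricey_count, pricey_minutes = conn.execute(
    "SELECT COUNT(*), SUM(milliseconds / 1000.0 / 60) FROM track WHERE unit_price = 1.99"
).fetchone()

print(all_count, all_minutes)        # prints "4 92.0"
print(pricey_count, pricey_minutes)  # prints "2 85.0"
```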

## Instructions

1. Write a query that computes the **number of minutes** necessary to listen to only the tracks that cost $1.99.

  Use the following expression to convert the runtime column into minutes: `milliseconds / 1000.0 / 60`.

  Sum up the tracks' runtimes in minutes and rename the result as `total_runtime_minutes`.

  Use the `WHERE` clause to keep only the rows where the track unit price is `1.99`.

In [11]:
%%sql
SELECT SUM(milliseconds / 1000.0 / 60) AS total_runtime_minutes
  FROM track
 WHERE unit_price = 1.99;

 * sqlite:///C:/sqlite/chinook.db
Done.


total_runtime_minutes
8351.582616666667
