# Learn SQL with Jupyter Lab

In this tutorial we will look at a novel approach to learning SQL using only Jupyter Lab. Although Jupyter Lab has become a handy tool for data scientists, SQL has been a missing piece of the picture. With the emergence of [Analytical SQL Cell](https://github.com/datarho/asqlcell), SQL has become a first-class citizen of Jupyter Lab.

The benefits of learning SQL with Jupyter Lab are:

1. Easy to bootstrap. The Analytical SQL Cell widget for Jupyter Lab is all you need.
2. Extremely efficient. Analytical SQL Cell uses [DuckDB](https://duckdb.org/) to query and visualize data with high performance.
3. Free and open source. No commercial database or ChatGPT is required :-)

We'll use a database called Chinook, which contains information about a fictional music shop. In this tutorial, the data is stored in a [SQLite](https://www.sqlite.org/) database file.

Chinook is a sample database available for MySQL, SQL Server and other database systems. It is an alternative to the Northwind database and is ideal for demos. Its name is a nod to Northwind: chinooks are winds in the interior West of North America, where the Canadian Prairies and Great Plains meet various mountain ranges. That makes Chinook a fitting name for a database that intends to be an alternative to Northwind.

The database includes several tables related to the store's sales, such as invoices, tracks, albums and artists. The entity relationship diagram (ERD) of the database is shown below:

![Chinook Database](https://m-soro.github.io/Business-Analytics/SQL-for-Data-Analysis/L4-Project-Query-Music-Store/Misc/001.png)

# Preparation

First, we need to install `asqlcell`:

In [None]:
%pip install asqlcell --upgrade

Then load the `asqlcell` extension. This enables the `sql` magic, which we will use to write SQL queries in code cells.

In [None]:
%load_ext asqlcell

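Now we can create a connection object to the SQLite database file containing the Chinook data. The connection URL `sqlite:///chinook.sqlite` expects the `chinook.sqlite` file in the notebook's working directory. The connection object will be used together with the `sql` magic.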

In [None]:
from sqlalchemy import create_engine

# Relative path: chinook.sqlite must be in the notebook's working directory.
con = create_engine("sqlite:///chinook.sqlite").connect()
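
One pitfall worth knowing: SQLite creates a new, empty database file when the path does not exist, so a typo in the file name only surfaces later as a "no such table" error. The sketch below uses only Python's standard `sqlite3` module (not `asqlcell` or SQLAlchemy) to open the file read-only, which fails immediately if the file is missing, and then lists its tables. The helper names `open_readonly` and `list_tables` are hypothetical.

```python
import sqlite3


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode.

    With mode=ro, SQLite raises sqlite3.OperationalError for a missing
    file instead of silently creating a new, empty database.
    Assumes the path contains no '?' or '#' characters (URI syntax).
    """
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the table names recorded in sqlite_master, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # chinook.sqlite is the file name used in the tutorial's connection URL.
    try:
        conn = open_readonly("chinook.sqlite")
        print(list_tables(conn))
        conn.close()
    except sqlite3.OperationalError as exc:
        print(f"Could not open chinook.sqlite: {exc}")
```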

# Exercise

Now we are ready to query the database with SQL statements.

In each code cell, the first line is always `%%sql --con con`. It tells Jupyter Lab to run the rest of the cell through the `sql` magic using the connection object `con`; Analytical SQL Cell then executes the query and displays the result set.
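
Conceptually, a SQL cell sends the query text to the connection and displays the columns and rows that come back. Here is a minimal sketch of that round trip with Python's built-in `sqlite3` module and a tiny in-memory table. It is only an illustration, not how Analytical SQL Cell is implemented, and the helper `run_query` and the sample rows are hypothetical.

```python
import sqlite3


def run_query(conn: sqlite3.Connection, sql: str) -> tuple[list[str], list[tuple]]:
    """Execute one SQL statement and return (column names, rows)."""
    cur = conn.execute(sql)
    # cursor.description holds one 7-item tuple per result column; item 0 is the name.
    columns = [desc[0] for desc in cur.description]
    return columns, cur.fetchall()


# Hypothetical sample data standing in for the Chinook Invoice table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER, BillingCountry TEXT)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?)",
    [(1, "Germany"), (2, "Norway")],
)

columns, rows = run_query(conn, "SELECT * FROM Invoice ORDER BY InvoiceId")
print(columns)  # ['InvoiceId', 'BillingCountry']
print(rows)     # [(1, 'Germany'), (2, 'Norway')]
```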

## Customers

Provide a query showing a distinct list of billing countries from the invoice table.

Hint: you can find the data in the `Invoice` table. Remember to remove duplicates.

In [None]:
%%sql --con con

SELECT DISTINCT
    BillingCountry
FROM Invoice
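
To see what `DISTINCT` actually removes, here is a small sketch using Python's built-in `sqlite3` module and a handful of made-up invoices (not the real Chinook data). Without `ORDER BY`, SQL does not guarantee the order of the distinct rows, so the sketch sorts them. It also shows that grouping by the same column returns the same set of countries.

```python
import sqlite3

# Hypothetical sample rows: several invoices share a billing country.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, BillingCountry TEXT)")
conn.executemany(
    "INSERT INTO Invoice (BillingCountry) VALUES (?)",
    [("Germany",), ("Brazil",), ("Germany",), ("USA",), ("Brazil",)],
)

# Without DISTINCT every invoice contributes one row, duplicates included.
all_rows = conn.execute("SELECT BillingCountry FROM Invoice").fetchall()

# DISTINCT keeps one row per country; ORDER BY makes the output order predictable.
distinct_rows = conn.execute(
    "SELECT DISTINCT BillingCountry FROM Invoice ORDER BY BillingCountry"
).fetchall()

# GROUP BY on the same column yields the same set of countries.
grouped_rows = conn.execute(
    "SELECT BillingCountry FROM Invoice GROUP BY BillingCountry ORDER BY BillingCountry"
).fetchall()

print(len(all_rows))                  # 5
print(distinct_rows)                  # [('Brazil',), ('Germany',), ('USA',)]
print(distinct_rows == grouped_rows)  # True
```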

Provide a query showing the full name, customer ID and country of customers who are not in the US.

Hint: you can find the data in the `Customer` table. You'll need to concatenate the first name and last name to get the full name.

In [None]:
%%sql --con con

SELECT
    FirstName || ' ' || LastName AS Customer,
    CustomerId,
    Country
FROM Customer 
WHERE Country != 'USA'
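
One subtlety of `!=`: in SQL, a comparison with `NULL` is neither true nor false, so a customer whose `Country` is `NULL` would be filtered out as well. This tutorial does not say whether Chinook contains such rows; the sketch below uses made-up customers in an in-memory SQLite database (Python's standard `sqlite3` module) to show the behavior, plus an explicit `IS NULL` check that keeps those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER, FirstName TEXT, LastName TEXT, Country TEXT)"
)
# Hypothetical rows; the third customer has no country recorded.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Lovelace", "United Kingdom"),
        (2, "Grace", "Hopper", "USA"),
        (3, "Alan", "Turing", None),
    ],
)

# || joins strings; Country != 'USA' evaluates to NULL (not true) for the NULL row.
rows = conn.execute(
    """
    SELECT FirstName || ' ' || LastName AS Customer, CustomerId, Country
    FROM Customer
    WHERE Country != 'USA'
    ORDER BY CustomerId
    """
).fetchall()
print(rows)  # [('Ada Lovelace', 1, 'United Kingdom')]

# Keeping customers with an unknown country needs an explicit IS NULL check.
rows_with_null = conn.execute(
    """
    SELECT CustomerId FROM Customer
    WHERE Country != 'USA' OR Country IS NULL
    ORDER BY CustomerId
    """
).fetchall()
print(rows_with_null)  # [(1,), (3,)]
```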

Provide a query only showing the customers from Brazil.

In [None]:
%%sql --con con

SELECT
    FirstName || ' ' || LastName AS Customer,
    CustomerId,
    Country
FROM Customer 
WHERE Country = 'Brazil'
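
If the country to filter on comes from a Python variable rather than being typed into the cell, it is safer to bind it as a parameter than to format it into the SQL string. Here is a sketch with the `?` placeholder of Python's standard `sqlite3` module; the helper `customers_in` and the sample customers are hypothetical.

```python
import sqlite3


def customers_in(conn: sqlite3.Connection, country: str) -> list[tuple]:
    """Return (full name, customer id, country) for customers in one country.

    The ? placeholder lets sqlite3 bind the value, so quotes inside
    `country` cannot break or alter the query.
    """
    return conn.execute(
        """
        SELECT FirstName || ' ' || LastName AS Customer, CustomerId, Country
        FROM Customer
        WHERE Country = ?
        ORDER BY CustomerId
        """,
        (country,),
    ).fetchall()


# Hypothetical sample customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER, FirstName TEXT, LastName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Silva", "Brazil"),
        (2, "Ben", "Miller", "Canada"),
        (3, "Caio", "Souza", "Brazil"),
    ],
)

print(customers_in(conn, "Brazil"))
# [('Ana Silva', 1, 'Brazil'), ('Caio Souza', 3, 'Brazil')]
print(customers_in(conn, "Côte d'Ivoire"))  # [] (the quote is bound safely)
```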

Provide a query showing the invoices of customers from Brazil. The result should show the customer's full name, invoice ID, invoice date and billing country.

Hint: you'll need both the `Customer` and `Invoice` tables to build the query.

In [None]:
%%sql --con con

SELECT
    Customer.FirstName || ' ' || Customer.LastName AS Customer,
    Invoice.InvoiceId,
    Invoice.InvoiceDate,
    Invoice.BillingCountry
FROM Customer
JOIN Invoice ON Customer.CustomerId = Invoice.CustomerId 
WHERE Invoice.BillingCountry = 'Brazil'
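
The join pairs each invoice with the customer whose `CustomerId` matches, so a customer appears once per invoice, and a customer with no invoices does not appear at all (that is what an inner `JOIN` does). A small sketch with made-up rows in an in-memory SQLite database, using Python's standard `sqlite3` module, makes this visible:

```python
import sqlite3

# Hypothetical sample data: customer 3 has no invoices.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.execute(
    "CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, "
    "InvoiceDate TEXT, BillingCountry TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Ana", "Silva"), (2, "Bruno", "Costa"), (3, "Carla", "Souza")],
)
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?, ?)",
    [
        (10, 1, "2021-01-03", "Brazil"),
        (11, 1, "2021-02-11", "Brazil"),
        (12, 2, "2021-03-05", "Brazil"),
    ],
)

rows = conn.execute(
    """
    SELECT
        Customer.FirstName || ' ' || Customer.LastName AS Customer,
        Invoice.InvoiceId,
        Invoice.InvoiceDate,
        Invoice.BillingCountry
    FROM Customer
    JOIN Invoice ON Customer.CustomerId = Invoice.CustomerId
    WHERE Invoice.BillingCountry = 'Brazil'
    ORDER BY Invoice.InvoiceId
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana Silva', 10, '2021-01-03', 'Brazil')
# ('Ana Silva', 11, '2021-02-11', 'Brazil')
# ('Bruno Costa', 12, '2021-03-05', 'Brazil')
```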

## Music

Show the first 10 track sales from the invoice line table.

Hint: read from the `InvoiceLine` table and use `LIMIT` to return only 10 records.

In [None]:
%%sql --con con

SELECT
    *
FROM InvoiceLine
LIMIT 10
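
`LIMIT` on its own returns whichever rows the database produces first; SQL only guarantees an order when the query has `ORDER BY`. That is fine for a quick peek, but to page through a table predictably you can combine `ORDER BY`, `LIMIT` and `OFFSET`. Here is a sketch with Python's `sqlite3` module on a hypothetical 25-row `InvoiceLine` table; the helper `fetch_page` is made up for illustration.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, page: int, page_size: int = 10) -> list[int]:
    """Return the InvoiceLineIds on a zero-based page of results."""
    rows = conn.execute(
        "SELECT InvoiceLineId FROM InvoiceLine ORDER BY InvoiceLineId LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return [line_id for (line_id,) in rows]


# Hypothetical table with 25 invoice lines, ids 1..25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY, Quantity INTEGER)")
conn.executemany("INSERT INTO InvoiceLine VALUES (?, 1)", [(i,) for i in range(1, 26)])

print(fetch_page(conn, 0))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(fetch_page(conn, 2))  # [21, 22, 23, 24, 25]
print(fetch_page(conn, 3))  # []
```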

Find the names of the 10 best-selling tracks and how many times each was sold.

Hint: you will also need the `Track` table in addition to `InvoiceLine` to get track information.

In [None]:
%%sql --con con

SELECT
    Track.TrackId,
    Track.Name,
    SUM(InvoiceLine.Quantity) AS Sold
FROM Track
JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 10
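
`GROUP BY 1, 2` and `ORDER BY 3 DESC` refer to columns by their position in the `SELECT` list, so here they mean `Track.TrackId, Track.Name` and `Sold`. Grouping by `TrackId` as well as `Name` matters because different tracks can share a name; grouping by name alone would add their sales together. The sketch below checks both points on made-up tracks in an in-memory SQLite database (Python's standard `sqlite3` module). It also adds `TrackId` as a second sort key, an addition of this sketch, so that ties are ordered deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE InvoiceLine (TrackId INTEGER, Quantity INTEGER)")
# Hypothetical data: two different tracks are both called "Intro".
conn.executemany("INSERT INTO Track VALUES (?, ?)", [(1, "Intro"), (2, "Intro"), (3, "Outro")])
conn.executemany("INSERT INTO InvoiceLine VALUES (?, ?)", [(1, 1), (1, 1), (2, 1), (3, 1)])

# Positional form, as in the tutorial, with TrackId as a tie-breaker.
positional = conn.execute(
    """
    SELECT Track.TrackId, Track.Name, SUM(InvoiceLine.Quantity) AS Sold
    FROM Track
    JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
    GROUP BY 1, 2
    ORDER BY 3 DESC, 1
    """
).fetchall()

# The same query spelled out with column names.
named = conn.execute(
    """
    SELECT Track.TrackId, Track.Name, SUM(InvoiceLine.Quantity) AS Sold
    FROM Track
    JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
    GROUP BY Track.TrackId, Track.Name
    ORDER BY Sold DESC, Track.TrackId
    """
).fetchall()

# Grouping by name only merges the two different "Intro" tracks.
by_name_only = conn.execute(
    """
    SELECT Track.Name, SUM(InvoiceLine.Quantity) AS Sold
    FROM Track
    JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
    GROUP BY Track.Name
    ORDER BY Sold DESC
    """
).fetchall()

print(positional == named)  # True
print(positional)    # [(1, 'Intro', 2), (2, 'Intro', 1), (3, 'Outro', 1)]
print(by_name_only)  # [('Intro', 3), ('Outro', 1)]
```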

Who are the top 10 highest selling artists?

Hint: you need to get data from the `InvoiceLine`, `Track`, `Album` and `Artist` tables.

In [None]:
%%sql --con con

SELECT
    Artist.ArtistId,
    Artist.Name,
    SUM(InvoiceLine.Quantity) AS Sold
FROM InvoiceLine
JOIN Track ON Track.TrackId = InvoiceLine.TrackId
JOIN Album ON Album.AlbumId = Track.AlbumId
JOIN Artist ON Artist.ArtistId = Album.ArtistId
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 10
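
Each `JOIN` walks one step along the relationships in the diagram: invoice line → track → album → artist. The following sketch runs the same query against a tiny made-up catalog in an in-memory SQLite database (Python's standard `sqlite3` module) so the totals can be checked by hand. In this hypothetical data, Band A has two albums whose tracks sold 1 + 2 copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, ArtistId INTEGER);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER);
    CREATE TABLE InvoiceLine (TrackId INTEGER, Quantity INTEGER);

    -- Hypothetical catalog: Band A has two albums, Band B has one.
    INSERT INTO Artist VALUES (1, 'Band A'), (2, 'Band B');
    INSERT INTO Album VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO Track VALUES (100, 10), (101, 11), (102, 12);
    INSERT INTO InvoiceLine VALUES (100, 1), (101, 2), (102, 1);
    """
)

rows = conn.execute(
    """
    SELECT Artist.ArtistId, Artist.Name, SUM(InvoiceLine.Quantity) AS Sold
    FROM InvoiceLine
    JOIN Track ON Track.TrackId = InvoiceLine.TrackId
    JOIN Album ON Album.AlbumId = Track.AlbumId
    JOIN Artist ON Artist.ArtistId = Album.ArtistId
    GROUP BY 1, 2
    ORDER BY 3 DESC
    LIMIT 10
    """
).fetchall()

print(rows)  # [(1, 'Band A', 3), (2, 'Band B', 1)]
```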

We can also visualize the result set by opening the Chart tab and choosing the following settings:

* Chart type: Column
* X-Axis: Name
* Y-Axis: Sold
* Color: Name

You can also set the width and height of the chart in the Display tab.

If you're happy with the result, click the `Pin` button to persist it in the notebook. Your colleagues will then be able to see the chart when they open the notebook, even without starting a kernel.

## Employees

Provide a query showing only the employees who are sales support agents.

Hint: you can find the data in the `Employee` table.

In [None]:
%%sql --con con

SELECT
    FirstName || ' ' || LastName AS Employee,
    EmployeeId
FROM Employee
WHERE Title = 'Sales Support Agent'
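
The `=` comparison on text is exact: under SQLite's default (BINARY) collation, `'sales support agent'` does not match `'Sales Support Agent'`. If you are unsure how a value is capitalized in the data, `COLLATE NOCASE` makes the comparison ignore ASCII case. Here is a sketch with made-up employees in an in-memory SQLite database, using Python's standard `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER, FirstName TEXT, LastName TEXT, Title TEXT)"
)
# Hypothetical rows; the second title differs only in capitalization.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, "Emma", "Stone", "Sales Support Agent"),
        (2, "Liam", "Reed", "sales support agent"),
        (3, "Noah", "Hart", "General Manager"),
    ],
)

# Default BINARY collation: case must match exactly.
exact = conn.execute(
    "SELECT EmployeeId FROM Employee WHERE Title = 'Sales Support Agent' ORDER BY EmployeeId"
).fetchall()

# COLLATE NOCASE folds ASCII letters before comparing.
ignore_case = conn.execute(
    "SELECT EmployeeId FROM Employee "
    "WHERE Title = 'Sales Support Agent' COLLATE NOCASE ORDER BY EmployeeId"
).fetchall()

print(exact)        # [(1,)]
print(ignore_case)  # [(1,), (2,)]
```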

Provide a query that shows the invoices associated with each sales agent.

Hint: you'll need the `Customer`, `Invoice` and `Employee` tables to build the query.

In [None]:
%%sql --con con

SELECT
    Invoice.InvoiceId,
    Invoice.Total,
    Employee.FirstName || ' ' || Employee.LastName AS "Sales Agent"
FROM Invoice
JOIN Customer ON Invoice.CustomerId = Customer.CustomerId
JOIN Employee ON Customer.SupportRepId = Employee.EmployeeId

We can visualize the result set by opening the Chart tab and choosing the following settings:

* Chart type: Bar
* X-Axis: Sales Agent
* Y-Axis: Total
* Color: Sales Agent

You can also set the width and height of the chart in the Display tab.
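
The query above returns one row per invoice. If you would rather have a single total per sales agent, for example to compare agents directly, you can aggregate in SQL with `GROUP BY` and `SUM`. Here is a sketch with made-up invoices in an in-memory SQLite database, using Python's standard `sqlite3` module; the `Invoices` column is an addition of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, SupportRepId INTEGER);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    """
)
# Hypothetical data: Emma supports customers 1 and 2, Liam supports customer 3.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)", [(1, "Emma", "Stone"), (2, "Liam", "Reed")]
)
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?)",
    [(1, 1, 1.98), (2, 2, 3.96), (3, 3, 5.94), (4, 1, 0.99)],
)

# One row per agent: number of invoices and their summed total.
rows = conn.execute(
    """
    SELECT
        Employee.FirstName || ' ' || Employee.LastName AS "Sales Agent",
        COUNT(*) AS Invoices,
        ROUND(SUM(Invoice.Total), 2) AS Total
    FROM Invoice
    JOIN Customer ON Invoice.CustomerId = Customer.CustomerId
    JOIN Employee ON Customer.SupportRepId = Employee.EmployeeId
    GROUP BY Employee.EmployeeId, Employee.FirstName, Employee.LastName
    ORDER BY 3 DESC
    """
).fetchall()

for agent, invoices, total in rows:
    print(f"{agent}: {invoices} invoice(s), total {total:.2f}")
```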

Provide a query that shows the invoice total, customer name, country and sales agent name for all invoices and customers.

Hint: you'll need the `Customer`, `Invoice` and `Employee` tables to build the query.

In [None]:
%%sql --con con

SELECT
    Invoice.Total,
    Customer.FirstName || ' ' || Customer.LastName AS Customer,
    Customer.Country,
    Employee.FirstName || ' ' || Employee.LastName AS "Sales Agent"
FROM Invoice
JOIN Customer ON Invoice.CustomerId = Customer.CustomerId
JOIN Employee ON Customer.SupportRepId = Employee.EmployeeId

We can visualize the result set by opening the Chart tab and choosing the following settings:

* Chart type: Column
* X-Axis: Country
* Y-Axis: Total
* Color: Sales Agent
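
An inner `JOIN` only keeps rows that have a match on both sides, so a customer without any invoices would not show up in the result. If "all customers" should include those, a `LEFT JOIN` starting from `Customer` keeps them, with `NULL` in the invoice columns. Here is a sketch comparing the two on made-up rows in an in-memory SQLite database, using Python's standard `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
        Country TEXT, SupportRepId INTEGER
    );
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    """
)
# Hypothetical data: Bruno has not placed any orders yet.
conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", (1, "Emma", "Stone"))
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?, ?)",
    [(1, "Ana", "Silva", "Brazil", 1), (2, "Bruno", "Costa", "Canada", 1)],
)
conn.execute("INSERT INTO Invoice VALUES (?, ?, ?)", (1, 1, 1.98))

# Constant SQL fragment shared by both queries (not user input).
COLUMNS = """
    Invoice.Total,
    Customer.FirstName || ' ' || Customer.LastName AS Customer,
    Customer.Country,
    Employee.FirstName || ' ' || Employee.LastName AS "Sales Agent"
"""

inner = conn.execute(
    f"""
    SELECT {COLUMNS}
    FROM Invoice
    JOIN Customer ON Invoice.CustomerId = Customer.CustomerId
    JOIN Employee ON Customer.SupportRepId = Employee.EmployeeId
    ORDER BY Customer.CustomerId
    """
).fetchall()

left = conn.execute(
    f"""
    SELECT {COLUMNS}
    FROM Customer
    LEFT JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
    LEFT JOIN Employee ON Customer.SupportRepId = Employee.EmployeeId
    ORDER BY Customer.CustomerId
    """
).fetchall()

print(inner)  # [(1.98, 'Ana Silva', 'Brazil', 'Emma Stone')]
print(left)   # [(1.98, 'Ana Silva', 'Brazil', 'Emma Stone'), (None, 'Bruno Costa', 'Canada', 'Emma Stone')]
```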

# Conclusion

With the help of [Analytical SQL Cell](https://github.com/datarho/asqlcell), SQL is now a first-class citizen of Jupyter Lab: we can learn SQL and visualize the results with ease.