# Business Questions on Chinook Music Store

In this tutorial we look at Chinook, a database that contains information about a fictional music shop. It includes tables for invoices, tracks, albums, artists and other entities related to the store's sales. We will use it to explore business questions and propositions.

The benefits of this approach are:

1. Easy to bootstrap. The Analytical SQL Cell widget for JupyterLab is all you need.
2. Extremely efficient. Analytical SQL Cell leverages [DuckDB](https://duckdb.org/) to query and visualize data in a performant way.
3. Free and open source. No commercial database or ChatGPT is required :-)

Chinook is a sample database available for SQL Server, Oracle, MySQL, etc. It is an alternative to the Northwind database and is ideal for demos and for testing ORM tools that target single or multiple database servers. Its name is a play on Northwind: chinooks are winds in the interior West of North America, where the Canadian Prairies and Great Plains meet various mountain ranges, and they are most prevalent over southern Alberta in Canada. That makes Chinook a fitting name for a database that intends to be an alternative to Northwind.

The entity relationship diagram of the database is shown below:

![Chinook Database](https://m-soro.github.io/Business-Analytics/SQL-for-Data-Analysis/L4-Project-Query-Music-Store/Misc/001.png)

# Preparation

First, we need to load the `asqlcell` extension.

In [None]:
%load_ext asqlcell

Now we can create a connection object for the DuckDB database file. Note that this connection object is used for all the SQL queries that follow.

In [None]:
from sqlalchemy import create_engine, inspect

con = create_engine("duckdb:///chinook.duckdb").connect()

We can take a look at the database's schema via the connection object to understand which tables it contains:

In [None]:
inspect(con).get_table_names()

# Exploration

## Sales

Let's investigate the countries in which most sales occur.

### How much did users spend in total per country?

We can join the `Invoice` and `Customer` tables to create a result set of country name, customer name and total amount spent, where the total is the sum of all invoices for each customer.

In [None]:
%%sql --con con

SELECT
    Customer.Country,
    Customer.FirstName || ' ' || Customer.LastName AS Customer,
    SUM(Invoice.Total) AS Total
FROM Invoice
JOIN Customer ON Customer.CustomerId = Invoice.CustomerId
GROUP BY 1, 2
ORDER BY 3 DESC

We can also visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Column
* X-Axis: Country
* Y-Axis: Total, sort in descending order

The chart shows that the USA, Canada and France are the top three countries by revenue.
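
The query above groups by both country and customer, so it returns one row per customer. If only the per-country totals are needed, grouping by country alone is enough. The following is a minimal sketch of that variant. It assumes Python's built-in `sqlite3` module and a handful of made-up rows instead of the Chinook DuckDB file, and the name `demo` is hypothetical.

```python
import sqlite3

# Toy rows, made up for illustration; they are NOT Chinook data.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    INSERT INTO Customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Canada');
    INSERT INTO Invoice VALUES (1, 1, 10.0), (2, 2, 5.0), (3, 3, 8.0), (4, 3, 4.0);
""")

# Group by country only, so each country appears exactly once.
country_totals = demo.execute("""
    SELECT Customer.Country, SUM(Invoice.Total) AS Total
    FROM Invoice
    JOIN Customer ON Customer.CustomerId = Invoice.CustomerId
    GROUP BY Customer.Country
    ORDER BY Total DESC, Customer.Country
""").fetchall()

print(country_totals)
```

The same `SELECT` statement should work unchanged in a `%%sql --con con` cell against the real tables.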

### How many users are there per country?

We can use the `Customer` table to build a result set of country name and customer count by grouping the customers by country. We sort the result so that the countries with the most customers appear first.

In [None]:
%%sql --con con

SELECT
    Country,
    COUNT(CustomerId) AS Count
FROM Customer
GROUP BY 1
ORDER BY 2 DESC

We can also visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Pie
* Size: Count
* Color: Country, sort in descending order

The chart shows that most customers reside in the USA, Canada and Brazil.

### How much did users spend per order per country?

We will use subquery factoring, that is, a common table expression (CTE), to make the logic clearer. First we create a detailed sales table from `Invoice` and `Customer`, and then we calculate the average order value per country.

In [None]:
%%sql --con con

WITH DetailedSales AS
(
    SELECT
        Customer.Country,
        Customer.CustomerId,
        Invoice.InvoiceId,
        Invoice.Total
    FROM Invoice
    INNER JOIN Customer ON Customer.CustomerId = Invoice.CustomerId
)

SELECT
    Country,
    COUNT(DISTINCT CustomerId) AS Customers,
    SUM(Total) / COUNT(DISTINCT InvoiceId) AS Average
FROM DetailedSales
GROUP BY Country
HAVING Customers > 1
ORDER BY Average DESC, Country ASC

Note that certain countries have only one customer. It is difficult to draw reasonable conclusions from a single customer, so those countries are excluded by the `HAVING` clause.
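
To make the arithmetic concrete, here is a small worked sketch of the same CTE on made-up rows. It assumes Python's built-in `sqlite3` module rather than DuckDB, so the `HAVING` clause repeats the aggregate instead of using the `Customers` alias. The connection name `demo` and all values are hypothetical.

```python
import sqlite3

# Toy rows, made up for illustration; they are NOT Chinook data.
# USA: 2 customers, 3 orders, 18.0 in total.
# India: 1 customer only, so it is dropped by HAVING.
# Germany: 2 customers, 2 orders, 18.0 in total.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    INSERT INTO Customer VALUES
        (1, 'USA'), (2, 'USA'), (3, 'India'), (4, 'Germany'), (5, 'Germany');
    INSERT INTO Invoice VALUES
        (1, 1, 10.0), (2, 1, 6.0), (3, 2, 2.0),
        (4, 3, 9.0),
        (5, 4, 14.0), (6, 5, 4.0);
""")

average_orders = demo.execute("""
    WITH DetailedSales AS (
        SELECT Customer.Country, Customer.CustomerId, Invoice.InvoiceId, Invoice.Total
        FROM Invoice
        INNER JOIN Customer ON Customer.CustomerId = Invoice.CustomerId
    )
    SELECT
        Country,
        COUNT(DISTINCT CustomerId) AS Customers,
        SUM(Total) / COUNT(DISTINCT InvoiceId) AS Average
    FROM DetailedSales
    GROUP BY Country
    HAVING COUNT(DISTINCT CustomerId) > 1
    ORDER BY Average DESC, Country ASC
""").fetchall()

print(average_orders)
```

In this toy data both remaining countries have the same total revenue, yet Germany ranks first because it reached that total with fewer orders.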

We can also visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Column
* X-Axis: Country
* Y-Axis: Average

As we can see, even though customers in the USA and Canada spent the most in total, a few other countries stand out for the amount spent per order:

* Czech Republic
* India
* Germany

The sample is not big enough to justify large-budget marketing campaigns, but we could start with smaller ones to attract more customers and confirm the potential of these markets.

## Music

The analysis above revealed some countries where the Chinook media store could invest. Now let's identify the most popular music genres to help the store make future marketing and investment decisions.

### How many tracks were sold in the USA per genre?

In [None]:
%%sql --con con

SELECT
    Genre.Name AS Genre,
    SUM(InvoiceLine.Quantity) AS TracksSold
FROM Invoice
INNER JOIN InvoiceLine ON Invoice.InvoiceId = InvoiceLine.InvoiceId
INNER JOIN Track ON InvoiceLine.TrackId = Track.TrackId
INNER JOIN Genre ON Track.GenreId = Genre.GenreId
WHERE Invoice.BillingCountry = 'USA'
GROUP BY 1
ORDER BY TracksSold DESC

We can also visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Bar
* X-Axis: Genre
* Y-Axis: TracksSold

The most popular genres in the USA are Rock, Latin, Metal and Alternative & Punk. All other genres follow at a considerable distance.
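
Absolute counts are easier to compare once they are expressed as a share of all tracks sold. The sketch below runs the same join chain on made-up rows and converts the counts to percentages in Python. It assumes the built-in `sqlite3` module in place of DuckDB, and the names `demo` and `genre_share` are hypothetical.

```python
import sqlite3

# Toy rows, made up for illustration; they are NOT Chinook data.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, BillingCountry TEXT);
    CREATE TABLE InvoiceLine (
        InvoiceLineId INTEGER PRIMARY KEY,
        InvoiceId INTEGER,
        TrackId INTEGER,
        Quantity INTEGER
    );
    INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO Track VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO Invoice VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO InvoiceLine VALUES
        (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 3, 1), (4, 1, 1, 1), (5, 2, 3, 1);
""")

rows = demo.execute("""
    SELECT Genre.Name AS Genre, SUM(InvoiceLine.Quantity) AS TracksSold
    FROM Invoice
    INNER JOIN InvoiceLine ON Invoice.InvoiceId = InvoiceLine.InvoiceId
    INNER JOIN Track ON InvoiceLine.TrackId = Track.TrackId
    INNER JOIN Genre ON Track.GenreId = Genre.GenreId
    WHERE Invoice.BillingCountry = 'USA'
    GROUP BY Genre.Name
    ORDER BY TracksSold DESC
""").fetchall()

# Convert absolute counts into a percentage of all tracks sold in the USA.
total_sold = sum(sold for _, sold in rows)
genre_share = {genre: 100 * sold / total_sold for genre, sold in rows}

print(rows)
print(genre_share)
```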

### Who is writing the rock music?

Now that we know our customers love rock music, we can decide which musicians to invite to play at a concert.

Let's invite the artists who have written the most rock music in our dataset. The query below returns the artist name and total track count of the top 10 rock bands. It uses the `Genre`, `Track`, `Album` and `Artist` tables.

In [None]:
%%sql --con con

SELECT
    Artist.Name,
    COUNT(Track.Name) AS Count
FROM Track
JOIN Genre ON Track.GenreId = Genre.GenreId
JOIN Album ON Album.AlbumId = Track.AlbumId
JOIN Artist ON Artist.ArtistId = Album.ArtistId
WHERE Genre.Name = 'Rock'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

We can visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Bar
* X-Axis: Name
* Y-Axis: Count

Led Zeppelin, U2 and Deep Purple have the most rock tracks in the catalog.
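
Keep in mind that the query above counts rock tracks in the catalog, which is not necessarily the same as how well an artist sells. The sketch below contrasts the two rankings on made-up rows, assuming the built-in `sqlite3` module instead of DuckDB. The band names, the connection name `demo` and the simplified `InvoiceLine` table are all hypothetical.

```python
import sqlite3

# Toy rows, made up for illustration; they are NOT Chinook data.
# Band A has three rock tracks but one sale; Band B has one track and four sales.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, ArtistId INTEGER);
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER, GenreId INTEGER);
    CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY, TrackId INTEGER, Quantity INTEGER);
    INSERT INTO Artist VALUES (1, 'Band A'), (2, 'Band B');
    INSERT INTO Album VALUES (1, 1), (2, 2);
    INSERT INTO Genre VALUES (1, 'Rock');
    INSERT INTO Track VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 1);
    INSERT INTO InvoiceLine VALUES (1, 1, 1), (2, 4, 1), (3, 4, 1), (4, 4, 1), (5, 4, 1);
""")

# Ranking by number of rock tracks in the catalog (same shape as the query above).
catalog_rank = demo.execute("""
    SELECT Artist.Name, COUNT(Track.TrackId) AS Count
    FROM Track
    JOIN Genre ON Track.GenreId = Genre.GenreId
    JOIN Album ON Album.AlbumId = Track.AlbumId
    JOIN Artist ON Artist.ArtistId = Album.ArtistId
    WHERE Genre.Name = 'Rock'
    GROUP BY Artist.Name
    ORDER BY Count DESC
    LIMIT 10
""").fetchall()

# Ranking by units sold, obtained by additionally joining the invoice lines.
sales_rank = demo.execute("""
    SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS Sold
    FROM InvoiceLine
    JOIN Track ON Track.TrackId = InvoiceLine.TrackId
    JOIN Genre ON Track.GenreId = Genre.GenreId
    JOIN Album ON Album.AlbumId = Track.AlbumId
    JOIN Artist ON Artist.ArtistId = Album.ArtistId
    WHERE Genre.Name = 'Rock'
    GROUP BY Artist.Name
    ORDER BY Sold DESC
    LIMIT 10
""").fetchall()

print(catalog_rank)
print(sales_rank)
```

With these toy rows the two rankings disagree, so it may be worth checking both before sending out invitations.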

## Employee

Each customer of the Chinook store is assigned to a sales support agent within the company when they first make a purchase. We can analyze the purchases of the customers assigned to each employee to see whether any sales support agent performs noticeably better or worse than the others.

### How about the total sales of each employee per month?

We can use `DATETRUNC` to truncate the invoice date to the first day of its month. This lets us group by month, which makes the data look nicer when we plot it.

In [None]:
%%sql --con con

SELECT
    Employee.FirstName || ' ' || Employee.LastName AS Employee,
    DATETRUNC('month', Invoice.InvoiceDate) AS Date,
    SUM(Invoice.Total) AS Sales
FROM Employee
JOIN Customer ON Customer.SupportRepId = Employee.EmployeeId
JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
WHERE Title = 'Sales Support Agent'
GROUP BY 1, 2

We can visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Line
* X-Axis: Date
* Y-Axis: Sales
* Color: Employee
* Width: 900

From this chart alone, it is not obvious how to evaluate the performance of the employees.
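
To make the month bucketing concrete, the following plain-Python sketch shows what truncating a date to its month and then summing per bucket amounts to. The helper `truncate_to_month` and the invoice values are hypothetical and only mimic the idea, not DuckDB's implementation.

```python
from collections import defaultdict
from datetime import date


def truncate_to_month(d: date) -> date:
    """Map any date to the first day of its month."""
    return d.replace(day=1)


# Made-up (invoice date, total) pairs for illustration; NOT Chinook data.
invoices = [
    (date(2021, 1, 3), 5.0),
    (date(2021, 1, 28), 7.0),
    (date(2021, 2, 14), 3.0),
]

# Sum the totals of all invoices that fall into the same month bucket.
monthly = defaultdict(float)
for invoice_date, total in invoices:
    monthly[truncate_to_month(invoice_date)] += total

monthly_sales = sorted(monthly.items())
print(monthly_sales)
```

Both January invoices land in the same bucket, which is why the line chart shows one point per employee per month.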

Before attributing differences in sales to employee performance, we should consider whether any other column of the `Employee` table explains the variance. Let's try the hire date.

In [None]:
%%sql --con con

SELECT
    Employee.HireDate,
    Employee.FirstName || ' ' || Employee.LastName AS Employee,
    SUM(Invoice.Total) AS Sales
FROM Employee
JOIN Customer ON Customer.SupportRepId = Employee.EmployeeId
JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
WHERE Title = 'Sales Support Agent'
GROUP BY 1, 2

We can visualize the result set by clicking the Chart tab with the following settings:

* Chart type: Column
* X-Axis: Employee
* Y-Axis: Sales

There are differences in sales between Jane and Steve, but they were hired in the same year, so the hire date should not be the reason.
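
One way to check the hire-date effect more directly is to normalize each agent's sales by the time they have been employed. The sketch below is only an illustration of that idea: the agent names, hire dates, sales figures and the `as_of` cut-off date are all made up and are not taken from Chinook.

```python
from datetime import date

# Hypothetical figures for illustration only; NOT taken from Chinook.
as_of = date(2014, 1, 1)  # assumed cut-off date for the comparison
agents = [
    ("Agent A", date(2012, 1, 1), 1500.0),
    ("Agent B", date(2013, 1, 1), 1000.0),
]

# Sales per day employed, so longer tenure does not inflate the comparison.
sales_per_day = {
    name: total / (as_of - hired).days
    for name, hired, total in agents
}

for name, value in sales_per_day.items():
    print(f"{name}: {value:.2f} per day")
```

In this made-up case the agent with the lower total has the higher daily rate, which is the kind of effect a raw total would hide.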