# Guided Project: Answering Business Questions Using SQL

## Introduction and Schema Diagram

In [1]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

'Connected: None@chinook.db'

## Data overview

In [2]:
%%sql
SELECT
    name,
    type
FROM sqlite_master
WHERE type IN ('table', 'view');

Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


## Selecting New Albums to Purchase

The Chinook record store has just signed a deal with a new record label, and you've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:

| Artist Name          | Genre    |
|----------------------|----------|
| Regal                | Hip-Hop  |
| Red Tone             | Punk     |
| Meteor and the Girls | Pop      |
| Slim Jim Bites       | Blues    |


The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.

In [27]:
%%sql
WITH usa_invoices AS
    (
    SELECT
        i.invoice_id,
        il.invoice_line_id,
        i.billing_country,
        il.track_id
    FROM invoice i
    INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
    WHERE i.billing_country = 'USA'
    ),
    usa_genres_invoices AS
    (
    SELECT
        t.track_id,
        g.name
    FROM track t
    INNER JOIN usa_invoices ui ON ui.track_id = t.track_id
    INNER JOIN genre g ON g.genre_id = t.genre_id
    )
    
SELECT 
    name as genre,
    COUNT(track_id) AS number_sold_usa_abs,
    ROUND(COUNT(track_id) / CAST((SELECT COUNT(*) FROM usa_genres_invoices) AS FLOAT)*100,1) AS number_sold_usa_per
FROM usa_genres_invoices
GROUP BY 1
ORDER BY 2 DESC;

Done.


genre,number_sold_usa_abs,number_sold_usa_per
Rock,561,53.4
Alternative & Punk,130,12.4
Metal,124,11.8
R&B/Soul,53,5.0
Blues,36,3.4
Alternative,35,3.3
Latin,22,2.1
Pop,22,2.1
Hip Hop/Rap,20,1.9
Jazz,14,1.3


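Looking at all the invoices billed in the USA and the genres of the corresponding tracks, we can see that Rock sells the most tracks in the USA (561 tracks, 53.4% of the total). Alternative & Punk (12.4%) and Metal (11.8%) come next.

The four albums we have to choose from are in the following genres: Hip-Hop, Punk, Pop and Blues.
Matching these to the genre names in the table above (Punk to Alternative & Punk, Hip-Hop to Hip Hop/Rap), they rank as follows by tracks sold in the USA:
    1. Punk (130 tracks, 12.4%)
    2. Blues (36 tracks, 3.4%)
    3. Pop (22 tracks, 2.1%)
    4. Hip-Hop (20 tracks, 1.9%)

If we want to select the three most promising albums in terms of sales from the list of four, we should go for the Punk, Blues and Pop albums, that is, Red Tone, Slim Jim Bites and Meteor and the Girls. Note that Pop and Hip-Hop are separated by only two tracks, so the choice between them is a close one.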
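
The selection can be reproduced from the query output. The following is a minimal sketch in Python: the counts are copied from the result table above, and the mapping from the label's genre names to the store's genre names (the `candidates` dictionary) is an assumption made for illustration.

```python
# Tracks sold in the USA per genre, copied from the query output above.
usa_tracks_sold = {
    "Rock": 561,
    "Alternative & Punk": 130,
    "Metal": 124,
    "R&B/Soul": 53,
    "Blues": 36,
    "Alternative": 35,
    "Latin": 22,
    "Pop": 22,
    "Hip Hop/Rap": 20,
    "Jazz": 14,
}

# Candidate artists and the store genre each one is ASSUMED to map to.
candidates = {
    "Regal": "Hip Hop/Rap",
    "Red Tone": "Alternative & Punk",
    "Meteor and the Girls": "Pop",
    "Slim Jim Bites": "Blues",
}

# Rank the candidates by how many tracks of their genre were sold in the USA.
ranked = sorted(
    candidates,
    key=lambda artist: usa_tracks_sold[candidates[artist]],
    reverse=True,
)
top_three = ranked[:3]
print(top_three)
```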

## Analysing Employee Sales Performance

Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. You have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.

Let's first have a look at all the employees and their respective titles within the company.

In [38]:
%%sql
SELECT 
    first_name || ' ' || last_name AS employee_name,
    title
FROM employee;

Done.


employee_name,title
Andrew Adams,General Manager
Nancy Edwards,Sales Manager
Jane Peacock,Sales Support Agent
Margaret Park,Sales Support Agent
Steve Johnson,Sales Support Agent
Michael Mitchell,IT Manager
Robert King,IT Staff
Laura Callahan,IT Staff


We want to know which of Jane, Margaret and Steve performs best as a sales support agent.

In [41]:
%%sql

WITH customer_total_expenditures AS
    (
    SELECT 
        c.customer_id,
        c.support_rep_id,
        ROUND(SUM(i.total),2) total
    FROM customer c
    INNER JOIN invoice i ON i.customer_id = c.customer_id
    GROUP BY 1
    ORDER BY 2 DESC
    ),
     sales_support_agents AS
    (
    SELECT
        employee_id,
        first_name || ' ' || last_name AS employee_name,
        hire_date
    FROM employee
    )
    
SELECT 
    ssa.employee_name,
    ssa.hire_date,
    ROUND(SUM(cte.total),2) as total
FROM sales_support_agents ssa
INNER JOIN customer_total_expenditures cte ON cte.support_rep_id = ssa.employee_id
GROUP BY 1
ORDER BY 3 DESC;


Done.


employee_name,hire_date,total
Jane Peacock,2017-04-01 00:00:00,1731.51
Margaret Park,2017-05-03 00:00:00,1584.0
Steve Johnson,2017-10-17 00:00:00,1393.92


We can see from the table above that Jane Peacock, who has the highest sales (total amount in dollars), is also the support agent who has been with the company the longest.
Margaret comes second in terms of sales and was hired after Jane but before Steve. Steve comes last and was the most recent hire.

This suggests a relationship between an agent's time in the role and the amount of sales he or she makes, although with only three agents it is an observation rather than a firm conclusion.
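
To put the hire dates and totals side by side, here is a minimal sketch in Python. The figures are copied from the query output above; the `summary` dictionary and the choice to compare each agent against the earliest hire date and the lowest total are illustrative assumptions, not part of the original analysis.

```python
from datetime import date

# Hire dates and sales totals copied from the query output above.
agents = {
    "Jane Peacock": (date(2017, 4, 1), 1731.51),
    "Margaret Park": (date(2017, 5, 3), 1584.00),
    "Steve Johnson": (date(2017, 10, 17), 1393.92),
}

first_hire = min(hired for hired, _ in agents.values())
lowest_total = min(total for _, total in agents.values())

summary = {}
for name, (hired, total) in agents.items():
    # Days between this agent's hire date and the earliest hire date.
    days_after_first_hire = (hired - first_hire).days
    # How far this agent's total sits above the lowest total, in percent.
    pct_above_lowest = (total / lowest_total - 1) * 100
    summary[name] = (days_after_first_hire, round(pct_above_lowest, 1))
    print(f"{name}: hired {days_after_first_hire} days after the first agent, "
          f"{pct_above_lowest:.1f}% above the lowest total")
```

The later an agent was hired, the smaller their total, which is consistent with the reading above.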

## Analysing Sales by Country

In [69]:
%%sql
WITH customers_total_purchases AS
    (
        SELECT
            customer_id,
            COUNT(invoice_id) invoices_per_customer,
            ROUND(SUM(total),2) total
        FROM invoice
        GROUP BY customer_id
    ),
        final_table AS
    (
        SELECT 
            c.country,
            COUNT(c.country) number_of_customers,
            ROUND(SUM(ctp.total),2) as total,
            SUM(ctp.total)/COUNT(c.country) as avg_sales_customer,
            SUM(ctp.total)/SUM(invoices_per_customer) as avg_order_value
        FROM customer c
        INNER JOIN customers_total_purchases ctp ON ctp.customer_id = c.customer_id
        GROUP BY c.country  
    )

SELECT
    country,
    number_of_customers,
    total,
    ROUND(avg_sales_customer,2) avg_sales_customer,
    ROUND(avg_order_value,2) avg_order_value
FROM final_table
ORDER BY total DESC, number_of_customers ASC;

Done.


country,number_of_customers,total,avg_sales_customer,avg_order_value
USA,13,1040.49,80.04,7.94
Canada,8,535.59,66.95,7.05
Brazil,5,427.68,85.54,7.01
France,5,389.07,77.81,7.78
Germany,4,334.62,83.66,8.16
Czech Republic,2,273.24,136.62,9.11
United Kingdom,3,245.52,81.84,8.77
Portugal,2,185.13,92.56,6.38
India,2,183.15,91.58,8.72
Ireland,1,114.84,114.84,8.83


We can see that the USA, Canada and Brazil are the top three markets by total sales.


Keep in mind that the amount of data from each country is relatively low. Because of this, we should be cautious about spending too much money on new marketing campaigns based solely on this table, as the sample size is not large enough to give us high confidence. A better approach would be to run small campaigns in these countries, then collect and analyze data on the new customers to make sure that these trends hold.
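
Ranking the same rows by average sales per customer rather than by total sales gives a different order, which is one reason to read the table with care. Below is a minimal sketch in Python using the rows shown above; the `MIN_CUSTOMERS` cutoff used to flag small samples is a hypothetical threshold chosen only for illustration.

```python
# (country, number_of_customers, total) copied from the query output above.
rows = [
    ("USA", 13, 1040.49),
    ("Canada", 8, 535.59),
    ("Brazil", 5, 427.68),
    ("France", 5, 389.07),
    ("Germany", 4, 334.62),
    ("Czech Republic", 2, 273.24),
    ("United Kingdom", 3, 245.52),
    ("Portugal", 2, 185.13),
    ("India", 2, 183.15),
    ("Ireland", 1, 114.84),
]

MIN_CUSTOMERS = 3  # hypothetical cutoff below which a row is flagged

# Sort by average sales per customer (total / number_of_customers), descending.
by_avg = sorted(rows, key=lambda r: r[2] / r[1], reverse=True)

for country, customers, total in by_avg:
    flag = " (small sample)" if customers < MIN_CUSTOMERS else ""
    print(f"{country}: {total / customers:.2f} per customer{flag}")
```

The countries that rise to the top of this ordering have only one or two customers each, so their averages rest on very little data.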

## Albums vs Individual Tracks

The Chinook store is set up so that customers can make purchases in one of two ways:

* purchase a whole album
* purchase a collection of one or more individual tracks

The store does not let customers purchase a whole album, and then add individual tracks to that same purchase (unless they do that by choosing each track manually). When customers purchase albums they are charged the same price as if they had purchased each of those tracks separately.

Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.

We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.

It is very common when you are performing an analysis to have 'edge cases' which prevent you from getting a 100% accurate answer to your question. In this instance, we have two edge cases to consider:

* Albums that have only one or two tracks are likely to be purchased by customers as part of a collection of individual tracks.
* Customers may decide to manually select every track from an album, and then add a few individual tracks from other albums to their purchase.

In the first case, since our analysis is concerned with maximizing revenue, we can safely ignore albums consisting of only a few tracks. The company has previously done analysis to confirm that the second case does not happen often, so we can ignore it as well.

In order to answer the question, we're going to have to identify whether each invoice has all the tracks from an album. We can do this by getting the list of tracks from an invoice and comparing it to the list of tracks from an album. We can find the album to compare the purchase to by looking up the album that one of the purchased tracks belongs to. It doesn't matter which track we pick, since if it's an album purchase, that album will be the same for all tracks.
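
The comparison described above amounts to a set-equality check. Here is a minimal sketch in Python, assuming tracks and albums are identified by integer ids; the function name `is_album_purchase` and the toy data are hypothetical and only illustrate the logic that the SQL query below expresses with two `EXCEPT` clauses.

```python
def is_album_purchase(invoice_tracks, track_to_album, album_to_tracks):
    """Return True if the invoice holds exactly the tracks of one album."""
    # Pick any purchased track (here the smallest id) to find the album.
    first_track = min(invoice_tracks)
    album_id = track_to_album[first_track]
    # Album purchase: nothing missing from the album, nothing extra added.
    return set(invoice_tracks) == album_to_tracks[album_id]


# Hypothetical toy data: album 1 has three tracks, album 2 has two.
album_to_tracks = {1: {10, 11, 12}, 2: {20, 21}}
track_to_album = {
    track: album for album, tracks in album_to_tracks.items() for track in tracks
}

whole_album = is_album_purchase([10, 11, 12], track_to_album, album_to_tracks)
partial_album = is_album_purchase([10, 11], track_to_album, album_to_tracks)
album_plus_extra = is_album_purchase([10, 11, 12, 20], track_to_album, album_to_tracks)
print(whole_album, partial_album, album_plus_extra)
```

Only the first invoice counts as an album purchase: the second is missing a track and the third contains a track from another album.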

In [106]:
%%sql
WITH invoice_first_track AS
    (
    SELECT
        il.invoice_id invoice_id,
        MIN(il.track_id) first_track_id
    FROM invoice_line il
    GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    ROUND(CAST(COUNT(invoice_id) AS FLOAT) / (SELECT COUNT(*) FROM invoice) * 100, 1) percentage_of_invoices

FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                (SELECT t.track_id FROM track t
                 WHERE t.album_id = (
                                    SELECT t2.album_id FROM track t2
                                    WHERE t2.track_id = ifs.first_track_id
                                  )
                 EXCEPT

                 SELECT il2.track_id FROM invoice_line il2
                 WHERE il2.invoice_id = ifs.invoice_id
                ) IS NULL
            AND
                (SELECT il2.track_id FROM invoice_line il2
                 WHERE il2.invoice_id = ifs.invoice_id

                 EXCEPT

                 SELECT t.track_id FROM track t
                 WHERE t.album_id = (
                                    SELECT t2.album_id FROM track t2
                                    WHERE t2.track_id = ifs.first_track_id
                                  )
                ) IS NULL
            THEN 'yes'
            ELSE 'no'
        END AS "album_purchase"
    FROM invoice_first_track ifs
    )
GROUP BY album_purchase;

Done.


album_purchase,number_of_invoices,percentage_of_invoices
no,500,81.4
yes,114,18.6


Album purchases account for 114 of the 614 invoices, or 18.6% of purchases. Based on this data, I would recommend against purchasing only select tracks from albums from record companies, since doing so could put close to one fifth of purchases, and the revenue that comes with them, at risk.
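
As a quick check of the arithmetic behind that share, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database (so it does not depend on `chinook.db`). It also shows that SQLite truncates the result when both operands of `/` are integers, which is why one side has to be cast before computing a percentage.

```python
import sqlite3

# In-memory database: no tables are needed for a pure arithmetic check.
conn = sqlite3.connect(":memory:")

integer_ratio, percentage = conn.execute(
    "SELECT 114 / 614, ROUND(CAST(114 AS FLOAT) / 614 * 100, 1)"
).fetchone()
conn.close()

# integer_ratio is truncated by integer division; percentage is the real share.
print(integer_ratio, percentage)
```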