# Effect on Revenue: Purchase the most popular tracks from each album from record companies, instead of the whole album

In this project, we use SQL to answer the business question above with data from the Chinook database, `chinook.db`.

In [44]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

'Connected: None@chinook.db'

In [45]:
%%sql
SELECT *
FROM customer
LIMIT 5

Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
2,Leonie,Köhler,,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,+49 0711 2842222,,leonekohler@surfeu.de,5
3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3
4,Bjørn,Hansen,,Ullevålsveien 14,Oslo,,Norway,0171,+47 22 44 22 22,,bjorn.hansen@yahoo.no,4
5,František,Wichterlová,JetBrains s.r.o.,Klanova 9/506,Prague,,Czech Republic,14700,+420 2 4172 5555,+420 2 4172 5555,frantisekw@jetbrains.com,4


Our first task is to select three albums to add to the store from the list below. All four albums are by artists that don't have any tracks in the store right now, so we only have the artist names and the genre of music they produce:

|**Artist Name**|**Genre**|
|----|----|
|Regal|Hip-Hop|
|Red Tone|Punk|
|Meteor and the Girls|Pop|
|Slim Jim Bites|Blues|

The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.

## The most popular genres in the USA

In [49]:
%%sql
SELECT 
    g.name AS genre,
    SUM(il.quantity) AS nr_of_tracks_sold,
    ROUND(
        CAST(
        SUM(il.quantity) as FLOAT)*100/(
            SELECT SUM(quantity)
            FROM invoice_line),
        2) AS percentages
FROM genre g
LEFT JOIN 
    track t on t.genre_id = g.genre_id
LEFT JOIN 
    invoice_line il on il.track_id = t.track_id
GROUP BY g.name
ORDER BY percentages DESC
LIMIT 15

Done.


genre,nr_of_tracks_sold,percentages
Rock,2635,55.39
Metal,619,13.01
Alternative & Punk,492,10.34
Latin,167,3.51
R&B/Soul,159,3.34
Blues,124,2.61
Jazz,121,2.54
Alternative,117,2.46
Easy Listening,74,1.56
Pop,63,1.32


The top 3 selling genres are **Rock** (2635, 55.39%), **Metal** (619, 13.01%) and **Alternative & Punk** (492, 10.34%). Note that the query above has no filter on country, so these figures cover sales to all customers rather than only those in the USA.

Since *Red Tone*'s album is **Punk**, it should be included based on this result. **Blues** (124, 2.61%) ranks 6th and **Pop** (63, 1.32%) ranks 10th; both rank higher than **Hip-Hop** (33, 0.69%). Therefore we should leave out the album from *Regal* and purchase the other three, based on track sales in their genres.
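To restrict the genre ranking to USA customers, the invoice lines can be joined through `invoice` to `customer` and filtered on `country`. The sketch below shows one way to do this with Python's `sqlite3` module. It runs against a tiny in-memory database whose table and column names follow the queries in this notebook, but whose rows are made up for illustration, so the numbers it prints say nothing about Chinook itself.

```python
import sqlite3

# Toy stand-in for chinook.db: names follow the notebook's queries,
# every row below is invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE invoice_line (
    invoice_line_id INTEGER PRIMARY KEY,
    invoice_id INTEGER, track_id INTEGER, quantity INTEGER);

INSERT INTO genre VALUES (1, 'Rock'), (2, 'Pop');
INSERT INTO track VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO customer VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO invoice VALUES (100, 1), (200, 2);
INSERT INTO invoice_line VALUES
    (1, 100, 10, 1), (2, 100, 11, 1), (3, 100, 20, 1),
    (4, 200, 20, 1), (5, 200, 20, 1);
""")

query = """
WITH usa_lines AS (
    /* keep only invoice lines bought by customers whose country is USA */
    SELECT il.track_id, il.quantity
    FROM invoice_line il
    JOIN invoice i ON i.invoice_id = il.invoice_id
    JOIN customer c ON c.customer_id = i.customer_id
    WHERE c.country = 'USA'
)
SELECT
    g.name AS genre,
    SUM(ul.quantity) AS nr_of_tracks_sold,
    ROUND(SUM(ul.quantity) * 100.0 / (SELECT SUM(quantity) FROM usa_lines), 2)
        AS percentages
FROM usa_lines ul
JOIN track t ON t.track_id = ul.track_id
JOIN genre g ON g.genre_id = t.genre_id
GROUP BY g.name
ORDER BY nr_of_tracks_sold DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both the numerator and the denominator of the percentage come from `usa_lines`, so the shares add up to 100% within the USA. In this toy data the Canadian customer's Pop purchases are excluded from both.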

## Amount of sales assigned to sales support agent

Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. Our next task is to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.

In [52]:
%%sql
select
    e.employee_id,
    e.first_name || ' ' || e.last_name as employee,
    e.title as title,
    e.hire_date as hire_date,
    ROUND(sum(i.total),2) as revenue,
    ROUND(sum(i.total)*100/(select sum(total)
                  from invoice),2) as percentage
from customer c
left join 
    employee e on e.employee_id = c.support_rep_id
left join
    invoice i on i.customer_id = c.customer_id
group by e.employee_id
order by revenue desc



Done.


employee_id,employee,title,hire_date,revenue,percentage
3,Jane Peacock,Sales Support Agent,2017-04-01 00:00:00,1731.51,36.77
4,Margaret Park,Sales Support Agent,2017-05-03 00:00:00,1584.0,33.63
5,Steve Johnson,Sales Support Agent,2017-10-17 00:00:00,1393.92,29.6


There are 3 sales support agents at Chinook, and *Jane* has the highest total sales of the three. However, the ranking by revenue matches the order of their hire dates: the earlier an agent was hired, the higher the revenue. The differences may therefore reflect length of employment as much as sales performance.
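One way to separate tenure from performance is to divide each agent's revenue by the number of days between their hire date and the latest invoice. The sketch below illustrates the idea with Python's `sqlite3` module on made-up rows. The `invoice_date` column, the choice of the latest invoice as the reference date, and the two employees are assumptions for illustration, not values from `chinook.db`.

```python
import sqlite3

# Toy data only: two hypothetical agents with different hire dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, hire_date TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, support_rep_id INTEGER);
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    customer_id INTEGER, invoice_date TEXT, total REAL);

INSERT INTO employee VALUES
    (1, 'Ann', 'Early', '2020-01-01'),
    (2, 'Bob', 'Late',  '2020-03-01');
INSERT INTO customer VALUES (1, 1), (2, 2);
INSERT INTO invoice VALUES
    (1, 1, '2020-02-01', 150.0),
    (2, 1, '2020-04-10',  50.0),
    (3, 2, '2020-03-15', 120.0);
""")

query = """
SELECT
    e.first_name || ' ' || e.last_name AS employee,
    ROUND(SUM(i.total), 2) AS revenue,
    /* days employed, measured up to the latest invoice in the data */
    ROUND(
        SUM(i.total) / (
            julianday((SELECT MAX(invoice_date) FROM invoice))
            - julianday(e.hire_date)
        ), 2) AS revenue_per_day
FROM employee e
JOIN customer c ON c.support_rep_id = e.employee_id
JOIN invoice i ON i.customer_id = c.customer_id
GROUP BY e.employee_id
ORDER BY revenue_per_day DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this toy data the later hire has lower total revenue but higher revenue per day, which is the kind of reversal a tenure-adjusted metric can reveal.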

## Sales of the customer from different countries

The next task is to analyze the sales data for customers from each different country. We will use the country value from the table `customer`, and ignore the country from the billing address in the invoice table.

In particular, for each country we will calculate the following:

- total number of customers
- total value of sales
- the average value of sales per customer
- average order value

Because there are a number of countries with only one customer, they will be grouped as "Other".

In [5]:
%%sql
with 
    /*group countries with only 1 customer as "Other"*/
    country_gp as( 
                select 
                    country,
                    case 
                        when count(*) = 1 then 'Other'
                        else country
                        end as country_gp,
                    case
                        when count(*) = 1 then 0
                        else 1
                        end as gp_order /*put 'Other' to the bottom*/
                from customer
                group by country
                ),    
    cus_sales as(
                select 
                    customer_id,
                    sum(total) as total,
                    count(*) as nr_order
                from invoice
                group by customer_id
                )
select 
    c_gp.country_gp as country,
    count(*) as nr_of_customer,
    round(sum(cs.total),2) as total_sales,
    round(avg(cs.total),2) as avg_sales_per_customer,
    round(sum(cs.total)/sum(cs.nr_order),2) as avg_order_value
from customer c
left join 
    country_gp c_gp on c_gp.country = c.country
left join 
    cus_sales cs on cs.customer_id = c.customer_id
group by country_gp
order by gp_order desc, total_sales desc

Done.


country,nr_of_customer,total_sales,avg_sales_per_customer,avg_order_value
USA,13,1040.49,80.04,7.94
Canada,8,535.59,66.95,7.05
Brazil,5,427.68,85.54,7.01
France,5,389.07,77.81,7.78
Germany,4,334.62,83.66,8.16
Czech Republic,2,273.24,136.62,9.11
United Kingdom,3,245.52,81.84,8.77
Portugal,2,185.13,92.56,6.38
India,2,183.15,91.57,8.72
Other,15,1094.94,73.0,7.45


The USA has the highest total sales of any single country, followed by Canada. The Czech Republic has the highest average sales per customer, 136.62. It is also worth noting that the countries with only one customer, grouped as "Other", together have more customers (15) and higher total sales (1094.94) than the USA (13 customers, 1040.49).

## Percentage of purchases that are individual tracks vs whole albums

Our last task is to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.

The Chinook store allows customers to make purchases in one of two ways:

- purchase a whole album
- purchase a collection of one or more individual tracks.

First, we find every album that appears in each invoice. Then we compare the tracks purchased with the album's full track list. If every track of the album is on the invoice, we assign 1 ('full album'), otherwise 0. An invoice that mixes complete and incomplete albums is also classified as 0.
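The comparison can be expressed as a set difference: take the album's track list, subtract the invoice's tracks with `EXCEPT`, and treat an empty result as a full album. The minimal sketch below demonstrates that pattern with Python's `sqlite3` module on a made-up catalogue of two albums and three invoices. The rows are illustrative assumptions, not Chinook data.

```python
import sqlite3

# Toy data only: album 1 has tracks 1-3, album 2 has tracks 4-5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER);
CREATE TABLE invoice_line (
    invoice_line_id INTEGER PRIMARY KEY,
    invoice_id INTEGER, track_id INTEGER);

INSERT INTO track VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
INSERT INTO invoice_line (invoice_id, track_id) VALUES
    (100, 1), (100, 2), (100, 3),            -- all of album 1
    (200, 1), (200, 2), (200, 4), (200, 5),  -- part of album 1, all of album 2
    (300, 4);                                -- part of album 2
""")

query = """
WITH invoice_album AS (
    SELECT
        il.invoice_id,
        t1.album_id,
        CASE
            WHEN (
                /* album tracks that are missing from this invoice */
                SELECT t0.track_id FROM track t0
                WHERE t0.album_id = t1.album_id
                EXCEPT
                SELECT il_t.track_id FROM invoice_line il_t
                WHERE il_t.invoice_id = il.invoice_id
            ) IS NULL THEN 1   -- nothing missing: full album
            ELSE 0
        END AS full_album
    FROM invoice_line il
    JOIN track t1 ON t1.track_id = il.track_id
    GROUP BY il.invoice_id, t1.album_id
)
/* an invoice counts as an album purchase only if every album on it is complete */
SELECT invoice_id, MIN(full_album) AS full_album
FROM invoice_album
GROUP BY invoice_id
ORDER BY invoice_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Invoice 100 is flagged as a full album, while invoices 200 and 300 are not: the first mixes a complete album with an incomplete one, and the second holds a single track from a two-track album.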

In [53]:
%%sql
with
    invoice_album as(
        select 
            il.invoice_id,
            t1.album_id,
            count(*) as n_track,
            case
                when (
                    /* select all tracks from an album */
                    select track_id from track t0
                    where t0.album_id = t1.album_id 

                    except

                    /* select all tracks from an invoice*/
                    select track_id from invoice_line il_t
                    where il_t.invoice_id = il.invoice_id
                    ) isnull then 1
                else 0
                end as full_album,
            sum(il.unit_price*il.quantity) as ttl_revenue

        from invoice_line il
        left join track t1 on t1.track_id = il.track_id
        group by il.invoice_id, t1.album_id
        ),
    /* if a purchase contains tracks from non-completed album, assign as 0*/
    summary as(
        select 
            invoice_id, 
            min(full_album) as full_album,
            sum(ttl_revenue) as total_revenue,
            sum(n_track) as n_track
        from invoice_album
        group by invoice_id
        )
    
select
    case
        when full_album = 1 then 'By album'
        else 'By track'
        end as 'Purchase Method',
    count(*) as n_invoice, 
    (round(
        count(*)/(
            select 
                cast(
                   count(
                       distinct(invoice_id)
                   ) as FLOAT
               )
            from invoice_line)*100,2)
    ) as per_invoices,
    sum(n_track)/count(*) as track_per_invoice
from summary
group by full_album;

Done.


Purchase Method,n_invoice,per_invoices,track_per_invoice
By track,500,81.43,6
By album,114,18.57,12


More than 80% of purchases (81.43% of invoices) are not full albums. Moreover, by-track invoices average 6 tracks, half the 12 tracks of by-album invoices. This suggests that most purchases include only a few songs from any one album. Therefore, purchasing only the most popular tracks of each album from record companies, instead of every track, should reduce costs without a significant reduction in total revenue.
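The result above reports shares of invoices. Since the business question concerns revenue, it may also be worth reporting the share of revenue by purchase method, using the `total_revenue` column that the `summary` CTE already computes. The sketch below shows the aggregation with Python's `sqlite3` module on a hypothetical, hand-filled `summary` table. Its rows are invented and are not the notebook's results.

```python
import sqlite3

# Toy stand-in for the notebook's `summary` CTE; all values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary (
    invoice_id INTEGER PRIMARY KEY,
    full_album INTEGER,
    total_revenue REAL);

INSERT INTO summary VALUES
    (1, 0, 2.0), (2, 0, 3.0), (3, 0, 5.0),  -- by-track invoices
    (4, 1, 10.0);                           -- by-album invoice
""")

query = """
SELECT
    CASE WHEN full_album = 1 THEN 'By album' ELSE 'By track' END
        AS purchase_method,
    COUNT(*) AS n_invoice,
    ROUND(SUM(total_revenue), 2) AS revenue,
    ROUND(
        SUM(total_revenue) * 100.0 / (SELECT SUM(total_revenue) FROM summary),
        2) AS per_revenue
FROM summary
GROUP BY full_album
ORDER BY full_album
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this toy table the single by-album invoice is a quarter of the invoices but half of the revenue, which shows why invoice share and revenue share can differ. Running the same aggregation on the real `summary` CTE would indicate how much revenue is actually tied to album purchases.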

To be continued:

- Which artist is used in the most playlists?
- How many tracks have been purchased vs not purchased?
- Is the range of tracks in the store reflective of their sales popularity?
- Do protected vs non-protected media types have an effect on popularity?