In [1]:
# import libraries

import sqlite3  # for working with sqlite3 in jupyter python script
from IPython.display import Image  # for displaying images in markdown cells

In [2]:
%%html
<style>
table {align:left;display:block}  # to align html tables to left
</style> 

# Dataquest - Intermediate SQL For Data Analysis <br/> <br/> Project Title: Answering Business Questions Using SQL

## 1) Introduction and Schema Diagram

Provided by: [Dataquest.io](https://www.dataquest.io/)

In this guided project, we're going to practice using our SQL skills to answer business questions.

We'll use the Chinook database, provided as a SQLite database file called chinook.db. A copy of the database schema is below.

![title](chinook-schema.svg)

In the case where the working database **'chinook.db'** has unwanted modification when exploring this exercise, a backup database **'chinook-unmodified.db'** file is also located in this github directory folder that can be copied over the chinook.db to restore it back to its initial state.

We'll use the following code to connect our Jupyter Notebook to our database file:

In [3]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

## 2) Overview Of The Data

Let's start by getting familiar with our data. 

We can query the database to get a list of all tables and views in our database:

In [4]:
%%sql
/*
Write a query to return information on the tables and views in the database.

Write one or two queries to get familiar with the tables and to practice running SQL in this interface. Use the schema diagram on the previous screen for reference.
*/

SELECT
    name,
    type
FROM sqlite_master
WHERE type IN ("table","view");

 * sqlite:///chinook.db
Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


## 3) Selecting Albums To Purchase

Provided by: [Dataquest.io](https://www.dataquest.io/)

The Chinook record store has just signed a deal with a new record label, and you've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:

---
Artist Name | Genre|
--- | --- |
Regal | Hip-Hop |
Red Tone | Punk |
Meteor and the Girls | Pop |
Slim Jim Bites | Blues |

The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, **so we're interested in finding out which genres sell the best in the USA.**

We need to write a query to find out which genres sell the most tracks in the USA, write up a summary of your findings, and make a recommendation for the three artists whose albums we should purchase for the store.


In [5]:
%%sql
/*
Write a query that returns each genre, with the number of tracks sold in the USA:
- in absolute numbers
- in percentages.

Write a paragraph that interprets the data and makes a recommendation for the three artists whose albums we should purchase for the store, based on sales of tracks from their genres.
*/

SELECT
    inv.billing_country AS 'Country Sold',
    SUM(il.quantity) AS 'Number of tracks sold (qty)',
    gen.name AS 'Genre Name'
FROM
    invoice AS inv
LEFT JOIN
    invoice_line AS il
    ON il.invoice_id = inv.invoice_id
LEFT JOIN 
    track AS tra
    ON tra.track_id = il.track_id
LEFT JOIN
    genre AS gen
    ON gen.genre_id = tra.genre_id
WHERE
    inv.billing_country = 'USA'
    AND
    (gen.name LIKE '%hip%'
     OR
     gen.name LIKE '%punk%'
     OR
     gen.name LIKE '%pop%'
     OR
     gen.name LIKE '%blues%')
GROUP BY
    gen.name
ORDER BY
    2 DESC;

 * sqlite:///chinook.db
Done.


Country Sold,Number of tracks sold (qty),Genre Name
USA,130,Alternative & Punk
USA,36,Blues
USA,22,Pop
USA,20,Hip Hop/Rap


#### Findings (Selecting Albums To Purchase):
Based on the shortlisted findings above based on genre popularity measured by number of tracks sold, it would appear the below artist/genre are preferred in descending order, where **'Alternative & Punk' genre dominates the sales chart by far**, while the other genres have less of a difference among them.

Rank |Artist Name | Artist Genre
--- | --- | --- |
1 | Red Tone | Punk | 
2 | Slim Jim Bites | Blues |
3 | Meteor and the Girls | Pop | 
4 | Regal | Hip-Hop |

## 4) Analysing Employee Sales Performance

Provided by: [Dataquest.io](https://www.dataquest.io/)

Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. You have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.

You might like to consider whether any extra columns from the employee table explain any variance you see, or whether the variance might instead be indicative of employee performance.

In [40]:
%%sql
/*
Write a query that finds the total dollar amount of sales assigned to each sales support agent within the company. Add any extra attributes for that employee that you find are relevant to the analysis.

Write a short statement describing your results, and providing a possible interpretation.
*/

SELECT
    emp.first_name || ' ' || emp.last_name AS employee_name,
    SUM(inv.total) AS invoice_total,
    emp.*
FROM
    employee AS emp
LEFT JOIN
    customer AS cus
    ON cus.support_rep_id = emp.employee_id
LEFT JOIN
    invoice AS inv
    ON inv.invoice_id= cus.customer_id
WHERE
    emp.title = 'Sales Support Agent'
GROUP BY
    emp.employee_id
ORDER BY
    invoice_total DESC;

 * sqlite:///chinook.db
Done.


employee_name,invoice_total,employee_id,last_name,first_name,title,reports_to,birthdate,hire_date,address,city,state,country,postal_code,phone,fax,email
Jane Peacock,165.32999999999998,3,Peacock,Jane,Sales Support Agent,2,1973-08-29 00:00:00,2017-04-01 00:00:00,1111 6 Ave SW,Calgary,AB,Canada,T2P 5M5,+1 (403) 262-3443,+1 (403) 262-6712,jane@chinookcorp.com
Margaret Park,155.43,4,Park,Margaret,Sales Support Agent,2,1947-09-19 00:00:00,2017-05-03 00:00:00,683 10 Street SW,Calgary,AB,Canada,T2P 5G3,+1 (403) 263-4423,+1 (403) 263-4289,margaret@chinookcorp.com
Steve Johnson,143.55,5,Johnson,Steve,Sales Support Agent,2,1965-03-03 00:00:00,2017-10-17 00:00:00,7727B 41 Ave,Calgary,AB,Canada,T3B 1Y7,1 (780) 836-9987,1 (780) 836-9543,steve@chinookcorp.com


#### Findings (Analysing Employee Sales Performance):
The top rankings in terms of employee sales as such.

There do not seem to have any observable trends within the employee data itself that differentiates their sales performance.

As such, the sales performance could be due to other factors such as:
- genre of tracks (could each sales representative specialise in certain genre?)
- country where tracks are sold (could each sales representative specialise in customers or outlets from specific countries?)
- single track or tracks in albums (could each sales representative specialise in singles / album tracks?)

Rank |employee_name | invoice_total
--- | --- | --- |
1 | Jane Peacock | 165.33 | 
2 | Margaret Park | 155.43 |
3 | Steve Johnson | 143.55 | 

## 5) Analysing Sales By Country

Provided by: [Dataquest.io](https://www.dataquest.io/)

Your next task is to analyze the sales data for customers from each different country. You have been given guidance to use the country value from the customers table, and ignore the country from the billing address in the invoice table.

In particular, you have been directed to calculate data, for each country, on the:
- total number of customers
- total value of sales
- average value of sales per customer
- average order value

Because there are a number of countries with only one customer, you should group these customers as "Other" in your analysis.

---
**Tip:**
You can use the following 'trick' to force the ordering of "Other" to last in your analysis.

If there is a particular value that you would like to force to the top or bottom of results, you can implement a sorting feature in a subquery:
- Put what would normally be your most outer query in a subquery with a case statement that adds a numeric column, and then in the outer query sort by that column.

In [None]:
%%sql
/*
Write a query that collates data on purchases from different countries.

- Where a country has only one customer, collect them into an "Other" group.

- The results should be sorted by the total sales from highest to lowest, with the "Other" group at the very bottom.

- For each country, include:
    - total number of customers
    - total value of sales
    - average value of sales per customer
    - average order value
*/


