# Analyzing store data

This notebook analyzes different aspects of our music store. It is meant to help corporate make better business decisions based on data gathered throughout the store's lifetime.

First, we import the libraries we need: `pandas` to store our data in data frames and `sqlite3` to run queries. We will use `matplotlib` for any dashboards.

In [2]:
import sqlite3
import pandas as pd
import datetime

#our database
db = 'chinook.db'

#function to run queries and return the result as a data frame
def run_query(query):
    with sqlite3.connect(db) as conn:
        return pd.read_sql(query, conn)

#function to run commands, e.g. create tables, updates
def run_command(command):
    with sqlite3.connect(db) as conn:
        #autocommit mode: each statement is committed as soon as it runs
        conn.isolation_level = None
        conn.execute(command)

#function to see the current state of our tables
def show_tables():
    query = '''
               SELECT name,
                      type
                 FROM sqlite_master
                WHERE type in ('table', 'view');
            '''
    return run_query(query)

show_tables()

Unnamed: 0,name,type
0,album,table
1,artist,table
2,customer,table
3,employee,table
4,genre,table
5,invoice,table
6,invoice_line,table
7,media_type,table
8,playlist,table
9,playlist_track,table
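
The helpers above take a finished SQL string. When a value such as a country name needs to vary between runs, `sqlite3` placeholders let us pass it as a bound parameter instead of formatting it into the query text. The following is only a minimal sketch: it uses an in-memory database with a made-up `customer` table as a stand-in for `chinook.db`, and `count_customers` is a hypothetical helper that is not part of this notebook.

```python
import sqlite3

# Hypothetical stand-in for chinook.db: an in-memory database
# holding a tiny, made-up customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, country) VALUES (?, ?)",
    [(1, "USA"), (2, "Canada"), (3, "USA")],
)

def count_customers(connection, country):
    """Count customers in one country, passing the value as a bound parameter."""
    row = connection.execute(
        "SELECT COUNT(*) FROM customer WHERE country = ?", (country,)
    ).fetchone()
    return row[0]

usa_customers = count_customers(conn, "USA")
print(usa_customers)
conn.close()
```

The `?` placeholder keeps the query text fixed, so the same statement can be reused for any country without building SQL by string concatenation.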


### Analyzing tracks sold in USA by genre

Our investors have given us capital to promote four new albums:

| Artist Name          | Genre    |
|----------------------|----------|
| Regal                | Hip-Hop  |
| Red Tone             | Punk     |
| Meteor and the Girls | Pop      |
| Slim Jim Bites       | Blues    |

Let's decide which artist we should invest the new money in. We will base the decision on sales per genre in the USA.

In [2]:
query = '''
WITH tracks_sold_usa AS (
    SELECT il.quantity, 
           track_id,
           first_name
      FROM invoice_line il 
     INNER JOIN invoice i ON il.invoice_id = i.invoice_id
     INNER JOIN customer c on i.customer_id = c.customer_id
     WHERE c.country = 'USA' 
)

SELECT g.name as genre, 
       sum(quantity) as total_sales, 
       round(cast(sum(quantity) as float) / (SELECT count(*) FROM tracks_sold_usa ), 2) as percentage_total
FROM tracks_sold_usa ts 
INNER JOIN track t ON ts.track_id = t.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY g.name
order by 2 desc limit 10

'''
total_sales_by_genre_pct = run_query(query)
total_sales_by_genre_pct

Unnamed: 0,genre,total_sales,percentage_total
0,Rock,561,0.53
1,Alternative & Punk,130,0.12
2,Metal,124,0.12
3,R&B/Soul,53,0.05
4,Blues,36,0.03
5,Alternative,35,0.03
6,Pop,22,0.02
7,Latin,22,0.02
8,Hip Hop/Rap,20,0.02
9,Jazz,14,0.01


#### Best artist based on genre
We can conclude that *Red Tone* is probably the best artist to spend advertising money on. Of the four genres, Punk (listed in the store as Alternative & Punk, 130 tracks) has the highest total sales in the United States. Next come *Slim Jim Bites* (Blues, 36 tracks), then *Meteor and the Girls* (Pop, 22 tracks), and finally *Regal* (Hip-Hop, listed as Hip Hop/Rap, 20 tracks).
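
The ranking can be reproduced as a small sketch from the figures in the query output. The mapping from each artist's genre to a store genre is an assumption on our side: we treat Punk as `Alternative & Punk` and Hip-Hop as `Hip Hop/Rap`, because the store has no genres with the exact names given by the investors.

```python
# Top-10 US genre totals, copied from the query output in this section.
us_sales = {
    "Rock": 561,
    "Alternative & Punk": 130,
    "Metal": 124,
    "R&B/Soul": 53,
    "Blues": 36,
    "Alternative": 35,
    "Pop": 22,
    "Latin": 22,
    "Hip Hop/Rap": 20,
    "Jazz": 14,
}

# Assumed mapping from each new artist to the closest store genre.
artist_genre = {
    "Regal": "Hip Hop/Rap",
    "Red Tone": "Alternative & Punk",
    "Meteor and the Girls": "Pop",
    "Slim Jim Bites": "Blues",
}

# Rank the artists by the US sales of their genre, best first.
ranking = sorted(
    artist_genre,
    key=lambda artist: us_sales[artist_genre[artist]],
    reverse=True,
)

for artist in ranking:
    genre = artist_genre[artist]
    print(f"{artist}: {genre}, {us_sales[genre]} tracks sold")
```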

### Employee performance
Next, we analyze employee performance to see how each sales support agent's total sales compare with those of the others.

In [3]:
#Write a query that finds the total dollar amount of sales assigned to each sales support agent within the company. 
#Add any extra attributes for that employee that you find are relevant to the analysis.
#Write a short statement describing your results, and providing a possible interpretation.

#create a join for the employees and respective clients
#get the number of sales per employee, based on their customers

query = '''
WITH employee_customer AS (
    SELECT e.first_name || " " || e.last_name as employee_name, c.customer_id as customer_id, hire_date
      FROM employee e LEFT JOIN customer c ON e.employee_id = c.support_rep_id 
)

SELECT employee_name,
       sum(total)
  FROM employee_customer ec
 INNER JOIN invoice i ON ec.customer_id = i.customer_id
 GROUP BY employee_name
'''
run_query(query)

Unnamed: 0,employee_name,sum(total)
0,Jane Peacock,1731.51
1,Margaret Park,1584.0
2,Steve Johnson,1393.92


Jane Peacock is outperforming the other two employees. There could be a number of reasons for this, such as more days worked, working full time rather than part time, or an artist coming in and buying all of their own albums. If none of these applies, maybe we should look into giving Jane a small bonus!
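
To put the gap in perspective, here is a rough sketch that compares each agent's total with the average of the three, using the figures from the query output. It does not adjust for hire date or working hours, so it only describes the raw difference and says nothing about why it exists.

```python
# Sales totals per agent, copied from the query output in this section.
sales = {
    "Jane Peacock": 1731.51,
    "Margaret Park": 1584.0,
    "Steve Johnson": 1393.92,
}

average = sum(sales.values()) / len(sales)

# Percentage difference of each agent's total from the three-agent average.
relative = {
    name: (total - average) / average * 100
    for name, total in sales.items()
}

for name, diff_pct in relative.items():
    print(f"{name}: {diff_pct:+.1f}% vs. average")
```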

### Purchases by country

We are now going to analyze our dataset by country.

In [50]:
# Write a query that collates data on purchases from different countries.
# Where a country has only one customer, collect them into an "Other" group.
# The results should be sorted by the total sales from highest to lowest, with the "Other" group at the very bottom.
# For each country, include:
# total number of customers
# total value of sales
# average value of sales per customer
# average order value

query = """
WITH country_other AS (
SELECT CASE
       WHEN (SELECT COUNT(*)
             FROM CUSTOMER
             WHERE country = c.country
            ) = 1 THEN "Other"
       ELSE c.country
        END AS country,
       c.customer_id,
       il.*
 FROM invoice_line il
INNER JOIN invoice i ON i.invoice_id = il.invoice_id
INNER JOIN customer c ON c.customer_id = i.customer_id
)

SELECT
    country,
    customers,
    total_sales,
    average_order,
    customer_lifetime_value
FROM
    (
    SELECT
        country,
        count(distinct customer_id) customers,
        SUM(unit_price) total_sales,
        SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,
        SUM(unit_price) / count(distinct invoice_id) average_order,
        CASE
            WHEN country = "Other" THEN 1
            ELSE 0
        END AS sort
    FROM country_other
    GROUP BY country
    ORDER BY sort ASC, total_sales DESC
    );"""
run_query(query)

Unnamed: 0,country,customers,total_sales,average_order,customer_lifetime_value
0,USA,13,1040.49,7.942672,80.037692
1,Canada,8,535.59,7.047237,66.94875
2,Brazil,5,427.68,7.011148,85.536
3,France,5,389.07,7.7814,77.814
4,Germany,4,334.62,8.161463,83.655
5,Czech Republic,2,273.24,9.108,136.62
6,United Kingdom,3,245.52,8.768571,81.84
7,Portugal,2,185.13,6.383793,92.565
8,India,2,183.15,8.721429,91.575
9,Other,15,1094.94,7.448571,72.996
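
The `sort` column in the query is what keeps the "Other" group at the bottom even though its total sales are the highest. As a minimal sketch, the same ordering can be expressed in plain Python with a two-part sort key. The rows are a subset of the output above, deliberately listed out of order.

```python
# A few rows from the query output, listed out of order on purpose.
rows = [
    {"country": "Other", "customers": 15, "total_sales": 1094.94},
    {"country": "Czech Republic", "customers": 2, "total_sales": 273.24},
    {"country": "USA", "customers": 13, "total_sales": 1040.49},
    {"country": "Canada", "customers": 8, "total_sales": 535.59},
]

# Sort key mirrors "ORDER BY sort ASC, total_sales DESC":
# False (named countries) sorts before True ("Other"),
# then higher total sales come first.
ordered = sorted(
    rows,
    key=lambda r: (r["country"] == "Other", -r["total_sales"]),
)

for r in ordered:
    # Same definition as customer_lifetime_value in the query.
    lifetime_value = r["total_sales"] / r["customers"]
    print(f'{r["country"]}: {lifetime_value:.2f} per customer')
```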


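### Albums vs. individual tracks

Finally, we check what share of invoices are purchases of a complete album rather than a selection of individual tracks.

In [52]: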
query = """
WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(count(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ) percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN "yes"
             ELSE "no"
         END AS "album_purchase"
     FROM invoice_first_track ifs
    )
GROUP BY album_purchase;
"""

run_query(query)

Unnamed: 0,album_purchase,number_of_invoices,percent
0,no,500,0.814332
1,yes,114,0.185668
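
The two `EXCEPT` checks in the query amount to a set-equality test: an invoice counts as an album purchase only if its tracks are exactly the tracks of the album that its first track belongs to. Below is a small sketch of that logic with Python sets. The catalogue and the name `is_album_purchase` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical catalogue: album name -> set of track ids.
album_tracks = {
    "album_a": {1, 2, 3},
    "album_b": {4, 5},
}

# Reverse lookup: track id -> album name.
track_album = {
    track_id: album
    for album, tracks in album_tracks.items()
    for track_id in tracks
}

def is_album_purchase(invoice_track_ids):
    """Return True if the invoice contains exactly one complete album."""
    invoice_set = set(invoice_track_ids)
    # As in the query, pick the lowest track id as the "first" track.
    first_track = min(invoice_set)
    album_set = album_tracks[track_album[first_track]]
    # Equal sets mean both EXCEPT results would be empty.
    return invoice_set == album_set

print(is_album_purchase([1, 2, 3]))     # complete album
print(is_album_purchase([1, 2]))        # album track missing
print(is_album_purchase([1, 2, 3, 4]))  # extra track from another album
```

Comparing in both directions matters: checking only that the album's tracks are all on the invoice would also count invoices that add extra tracks from other albums.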
