# Answering Business Questions Using SQL
## Introduction

In this project, we answer several business questions using the Chinook database, a sample database that models a digital media store. We query the database with SQLite from Python and use pandas to display the results.

In [1]:
import sqlite3
import pandas as pd

In [2]:
# Takes a SQL query as an argument and returns a pandas df of the query
def run_query(q):
    with sqlite3.connect('chinook.db') as conn:
        return pd.read_sql_query(q, conn)

# Takes a SQL command as an argument and executes it using sqlite module
def run_command(c):
    with sqlite3.connect('chinook.db') as conn:
        conn.isolation_level = None
        conn.execute(c)

# Calls run_query() function to return a list of all tables and views in the database
def show_tables():
    q = '''SELECT
            name,
            type
        FROM sqlite_master
        WHERE type IN ('table', 'view');
        '''
    return run_query(q)

In [3]:
# Show current state of the database
show_tables()

| | name | type |
|---|---|---|
| 0 | album | table |
| 1 | artist | table |
| 2 | customer | table |
| 3 | employee | table |
| 4 | genre | table |
| 5 | invoice | table |
| 6 | invoice_line | table |
| 7 | media_type | table |
| 8 | playlist | table |
| 9 | playlist_track | table |


## Selecting New Albums to Purchase

For our first query, we're interested in finding out which genres sell the best in the USA. Then, we will recommend three albums to be added to our store based on our findings. We must select three albums from the following four artists: Regal (Hip-Hop), Red Tone (Punk), Meteor and the Girls (Pop), Slim Jim Bites (Blues).

In [4]:
q1 = '''
    WITH usa_tracks_sold AS
    (
        SELECT il.* FROM invoice_line il
        INNER JOIN invoice i on il.invoice_id = i.invoice_id
        INNER JOIN customer c on i.customer_id = c.customer_id
        WHERE c.country = 'USA'
    )

    SELECT
        g.name genre,
        count(uts.invoice_line_id) tracks_sold,
        cast(count(uts.invoice_line_id) AS FLOAT) / (
            SELECT COUNT(*) from usa_tracks_sold
        ) percentage_sold
    FROM usa_tracks_sold uts
    INNER JOIN track t on t.track_id = uts.track_id
    INNER JOIN genre g on g.genre_id = t.genre_id
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 10;
    '''

# Run the query once and display the stored result
genre_sales_usa = run_query(q1)
genre_sales_usa

| | genre | tracks_sold | percentage_sold |
|---|---|---|---|
| 0 | Rock | 561 | 0.533777 |
| 1 | Alternative & Punk | 130 | 0.123692 |
| 2 | Metal | 124 | 0.117983 |
| 3 | R&B/Soul | 53 | 0.050428 |
| 4 | Blues | 36 | 0.034253 |
| 5 | Alternative | 35 | 0.033302 |
| 6 | Pop | 22 | 0.020932 |
| 7 | Latin | 22 | 0.020932 |
| 8 | Hip Hop/Rap | 20 | 0.019029 |
| 9 | Jazz | 14 | 0.013321 |


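Based on USA sales, we recommend adding albums by the following three artists to our store:

- Red Tone (Punk)
- Slim Jim Bites (Blues)
- Meteor and the Girls (Pop)

Together, these three genres account for roughly 18% of tracks sold in the USA (Alternative & Punk 12.4%, Blues 3.4%, Pop 2.1%), while Rock alone accounts for 53%. Hip Hop/Rap has the smallest share of the four candidate genres at 1.9%, so we leave out Regal. We should also look into adding Rock albums in the future.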
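As a quick check, the combined share of the three recommended genres can be recomputed from the query output. This is a minimal sketch that hard-codes values from the `percentage_sold` column; the mapping from each artist's genre to a store genre name (for example, Punk to `Alternative & Punk`) is an assumption, and the variable names are illustrative only.

```python
# Shares copied from the percentage_sold column of the query output.
usa_genre_share = {
    "Rock": 0.533777,
    "Alternative & Punk": 0.123692,
    "Blues": 0.034253,
    "Pop": 0.020932,
    "Hip Hop/Rap": 0.019029,
}

# Assumed mapping from each candidate artist to a genre name in the store.
candidates = {
    "Regal": "Hip Hop/Rap",
    "Red Tone": "Alternative & Punk",
    "Meteor and the Girls": "Pop",
    "Slim Jim Bites": "Blues",
}

# Rank the candidate artists by their genre's share of USA tracks sold.
ranked = sorted(
    candidates,
    key=lambda artist: usa_genre_share[candidates[artist]],
    reverse=True,
)
top_three = ranked[:3]

# Add up the shares of the three selected genres.
combined_share = sum(usa_genre_share[candidates[artist]] for artist in top_three)

print(top_three)
print(f"Combined share: {combined_share:.1%}")
```

Ranking by share rather than by raw track counts gives the same order here, since every share uses the same denominator.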

## Analyzing Employee Sales Performance

Now we would like to analyze employee sales performance to find out if any sales agents are performing better (or worse) than others.

In [5]:
q2 = '''
    WITH support_rep_sales AS
        (
         SELECT
             i.customer_id,
             c.support_rep_id,
             SUM(i.total) total
         FROM invoice i
         INNER JOIN customer c ON i.customer_id = c.customer_id
         GROUP BY 1,2
        )

    SELECT
        e.first_name || ' ' || e.last_name employee,
        e.hire_date,
        SUM(srs.total) total_sales
    FROM support_rep_sales srs
    INNER JOIN employee e ON e.employee_id = srs.support_rep_id
    GROUP BY 1;
     '''

# Run the query once and display the stored result
sales_per_agent = run_query(q2)
sales_per_agent

| | employee | hire_date | total_sales |
|---|---|---|---|
| 0 | Jane Peacock | 2017-04-01 00:00:00 | 1731.51 |
| 1 | Margaret Park | 2017-05-03 00:00:00 | 1584.0 |
| 2 | Steve Johnson | 2017-10-17 00:00:00 | 1393.92 |


Jane is the top sales agent in our store, with about 24% more in total sales (in dollars, not tracks) than the bottom agent, Steve. However, Jane was hired about six and a half months before Steve, so the gap may simply reflect her longer time on the job and is no cause for concern. We could analyze the data further to find out whether sales differ depending on the time of year.
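The comparison between the top and bottom agents can be reproduced with a few lines of arithmetic. The sketch below hard-codes the `total_sales` and `hire_date` values from the query output; the variable names are illustrative only.

```python
from datetime import date

# Values copied from the query output.
total_sales = {
    "Jane Peacock": 1731.51,
    "Margaret Park": 1584.00,
    "Steve Johnson": 1393.92,
}
hire_date = {
    "Jane Peacock": date(2017, 4, 1),
    "Margaret Park": date(2017, 5, 3),
    "Steve Johnson": date(2017, 10, 17),
}

# Identify the agents with the highest and lowest total sales.
top = max(total_sales, key=total_sales.get)
bottom = min(total_sales, key=total_sales.get)

# Relative gap in total sales, and how much earlier the top agent was hired.
relative_gap = total_sales[top] / total_sales[bottom] - 1
head_start_days = (hire_date[bottom] - hire_date[top]).days

print(f"{top} sold {relative_gap:.1%} more than {bottom}")
print(f"{top} was hired {head_start_days} days earlier")
```

A fairer comparison would divide each agent's sales by time employed, but that needs a reference date for the end of the data, which the output above does not provide.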

## Analyzing Sales by Country

Now we will analyze sales by country to see which countries have the most potential for growth. Countries with only one customer will be grouped as "Other".
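The "Other" grouping rule can be sketched in plain Python before reading the SQL. The customer list below is hypothetical and is not taken from `chinook.db`; `country_or_other` here is an illustrative helper that mirrors the role of the `CASE` expression in the query.

```python
from collections import Counter

# Hypothetical (customer_id, country) pairs, for illustration only.
customers = [
    (1, "USA"),
    (2, "USA"),
    (3, "Canada"),
    (4, "Canada"),
    (5, "Chile"),
    (6, "Norway"),
]

# Count how many customers each country has.
customers_per_country = Counter(country for _, country in customers)


def country_or_other(country):
    """Return "Other" for countries with exactly one customer."""
    return "Other" if customers_per_country[country] == 1 else country


# Re-count customers after collapsing single-customer countries.
grouped = Counter(country_or_other(country) for _, country in customers)
print(dict(grouped))
```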

In [6]:
q3 = '''
    WITH country_or_other AS
        (
         SELECT
           CASE
               WHEN (
                     SELECT count(*)
                     FROM customer
                     where country = c.country
                    ) = 1 THEN 'Other'
               ELSE c.country
           END AS country,
           c.customer_id,
           il.*
         FROM invoice_line il
         INNER JOIN invoice i ON i.invoice_id = il.invoice_id
         INNER JOIN customer c ON c.customer_id = i.customer_id
        )

    SELECT
        country,
        customers,
        total_sales,
        average_order,
        customer_lifetime_value
    FROM
        (
        SELECT
            country,
            count(distinct customer_id) customers,
            SUM(unit_price) total_sales,
            SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,
            SUM(unit_price) / count(distinct invoice_id) average_order,
            CASE
                WHEN country = 'Other' THEN 1
                ELSE 0
            END AS sort
        FROM country_or_other
        GROUP BY country
        )
    ORDER BY sort ASC, total_sales DESC;
     '''

# Run the query once and display the stored result
sales_by_country = run_query(q3)
sales_by_country

| | country | customers | total_sales | average_order | customer_lifetime_value |
|---|---|---|---|---|---|
| 0 | USA | 13 | 1040.49 | 7.942672 | 80.037692 |
| 1 | Canada | 8 | 535.59 | 7.047237 | 66.94875 |
| 2 | Brazil | 5 | 427.68 | 7.011148 | 85.536 |
| 3 | France | 5 | 389.07 | 7.7814 | 77.814 |
| 4 | Germany | 4 | 334.62 | 8.161463 | 83.655 |
| 5 | Czech Republic | 2 | 273.24 | 9.108 | 136.62 |
| 6 | United Kingdom | 3 | 245.52 | 8.768571 | 81.84 |
| 7 | Portugal | 2 | 185.13 | 6.383793 | 92.565 |
| 8 | India | 2 | 183.15 | 8.721429 | 91.575 |
| 9 | Other | 15 | 1094.94 | 7.448571 | 72.996 |


Based on this data, Czech Republic, United Kingdom, and India seem to have the most potential for growth. These three countries have the highest average order values, and Czech Republic also has by far the highest customer lifetime value. However, each of them has only two or three customers in this data, so the sample is too small to draw definitive conclusions, and we would have to be careful when targeting sales in these countries.
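To see how the countries rank on each metric, the result rows can be sorted directly. This is a minimal sketch that hard-codes the `average_order` and `customer_lifetime_value` columns from the output and leaves out the "Other" group; the helper `top_n` is illustrative only.

```python
# (country, customers, average_order, customer_lifetime_value) from the output.
rows = [
    ("USA", 13, 7.942672, 80.037692),
    ("Canada", 8, 7.047237, 66.94875),
    ("Brazil", 5, 7.011148, 85.536),
    ("France", 5, 7.7814, 77.814),
    ("Germany", 4, 8.161463, 83.655),
    ("Czech Republic", 2, 9.108, 136.62),
    ("United Kingdom", 3, 8.768571, 81.84),
    ("Portugal", 2, 6.383793, 92.565),
    ("India", 2, 8.721429, 91.575),
]


def top_n(column_index, n=3):
    """Return the n country names with the largest value in the given column."""
    ranked = sorted(rows, key=lambda row: row[column_index], reverse=True)
    return [row[0] for row in ranked[:n]]


top_by_average_order = top_n(2)
top_by_lifetime_value = top_n(3)

print("Top by average order:", top_by_average_order)
print("Top by customer lifetime value:", top_by_lifetime_value)
```

The two rankings are not identical, which is one more reason to treat conclusions drawn from two or three customers per country with caution.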

## Albums vs Individual Tracks

To save money, our store is considering changing its purchasing strategy to buy only the most popular tracks from each album rather than every track on an album. We need to find out what percentage of purchases are individual tracks versus whole albums.

In [7]:
q4 = '''
    WITH invoice_first_track AS
        (
         SELECT
             il.invoice_id invoice_id,
             MIN(il.track_id) first_track_id
         FROM invoice_line il
         GROUP BY 1
        )

    SELECT
        album_purchase,
        COUNT(invoice_id) number_of_invoices,
        CAST(count(invoice_id) AS FLOAT) / (
                                             SELECT COUNT(*) FROM invoice
                                          ) percent
    FROM
        (
        SELECT
            ifs.*,
            CASE
                WHEN
                     (
                      SELECT t.track_id FROM track t
                      WHERE t.album_id = (
                                          SELECT t2.album_id FROM track t2
                                          WHERE t2.track_id = ifs.first_track_id
                                         ) 

                      EXCEPT 

                      SELECT il2.track_id FROM invoice_line il2
                      WHERE il2.invoice_id = ifs.invoice_id
                     ) IS NULL
                 AND
                     (
                      SELECT il2.track_id FROM invoice_line il2
                      WHERE il2.invoice_id = ifs.invoice_id

                      EXCEPT 

                      SELECT t.track_id FROM track t
                      WHERE t.album_id = (
                                          SELECT t2.album_id FROM track t2
                                          WHERE t2.track_id = ifs.first_track_id
                                         ) 
                     ) IS NULL
                 THEN 'yes'
                 ELSE 'no'
             END AS "album_purchase"
         FROM invoice_first_track ifs
        )
    GROUP BY album_purchase;
     '''

# Run the query once and display the stored result
album_purchases = run_query(q4)
album_purchases

| | album_purchase | number_of_invoices | percent |
|---|---|---|---|
| 0 | no | 500 | 0.814332 |
| 1 | yes | 114 | 0.185668 |


According to our results, 81.4% of invoices are purchases of individual tracks, while 18.6% are purchases of whole albums. Because album purchases still make up nearly a fifth of all invoices, we would not recommend implementing the proposed strategy, as the store would most likely lose revenue in the long run.
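The two `EXCEPT` subqueries in the query amount to a set-equality test: an invoice counts as an album purchase only if no album track is missing from the invoice and no invoice track lies outside the album. A minimal Python sketch of that logic, using hypothetical track IDs rather than data from `chinook.db`:

```python
def is_album_purchase(invoice_tracks, album_tracks):
    """Return True when the invoice contains exactly the album's tracks."""
    # Mirrors the first EXCEPT: album tracks that are not on the invoice.
    missing_from_invoice = album_tracks - invoice_tracks
    # Mirrors the second EXCEPT: invoice tracks that are not on the album.
    extra_on_invoice = invoice_tracks - album_tracks
    return not missing_from_invoice and not extra_on_invoice


# Hypothetical album with three tracks.
album = {101, 102, 103}

whole_album = is_album_purchase({101, 102, 103}, album)
partial_album = is_album_purchase({101, 102}, album)
album_plus_extra = is_album_purchase({101, 102, 103, 250}, album)

print(whole_album, partial_album, album_plus_extra)

# The shares in the output follow from the invoice counts.
album_share = 114 / (114 + 500)
print(f"{album_share:.1%}")
```

Checking both directions matters: testing only that every album track is on the invoice would also label an invoice containing the album plus extra tracks as an album purchase.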

## Conclusion

We were able to successfully answer business questions using advanced SQL queries for our digital media store. We can answer even more complex questions using SQL in the future to help improve our business and profits.

