# Answering Business Questions With SQL

This notebook uses the sqlite3 module and pandas to query a SQLite database and answer business questions about album sales, purchasing behavior, and sales by country.

In [1]:
import pandas as pd
import sqlite3

## Helper Functions

Let's start off by making the following helper functions:

run_query(): takes a SQL query as an argument and returns a pandas dataframe of that query.

In [2]:
def run_query(query):
    with sqlite3.connect("chinook.db") as conn:
        return pd.read_sql(query, conn)

run_command(): takes a SQL command as an argument and executes it using the sqlite3 module.

In [3]:
def run_command(query):
    with sqlite3.connect("chinook.db") as conn:
        conn.isolation_level = None
        conn.execute(query)
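One detail worth knowing about the helpers above: `with sqlite3.connect(...) as conn` commits or rolls back the transaction, but it does not close the connection. The following is a minimal sketch of a variant that also closes it, using `contextlib.closing`. The names `run_command_closing`, `run_query_rows`, `demo_path`, and the `demo_genre` table are hypothetical and exist only for this illustration; the sketch writes to a scratch file rather than chinook.db and returns plain tuples instead of a dataframe.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical scratch database so the sketch never touches chinook.db.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")

def run_command_closing(command, db_path=demo_path):
    # closing() guarantees conn.close(); the inner `with conn` commits on
    # success and rolls back if the statement raises.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(command)

def run_query_rows(query, db_path=demo_path):
    # Returns a list of tuples rather than a pandas dataframe.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(query).fetchall()

run_command_closing(
    "CREATE TABLE demo_genre (genre_id INTEGER PRIMARY KEY, name TEXT)"
)
run_command_closing("INSERT INTO demo_genre (name) VALUES ('Rock')")
rows = run_query_rows("SELECT genre_id, name FROM demo_genre")
print(rows)  # [(1, 'Rock')]
```

For a short notebook the original helpers are fine; the explicit close mostly matters in long-running sessions that open many connections.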

show_tables(): calls the run_query() function to return a list of all tables and views in the database.

In [4]:
def show_tables():
    query = """
        SELECT
            name,
            type
        FROM sqlite_master
        WHERE type IN ('table', 'view');
    """
    return run_query(query)

Now, let's run the show_tables() function.

In [5]:
show_tables()

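|   | name | type |
| - | ----------- | ----------- |
| 0 | album | table |
| 1 | artist | table |
| 2 | customer | table |
| 3 | employee | table |
| 4 | genre | table |
| 5 | invoice | table |
| 6 | invoice_line | table |
| 7 | media_type | table |
| 8 | playlist | table |
| 9 | playlist_track | table |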
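To see what the `type IN (...)` filter in show_tables() keeps and drops, here is a small sketch against an in-memory database. The objects `demo_album`, `demo_album_titles`, and `demo_album_title_idx` are made up for the illustration and are not part of Chinook.

```python
import sqlite3

# In-memory scratch database with one table, one view, and one index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_album (album_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE VIEW demo_album_titles AS SELECT title FROM demo_album")
conn.execute("CREATE INDEX demo_album_title_idx ON demo_album (title)")

# sqlite_master also lists indexes and triggers; the filter leaves them out.
objects = conn.execute(
    """
    SELECT name, type
    FROM sqlite_master
    WHERE type IN ('table', 'view')
    ORDER BY name
    """
).fetchall()
conn.close()

print(objects)  # [('demo_album', 'table'), ('demo_album_titles', 'view')]
```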


## Selecting Albums to Purchase

The Chinook record store has just signed a deal with a record label. We are tasked with selecting 3 albums from a list of 4 albums to add to the store. The record label has given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.

We will start by writing a query that returns each genre and the number of tracks sold in the USA in both absolute numbers and percentages.

In [6]:
best_selling_usa_genres_query = """
    WITH tracks_sold_by_genre AS (
        SELECT
            g.name AS genre,
            COUNT(t.track_id) AS num_tracks_sold
        FROM
            track t
            LEFT JOIN genre g
                ON t.genre_id = g.genre_id
            LEFT JOIN invoice_line il
                ON t.track_id = il.track_id
            LEFT JOIN invoice i
                ON il.invoice_id = i.invoice_id
        WHERE
            i.billing_country = 'USA'
        GROUP BY genre
    )
    SELECT
        genre,
        num_tracks_sold,
        ROUND(100.0 * num_tracks_sold / (SELECT SUM(num_tracks_sold) FROM tracks_sold_by_genre), 2) AS pct_of_usa_tracks_sold
    FROM tracks_sold_by_genre
    ORDER BY num_tracks_sold DESC;
"""

run_query(best_selling_usa_genres_query)

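|   | genre | num_tracks_sold | pct_of_usa_tracks_sold |
| - | ----------- | ----------- | ----------- |
| 0 | Rock | 561 | 53.38 |
| 1 | Alternative & Punk | 130 | 12.37 |
| 2 | Metal | 124 | 11.8 |
| 3 | R&B/Soul | 53 | 5.04 |
| 4 | Blues | 36 | 3.43 |
| 5 | Alternative | 35 | 3.33 |
| 6 | Latin | 22 | 2.09 |
| 7 | Pop | 22 | 2.09 |
| 8 | Hip Hop/Rap | 20 | 1.9 |
| 9 | Jazz | 14 | 1.33 |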
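The percentage column relies on two things: a scalar subquery that totals the CTE, and the `100.0` literal that forces floating-point division. The sketch below shows both on a toy table. `demo_sales` and its four rows are invented for the illustration and do not come from Chinook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sales (genre TEXT)")
conn.executemany(
    "INSERT INTO demo_sales VALUES (?)",
    [("Rock",), ("Rock",), ("Rock",), ("Pop",)],
)

# Same shape as the notebook query: count per genre in a CTE, then divide
# each count by the CTE-wide total.
share_rows = conn.execute(
    """
    WITH sold AS (
        SELECT genre, COUNT(*) AS num_sold
        FROM demo_sales
        GROUP BY genre
    )
    SELECT
        genre,
        num_sold,
        ROUND(100.0 * num_sold / (SELECT SUM(num_sold) FROM sold), 2) AS pct
    FROM sold
    ORDER BY num_sold DESC
    """
).fetchall()

# With integer operands SQLite truncates the division, hence the 100.0.
int_pct, float_pct = conn.execute("SELECT 100 * 1 / 3, 100.0 * 1 / 3").fetchone()
conn.close()

print(share_rows)  # [('Rock', 3, 75.0), ('Pop', 1, 25.0)]
print(int_pct)     # 33
```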


The four albums that we have to choose from are:

| Artist Name | Genre |
| ----------- | ----------- |
| Regal | Hip-Hop |
| Red Tone | Punk |
| Meteor and the Girls | Pop |
| Slim Jim Bites | Blues |

Based on our table of genres and number of tracks sold in the USA, we should purchase the Red Tone (Punk), Slim Jim Bites (Blues), and Meteor and the Girls (Pop) albums for the store, since each of these genres sold more tracks in the USA than Hip Hop/Rap.
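The selection can be reproduced with a few lines of Python. This is a sketch under one assumption: that the label's genres map onto the store's genre names as "Punk" to "Alternative & Punk" and "Hip-Hop" to "Hip Hop/Rap". The counts are copied from the query output above.

```python
# USA track counts for the relevant genres, copied from the query output.
usa_tracks_sold = {
    "Alternative & Punk": 130,
    "Blues": 36,
    "Pop": 22,
    "Hip Hop/Rap": 20,
}

# Assumed mapping from each candidate artist to a store genre.
candidates = {
    "Regal": "Hip Hop/Rap",
    "Red Tone": "Alternative & Punk",
    "Meteor and the Girls": "Pop",
    "Slim Jim Bites": "Blues",
}

# Rank the artists by how many tracks their genre sold, then keep three.
ranked = sorted(
    candidates,
    key=lambda artist: usa_tracks_sold[candidates[artist]],
    reverse=True,
)
top_three = ranked[:3]
print(top_three)  # ['Red Tone', 'Slim Jim Bites', 'Meteor and the Girls']
```

Note that Pop (22) and Hip Hop/Rap (20) are only two tracks apart, so the third pick is a close call.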

## Analyzing Sales By Country

We will now analyze sales for each country. For each country, we will look at the total number of customers, total value of sales, average value of sales per customer, and average order value. Because several countries have only one customer, we will group those countries under "Other".

In [7]:
sales_by_country = '''
WITH country_or_other AS
    (
     SELECT
       CASE
           WHEN (
                 SELECT count(*)
                 FROM customer
                 where country = c.country
                ) = 1 THEN 'Other'
           ELSE c.country
       END AS country,
       c.customer_id,
       il.*
     FROM invoice_line il
     INNER JOIN invoice i ON i.invoice_id = il.invoice_id
     INNER JOIN customer c ON c.customer_id = i.customer_id
    )

SELECT
    country,
    customers,
    total_sales,
    average_order,
    customer_lifetime_value
FROM
    (
    SELECT
        country,
        count(distinct customer_id) customers,
        SUM(unit_price) total_sales,
        SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,
        SUM(unit_price) / count(distinct invoice_id) average_order,
        CASE
            WHEN country = 'Other' THEN 1
            ELSE 0
        END AS sort
    FROM country_or_other
    GROUP BY country
    )
ORDER BY sort ASC, total_sales DESC;
'''

run_query(sales_by_country)

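|   | country | customers | total_sales | average_order | customer_lifetime_value |
| - | ----------- | ----------- | ----------- | ----------- | ----------- |
| 0 | USA | 13 | 1040.49 | 7.942672 | 80.037692 |
| 1 | Canada | 8 | 535.59 | 7.047237 | 66.94875 |
| 2 | Brazil | 5 | 427.68 | 7.011148 | 85.536 |
| 3 | France | 5 | 389.07 | 7.7814 | 77.814 |
| 4 | Germany | 4 | 334.62 | 8.161463 | 83.655 |
| 5 | Czech Republic | 2 | 273.24 | 9.108 | 136.62 |
| 6 | United Kingdom | 3 | 245.52 | 8.768571 | 81.84 |
| 7 | Portugal | 2 | 185.13 | 6.383793 | 92.565 |
| 8 | India | 2 | 183.15 | 8.721429 | 91.575 |
| 9 | Other | 15 | 1094.94 | 7.448571 | 72.996 |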
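To make the metric definitions and the "Other" bucket concrete, here is a scaled-down sketch of the same query on toy data. The tables `demo_customer` and `demo_invoice_line` and all their rows are invented for the illustration, and the schema is simplified (the customer id sits directly on the invoice line).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demo_customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE demo_invoice_line (
        invoice_id INTEGER, customer_id INTEGER, unit_price REAL
    );
    INSERT INTO demo_customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile'), (4, 'Italy');
    INSERT INTO demo_invoice_line VALUES
        (10, 1, 1.0), (10, 1, 1.0),  -- customer 1: one invoice, two lines
        (11, 2, 1.0), (12, 2, 1.0),  -- customer 2: two invoices
        (13, 3, 2.0),                -- Chile has a single customer
        (14, 4, 4.0);                -- Italy has a single customer
    """
)

country_rows = conn.execute(
    """
    WITH labelled AS (
        SELECT
            CASE
                WHEN (SELECT COUNT(*) FROM demo_customer
                      WHERE country = c.country) = 1 THEN 'Other'
                ELSE c.country
            END AS country,
            il.customer_id, il.invoice_id, il.unit_price
        FROM demo_invoice_line il
        INNER JOIN demo_customer c ON c.customer_id = il.customer_id
    )
    SELECT
        country,
        COUNT(DISTINCT customer_id) AS customers,
        SUM(unit_price) AS total_sales,
        SUM(unit_price) / COUNT(DISTINCT invoice_id) AS average_order,
        SUM(unit_price) / COUNT(DISTINCT customer_id) AS customer_lifetime_value
    FROM labelled
    GROUP BY country
    -- Push 'Other' to the bottom, then sort the rest by total sales.
    ORDER BY CASE WHEN country = 'Other' THEN 1 ELSE 0 END, total_sales DESC
    """
).fetchall()
conn.close()

for row in country_rows:
    print(row)
```

In this toy data "Other" has the larger total (6.0 against 4.0 for USA) yet still sorts last, which is the effect the `sort` column has in the notebook query.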


Based on the data, there may be opportunity in the following countries:

* Czech Republic
* United Kingdom
* India
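The notebook does not state which metric drives this shortlist. One reading that reproduces it is ranking the named countries by average order value. The sketch below assumes that criterion and uses figures copied from the output table, with the "Other" row left out.

```python
# (country, customers, average_order) copied from the query output.
country_stats = [
    ("USA", 13, 7.942672),
    ("Canada", 8, 7.047237),
    ("Brazil", 5, 7.011148),
    ("France", 5, 7.7814),
    ("Germany", 4, 8.161463),
    ("Czech Republic", 2, 9.108),
    ("United Kingdom", 3, 8.768571),
    ("Portugal", 2, 6.383793),
    ("India", 2, 8.721429),
]

# Assumed criterion: highest average order value first.
by_average_order = sorted(country_stats, key=lambda row: row[2], reverse=True)
shortlist = [country for country, _, _ in by_average_order[:3]]
shortlist_customers = [customers for _, customers, _ in by_average_order[:3]]

print(shortlist)            # ['Czech Republic', 'United Kingdom', 'India']
print(shortlist_customers)  # [2, 3, 2]
```

The customer counts make the caveat visible: every shortlisted country has at most three customers.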

It's worth keeping in mind that the amount of data from each of these countries is relatively low. Because of this, we should be cautious about spending too much money on new marketing campaigns, as the sample size is not large enough to give us high confidence. A better approach would be to run small campaigns in these countries, then collect and analyze data on the new customers to make sure that these trends hold.