# SQL data analysis

In this project I use the Chinook database, which contains information about the artists, songs, and albums of a music shop, as well as information on the shop's employees, customers, and the customers' purchases. This information is held in eleven tables.

In [3]:
%%capture
# Connect the Jupyter Notebook to the database file
%load_ext sql
%sql sqlite:///chinook.db

'Connected: None@chinook.db'

### Overview of the Data

In [5]:
%%sql
SELECT
    name,
    type
 FROM sqlite_master
 WHERE type IN ('table','view');

Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


In [6]:
%%sql
SELECT *
 FROM album
 LIMIT 3

Done.


album_id,title,artist_id
1,For Those About To Rock We Salute You,1
2,Balls to the Wall,2
3,Restless and Wild,2


In [7]:
%%sql
SELECT *
 FROM playlist_track
 LIMIT 3

Done.


playlist_id,track_id
1,3402
1,3389
1,3390


In [8]:
%%sql
SELECT *
 FROM customer
 LIMIT 3

Done.


customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
2,Leonie,Köhler,,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,+49 0711 2842222,,leonekohler@surfeu.de,5
3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3


In [9]:
%%sql
SELECT *
 FROM cust
 LIMIT 3

Done.


country,total_cust,total_sales,avg,cust_name
Other,1,39.6,7.919999999999999,Diego
Other,1,81.17999999999995,8.117999999999995,Mark
Other,1,69.30000000000001,7.700000000000001,Astrid


I use the existing data to decide which new artists to add to the Chinook store, based on the genres that customers in the USA are buying. The first step is therefore to find out which genres sell best in the USA.

In [29]:
%%sql
WITH usa AS
 ( -- Every invoice line bought by a customer located in the USA
   SELECT c.*, il.*
    FROM customer c
    INNER JOIN invoice i ON i.customer_id=c.customer_id
    INNER JOIN invoice_line il ON il.invoice_id=i.invoice_id
    WHERE c.country = 'USA'
 )

SELECT
    g.name Genre,
    -- Rows are grouped by genre only, so Artist_Name is one arbitrary artist from each genre, not its top seller
    ar.name Artist_Name,
    COUNT(u.invoice_line_id) Tracks_Sold,
    -- Share of all USA track sales, as a percentage
    CAST(COUNT(u.invoice_line_id) AS FLOAT)/(SELECT COUNT(*) FROM usa) *100 Per_Sold
    FROM usa u
    INNER JOIN track t ON t.track_id = u.track_id
    INNER JOIN genre g ON g.genre_id = t.genre_id
    INNER JOIN album al ON al.album_id = t.album_id
    INNER JOIN artist ar ON ar.artist_id = al.artist_id
    GROUP BY 1
    ORDER BY 3 DESC
    LIMIT 8
    

Done.


Genre,Artist_Name,Tracks_Sold,Per_Sold
Rock,The Who,561,53.37773549000951
Alternative & Punk,Green Day,130,12.369172216936253
Metal,Godsmack,124,11.798287345385347
R&B/Soul,Amy Winehouse,53,5.042816365366318
Blues,Buddy Guy,36,3.425309229305423
Alternative,Chris Cornell,35,3.3301617507136063
Latin,Eric Clapton,22,2.093244529019981
Pop,U2,22,2.093244529019981


Based on the tracks sold in the USA across different genres, we should purchase new albums by the following artists:
1. Red Tone
2. Slim Jim Bites
3. Meteor and the Girls
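
The genre ranking above can also be reproduced outside the notebook magic. The following is a minimal sketch using Python's built-in `sqlite3` module, grouping by genre only and leaving the artist column out. The helper name `genre_share` and the tiny in-memory dataset are made up for illustration and are not part of the Chinook data; only the table and column names follow the queries above.

```python
import sqlite3


def genre_share(conn, country):
    """Return (genre, tracks_sold, pct_sold) rows for one country, best seller first."""
    query = """
        WITH sold AS (
            -- One row per track sold to a customer in the given country
            SELECT t.genre_id
            FROM customer c
            INNER JOIN invoice i ON i.customer_id = c.customer_id
            INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
            INNER JOIN track t ON t.track_id = il.track_id
            WHERE c.country = ?
        )
        SELECT g.name,
               COUNT(*) AS tracks_sold,
               ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM sold), 1) AS pct_sold
        FROM sold s
        INNER JOIN genre g ON g.genre_id = s.genre_id
        GROUP BY g.name
        ORDER BY tracks_sold DESC, g.name
    """
    return conn.execute(query, (country,)).fetchall()


# Hypothetical toy data: two genres, three tracks, one USA and one Canadian customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE invoice_line (
        invoice_line_id INTEGER PRIMARY KEY, invoice_id INTEGER, track_id INTEGER
    );
    INSERT INTO genre VALUES (1, 'Rock'), (2, 'Blues');
    INSERT INTO track VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO customer VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO invoice VALUES (1, 1), (2, 2);
    INSERT INTO invoice_line VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 1), (5, 2, 3);
""")

rows = genre_share(conn, "USA")
for row in rows:
    print(row)
```

Against the real database, the same function could be called with `sqlite3.connect("chinook.db")` in place of the in-memory connection.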

### Analyzing Employee Sales Performance

Each customer of the Chinook store is assigned to a sales support agent when they first make a purchase. The task is to analyze the purchases of the customers assigned to each employee to see how each sales support agent is performing.

In [42]:
%%sql
SELECT
    SUM(i.total) Amount,
    e.first_name||' '||e.last_name Emp_Name,
    -- Rows are grouped by employee, so Cust_Name shows just one of that employee's customers
    c.first_name||' '||c.last_name Cust_Name,
    e.hire_date
    FROM employee e
    INNER JOIN customer c ON c.support_rep_id=e.employee_id
    INNER JOIN invoice i ON i.customer_id=c.customer_id
    WHERE e.title = 'Sales Support Agent'
    GROUP BY 2
    ORDER BY 1 DESC

Done.


Amount,Emp_Name,Cust_Name,hire_date
1731.510000000004,Jane Peacock,Phil Hughes,2017-04-01 00:00:00
1584.0000000000034,Margaret Park,Dan Miller,2017-05-03 00:00:00
1393.920000000002,Steve Johnson,Mark Philips,2017-10-17 00:00:00


Jane Peacock's total is about $338 higher than Steve Johnson's. This may be explained by the difference in their hire dates, since Jane was hired roughly six months earlier.
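
One way to check the hire-date explanation is to divide each agent's total by the time they have been employed. The sketch below does this with Python's built-in `sqlite3` module on a made-up in-memory dataset. The per-day metric, the `as_of` cut-off date, and all of the sample rows are assumptions introduced for illustration; they are not part of the original analysis or of the Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
        title TEXT, hire_date TEXT
    );
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, support_rep_id INTEGER);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- Hypothetical rows, chosen so the arithmetic is easy to follow
    INSERT INTO employee VALUES
        (1, 'Jane', 'Peacock', 'Sales Support Agent', '2017-01-01 00:00:00'),
        (2, 'Steve', 'Johnson', 'Sales Support Agent', '2017-12-22 00:00:00');
    INSERT INTO customer VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO invoice VALUES (1, 1, 400.0), (2, 2, 330.0), (3, 3, 50.0);
""")

query = """
    SELECT e.first_name || ' ' || e.last_name AS emp_name,
           e.hire_date,
           SUM(i.total) AS amount,
           -- julianday() differences are measured in days
           SUM(i.total) / (julianday(:as_of) - julianday(e.hire_date)) AS amount_per_day
    FROM employee e
    INNER JOIN customer c ON c.support_rep_id = e.employee_id
    INNER JOIN invoice i ON i.customer_id = c.customer_id
    WHERE e.title = 'Sales Support Agent'
    GROUP BY e.employee_id
    ORDER BY amount DESC
"""

# as_of is an assumed cut-off date for the comparison
rows = conn.execute(query, {"as_of": "2018-01-01"}).fetchall()
for emp_name, hire_date, amount, amount_per_day in rows:
    print(emp_name, hire_date, amount, round(amount_per_day, 2))
```

In this toy data the agent with the larger total has the lower amount per day, which is the kind of pattern that would support the hire-date explanation.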

### Analyzing Sales by Country

In [73]:
%%sql

CREATE VIEW cust AS
  SELECT
    CASE
        -- Where a country has only one customer, label it as "Other"
        WHEN (
                SELECT COUNT(*)
                FROM customer
                WHERE country = c.country
                ) = 1 THEN 'Other'
        ELSE country
    END AS Country,
    COUNT(DISTINCT c.customer_id) Total_Customers,
    SUM(il.unit_price) Total_Sales,
    -- Divides by the number of invoices, so this is the average value per order
    SUM(il.unit_price)/COUNT(DISTINCT i.invoice_id) AVG_Sales_Per_Customer,
    c.first_name Customer_name

    FROM customer c
    INNER JOIN invoice i ON i.customer_id =c.customer_id
    INNER JOIN invoice_line il ON il.invoice_id=i.invoice_id
    -- Grouped by the customer's country column, so each single-customer country keeps its own "Other" row
    GROUP BY country;


Done.


[]

In [75]:
%%sql
SELECT * 
 FROM cust

Done.


Country,Total_Customers,Total_Sales,AVG_Sales_Per_Customer,Customer_name
Other,1,39.6,7.919999999999999,Diego
Other,1,81.17999999999995,8.117999999999995,Mark
Other,1,69.30000000000001,7.700000000000001,Astrid
Other,1,60.39000000000004,8.627142857142863,Daan
Brazil,5,427.6800000000025,7.011147540983647,Fernanda
Canada,8,535.5900000000034,7.047236842105309,Ellie
Other,1,97.01999999999988,7.463076923076913,Luis
Czech Republic,2,273.24000000000103,9.108000000000034,František
Other,1,37.61999999999999,3.761999999999999,Kara
Other,1,79.19999999999996,7.199999999999997,Terhi
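
In the output above, each single-customer country appears as its own "Other" row. If the goal is a single combined "Other" group, one option is to compute the label first and then group by it. The sketch below illustrates that with Python's built-in `sqlite3` module on made-up data. It also reports the average per customer and the average per order as separate columns. The CTE name `tagged`, the column names `country_group` and `avg_order_value`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE invoice_line (
        invoice_line_id INTEGER PRIMARY KEY, invoice_id INTEGER, unit_price REAL
    );
    -- Hypothetical rows: two USA customers, one customer each in Chile and Peru
    INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile'), (4, 'Peru');
    INSERT INTO invoice VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 4);
    INSERT INTO invoice_line VALUES
        (1, 1, 2.0), (2, 1, 2.0), (3, 2, 1.0), (4, 3, 1.0),
        (5, 4, 2.0), (6, 5, 1.0), (7, 5, 1.0);
""")

query = """
    WITH tagged AS (
        -- Label each invoice line with its country, or 'Other' for single-customer countries
        SELECT CASE
                   WHEN (SELECT COUNT(*) FROM customer WHERE country = c.country) = 1
                   THEN 'Other'
                   ELSE c.country
               END AS country_group,
               c.customer_id,
               i.invoice_id,
               il.unit_price
        FROM customer c
        INNER JOIN invoice i ON i.customer_id = c.customer_id
        INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
    )
    SELECT country_group,
           COUNT(DISTINCT customer_id) AS total_customers,
           SUM(unit_price) AS total_sales,
           SUM(unit_price) / COUNT(DISTINCT customer_id) AS avg_sales_per_customer,
           SUM(unit_price) / COUNT(DISTINCT invoice_id) AS avg_order_value
    FROM tagged
    GROUP BY country_group
    ORDER BY country_group
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Grouping by the computed label rather than by the raw `country` column is what merges Chile and Peru into one row here.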


In [93]:
%%sql
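SELECT *,
    CASE
        WHEN Country ='Other' THEN 1
        ELSE 0
        END
        AS Sorting
    FROM cust
    -- GROUP BY collapses the "Other" rows into a single arbitrary row
    GROUP BY country
    -- Sorting is computed above but is not used in this ORDER BY
    ORDER BY AVG_Sales_Per_Customer DESC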


Done.


Country,Total_Customers,Total_Sales,AVG_Sales_Per_Customer,Customer_name,Sorting
Czech Republic,2,273.24000000000103,9.108000000000034,František,0
United Kingdom,3,245.5200000000008,8.768571428571457,Phil,0
India,2,183.1500000000002,8.72142857142858,Puja,0
Germany,4,334.6200000000016,8.161463414634186,Leonie,0
USA,13,1040.490000000008,7.942671755725252,Dan,0
France,5,389.0700000000021,7.781400000000042,Wyatt,0
Other,1,75.23999999999998,7.523999999999998,Joakim,1
Canada,8,535.5900000000034,7.047236842105309,Ellie,0
Brazil,5,427.6800000000025,7.011147540983647,Fernanda,0
Portugal,2,185.13000000000025,6.383793103448284,Madalena,0


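According to these results, the following countries have the highest average sales (AVG_Sales_Per_Customer divides total sales by the number of invoices, so this is the average value per order):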
1. Czech Republic
2. United Kingdom
3. India
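
The `Sorting` column in the query above is computed but never used in the `ORDER BY`, so "Other" lands in the middle of the ranking. If the intent is to push "Other" to the bottom, the flag can be used as the first sort key. The following is a minimal sketch with Python's built-in `sqlite3` module. The table name `country_summary`, the column name `avg_order_value`, and the values in it are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical summary rows standing in for the per-country results
    CREATE TABLE country_summary (country TEXT, avg_order_value REAL);
    INSERT INTO country_summary VALUES
        ('Other', 9.5), ('USA', 8.0), ('Canada', 7.0), ('India', 8.7);
""")

rows = conn.execute("""
    SELECT country, avg_order_value
    FROM country_summary
    ORDER BY
        CASE WHEN country = 'Other' THEN 1 ELSE 0 END,  -- 'Other' sorts after real countries
        avg_order_value DESC                            -- then highest average first
""").fetchall()

ordered = [country for country, _ in rows]
print(ordered)
```

Here "Other" has the highest average in the toy data but still sorts last, because the flag is compared before the average.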

Using the Chinook data, I analyzed the top-selling genres in the USA, employee sales performance, and sales by country.