# Dashboard for analytic queries to the Sparkify Data Warehouse

Here are examples of possible analytic queries. You can write your own queries in plain SQL. The database structure is described in the README.

In [1]:
%load_ext sql

In [2]:
import configparser

First, connect to the data warehouse (Amazon Redshift); then we can answer some analytical questions.
> The `CLUSTER` section of the configuration file `dwh.cfg` must be filled in with the Amazon Redshift cluster parameters.
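
For reference, the cell below reads five keys from the `CLUSTER` section. A minimal sketch of what that section might look like, parsed with `configparser.read_string` so it runs without a real `dwh.cfg`. Every value here is a hypothetical placeholder, not a real cluster setting; only the key names come from this notebook.

```python
import configparser

# Hypothetical dwh.cfg contents: key names match the ones read below,
# every value is a placeholder.
SAMPLE_CFG = """
[CLUSTER]
HOST = example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME = dev
DB_USER = sparkifydwhuser
DB_PASSWORD = change-me
DB_PORT = 5439
"""

REQUIRED_KEYS = ("HOST", "DB_NAME", "DB_USER", "DB_PASSWORD", "DB_PORT")

config = configparser.ConfigParser()
config.read_string(SAMPLE_CFG)

# Fail early with a clear message if a key the notebook needs is missing.
missing = [key for key in REQUIRED_KEYS if not config.has_option("CLUSTER", key)]
if missing:
    raise KeyError(f"dwh.cfg [CLUSTER] is missing: {', '.join(missing)}")

print(config.get("CLUSTER", "DB_PORT"))  # prints 5439
```

If the real password contains `%`, the default `ConfigParser` interpolation raises an error when the value is read with `get`; creating the parser as `configparser.ConfigParser(interpolation=None)` reads values literally.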

In [6]:
config = configparser.ConfigParser()
# Read the cluster settings; the `with` block closes the file afterwards
with open('dwh.cfg') as cfg_file:
    config.read_file(cfg_file)

# Connection parameters from the [CLUSTER] section
DWH_ENDPOINT = config.get("CLUSTER", "HOST")
DWH_PORT = config.get("CLUSTER", "DB_PORT")
DWH_DB = config.get("CLUSTER", "DB_NAME")
DWH_DB_USER = config.get("CLUSTER", "DB_USER")
DWH_DB_PASSWORD = config.get("CLUSTER", "DB_PASSWORD")

In [7]:
conn_string="postgresql://{}:{}@{}:{}/{}".format(DWH_DB_USER, DWH_DB_PASSWORD, DWH_ENDPOINT, DWH_PORT, DWH_DB)
%sql $conn_string

'Connected: sparkifydwhuser@dev'
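
The `format` call above inserts the password into the URL verbatim, so a password containing `@`, `:` or `/` would break URL parsing. A minimal sketch of a safer variant, assuming the same five variables; it percent-encodes the user name and password with `urllib.parse.quote_plus`, the approach SQLAlchemy's documentation recommends for special characters in passwords. The helper name `make_conn_string` and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def make_conn_string(user, password, host, port, db):
    """Build a postgresql:// URL, percent-encoding user and password (hypothetical helper)."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, db
    )


# Placeholder values; in the notebook these come from dwh.cfg.
url = make_conn_string("sparkifydwhuser", "p@ss:word/1", "example-host", "5439", "dev")
print(url)  # postgresql://sparkifydwhuser:p%40ss%3Aword%2F1@example-host:5439/dev
```

In the notebook this would be used as `conn_string = make_conn_string(DWH_DB_USER, DWH_DB_PASSWORD, DWH_ENDPOINT, DWH_PORT, DWH_DB)` followed by `%sql $conn_string`.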

## Find top 10 most popular songs

The company wants to publish top song charts. Find the 10 songs that users listened to most often. Print `song` (the song title), `artist`, and `play_count` (how many times users listened to the song).

In [16]:
%sql SELECT s.title as song \
            , a.name as artist  \
            , COUNT(*) as play_count \
        FROM songplays sp \
        INNER JOIN songs s ON s.song_id = sp.song_id \
        LEFT JOIN artists a ON a.artist_id = sp.artist_id \
        GROUP BY s.title, a.name \
        ORDER BY play_count DESC \
        LIMIT 10;

 * postgresql://sparkifydwhuser:***@udacity-redshift-cluster-1.cuu7kzryitpe.us-west-2.redshift.amazonaws.com:5439/dev
10 rows affected.


| song | artist | play_count |
|---|---|---:|
| You're The One | Dwight Yoakam | 37 |
| Secrets | Carleen Anderson | 17 |
| Home | Gemma Hayes | 13 |
| Home | Eli Young Band | 13 |
| Home | Working For A Nuclear Free City | 13 |
| Home | Frozen Plasma | 13 |
| Catch You Baby (Steve Pitron & Max Sanna Radio Edit) | Lonnie Gordon | 9 |
| I CAN'T GET STARTED | Ron Carter | 9 |
| Nothin' On You [feat. Bruno Mars] (Album Version) | B.o.B | 8 |
| Float On | Rivera Rotation | 7 |

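Because the query groups by `s.title, a.name`, songs that share a title but have different artists (the four "Home" rows above) are counted separately, and plays whose `song_id` has no match in `songs` are dropped by the `INNER JOIN`. A minimal sketch that runs the same query against an in-memory SQLite database with a few made-up rows, so the grouping can be tried without the cluster. The table layouts are reduced to the columns the query touches and are assumptions, not the schema from the README. `ORDER BY play_count DESC` alone leaves the order of tied counts unspecified, so this sketch adds `song, artist` as tie-breakers.

```python
import sqlite3

# Made-up sample data; columns reduced to those the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs (song_id TEXT, title TEXT);
CREATE TABLE artists (artist_id TEXT, name TEXT);
CREATE TABLE songplays (songplay_id INTEGER, song_id TEXT, artist_id TEXT);

INSERT INTO songs VALUES ('S1', 'Home'), ('S2', 'Home'), ('S3', 'Other Song');
INSERT INTO artists VALUES ('A1', 'Artist One'), ('A2', 'Artist Two');
INSERT INTO songplays VALUES
    (1, 'S1', 'A1'), (2, 'S1', 'A1'), (3, 'S2', 'A2'),
    (4, 'S3', 'A2'), (5, 'S3', 'A2'), (6, 'S3', 'A2'),
    (7, NULL, NULL);  -- no matching song, so the INNER JOIN drops this play
""")

TOP_SONGS_SQL = """
SELECT s.title AS song
     , a.name AS artist
     , COUNT(*) AS play_count
FROM songplays sp
INNER JOIN songs s ON s.song_id = sp.song_id
LEFT JOIN artists a ON a.artist_id = sp.artist_id
GROUP BY s.title, a.name
ORDER BY play_count DESC, song, artist  -- tie-breakers added for a stable order
LIMIT 10;
"""

rows = conn.execute(TOP_SONGS_SQL).fetchall()
for row in rows:
    print(row)
# ('Other Song', 'Artist Two', 3)
# ('Home', 'Artist One', 2)
# ('Home', 'Artist Two', 1)
```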

## Weekly statistics

Build a report for each year, month, and week showing how many songs were played and how many unique users used the Sparkify service. The report should contain the following fields: `year`, `month`, `week`, `song_count` (how many songs were played), and `user_count` (unique users who used the service at least once during that week).

In [17]:
%sql SELECT t.year \
        , t.month \
        , t.week \
        , COUNT(*) as song_count \
        , COUNT(DISTINCT sp.user_id) as user_count \
        FROM songplays sp \
        INNER JOIN time t ON t.start_time = sp.start_time \
        GROUP BY t.year, t.month, t.week \
        ORDER BY t.year ASC, t.month, t.week ASC;

 * postgresql://sparkifydwhuser:***@udacity-redshift-cluster-1.cuu7kzryitpe.us-west-2.redshift.amazonaws.com:5439/dev
5 rows affected.


| year | month | week | song_count | user_count |
|---:|---:|---:|---:|---:|
| 2018 | 11 | 44 | 63 | 15 |
| 2018 | 11 | 45 | 195 | 38 |
| 2018 | 11 | 46 | 358 | 37 |
| 2018 | 11 | 47 | 258 | 39 |
| 2018 | 11 | 48 | 270 | 32 |

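Because the report groups by `year, month, week`, a week that straddles two months is split into two rows. If `week` holds ISO-8601 week numbers (Monday-to-Sunday weeks, as `datetime.isocalendar()` returns), week 44 of 2018 runs from 29 October to 4 November, so the week 44 row above would cover only its November days. A minimal sketch of the same grouping in plain Python, under that ISO-week assumption; the plays are made up.

```python
from collections import defaultdict
from datetime import datetime

# Made-up (start_time, user_id) plays; 2018-10-31 and 2018-11-02 fall in ISO week 44.
plays = [
    (datetime(2018, 10, 31, 9, 0), 1),
    (datetime(2018, 11, 2, 10, 0), 1),
    (datetime(2018, 11, 2, 11, 0), 2),
    (datetime(2018, 11, 5, 12, 0), 2),
]

song_count = defaultdict(int)
users = defaultdict(set)
for start_time, user_id in plays:
    # Same grouping key as the SQL: (calendar year, month, ISO week number).
    key = (start_time.year, start_time.month, start_time.isocalendar()[1])
    song_count[key] += 1
    users[key].add(user_id)

# One row per group: year, month, week, song_count, user_count.
report = [(*key, song_count[key], len(users[key])) for key in sorted(song_count)]
for row in report:
    print(row)
# (2018, 10, 44, 1, 1)
# (2018, 11, 44, 2, 2)
# (2018, 11, 45, 1, 1)
```

The same key also mixes calendar years with ISO weeks near New Year: 31 December 2018 belongs to ISO week 1 of 2019, so it would be grouped as (2018, 12, 1).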