# Dashboard for analytic queries on the Sparkify songs data

Here are examples of possible analytic queries. You can write your own queries using plain SQL syntax. The database structure is described in the README.

First we connect to the `sparkifydb` database (next cell), and then we can answer some analytical questions.

In [29]:
%load_ext sql
%sql postgresql://student:student@127.0.0.1/sparkifydb

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


'Connected: student@sparkifydb'

## How many unique users does Sparkify have? How many are free/paid users?

We need to find all unique users and calculate `total_count` (unique users count), `paid_count` (users on paid plan count) and `free_count` (users on free plan count).

In [28]:
%sql SELECT COUNT(DISTINCT user_id) as total_count \
        , COUNT(DISTINCT user_id) FILTER (WHERE level = 'paid') as paid_count \
        , COUNT(DISTINCT user_id) FILTER (WHERE level = 'free') as free_count \
        FROM users;

 * postgresql://student:***@127.0.0.1/sparkifydb
1 rows affected.


| total_count | paid_count | free_count |
|---|---|---|
| 96 | 22 | 82 |
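
In the result above, `paid_count` and `free_count` add up to 104, which is more than `total_count` (96). With a per-level `COUNT(DISTINCT user_id)`, this can only happen when some `user_id` values appear under both levels. The sketch below reproduces the effect on a small made-up table. It uses Python's built-in `sqlite3` instead of PostgreSQL, the rows are hypothetical, and `CASE` expressions stand in for the `FILTER` clause so that it also runs on older SQLite versions.

```python
import sqlite3

def count_users(rows):
    """Return (total_count, paid_count, free_count) for (user_id, level) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (user_id INTEGER, level TEXT)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
        # CASE yields NULL for non-matching rows and COUNT ignores NULLs,
        # so each expression counts distinct users within one level only.
        return conn.execute(
            """
            SELECT COUNT(DISTINCT user_id) AS total_count,
                   COUNT(DISTINCT CASE WHEN level = 'paid' THEN user_id END) AS paid_count,
                   COUNT(DISTINCT CASE WHEN level = 'free' THEN user_id END) AS free_count
            FROM users
            """
        ).fetchone()
    finally:
        conn.close()

# Hypothetical data: user 2 appears once as 'free' and once as 'paid'.
sample_rows = [(1, "free"), (2, "free"), (2, "paid"), (3, "paid")]
total_count, paid_count, free_count = count_users(sample_rows)
print(total_count, paid_count, free_count)
```

Here user 2 is counted in both `paid_count` and `free_count` but only once in `total_count`, so the two per-level counts overlap by exactly one user.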


## Find the top 10 most popular songs

The company wants to publish top charts of songs. Find the 10 songs that users listened to most often. Print `song` (name of the song), `artist` and `play_count` (how many times users listened to the song).

> This query returns only a single row because the data subset contains only one match between the songs data and the user logs. It should still work on the full data.

In [44]:
%sql SELECT s.title as song \
            , a.name as artist  \
            , COUNT(*) as play_count \
        FROM songplays sp \
        INNER JOIN songs s ON s.song_id = sp.song_id \
        LEFT JOIN artists a ON a.artist_id = sp.artist_id \
        GROUP BY s.title, a.name \
        ORDER BY play_count DESC \
        LIMIT 10;

 * postgresql://student:***@127.0.0.1/sparkifydb
1 rows affected.


| song | artist | play_count |
|---|---|---|
| Setanta matins | Elena | 1 |
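
Because the sample data yields a single row, the ranking itself is not visible in this output. As a minimal sketch, the same "count, sort, cut off" pattern can be tried on made-up plays with Python's built-in `sqlite3`. The single `plays` table, its rows, and the tie-breaking on song name are all assumptions for illustration; the notebook's query joins `songplays`, `songs` and `artists` instead.

```python
import sqlite3

def top_songs(plays, limit=10):
    """Return up to `limit` (song, artist, play_count) rows, most played first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE plays (song TEXT, artist TEXT)")
        conn.executemany("INSERT INTO plays VALUES (?, ?)", plays)
        return conn.execute(
            """
            SELECT song, artist, COUNT(*) AS play_count
            FROM plays
            GROUP BY song, artist
            ORDER BY play_count DESC, song ASC  -- song name breaks ties (assumption)
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    finally:
        conn.close()

# Hypothetical plays: one row per listen.
sample_plays = [("A", "X"), ("B", "Y"), ("A", "X"), ("C", "Z"), ("B", "Y"), ("A", "X")]
for row in top_songs(sample_plays, limit=2):
    print(row)
```

Without a secondary sort key, songs with equal play counts may come back in any order, so a chart could differ between runs.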


## Weekly statistics

Build a report for each year, month and week that shows how many songs were played and how many unique users used the Sparkify service. The report should contain the following fields: `year`, `month`, `week`, `song_count` (how many songs were played), `user_count` (unique users who used the service at least once in that week).

In [50]:
%sql SELECT t.year \
        , t.month \
        , t.week \
        , COUNT(*) as song_count \
        , COUNT(DISTINCT sp.user_id) as user_count \
        FROM songplays sp \
        INNER JOIN time t ON t.start_time = sp.start_time \
        GROUP BY t.year, t.month, t.week \
        ORDER BY t.year ASC, t.month, t.week ASC;

 * postgresql://student:***@127.0.0.1/sparkifydb
5 rows affected.


| year | month | week | song_count | user_count |
|---|---|---|---|---|
| 2018 | 11 | 44 | 410 | 41 |
| 2018 | 11 | 45 | 1257 | 69 |
| 2018 | 11 | 46 | 1962 | 60 |
| 2018 | 11 | 47 | 1715 | 74 |
| 2018 | 11 | 48 | 1490 | 62 |
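
Since the report groups by `month` as well as `week`, a week that spans two months would show up as two rows. The sketch below illustrates this in plain Python, assuming the `week` column holds the ISO 8601 week number (this is consistent with early November 2018 falling in week 44 above, but the README is the authority on how `time` is populated). `report_key` is a hypothetical helper, not part of the project.

```python
from datetime import datetime

def report_key(ts):
    """Return the (year, month, week) grouping key for a timestamp.

    Assumes `week` is the ISO 8601 week number.
    """
    iso_week = ts.isocalendar()[1]
    return (ts.year, ts.month, iso_week)

# 2018-11-30 and 2018-12-01 share ISO week 48 but fall into different groups.
print(report_key(datetime(2018, 11, 1)))
print(report_key(datetime(2018, 11, 30)))
print(report_key(datetime(2018, 12, 1)))
```

If one row per calendar week is preferred, grouping by year and week only would avoid the split, at the cost of dropping the `month` column.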


## REMEMBER: Restart this notebook to close connection to `sparkifydb`
Each time you run the cells above, remember to restart this notebook to close the connection to your database. Otherwise, you won't be able to run your code in `create_tables.py`, `etl.py`, `etl.ipynb` files since you can't make multiple connections to the same database (in this case, sparkifydb).