    Problem Statement.

    Table: Activity

    +--------------+---------+
    | Column Name  | Type    |
    +--------------+---------+
    | player_id    | int     |
    | device_id    | int     |
    | event_date   | date    |
    | games_played | int     |
    +--------------+---------+
    (player_id, event_date) is the primary key of this table.
    This table shows the activity of players of some games.
    Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.



    The install date of a player is the first login day of that player.

    We define the day one retention of some date x to be the number of players whose install date is x and who logged back in on the day right after x, divided by the number of players whose install date is x, rounded to 2 decimal places.

    Write an SQL query to report for each install date, the number of players that installed the game on that day, and the day one retention.

    Return the result table in any order.

    The query result format is in the following example.



    Example 1:

    Input: 
    Activity table:
    +-----------+-----------+------------+--------------+
    | player_id | device_id | event_date | games_played |
    +-----------+-----------+------------+--------------+
    | 1         | 2         | 2016-03-01 | 5            |
    | 1         | 2         | 2016-03-02 | 6            |
    | 2         | 3         | 2017-06-25 | 1            |
    | 3         | 1         | 2016-03-01 | 0            |
    | 3         | 4         | 2016-07-03 | 5            |
    +-----------+-----------+------------+--------------+
    Output: 
    +------------+----------+----------------+
    | install_dt | installs | Day1_retention |
    +------------+----------+----------------+
    | 2016-03-01 | 2        | 0.50           |
    | 2017-06-25 | 1        | 0.00           |
    +------------+----------+----------------+
    Explanation: 
    Players 1 and 3 installed the game on 2016-03-01, but only player 1 logged back in on 2016-03-02, so the day 1 retention of 2016-03-01 is 1 / 2 = 0.50.
    Player 2 installed the game on 2017-06-25 but didn't log back in on 2017-06-26, so the day 1 retention of 2017-06-25 is 0 / 1 = 0.00.
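
To make the definition concrete, here is a minimal Python sketch that recomputes Example 1 without SQL. The function name `day_one_retention` and the tuple layout are illustrative assumptions, not part of the problem. Note that Python's `round` can differ from SQL `ROUND` on exact ties, so treat this as a sanity check and not as a reference implementation.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rows from Example 1: (player_id, device_id, event_date, games_played)
activity = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 3, 2), 6),
    (2, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 1), 0),
    (3, 4, date(2016, 7, 3), 5),
]


def day_one_retention(rows):
    """Return {install_date: (installs, retention)} for the given rows."""
    # Collect every login day per player.
    logins = defaultdict(set)
    for player_id, _device_id, event_date, _games_played in rows:
        logins[player_id].add(event_date)

    installs = defaultdict(int)
    retained = defaultdict(int)
    for days in logins.values():
        install_dt = min(days)  # install date = first login day
        installs[install_dt] += 1
        # Retained only if the player logged in exactly one day later.
        if install_dt + timedelta(days=1) in days:
            retained[install_dt] += 1

    return {d: (n, round(retained[d] / n, 2)) for d, n in installs.items()}


result = day_one_retention(activity)
```

Player 3's later login on 2016-07-03 does not count, because only the day immediately after the install date matters.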



# Using WITH (common table expressions)

In [None]:
-- Rank each player's login days; rank 1 is the install date.
with rank_days as
(select player_id, event_date,
rank() over (partition by player_id order by event_date) as day_rank
from Activity),

-- One row per player: the install date.
day_one as
(select distinct player_id, event_date
from rank_days
where day_rank = 1),

-- Attach the next-day login if it exists; next_date is NULL otherwise.
day_two as
(select distinct d.player_id, d.event_date, a.event_date as next_date
from day_one d left join Activity a
on d.player_id = a.player_id and DATE_ADD(d.event_date, INTERVAL 1 DAY) = a.event_date),

-- count(next_date) skips NULLs, so it counts only the retained players.
day_one_retention as
(select event_date as install_dt, count(next_date) as day_one_retention_count
from day_two
group by 1),

-- Number of players per install date.
original_installs as
(select event_date as install_dt, count(distinct player_id) as installs from rank_days
where day_rank = 1
group by 1)

select o.install_dt, o.installs, round(d.day_one_retention_count/o.installs, 2) as Day1_retention
from original_installs o left join day_one_retention d
on o.install_dt = d.install_dt
order by 1

# Simpler using Join

In [None]:
SELECT install_dt, COUNT(player_id) AS installs,
ROUND(COUNT(next_day) / COUNT(player_id), 2) AS Day1_retention
FROM (
    SELECT a1.player_id, a1.install_dt, a2.event_date AS next_day
    FROM
    (
        SELECT player_id, MIN(event_date) AS install_dt 
        FROM Activity
        GROUP BY player_id
    ) AS a1 
    LEFT JOIN Activity AS a2
    ON a1.player_id = a2.player_id
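    -- Use DATE_ADD: in MySQL, install_dt + 1 is integer arithmetic and breaks at month ends.
    AND a2.event_date = DATE_ADD(a1.install_dt, INTERVAL 1 DAY)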
) AS t
GROUP BY install_dt
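
One way to check the join-based query locally is to run it against an in-memory SQLite database from Python. This is a sketch under stated assumptions: SQLite is not the dialect used above, so the next day is computed with SQLite's `date(install_dt, '+1 day')`, the ratio is multiplied by `1.0` to avoid integer division, and dates are stored as ISO text. Player 4 is a hypothetical addition, not part of Example 1, included to exercise a month boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    " player_id INT, device_id INT, event_date TEXT, games_played INT,"
    " PRIMARY KEY (player_id, event_date))"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-03-02", 6),
        (2, 3, "2017-06-25", 1),
        (3, 1, "2016-03-01", 0),
        (3, 4, "2016-07-03", 5),
        # Hypothetical rows (not in Example 1): a return across a month boundary.
        (4, 1, "2016-03-31", 2),
        (4, 1, "2016-04-01", 3),
    ],
)

# SQLite adaptation of the join-based query.
query = """
SELECT install_dt,
       COUNT(player_id) AS installs,
       ROUND(COUNT(next_day) * 1.0 / COUNT(player_id), 2) AS Day1_retention
FROM (
    SELECT a1.player_id, a1.install_dt, a2.event_date AS next_day
    FROM (
        SELECT player_id, MIN(event_date) AS install_dt
        FROM Activity
        GROUP BY player_id
    ) AS a1
    LEFT JOIN Activity AS a2
      ON a1.player_id = a2.player_id
     AND a2.event_date = date(a1.install_dt, '+1 day')
) AS t
GROUP BY install_dt
ORDER BY install_dt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two rows for 2016-03-01 and 2017-06-25 should match the expected output of Example 1, and the 2016-03-31 row shows that a login on 2016-04-01 is counted as a next-day return.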