## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**550. Game Play Analysis IV (Medium)**

**Table: Activity**

| Column Name  | Type    |
|--------------|---------|
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |

(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
 

**Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.**

The result format is in the following example.

**Example 1:**

**Input:**
**Activity table:**
| player_id | device_id | event_date | games_played |
|-----------|-----------|------------|--------------|
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |

**Output:**
| fraction  |
|-----------|
| 0.33      |

**Explanation:**
Only the player with id 1 logged in again on the day after their first login, so the answer is 1/3 = 0.33.
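
To make the definition concrete before moving to Spark, here is a minimal plain-Python sketch of the same calculation on the example rows. The helper name `first_day_retention` is hypothetical and used only for illustration; it is not part of the problem or of PySpark.

```python
from datetime import date, timedelta

# Rows from Example 1 as (player_id, event_date); the other columns are not needed.
rows = [
    (1, date(2016, 3, 1)),
    (1, date(2016, 3, 2)),
    (2, date(2017, 6, 25)),
    (3, date(2016, 3, 2)),
    (3, date(2018, 7, 3)),
]

def first_day_retention(rows):
    """Fraction of players who logged in on the day after their first login."""
    # Collect the set of login dates for each player.
    logins = {}
    for player_id, event_date in rows:
        logins.setdefault(player_id, set()).add(event_date)

    # A player counts only if (first login + 1 day) is among their login dates.
    returned = 0
    for dates in logins.values():
        if min(dates) + timedelta(days=1) in dates:
            returned += 1

    return round(returned / len(logins), 2)

fraction = first_day_retention(rows)
print(fraction)
```

Note that only the day immediately after the first login matters: a player who logs in on two consecutive days later in their history does not count.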


In [0]:
activity_data_550 = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5)
]

activity_columns_550 = ["player_id", "device_id", "event_date", "games_played"]
activity_df_550 = spark.createDataFrame(activity_data_550, schema=activity_columns_550)
activity_df_550.show()

+---------+---------+----------+------------+
|player_id|device_id|event_date|games_played|
+---------+---------+----------+------------+
|        1|        2|2016-03-01|           5|
|        1|        2|2016-03-02|           6|
|        2|        3|2017-06-25|           1|
|        3|        1|2016-03-02|           0|
|        3|        4|2018-07-03|           5|
+---------+---------+----------+------------+



In [0]:
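# Window over all of a player's rows, used to attach the first login date to each row.
windowSpec = Window.partitionBy("player_id")

# Flag the row (at most one per player, since (player_id, event_date) is unique)
# that falls exactly one day after the first login, then divide by the player count.
activity_df_550\
    .withColumn("first_login", min("event_date").over(windowSpec))\
        .withColumn("fraction", when(datediff(col("event_date"), col("first_login")) == 1, 1).otherwise(0))\
            .agg(round(sum("fraction") / countDistinct("player_id"), 2).alias("fraction")).show()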

+--------+
|fraction|
+--------+
|    0.33|
+--------+

