Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

45 游戏玩法分析 V #54

Open
astak16 opened this issue Feb 19, 2022 · 0 comments
Open

45 游戏玩法分析 V #54

astak16 opened this issue Feb 19, 2022 · 0 comments
Labels

Comments

@astak16
Copy link
Owner

astak16 commented Feb 19, 2022

题目

题目链接:游戏玩法分析 V

玩家的安装日期 event_date 定义为该玩家的第一个登录日。

玩家的 第一天留存率 定义为:假定安装日期为 X  的玩家的数量为 N ,其中在 X  之后的一天重新登录的玩家数量为 MM/N 就是第一天留存率,四舍五入到小数点后两位。

编写一个 SQL 查询,报告所有安装日期、当天安装游戏的玩家数量和玩家的第一天留存率。

create table activity (
	player_id int,
	device_id int,
	event_date date,
	games_played int
);

insert into activity (player_id, device_id, event_date, games_played) values
('1', '2', '2016-03-01', '5'),
('1', '2', '2016-03-02', '6'),
('2', '3', '2017-06-25', '1'),
('3', '1', '2016-03-01', '0'),
('3', '4', '2018-07-03', '5');

分析

需要几个值

  1. 每个玩家第一个登录日
  2. 第一个登陆日之后有没有登录
  3. 第一个登录日的玩家的数量,第一个登录日的第二天登录玩家的数量

SQL:方法一

select
	first_date as install_dt,
	count(*) installs,
	round(count(activity.event_date) / count(*), 2) as day1_retention
from (
	select
		player_id,
		min(event_date) as first_date
	from activity group by player_id
) t1 left join activity
on t1.player_id = activity.player_id
and datediff(activity.event_date, t1.first_date) = 1
group by first_date;

解析

  1. 每个玩家第一个登录日,作为 t1

    select
    	player_id,
    	min(event_date) as first_date
    from activity group by player_id;
  2. 需要得到每个玩家第一个登陆日之后第二天有没有登录

    1. t1 表和 activity 左连
    2. 这要注意输出的字段 event_date 是第一天登录之后第二天又登录的玩家日期
    select * from t1
    left join activity
    on t1.player_id = activity.player_id
    and datediff(activity.event_date, t1.first_date) = 1;
  3. 查询 t1activity 左连后的表,按照第一个登陆日进行分组 first_date

    1. count(*) 第一个登陆日的总人数
    2. count(activity.event_date) 第二个登录日的人数
    3. a/b 是第一天玩家的留存率

SQL:方法二

select
	first_date as install_dt,
	count(distinct player_id) as installs,
	round(
		sum(if(date_add(first_date, interval 1 day) = event_date, 1, 0))
		/ count(distinct player_id),
	2) as day1_retention
from (
	select
		player_id,
		event_date,
		min(event_date) over(partition by player_id) as first_date
	from activity
) t1 group by first_date;

解析

方法二的思路和方法一是一样的,使用了窗口函数代替了分组。

  1. 使用窗口函数,计算出每个玩家的第一次登录日期,作为临时表 t1

    select
    	player_id,
    	event_date,
    	min(event_date) over(partition by player_id) as first_date
    from activity;

    输出

    player_id	 |  event_date  |  first_date
    1	         |  2016-03-01  |  2016-03-01
    1             |	2016-03-02	|  2016-03-01
    2             |	2017-06-25	|  2017-06-25
    3	         |  2016-03-01	|  2016-03-01
    3	         |  2018-07-03	|  2016-03-01
  2. 查询 t1 ,按照 first_date 进行分组

    1. count(distinct player_id) 第一个登陆日的总人数
    2. sum(if(date_add(first_date, interval 1 day) = event_date, 1, 0))
      1. date_add(first_date, interval 1 day) = event_date 第一个登陆日后一天也登录的用户
      2. sum(if(expr, 1, 0)) 或者 count(if(expr, 1, null)) ,使用 sum 求和 if 表达式的 false 应该用 0 ,使用 count 求和 if 表达式的 false 应该用 null ,因为 count 会忽略 null

SQL:方法三

select
	a1.event_date as install_dt,
	count(a1.event_date) as installs,
	round(
		sum(if(datediff(a2.event_date, a1.event_date) = 1, 1, 0))
		 / count(a1.event_date),
	2) as day1_retention
from activity a1
left join activity a2 on a1.player_id = a2.player_id
and datediff(a2.event_date, a1.event_date) = 1
left join activity a3 on a1.player_id = a3.player_id
and a1.event_date > a3.event_date
where a3.event_date is null
group by a1.event_date;

解析

activity 表自身左连 2

  1. 筛选出第一天登录的玩家,第二天也登录了 datediff(a2.event_date, a1.event_date) = 1

    select
    	*
    from activity a1
    left join activity a2 on a1.player_id = a2.player_id
    and datediff(a2.event_date, a1.event_date) = 1
  2. 筛选出第一个登录日,因为 a1.event_date > a3.event_date ,所以第一个登录日 a3.event_datenull

    select
    	*
    from activity a1
    left join activity a3 on a1.player_id = a3.player_id
    and a1.event_date > a3.event_date where a3.event_date is null
  3. 合并第一步和第二步

    select
    	*
    from activity a1
    left join activity a2 on a1.player_id = a2.player_id
    and datediff(a2.event_date, a1.event_date) = 1
    left join activity a3 on a1.player_id = a3.player_id
    and a1.event_date > a3.event_date where a3.event_date is null
  4. 计算出对应的值

    1. a1.event_date 是第一个登录日
    2. count(a1.event_date) 第一个登录日的总人数
    3. sum(if(datediff(a2.event_date, a1.event_date) = 1, 1, 0)) 第一个登录日的第二天登录的总人数
    4. b/c 是第一天玩家的留存率

SQL:方法四

select
	first_date as install_dt,
	count(a1.event_date) as installs,
	round(
		sum(if(datediff(a2.event_date, a1.event_date) = 1, 1, 0))
		/ count(a1.event_date),
	2) as day1_retention
from (
	select
		player_id,
		min(event_date) first_date
	from activity group by player_id
) t1
left join activity a1 on t1.player_id = a1.player_id
and t1.first_date = a1.event_date
left join activity a2 on a1.player_id = a2.player_id
and datediff(a2.event_date, a1.event_date) = 1
group by first_date;

解析

方法四和方法三是一个思路,不同都是查询第一个登录日,用下面的方法代替方法三中的第二步,方法三中的第二步迷惑性很大,不太好理解。

select
	player_id,
	min(event_date) first_date
from activity group by player_id
@astak16 astak16 added the 困难 label Feb 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant