# Didi Chuxing A/B Test and City Operations Analysis

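## Project background and goals: the data contains the results of an A/B test run by Didi Chuxing and operating data for one city. We analyze the effect of the test and, with the goal of raising the platform's order completion rate, make operational recommendations for the platform.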

In [32]:
import pandas as pd
test = pd.read_excel('F:\\迅雷下载\\test.xlsx')  # backslashes in a Windows path must be doubled (or use a raw string r'...')
from pyecharts.charts import *
import pyecharts.options as opts
from pyecharts.components import Table
from pyecharts.globals import ThemeType
from pyecharts.commons.utils import JsCode
from pyecharts.options import ComponentTitleOpts

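## A/B test analysis: for the control and experiment groups, test hypotheses about the variance (Levene's test) and the mean (t-test) of order requests and of GMV (gross merchandise value).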

## 1. Compute ROI

In [108]:
test.head()
test.group.value_counts()

experiment    29
control       29
Name: group, dtype: int64

In [4]:
test.isnull().sum()

date                 0
group                0
requests             0
gmv                  0
coupon per trip      0
trips                0
canceled requests    0
dtype: int64

In [5]:
test['roi'] = test['gmv'] / (test['trips'] * test['coupon per trip'])  # GMV per unit of coupon spend
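A minimal sketch of why the parentheses matter, assuming ROI here means GMV per unit of coupon spend (trips × coupon per trip); the toy numbers are hypothetical. Since `/` and `*` have equal precedence and evaluate left to right, `gmv / trips * coupon` multiplies by the coupon instead of dividing by the total coupon cost.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the test dataset
df = pd.DataFrame({
    "gmv": [1000.0, 1200.0],
    "trips": [100, 120],
    "coupon per trip": [2.0, 2.5],
})

# Left to right: (gmv / trips) * coupon, i.e. GMV per trip times coupon per trip
naive = df["gmv"] / df["trips"] * df["coupon per trip"]

# Assumed ROI definition: GMV divided by total coupon spend
coupon_cost = df["trips"] * df["coupon per trip"]
roi = df["gmv"] / coupon_cost

print(naive.tolist())  # [20.0, 25.0]
print(roi.tolist())    # [5.0, 4.0]
```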


## 2. Hypothesis tests on the variance and mean of requests

In [7]:
# 29 observations per group (a small sample), so use a t-test
requests_A=test[test.group=='control'].requests
requests_B=test[test.group=='experiment'].requests

In [9]:
import scipy.stats as st
st.levene(requests_A, requests_B)  # Levene's test for equality of variances

LeveneResult(statistic=0.014685075667736849, pvalue=0.903980667108546)
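To make the variance check concrete, the sketch below shows, on hypothetical random samples of 29 values, that `scipy.stats.levene` (whose default `center='median'` is the Brown-Forsythe variant) equals a one-way ANOVA F-test on each value's absolute deviation from its group median. A small p-value would mean the spreads differ; here, as in the output above, a large one means equal variances cannot be rejected.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
# Hypothetical samples of 29 observations each (same size as each test group)
a = rng.normal(100, 10, size=29)
b = rng.normal(105, 10, size=29)

# scipy's default center='median' (Brown-Forsythe variant of Levene's test)
lev = st.levene(a, b)

# The same statistic: one-way ANOVA F on absolute deviations from each group's median
za = np.abs(a - np.median(a))
zb = np.abs(b - np.median(b))
anova = st.f_oneway(za, zb)

print(lev.statistic, anova.statistic)
print(lev.pvalue, anova.pvalue)
```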

Mean test for requests

In [10]:
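# Paired-samples t-test, assuming each date has one control row and one experiment row (observations paired by date).
# (An independent two-sample t-test would first require an equal-variance check; the paired test does not.)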

In [11]:
st.ttest_rel(requests_A,requests_B)

Ttest_relResult(statistic=1.6436140982479508, pvalue=0.11143970454099938)

The p-value is greater than 0.05, so we fail to reject the null hypothesis: the treatment has no significant effect on requests.
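`ttest_rel` pairs the i-th control value with the i-th experiment value, so the two filtered Series must be in the same date order. A sketch on hypothetical data, assuming the test sheet has exactly one control row and one experiment row per date, makes the pairing explicit with `pivot` and shows that the paired test is a one-sample t-test on the per-date differences.

```python
import pandas as pd
import scipy.stats as st

# Hypothetical long-format data: one control and one experiment row per date
df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"],
    "group": ["control", "experiment"] * 4,
    "requests": [100, 96, 120, 118, 90, 85, 110, 109],
})

# One row per date, one column per group: pairs are aligned by date, not by row order
wide = df.pivot(index="date", columns="group", values="requests")

paired = st.ttest_rel(wide["control"], wide["experiment"])

# Equivalent: one-sample t-test of the per-date differences against 0
diff = wide["control"] - wide["experiment"]
one_sample = st.ttest_1samp(diff, 0.0)

print(paired.statistic, one_sample.statistic)
```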

## 3. Mean and variance tests for GMV

Variance test for GMV

In [18]:
gmv_a=test[test.group=='control'].gmv
gmv_b=test[test.group=='experiment'].gmv

In [19]:
st.levene(gmv_a,gmv_b)

LeveneResult(statistic=0.02865341299111212, pvalue=0.8661917430097603)

The p-value is greater than 0.05, so the variances can be treated as equal.

In [20]:
st.ttest_rel(gmv_a,gmv_b)

Ttest_relResult(statistic=4.247583846321442, pvalue=0.00021564303983362577)

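The p-value is less than 0.05, so the mean GMV differs significantly between the two groups. Because the statistic is computed as control minus experiment and is positive, the control group's mean GMV is higher.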
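If the groups were treated as independent samples instead of paired ones, the equal-variance check would decide between Student's and Welch's t-test. `compare_means` below is a hypothetical helper sketching that flow with `scipy.stats.ttest_ind`; the data are random.

```python
import numpy as np
import scipy.stats as st

def compare_means(a, b, alpha=0.05):
    """Hypothetical helper: Levene's test first, then Student's or Welch's t-test."""
    equal_var = bool(st.levene(a, b).pvalue > alpha)  # equal variances not rejected
    result = st.ttest_ind(a, b, equal_var=equal_var)   # Welch's test when equal_var=False
    return equal_var, result

rng = np.random.default_rng(1)
a = rng.normal(50, 5, size=29)
b = rng.normal(50, 15, size=29)

equal_var, res = compare_means(a, b)
print(equal_var, res.statistic, res.pvalue)
```

With equal group sizes (29 and 29 here), Student's and Welch's t statistics are identical; only the degrees of freedom, and therefore the p-value, differ.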

## City operations analysis

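## 1. Analysis by hour: compute the order completion rate for each hour; for hours with a low rate, examine that hour's order requests, driver supply hours and driver utilization in more detail.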

In [24]:
# order completion rate by hour

In [25]:
city=pd.read_excel('F:\\迅雷下载\\city.xlsx')

In [26]:
city.head()

|   | date | hour | requests | trips | supply hours | average minutes of trips | pETA | aETA | utiliz |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-09-01 | 11 | 79 | 55 | 42.63 | 20.43 | 5.51 | 7.19 | 0.47924 |
| 1 | 2013-09-01 | 12 | 73 | 41 | 36.43 | 15.53 | 5.48 | 8.48 | 0.426297 |
| 2 | 2013-09-01 | 13 | 54 | 50 | 23.02 | 17.76 | 5.07 | 8.94 | 0.771503 |
| 3 | 2013-09-02 | 11 | 193 | 170 | 64.2 | 31.47 | 5.31 | 6.55 | 0.490187 |
| 4 | 2013-09-02 | 12 | 258 | 210 | 80.28 | 38.68 | 4.94 | 6.08 | 0.481814 |


In [29]:
com_hour = city.groupby('hour', as_index=False).agg({'trips': 'sum', 'requests': 'sum'})

In [40]:
com_hour['rate']=(com_hour['trips']/com_hour['requests']*100).round(2)
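The hourly completion rate is a ratio of sums (total trips / total requests), which weights every request equally. The sketch below reproduces it on the five rows printed by `city.head()` and contrasts it with the mean of per-row ratios, which weights every (date, hour) row equally and gives a different figure for 11:00 and 12:00.

```python
import pandas as pd

# The five rows shown by city.head() (only the columns needed here)
city = pd.DataFrame({
    "date": ["2013-09-01"] * 3 + ["2013-09-02"] * 2,
    "hour": [11, 12, 13, 11, 12],
    "requests": [79, 73, 54, 193, 258],
    "trips": [55, 41, 50, 170, 210],
})

by_hour = city.groupby("hour", as_index=False).agg({"trips": "sum", "requests": "sum"})
# Ratio of sums: each request carries equal weight
by_hour["rate"] = (by_hour["trips"] / by_hour["requests"] * 100).round(2)

# Mean of per-row ratios: each (date, hour) row carries equal weight instead
mean_of_ratios = (city["trips"] / city["requests"] * 100).groupby(city["hour"]).mean().round(2)

print(by_hour)
print(mean_of_ratios)
```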

In [41]:
c = (
    Bar()
    .add_xaxis(com_hour['hour'].astype(str).tolist())
    .add_yaxis('Completion rate (%)', com_hour['rate'].tolist())
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="Order completion rate by hour"),
    )
)
c.render_notebook()

Order requests by hour

In [45]:
t2 = city.groupby('hour', as_index=False).agg({'requests': 'sum'})

In [46]:
c = (
    Bar()
    .add_xaxis(t2['hour'].astype(str).tolist())
    .add_yaxis('Requests', t2['requests'].tolist())
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="Order requests by hour"),
    )
)
c.render_notebook()

Driver supply hours and driver utilization

Driver utilization (busy rate)

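In [55]:
city['busy'] = city['supply hours'] * city['utiliz']  # busy driver-hours

t4 = city.groupby('hour', as_index=False).agg({'supply hours': 'sum', 'busy': 'sum'})
# divide t4's own aggregated columns; city[...] here would align on the index and copy row-level values
t4['busy_rate'] = (t4['busy'] / t4['supply hours']).round(3)

In [56]:
t4

|   | hour | supply hours | busy | busy_rate |
|---|---|---|---|---|
| 0 | 11 | 1660.01 | 768.51 | 0.463 |
| 1 | 12 | 1818.03 | 1115.28 | 0.613 |
| 2 | 13 | 1274.6 | 730.83 | 0.573 |

13:00 has the lowest order completion rate. The cause is not order volume but driver supply: 13:00 has the fewest driver supply hours (1274.6), and its utilization (0.573) is well above that of 11:00 (0.463).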
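When a column computed from `city` is assigned to the aggregated `t4`, pandas aligns on index labels, so `t4` silently receives the first three row-level values instead of per-hour figures. The sketch below reproduces this pitfall on the five `city.head()` rows (only the columns needed) and then computes the supply-hour-weighted utilization from `t4`'s own columns.

```python
import pandas as pd

# The five rows shown by city.head() (only the columns needed here)
city = pd.DataFrame({
    "hour": [11, 12, 13, 11, 12],
    "supply hours": [42.63, 36.43, 23.02, 64.2, 80.28],
    "utiliz": [0.47924, 0.426297, 0.771503, 0.490187, 0.481814],
})
city["busy"] = city["supply hours"] * city["utiliz"]  # busy driver-hours per row

t4 = city.groupby("hour", as_index=False).agg({"supply hours": "sum", "busy": "sum"})

# Pitfall: this Series has index 0..4; assignment keeps labels 0..2 only,
# so t4 receives the utilization of the first three rows, not of each hour
t4["busy_rate_wrong"] = city["busy"] / city["supply hours"]

# Correct: ratio of the aggregated columns (supply-hour-weighted utilization)
t4["busy_rate"] = t4["busy"] / t4["supply hours"]
print(t4)
```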

## 2. Analysis by date: daily order requests, daily order completion rate and daily driver utilization, and the patterns they show.

Daily order requests

In [68]:
t5 = city.groupby('date', as_index=False).agg({'requests': 'sum'})
t5

|   | date | requests |
|---|---|---|
| 0 | 2013-09-01 | 206 |
| 1 | 2013-09-02 | 604 |
| 2 | 2013-09-03 | 238 |
| 3 | 2013-09-04 | 199 |
| 4 | 2013-09-05 | 782 |
| 5 | 2013-09-06 | 654 |
| 6 | 2013-09-07 | 1432 |
| 7 | 2013-09-08 | 965 |
| 8 | 2013-09-09 | 903 |
| 9 | 2013-09-10 | 146 |
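The weekend pattern in the conclusions can be checked by labelling each date with its weekday. A sketch using the daily totals from the `t5` output above:

```python
import pandas as pd

# Daily requests copied from the t5 output above
t5 = pd.DataFrame({
    "date": pd.date_range("2013-09-01", "2013-09-10"),
    "requests": [206, 604, 238, 199, 782, 654, 1432, 965, 903, 146],
})
t5["weekday"] = t5["date"].dt.day_name()
t5["is_weekend"] = t5["date"].dt.dayofweek >= 5  # Monday=0 ... Sunday=6

print(t5)
print(t5.loc[t5["requests"].idxmax(), ["date", "weekday"]])
```

On these numbers, 2013-09-07, the day with the most requests, is a Saturday.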


In [71]:
c = (
    Bar()
    .add_xaxis(t5['date'].astype(str).tolist())
    .add_yaxis('Requests', t5['requests'].tolist())
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="Daily order requests"),
    )
)
c.render_notebook()

Daily order completion rate

In [74]:
t6 = city.groupby('date', as_index=False).agg({'requests': 'sum', 'trips': 'sum'})

t6['rate'] = (t6['trips'] / t6['requests'] * 100).round(2)

In [83]:
c = (
    Line()
    .add_xaxis(t6['date'].astype(str).tolist())
    .add_yaxis('Requests', t6['requests'].tolist())
    .add_yaxis('Completion rate', t6['rate'].tolist(), yaxis_index=1)
    .extend_axis(
        yaxis=opts.AxisOpts(
            name="Completion rate",
            type_="value",
            axislabel_opts=opts.LabelOpts(formatter="{value}%"),
        )
    )
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="Daily requests and completion rate"),
        yaxis_opts=opts.AxisOpts(name="Requests", type_="value"),
    )
)
c.render_notebook()




In [80]:
city.head()

|   | date | hour | requests | trips | supply hours | average minutes of trips | pETA | aETA | utiliz | busy | busy_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-09-01 | 11 | 79 | 55 | 42.63 | 20.43 | 5.51 | 7.19 | 0.47924 | 20.43 | 0.47924 |
| 1 | 2013-09-01 | 12 | 73 | 41 | 36.43 | 15.53 | 5.48 | 8.48 | 0.426297 | 15.53 | 0.426297 |
| 2 | 2013-09-01 | 13 | 54 | 50 | 23.02 | 17.76 | 5.07 | 8.94 | 0.771503 | 17.76 | 0.771503 |
| 3 | 2013-09-02 | 11 | 193 | 170 | 64.2 | 31.47 | 5.31 | 6.55 | 0.490187 | 31.47 | 0.490187 |
| 4 | 2013-09-02 | 12 | 258 | 210 | 80.28 | 38.68 | 4.94 | 6.08 | 0.481814 | 38.68 | 0.481814 |


In [85]:
t7 = city.groupby('date', as_index=False).agg({'supply hours': 'sum', 'busy': 'sum'})
t7['busy_rate']=(t7['busy']/t7['supply hours']).round(2)

In [86]:
c = (
    Bar()
    .add_xaxis(t7['date'].astype(str).tolist())
    .add_yaxis('Utilization', t7['busy_rate'].tolist())
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="Daily driver utilization"),
    )
)
c.render_notebook()

## 3. Actual vs. predicted customer wait time

In [103]:
t8 = city.groupby('date', as_index=False).agg({'pETA': 'mean', 'aETA': 'mean'})
t8[['pETA','aETA']]=t8[['pETA','aETA']].round(1)
t8

|   | date | pETA | aETA |
|---|---|---|---|
| 0 | 2013-09-01 | 5.4 | 8.2 |
| 1 | 2013-09-02 | 5.1 | 6.3 |
| 2 | 2013-09-03 | 7.0 | 8.4 |
| 3 | 2013-09-04 | 6.7 | 7.6 |
| 4 | 2013-09-05 | 6.0 | 6.8 |
| 5 | 2013-09-06 | 6.0 | 7.1 |
| 6 | 2013-09-07 | 6.9 | 7.8 |
| 7 | 2013-09-08 | 5.8 | 6.9 |
| 8 | 2013-09-09 | 6.8 | 8.2 |
| 9 | 2013-09-10 | 6.5 | 7.9 |


In [104]:
c = (
    Bar()
    .add_xaxis(t8['date'].astype(str).tolist())
    .add_yaxis('Predicted wait (pETA)', t8['pETA'].tolist())
    .add_yaxis('Actual wait (aETA)', t8['aETA'].tolist())
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        title_opts=opts.TitleOpts(title="Daily predicted vs. actual customer wait time"),
    )
)
c.render_notebook()

On every day, the customers' actual wait time exceeds the predicted wait time.
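To quantify the gap, a sketch that takes the daily means from the `t8` output above (units assumed to be minutes), computes aETA - pETA per day, and runs a paired t-test on the 10 daily means. Treat the test as a rough check only, since each daily value is itself an average of hourly rows.

```python
import pandas as pd
import scipy.stats as st

# Daily mean pETA / aETA copied from the t8 output above (assumed to be minutes)
t8 = pd.DataFrame({
    "date": pd.date_range("2013-09-01", "2013-09-10"),
    "pETA": [5.4, 5.1, 7.0, 6.7, 6.0, 6.0, 6.9, 5.8, 6.8, 6.5],
    "aETA": [8.2, 6.3, 8.4, 7.6, 6.8, 7.1, 7.8, 6.9, 8.2, 7.9],
})
t8["gap"] = t8["aETA"] - t8["pETA"]  # how much longer the actual wait is than predicted

print(t8["gap"].round(1).tolist())
print(round(t8["gap"].mean(), 2))

# Paired t-test on the daily means (rough check; see the caveat above)
print(st.ttest_rel(t8["aETA"], t8["pETA"]))
```

On these numbers the gap is positive on every day and averages 1.3.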

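## Key findings and recommendations:
1) The treatment has no significant effect on mean order requests but a significant effect on mean GMV; the control group's mean GMV is higher.
2) The order completion rate is lowest at 13:00, even though order requests are not high in that hour; driver supply hours are low and utilization is relatively high, which indicates that too few drivers are available in that hour.
3) The daily completion rate dips around the weekend, when order requests and driver utilization also peak, so more vehicles should be deployed on weekends.
4) Customers' actual wait time is clearly longer than the predicted wait time; the platform should make the predicted wait time more accurate and optimize its dispatch logic.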