In [1]:
import pandas as pd
path1 = "test_result.csv"

In [2]:
df1 = pd.read_csv(path1)

In [3]:
df1.head(2)

Unnamed: 0,user_id,timestamp,source,device,operative_system,test,price,converted
0,604839,2015-05-08 03:38:34,ads_facebook,mobile,iOS,0,39,0
1,624057,2015-05-10 21:08:46,seo-google,mobile,android,0,39,0


## Business Context

Pricing optimization is, not surprisingly, another area where data science can provide huge value.
The goal here is to evaluate whether a pricing test running on the site has been successful. As always, you should focus on user segmentation and provide insights about segments who behave differently as well as any other insights you might find.


### Problem Statement

Company XYZ sells a piece of software for 39 dollars. Since revenue has been flat for some time, the VP of Product has decided to run a test increasing the price. She hopes that this will increase revenue. In the experiment, 66% of the users have seen the old price (39 dollars), while a random sample of 33% of users have seen a higher price (59 dollars).

The test has been running for some time and the VP of Product is interested in understanding how it went and whether it would make sense to increase the price for all the users.
In particular, she asked you the following questions:

- Should the company sell its software for 39 or 59 dollars?
- The VP of Product is interested in having a holistic view into user behavior, especially focusing on actionable insights that might increase conversion rate. What are your main findings looking at the data?


#### Data

**test_results.csv**

Columns:
- user_id : the Id of the user. Can be joined to user_id in user_table
- timestamp : the date and time when the user first hit company XYZ's webpage, in the user's local time
- source : marketing channel that led to the user coming to the site. It can be:
    - ads-["google", "facebook", "bing", "yahoo", "other"]. That is, user coming from google ads, yahoo ads, etc.
    - seo - ["google", "facebook", "bing", "yahoo", "other"]. That is, user coming from google search, yahoo, facebook, etc.
    - friend_referral : user coming from a referral link of another user
    - direct_traffic: user coming by directly typing the address of the site on the browser
- device : user device. Can be mobile or web
- operative_system : user operative system. Can be: "windows", "linux", "mac" for web, and "android", "iOS" for mobile. "other" if it is none of the above
- test: whether the user was in the test (i.e. 1 -> higher price) or in control (0 -> old, lower price)
- price : the price the user sees. It should match test
- converted : whether the user converted (i.e. 1 -> bought the software) or not (0 -> left the site without buying it).
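Since `price` is supposed to match `test`, a cross-tabulation is a quick way to confirm that before any analysis. The sketch below uses a tiny made-up sample with the same column names (the rows are illustrative, not taken from the file); off-diagonal cells would indicate mismatched rows, and the normalized counts can be compared against the intended split.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the data description.
sample = pd.DataFrame({
    "test":  [0, 0, 0, 1, 1, 1],
    "price": [39, 39, 59, 59, 59, 39],
})

# Rows = assignment, columns = price shown; off-diagonal cells are mismatches.
counts = pd.crosstab(sample["test"], sample["price"])
print(counts)

# Share of users in each arm, to compare against the intended split.
split = sample["test"].value_counts(normalize=True)
print(split)
```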


In [4]:
### Explore the data
df1[df1['test']==0]

Unnamed: 0,user_id,timestamp,source,device,operative_system,test,price,converted
0,604839,2015-05-08 03:38:34,ads_facebook,mobile,iOS,0,39,0
1,624057,2015-05-10 21:08:46,seo-google,mobile,android,0,39,0
2,317970,2015-04-04 15:01:23,ads-bing,mobile,android,0,39,0
4,820854,2015-05-24 11:04:40,ads_facebook,web,mac,0,39,0
5,169971,2015-04-13 12:07:08,ads-google,mobile,iOS,0,39,0
...,...,...,...,...,...,...,...,...
316793,680578,2015-04-24 10:13:45,seo-yahoo,mobile,iOS,0,39,0
316795,17427,2015-04-11 09:29:15,ads_facebook,web,windows,0,39,0
316796,687787,2015-03-16 23:31:55,direct_traffic,web,windows,0,39,0
316797,618863,2015-04-11 01:35:19,ads-google,web,mac,0,39,0


In [5]:
df1[df1['test']==0]['price'].describe()

count    202727.000000
mean         39.020718
std           0.643369
min          39.000000
25%          39.000000
50%          39.000000
75%          39.000000
max          59.000000
Name: price, dtype: float64

In [6]:
df1[(df1['test']==0)&(df1['price']==59)] # control rows that show the test price: inconsistent, removed in In [10]

Unnamed: 0,user_id,timestamp,source,device,operative_system,test,price,converted
8238,500863,2015-05-06 22:40:51,ads_other,mobile,iOS,0,59,0
8369,791541,2015-04-13 10:24:19,ads-bing,web,windows,0,59,0
11555,402699,2015-05-16 12:08:45,direct_traffic,mobile,other,0,59,0
12848,624380,2015-05-22 12:37:19,seo-google,mobile,iOS,0,59,0
14630,577544,2015-04-24 17:44:57,seo-google,mobile,android,0,59,0
...,...,...,...,...,...,...,...,...
312725,894867,2015-03-05 10:44:54,ads_facebook,web,windows,0,59,0
313735,237101,2015-05-06 22:40:51,ads_other,mobile,iOS,0,59,0
314275,666946,2015-05-22 13:32:48,direct_traffic,web,mac,0,59,0
315529,590389,2015-04-14 04:07:41,direct_traffic,mobile,iOS,0,59,0


In [7]:
### Check for the test group
df1[df1['test']==1]

Unnamed: 0,user_id,timestamp,source,device,operative_system,test,price,converted
3,685636,2015-05-07 07:26:01,direct_traffic,mobile,iOS,1,59,0
7,798371,2015-03-15 08:19:29,ads-bing,mobile,android,1,59,1
8,447194,2015-03-28 12:28:10,ads_facebook,web,windows,1,59,0
9,431639,2015-04-24 12:42:18,ads_facebook,web,windows,1,59,0
15,552048,2015-03-22 08:58:32,ads-bing,web,windows,1,59,0
...,...,...,...,...,...,...,...,...
316777,190563,2015-05-17 12:03:19,seo_facebook,mobile,android,1,59,0
316778,796427,2015-04-02 09:33:18,seo-google,web,windows,1,59,0
316782,964001,2015-05-05 13:31:19,ads_other,web,windows,1,59,0
316794,388438,2015-05-20 11:34:44,seo-google,web,windows,1,59,0


In [8]:
df1[df1['test']==1]['price'].describe()

count    114073.000000
mean         58.972824
std           0.736735
min          39.000000
25%          59.000000
50%          59.000000
75%          59.000000
max          59.000000
Name: price, dtype: float64

In [9]:
df1[(df1['test']==1)&(df1['price']==39)] ## these need to be removed

Unnamed: 0,user_id,timestamp,source,device,operative_system,test,price,converted
1457,686486,2015-03-28 15:26:19,seo-other,mobile,android,1,39,0
1912,128338,2015-05-15 11:41:49,direct_traffic,mobile,android,1,39,0
2337,220590,2015-03-27 12:31:43,ads-google,web,windows,1,39,0
3147,246390,2015-05-30 08:29:44,direct_traffic,mobile,iOS,1,39,0
4277,906451,2015-04-05 11:09:18,ads-google,web,windows,1,39,0
...,...,...,...,...,...,...,...,...
313723,557784,2015-05-17 16:04:53,ads-yahoo,web,windows,1,39,0
314391,24049,2015-04-12 14:39:38,seo-google,web,mac,1,39,0
314402,191130,2015-04-10 15:45:42,direct_traffic,mobile,android,1,39,0
314696,237644,2015-05-15 11:41:49,direct_traffic,mobile,android,1,39,0


In [10]:
df1.drop(df1[(df1['test']==1)&(df1['price']==39)].index,inplace=True)
df1.drop(df1[(df1['test']==0)&(df1['price']==59)].index,inplace=True)
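The two `drop` calls above could also be written as a single boolean filter. This is a minimal sketch on made-up rows, assuming the mapping from the problem statement (control sees 39, test sees 59); `expected_price`, `clean`, and `n_dropped` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "test":    [0, 0, 1, 1, 0],
    "price":   [39, 59, 59, 39, 39],
})

# Assumed mapping from the problem statement: control -> 39, test -> 59.
expected_price = sample["test"].map({0: 39, 1: 59})

# Keep only rows where the recorded price matches the assignment.
clean = sample[sample["price"] == expected_price].copy()
n_dropped = len(sample) - len(clean)
print(f"dropped {n_dropped} inconsistent rows")
```

Returning a new DataFrame instead of dropping in place keeps the raw data available, and reporting `n_dropped` makes it easy to see how much of the sample was discarded.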

In [11]:
control = df1[df1['test']==0]
test = df1[df1['test']==1]

In [12]:
def get_day(x):
    # Return the day-of-month part (e.g. "08") of a "YYYY-MM-DD HH:MM:SS" string.
    return x.split("-")[2].split(" ")[0]

# Note: 'date' holds the day of month only, so the same day in different months shares a group.
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning on the slice.
control = control.assign(date=control['timestamp'].map(get_day))
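`get_day` keeps only the day of the month, so visits on the 8th of April and the 8th of May land in the same group. If grouping by actual calendar date is wanted instead, one option is to parse the timestamps. The sketch below runs on made-up rows and assumes the `YYYY-MM-DD HH:MM:SS` layout seen in the output above; `errors="coerce"` turns any value that does not parse into `NaT` rather than raising.

```python
import pandas as pd

# Made-up rows for illustration only; the last timestamp is deliberately invalid.
sample = pd.DataFrame({
    "timestamp": ["2015-05-08 03:38:34", "2015-04-08 21:08:46", "not a timestamp"],
    "converted": [0, 1, 0],
})

# Parse with an explicit format; unparseable values become NaT.
parsed = pd.to_datetime(sample["timestamp"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

sample["calendar_date"] = parsed.dt.date  # full date, distinguishes months
sample["day_of_month"] = parsed.dt.day    # numeric counterpart of get_day
print(sample)
```

Switching the grouping key would change the numbers reported in the cells that follow, so this is an alternative to consider rather than a drop-in replacement.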


In [13]:
(control.groupby('date')['converted'].mean()).mean()

0.019902288890802824

In [14]:
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning on the slice.
test = test.assign(date=test['timestamp'].map(get_day))


In [15]:
(test.groupby('date')['converted'].mean()).mean()

0.01558622633155859

In [16]:
import scipy.stats as stats

In [17]:
stats.ttest_ind(control.groupby('date')['converted'].mean(),
                test.groupby('date')['converted'].mean())

Ttest_indResult(statistic=9.157286886253278, pvalue=5.371296936448168e-13)
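The t-test above treats each day bucket's mean conversion rate as one observation. A common alternative for binary outcomes is a two-proportion z-test on user-level counts. The sketch below is a hand-rolled version under the usual pooled-variance assumption; `two_proportion_ztest` is a hypothetical helper and the counts are made up, not taken from the data.

```python
import math
from scipy import stats

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for equality of two conversion rates (pooled variance)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Made-up counts for illustration: 200/10,000 vs 150/10,000 conversions.
z, p_value = two_proportion_ztest(conv_a=200, n_a=10_000, conv_b=150, n_b=10_000)
print(z, p_value)
```

On the real data the inputs would be `converted.sum()` and `len(...)` for the control and test frames.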

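### Conclusion

The p-value (about 5.4e-13) is far below 0.05, so we reject H0 that the mean conversion rate is the same in both groups. The control group (39 dollars) converts at about 1.99% against about 1.56% for the test group (59 dollars), so the lower price yields a higher conversion rate.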
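Conversion rate alone does not settle the 39 vs 59 question from the problem statement, because each sale at the higher price is worth more. A rough way to compare the two prices is expected revenue per visitor, price times conversion rate. The sketch below plugs in the rates printed by cells In [13] and In [15]; those are averages of per-day rates rather than overall rates, so the result is an approximation.

```python
# Prices from the problem statement.
control_price, test_price = 39, 59

# Mean conversion rates as printed by In [13] and In [15].
control_rate = 0.019902288890802824
test_rate = 0.01558622633155859

# Expected revenue per visitor = price * conversion rate.
control_rev = control_price * control_rate
test_rev = test_price * test_rate
relative_lift = test_rev / control_rev - 1

print(f"control: {control_rev:.4f}  test: {test_rev:.4f}  lift: {relative_lift:.1%}")
```

With these inputs the higher price comes out at roughly 0.92 dollars per visitor against roughly 0.78 for the lower price. Whether that difference is statistically significant would need a separate test on revenue per user.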