Looking at P2P donation form redesign performance compared to the legacy design. Specifically, we want to look at donation conversion and average gift size. We also should look at adaption rate to see if we have a statistically significant sample and traffic to make sure each cohort is receiving relatively equitable attention.

In [1]:
import sys, datetime
sys.path.append("../../scripts/")
from s3_support import *

import pandas as pd
import numpy as np

%matplotlib inline

# 0. load data

## get form upgrades

We can find the updgrade/downgrade dates in the `syslog_logs` table. 

In [41]:
q = """select 
            created, 
            form, 
            message 
        from syslog_logs
        where 
            message like '%P2P Donation Form%'"""
logs = redshift_query_read(q, schema='production')

In [70]:
print("{:,} entries".format(len(logs)))
print("{:,} unique forms".format(len(logs['form'].unique())))
print("{:%Y-%m-%d} to {:%Y-%m-%d}".format(logs['created'].min(), logs['created'].max()))

373 entries
303 unique forms
2024-03-18 to 2024-10-28


In [68]:
START_DATE = logs['created'].min()

print(f"""The earliest conversion date we can find in 
the database is {START_DATE:%Y-%m-%d} so we will limit 
further queries for transactions and traffic to 
that date, forward.""")

The earliest conversion date we can find in 
the database is 2024-03-18 so we will limit 
further queries for transactions and traffic to 
that date, forward.


In [44]:
logs.tail(3)

Unnamed: 0,created,form,message
370,2024-09-15 11:44:17,1043473,P2P Donation Form Upgraded
371,2024-10-01 11:29:12,1045497,P2P Donation Form Upgraded
372,2024-10-01 15:32:33,1046780,P2P Donation Form Upgraded


## p2p trans

In [14]:
q = '''select
            date,
            form,
            count(id) as transactions,
            sum(amount) as volume
        from transactions
        where
            source='p2p' and
            date>='{}' and
            donations_count>0 and
            registrations_count=0
        group by form, date'''
q = q.format(START_DATE)
p2p_trans = redshift_query_read(q, schema='production')

In [69]:
print("{:,} entries".format(len(p2p_trans)))
print("{:,} unique forms".format(len(p2p_trans['form'].unique())))
print("{:%Y-%m-%d} to {:%Y-%m-%d}".format(p2p_trans['date'].min(), p2p_trans['date'].max()))

202,089 entries
6,506 unique forms
2021-04-02 to 2024-10-28


In [17]:
p2p_trans.head(3)

Unnamed: 0,date,form,transactions,volume
0,2022-11-08,919637,2,45.75
1,2022-11-22,919637,1,75.0
2,2021-09-22,938966,8,5985.0


## traffic

In [18]:
q = '''select
            date,
            form,
            sum(views) as views
        from ga
        where date>'{}' 
        group by form, date'''.format(START_DATE)
traffic = redshift_query_read(q, schema='production')

In [71]:
print("{:,} entries".format(len(traffic)))
print("{:,} unique forms".format(len(traffic['form'].unique())))
print("{:%Y-%m-%d} to {:%Y-%m-%d}".format(traffic['date'].min(), traffic['date'].max()))

3,240,879 entries
66,770 unique forms
2021-04-02 to 2024-09-03


In [20]:
traffic.head(3)

Unnamed: 0,date,form,views
0,2021-05-04,971474,2327
1,2024-04-19,1028658,124
2,2022-05-09,941532,1314


## merge data

In [21]:
mrgd = p2p_trans.merge(traffic, on=['date', 'form'], how='left')

In [27]:
mrgd['views'].fillna(0, inplace=True)
mrgd['conversion'] = mrgd['transactions'] / mrgd['views']

In [72]:
print("{:,} entries".format(len(mrgd)))
print("{:,} unique forms".format(len(mrgd['form'].unique())))
print("{:%Y-%m-%d} to {:%Y-%m-%d}".format(mrgd['date'].min(), mrgd['date'].max()))
print("{:.1f} average views".format(mrgd['views'].mean()))
print("{:.1f} average transactions".format(mrgd['transactions'].mean()))

mn_conv = mrgd[mrgd['conversion']<1]['conversion'].mean() * 100.
mdn_conv = mrgd[mrgd['conversion']<1]['conversion'].median() * 100.
print("{:.1f}% mean conversion; {:.1f}% median conversion".format(mn_conv, mdn_conv))

202,089 entries
6,506 unique forms
2021-04-02 to 2024-10-28
51.1 average views
6.3 average transactions
12.1% mean conversion; 6.7% median conversion


In [29]:
mrgd.head()

Unnamed: 0,date,form,transactions,volume,views,conversion
0,2022-11-08,919637,2,45.75,46.0,0.043478
1,2022-11-22,919637,1,75.0,8.0,0.125
2,2021-09-22,938966,8,5985.0,52.0,0.153846
3,2021-12-04,919637,2,100.0,31.0,0.064516
4,2021-06-15,938966,2,292.0,0.0,inf


In [47]:
# tag converted forms
relevant_forms = logs['form'].unique()
new_data = None

for form in relevant_forms:
    _df = mrgd[mrgd['form']==form]
    _logs = logs[logs['form']==form]
    
    try:
        start_date = _logs[_logs['message'].str.contains('Upgraded')]['created'].iloc[0]
    except:
        start_date = None
    try:
        end_date = _logs[_logs['message'].str.contains('Reverted')]['created'].iloc[0]
    except:
        end_date = None
    
    if start_date is not None and end_date is None:
        mrgd['is_new_form'] = mrgd['date']>=start_date
    elif start_date is None and end_date is not None:
        mrgd['is_new_form'] = mrgd['date']<=end_date
    else:
        mrgd['is_new_form'] = (mrgd['date']>=start_date)&(mrgd['date']<=end_date)
    
    new_data = pd.concat([_df, new_data])

In [50]:
df = mrgd[~mrgd['form'].isin(relevant_forms)].copy()
df = pd.concat([df, new_data])

# analysis

In [58]:
print("{:,} total entries".format(len(df)))
print("{:,} len removing outliers".format(len(df[df['conversion']<1])))

202,089 total entries
77,195 len removing outliers


_It appears we are missing a lot of recent traffic as more than 60% of the observations have more transactions than page views. Seeing this increasingly in later data, rapidly becoming a serious problem._

In [56]:
print("Average conversion:")
df[df['conversion']<1].groupby('is_new_form')['conversion'].agg(['mean', 'median']).reset_index()

Average conversion:


Unnamed: 0,is_new_form,mean,median
0,False,0.121391,0.066667
1,True,0.107043,0.076923


In [60]:
print("Average volume:")
df.groupby('is_new_form')['volume'].agg(['mean', 'median']).reset_index()

Average volume:


Unnamed: 0,is_new_form,mean,median
0,False,737.705705,165.0
1,True,903.377226,235.0


In [62]:
df['per_transaction'] = df['volume'] / df['transactions']
print("Average value per transaction:")
df.groupby('is_new_form')['per_transaction'].agg(['mean', 'median']).reset_index()

Average value per transaction:


Unnamed: 0,is_new_form,mean,median
0,False,144.932523,65.0
1,True,135.912078,75.0


In [57]:
print("Average views:")
df[df['conversion']<1].groupby('is_new_form')['views'].agg(['mean', 'median']).reset_index()

Average views:


Unnamed: 0,is_new_form,mean,median
0,False,133.575422,45.0
1,True,119.241414,51.5
