# objective

Explore the influence of traffic on auction performance. Does item performance change with differences in traffic? Can we disqualify certain auctions from the data because of low or inadequate traffic?

In [1]:
import pandas as pd
import numpy as np

import sys
sys.path.insert(1, '../../scripts/')
from s3_support import *
%matplotlib inline

# load data

## transactions

In [2]:
# product, # of bids, bid increment, value, start $, end $
q = "select * from auctionitem"
df_ai = redshift_query_read(q, schema='production')

In [11]:
q = '''select
            ta.form,
            ta.product,
            ta.total as price,
            ta.status as status,
            count(distinct(b.ticketholder)) as bidders,
            count(b.ticketholder) as bids,
            count(distinct(ta.id)) as transactions,
            count(distinct(ta.bidder)) as winners
        from bidders as b
            left join transauction as ta on b.product=ta.product
        group by ta.product, ta.total, ta.status, ta.form'''
df_bids = redshift_query_read(q, schema='production')

In [12]:
df = df_ai.drop('id', axis=1).merge(df_bids, on='product')

In [19]:
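# keep items with a known value and more than one bid; .copy() avoids SettingWithCopyWarning
df = df[(df['value']>0)&(df['bids']>1)].copy()
df['price_ratio'] = df['price'] / df['value']
# items sold above their stated value are outperformers; at or below value, underperformers
df['outperformer'] = df['price'] > df['value']
df['underperformer'] = df['price'] <= df['value']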

In [20]:
print("{:,} entries".format(len(df)))
print("{:,} unique forms".format(len(df['form'].unique())))

14,664 entries
457 unique forms


## traffic

In [8]:
# get traffic for auction forms only
q = '''select
            ga.form,
            ga.date,
            sum(ga.views) as views,
            sum(ga.bounces) as bounces
        from googleanalytics_traffic as ga
            left join form as f on f.id=ga.form
        where f.type=5
        group by ga.form, ga.date'''
traffic = redshift_query_read(q, schema='production')

In [9]:
print("{:,} entries".format(len(traffic)))
print("{:,} unique forms".format(len(traffic['form'].unique())))

13,351 entries
696 unique forms


In [27]:
form_traffic = traffic.groupby('form')[['views', 'bounces']].sum().reset_index()
form_traffic['views_stayed'] = form_traffic['views'] - form_traffic['bounces']

In [26]:
form_traffic['views_stayed'].agg(['mean', 'median'])

mean      691.324713
median     16.000000
Name: views_stayed, dtype: float64

In [30]:
form_consistency = df.groupby('form')[['underperformer', 'outperformer']].median().reset_index()

In [31]:
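# majority rule: a form joins op_forms (up_forms) when more than half of its items
# out- (under-) perform; forms split exactly 50/50 land in neither set
op_forms = form_consistency[form_consistency['outperformer']>.5]['form'].unique()
up_forms = form_consistency[form_consistency['underperformer']>.5]['form'].unique()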

## merge

In [25]:
df = df.merge(form_traffic, on='form')

In [40]:
len(df['form'].unique()), len(form_traffic['form'].unique())

(217, 696)

In [43]:
form_traffic = form_traffic[form_traffic['form'].isin(df['form'].unique().tolist())]
len(form_traffic['form'].unique())

217

# analysis

In [44]:
op_views = form_traffic[form_traffic['form'].isin(op_forms)]['views_stayed'].median()
up_views = form_traffic[form_traffic['form'].isin(up_forms)]['views_stayed'].median()

print("Median form page views:")
print("Outperformers: {:.1f}".format(op_views))
print("Underperformers: {:.1f}".format(up_views))

Median form page views:
Outperformers: 86.0
Underperformers: 32.0


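Underperforming forms receive a little over a third of the traffic of outperforming forms (a median of 32 vs. 86 non-bounced page views), which could explain their weaker fundraising as a function of attention. The bid increment/item value ratios are fairly consistent in both groups, so auction settings do not seem to explain the gap, and the association between traffic and performance appears genuine.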
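The 86 vs. 32 comparison rests on two medians. As a hedged sketch, a two-sided permutation test on the difference in medians can indicate whether a gap of that size could plausibly arise by chance. The helper `median_diff_permutation_test` and the sample values below are hypothetical illustrations, not part of the notebook; only NumPy is assumed.

```python
import numpy as np

def median_diff_permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for median(a) - median(b).

    Returns (observed difference, p-value).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = np.median(a) - np.median(b)

    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values
        shuffled = rng.permutation(pooled)
        diff = np.median(shuffled[:len(a)]) - np.median(shuffled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical per-form views_stayed values, for illustration only
op_views_sample = [86, 120, 40, 300, 95, 70]
up_views_sample = [32, 10, 45, 28, 60, 15]
diff, p = median_diff_permutation_test(op_views_sample, up_views_sample)
print(f"median difference: {diff:.1f}, p = {p:.3f}")
```

Applied to the notebook, the two inputs would be the `views_stayed` values of `form_traffic` rows whose `form` is in `op_forms` and in `up_forms`. A permutation test fits here because non-bounced views are heavily skewed (a mean of about 691 against a median of 16), which makes normal-theory tests a poor match.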

We will attempt to calculate a traffic-adjusted bidding performance rank to account for this difference.

In [48]:
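# traffic-adjusted bid counts: bids per non-bounced view. views_stayed is form-level
# traffic, so items on the same form share one denominator; zero-view forms become NaN
df['bids_traffic_adjusted'] = df['bids'] / df['views_stayed'].replace(0, np.nan)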

df.groupby('outperformer')['bids_traffic_adjusted'].median()

outperformer
False    0.125000
True     0.178571
Name: bids_traffic_adjusted, dtype: float64

Outperformers do receive more bids per non-bounced page view than underperformers (a median of 0.179 vs. 0.125), so traffic alone does not account for the difference in performance.
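One way to turn the bids-per-view ratio into the traffic-adjusted rank mentioned above is a percentile rank. The sketch below assumes a DataFrame with `bids` and `views_stayed` columns like `df`; the function `traffic_adjusted_rank` and the toy values are hypothetical. Zero-view rows are mapped to NaN instead of producing infinite ratios.

```python
import numpy as np
import pandas as pd

def traffic_adjusted_rank(df, bids_col="bids", views_col="views_stayed"):
    """Percentile rank (0, 1] of bids per non-bounced view; NaN where views are 0."""
    # Map zero views to NaN so the ratio is NaN rather than inf
    views = df[views_col].replace(0, np.nan)
    bids_per_view = df[bids_col] / views
    # pct=True scales ranks by the number of non-NaN ratios
    return bids_per_view.rank(pct=True)

# Hypothetical toy items for illustration only
items = pd.DataFrame({
    "bids": [10, 4, 9, 3],
    "views_stayed": [50, 40, 30, 0],
})
items["traffic_adjusted_rank"] = traffic_adjusted_rank(items)
print(items)
```

A percentile rank keeps the measure on a 0 to 1 scale, so the very large ratios that low-traffic forms can produce do not dominate comparisons the way raw ratios would.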

In [51]:
# look at winning bid/value ratio by traffic
# do less seen items make less money?
df.groupby('outperformer')[['price_ratio', 'views_stayed']].corr()

                           price_ratio  views_stayed
outperformer                                        
False        price_ratio      1.000000      0.016150
             views_stayed     0.016150      1.000000
True         price_ratio      1.000000     -0.031817
             views_stayed    -0.031817      1.000000


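The Pearson correlation between traffic and the winning bid/value ratio is close to zero in both groups (0.016 for underperformers, -0.032 for outperformers), which suggests traffic is not a strong factor in maximizing the winning bid. Two caveats apply: splitting on `outperformer` conditions on the price ratio itself, narrowing its range within each group, and Pearson correlation is sensitive to the extreme price ratios present in the data.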
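Because Pearson correlation is sensitive to extreme values, a rank-based (Spearman) correlation is a more robust check. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, which is its standard definition when ties are present; the helper name `pearson_and_spearman` and the toy data are hypothetical. In the toy data, one outlier pulls Pearson's r well below 1 even though the relationship is perfectly monotonic.

```python
import pandas as pd

def pearson_and_spearman(x, y):
    """Return (Pearson r, Spearman rho) for two equal-length sequences."""
    x = pd.Series(x, dtype=float)
    y = pd.Series(y, dtype=float)
    pearson = x.corr(y)
    # Spearman's rho is the Pearson correlation of the (average) ranks
    spearman = x.rank().corr(y.rank())
    return pearson, spearman

# Hypothetical data: perfectly monotonic, with one extreme value
views = [1, 2, 3, 4, 5]
ratio = [1, 2, 3, 4, 100]
r, rho = pearson_and_spearman(views, ratio)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In pandas, `DataFrame.corr` also accepts `method='spearman'`, so the grouped correlation above could be rerun with that argument.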

In [80]:
# fixed-width traffic bins; views_stayed values of 0 or above 30,000 fall outside the bins (NaN)
bins = [0, 10, 25, 50, 100, 500, 1000, 2500, 5000, 7500, 
        10000, 20000, 30000]
views_bins = pd.cut(df['views_stayed'], bins=bins).reset_index()

In [81]:
views_bins.groupby('views_stayed')['index'].count()

views_stayed
(0, 10]           1081
(10, 25]          1256
(25, 50]           881
(50, 100]          943
(100, 500]         704
(500, 1000]         10
(1000, 2500]       289
(2500, 5000]       645
(5000, 7500]       186
(7500, 10000]      222
(10000, 20000]     120
(20000, 30000]     337
Name: index, dtype: int64

In [83]:
views_bins['outperformer'] = df['outperformer']
views_bins['price_ratio'] = df['price_ratio']
views_bins['bids'] = df['bids']

In [91]:
print("Medians")
views_bins.groupby('views_stayed')[['outperformer', 'price_ratio', 'bids']].median().reset_index()

Medians


     views_stayed  outperformer  price_ratio  bids
0         (0, 10]           0.0     0.720000   8.0
1        (10, 25]           0.0     0.750000   7.0
2        (25, 50]           0.0     0.700000   7.0
3       (50, 100]           0.0     0.760000   7.0
4      (100, 500]           0.0     0.740370   7.0
5     (500, 1000]           0.0     0.775000   2.5
6    (1000, 2500]           0.0     0.714286   7.0
7    (2500, 5000]           0.0     0.700000   7.0
8    (5000, 7500]           0.0     1.000000   4.0
9   (7500, 10000]           0.0     0.640000   6.0


In [90]:
print("Means")
views_bins.groupby('views_stayed')[['outperformer', 'price_ratio', 'bids']].mean().reset_index()

Means


     views_stayed  outperformer  price_ratio        bids
0         (0, 10]      0.195190     2.646740   52.093432
1        (10, 25]      0.259554     1.041826    8.763535
2        (25, 50]      0.160045     0.733357    9.860386
3       (50, 100]      0.249205    36.232965   10.310710
4      (100, 500]      0.294034     1.252121    9.126420
5     (500, 1000]      0.200000     0.877254  463.900000
6    (1000, 2500]      0.304498     0.953103   75.321799
7    (2500, 5000]      0.209302     0.785453    8.285271
8    (5000, 7500]      0.456989     2.864416    6.795699
9   (7500, 10000]      0.171171     0.703837    8.500000


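Grouped by traffic, the medians are flat: the median price/value ratio stays between 0.64 and 0.78 in every bin except (5000, 7500] (1.0), and the median bid count is 6 to 8 in most bins. The share of outperformers (the mean of `outperformer`) ranges from 0.16 to 0.46 with no monotonic trend across bins. The means of price ratio and bids, by contrast, jump around because a few extreme items dominate them (e.g. a mean price ratio of 36.2 in the (50, 100] bin, and a mean of 463.9 bids in the (500, 1000] bin, which holds only 10 items). Apart from that sparse bin, every bin holds at least 120 items, so traffic appears to have no or only very weak influence on the other features beyond the extremes.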
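Fixed-width bins leave some bins sparse (the (500, 1000] bin holds 10 items), which makes their means and medians unstable. A hedged alternative is to bin on traffic quantiles so that each bin holds a similar number of items, and to report the count next to each summary. The helper `summarize_by_traffic_quantile` and the toy data are hypothetical; the column names follow the notebook.

```python
import pandas as pd

def summarize_by_traffic_quantile(df, q=4, views_col="views_stayed"):
    """Summarize outperformer share, price ratio, and bids per traffic quantile bin.

    Assumes a boolean `outperformer` column and numeric `price_ratio` and `bids`.
    """
    # Equal-frequency bins; duplicate edges (from tied traffic values) are dropped
    traffic_bin = pd.qcut(df[views_col], q=q, duplicates="drop").rename("traffic_bin")
    return df.groupby(traffic_bin, observed=True).agg(
        n=("bids", "count"),  # number of items in each bin (non-null bids)
        outperformer_share=("outperformer", "mean"),
        median_price_ratio=("price_ratio", "median"),
        median_bids=("bids", "median"),
    )

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "views_stayed": [1, 2, 3, 4, 5, 6, 7, 8],
    "outperformer": [False, True, False, False, True, True, False, True],
    "price_ratio": [0.5, 1.2, 0.7, 0.8, 1.5, 1.1, 0.6, 2.0],
    "bids": [3, 8, 5, 4, 9, 7, 2, 10],
})
print(summarize_by_traffic_quantile(toy, q=4))
```

Since `views_stayed` is a form-level value shared by every item on a form, ties can make quantile bins uneven; `duplicates="drop"` then reduces the number of bins instead of raising an error.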