# 10.4 Competitive Auctions on eBay.com.

The file eBayAuctions.csv contains information on 1972 auctions transacted on eBay.com during May–June 2004. The goal is to use these data to build a model that will distinguish competitive auctions from noncompetitive ones. A competitive auction is defined as an auction with at least two bids placed on the item being auctioned. The data include variables that describe the item (auction category), the seller (his or her eBay rating), and the auction terms that the seller selected (auction duration, opening price, currency, day of week of auction close). In addition, we have the price at which the auction closed. The goal is to predict whether or not an auction of interest will be competitive. 

Shmueli, Galit; Bruce, Peter C.; Gedeck, Peter; Patel, Nitin R. *Data Mining for Business Analytics* (Kindle Locations 9315-9320). Wiley. Kindle Edition.

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
import matplotlib.pyplot as plt
from dmba import classificationSummary, gainsChart
from dmba.metric import AIC_score



In [None]:
ebay_df = pd.read_csv('eBayAuctions.csv')

# Treat the categorical predictors as pandas categories so that
# pd.get_dummies() can later convert them to dummy variables
for col in ['Category', 'currency', 'endDay']:
    ebay_df[col] = ebay_df[col].astype('category')

ebay_df

| | Category | currency | sellerRating | Duration | endDay | ClosePrice | OpenPrice | Competitive? |
|---|---|---|---|---|---|---|---|---|
| 0 | Music/Movie/Game | US | 3249 | 5 | Mon | 0.01 | 0.01 | 0 |
| 1 | Music/Movie/Game | US | 3249 | 5 | Mon | 0.01 | 0.01 | 0 |
| 2 | Music/Movie/Game | US | 3249 | 5 | Mon | 0.01 | 0.01 | 0 |
| 3 | Music/Movie/Game | US | 3249 | 5 | Mon | 0.01 | 0.01 | 0 |
| 4 | Music/Movie/Game | US | 3249 | 5 | Mon | 0.01 | 0.01 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1967 | Automotive | US | 2992 | 5 | Sun | 359.95 | 359.95 | 0 |
| 1968 | Automotive | US | 21 | 5 | Sat | 610.00 | 300.00 | 1 |
| 1969 | Automotive | US | 1400 | 5 | Mon | 549.00 | 549.00 | 0 |
| 1970 | Automotive | US | 57 | 7 | Fri | 820.00 | 650.00 | 1 |
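To see what `pd.get_dummies(..., drop_first=True)` does to a category column, here is a minimal sketch on a hypothetical five-row frame (only the `endDay` labels mirror the eBay data). The first category level, alphabetically `Fri` here, gets no column of its own, so a row of all zeros means "Fri". In the `train_X` output of the next cell the `endDay` dummies are Mon, Sat, Sun, Thu, Tue and Wed, which suggests Friday is the baseline against which the `endDay` coefficients are measured.

```python
import pandas as pd

# Hypothetical five-row frame; only the endDay labels mirror the eBay data
toy = pd.DataFrame({'endDay': ['Mon', 'Fri', 'Sun', 'Wed', 'Fri']})
toy['endDay'] = toy['endDay'].astype('category')   # levels sorted: Fri, Mon, Sun, Wed

# drop_first=True omits the first level (Fri), which becomes the baseline;
# dtype=int gives 0/1 columns instead of booleans
dummies = pd.get_dummies(toy, drop_first=True, dtype=int)
print(dummies.columns.tolist())  # ['endDay_Mon', 'endDay_Sun', 'endDay_Wed']
print(dummies)
```

Dropping one level per variable avoids perfect collinearity with the intercept; each remaining dummy's coefficient is read relative to the omitted level.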


In [None]:
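outcome = 'Competitive?'
# compare names exactly; ('Competitive?') is a plain string, so "in" would test substrings
predictors = [s for s in ebay_df.columns if s != outcome]

# One-hot encode the categorical predictors, dropping the first level of each
X = pd.get_dummies(ebay_df[predictors], drop_first=True, dtype=int)
y = ebay_df[outcome]

# partition data: 60% training, 40% validation
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)

train_X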


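| | sellerRating | Duration | ClosePrice | OpenPrice | Category_Automotive | Category_Books | Category_Business/Industrial | Category_Clothing/Accessories | Category_Coins/Stamps | Category_Collectibles | ... | Category_SportingGoods | Category_Toys/Hobbies | currency_GBP | currency_US | endDay_Mon | endDay_Sat | endDay_Sun | endDay_Thu | endDay_Tue | endDay_Wed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 503 | 578 | 10 | 4.93 | 2.45 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 733 | 2349 | 7 | 5.61 | 3.60 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 383 | 884 | 10 | 2.45 | 2.45 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 725 | 2349 | 7 | 5.50 | 3.60 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 310 | 104 | 7 | 3.07 | 1.23 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1791 | 2427 | 3 | 33.95 | 33.95 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1096 | 2046 | 5 | 7.50 | 7.50 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1932 | 534 | 7 | 154.23 | 79.99 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 235 | 1853 | 10 | 1.23 | 1.23 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |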
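The size of the 60/40 partition can be checked by hand. This sketch assumes scikit-learn's rounding behavior for a float `test_size` (the validation count is rounded up and training gets the remainder). With 1972 auctions that gives 1183 training and 789 validation records, consistent with the 1183 observations in the statsmodels summary and the 789 auctions counted in the validation confusion matrix.

```python
import math

n_rows = 1972      # auctions in eBayAuctions.csv, per the problem statement
test_size = 0.4    # as passed to train_test_split

# Assumed rounding rule: validation size rounded up, training gets the remainder
n_valid = math.ceil(test_size * n_rows)
n_train = n_rows - n_valid
print(n_train, n_valid)  # 1183 789
```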


In [None]:
# fit a logistic regression (set penalty="l2" and C=1e42 to avoid regularization)
logit_full = LogisticRegression(penalty="l2", C=1e42, solver='liblinear')
logit_full.fit(train_X, train_y)
print('intercept ', logit_full.intercept_[0])
print(pd.DataFrame({'coeff': logit_full.coef_[0]}, index=X.columns).transpose())
# note: dmba's AIC_score is computed from the sum of squared residuals, so applied to
# 0/1 class predictions it is only a rough comparison number, not the likelihood-based
# AIC of the logistic model
print('AIC', AIC_score(valid_y, logit_full.predict(valid_X), df=len(train_X.columns) + 1))


intercept  -0.4452116538916522
       sellerRating  Duration  ClosePrice  OpenPrice  Category_Automotive  \
coeff     -0.000045  0.012535     0.08934  -0.106258            -0.541553   

       Category_Books  Category_Business/Industrial  \
coeff        0.238868                      1.386651   

       Category_Clothing/Accessories  Category_Coins/Stamps  \
coeff                      -1.268502              -2.048303   

       Category_Collectibles  ...  Category_SportingGoods  \
coeff              -0.017439  ...               -0.030414   

       Category_Toys/Hobbies  currency_GBP  currency_US  endDay_Mon  \
coeff               0.306249      1.736434     0.541007    0.422789   

       endDay_Sat  endDay_Sun  endDay_Thu  endDay_Tue  endDay_Wed  
coeff   -0.627914   -0.450058   -0.641878   -0.188319   -0.646831  

[1 rows x 29 columns]
AIC 1177.7523711842023
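The `AIC` printed above comes from dmba's `AIC_score`, which, as far as we can tell, computes a Gaussian AIC from the residual sum of squares. With 190 misclassified validation auctions (77 + 113 in the confusion matrix) and 30 parameters, that formula gives about 1177.75, matching the printed value, so the number is not the Bernoulli-likelihood AIC of a logistic model. A minimal sketch of the likelihood-based version, AIC = 2k - 2 log L, assuming predicted probabilities of class 1 are available (`logistic_aic` is a hypothetical helper, not a dmba function):

```python
import numpy as np

def logistic_aic(y_true, p_hat, n_params):
    """AIC = 2k - 2 log L for a binary outcome with predicted P(y = 1) = p_hat."""
    y = np.asarray(y_true, dtype=float)
    # clip to avoid log(0) when a probability is exactly 0 or 1
    p = np.clip(np.asarray(p_hat, dtype=float), 1e-15, 1 - 1e-15)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * n_params - 2 * log_lik

# Hypothetical outcomes and predicted probabilities, two parameters (intercept + slope)
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]
aic = logistic_aic(y, p, n_params=2)
print(round(aic, 3))
```

With the notebook's objects this would be `logistic_aic(train_y, logit_full.predict_proba(train_X)[:, 1], n_params=train_X.shape[1] + 1)`. For the statsmodels fit, the log-likelihood of -592.11 with 30 parameters gives 2 * 30 + 2 * 592.11 = 1244.22, which should agree with `logit_result.aic`.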


In [None]:
logit_reg_pred = logit_full.predict_proba(valid_X)
full_result = pd.DataFrame({'actual': valid_y,
                            'p(0)': logit_reg_pred[:, 0],
                            'p(1)': logit_reg_pred[:, 1],
                            'predicted': logit_full.predict(valid_X)})
# sort by p(1), highest first, so the gains chart ranks the most likely competitive auctions first
full_result = full_result.sort_values(by=['p(1)'], ascending=False)

# confusion matrix: class names follow label order, 0 = noncompetitive, 1 = competitive
classes = ['noncompetitive', 'competitive']
classificationSummary(full_result.actual, full_result.predicted, class_names=classes)
gainsChart(full_result.actual, figsize=[5, 5])
plt.show()


Confusion Matrix (Accuracy 0.7592)

               Prediction
        Actual noncompetitive    competitive
noncompetitive            276             77
   competitive            113            323
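dmba's `classificationSummary` orders rows (actual class) and columns (predicted class) by label value, so the first row and column are class 0 (noncompetitive) and the second are class 1 (competitive). The sketch below recomputes the headline metrics from those counts and then shows, on a hypothetical five-auction ranking, the cumulative-gains idea behind `gainsChart`: walk down the auctions sorted by `p(1)` and count the competitive ones found so far. The `cumulative_gains` helper is our own illustration, not dmba's implementation.

```python
import numpy as np

# Validation confusion matrix from the output above:
# rows = actual class, columns = predicted class, in label order 0, 1
cm = np.array([[276, 77],
               [113, 323]])

tn, fp = cm[0]   # actual 0 (noncompetitive): predicted 0, predicted 1
fn, tp = cm[1]   # actual 1 (competitive): predicted 0, predicted 1

accuracy = (tn + tp) / cm.sum()
sensitivity = tp / (tp + fn)   # share of competitive auctions that are flagged
specificity = tn / (tn + fp)   # share of noncompetitive auctions correctly rejected
precision = tp / (tp + fp)     # share of flagged auctions that really are competitive
print(f'accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} '
      f'specificity={specificity:.4f} precision={precision:.4f}')
# accuracy=0.7592 sensitivity=0.7408 specificity=0.7819 precision=0.8075

def cumulative_gains(actual_sorted):
    """Cumulative number of competitive (1) auctions found when records are taken
    in the given order, which should be by p(1), highest first."""
    return np.cumsum(np.asarray(actual_sorted))

# Hypothetical ranking of five auctions, already sorted by p(1)
gains = cumulative_gains([1, 1, 0, 1, 0])
print(gains)  # [1 2 2 3 3]
```

A useful ranking rises faster than the straight line from (0, 0) to (number of records, number of competitive auctions), which is what random ordering would give on average.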


In [None]:
# An alternative to scikit-learn's LogisticRegression is sm.GLM from statsmodels

outcome = 'Competitive?'
predictors = [s for s in ebay_df.columns if s != outcome]

# statsmodels needs numeric (float) columns and an explicit intercept column
X = pd.get_dummies(ebay_df[predictors], drop_first=True, dtype=float)
X = sm.add_constant(X, prepend=True)
y = ebay_df[outcome]

# partition data (same rows and random_state as above, so the same split)
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)

# use GLM (generalized linear model) with the binomial family to fit a logistic regression
logit_reg = sm.GLM(train_y, train_X, family=sm.families.Binomial())
logit_result = logit_reg.fit()
logit_result.summary()

| Dep. Variable: | Competitive? | No. Observations: | 1183 |
|---|---|---|---|
| Model: | GLM | Df Residuals: | 1153 |
| Model Family: | Binomial | Df Model: | 29 |
| Link Function: | logit | Scale: | 1.0 |
| Method: | IRLS | Log-Likelihood: | -592.11 |
| Date: | Fri, 12 Mar 2021 | Deviance: | 1184.2 |
| Time: | 20:51:26 | Pearson chi2: | 9.50e+09 |
| No. Iterations: | 22 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -0.4758 | 0.517 | -0.920 | 0.358 | -1.490 | 0.538 |
| sellerRating | -4.486e-05 | 1.64e-05 | -2.727 | 0.006 | -7.71e-05 | -1.26e-05 |
| Duration | 0.0134 | 0.047 | 0.285 | 0.775 | -0.078 | 0.105 |
| ClosePrice | 0.0899 | 0.009 | 9.509 | 0.000 | 0.071 | 0.108 |
| OpenPrice | -0.1070 | 0.011 | -9.534 | 0.000 | -0.129 | -0.085 |
| Category_Automotive | -0.5384 | 0.384 | -1.402 | 0.161 | -1.291 | 0.214 |
| Category_Books | 0.2781 | 0.455 | 0.611 | 0.541 | -0.613 | 1.170 |
| Category_Business/Industrial | 1.3093 | 0.873 | 1.499 | 0.134 | -0.402 | 3.021 |
| Category_Clothing/Accessories | -1.2532 | 0.432 | -2.902 | 0.004 | -2.099 | -0.407 |
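Logistic coefficients are easier to interpret as odds ratios, exp(coef), and each reported p-value is the two-sided standard-normal tail probability of the Wald z statistic. The sketch below uses only numbers copied from the summary (coefficients and the reported z values; recomputing z from the rounded standard errors would give slightly different values). `two_sided_p` is a hypothetical helper written for this illustration.

```python
import math

# Estimates copied from the statsmodels summary above: (coef, reported z)
estimates = {
    'sellerRating': (-4.486e-05, -2.727),
    'ClosePrice': (0.0899, 9.509),
    'OpenPrice': (-0.1070, -9.534),
    'Category_Clothing/Accessories': (-1.2532, -2.902),
}

def two_sided_p(z):
    """Two-sided p-value of a Wald z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

for name, (coef, z) in estimates.items():
    odds_ratio = math.exp(coef)  # multiplicative change in the odds per one-unit increase
    print(f'{name:30s} odds ratio={odds_ratio:.4f}  p={two_sided_p(z):.4f}')

# sellerRating is measured in rating points, so a 1,000-point change is easier to read
print('sellerRating, per 1000 points:', round(math.exp(1000 * -4.486e-05), 3))
```

Holding the other predictors fixed, each extra dollar of opening price multiplies the odds of a competitive auction by about 0.90, and a 1,000-point higher seller rating multiplies them by about 0.956. A category dummy's odds ratio compares that category with the omitted baseline category.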


# Conclusion

The fitted models help isolate which characteristics play a larger role in the prediction. In our case, the auction category seems to be much more important for predicting whether an eBay auction will be competitive than the day on which the auction ended. Auctions in the Photography category in general seem to be competitive. Note, however, that although the coefficient of this dummy variable is large, its p-value is not small, so this result may not be statistically meaningful.

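Seller feedback (`sellerRating`) also helps predict whether an auction will be competitive, and it should be kept in the predictive model: its p-value is 0.006. Its coefficient is negative (about -4.5e-05 per rating point), so auctions run by sellers with higher ratings appear slightly less likely to be competitive.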

