In [25]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import dask.dataframe as dd
import dask.array as da
import statsmodels.api as sm
import scipy.stats as stats
import pickle

In [2]:
df = dd.read_csv("orders.csv", dtype = {
        "id" : "int", 
        "timestamp" : "int", 
        "market" : "category",
        "pair" : "str",
        "side" : "category",
        "quantity" : "float",
        "price" : "float",
        "order_type" : "category",
        "execution_time" : "float"
    }
)

## 03 A/B Testing

Removing outliers

In [27]:
# Load the precomputed boolean outlier mask; the context manager closes the file.
with open('outliers_exectime_bool.pkl', 'rb') as f:
    outliers_exectime_bool = pickle.load(f)

In [None]:
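# Materialize the two columns of interest as pandas Series.
pd_ordertype = df.loc[:, "order_type"].compute()
pd_exectime = df.loc[:, "execution_time"].compute()

# Keep only the rows that are not flagged as execution-time outliers.
pd_ordertype_exout = pd_ordertype[~outliers_exectime_bool]
pd_exectime_exout = pd_exectime[~outliers_exectime_bool]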
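The mask `outliers_exectime_bool` is loaded from a pickle, and this section does not show how it was built. As a rough illustration only, the sketch below assumes a 1.5 × IQR rule on a toy series; both the rule and the name `toy_exectime` are assumptions, not necessarily what produced the pickled mask.

```python
import pandas as pd

# Toy execution times (hypothetical values, not the real data).
toy_exectime = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

# Assumed rule: flag points more than 1.5 * IQR outside the quartiles.
q1 = toy_exectime.quantile(0.25)
q3 = toy_exectime.quantile(0.75)
iqr = q3 - q1
toy_outliers_bool = (toy_exectime < q1 - 1.5 * iqr) | (toy_exectime > q3 + 1.5 * iqr)

# Same filtering pattern as above: negate the mask to keep non-outliers.
toy_exectime_exout = toy_exectime[~toy_outliers_bool]
print(toy_exectime_exout.tolist())
```

A mask built this way has the same length and index as the series it filters, which is what the boolean indexing relies on.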

In [44]:
market_exectime = pd_exectime_exout[pd_ordertype_exout == "Market"]
limit_exectime = pd_exectime_exout[pd_ordertype_exout == "Limit"]

### (b) Compare median execution times

A first look at the data shows that the median execution time of market orders is not shorter than that of limit orders.  
A one-sided test of the hypothesis that market orders have a shorter median execution time than limit orders is therefore not informative: the sample medians already point the other way, so the null hypothesis would not be rejected.

In [45]:
print("Market Execution Time Median:", market_exectime.median())
print("Limit Execution Time Median:", limit_exectime.median())

Market Execution Time Median: 5.053
Limit Execution Time Median: 5.051


Regardless, we can test whether the two medians differ significantly.  
Mood's median test gives a result that is significant at the 10% significance level, but not at the 5% significance level.

In [52]:
res = stats.median_test(market_exectime, limit_exectime)
res.pvalue.round(3)

np.float64(0.069)
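To make the mechanics of Mood's median test concrete, the sketch below rebuilds it by hand on synthetic data: pool the samples, take the grand median, count how many observations in each sample fall above it, and run a chi-square test on the resulting 2x2 table. The samples `sample_a` and `sample_b` are made-up stand-ins, and the comparison assumes SciPy's default settings for `median_test` (ties counted as "below", continuity correction on).

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two execution-time samples (not the real data).
rng = np.random.default_rng(0)
sample_a = rng.exponential(scale=5.0, size=400)
sample_b = rng.exponential(scale=5.5, size=600)

# 1. Grand median of the pooled observations.
grand_median = np.median(np.concatenate([sample_a, sample_b]))

# 2. 2x2 table: rows = above / not above the grand median, columns = samples.
table = np.array([
    [(sample_a > grand_median).sum(), (sample_b > grand_median).sum()],
    [(sample_a <= grand_median).sum(), (sample_b <= grand_median).sum()],
])

# 3. Chi-square test of independence on that table (continuity correction on).
chi2_manual, p_manual, _, _ = stats.chi2_contingency(table, correction=True)

# SciPy's built-in test with default settings, for comparison.
chi2_scipy, p_scipy, _, _ = stats.median_test(sample_a, sample_b)

print("manual p-value:", p_manual)
print("scipy  p-value:", p_scipy)
```

Because the test only uses counts above and below the grand median, it ignores how far observations lie from it, which makes it robust but less powerful than rank-based alternatives.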

### (a) Compare both strategies

We test whether both samples (market vs. limit) come from the same distribution using a one-sided Mann-Whitney U test. The alternative hypothesis is that market order execution times are stochastically smaller than limit order execution times. The result is not significant.

In [None]:
res = stats.mannwhitneyu(market_exectime, limit_exectime, alternative = 'less')
res.pvalue.round(3)

np.float64(0.906)
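The direction of a one-sided Mann-Whitney U test depends on the argument order: with `alternative='less'`, the alternative is that the first sample is stochastically smaller than the second. The sketch below illustrates this on synthetic data where the first sample really is shifted lower; `fast` and `slow` are hypothetical names, not the order data.

```python
import numpy as np
from scipy import stats

# Synthetic samples: `fast` is shifted half a standard deviation below `slow`.
rng = np.random.default_rng(0)
fast = rng.normal(loc=0.0, scale=1.0, size=500)
slow = rng.normal(loc=0.5, scale=1.0, size=500)

# H1: `fast` is stochastically smaller than `slow` (the true direction here).
res_less = stats.mannwhitneyu(fast, slow, alternative='less')

# H1: `fast` is stochastically greater than `slow` (the wrong direction here).
res_greater = stats.mannwhitneyu(fast, slow, alternative='greater')

# U / (n1 * n2) estimates P(fast > slow), counting ties as one half.
prob_fast_exceeds_slow = res_less.statistic / (len(fast) * len(slow))

print("p-value, alternative='less':   ", res_less.pvalue)
print("p-value, alternative='greater':", res_greater.pvalue)
print("estimated P(fast > slow):      ", prob_fast_exceeds_slow)
```

A one-sided p-value close to 1, as in the cell above, indicates that the data lean toward the opposite direction from the stated alternative rather than merely showing no difference.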

### (c) Conclusions

We have executed two tests:
1. Mood's Median Test
2. Mann-Whitney U Test

**In Test 1**  
We tested $H_0:Med_{limit}=Med_{market}$ vs $H_1:Med_{limit}\neq Med_{market}$  
The p-value was 0.069, so we do not find sufficient evidence to reject the null hypothesis at the 5% significance level.

**In Test 2**  
We tested $H_0$: both samples come from the same distribution vs $H_1$: market order execution times are stochastically smaller than limit order execution times  
The p-value was 0.906, so we do not find sufficient evidence to reject the null hypothesis at the 5% significance level.