You invented a new algorithm for classification. The metric you use for comparison is accuracy. The new algorithm is better in terms of total accuracy, but it is much heavier computationally.

Should you switch to it?


In [31]:
import numpy as np
import scipy as sp
import pymc3 as pm
import arviz as az

In [23]:
acc_1 = [69.84172683593945, 70.03775288410537, 69.96999543879818, 
         70.13941506222103, 70.11524519279389, 69.87051262808936, 
         70.03098900350274, 70.01317015337584, 69.84430576519885, 70.15376236134091]

acc_2 = [76.69550304241477, 78.06757527749059, 71.29252044225439, 
         86.65062956671677, 85.62131067262195, 85.08135776997447, 
         72.35340551906377, 48.101510299416915, 84.99675411144258, 62.17064407858798]

In [42]:
print('Accuracies of algorithms:', np.mean(acc_1), np.mean(acc_2))
print('STDs of accuracies:', np.std(acc_1), np.std(acc_2))

Accuracies of algorithms: 70.00168753253656 75.10312107799841
STDs of accuracies: 0.11218279461054097 11.709686906945796
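Note that `np.std` defaults to the population formula (`ddof=0`). With only 10 runs per algorithm, the sample standard deviation (`ddof=1`) and the standard error of the mean give a slightly better feel for how noisy each mean is. The sketch below is a side calculation on the same two lists; the helper name `summarize` is made up for illustration and is not part of the original notebook.

```python
import numpy as np

acc_1 = [69.84172683593945, 70.03775288410537, 69.96999543879818,
         70.13941506222103, 70.11524519279389, 69.87051262808936,
         70.03098900350274, 70.01317015337584, 69.84430576519885, 70.15376236134091]

acc_2 = [76.69550304241477, 78.06757527749059, 71.29252044225439,
         86.65062956671677, 85.62131067262195, 85.08135776997447,
         72.35340551906377, 48.101510299416915, 84.99675411144258, 62.17064407858798]


def summarize(acc):
    """Return mean, population SD, sample SD and standard error of the mean."""
    acc = np.asarray(acc, dtype=float)
    n = acc.size
    sd_pop = acc.std()            # ddof=0, what np.std does by default
    sd_sample = acc.std(ddof=1)   # unbiased-variance version, divides by n - 1
    sem = sd_sample / np.sqrt(n)  # standard error of the mean
    return acc.mean(), sd_pop, sd_sample, sem


for name, acc in [("acc_1", acc_1), ("acc_2", acc_2)]:
    mean, sd_pop, sd_sample, sem = summarize(acc)
    print(f"{name}: mean={mean:.3f} sd(ddof=0)={sd_pop:.3f} "
          f"sd(ddof=1)={sd_sample:.3f} sem={sem:.3f}")
```

The gap between the two means should be read against the standard error of `acc_2`, which is far larger than that of `acc_1`.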


In [46]:
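data = [acc_1, acc_2]
with pm.Model() as anova1:
    # Noise scale shared by both algorithms (prior mean 10)
    sigma = pm.Exponential('sigma', lam=1/10)
    # Grand mean of the accuracies
    mu = pm.Normal('mu', mu=0, sigma=100/2)

    # Sum-to-zero group effects: alpha_0 = -alpha_1.
    # Note the tight prior: alpha_1 ~ Normal(0, 0.1), so adiff = 2*alpha_1 has prior sd 0.2.
    alphas = [0]*2
    alphas[1] = pm.Normal('alpha_1', mu=0., sigma=0.1)
    alphas[0] = pm.Deterministic('alpha_0', -alphas[1])

    # Likelihood: one observed node per algorithm
    accs = [0]*2
    for i in range(2):
        accs[i] = pm.Normal(f'acc_{i}', mu=mu+alphas[i], sigma=sigma, observed=data[i])

    # Difference between the group effects (algorithm 2 minus algorithm 1)
    adiff = pm.Deterministic('adiff', alphas[1]-alphas[0])

    posterior = pm.sample(draws=1000, tune=3000, random_seed=42)#, chains=1, progressbar=False)

# 95% equal-tailed interval for the difference
q1 = np.quantile(posterior['adiff'], 0.025)
q2 = np.quantile(posterior['adiff'], 0.975)
az.summary(posterior)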

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha_1, mu, sigma]


Sampling 4 chains for 3_000 tune and 1_000 draw iterations (12_000 + 4_000 draws total) took 16 seconds.


           mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
mu       72.466  2.032   68.67    76.32      0.032    0.023    4063.0    2845.0    1.0
alpha_1   0.005  0.102  -0.189    0.197      0.002    0.002    4413.0    2888.0    1.0
sigma     9.196  1.612   6.364   12.186      0.026    0.019    3825.0    2370.0    1.0
alpha_0  -0.005  0.102  -0.197    0.189      0.002    0.002    4413.0    2888.0    1.0
adiff     0.011  0.203  -0.378    0.394      0.003    0.003    4413.0    2888.0    1.0


In [47]:
print(f"{q1} <= adiff <= {q2}")

-0.3933457910836786 <= adiff <= 0.4125738046032672


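No evidence that the new algorithm is better: the 95% interval for adiff contains zero. But the two samples clearly have very different spreads, so let's give each algorithm its own sigma (classical frequentist ANOVA assumes a common variance, btw).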

In [50]:
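data = [acc_1, acc_2]
with pm.Model() as anova:
    # One noise scale per algorithm (each with prior mean 10)
    sigmas = [pm.Exponential('sigma_1', lam=1/10), pm.Exponential('sigma_2', lam=1/10)]
    # Grand mean of the accuracies
    mu = pm.Normal('mu', mu=0, sigma=100/2)

    # Sum-to-zero group effects: alpha_0 = -alpha_1 (same tight prior as before)
    alphas = [0]*2
    alphas[1] = pm.Normal('alpha_1', mu=0., sigma=0.1)
    alphas[0] = pm.Deterministic('alpha_0', -alphas[1])

    # Likelihood: each algorithm now uses its own sigma
    accs = [0]*2
    for i in range(2):
        accs[i] = pm.Normal(f'acc_{i}', mu=mu+alphas[i], sigma=sigmas[i], observed=data[i])

    # Difference between the group effects (algorithm 2 minus algorithm 1)
    adiff = pm.Deterministic('adiff', alphas[1]-alphas[0])

    posterior = pm.sample(draws=1000, tune=3000, random_seed=42)#, chains=1, progressbar=False)

# 95% equal-tailed interval for the difference
q1 = np.quantile(posterior['adiff'], 0.025)
q2 = np.quantile(posterior['adiff'], 0.975)
az.summary(posterior)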

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha_1, mu, sigma_2, sigma_1]


Sampling 4 chains for 3_000 tune and 1_000 draw iterations (12_000 + 4_000 draws total) took 22 seconds.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.


           mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
mu       70.008  0.113  69.806   70.218      0.003    0.002    1077.0    1133.0    1.0
alpha_1   0.007  0.103  -0.182    0.195      0.003    0.002    1011.0    1391.0    1.0
sigma_1    0.14  0.042   0.081    0.216      0.001    0.001    1593.0    1350.0    1.0
sigma_2  13.414  3.237    8.37   19.416      0.083     0.06    1672.0    1961.0    1.0
alpha_0  -0.007  0.103  -0.195    0.182      0.003    0.002    1011.0    1391.0    1.0
adiff     0.013  0.207  -0.365    0.389      0.006    0.005    1011.0    1391.0    1.0


In [49]:
print(f"{q1} <= adiff <= {q2}")

-0.3766093094436013 <= adiff <= 0.41380102317005857


Still no evidence: the 95% interval for adiff again contains zero.
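One detail in the two summaries is worth a look: mu moved from about 72.5 with a shared sigma to about 70.0 with separate sigmas. A rough way to see why is a precision-weighted mean of the two sample means, with weights n / sigma^2. The sketch below is only a plug-in approximation: it uses the posterior means of the sigmas reported in the summaries, ignores the priors and the small alpha terms, and the helper `precision_weighted_mean` is a name invented here for illustration.

```python
import numpy as np

# Sample means printed earlier in the notebook, 10 runs per algorithm.
mean_1, mean_2 = 70.00168753253656, 75.10312107799841
n_runs = 10

# Posterior means of the noise scales, copied from the two summaries.
sigma_shared = 9.196               # first model: one sigma for both groups
sigma_1, sigma_2 = 0.14, 13.414    # second model: one sigma per group


def precision_weighted_mean(means, sigmas, n):
    """Combine group means with weights n / sigma**2 (plug-in approximation)."""
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = n / sigmas**2
    return float(np.sum(weights * means) / np.sum(weights))


mu_shared = precision_weighted_mean([mean_1, mean_2], [sigma_shared, sigma_shared], n_runs)
mu_separate = precision_weighted_mean([mean_1, mean_2], [sigma_1, sigma_2], n_runs)

print(f"shared sigma:    mu is roughly {mu_shared:.3f}")
print(f"separate sigmas: mu is roughly {mu_separate:.3f}")
```

With separate sigmas, the low-variance algorithm gets almost all of the weight, so mu sits on top of its mean. Because the alpha prior allows only a tiny offset from mu, the noisy algorithm's higher mean is then absorbed into sigma_2 rather than into adiff.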