# Testing the reported performance scores published for Electrohysterogram classification

In [1]:
from mlscorecheck.bundles import check_ehg
from mlscorecheck.aggregated import kfolds_generator

Testing the scores reported in U. R. Acharya, V. K. Sudarshan, S. Q. Rong, Z. Tan, C. M. Lim, J. E. Koh, S. Nayak, S. V. Bhandary, Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals, Computers in biology and medicine 85 (2017) 33–42. doi: 10.1016/j.compbiomed.2017.04.013

In [2]:
folds = list(kfolds_generator(evaluation={'dataset': {'p': 38, 'n': 262}, 'folding': {'n_folds': 5}},
                                available_scores=['acc', 'sens', 'spec']))

In [3]:
len(folds)

918

In [4]:
# the 5-fold cross-validation scores reported in the paper
scores = {'acc': 0.9447, 'sens': 0.9139, 'spec': 0.9733}
eps = 0.0001

In [5]:
results = check_ehg(scores=scores, eps=eps, n_folds=5, n_repeats=1)

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /home/gykovacs/anaconda3/envs/mlscorecheck/lib/python3.10/site-packages/pulp/solverdir/cbc/linux/64/cbc /tmp/c5010095f93144b8810365a3bf55c42c-pulp.mps timeMode elapsed branch printingOptions all solution /tmp/c5010095f93144b8810365a3bf55c42c-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 11 COLUMNS
At line 73 RHS
At line 80 BOUNDS
At line 92 ENDATA
Problem MODEL has 6 rows, 11 columns and 40 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 0 - 0.00 seconds
Cgl0000I Cut generators found to be infeasible! (or unbounded)
Pre-processing says infeasible or unbounded
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /home/gykovacs/an

In [6]:
results['inconsistency']

True

The results show that the scores are inconsistent with the setup: they could not be the outcome of the claimed evaluation.

## Testing the assumption of the improper use of minority oversampling

In [7]:
from mlscorecheck.aggregated import fold_partitioning_generator
from mlscorecheck.check import check_1_dataset_known_folds_mos_scores, check_1_dataset_unknown_folds_mos_scores

In [8]:
# as reported in the paper
p_prime = 244
n = 262

In [9]:
# first we test with the high-level functionality
results = check_1_dataset_unknown_folds_mos_scores(dataset={'p': p_prime, 'n': n},
                                                    folding={'n_folds': 5},
                                                    scores=scores,
                                                    eps=eps,
                                                    verbosity=0)

In [10]:
results['inconsistency']

False

In [11]:
results['details'][-1]['configuration_id']

962

In [12]:
# counting the fold configurations

count = 0
for _ in fold_partitioning_generator(p=p_prime, n=n, k=5):
    count += 1

In [13]:
print(count)

2616607


In [14]:
# extracting the evidence
folds = []
tptn = []

evidence = results['details'][-1]['details']

for fold in evidence['lp_configuration']['evaluations'][0]['folds']['folds']:
    folds.append((fold['fold']['p'], fold['fold']['n']))
    tptn.append((fold['fold']['tp'], fold['fold']['tn']))

In [15]:
folds

[(1, 101), (4, 97), (40, 61), (99, 2), (100, 1)]

In [16]:
tptn

[(1.0, 96.0), (3.0, 92.0), (38.0, 59.0), (90.0, 2.0), (96.0, 1.0)]
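
The extracted evidence can be cross-checked by hand. The following is a minimal sketch, assuming that the "mos" in the function name stands for mean-of-scores aggregation, i.e. that each score is computed per fold and then averaged over the folds. The fold sizes and the tp/tn counts are copied from the outputs above; the helper `fold_scores` is a hypothetical name introduced only for this illustration and is not part of mlscorecheck.

```python
# Evidence copied from the cell outputs above:
# (p, n) per fold and the matching (tp, tn) per fold.
folds = [(1, 101), (4, 97), (40, 61), (99, 2), (100, 1)]
tptn = [(1.0, 96.0), (3.0, 92.0), (38.0, 59.0), (90.0, 2.0), (96.0, 1.0)]

# Scores reported in the paper and the numerical uncertainty used above.
reported = {'acc': 0.9447, 'sens': 0.9139, 'spec': 0.9733}
eps = 0.0001


def fold_scores(p, n, tp, tn):
    """Accuracy, sensitivity and specificity of a single fold (hypothetical helper)."""
    return {'acc': (tp + tn) / (p + n), 'sens': tp / p, 'spec': tn / n}


# Score each fold separately, then average over the folds
# (assumption: mean-of-scores aggregation).
per_fold = [fold_scores(p, n, tp, tn) for (p, n), (tp, tn) in zip(folds, tptn)]
mean_scores = {key: sum(s[key] for s in per_fold) / len(per_fold) for key in reported}

# A reported score is reproduced if the mean lies within eps of it.
within_eps = {key: abs(mean_scores[key] - reported[key]) <= eps for key in reported}

# The folds must add up to the oversampled dataset (p' = 244, n = 262).
total_p = sum(p for p, _ in folds)
total_n = sum(n for _, n in folds)

print(total_p, total_n)
print(mean_scores)
print(within_eps)
```

If every entry of `within_eps` is `True` and the totals match `p_prime` and `n`, this fold configuration reproduces the reported scores under the stated assumption, which is what `results['inconsistency'] == False` indicates.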