# DeepCpG basics

This tutorial describes how to create DeepCpG data files, train models, and evaluate models. These topics are described in greater detail in the [DeepCpG documentation](http://deepcpg.readthedocs.io/en/latest/).

## Table of Contents
* [Initialization](#Initialization)
* [Creating DeepCpG data files](#Creating-DeepCpG-data-files)
* [Training models](#Training-models)
* [Imputing methylation profiles](#Imputing-methylation-profiles)
* [Exporting methylation profiles](#Exporting-methylation-profiles)
* [Evaluating prediction performances](#Evaluating-prediction-performances)

## Initialization

We first initialize some variables that will be used throughout the tutorial. `test_mode=1` should be used for testing purposes, which speeds up computations by only using a subset of the data. For real applications, `test_mode=0` should be used.

In [1]:
function run {
  local cmd=$@
  echo
  echo "#################################"
  echo $cmd
  echo "#################################"
  eval $cmd
}

test_mode=1 # Set to 1 for testing and 0 otherwise.
example_dir="../../data" # Example data.
cpg_dir="$example_dir/cpg" # CpG profiles.
dna_dir="$example_dir/dna/mm10" # DNA sequences.
anno_dir="$example_dir/anno/mm10" # Annotations of genomic contexts.

data_dir="./data" # DeepCpG data.
models_dir="./models" # Trained models.
mkdir -p $models_dir
eval_dir="./eval" # Evaluation data.
mkdir -p $eval_dir

## Creating DeepCpG data files

We first store the known CpG methylation states of each cell in a tab-delimited file with the following columns:
* Chromosome (without chr)
* Position of the CpG site on the chromosome
* Binary methylation state of the CpG site (0=unmethylated, 1=methylated)

CpG sites with a methylation rate between zero and one should be binarized by rounding. Filenames should correspond to cell names. 
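
The conversion can be sketched with `awk`. This is a minimal example and not part of DeepCpG: the helper name `binarize_cpg`, the assumed input layout (chromosome, position, methylation rate, tab-separated, with an optional `chr` prefix), and the choice to round a rate of exactly 0.5 up to 1 are all assumptions made for illustration.

```bash
# Hypothetical helper: convert "chrom<TAB>pos<TAB>rate" lines into the
# three-column format described above.
binarize_cpg() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         sub(/^chr/, "", $1)            # drop a leading "chr" from the chromosome
         state = ($3 >= 0.5) ? 1 : 0    # round the rate to 0 or 1 (0.5 rounds up here)
         print $1, $2, state
       }'
}

# Example with made-up rates:
printf 'chr1\t3000827\t0.9\nchr1\t3001007\t0.25\n' | binarize_cpg
```

For a real cell, the function would read that cell's rate file and write to a file named after the cell, for example `binarize_cpg < rates.tsv > BS27_1_SER.tsv`, where `rates.tsv` is a placeholder name.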

Each position must point to the cytosine residue of a CpG site (positions are enumerated from 1). Otherwise, `dcpg_data.py` will report warnings, for example if the wrong genome is used or CpG sites were not aligned correctly.
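
One way to spot-check a position before running `dcpg_data.py` is to look up the dinucleotide that starts there. The sketch below uses a hypothetical helper, `is_cpg_site`, and a toy sequence. Extracting the real chromosome sequence from the gzip-compressed FASTA files is not shown, and the helper is an illustration rather than the check that `dcpg_data.py` itself performs.

```bash
# Hypothetical helper: succeed if the 1-based position $2 in sequence $1
# is the C of a CG dinucleotide (case-insensitive).
is_cpg_site() {
  sequence=$1
  pos=$2
  dinuc=$(printf '%s\n' "$sequence" |
    awk -v pos="$pos" '{ print toupper(substr($0, pos, 2)) }')
  [ "$dinuc" = "CG" ]
}

# Toy sequence: the only CpG starts at position 3.
if is_cpg_site "ATCGTT" 3; then echo "position 3: CpG"; fi
if ! is_cpg_site "ATCGTT" 4; then echo "position 4: not a CpG"; fi
```

A position that is off by one (for example, 0-based coordinates) lands on the wrong base and fails this check, which is the kind of misalignment the warnings point to.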

For this tutorial we are using a subset of serum mouse embryonic stem cells from *Smallwood et al. (2014)*:

In [2]:
ls $cpg_dir

BS27_1_SER.tsv BS27_3_SER.tsv BS27_5_SER.tsv BS27_6_SER.tsv BS27_8_SER.tsv


We can have a look at the methylation profile of cell 'BS27_1_SER':

In [3]:
head "$cpg_dir/BS27_1_SER.tsv"

1	3000827	1.0
1	3001007	1.0
1	3001018	1.0
1	3001277	1.0
1	3001629	1.0
1	3003226	1.0
1	3003339	1.0
1	3003379	1.0
1	3006416	1.0
1	3007580	1.0


Since we are dealing with mouse cells, we are using the mm10 (GRCm38) mouse genome build:

In [4]:
ls $dna_dir

Mus_musculus.GRCm38.dna.chromosome.1.fa.gz
Mus_musculus.GRCm38.dna.chromosome.10.fa.gz
Mus_musculus.GRCm38.dna.chromosome.11.fa.gz
Mus_musculus.GRCm38.dna.chromosome.12.fa.gz
Mus_musculus.GRCm38.dna.chromosome.13.fa.gz
Mus_musculus.GRCm38.dna.chromosome.14.fa.gz
Mus_musculus.GRCm38.dna.chromosome.15.fa.gz
Mus_musculus.GRCm38.dna.chromosome.16.fa.gz
Mus_musculus.GRCm38.dna.chromosome.17.fa.gz
Mus_musculus.GRCm38.dna.chromosome.18.fa.gz
Mus_musculus.GRCm38.dna.chromosome.19.fa.gz
Mus_musculus.GRCm38.dna.chromosome.2.fa.gz
Mus_musculus.GRCm38.dna.chromosome.3.fa.gz
Mus_musculus.GRCm38.dna.chromosome.4.fa.gz
Mus_musculus.GRCm38.dna.chromosome.5.fa.gz
Mus_musculus.GRCm38.dna.chromosome.6.fa.gz
Mus_musculus.GRCm38.dna.chromosome.7.fa.gz
Mus_musculus.GRCm38.dna.chromosome.8.fa.gz
Mus_musculus.GRCm38.dna.chromosome.9.fa.gz
Mus_musculus.GRCm38.dna.chromosome.MT.fa.gz
Mus_musculus.GRCm38.dna.chromosome.X.fa.gz
Mus_musculus.GRCm38.dna.chromosome.Y.fa.gz


These files were downloaded by `setup.py`. Other genomes, such as the human genome hg38, can be downloaded with a command like the following:

```bash
wget ftp://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.*.fa.gz
```

Now we can run `dcpg_data.py` to create the input data for DeepCpG. For testing purposes, we only consider a few CpG sites on chromosomes 1 and 13:

In [5]:
cmd="dcpg_data.py
    --cpg_profiles $cpg_dir/*.tsv
    --dna_files $dna_dir
    --out_dir $data_dir
    --cpg_wlen 50
    --dna_wlen 1001
"
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --chromo 1 13
        --nb_sample_chromo 1000
        "
fi
run $cmd


#################################
dcpg_data.py --cpg_profiles ../../data/cpg/BS27_1_SER.tsv ../../data/cpg/BS27_3_SER.tsv ../../data/cpg/BS27_5_SER.tsv ../../data/cpg/BS27_6_SER.tsv ../../data/cpg/BS27_8_SER.tsv --dna_files ../../data/dna/mm10 --out_dir ./data --cpg_wlen 50 --dna_wlen 1001 --chromo 1 13 --nb_sample_chromo 1000
#################################
INFO (2017-04-14 10:26:50,391): Reading CpG profiles ...
INFO (2017-04-14 10:26:50,391): ../../data/cpg/BS27_1_SER.tsv
INFO (2017-04-14 10:26:56,879): ../../data/cpg/BS27_3_SER.tsv
INFO (2017-04-14 10:27:01,592): ../../data/cpg/BS27_5_SER.tsv
INFO (2017-04-14 10:27:09,046): ../../data/cpg/BS27_6_SER.tsv
INFO (2017-04-14 10:27:14,598): ../../data/cpg/BS27_8_SER.tsv
INFO (2017-04-14 10:27:19,908): 2000 samples
INFO (2017-04-14 10:27:19,908): --------------------------------------------------------------------------------
INFO (2017-04-14 10:27:19,909): Chromosome 1 ...
INFO (2017-04-14 10:27:19,923): 1000 / 1000 (100.0%) sites mat

For each CpG site that is observed in at least one cell, this command extracts the 50 neighboring CpG sites (25 to the left and 25 to the right) and the 1001 bp long DNA sequence window centered on the CpG site. In test mode, only 1000 CpG sites are randomly sampled from each of chromosomes 1 and 13. The command creates multiple HDF5 files named `cX_FROM-TO.h5`, where `X` is the chromosome, and `FROM` and `TO` give the index range of the CpG sites stored in the file:

In [6]:
ls $data_dir

c13_000000-001000.h5 c1_000000-001000.h5
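
The naming scheme can be taken apart with plain shell parameter expansion, which is handy when selecting files by chromosome or index range. The helper name `parse_data_file` is hypothetical, and the sketch assumes file names follow the pattern seen in the listing above.

```bash
# Hypothetical helper: split a name such as c13_000000-001000.h5 into
# chromosome, FROM index, and TO index (tab-separated).
parse_data_file() {
  base=$(basename "$1" .h5)     # c13_000000-001000
  chromo=${base%%_*}            # c13
  chromo=${chromo#c}            # 13
  range=${base#*_}              # 000000-001000
  from=${range%-*}              # 000000
  to=${range#*-}                # 001000
  printf '%s\t%s\t%s\n' "$chromo" "$from" "$to"
}

parse_data_file ./data/c13_000000-001000.h5
```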


## Training models

We can now train models on the created data. First, we need to split the data into a training set, a validation set, and a test set. The training set should contain at least 3 million CpG sites. We will use chromosomes 1, 3, 5, 7, and 9 as the training set, and chromosomes 13, 14, 15, 16, and 17 as the validation set:

In [7]:
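function get_data_files {
  # Print the data files in directory $1 for the chromosomes given as
  # the remaining arguments; chromosomes without files are skipped.
  local data_dir=$1
  shift
  local chromos=$@
  local files=""
  local chromo

  for chromo in $chromos; do
    files="$files $(ls $data_dir/c${chromo}_*.h5 2> /dev/null)"
  done
  echo $files
}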

train_files=$(get_data_files $data_dir 1 3 5 7 9)
val_files=$(get_data_files $data_dir 13 14 15 16 17)

In [8]:
echo $train_files

./data/c1_000000-001000.h5


In [9]:
echo $val_files

./data/c13_000000-001000.h5


We can count the number of CpG sites in the training set using `dcpg_data_stats.py`:

In [10]:
cmd="dcpg_data_stats.py $train_files"
run $cmd


#################################
dcpg_data_stats.py ./data/c1_000000-001000.h5
#################################
           output  nb_tot  nb_obs  frac_obs      mean       var
0  cpg/BS27_1_SER    1000     187     0.187  0.775401  0.174154
1  cpg/BS27_3_SER    1000     208     0.208  0.711538  0.205251
2  cpg/BS27_5_SER    1000     200     0.200  0.690000  0.213900
3  cpg/BS27_6_SER    1000     195     0.195  0.666667  0.222222
4  cpg/BS27_8_SER    1000     210     0.210  0.776190  0.173719


For each output cell, `nb_tot` is the total number of CpG sites, `nb_obs` the number of CpG sites with known methylation state, `frac_obs` the ratio between `nb_obs` and `nb_tot`, `mean` the mean methylation rate, and `var` the variance of the methylation rate.
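
To get a single number from this table, the `nb_obs` column can be summed over all cells. The sketch below is an illustration only: the helper name `total_observed` is hypothetical, and it assumes the input is the bare table as printed by `dcpg_data_stats.py`, without the banner lines added by `run`. Data rows start with a row index, so `nb_obs` is the fourth whitespace-separated field.

```bash
# Hypothetical helper: sum the nb_obs column of a dcpg_data_stats.py table.
# The header line is skipped; data rows are: index, output, nb_tot, nb_obs, ...
total_observed() {
  awk 'NR > 1 && NF >= 4 { total += $4 } END { print total + 0 }'
}

# Two rows copied from the table above:
total_observed <<'EOF'
           output  nb_tot  nb_obs  frac_obs      mean       var
0  cpg/BS27_1_SER    1000     187     0.187  0.775401  0.174154
1  cpg/BS27_3_SER    1000     208     0.208  0.711538  0.205251
EOF
```

The result counts observed methylation states across cells, which is not the same quantity as the number of distinct CpG sites (`nb_tot`).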

We can now train our model. DeepCpG consists of a *DNA*, *CpG*, and *Joint model*, which can be trained either jointly or separately. We will train them separately, starting with the CpG model:

In [11]:
cmd="dcpg_train.py
    $train_files
    --val_files $val_files
    --cpg_model RnnL1
    --out_dir $models_dir/cpg
    "
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --nb_epoch 1
        --nb_train_sample 1000
        --nb_val_sample 1000
    "
else
    cmd="$cmd
        --nb_epoch 30
        "
fi
run $cmd


#################################
dcpg_train.py ./data/c1_000000-001000.h5 --val_files ./data/c13_000000-001000.h5 --cpg_model RnnL1 --out_dir ./models/cpg --nb_epoch 1 --nb_train_sample 1000 --nb_val_sample 1000
#################################
Using TensorFlow backend.
INFO (2017-04-14 10:27:34,697): Building model ...
Replicate names:
BS27_1_SER, BS27_3_SER, BS27_5_SER, BS27_6_SER, BS27_8_SER

INFO (2017-04-14 10:27:34,703): Building CpG model ...
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
cpg/state (InputLayer)           (None, 5, 50)         0                                            
____________________________________________________________________________________________________
cpg/dist (InputLayer)            (None, 5, 50)         0                                            
______________________________________

`--cpg_model RnnL1` specifies the architecture of the CpG model and `--nb_epoch` the number of training epochs. These are hyper-parameters, which can be adapted depending on the size of the training set and the model complexity. For testing purposes, we decrease the number of samples using `--nb_train_sample` and `--nb_val_sample`.

The CpG model is often already quite accurate on its own. However, we can further boost performance by also training a DNA model, which leverages the DNA sequence:

In [12]:
cmd="dcpg_train.py
    $train_files
    --val_files $val_files
    --dna_model CnnL2h128
    --out_dir $models_dir/dna
    "
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --nb_epoch 1
        --nb_train_sample 1000
        --nb_val_sample 1000
    "
else
    cmd="$cmd
        --nb_epoch 30
        "
fi
run $cmd


#################################
dcpg_train.py ./data/c1_000000-001000.h5 --val_files ./data/c13_000000-001000.h5 --dna_model CnnL2h128 --out_dir ./models/dna --nb_epoch 1 --nb_train_sample 1000 --nb_val_sample 1000
#################################
Using TensorFlow backend.
INFO (2017-04-14 10:27:57,142): Building model ...
INFO (2017-04-14 10:27:57,145): Building DNA model ...
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
dna (InputLayer)                 (None, 1001, 4)       0                                            
____________________________________________________________________________________________________
dna/convolution1d_1 (Convolution (None, 991, 128)      5760        dna[0][0]                        
____________________________________________________________________________________________________
dna/activa

Finally, we combine both models by training a Joint model:

In [13]:
cmd="dcpg_train.py
    $train_files
    --val_files $val_files
    --dna_model $models_dir/dna
    --cpg_model $models_dir/cpg
    --joint_model JointL2h512
    --train_models joint
    --out_dir $models_dir/joint
"
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --nb_epoch 1
        --nb_train_sample 1000
        --nb_val_sample 1000
    "
else
    cmd="$cmd
        --nb_epoch 10
        "
fi
run $cmd


#################################
dcpg_train.py ./data/c1_000000-001000.h5 --val_files ./data/c13_000000-001000.h5 --dna_model ./models/dna --cpg_model ./models/cpg --joint_model JointL2h512 --train_models joint --out_dir ./models/joint --nb_epoch 1 --nb_train_sample 1000 --nb_val_sample 1000
#################################
Using TensorFlow backend.
INFO (2017-04-14 10:28:21,112): Building model ...
INFO (2017-04-14 10:28:21,115): Loading existing DNA model ...
INFO (2017-04-14 10:28:21,115): Using model files ./models/dna/model.json ./models/dna/model_weights_val.h5
Replicate names:
BS27_1_SER, BS27_3_SER, BS27_5_SER, BS27_6_SER, BS27_8_SER

INFO (2017-04-14 10:28:21,513): Loading existing CpG model ...
INFO (2017-04-14 10:28:21,513): Using model files ./models/cpg/model.json ./models/cpg/model_weights_val.h5
INFO (2017-04-14 10:28:22,546): Joining models ...
____________________________________________________________________________________________________
Layer (type)           

You can find more information about [training](http://deepcpg.readthedocs.io/en/latest/train.html) and [model architectures](http://deepcpg.readthedocs.io/en/latest/models.html) in the DeepCpG documentation.

## Imputing methylation profiles

We now use `dcpg_eval.py` to impute missing methylation states and to evaluate prediction performance on observed states. We will use the trained Joint model, but could also evaluate the CpG or DNA model.

In [14]:
cmd="dcpg_eval.py
    $data_dir/c*.h5
    --model_files $models_dir/joint
    --out_data $eval_dir/data.h5
    --out_report $eval_dir/report.tsv
    "
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --nb_sample 10000
        "
fi
run $cmd


#################################
dcpg_eval.py ./data/c13_000000-001000.h5 ./data/c1_000000-001000.h5 --model_files ./models/joint --out_data ./eval/data.h5 --out_report ./eval/report.tsv --nb_sample 10000
#################################
Using TensorFlow backend.
INFO (2017-04-14 10:28:52,032): Loading model ...
INFO (2017-04-14 10:28:53,425): Loading data ...
INFO (2017-04-14 10:28:53,429): Predicting ...
INFO (2017-04-14 10:28:53,447):  128/2000 (6.4%)
INFO (2017-04-14 10:28:54,343):  384/2000 (19.2%)
INFO (2017-04-14 10:28:55,203):  640/2000 (32.0%)
INFO (2017-04-14 10:28:56,036):  896/2000 (44.8%)
INFO (2017-04-14 10:28:56,818): 1128/2000 (56.4%)
INFO (2017-04-14 10:28:57,652): 1384/2000 (69.2%)
INFO (2017-04-14 10:28:58,499): 1640/2000 (82.0%)
INFO (2017-04-14 10:28:59,376): 1896/2000 (94.8%)
INFO (2017-04-14 10:28:59,788): 2000/2000 (100.0%)
  'precision', 'predicted', average, warn_for)
  mcc = cov_ytyp / np.sqrt(var_yt * var_yp)
           output       auc       acc       tp

The imputed methylation profiles of all cells are stored in `data.h5`, and performance metrics in `report.tsv`.

In [15]:
h5ls -r $eval_dir/data.h5

/                        Group
/chromo                  Dataset {2000}
/outputs                 Group
/outputs/cpg             Group
/outputs/cpg/BS27_1_SER  Dataset {2000}
/outputs/cpg/BS27_3_SER  Dataset {2000}
/outputs/cpg/BS27_5_SER  Dataset {2000}
/outputs/cpg/BS27_6_SER  Dataset {2000}
/outputs/cpg/BS27_8_SER  Dataset {2000}
/pos                     Dataset {2000}
/preds                   Group
/preds/cpg               Group
/preds/cpg/BS27_1_SER    Dataset {2000}
/preds/cpg/BS27_3_SER    Dataset {2000}
/preds/cpg/BS27_5_SER    Dataset {2000}
/preds/cpg/BS27_6_SER    Dataset {2000}
/preds/cpg/BS27_8_SER    Dataset {2000}


In [16]:
cat $eval_dir/report.tsv

metric	output	value
acc	cpg/BS27_1_SER	0.7851662404092071
acc	cpg/BS27_3_SER	0.6225490196078431
acc	cpg/BS27_5_SER	0.30788804071246817
acc	cpg/BS27_6_SER	0.3706467661691542
acc	cpg/BS27_8_SER	0.48284313725490197
auc	cpg/BS27_1_SER	0.5267976951614176
auc	cpg/BS27_3_SER	0.5857490940897163
auc	cpg/BS27_5_SER	0.541899611084103
auc	cpg/BS27_6_SER	0.5013366005791936
auc	cpg/BS27_8_SER	0.5393628453565362
f1	cpg/BS27_1_SER	0.8793103448275862
f1	cpg/BS27_3_SER	0.7240143369175627
f1	cpg/BS27_5_SER	0.0
f1	cpg/BS27_6_SER	0.17589576547231267
f1	cpg/BS27_8_SER	0.5685071574642127
mcc	cpg/BS27_1_SER	0.010059326521099646
mcc	cpg/BS27_3_SER	0.1446147146482661
mcc	cpg/BS27_5_SER	0.0
mcc	cpg/BS27_6_SER	0.017828739558016723
mcc	cpg/BS27_8_SER	0.06394060856947545
n	cpg/BS27_1_SER	391.0
n	cpg/BS27_3_SER	408.0
n	cpg/BS27_5_SER	393.0
n	cpg/BS27_6_SER	402.0
n	cpg/BS27_8_SER	408.0
tnr	cpg/BS27_1_SER	0.012195121951219513
tnr	cpg/BS27_3_SER	0.49056603773584906
tnr	cpg/BS27_5_SER	1.0
tnr	cpg/BS27_6_SER	0.9104477611
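
Because the report is in long format (one row per metric and output), a per-metric summary across cells takes only a short `awk` filter. The helper name `mean_metric` is hypothetical, the cell names in the example are made up, and the sketch assumes the three tab-separated columns shown above.

```bash
# Hypothetical helper: average the value column over all outputs for the
# metric named in $1. Input: metric<TAB>output<TAB>value, with a header line.
mean_metric() {
  awk -F '\t' -v metric="$1" '
    NR > 1 && $1 == metric { sum += $3; n++ }
    END { if (n > 0) printf "%.4f\n", sum / n }'
}

# Made-up rows in the same layout as report.tsv:
printf '%s\t%s\t%s\n' \
  metric output value \
  auc cpg/cell_A 0.50 \
  auc cpg/cell_B 0.60 \
  acc cpg/cell_A 0.75 |
  mean_metric auc
```

On the real report, the call would be `mean_metric auc < $eval_dir/report.tsv`.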

## Exporting methylation profiles

`dcpg_eval_export.py` can be used to export imputed methylation profiles:

In [17]:
cmd="dcpg_eval_export.py
    $eval_dir/data.h5
    -o $eval_dir/hdf
    -f hdf
"
eval $cmd

INFO (2017-04-14 10:29:02,007): cpg/BS27_1_SER
INFO (2017-04-14 10:29:02,014): cpg/BS27_3_SER
INFO (2017-04-14 10:29:02,019): cpg/BS27_5_SER
INFO (2017-04-14 10:29:02,023): cpg/BS27_6_SER
INFO (2017-04-14 10:29:02,028): cpg/BS27_8_SER
INFO (2017-04-14 10:29:02,033): Done!


In [18]:
ls $eval_dir/hdf

BS27_1_SER.h5 BS27_3_SER.h5 BS27_5_SER.h5 BS27_6_SER.h5 BS27_8_SER.h5


By default, `dcpg_eval_export.py` exports profiles to HDF5 files. You can use `-f bedGraph` to export profiles to gzip-compressed bedGraph files, which, however, takes longer.

## Evaluating prediction performances

`dcpg_eval_perf.py` evaluates prediction performance genome-wide, in specific genomic contexts, and by computing performance curves. Using `--anno_files`, you can specify a list of BED files with annotation tracks to evaluate, and you can compute ROC and precision-recall (PR) curves for individual outputs using `--curves roc pr`. You can also use `--anno_curves roc pr` to compute performance curves for the annotations specified by `--anno_files`.

In [19]:
cmd="dcpg_eval_perf.py
    $eval_dir/data.h5
    --out_dir $eval_dir/perf
    --curves roc pr
    --anno_files $anno_dir/CGI*.bed $anno_dir/Gene_body.bed $anno_dir/Introns.bed $anno_dir/Exons.bed $anno_dir/UW_DNase1.bed
"
eval $cmd

INFO (2017-04-14 10:29:03,746): Loading data ...
INFO (2017-04-14 10:29:03,760): 2000 samples
INFO (2017-04-14 10:29:03,760): Evaluating globally ...
  'precision', 'predicted', average, warn_for)
  mcc = cov_ytyp / np.sqrt(var_yt * var_yp)
           output    anno       auc       acc       tpr       tnr        f1       mcc      n
1  cpg/BS27_3_SER  global  0.585749  0.622549  0.668874  0.490566  0.724014  0.144615  408.0
2  cpg/BS27_5_SER  global  0.541900  0.307888  0.000000  1.000000  0.000000  0.000000  393.0
4  cpg/BS27_8_SER  global  0.539363  0.482843  0.438486  0.637363  0.568507  0.063941  408.0
0  cpg/BS27_1_SER  global  0.526798  0.785166  0.990291  0.012195  0.879310  0.010059  391.0
3  cpg/BS27_6_SER  global  0.501337  0.370647  0.100746  0.910448  0.175896  0.017829  402.0
INFO (2017-04-14 10:29:03,828): roc curve
INFO (2017-04-14 10:29:03,837): pr curve
INFO (2017-04-14 10:29:03,846): Evaluating annotations ...
INFO (2017-04-14 10:29:04,028): CGI: 312
INFO (2017-04-14 1

Performance metrics are stored in `metrics.tsv`, and performance curves in `curves.tsv`:

In [20]:
head $eval_dir/perf/metrics.tsv

anno	metric	output	value
global	acc	cpg/BS27_5_SER	0.30789
global	acc	cpg/BS27_6_SER	0.37065
global	acc	cpg/BS27_8_SER	0.48284
global	acc	cpg/BS27_3_SER	0.62255
global	acc	cpg/BS27_1_SER	0.78517
global	auc	cpg/BS27_6_SER	0.50134
global	auc	cpg/BS27_1_SER	0.52680
global	auc	cpg/BS27_8_SER	0.53936
global	auc	cpg/BS27_5_SER	0.54190


In [21]:
head $eval_dir/perf/curves.tsv

anno	curve	output	x	y	thr
global	roc	cpg/BS27_1_SER	0.00000	0.00324	0.55943
global	roc	cpg/BS27_1_SER	0.00000	0.01294	0.55266
global	roc	cpg/BS27_1_SER	0.01220	0.01294	0.55247
global	roc	cpg/BS27_1_SER	0.01220	0.02589	0.55041
global	roc	cpg/BS27_1_SER	0.02439	0.02589	0.54989
global	roc	cpg/BS27_1_SER	0.02439	0.03236	0.54941
global	roc	cpg/BS27_1_SER	0.03659	0.03236	0.54921
global	roc	cpg/BS27_1_SER	0.03659	0.07120	0.54488
global	roc	cpg/BS27_1_SER	0.06098	0.07120	0.54435


You can find more annotation tracks in the example data directory:

In [22]:
ls $anno_dir

Active_enhancers.bed H3K4me1.bed          Tet2.bed
CGI.bed              H3K4me1_Tet1.bed     UW_DNase1.bed
CGI_shelf.bed        IAP.bed              Wu_Tet1.bed
CGI_shore.bed        Intergenic.bed       mESC_enhancers.bed
Exons.bed            Introns.bed          p300.bed
Gene_body.bed        LMRs.bed             prom_2k05k.bed
H3K27ac.bed          Oct4_2i.bed          prom_2k05k_cgi.bed
H3K27me3.bed         TSSs.bed             prom_2k05k_ncgi.bed


To visualize performances of a single model, you can use the Rmarkdown script `R/eval_perf_single.Rmd` in the DeepCpG directory. The following command will only work if you have installed R and the required libraries (rmarkdown, knitr, ggplot, dplyr, tidyr, xtable, grid). `R/eval_perf_mult.Rmd` can be used for a side-by-side comparison of multiple models.

In [23]:
cp ../../../R/eval_perf_single.Rmd $eval_dir/perf/index.Rmd
cwd=$PWD
cd $eval_dir/perf
Rscript -e "library(rmarkdown); render('./index.Rmd', output_format='html_document')"
cd $cwd

1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 


processing file: index.Rmd
  |...                                                              |   5%
   inline R code fragments

  |......                                                           |   9%
label: unnamed-chunk-1 (with options) 
List of 1
 $ include: symbol F

  |.........                                                        |  14%
  ordinary text without R code

  |............                                                     |  18%
label: unnamed-chunk-2 (with options) 
List of 1
 $ include: symbol F


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

  |...............                                                  |

<a href='./eval/perf/index.html'>Prediction performance evaluation</a>