# Globals

In [3]:
import random

"Also, for each data set, we first run the algorithm on a set of randomly generated start HOTs or start HOT-mixtures for 10 iterations. The HOT or HOT-mixture that results in the best likelihood is then run until convergence. Unless stated otherwise, the number of start trees and mixtures is 100."

In [7]:
early_iterations = 10
start_trees = 100
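
The start-point selection described in the quote can be sketched as below. This is only a minimal sketch, not the paper's implementation: `random_start`, `em_step`, and `log_likelihood` are hypothetical callables standing in for the HOT-specific routines, and the convergence test (likelihood improvement below `tol`) is an assumption. The toy problem at the end exists only to make the sketch runnable.

```python
import random

def best_of_random_starts(random_start, em_step, log_likelihood,
                          n_starts=100, early_iterations=10,
                          tol=1e-9, max_iterations=10_000):
    """Run a few EM iterations from many random starts, then refine the best."""
    best_model, best_score = None, float("-inf")
    for _ in range(n_starts):
        model = random_start()
        for _ in range(early_iterations):
            model = em_step(model)
        score = log_likelihood(model)
        if score > best_score:
            best_model, best_score = model, score

    # Continue from the best start until the likelihood stops improving
    # (assumed stopping rule: improvement smaller than tol).
    for _ in range(max_iterations):
        candidate = em_step(best_model)
        candidate_score = log_likelihood(candidate)
        if candidate_score - best_score < tol:
            break
        best_model, best_score = candidate, candidate_score
    return best_model, best_score

# Toy stand-ins (hypothetical): the "model" is a single float, each step
# moves it halfway towards 3.0, and the score peaks at 3.0.
random.seed(0)
toy_model, toy_score = best_of_random_starts(
    random_start=lambda: random.uniform(-10.0, 10.0),
    em_step=lambda x: x + 0.5 * (3.0 - x),
    log_likelihood=lambda x: -(x - 3.0) ** 2,
)
```

The defaults mirror `start_trees = 100` and `early_iterations = 10` from the cell above.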


# Data Processing

## Synthetic Data

### Single HOTs

We generated random HOTs with 10, 25, and 40 vertices, with the parameters on the edges chosen
uniformly in the intervals

$$\Pr[Z(u) = 1 \mid Z(p(u)) = 1] \in [0.1, 1.0], \tag{7}$$

$$\Pr[X(u) = 0 \mid Z(u) = 1],\ \epsilon_X,\ \epsilon_Z \in [0.01, q], \tag{8}$$

where $q \in \{0.05, 0.10, 0.25, 0.50\}$.

For each combination of vertex count and $q$, we generated 100 HOTs, for a total of
3 × 4 × 100 = 1200 HOTs.

In [5]:
vertices_range = [10, 25, 40]
q_range = [0.05, 0.10, 0.25, 0.50]

In [4]:
def prob_range(lower, upper):
    """Draw one probability uniformly at random from [lower, upper]."""
    return random.uniform(lower, upper)
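
The generation scheme described above can be sketched as follows. The source does not say how tree topologies were drawn, so attaching each vertex to a uniformly chosen earlier vertex is an assumption, as are the helper name `sample_hot` and the dictionary layout; parameters are stored per non-root vertex `u`, standing for the edge from `p(u)` to `u`.

```python
import random

def sample_hot(n_vertices, q, rng=random):
    """Return (parent, params) for one random tree with per-edge parameters."""
    parent = {0: None}  # vertex 0 is taken as the root (assumption)
    params = {}
    for u in range(1, n_vertices):
        # Assumed topology: attach u to a uniformly chosen earlier vertex.
        parent[u] = rng.randrange(u)
        params[u] = {
            "pz": rng.uniform(0.1, 1.0),   # Pr[Z(u) = 1 | Z(p(u)) = 1], eq. (7)
            "px0": rng.uniform(0.01, q),   # Pr[X(u) = 0 | Z(u) = 1], eq. (8)
            "ex": rng.uniform(0.01, q),    # error parameter for X, eq. (8)
            "ez": rng.uniform(0.01, q),    # error parameter for Z, eq. (8)
        }
    return parent, params

# 3 vertex counts x 4 values of q x 100 HOTs each.
hots = [
    sample_hot(n, q)
    for n in (10, 25, 40)
    for q in (0.05, 0.10, 0.25, 0.50)
    for _ in range(100)
]
print(len(hots))  # 1200
```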

#### Global Parameters EM

In the standard version of the EM algorithm, there are four parameters per edge of a HOT.
The number of parameters can be reduced by letting some parameters be global, e.g., by letting
$\epsilon_X(u) = \epsilon_X(u')$ for all vertices $u$ and $u'$. There are three parameters whose global estimation is
desirable: $\epsilon_X$, $\epsilon_Z$, and $\Pr[X(u) = 0 \mid Z(u) = 1]$. However, for technical reasons, requiring that $\epsilon_Z$ be
global makes it impossible to derive an EM algorithm. Therefore, we will distinguish between two
versions of the algorithm: one with free parameters and one with global parameters. The
free parameter version corresponds to the standard EM algorithm, while the global parameter
version corresponds to letting $\epsilon_X$ and $\Pr[X(u) = 0 \mid Z(u) = 1]$ be global. When evaluating the global
parameter version of the algorithm using synthetic data, we will follow the convention of letting all
three error parameters be global when generating data.
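
To make the reduction concrete, the parameter counts of the two versions can be compared with a small helper. This is a sketch based only on the description above: four parameters per edge in the free version, and two of those four made global in the global version. The function name `parameter_count` is hypothetical, and the number of edges is passed in rather than derived from the vertex count.

```python
def parameter_count(n_edges, global_params=False):
    """Number of parameters estimated by EM for a HOT with n_edges edges."""
    if global_params:
        # Two parameters stay per edge; the other two are shared by all edges.
        return 2 * n_edges + 2
    # Free version: four parameters on every edge.
    return 4 * n_edges

free_count = parameter_count(9)
global_count = parameter_count(9, global_params=True)
print(free_count, global_count)  # 36 20
```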

In [None]:
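# Global-parameter version: the three error parameters are drawn once per
# (vertices, q) combination and shared by all edges; only Pz varies per node.
for vertices in vertices_range:
    for q in q_range:
        # Uniform draws, as specified in (7) and (8).
        Px0 = prob_range(0.01, q)
        ex = prob_range(0.01, q)
        ez = prob_range(0.01, q)
        for tree in range(start_trees):
            for node in range(vertices):
                Pz = prob_range(0.1, 1.0)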

#### Free Parameters EM (most probably this version is not used)

In [10]:
# Free-parameter version: all four parameters are drawn independently per node.
for vertices in vertices_range:
    for q in q_range:
        for tree in range(start_trees):
            for node in range(vertices):
                # Uniform draws, as specified in (7) and (8).
                Pz = prob_range(0.1, 1.0)
                Px0 = prob_range(0.01, q)
                ex = prob_range(0.01, q)
                ez = prob_range(0.01, q)

### HOT-mixtures

## Real Cytogenetic Cancer Data