# Testing the code

This notebook is used to flush out simple programming errors in the code. It therefore focuses on the graph-building part of the code, and not on the testing part. Note that the code assumes that the graph is connected, i.e. that there is a path between any two nodes in the graph. The case of non-connected graphs is more complex but, fortunately, it does not concern us here.
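
The connectivity assumption can be checked before running anything else. The sketch below is a plain breadth-first search; it assumes the graph is available as a list of neighbour lists (`adjacency` and `is_connected` are hypothetical names, and the layout may differ from the `neighbours` object used in this notebook).

```python
from collections import deque


def is_connected(adjacency):
    """Return True if every node is reachable from node 0.

    `adjacency` is a hypothetical list of neighbour lists: adjacency[i]
    holds the indices of the nodes adjacent to node i (undirected graph).
    """
    if not adjacency:
        return True  # an empty graph is trivially connected
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(adjacency)


path = [[1], [0, 2], [1]]      # 0 - 1 - 2
split = [[1], [0], [3], [2]]   # 0 - 1 and 2 - 3 are two separate components
print(is_connected(path), is_connected(split))
```

If such a check fails, the graph could be split into its connected components and each component explored separately, but that is outside the scope of these tests.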


## Creating a toy dataset

In [1]:
from importlib import reload
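import sys, os, random
import numpy as np
import pandas as pd
# chi2, Parallel and delayed are used in the cells below; import them
# explicitly rather than relying on the star imports.
from scipy.stats import chi2
from joblib import Parallel, delayed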
sys.path.append(os.path.abspath("/Users/hector/Documents/BGWAS2/CALDERA/scripts"))
import Subgraphs
reload(Subgraphs)
from Subgraphs import *
import ExploreBFS
reload(ExploreBFS)
from ExploreBFS import *

In [2]:
sys.path.append(os.path.abspath("/Users/hector/Documents/BGWAS2/CALDERA/tests"))
import toyDataset
Pop, neighbours, pattern, Pheno, edges = toyDataset.generateToyDataset()
alpha = 0.05
Lengths = np.ones(pattern.shape[0])
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
TH = chi2.isf(alpha / 1, 1)
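
To get a feel for the threshold computed above, the sketch below reproduces `chi2.isf(p, 1)` with the standard library only, using the fact that the square of a standard normal variable follows a chi-square distribution with one degree of freedom. The helper name `chi2_isf_1df` is hypothetical, and the way the threshold depends on `k0` (here `alpha / k0`) is an assumption that generalises the `alpha / 1` used in the cell; `find_ko` may compute it differently.

```python
from statistics import NormalDist


def chi2_isf_1df(p):
    """Upper-tail quantile of a chi-square with 1 degree of freedom.

    If Z is standard normal then Z**2 is chi-square(1), so
    P(Z**2 > x) = p  <=>  x = (Phi^{-1}(1 - p / 2)) ** 2.
    """
    return NormalDist().inv_cdf(1.0 - p / 2.0) ** 2


alpha = 0.05
for k0 in (1, 10, 1000):
    # Assumption: the threshold for a given k0 is chi2.isf(alpha / k0, 1).
    print(k0, round(chi2_isf_1df(alpha / k0), 3))
```

Under this assumption the threshold grows with `k0`, so fewer subgraphs remain testable as more of them are explored.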

## Explore the graph
### First stage

In [3]:
Lmax = 100
nNodes = G.lengths.shape[0]
n1s, n2s = G.ns()
nodes = [node for node in range(nNodes)]
C = start(nodes, G.Pop, G.pattern, G.neighbours, G.lengths, TH, n1s, n2s)
Env = np.array([S.Env for S in C], dtype = np.float64)
Pvals = np.array([S.pval for S in C], dtype = np.float64)
Not_Too_Large = np.array([S.length < Lmax for S in C], dtype = np.bool_)
# We then find the updated k0 when considering only subgraphs of size 1
k0, TH = find_ko(Pvals, alpha)
# We clean the list and we save testable subgraphs
are_Testable = Pvals >= TH
R = C[are_Testable]
R_Pvals = Pvals[are_Testable]
Keep = (Env >= TH) + Not_Too_Large
C = C[Keep]
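
The `Keep` mask combines two boolean arrays with `+`. On NumPy boolean arrays, `+` is an element-wise logical OR, whereas `&` (or `*`) is an element-wise AND. The sketch below only illustrates these semantics on made-up values (`env_ok` and `small` are hypothetical names); it does not say which of the two the pruning step is meant to use.

```python
import numpy as np

# Made-up masks standing in for (Env >= TH) and Not_Too_Large.
env_ok = np.array([True, True, False, False])
small = np.array([True, False, True, False])

keep_or = env_ok + small    # `+` on boolean arrays is an element-wise OR
keep_and = env_ok & small   # `&` is an element-wise AND

print(keep_or)
print(keep_and)
```

With the OR form, a subgraph is dropped only when it fails both conditions; with the AND form, it is dropped as soon as it fails one of them.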

### All stages

In [4]:
threads = 1
kmax = 10 ** 6
Lmax = 500
verbose = True
output = "None"
n = int(n1s.sum() + n2s.sum())
while C.shape[0] > 0:
    if k0 > kmax:
        if verbose:
            print('Reached kmax value')
        break
    # Compute all the children of the current candidates

    chunks = np.array_split(C, threads)
    Childrens = Parallel(n_jobs=threads, verbose = 1)(delayed(expand)
                       (chunk, G.Pop, G.pattern, G.neighbours, G.lengths, TH, n1s, n2s)
                       for chunk in chunks)
    C = np.concatenate(Childrens)
    if C.shape[0] == 0:
        break
    Env = np.array([S.Env for S in C], dtype = np.float64)
    Pvals = np.array([S.pval for S in C], dtype = np.float64)
    Not_Too_Large = np.array([S.length < Lmax for S in C], dtype = np.bool_)

    # We then update k0 using the new candidates and the testable subgraphs kept so far
    k0, TH = find_ko(np.concatenate([Pvals, R_Pvals]), alpha, k0)
    # We clean the list and we save testable subgraphs
    are_Testable = Pvals >= TH
    R = np.concatenate([R[R_Pvals >= TH], C[are_Testable]])
    R_Pvals = np.concatenate([R_Pvals[R_Pvals >= TH],
                              Pvals[are_Testable]])
    Keep = (Env >= TH) + Not_Too_Large
    
    if verbose:
        messages(are_Testable, C.shape[0], Keep, Not_Too_Large, k0)
    C = C[Keep]
print("Done")

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


998 out of 998 subgraphs explored at that step are currently potentially testable using the old k0
New k0 is 1498
0 subgraphs were pruned, including 100% because of size, 100% because of the pruning.



[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


3910 out of 3910 subgraphs explored at that step are currently potentially testable using the old k0
New k0 is 5290
0 subgraphs were pruned, including 100% because of size, 100% because of the pruning.



[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    8.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


1677 out of 1677 subgraphs explored at that step are currently potentially testable using the old k0
New k0 is 6568
0 subgraphs were pruned, including 100% because of size, 100% because of the pruning.



[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   57.5s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


6 out of 6 subgraphs explored at that step are currently potentially testable using the old k0
New k0 is 6573
0 subgraphs were pruned, including 100% because of size, 100% because of the pruning.

Done


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   23.5s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s finished


## Running an example where we know the truth

In [5]:
import Ground_truth as ground_truth
Pop, neighbours, pattern, Pheno, edges = ground_truth.generateData1()
Lengths = np.ones(pattern.shape[0])
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
n = int(n1s.sum() + n2s.sum())
nNodes = G.lengths.shape[0]
Patterns = G.pattern
Neighbours = G.neighbours
Lengths = G.lengths
k0 = 1
TH = 0
kmax = 100
Lmax = 20
C = start(np.array(range(nNodes)), G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
R = C
while C.shape[0] > 0:
    C = expand(C, G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
    if C.shape[0] == 0:
        break
    R = np.concatenate([R, C])
assert(R.shape[0] == 10)
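
Expected counts such as the one asserted above can be cross-checked by brute force on very small graphs. The sketch below counts every connected node subset of a graph given as a list of neighbour lists. All names and both example graphs are hypothetical and are not the output of `ground_truth.generateData1()`. It also ignores the patterns entirely, so it counts all connected subgraphs, which need not match the number of subgraphs returned by `start` and `expand`.

```python
from itertools import combinations


def is_connected_subset(nodes, adjacency):
    """Return True if `nodes` induces a connected subgraph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def count_connected_subgraphs(adjacency):
    """Count the non-empty node subsets that induce a connected subgraph."""
    n = len(adjacency)
    return sum(
        1
        for size in range(1, n + 1)
        for subset in combinations(range(n), size)
        if is_connected_subset(subset, adjacency)
    )


path4 = [[1], [0, 2], [1, 3], [2]]      # 0 - 1 - 2 - 3
triangle = [[1, 2], [0, 2], [0, 1]]     # complete graph on 3 nodes
print(count_connected_subgraphs(path4))
print(count_connected_subgraphs(triangle))
```

The enumeration is exponential in the number of nodes, so it is only usable on toy graphs of the size used in these ground-truth cells.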

In [6]:
Pop, neighbours, pattern, Pheno, edges = ground_truth.generateData2()
Lengths = np.ones(pattern.shape[0])
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
n = int(n1s.sum() + n2s.sum())
nNodes = G.lengths.shape[0]
Patterns = G.pattern
Neighbours = G.neighbours
Lengths = G.lengths
C = start(np.array(range(nNodes)), G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
R = C
while C.shape[0] > 0:
    C = expand(C, G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
    if C.shape[0] == 0:
        break
    R = np.concatenate([R, C])
assert(R.shape[0] == 6)

In [7]:
Pop, neighbours, pattern, Pheno, edges = ground_truth.generateData3()
Lengths = np.ones(pattern.shape[0])
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
n = int(n1s.sum() + n2s.sum())
nNodes = G.lengths.shape[0]
Patterns = G.pattern
Neighbours = G.neighbours
Lengths = G.lengths
C = start(np.array(range(nNodes)), G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
R = C
while C.shape[0] > 0:
    C = expand(C, G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
    if C.shape[0] == 0:
        break
    R = np.concatenate([R, C])
assert(R.shape[0] == 11)

In [8]:
Pop, neighbours, pattern, Pheno, edges = ground_truth.generateData4()
Lengths = np.ones(pattern.shape[0])
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
n = int(n1s.sum() + n2s.sum())
nNodes = G.lengths.shape[0]
Patterns = G.pattern
Neighbours = G.neighbours
Lengths = G.lengths
C = start(np.array(range(nNodes)), G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
R = C
while C.shape[0] > 0:
    C = expand(C, G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
    if C.shape[0] == 0:
        break
    R = np.concatenate([R, C])
assert(R.shape[0] == 13)

In [9]:
# The population returned by the generator is not used; Pop is set explicitly below.
_, neighbours, pattern, Pheno, edges = toyDataset.generateToyDataset(n=2, N = 10)
Lengths = np.ones(pattern.shape[0])
Pop = np.zeros((4,), dtype = int)
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
n = int(n1s.sum() + n2s.sum())
nNodes = G.lengths.shape[0]
Patterns = G.pattern
Neighbours = G.neighbours
Lengths = G.lengths
C = start(np.array(range(nNodes)), G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
R = C
while C.shape[0] > 0:
    C = expand(C, G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
    if C.shape[0] == 0:
        break
    R = np.concatenate([R, C])
for S in R:
    closure = (pattern[S.neighbours,] | S.ys) == S.ys
    closure = closure.all(axis = -1)
    assert(not closure.any())

In [10]:
# The population returned by the generator is not used; Pop is set explicitly below.
_, neighbours, pattern, Pheno, edges = toyDataset.generateToyDataset(n=50, N = 200)
Lengths = np.ones(pattern.shape[0])
Pop = np.zeros((100,), dtype = int)
G = structure(Pop, neighbours, pattern, Pheno, Lengths)
n1s, n2s = G.ns()
n = int(n1s.sum() + n2s.sum())
nNodes = G.lengths.shape[0]
Patterns = G.pattern
Neighbours = G.neighbours
Lengths = G.lengths
C = start(np.array(range(nNodes)), G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
R = C
while C.shape[0] > 0:
    C = expand(C, G.Pop, Patterns, Neighbours, Lengths, TH, n1s, n2s)
    if C.shape[0] == 0:
        break
    R = np.concatenate([R, C])
for S in R:
    closure = (pattern[S.neighbours,] | S.ys) == S.ys
    closure = closure.all(axis = -1)
    assert(not closure.any())
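
The closure check used in the last two cells relies on a small broadcasting trick: OR-ing a neighbour's pattern with the subgraph's pattern leaves the latter unchanged exactly when the neighbour's pattern is contained in it. The sketch below shows this on a made-up pattern matrix (the values, `ys`, `neighbours` and `absorbed` are hypothetical and only mimic `pattern`, `S.ys` and `S.neighbours`).

```python
import numpy as np

# Hypothetical presence/absence patterns: one row per node, one column per sample.
pattern = np.array([
    [1, 0, 0, 1],   # node 0
    [1, 0, 0, 0],   # node 1: contained in the pattern of node 0
    [0, 1, 0, 0],   # node 2: covers a sample that node 0 does not
], dtype=bool)

ys = pattern[0]                 # pattern of a subgraph made of node 0 only
neighbours = np.array([1, 2])   # nodes adjacent to that subgraph

# For each neighbour, True if adding it would leave the pattern unchanged.
absorbed = ((pattern[neighbours] | ys) == ys).all(axis=-1)
print(absorbed)
```

In this made-up case node 1 would be absorbed without changing the pattern, so the single-node subgraph is not closed; the assertions in the cells above verify that this never happens for the subgraphs returned by the exploration.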