# PyToxo usage example as a library

In this Jupyter Notebook we illustrate how to use PyToxo as a library, with some of the models saved in this repository.

## Step by step with a first example

First things first: let's import the PyToxo library.

In [1]:
import pytoxo

Now we can build a PyToxo model object from a model CSV file. We are going to use `models/additive_3.csv`. It is also possible to pass a Python dictionary with the parameters of your model directly (we will do so in the second example).

In [2]:
model_file = "../models/additive_3.csv"
model = pytoxo.Model(filename=model_file)

We can examine some properties of our `model` object:

In [3]:
from pprint import pprint  # We will use this to print some output legibly

print(model.name)
print(model.order)
pprint(model.variables)
pprint(model.penetrances)

additive_3
3
[x, y]
[x,
 x*(y + 1),
 x*(y + 1)**2,
 x*(y + 1),
 x*(y + 1)**2,
 x*(y + 1)**3,
 x*(y + 1)**2,
 x*(y + 1)**3,
 x*(y + 1)**4,
 x*(y + 1),
 x*(y + 1)**2,
 x*(y + 1)**3,
 x*(y + 1)**2,
 x*(y + 1)**3,
 x*(y + 1)**4,
 x*(y + 1)**3,
 x*(y + 1)**4,
 x*(y + 1)**5,
 x*(y + 1)**2,
 x*(y + 1)**3,
 x*(y + 1)**4,
 x*(y + 1)**3,
 x*(y + 1)**4,
 x*(y + 1)**5,
 x*(y + 1)**4,
 x*(y + 1)**5,
 x*(y + 1)**6]
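
The expressions printed above follow a regular pattern: the exponent of `(y + 1)` equals the number of minor (lowercase) alleles in the corresponding genotype. The following is a minimal, standard-library-only sketch of that pattern. The helper names `enumerate_genotypes` and `minor_allele_count` are hypothetical and are not part of PyToxo; the genotype ordering (last locus varying fastest) is inferred from the printed output.

```python
from itertools import product


def enumerate_genotypes(order):
    """Return genotype labels such as 'AABbcc', last locus varying fastest."""
    loci = []
    for i in range(order):
        upper = chr(ord("A") + i)  # major allele, e.g. "A"
        lower = upper.lower()      # minor allele, e.g. "a"
        loci.append((upper * 2, upper + lower, lower * 2))
    return ["".join(combo) for combo in product(*loci)]


def minor_allele_count(genotype):
    """Count the lowercase (minor) alleles in a genotype label."""
    return sum(1 for allele in genotype if allele.islower())


genotypes = enumerate_genotypes(3)
exponents = [minor_allele_count(g) for g in genotypes]

# Show the first few genotype/expression pairs of the additive pattern
for g, k in zip(genotypes[:5], exponents[:5]):
    print(f"{g}: x*(y + 1)**{k}")
```
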


`model` exposes only two public methods:

1. `find_max_prevalence_table`: computes the penetrance table whose prevalence is maximum for the given MAFs and heritability, and returns it as a `PTable` object.
2. `find_max_heritability_table`: computes the penetrance table whose heritability is maximum for the given MAFs and prevalence, and returns it as a `PTable` object.

Let's play with the first one, `find_max_prevalence_table`, to obtain a penetrance table. We are going to use a MAF of 0.4 and a heritability of 0.85.

In [4]:
mafs = [0.4, 0.4, 0.4]  # Dimension should coincide with model order
heritability = 0.85
ptable = model.find_max_prevalence_table(mafs=mafs, h=heritability)

And here we have our `ptable` penetrance table. Let's take a look at it:

In [5]:
ptable.print_table()

AABBCC,4.08906702591303E-11
AABBCc,2.20302888340413E-9
AABBcc,1.18690552890343E-7
AABbCC,2.20302888340413E-9
AABbCc,1.18690552890343E-7
AABbcc,0.00000639458132008111
AAbbCC,1.18690552890343E-7
AAbbCc,0.00000639458132008111
AAbbcc,0.000344514953072204
AaBBCC,2.20302888340413E-9
AaBBCc,1.18690552890343E-7
AaBBcc,0.00000639458132008111
AaBbCC,1.18690552890343E-7
AaBbCc,0.00000639458132008111
AaBbcc,0.000344514953072204
AabbCC,0.00000639458132008111
AabbCc,0.000344514953072204
Aabbcc,0.0185611140040732
aaBBCC,1.18690552890343E-7
aaBBCc,0.00000639458132008111
aaBBcc,0.000344514953072204
aaBbCC,0.00000639458132008111
aaBbCc,0.000344514953072204
aaBbcc,0.0185611140040732
aabbCC,0.000344514953072204
aabbCc,0.0185611140040732
aabbcc,1.00000000000000
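
The CSV output above has one `genotype,penetrance` pair per line, so text in this shape can be read back into a dictionary with the standard library alone. This is only a sketch: `parse_penetrance_csv` is a hypothetical helper, not a PyToxo function, and the sample text is three lines copied from the output above.

```python
import csv
import io

# Three lines copied from the printed table, used as sample input
csv_text = """\
AABBCC,4.08906702591303E-11
AABBCc,2.20302888340413E-9
aabbcc,1.00000000000000
"""


def parse_penetrance_csv(text):
    """Map each genotype label to its penetrance as a built-in float."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        genotype, value = row
        table[genotype] = float(value)
    return table


penetrances = parse_penetrance_csv(csv_text)
print(penetrances)
```
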



We can also save the table easily with the provided `write_to_file` method. By default, PyToxo prints tables in the CSV format shown above and saves them in GAMETES format, but both behaviors can be changed at will. The next example prints the table in GAMETES format. `write_to_file` accepts exactly the same parameters to configure the format, differing only in its default, which is GAMETES rather than CSV.

In [6]:
ptable.print_table(format="gametes")

Attribute names:	P0	P1	P2
Minor allele frequencies:	0.400	0.400	0.400
x: 4.08906702591303e-11
y: 52.8760766072850
Prevalence: 0.00482966795815766
Heritability: 0.850000000000000

Table:

4.08906702591303E-11, 2.20302888340413E-9, 1.18690552890343E-7
2.20302888340413E-9, 1.18690552890343E-7, 0.00000639458132008111
1.18690552890343E-7, 0.00000639458132008111, 0.000344514953072204

2.20302888340413E-9, 1.18690552890343E-7, 0.00000639458132008111
1.18690552890343E-7, 0.00000639458132008111, 0.000344514953072204
0.00000639458132008111, 0.000344514953072204, 0.0185611140040732

1.18690552890343E-7, 0.00000639458132008111, 0.000344514953072204
0.00000639458132008111, 0.000344514953072204, 0.0185611140040732
0.000344514953072204, 0.0185611140040732, 1.00000000000000
 


Remember that, unless you configure it otherwise, PyToxo checks its solutions: if it finds a table, that table is correct within the program's accuracy margins.
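
As an illustration of what such a check can look like from the outside, the sketch below recomputes the prevalence and heritability of the table above from the printed `x`, `y` and MAF values. It assumes Hardy-Weinberg genotype frequencies and the usual definition of heritability as the variance of the penetrances divided by `K * (1 - K)`, where `K` is the prevalence. Both are assumptions of this sketch rather than a description of PyToxo's internals, and all function names are hypothetical. With these inputs the results should agree closely with the `Prevalence` and `Heritability` lines printed above.

```python
from itertools import product
from math import prod


def genotype_frequencies(mafs):
    """Genotype probabilities under Hardy-Weinberg equilibrium (assumed)."""
    per_locus = [((1 - m) ** 2, 2 * m * (1 - m), m ** 2) for m in mafs]
    return [prod(combo) for combo in product(*per_locus)]


def minor_allele_counts(order):
    """Minor allele count per genotype, in the same order as the frequencies."""
    return [sum(combo) for combo in product((0, 1, 2), repeat=order)]


def prevalence(freqs, penetrances):
    """Expected penetrance over all genotypes."""
    return sum(p * f for p, f in zip(freqs, penetrances))


def heritability(freqs, penetrances):
    """Variance of the penetrances divided by K * (1 - K)."""
    k = prevalence(freqs, penetrances)
    variance = sum(p * (f - k) ** 2 for p, f in zip(freqs, penetrances))
    return variance / (k * (1 - k))


# Values copied from the GAMETES output above
x = 4.08906702591303e-11
y = 52.8760766072850
mafs = [0.4, 0.4, 0.4]

freqs = genotype_frequencies(mafs)
# Additive model: penetrance is x*(1+y)**k for k minor alleles
penetrances = [x * (1 + y) ** k for k in minor_allele_counts(len(mafs))]

K = prevalence(freqs, penetrances)
H = heritability(freqs, penetrances)
print(K, H)
```
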

## Using a Python dict to input the model

This time we are going to write our model data directly instead of using a CSV file. PyToxo lets us choose between a single Python dictionary that maps genotype definitions to probabilities, or two separate sequences (lists or NumPy arrays): one with the genotype definitions and the other with the associated probabilities. You can even combine a list and a NumPy array.

In this example we are going to use the first possibility: a Python dictionary, which is a very visual way of presenting the case. In the following example we will address the second possibility.

Here we have also chosen to set the model name manually. The name only serves to identify the case while the program runs. If this parameter is omitted, PyToxo can deduce it only when initializing from a CSV file; in all other cases the model is left unnamed, although the name can be changed at any time.

In [7]:
import pytoxo

model2 = pytoxo.Model(
    genotypes_dict={
        "AABBCCDD": "x",
        "AABBCCDd": "x",
        "AABBCCdd": "x",
        "AABBCcDD": "x",
        "AABBCcDd": "x",
        "AABBCcdd": "x",
        "AABBccDD": "x",
        "AABBccDd": "x",
        "AABBccdd": "x",
        "AABbCCDD": "x",
        "AABbCCDd": "x",
        "AABbCCdd": "x",
        "AABbCcDD": "x",
        "AABbCcDd": "x",
        "AABbCcdd": "x",
        "AABbccDD": "x",
        "AABbccDd": "x",
        "AABbccdd": "x",
        "AAbbCCDD": "x",
        "AAbbCCDd": "x",
        "AAbbCCdd": "x",
        "AAbbCcDD": "x",
        "AAbbCcDd": "x",
        "AAbbCcdd": "x",
        "AAbbccDD": "x",
        "AAbbccDd": "x",
        "AAbbccdd": "x",
        "AaBBCCDD": "x",
        "AaBBCCDd": "x",
        "AaBBCCdd": "x",
        "AaBBCcDD": "x",
        "AaBBCcDd": "x",
        "AaBBCcdd": "x",
        "AaBBccDD": "x",
        "AaBBccDd": "x",
        "AaBBccdd": "x",
        "AaBbCCDD": "x",
        "AaBbCCDd": "x",
        "AaBbCCdd": "x",
        "AaBbCcDD": "x",
        "AaBbCcDd": "x*(1+y)",
        "AaBbCcdd": "x*(1+y)",
        "AaBbccDD": "x",
        "AaBbccDd": "x*(1+y)",
        "AaBbccdd": "x*(1+y)",
        "AabbCCDD": "x",
        "AabbCCDd": "x",
        "AabbCCdd": "x",
        "AabbCcDD": "x",
        "AabbCcDd": "x*(1+y)",
        "AabbCcdd": "x*(1+y)",
        "AabbccDD": "x",
        "AabbccDd": "x*(1+y)",
        "Aabbccdd": "x*(1+y)",
        "aaBBCCDD": "x",
        "aaBBCCDd": "x",
        "aaBBCCdd": "x",
        "aaBBCcDD": "x",
        "aaBBCcDd": "x",
        "aaBBCcdd": "x",
        "aaBBccDD": "x",
        "aaBBccDd": "x",
        "aaBBccdd": "x",
        "aaBbCCDD": "x",
        "aaBbCCDd": "x",
        "aaBbCCdd": "x",
        "aaBbCcDD": "x",
        "aaBbCcDd": "x*(1+y)",
        "aaBbCcdd": "x*(1+y)",
        "aaBbccDD": "x",
        "aaBbccDd": "x*(1+y)",
        "aaBbccdd": "x*(1+y)",
        "aabbCCDD": "x",
        "aabbCCDd": "x",
        "aabbCCdd": "x",
        "aabbCcDD": "x",
        "aabbCcDd": "x*(1+y)",
        "aabbCcdd": "x*(1+y)",
        "aabbccDD": "x",
        "aabbccDd": "x*(1+y)",
        "aabbccdd": "x*(1+y)",
    },
    model_name="model2",
)
ptable2 = model2.find_max_prevalence_table(mafs=[0.1] * model2.order, h=0.96)
ptable2.print_table(format="csv")

AABBCCDD,0.0000542974682915145
AABBCCDd,0.0000542974682915145
AABBCCdd,0.0000542974682915145
AABBCcDD,0.0000542974682915145
AABBCcDd,0.0000542974682915145
AABBCcdd,0.0000542974682915145
AABBccDD,0.0000542974682915145
AABBccDd,0.0000542974682915145
AABBccdd,0.0000542974682915145
AABbCCDD,0.0000542974682915145
AABbCCDd,0.0000542974682915145
AABbCCdd,0.0000542974682915145
AABbCcDD,0.0000542974682915145
AABbCcDd,0.0000542974682915145
AABbCcdd,0.0000542974682915145
AABbccDD,0.0000542974682915145
AABbccDd,0.0000542974682915145
AABbccdd,0.0000542974682915145
AAbbCCDD,0.0000542974682915145
AAbbCCDd,0.0000542974682915145
AAbbCCdd,0.0000542974682915145
AAbbCcDD,0.0000542974682915145
AAbbCcDd,0.0000542974682915145
AAbbCcdd,0.0000542974682915145
AAbbccDD,0.0000542974682915145
AAbbccDd,0.0000542974682915145
AAbbccdd,0.0000542974682915145
AaBBCCDD,0.0000542974682915145
AaBBCCDd,0.0000542974682915145
AaBBCCdd,0.0000542974682915145
AaBBCcDD,0.0000542974682915145
AaBBCcDd,0.0000542974682915145
AaBBCcdd
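
The `genotypes_dict` passed to `model2` above follows a simple rule: a genotype gets `x*(1+y)` only when every locus carries at least one minor (lowercase) allele, and `x` otherwise. Rather than typing all 81 entries by hand, a dictionary of that shape can be generated programmatically. The sketch below uses only the standard library; `threshold_genotypes_dict` is a hypothetical helper name chosen for this example and is not part of PyToxo.

```python
from itertools import product


def threshold_genotypes_dict(order, baseline="x", elevated="x*(1+y)"):
    """Build a genotype -> expression dict, last locus varying fastest."""
    per_locus = []
    for i in range(order):
        upper = chr(ord("A") + i)  # major allele
        lower = upper.lower()      # minor allele
        per_locus.append((upper * 2, upper + lower, lower * 2))

    table = {}
    for combo in product(*per_locus):
        # True when each two-letter locus code contains a lowercase allele
        all_loci_carry_minor = all(
            any(allele.islower() for allele in locus) for locus in combo
        )
        table["".join(combo)] = elevated if all_loci_carry_minor else baseline
    return table


genotypes_dict = threshold_genotypes_dict(4)
print(len(genotypes_dict), genotypes_dict["AaBbCcDd"], genotypes_dict["AaBbCcDD"])
```

The resulting dictionary could then be passed as the `genotypes_dict` argument shown in the cell above.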

## Using two separate lists to input the model

As in the previous example, this time we are going to write our model data directly instead of using a CSV file. As already explained, PyToxo lets us choose between a single Python dictionary that maps genotype definitions to probabilities, or two separate sequences (lists or NumPy arrays): one with the genotype definitions and the other with the associated probabilities. You can even combine a list and a NumPy array.

In this example we are going to use a Python list together with a NumPy array, the combination that exercises the most variations.

Here we also manually name the model, as we explained in the previous example.

In [8]:
import pytoxo
import numpy

gen_definitions = ["AABB", "AABb", "AAbb", "AaBB", "AaBb", "Aabb", "aaBB", "aaBb", "aabb"]
gen_probabilities = numpy.array(
    [
        "x",
        "x",
        "x",
        "x",
        "x*(1+y)",
        "x*(1+y)",
        "x",
        "x*(1+y)",
        "x*(1+y)",
    ]
)
model3 = pytoxo.Model(
    definitions=gen_definitions,
    probabilities=gen_probabilities,
    model_name="model3",
)
ptable3 = model3.find_max_prevalence_table(mafs=[0.1] * model3.order, h=0.96)
ptable3.print_table(format="gametes")

Attribute names:	P0	P1
Minor allele frequencies:	0.100	0.100
x: 0.00150190754739745
y: 664.819944598339
Prevalence: 0.0375476886849364
Heritability: 0.960000000000000

Table:

0.00150190754739745, 0.00150190754739745, 0.00150190754739745
0.00150190754739745, 1.00000000000000, 1.00000000000000
0.00150190754739745, 1.00000000000000, 1.00000000000000
 


## Working with the final penetrance tables

In the previous examples we printed the penetrance tables directly. In addition to printing them or saving them to a file, PyToxo lets you work directly with these objects to access their data and integrate them easily into a Python program. Here we illustrate how to handle them.

We are going to revisit the already calculated `ptable3`.

In [9]:
print(ptable3.model_name)
print(ptable3.order)

# Return genotypes and penetrance values as lists and print them
pprint(ptable3.genotypes)
pprint(ptable3.penetrance_values)

None
2
['AABB', 'AABb', 'AAbb', 'AaBB', 'AaBb', 'Aabb', 'aaBB', 'aaBb', 'aabb']
[0.00150190754739745,
 0.00150190754739745,
 0.00150190754739745,
 0.00150190754739745,
 1.00000000000000,
 1.00000000000000,
 0.00150190754739745,
 1.00000000000000,
 1.00000000000000]


The genotypes and penetrance values can also be returned as NumPy arrays.

In this example we also calculate the mean of the penetrances in this table from the NumPy array, simply to illustrate numerical manipulation of this output.

In [10]:
pprint(ptable3.genotypes_as_numpy)
penetrances3_as_numpy = ptable3.penetrance_values_as_numpy
pprint(penetrances3_as_numpy)

# Mean of the returned penetrance array, as a simple numerical example
numpy.mean(penetrances3_as_numpy)

array(['AABB', 'AABb', 'AAbb', 'AaBB', 'AaBb', 'Aabb', 'aaBB', 'aaBb',
       'aabb'], dtype='<U4')
array([0.00150190754739745, 0.00150190754739745, 0.00150190754739745,
       0.00150190754739745, 1.00000000000000, 1.00000000000000,
       0.00150190754739745, 1.00000000000000, 1.00000000000000],
      dtype=object)


0.445278837526332
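
The penetrance array above is reported with `dtype=object`, so it can be convenient to convert the values to built-in floats before further numerical work. The sketch below does this with the standard library only, using literal values copied from the output above instead of calling PyToxo. It pairs genotypes with penetrances, recomputes the mean, and selects the genotypes whose penetrance exceeds a threshold (the 0.5 cut-off is an arbitrary choice for illustration).

```python
from statistics import fmean

# Values copied from the printed output of ptable3
genotypes = ["AABB", "AABb", "AAbb", "AaBB", "AaBb", "Aabb", "aaBB", "aaBb", "aabb"]
raw_values = [
    0.00150190754739745, 0.00150190754739745, 0.00150190754739745,
    0.00150190754739745, 1.0, 1.0,
    0.00150190754739745, 1.0, 1.0,
]

# Convert to built-in floats and pair each genotype with its penetrance
penetrance_by_genotype = {g: float(v) for g, v in zip(genotypes, raw_values)}

mean_penetrance = fmean(penetrance_by_genotype.values())

# Genotypes whose penetrance exceeds an arbitrary illustrative threshold
high_risk = [g for g, v in penetrance_by_genotype.items() if v > 0.5]

print(mean_penetrance)
print(high_risk)
```
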