#### Problem

Two events $A$ and $B$ are independent if $P(A \land B)$ is equal to $P(A) \times P(B)$. In other words, the events do not influence each other, so we may calculate each of the individual probabilities separately and then multiply.

More generally, random variables $X$ and $Y$ are independent if whenever $A$ and $B$ are respective events for $X$ and $Y$, $A$ and $B$ are independent (i.e., $P(A \land B) = P(A) \times P(B)$).

As an example of how helpful independence can be for calculating probabilities, let $X$ and $Y$ represent the numbers showing on two six-sided dice. Intuitively, the number of pips showing on one die should not affect the number showing on the other die. If we want to find the probability that $X+Y$ is odd, then we don't need to draw a tree diagram and consider all possibilities. We simply first note that for $X+Y$ to be odd, either $X$ is even and $Y$ is odd or $X$ is odd and $Y$ is even. In terms of probability, $P((X+Y)_{odd}) = P(X_{even} \land Y_{odd}) + P(X_{odd} \land Y_{even})$. Using independence, this becomes $[P(X_{even}) \times P(Y_{odd})] + [P(X_{odd}) \times P(Y_{even})]$, or $\left(\frac{1}{2}\right)^{2} + \left(\frac{1}{2}\right)^{2} = \frac{1}{2}$.
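
The dice calculation can be sanity-checked by brute force. This is only a minimal sketch, assuming two fair dice; the variable names (`outcomes`, `p_odd_direct`, `p_odd_indep`) are mine and not part of the problem statement. It counts the odd sums over all 36 outcomes and compares that with the product rule.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Direct count: fraction of outcomes whose sum is odd.
p_odd_direct = Fraction(sum((x + y) % 2 for x, y in outcomes), len(outcomes))

# Via independence: P(X even) * P(Y odd) + P(X odd) * P(Y even).
p_even = Fraction(3, 6)
p_odd = Fraction(3, 6)
p_odd_indep = p_even * p_odd + p_odd * p_even

print(p_odd_direct, p_odd_indep)  # prints: 1/2 1/2
```

Both routes agree, which is the point of the example: independence lets us skip the enumeration.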

__Given:__ Two positive integers $k$ $(k \leq 7)$ and $N$ $(N \leq 2^{k})$. In this problem, we begin with Tom, who in the $0^{th}$ generation has genotype $Aa Bb$. Tom has two children in the $1^{st}$ generation, each of whom has two children, and so on. Each organism always mates with an organism having genotype $Aa Bb$.

__Return:__ The probability that at least $N$ $Aa Bb$ organisms will belong to the $k^{th}$ generation of Tom's family tree (don't count the Aa Bb mates at each level). Assume that Mendel's second law holds for the factors.

##### Sample Dataset
```
2 1
```
##### Sample Output
```
0.684
```

In [1]:
from itertools import product
from functools import reduce
import operator

In [2]:
probs = [operator.mul(*a)*operator.mul(*b) for a,b in list(product(product([0.25,.75], repeat=2), repeat=2))]

In [3]:
probs

[0.00390625,
 0.01171875,
 0.01171875,
 0.03515625,
 0.01171875,
 0.03515625,
 0.03515625,
 0.10546875,
 0.01171875,
 0.03515625,
 0.03515625,
 0.10546875,
 0.03515625,
 0.10546875,
 0.10546875,
 0.31640625]

In [4]:
sum(probs[0:len(probs)-1])

0.68359375

In [5]:
# This is not an ideal solution: it enumerates all 2**(2**k) outcome vectors, so it quickly becomes infeasible as k increases
def compute_probability(k, N):
    offspring = list(product([1,0], repeat=2**k))
    cases = list(filter(lambda x: sum(x) >= N, offspring))
    formulas = map(lambda x: [0.25 if i == 1 else 0.75 for i in x], cases)
    return sum(list(map(lambda x: reduce(operator.mul, x), formulas)))

In [6]:
compute_probability(2, 1)

0.68359375

In [7]:
compute_probability(3, 1)

0.8998870849609375

In [8]:
compute_probability(4, 1)

0.9899774042423815

In [9]:
# This is a much faster solution I arrived at when I noticed that the cases can be grouped by their number j of double-hetero offspring:
# each such case has probability p**j * q**(n-j), and there are C(n,j) of them in generation k, where n = 2**k
# Then, we sum those binomial terms for all j >= N (much faster!)
def fac(n):
    result = 1
    for i in range(2,n+1):
        result *= i
    return result

def choose(n, r):
    # Integer division keeps the binomial coefficient exact (no float rounding)
    return fac(n) // (fac(r) * fac(n-r))

def better_compute_probability(k, N):
    n = 2**k
    p = 0.25
    q = 1-p
    result = 0
    j = n
    while j >= N:
        val = choose(n, j) * p**j * q**(n-j)
        result += val
        j -= 1
    return result

In [10]:
better_compute_probability(2,1)

0.68359375

In [11]:
def parse_file_print_ans(filename):
    with open(filename, 'r') as fh:
        k, N = next(fh).split()
        print(f'{better_compute_probability(int(k), int(N)):0.3f}')

In [12]:
parse_file_print_ans('sample.txt')

0.684


In [13]:
parse_file_print_ans('rosalind_lia.txt')

0.235


#### Discussion of the solution
The first thing to notice is that every generation always mates with heterozygous (Aa Bb) partners.  

If you draw out a Punnett square for each possible genotype of the A gene (AA, Aa, or aa) crossed with an Aa mate, you will see that the probability of heterozygous offspring is always $0.5$, regardless of the genotype of Tom's progeny.  

The same holds for the B gene. Since Mendel's second law says the two genes are inherited independently, the probability that an offspring is heterozygous for both is simply $0.5 \times 0.5 = 0.25$.
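
The Punnett-square claim can be checked with a few lines of code. This is a minimal sketch for a single gene, assuming each parent passes on one of its two alleles with equal probability; `offspring_distribution` is a hypothetical helper written for this illustration, not part of the solution above.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent, mate="Aa"):
    """Return {genotype: probability} for one gene when `parent` mates with `mate`."""
    dist = {}
    # Each of the 4 (parent allele, mate allele) pairings is equally likely.
    for a, b in product(parent, mate):
        genotype = "".join(sorted(a + b))  # normalise "aA" to "Aa"
        dist[genotype] = dist.get(genotype, 0) + Fraction(1, 4)
    return dist

for parent in ("AA", "Aa", "aa"):
    print(parent, offspring_distribution(parent)["Aa"])  # 1/2 in every case

# Two independent genes: probability of a double heterozygote.
print(Fraction(1, 2) * Fraction(1, 2))  # prints: 1/4
```

Whatever the parent's genotype, half of the offspring are heterozygous for that gene, which is why the earlier generations drop out of the calculation.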

Now, we have two important pieces of information about the problem that we can apply towards formulating a solution.
- The prior generation's genotype does not matter
- The probability of a double heterozygous offspring is 0.25

This means the probability of all other offspring is $1.0 - 0.25 = 0.75$

In a generation $k$ we will have $2^{k}$ offspring.

If we write the genotypes as 1 for double-hetero and 0 for all others, each possible outcome for the generation can be written as a binary vector in which each position is the genotype of one offspring. The set of all possible outcomes is then the set of all binary strings with $2^k$ bits.

As you can see in my first approach, this means the length of the vector grows exponentially with the generation, and the number of vectors grows even faster. For example, $k=3$ gives $2^{3}$ offspring, so a vector of $2^{3}$ bits and thus $2^{2^{3}} = 256$ possible binary vectors representing all possible genotype combinations!

However, we can improve upon this if we also note the problem wants probabilities where $\geq N$ double-hetero offspring are in the family tree at generation $k$.

Since the first approach made use of coding double-hetero as $1$, the sum of the vector tells you how many double-hetero offspring are present in that outcome.

We can determine how many outcomes contain exactly $r$ values of $1$ by noticing that this is the number of ways to choose $r$ positions out of a $2^{k}$-bit binary vector, i.e. $C(2^{k}, r)$.

Using the sample input $k=2, N=1$ we see the following probabilities:
```
[0.00390625,
 0.01171875,
 0.01171875,
 0.03515625,
 0.01171875,
 0.03515625,
 0.03515625,
 0.10546875,
 0.01171875,
 0.03515625,
 0.03515625,
 0.10546875,
 0.03515625,
 0.10546875,
 0.10546875,
 0.31640625]
```
 
$C(4,4) = 1$ occurrence of `0.00390625` where all $4$ offspring are double-hetero.  
$C(4,3) = 4$ occurrences of `0.01171875` where $3$ of the offspring are double-hetero.  
$\ldots$
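
The counting pattern above can be confirmed by grouping the $2^{4} = 16$ outcome vectors by how many $1$ values they contain. This is just a small sketch for the $k=2$ case; the names `n` and `counts` are mine.

```python
from collections import Counter
from itertools import product
from math import comb

n = 4  # 2**k offspring for k = 2

# Count how many of the 2**n binary vectors contain exactly r ones.
counts = Counter(sum(bits) for bits in product([0, 1], repeat=n))

for r in range(n, -1, -1):
    # Columns: number of ones, vectors found, C(n, r), probability of one such vector
    print(r, counts[r], comb(n, r), 0.25**r * 0.75**(n - r))
```

Each row shows that the number of vectors with $r$ ones matches $C(4, r)$, and that all of those vectors share the same probability.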
 
We also know that any given position is a $1$ (i.e., the offspring is a double-hetero) with probability $p = 0.25$.  
Likewise, any given position is a $0$ with probability $q = 1.0 - 0.25 = 0.75$.  
So we re-write that as:
$$P(k,N)=\sum_{r=N}^{2^{k}}\binom{2^{k}}{r} p^{r} q^{2^{k}-r}$$
 
This closed form solution is much faster to calculate than the first approach, and as you can see from the above runs of each, the answers are the same.  
In the final function that parses the input and prints a solution, we rounded to three decimal places because the example output is given to three decimal places.
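
For what it's worth, the same sum can also be written more compactly with the standard library's `math.comb` (available in Python 3.8+), which avoids a hand-rolled factorial. This is an alternative sketch rather than the solution I submitted; `binomial_tail` is a name I made up for it.

```python
from math import comb

def binomial_tail(k, N, p=0.25):
    """P(at least N double-hetero offspring among the 2**k in generation k)."""
    n = 2**k
    # Sum the binomial terms C(n, r) * p**r * q**(n - r) for r = N .. n.
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(N, n + 1))

print(f"{binomial_tail(2, 1):.3f}")  # prints: 0.684
```

For $N=1$ this should also agree with the complement $1 - q^{2^{k}}$, which is a handy cross-check.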