Dan Shea  
2021-06-19  

#### Problem
Consider a collection of coin flips. One of the most natural questions we can ask is: if we flip a coin $92$ times, what is the probability of obtaining $51$ "heads", vs. $27$ "heads", vs. $92$ "heads"?

Each coin flip can be modeled by a uniform random variable in which each of the two outcomes ("heads" and "tails") has probability equal to $\frac{1}{2}$. We may assume that these random variables are independent (see "Independent Alleles"); in layman's terms, the outcomes of any two coin flips do not influence each other.

A binomial random variable $X$ takes a value of $k$ if $n$ consecutive "coin flips" result in $k$ total "heads" and $n-k$ total "tails". We write that as $X \in \mathrm{Bin}(n,0.5)$.
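
For reference, the standard probability mass function of a binomial random variable with success probability $p$ (here $p = 0.5$) is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n$$

With $p = 0.5$ this reduces to $\binom{n}{k} / 2^n$.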

__Given:__ A positive integer $n \leq 50$.

__Return:__ An array $A$ of length $2n$ in which $A[k]$ represents the common logarithm of the probability that two diploid siblings share at least $k$ of their $2n$ chromosomes (we do not consider recombination for now).
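
Under the coin-flip reading above, and assuming each of the $2n$ chromosomes is shared independently with probability $\frac{1}{2}$ (the interpretation the solution in this notebook takes), the requested entries can be written as:

$$A[k] = \log_{10} \sum_{j=k}^{2n} \binom{2n}{j} \left(\frac{1}{2}\right)^{2n}, \qquad 1 \leq k \leq 2n$$

As a quick check for $n = 5$: the last entry is $A[10] = \log_{10} 2^{-10} = -10 \log_{10} 2 \approx -3.010$, which agrees with the final value of the sample output.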

##### Sample Dataset
```
5
```
##### Sample Output
```
0.000 -0.004 -0.024 -0.082 -0.206 -0.424 -0.765 -1.262 -1.969 -3.010
```

In [1]:
from math import log, factorial

In [2]:
def C(n,r):
    '''Returns the number of Combinations of n choose r'''
    return factorial(n) // (factorial(r)*factorial(n-r))

In [3]:
def Bin(n, k, p):
    '''Returns P(X = k) for a binomial random variable X with n trials and success probability p'''
    q = 1 - p
    return C(n, k) * p**k * q**(n-k)

In [4]:
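def calc_prob(n):
    '''Returns log10 of P(X >= k) for k = 1..2n, where X is Bin(2n, 0.5)'''
    diploid = 2*n
    # Include k = 0 so that the probabilities sum to 1
    probs = [Bin(diploid, k, 0.5) for k in range(0,diploid+1)]
    # ans[k] is log10 of the tail sum P(X >= k)
    ans = [log(sum(probs[k:]), 10) for k in range(len(probs))]
    # Skip ans[0] (the k = 0 case) when formatting
    return ' '.join(f'{x:0.3f}' for x in ans[1:])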

In [5]:
calc_prob(5)

'-0.000 -0.005 -0.024 -0.082 -0.205 -0.424 -0.765 -1.262 -1.969 -3.010'

In [6]:
def parse_input_print_ans(filename):
    with open(filename, 'r') as fh:
        n = int(next(fh).strip())
        print(calc_prob(n))

In [7]:
parse_input_print_ans('sample.txt')

-0.000 -0.005 -0.024 -0.082 -0.205 -0.424 -0.765 -1.262 -1.969 -3.010


In [8]:
parse_input_print_ans('rosalind_indc.txt')

-0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.001 -0.001 -0.003 -0.005 -0.008 -0.014 -0.023 -0.035 -0.053 -0.077 -0.109 -0.150 -0.202 -0.265 -0.340 -0.430 -0.533 -0.652 -0.788 -0.940 -1.109 -1.296 -1.502 -1.727 -1.971 -2.235 -2.520 -2.826 -3.154 -3.503 -3.875 -4.271 -4.691 -5.135 -5.605 -6.101 -6.624 -7.176 -7.757 -8.369 -9.014 -9.692 -10.405 -11.157 -11.949 -12.784 -13.667 -14.600 -15.590 -16.644 -17.769 -18.979 -20.292 -21.734 -23.357 -25.287


#### Discussion
They're a bit fast and loose with defining things here. For completeness, you would ideally want array $A$ to have a length of $2n+1$ to account for the $k \geq 0$ case, which always gives you $\log_{10}(1) = 0.000$.

However, even though they want an array $A$ of length $2n$ with $k \in [1,2n]$, your calculation still needs to include the $k=0$ case they are omitting; otherwise your probabilities will not sum to $1.0$.
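
The point above can be checked numerically. The following is a minimal sketch for the sample value $n = 5$; it assumes Python 3.8+ for `math.comb`, and the variable names are illustrative rather than taken from the solution cells.

```python
from math import comb

n = 5                      # sample dataset value
m = 2 * n                  # number of chromosomes ("coin flips")

# Exact integer counts of outcomes with k "heads", for k = 0..m
counts = [comb(m, k) for k in range(m + 1)]
total = 2 ** m             # number of equally likely outcomes

with_zero = sum(counts) / total          # includes the k = 0 term
without_zero = sum(counts[1:]) / total   # omits the k = 0 term

print(with_zero)      # 1.0
print(without_zero)   # 0.9990234375
```

Dropping the $k=0$ term loses exactly $\frac{1}{1024}$ of the probability mass here, which is why the full range has to be generated before any tail sums are taken.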

This can be seen in my use of `range(0,diploid+1)` in the `calc_prob` function, where the `probs` list has length $2n+1$.  
In my `return` statement, I simply skip the $k=0$ result when returning the rounded values.
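
As an alternative sketch (not the approach used in the cells above), the tail sums can be accumulated as exact integers from $k = 2n$ downward and divided by $2^{2n}$ only at the end. Here the $k=0$ outcome is accounted for by the denominator rather than by an extra list entry. The function name `tail_log_probs` is hypothetical, and `math.comb` again assumes Python 3.8+.

```python
from math import comb, log10

def tail_log_probs(n):
    """Return [log10 P(X >= k) for k = 1..2n], where X is Bin(2n, 0.5)."""
    m = 2 * n
    total = 2 ** m           # all equally likely outcomes, including k = 0
    tail = 0                 # running count of outcomes with at least k heads
    logs = []
    for k in range(m, 0, -1):        # k = 2n, 2n-1, ..., 1
        tail += comb(m, k)
        logs.append(log10(tail / total))
    logs.reverse()                   # put the entries back in k = 1..2n order
    return logs

print(' '.join(f'{x:0.3f}' for x in tail_log_probs(5)))
```

For the sample input this should reproduce the same ten values as `calc_prob(5)`, while replacing the repeated `sum(probs[i:])` slices with a single pass.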