# Counting Subsets

## Background Info

* **Character**: Any feature (genetic or physical) that divides a collection of organisms into two separate groups. One commonly used genetic character is the possession of a single-nucleotide polymorphism (SNP).
* **Genotyping**: A process of comparing genetic markers (i.e., alleles) taken from a large number of members of the same species to obtain a more complete picture of that species's phylogeny. An allelic marker may be represented by an SNP or a microsatellite.

Whether we use genetic or physical characters, we can think of a collection of $n$ characters as a collection of ON/OFF switches.
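To make the switch picture concrete, the sketch below enumerates every ON/OFF pattern for $n$ binary characters with `itertools.product`. The function name `switch_patterns` is a hypothetical helper for illustration. Each pattern is one way of flipping the $n$ switches, and there are $2^n$ of them.

```python
from itertools import product

def switch_patterns(n):
    """Yield every ON/OFF assignment to n binary characters as a tuple of booleans."""
    # product((False, True), repeat=n) walks all 2**n combinations of switch states
    return product((False, True), repeat=n)

patterns = list(switch_patterns(3))
print(len(patterns))  # 8, i.e. 2**3
print(patterns[5])    # (True, False, True)
```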

## Problem

### Terminology
* **Set**: The mathematical term for a loose collection of objects, called elements. The ordering of the elements in a set is unimportant, and a set may not contain duplicate elements.
* **Subset**: A set $A$ is a subset of a set $B$ if every element of $A$ is also an element of $B$.

We can use subsets to represent the collection of taxa possessing a character.
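Python's built-in `set` type follows the same rules: duplicates collapse, element order does not matter, and the `<=` operator tests whether one set is a subset of another. The taxa names below are made up purely for illustration.

```python
# Hypothetical taxa; only the set semantics matter here
taxa = {"human", "chimp", "mouse", "chimp"}   # the duplicate "chimp" is stored once
has_snp = {"mouse", "human"}                  # taxa possessing some character

print(len(taxa))                              # 3
print(has_snp == {"human", "mouse"})          # True: element order is irrelevant
print(has_snp <= taxa)                        # True: has_snp is a subset of taxa
print(set() <= taxa)                          # True: the empty set is a subset of every set
```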

## Aim of the problem
To count the total number of possible subsets of a given set.

**Given**: A positive integer $n$ ($n \le 1000$).<br>
**Return**: The total number of subsets of $\{1, 2, \ldots, n\}$ modulo $1{,}000{,}000$.
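As a small sanity check (not the intended solution), the subsets of $\{1, 2, 3\}$ can be listed by brute force with `itertools.combinations`. There are $8$ of them, and $8 \bmod 10^6 = 8$. The helper name `all_subsets` is hypothetical, and this approach only works for small $n$, since the count doubles with every added element.

```python
from itertools import combinations

def all_subsets(n):
    """Brute-force list of every subset of {1, ..., n}; only practical for small n."""
    elements = range(1, n + 1)
    # Collect the subsets of each size k = 0, 1, ..., n
    return [set(c) for k in range(n + 1) for c in combinations(elements, k)]

subsets = all_subsets(3)
print(subsets)               # [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
print(len(subsets) % 10**6)  # 8
```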

## Solution Explanation

Let's say:<br>
$n$ = Number of elements in a set <br>
$a_n$ = Total number of subsets of a set with $n$ elements<br>
$b_n = a_n \bmod 10^6$<br>

Suppose we know $a_{n-1}$. Then $a_n = 2a_{n-1}$, because each subset of a set with $n-1$ elements gives rise to exactly two subsets of the $n$-element set: one that excludes the $n$th element and one that includes it. The base case is $a_0 = 1$, since the empty set has exactly one subset, namely itself.
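The include-or-exclude argument translates directly into a construction: take every subset of $\{1, \ldots, n-1\}$ and emit it twice, once without $n$ and once with $n$. The sketch below (the name `build_subsets` is hypothetical) builds the subsets this way, so the doubling $a_n = 2a_{n-1}$ shows up in the list length.

```python
def build_subsets(n):
    """Build all subsets of {1, ..., n} via the include/exclude recurrence."""
    if n == 0:
        return [frozenset()]          # a_0 = 1: only the empty set
    smaller = build_subsets(n - 1)    # the a_(n-1) subsets of {1, ..., n-1}
    # Each smaller subset appears once without n and once with n, so the count doubles
    return smaller + [s | {n} for s in smaller]

for n in range(5):
    print(n, len(build_subsets(n)))   # prints 0 1, 1 2, 2 4, 3 8, 4 16 on separate lines
```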

Unrolling the recurrence down to the base case gives $a_n = 2 \cdot 2 \cdots 2 \cdot a_0 = 2^n$, and therefore $b_n = 2^n \bmod 10^6$.
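Because $a_n = 2^n$, the answer can also be computed directly with Python's built-in three-argument `pow`, which performs modular exponentiation. Equivalently, each subset of $\{1, \ldots, n\}$ corresponds to an $n$-bit integer whose bit $i-1$ is ON exactly when element $i$ is included, which matches the ON/OFF-switch picture. Both are shown here as an alternative sketch, not as the notebook's own method; the helper names are hypothetical.

```python
def count_subsets_mod(n, modulus=10**6):
    """Return 2**n % modulus using built-in modular exponentiation."""
    return pow(2, n, modulus)

def subset_from_mask(mask, n):
    """Decode an n-bit mask into the subset of {1, ..., n} whose bits are ON."""
    return {i + 1 for i in range(n) if mask >> i & 1}

print(count_subsets_mod(3))        # 8
print(subset_from_mask(0b101, 3))  # {1, 3}
# Every mask from 0 to 2**n - 1 decodes to a different subset
print(len({frozenset(subset_from_mask(m, 3)) for m in range(2**3)}))  # 8
```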

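```python
def b(n):
    # Iterative form of a_n = 2 * a_(n-1) with a_0 = 1. A recursive version needs n + 1
    # nested calls and can exceed Python's default recursion limit (1000) near n = 1000.
    result = 1
    for _ in range(n):
        result = 2 * result % 10**6  # reduce mod 10^6 each step to keep numbers small
    return result
```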
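Reducing modulo $10^6$ after every doubling gives the same result as reducing once at the end, because $(2x) \bmod m = \big(2\,(x \bmod m)\big) \bmod m$. In Python this is about keeping intermediate values small rather than avoiding overflow, since `int` has arbitrary precision. The check below (helper names hypothetical) compares both orders of reduction for every $n$ up to the problem's bound of 1000.

```python
M = 10**6

def reduce_each_step(n):
    """Double n times, reducing modulo M after every step."""
    x = 1
    for _ in range(n):
        x = 2 * x % M
    return x

def reduce_at_end(n):
    """Compute the exact 2**n (a big int in Python), then reduce once."""
    return 2**n % M

# The two orders of reduction agree for every n allowed by the problem
print(all(reduce_each_step(n) == reduce_at_end(n) for n in range(1001)))  # True
```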

## Actual Dataset

```python
n = 973
print(f'The total number of subsets of a set with {n} elements, modulo 10^6: {b(n)}')
```

```
The total number of subsets of a set with 973 elements, modulo 10^6: 115392
```
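Rosalind supplies the dataset as a text file containing the single integer $n$. A minimal sketch of that step, assuming the file contents arrive as a string; the function name `solve` and the file name in the comment are hypothetical.

```python
def solve(dataset: str) -> int:
    """Parse n from the dataset text and return the number of subsets mod 10**6."""
    n = int(dataset.strip())
    if not 1 <= n <= 1000:
        raise ValueError(f"n must satisfy 1 <= n <= 1000, got {n}")
    return pow(2, n, 10**6)

# Usage with a downloaded file (hypothetical name):
# with open("rosalind_sset.txt") as f:
#     print(solve(f.read()))
print(solve("3\n"))  # 8
```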


## Problem solved!