## Applied Statistics Problems

### Problem 1: Simulate a 7-sided die throw using a 6-sided die

> https://www.youtube.com/watch?v=xGNlOWjqgmo&pp=ygUfZ29vZ2xlIGRhdGEgc2NpZW50aXN0IGludGVydmlldw%3D%3D

---

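Approach:

The main problem is that the outcome space we want is 1-7, but a single throw is restricted to 1-6.

So we throw the 6-sided die twice, which gives 36 equally likely ordered pairs.

We then map this sample space onto the desired outcomes 1-7, giving every outcome the same number of pairs and rethrowing on any pair left over.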

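```
Total sample space: 36 pairs: 11, 12, 13, 14, ..., 63, 64, 65, 66

36 = 7 * 5 + 1, so each outcome gets 5 pairs and 1 pair is left over

The leftover pair is 66

Use the first 35 pairs (11 ... 65) and assign outcomes cyclically:

11: 1, 12: 2, ..., 16: 6, 21: 7
22: 1, 23: 2, ..., 31: 6, 32: 7
...
55: 1, 56: 2, ..., 64: 6, 65: 7

if 66 -> retry
```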
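The retry-on-66 idea can also be sketched arithmetically, without building a lookup table: number the 36 pairs 0 to 35 in the order 11, 12, ..., 66, reject the last one, and cycle through outcomes 1 to 7. This is a minimal sketch; the name `roll_d7` and the explicit `rng` argument are assumptions made for illustration, not part of the notebook.

```python
import random


def roll_d7(rng: random.Random) -> int:
    """Return a value in 1..7 using only 6-sided throws (rejection sampling)."""
    while True:
        first = rng.randint(1, 6)
        second = rng.randint(1, 6)
        # Position of the pair in the order 11, 12, ..., 66 (0..35, all equally likely)
        index = 6 * (first - 1) + (second - 1)
        if index < 35:  # only the pair (6, 6) is rejected
            return index % 7 + 1  # each outcome owns exactly 5 of the 35 pairs


rng = random.Random(0)  # seeded only so the run is repeatable
rolls = [roll_d7(rng) for _ in range(1000)]
print(sorted(set(rolls)))
```

Because every accepted index is equally likely and each outcome covers five indices, the result is uniform on 1 to 7.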

#### Simplified approach

Simply hardcode a mapping for seven of the pairs, one per outcome:
```
11 - 1
12 - 2
13 - 3
14 - 4
15 - 5
16 - 6
21 - 7
```

If the throw is anything other than these seven pairs -> retry

Problem: only 7 of the 36 pairs are accepted, so 29/36 (about 81%) of double throws are retried
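To put a number on the retry cost, the expected number of double throws until one is accepted is 1/p, where p is the fraction of accepted pairs (the mean of a geometric distribution). The sketch below compares the 35-pair mapping with the 7-pair hardcoded mapping; the helper name `expected_double_throws` is hypothetical.

```python
from fractions import Fraction


def expected_double_throws(accepted_pairs: int, total_pairs: int = 36) -> Fraction:
    """Expected number of double throws until a pair is accepted (1/p)."""
    p_accept = Fraction(accepted_pairs, total_pairs)
    return 1 / p_accept


full_mapping = expected_double_throws(35)       # 35 of 36 pairs accepted
hardcoded_mapping = expected_double_throws(7)   # only 7 of 36 pairs accepted

print(full_mapping, float(full_mapping))
print(hardcoded_mapping, float(hardcoded_mapping))
```

The 35-pair mapping needs 36/35 double throws on average, against 36/7 (a little over 5) for the hardcoded mapping.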

In [1]:
import numpy as np
import random

In [2]:
def create_sample_space() -> list:
    sample_space = []
    for i in range(1, 7):
        for j in range(1, 7):
            sample_space.append(int(f"{i}{j}"))

    return sample_space

In [3]:
sample_space = create_sample_space()

In [5]:
print (sample_space)

[11, 12, 13, 14, 15, 16, 21, 22, 23, 24, 25, 26, 31, 32, 33, 34, 35, 36, 41, 42, 43, 44, 45, 46, 51, 52, 53, 54, 55, 56, 61, 62, 63, 64, 65, 66]


In [15]:
len(sample_space)

36

In [17]:
(36//7)*7

35

In [10]:
def map_sample_space_to_outcome(sample_space: list, max_outcome: int) -> dict:
    """
    Maps each element in the sample space to an outcome number, cycling through outcomes up to max_outcome.

    Args:
        sample_space (list): A list of elements representing the sample space.
        max_outcome (int): The maximum outcome number to cycle through.

    Returns:
        dict: A dictionary mapping each element in the sample space to an outcome number.
    """
    sample_space_to_outcomes = {}  # Initialize the dictionary to store the mappings
    current_outcome = 1  # Start with the first outcome
    sample_space_upper_limit = (len(sample_space) // max_outcome) * max_outcome  # Calculate the upper limit for the sample space

    for i in range(0, sample_space_upper_limit):
        if current_outcome > max_outcome:  # Reset the outcome number if it exceeds max_outcome
            current_outcome = 1
        sample_space_to_outcomes[sample_space[i]] = current_outcome  # Map the current sample space element to the current outcome
        current_outcome += 1  # Increment the outcome number

    return sample_space_to_outcomes

In [13]:
sample_space_to_outcomes = map_sample_space_to_outcome(sample_space, 7)

In [14]:
print (sample_space_to_outcomes)

{11: 1, 12: 2, 13: 3, 14: 4, 15: 5, 16: 6, 21: 7, 22: 1, 23: 2, 24: 3, 25: 4, 26: 5, 31: 6, 32: 7, 33: 1, 34: 2, 35: 3, 36: 4, 41: 5, 42: 6, 43: 7, 44: 1, 45: 2, 46: 3, 51: 4, 52: 5, 53: 6, 54: 7, 55: 1, 56: 2, 61: 3, 62: 4, 63: 5, 64: 6, 65: 7}


In [None]:
# Sanity check: count how many pairs map to each outcome
outcome_counts = {}

for throw, outcome in sample_space_to_outcomes.items():
    if outcome in outcome_counts:
        outcome_counts[outcome] += 1
    else:
        outcome_counts[outcome] = 1

In [16]:
outcome_counts

{1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5}

In [32]:
sample_space_to_outcomes[13]

3

In [17]:
def simulate_throw(sample_space_to_outcomes: dict, trials: int = 10000) -> dict:
    """
    Simulate the given number of accepted trials using the sample space to outcomes mapping.
    Double throws that are not in the mapping are retried and do not count as a trial.
    """
    num_completed_trials = 0

    outcome_counts = {}

    while num_completed_trials < trials:
        first_throw = random.randint(1, 6)
        second_throw = random.randint(1, 6)

        combined_throw = int(f"{first_throw}{second_throw}")

        if combined_throw not in sample_space_to_outcomes:
            continue

        outcome = sample_space_to_outcomes[combined_throw]
        if outcome not in outcome_counts:
            outcome_counts[outcome] = 1
        else:
            outcome_counts[outcome] += 1

        num_completed_trials += 1

    return outcome_counts

In [18]:
N_TRIALS = 10000
outcome_counts = simulate_throw(sample_space_to_outcomes, N_TRIALS)

# Each proportion should be close to 1/7 (about 0.1429)
outcome_count_proportions = {k: v/N_TRIALS for k, v in outcome_counts.items()}

outcome_count_proportions

{7: 0.1467,
 4: 0.1497,
 1: 0.1363,
 5: 0.1465,
 3: 0.1355,
 2: 0.1442,
 6: 0.1411}
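One way to judge whether simulated counts are consistent with a fair 7-sided die is a chi-square goodness-of-fit statistic. The sketch below is an illustration only: `chi_square_uniform` is a hypothetical helper, and `example_counts` is made-up data rather than output from the simulation.

```python
def chi_square_uniform(counts: dict, num_outcomes: int = 7) -> float:
    """Chi-square statistic of observed counts against a uniform distribution on 1..num_outcomes."""
    total = sum(counts.values())
    expected = total / num_outcomes  # same expected count for every outcome
    return sum(
        (counts.get(outcome, 0) - expected) ** 2 / expected
        for outcome in range(1, num_outcomes + 1)
    )


# Made-up, perfectly uniform counts: the statistic is zero
example_counts = {outcome: 100 for outcome in range(1, 8)}
print(chi_square_uniform(example_counts))
```

With 7 outcomes there are 6 degrees of freedom, so a statistic above roughly 12.59 would be unusual at the 5% level for a fair die.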

### Problem 2: Lazy Sorting

> https://www.hackerrank.com/challenges/lazy-sorting/problem

---

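Each shuffle is an independent trial with the same probability of producing the sorted order, so the number of shuffles follows a geometric distribution: https://www.youtube.com/watch?v=d5iAWPnrH6w&list=PL0o_zxa4K1BVsziIRdfv4Hl4UIqDZhXWV&index=55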

```
p: probability of success on a single try
q = 1 - p: probability of failure

n: number of the try on which the first success occurs

mean of this distribution: expected number of tries required

mean = 1/p

p = number of sorted permutations / number of distinct permutations

p = 1 / number of distinct permutations

number of distinct permutations = N! / (r_1! * r_2! * ...)

where r_1, r_2, ... are the number of repetitions of each distinct value

e.g. [1,1,3,4,4,6] -> 6! / (2! * 2!) = 180
```
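The formula above can be checked empirically by shuffling until sorted and averaging the number of shuffles. This is a rough sketch under stated assumptions: the input starts unsorted, every shuffle is uniform, and the names `expected_shuffles` and `average_shuffles` are made up for illustration.

```python
import math
import random
from collections import Counter


def expected_shuffles(values: list) -> float:
    """Closed form 1/p = N! / (r_1! * r_2! * ...)."""
    denominator = math.prod(math.factorial(c) for c in Counter(values).values())
    return math.factorial(len(values)) / denominator


def average_shuffles(values: list, trials: int, rng: random.Random) -> float:
    """Average number of uniform shuffles needed until the list comes out sorted."""
    target = sorted(values)
    total_shuffles = 0
    for _ in range(trials):
        work = list(values)
        while True:
            rng.shuffle(work)
            total_shuffles += 1
            if work == target:
                break
    return total_shuffles / trials


rng = random.Random(42)  # seeded only so the run is repeatable
values = [2, 1, 1]       # 3! / 2! = 3 distinct permutations
print(expected_shuffles(values))
print(average_shuffles(values, 20000, rng))
```

The simulated average should land close to the closed-form value, with the gap shrinking as `trials` grows.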


In [19]:
import math

In [20]:
def create_hashmap(P) -> dict:
    """Count how many times each element occurs in P."""
    hashmap = {}
    for elem in P:
        if elem not in hashmap:
            hashmap[elem] = 1
        else:
            hashmap[elem] += 1
    return hashmap


def find_num_repetitions_penalty(hashmap: dict) -> int:
    """Product of r! over every value repeated r times: the denominator of N!/(r_1! * r_2! * ...)."""
    num_repetitions_penalty = 1

    for elem in hashmap:
        if hashmap[elem] > 1:
            num_repetitions_penalty *= math.factorial(hashmap[elem])

    return num_repetitions_penalty


def solve(P):
    # An already sorted sequence needs no shuffles
    if P == sorted(P):
        return round(float(0), 6)
    hashmap = create_hashmap(P)
    Dr = find_num_repetitions_penalty(hashmap)
    # Expected number of shuffles = number of distinct permutations = N! / Dr
    result = math.factorial(len(P)) / Dr
    return round(float(result), 6)

In [22]:
solve([5,5,2,1,4,3])

360.0