# **5.12 Sample Offline Data**
---
- motivation: select a random subset of customers to roll out a new feature to
- input = array of distinct elements and a size `k`
- output = subset of the given size of the array's elements
    - all subsets of that size should be equally likely
    - this must hold for every size, so subsets of size `k+1` are equally likely too
    - return the result in the input array itself (**IN-PLACE**)
- **Hint: how would you construct a random subset of size `k+1` given a random subset of size `k`?**

---

### Naive Approaches
- Naive Approach 1:
    - iterate through the input array, selecting each entry independently with probability `k/n`
        - the expected number of selected entries is `k`, but a given run may select more or fewer than `k`
- Naive Approach 2:
    - enumerate all subsets of size `k` and select one at random
    - there are `C(n, k) = n! / (k! (n-k)!)` subsets of size `k` -> Time and Space Complexity are HUGE
        - enumerating subsets of size `k` is also nontrivial
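
To make the drawbacks of both naive approaches concrete, here is a minimal sketch of each. The function names `naive_bernoulli_sample` and `naive_enumerate_sample` are made up for illustration, and neither version works in place.

```python
import itertools
import random
from typing import List


def naive_bernoulli_sample(A: List[int], k: int) -> List[int]:
    """Naive Approach 1: keep each entry independently with probability k/n."""
    n = len(A)
    # The result has k entries on average, but its size varies from run to run.
    return [x for x in A if random.random() < k / n]


def naive_enumerate_sample(A: List[int], k: int) -> List[int]:
    """Naive Approach 2: materialize every size-k subset, then pick one."""
    # Holds all C(n, k) subsets in memory at once.
    all_subsets = list(itertools.combinations(A, k))
    return list(random.choice(all_subsets))
```

Calling `naive_bernoulli_sample` repeatedly on the same input returns lists of different lengths, while `naive_enumerate_sample` always returns exactly `k` elements but builds `C(n, k)` tuples first (already 120 for `n = 10`, `k = 3`).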

---

## Random Sampling:
- build a random subset of size `k` by building a random subset of size `k-1` and adding one more element selected at random from the rest
    - trivial when `k = 1`
        - call the random number generator and reduce the result `mod n` to get an index `r` in `[0, n-1]`
        - swap `A[0]` with `A[r]`
        - `A[0]` now holds the result
    - `k > 1`
        - choose one element at random as above and swap it into `A[0]`
        - repeat the same process on the `n-1` element subarray `A[1..n-1]`
        - the random subset eventually occupies slots `A[0..k-1]`
            - the remaining elements occupy the other `n-k` slots

In [1]:
from typing import List
import random

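def random_sampling(A: List[int], k: int) -> None:
    """Move a uniformly random size-k subset of A into A[:k], in place."""
    for i in range(k):
        # pick a random index in [i, len(A) - 1]; randint includes both endpoints
        r = random.randint(i, len(A) - 1)
        # swap the chosen element into slot i, extending the subset by one
        A[i], A[r] = A[r], A[i]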

In [2]:
A = [3,7,5,11]
k = 3
random_sampling(A,k)
A

[7, 3, 11, 5]

##### Time Complexity: `O(k)`
- `k` = size of the random subset 
- `k` calls made to random number generator 
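- when `k > n/2`, optimize by instead choosing a random subset of `n-k` elements to remove from the set; the `k` elements left over are the sample
- e.g. `k = n - 1` -> a single call to the random number generator (to pick the one element to exclude) replaces `n-1` calls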
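
The `k > n/2` optimization can be sketched as follows. This is one possible way to write it, not the notebook's own code; `random_sampling_complement` is a made-up name. It relies on the fact that the complement of a uniformly random subset of size `n-k` is a uniformly random subset of size `k`.

```python
import random
from typing import List


def random_sampling_complement(A: List[int], k: int) -> None:
    """Leave a random size-k subset in A[:k] using min(k, n-k) RNG calls."""
    n = len(A)
    if k <= n - k:
        # Few elements wanted: select them directly into the front slots.
        for i in range(k):
            r = random.randint(i, n - 1)
            A[i], A[r] = A[r], A[i]
    else:
        # Many elements wanted: select the n-k elements to exclude
        # and park them in the tail slots A[k..n-1].
        for i in range(n - k):
            last = n - 1 - i
            r = random.randint(0, last)
            A[last], A[r] = A[r], A[last]
```

With `k = n - 1` the second branch runs once, so only one random number is drawn.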

##### Space Complexity: `O(1)`
- In-Place Solution

---
# Variant:
#### `rand()` function in C returns a pseudo-random integer in [0, RAND_MAX]
#### Does `rand()` `% n` generate a number uniformly distributed in [0,n-1]?
- `mod n` or `% n` modular arithmetic -> numbers wrap around when they reach a certain value, the modulus

---
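#### `rand() % n` is not exactly uniformly distributed in general
- depends on `n` and on how many distinct values `rand()` can return (call that count `M`)
- exactly uniform only when `n` divides `M`; otherwise the first `M mod n` residues each occur one extra time
- the bias is negligible when `n` is much smaller than `M` and grows as `n` approaches `M`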
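
The bias can be checked by exact counting on a small range. The sketch below uses Python rather than C, with `random.randrange(range_size)` standing in for `rand()`; `modulo_counts` and `uniform_mod` are hypothetical helper names, and `uniform_mod` shows rejection sampling, a common remedy that these notes do not otherwise cover. It assumes `1 <= n <= range_size`.

```python
import random
from collections import Counter


def modulo_counts(range_size: int, n: int) -> Counter:
    """Count how many of the values 0..range_size-1 land on each residue mod n."""
    return Counter(v % n for v in range(range_size))


def uniform_mod(range_size: int, n: int) -> int:
    """Draw uniformly from [0, n-1] by rejecting the incomplete last block."""
    # Largest multiple of n that does not exceed range_size.
    limit = range_size - (range_size % n)
    while True:
        v = random.randrange(range_size)  # stand-in for rand()
        if v < limit:
            return v % n
```

For a generator with 8 possible values and `n = 3`, `modulo_counts(8, 3)` gives counts of 3, 3 and 2 for residues 0, 1 and 2, so residue 2 is underrepresented. With 9 possible values every residue gets a count of 3 and no draws need to be rejected.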