# R Introduction

## Combinatorics and Counting

Factorial, $n!$ - `factorial(n)`, where `n` is the number whose factorial you want:

In [1]:
factorial(5)

Binomial coefficient, $\binom{n}{k}$ - `choose(n,k)`, where `n` is the total number of things and `k` is the number of things you are choosing:

In [2]:
choose(5,3)

Permutations, $P_n^k = \binom{n}{k}\,k!$ - call `choose(n,k)` and then multiply by `factorial(k)`:

In [3]:
choose(5,3)*factorial(3)
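
The product above can be wrapped in a small helper. This is a minimal sketch: `perm` is a hypothetical name (it is not a base R function), and the second line cross-checks it against the equivalent formula $n!/(n-k)!$:

```r
# perm() is a hypothetical helper, not part of base R
perm <- function(n, k) choose(n, k) * factorial(k)

perm(5, 3)                      # ordered selections of 3 items out of 5
factorial(5) / factorial(5 - 3) # same count via n! / (n - k)!
```

Both lines should print the same number, since $\binom{n}{k}\,k! = n!/(n-k)!$.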

Distinguishable permutations, i.e. the multinomial coefficient $\binom{n}{n_1, n_2, \ldots, n_k}$ - load the `iterpc` package and call `multichoose()` with the group sizes:

In [6]:
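# Install iterpc only if it is missing; calling install.packages() on a package
# that is already loaded triggers the "is in use and will not be installed" warning
if (!requireNamespace("iterpc", quietly = TRUE)) install.packages("iterpc")
library(iterpc)
multichoose(c(5,3,2)) # multinomial coefficient 10!/(5! 3! 2!)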

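If `iterpc` is not available, the same multinomial coefficient can be computed in base R directly from its definition $n!/(n_1!\,n_2! \cdots n_k!)$. This is a minimal sketch: `multinom` is a hypothetical helper name, and because it relies on `factorial()` it is only suitable for small counts:

```r
# multinom() is a hypothetical helper name, not a base R or iterpc function
multinom <- function(counts) factorial(sum(counts)) / prod(factorial(counts))

multinom(c(5, 3, 2)) # arrangements of 10 items with 5, 3, and 2 alike
```
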

## Sampling From a Set (With and Without Replacement)

Randomly sampling $k$ times from the set $\{1, \ldots, n\}$ - `sample(1:n, k, replace=...)`:

In [11]:
# Example: rolling a die 10 times and seeing the results

sample(1:6, 10, replace=TRUE) # With replacement: values can repeat, as with real die rolls
sample(1:6, 3, replace=FALSE) # Without replacement: once a number is drawn it can't come up again
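
Because `sample()` is random, each run gives a different result. If reproducible output is wanted, the random number generator can be seeded first. A small sketch, where the seed value 42 is an arbitrary choice for illustration:

```r
set.seed(42) # 42 is an arbitrary seed chosen for illustration
first <- sample(1:6, 10, replace=TRUE)

set.seed(42) # resetting the seed replays the same random stream
second <- sample(1:6, 10, replace=TRUE)

identical(first, second) # the two runs match element by element
```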

You can sample from any vector that you provide:

In [12]:
S <- c("red", "blue", 1:10) # the words "red" and "blue" plus the numbers 1-10 (c() stores the numbers as character strings)
sample(S, 5, replace=TRUE)

Note that the draws in the result are ordered: the first element is draw #1, the second is draw #2, and so on (in the run recorded here, blue was draw #2 and draw #4).

## Flipping a Coin

Here is code to simulate 10 flips of a fair coin:

In [13]:
sample(c("H", "T"), 10, replace=TRUE)

If you want to specify the probabilities for a biased coin with $P(H) = p$ and $P(T) = 1 - p$:

In [15]:
sample(c("H", "T"), 10, replace=TRUE, prob=c(0.8,1-0.8)) # or just write 0.2
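
To check that the `prob` argument behaves as intended, one option is to flip the biased coin many times and look at the fraction of heads. A minimal sketch, where `p`, `n.flips`, and `flips` are names introduced here for illustration:

```r
p <- 0.8         # assumed P(H), matching the example above
n.flips <- 10^5  # number of simulated flips
flips <- sample(c("H", "T"), n.flips, replace=TRUE, prob=c(p, 1 - p))

mean(flips == "H") # fraction of heads, which should be close to p
```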

## Drawing Balls from Urns

Suppose we have an urn with 5 red, 7 blue, and 8 green balls and we wish to draw 5 at random. Then we could write this simulation:

In [16]:
sample(c(rep("R", 5), rep("B", 7), rep("G", 8)), 5, replace=TRUE) # with replacement
sample(c(rep("R", 5), rep("B", 7), rep("G", 8)), 5, replace=FALSE) # without replacement

Now say that we want to test experimentally whether an analytical probability calculation is correct. With the setup above, suppose we want the probability that a draw of 5 balls without replacement contains exactly 2 red balls. Then:

$$
    P(\text{exactly 2 red}) = \frac{\binom{5}{2}\binom{15}{3}}{\binom{20}{5}} = \frac{2275}{7752} \approx 0.2935
$$
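
This count of red balls follows a hypergeometric distribution, so the value can also be cross-checked with base R's `dhyper()`. A short sketch, treating red balls as "successes" and the 15 non-red balls as "failures":

```r
# dhyper(x, m, n, k): x successes drawn, m successes and n failures in the urn, k draws
dhyper(2, 5, 15, 5)
choose(5, 2) * choose(15, 3) / choose(20, 5) # the formula above, for comparison
```

Both lines should print the same value.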

And we can write the code to experimentally test this calculation as follows:

In [20]:
tot.num.draws <- 10^5
num.drawn <- 5
num.drawn.R <- 2
urn <- c(rep("R", 5), rep("B", 7), rep("G", 8))
# preallocate the per-draw color counts instead of growing them inside the loop
num.R <- numeric(tot.num.draws)
num.B <- numeric(tot.num.draws)
num.G <- numeric(tot.num.draws)

for (k in 1:tot.num.draws)
{
    x <- sample(urn, num.drawn, replace=FALSE) # without replacement
    num.R[k] <- sum(x=="R")
    num.B[k] <- sum(x=="B")
    num.G[k] <- sum(x=="G")
}

sum(num.R==num.drawn.R)/tot.num.draws  # experimental probability
choose(5, num.drawn.R) * choose(15, num.drawn-num.drawn.R) / choose(20, num.drawn) # analytical probability

Since our simulated probability is very close to our theoretical probability, we can be more confident that the theoretical calculation is correct.
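
The same experiment can be written more compactly with `replicate()`, which avoids the explicit loop and the counters. This is a sketch of an alternative, not a replacement for the code above; `urn`, `n.sims`, and `reds` are names introduced here for illustration:

```r
urn <- c(rep("R", 5), rep("B", 7), rep("G", 8))
n.sims <- 10^4 # fewer repetitions than above, to keep the run quick

# each replication draws 5 balls without replacement and counts the reds
reds <- replicate(n.sims, sum(sample(urn, 5, replace=FALSE) == "R"))

mean(reds == 2) # simulated P(exactly 2 red)
```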

## Tossing Balls into Bins

Tossing (indistinguishable) balls into (distinguishable) bins can be conceptualized as sampling with replacement from the list of bins. Suppose we have $k$ balls and $n$ bins. We can simulate this process using:

`x <- sample(1:n, k, replace=TRUE)`

As an example, we toss 10 balls into 7 bins. The code below stores the sequence of tosses in the variable `x`, which we then print to see the result of the experiment:

In [23]:
x <- sample(1:7, 10, replace=TRUE)
x

In the run recorded here, ball 1 went into bin 2, ball 2 went into bin 6, etc. As such, `x` contains the ordered information about each ball toss. If we want to know the number of balls in each bin, we can run:

In [24]:
table(x)

x
1 2 3 4 6 7 
1 1 3 3 1 1 

The first row holds the bin labels and the second row holds the number of balls in each bin. A bin that received no balls is not listed in the table (bin 5 in the output above).
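
To keep empty bins in the tally, one option is to convert `x` to a factor with all bin labels as levels before calling `table()`, or to use `tabulate()`. A minimal sketch for the 7-bin example:

```r
x <- sample(1:7, 10, replace=TRUE)

table(factor(x, levels=1:7)) # all 7 bins listed, empty ones with count 0
tabulate(x, nbins=7)         # same counts as a plain integer vector
```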


The following code counts the bins holding specified numbers of balls and compares the simulated and theoretical probabilities. We toss 7 balls into 4 bins and want the probability that two bins hold 2 balls each and one bin holds 3 balls (so the fourth bin is empty). The theoretical value uses `multichoose()` from the `iterpc` library. The code is as follows:

In [None]:
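library(iterpc) # provides multichoose() for multinomial coefficients
n <- 4 # bins
k <- 7 # balls
ball.count <- c(2, 3) # distinct ball counts among the occupied bins, ascending
bin.count <- c(2, 1)  # number of bins holding each of the above counts

count <- 0
Nsim <- 10^4

for (j in 1:Nsim)
{
    x <- sample(1:n, k, replace=TRUE)
    occupancy <- table(table(x))            # how many bins hold each ball count
    nballs <- as.numeric(names(occupancy))  # distinct ball counts, ascending
    nbins <- as.numeric(occupancy)          # number of bins with each count

    # count this toss only if the occupancy pattern matches exactly
    if (identical(nballs, ball.count) && identical(nbins, bin.count))
    {
        count <- count + 1
    }
}

count/Nsim # simulated probability
# analytical probability: (ways to assign the counts to bins) * (ways to assign balls) / n^k
multichoose(c(bin.count, n - sum(bin.count))) * multichoose(rep(ball.count, bin.count)) / n^k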
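
For reference, the analytical probability for this setup can also be worked out by hand. This is a sketch that assumes all $4^7$ toss sequences are equally likely: first choose which bins receive the counts $(2, 2, 3, 0)$, then assign the 7 balls to match those counts:

$$
    P(\text{two bins with 2, one bin with 3}) = \frac{\binom{4}{2,1,1}\binom{7}{2,2,3}}{4^7} = \frac{12 \cdot 210}{16384} = \frac{315}{2048} \approx 0.1538
$$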