# Distributions

All the named distributions you study in this course are implemented in R. In this exercise, you will learn about the relationship between the hypergeometric, binomial, and Poisson distributions. Additionally, you will implement the binomial distribution using R’s base functions.

## Intro

To see the list of distributions available in R, run the following command:

```R
help(distributions)
```

In general, for many named discrete distributions, three functions with the prefixes `d`, `p`, and `r` give the probability mass function (PMF), the cumulative distribution function (CDF), and a random sample, respectively. The `r` function repeats the experiment the requested number of times and returns the resulting values of the random variable. Note that the function starting with `p` is the CDF, not the PMF; the PMF is given by the function starting with `d` (in R's naming convention, `d` stands for 'density').

### Binomial

The binomial distribution is associated with three functions in R: `dbinom`, `pbinom`, and `rbinom`. R has no separate functions for the Bernoulli distribution; one way to work with it is to use the binomial distribution with $n = 1$.

#### dbinom

The probability mass function of the binomial distribution is `dbinom`. This function takes three inputs: the first is the value `x` at which the PMF is evaluated, and the second and third are the parameters of the binomial distribution, `n` and `p`. For example, `dbinom(3, 5, 0.2)` returns the probability $P(X = 3)$ where $X \sim Bin(5, 0.2)$.

`dbinom(3, 5, 0.2)` = ${5 \choose 3}(0.2)^3(0.8)^2$
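As a quick sanity check, the value returned by `dbinom` can be compared with the closed-form expression above. The sketch below uses base R's `choose`; the variable names are only illustrative.

```R
# Parameters of the example: X ~ Bin(5, 0.2), evaluated at x = 3
x <- 3
n <- 5
p <- 0.2

# PMF computed by hand from the binomial formula
manual <- choose(n, x) * p^x * (1 - p)^(n - x)

# PMF computed by R's built-in function
builtin <- dbinom(x, n, p)

print(manual)
print(builtin)
print(isTRUE(all.equal(manual, builtin)))
```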

#### pbinom

The cumulative distribution function of the binomial distribution is `pbinom`. This function takes three inputs: the first is the value `x` at which the CDF is evaluated, and the second and third are the parameters of the binomial distribution, `n` and `p`. For example, `pbinom(3, 5, 0.2)` returns the probability $P(X \le 3)$ where $X \sim Bin(5, 0.2)$.
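Since the CDF of a discrete distribution is the running sum of its PMF, `pbinom` should agree with a sum of `dbinom` values. A minimal sketch, reusing the $Bin(5, 0.2)$ example (the variable names are illustrative):

```R
x <- 3
n <- 5
p <- 0.2

# CDF from the built-in function: P(X <= 3)
cdf_builtin <- pbinom(x, n, p)

# Same quantity as the sum of the PMF over 0, 1, 2, 3
cdf_manual <- sum(dbinom(0:x, n, p))

print(cdf_builtin)
print(cdf_manual)
```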

#### rbinom

`rbinom` generates binomial random variables. The first input is the number of random variables to generate, and the second and third inputs are, as before, the parameters of the distribution. Therefore, the command `rbinom(7, 5, 0.2)` creates 7 realizations of independent and identically distributed $Bin(5, 0.2)$ random variables.
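One way to see that `rbinom` samples from the intended distribution is to compare the relative frequencies of a large sample with the PMF from `dbinom`. The sketch below assumes an arbitrary seed and sample size, chosen only for illustration.

```R
set.seed(1)          # arbitrary seed, used only for reproducibility
n_draws <- 100000    # assumed sample size; larger values give closer agreement
draws <- rbinom(n_draws, 5, 0.2)

# Relative frequency of each possible value 0, 1, ..., 5
empirical <- as.numeric(table(factor(draws, levels = 0:5))) / n_draws

# Exact probabilities of the same values
theoretical <- dbinom(0:5, 5, 0.2)

print(round(rbind(empirical, theoretical), 4))
```

The two rows should be close but not identical, because the first is a random estimate of the second.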

As you may have noticed, a function `rbern(n_realizations, p)` that gives realizations from the Bernoulli distribution is not provided by default in R. Implement this function using R’s base functions, and then compare your result with the base function for the binomial distribution with $n = 1$.

In [1]:
set.seed(42)

# Draw n_realizations independent Bernoulli(p) values (1 = success, 0 = failure)
rbern <- function(n_realizations = 1, p = 0.5) {
  sample(c(0, 1), size = n_realizations, replace = TRUE, prob = c(1 - p, p))
}

rbern(10, 0.4)

In [2]:
set.seed(42)

rbinom(10, 1, 0.4)
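The two generators need not return identical draws under the same seed, since they may consume the random number stream differently, so a comparison in distribution is more informative than a value-by-value one. The sketch below redefines `rbern` in a vectorised form (an assumption, so that the snippet is self-contained) and compares sample means, both of which should be near $p = 0.4$.

```R
# Vectorised Bernoulli sampler, redefined here so the snippet runs on its own
rbern <- function(n_realizations = 1, p = 0.5) {
  sample(c(0, 1), size = n_realizations, replace = TRUE, prob = c(1 - p, p))
}

set.seed(42)
n_draws <- 100000                      # assumed sample size for the comparison
bern_draws <- rbern(n_draws, 0.4)      # custom sampler
binom_draws <- rbinom(n_draws, 1, 0.4) # base R, binomial with n = 1

# Both sample means estimate p
print(mean(bern_draws))
print(mean(binom_draws))
```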

In [3]:
set.seed(42)

# A Bin(n, p) draw is the sum of n independent Bernoulli(p) draws
rbinomial <- function(n_realizations, n, p) {
  replicate(n_realizations, sum(rbern(n, p)))
}

rbinomial(7, 5, 0.2)

### Hypergeometric

The hypergeometric distribution is also associated with three functions in R:

`dhyper`, `phyper`, `rhyper`

As expected, these three functions correspond to the probability mass function, cumulative distribution function, and random variable generation function for the hypergeometric distribution, respectively.
Since the hypergeometric distribution has three parameters, each of these functions takes four inputs.

For the `dhyper` and `phyper` functions:
The first input is the value at which we want to evaluate the probability mass function or cumulative distribution function, and the remaining inputs are the distribution parameters.

Thus:

`dhyper(k,w,b,n)` = $P(X = k)$

Where:

$X \sim HGeom(w, b, n)$

And:

`phyper(k,w,b,n)` = $P(X \le k)$

For the `rhyper` function:
The first input is the number of realizations of the hypergeometric random variable, and the remaining inputs are the distribution parameters. For example:

`rhyper(100,w,b,n)`

Generates 100 independent and identically distributed realizations of the random variable $HGeom(w, b, n)$.
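To make the argument order concrete, the sketch below checks `dhyper` against the standard hypergeometric formula ${w \choose k}{b \choose n-k} / {w+b \choose n}$ and `phyper` against a sum of `dhyper` values. The parameter values are hypothetical and chosen only for illustration.

```R
# Hypothetical parameters, chosen only for illustration
w <- 7   # number of "success" items in the population
b <- 5   # number of "failure" items in the population
n <- 4   # number of items drawn without replacement
k <- 2   # number of successes in the sample

# PMF by hand and from the built-in function
pmf_manual <- choose(w, k) * choose(b, n - k) / choose(w + b, n)
pmf_builtin <- dhyper(k, w, b, n)

# CDF as a sum of the PMF and from the built-in function
cdf_manual <- sum(dhyper(0:k, w, b, n))
cdf_builtin <- phyper(k, w, b, n)

print(c(pmf_manual, pmf_builtin))
print(c(cdf_manual, cdf_builtin))
```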

Using the functions you’ve learned in this notebook, answer the following question:

Suppose that, on average, you receive 12 emails per day. Assuming that an email is equally likely to arrive at any time of the day, calculate the probability of receiving exactly 3 emails in a given hour using two methods:

1. Break the day into very small intervals (e.g., seconds), calculate the probability of receiving an email in any one of these intervals, and then count arrivals over the number of intervals that make up the time unit specified in the question (one hour).

2. Use the Law of Rare Events, or a distribution that is directly appropriate for modeling this situation.

In [4]:
dbinom(3, 3600, 12 / (24 * 3600))

In [5]:
dpois(3, 12/24)
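The agreement between the two answers can be explored by refining the time grid: as each hour is split into more intervals, the binomial probability should approach the Poisson one. The sketch below is one way to check this; the grid sizes are assumptions chosen for illustration.

```R
rate_per_hour <- 12 / 24   # average number of emails per hour

# Number of intervals per hour: minutes, seconds, milliseconds
intervals_per_hour <- c(60, 3600, 3600000)

# Probability of an email in a single interval on each grid
p_interval <- rate_per_hour / intervals_per_hour

# Binomial probability of exactly 3 emails in one hour on each grid
binom_probs <- dbinom(3, intervals_per_hour, p_interval)

# Poisson probability of exactly 3 emails in one hour
pois_prob <- dpois(3, rate_per_hour)

print(binom_probs)
print(pois_prob)
print(abs(binom_probs - pois_prob))   # gap shrinks as the grid is refined
```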

A batch of 100 electrical fuses passes the quality control test if all 5 fuses in a randomly selected sample are intact. Suppose there are 20 defective fuses in the batch. Find the probability that this batch will be accepted.

In [6]:
dhyper(5,80,20,5)
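The same probability can be cross-checked in two other ways: as a ratio of binomial coefficients, and as the product of the conditional probabilities of drawing an intact fuse on each of the 5 draws. A minimal sketch (the variable names are illustrative):

```R
# Built-in hypergeometric PMF: 5 intact out of 5 drawn, 80 intact and 20 defective
p_builtin <- dhyper(5, 80, 20, 5)

# Counting argument: ways to choose 5 intact fuses over ways to choose any 5
p_counting <- choose(80, 5) / choose(100, 5)

# Sequential argument: 80/100 * 79/99 * 78/98 * 77/97 * 76/96
p_sequential <- prod((80:76) / (100:96))

print(c(p_builtin, p_counting, p_sequential))
```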

Now, suppose that instead of 100 fuses we have 1,000,000 fuses, and 100 of them are defective. Show that the probability of accepting this batch with a sample of 10 fuses is practically the same whether the sample is drawn with or without replacement.

In [7]:
print(dhyper(10,999900,100,10))
print(dbinom(10, 10, 1 - 0.0001))

[1] 0.9990004
[1] 0.9990004
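The two results agree because the sample is tiny relative to the batch. To illustrate how the agreement depends on batch size, the sketch below keeps the defective fraction fixed at 20% with a sample of 5 and grows the batch; the batch sizes are assumptions chosen for illustration.

```R
# Batches with a fixed 20% defective fraction (assumed sizes)
intact    <- c(80, 8000, 800000)
defective <- c(20, 2000, 200000)

# Without replacement: hypergeometric probability that all 5 sampled fuses are intact
without_replacement <- dhyper(5, intact, defective, 5)

# With replacement: binomial probability that all 5 sampled fuses are intact
with_replacement <- dbinom(5, 5, 0.8)

print(without_replacement)
print(with_replacement)
print(abs(without_replacement - with_replacement))   # gap shrinks as the batch grows
```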
