## Discrete Distributions

A random experiment can produce two types of outcomes, discrete or continuous; we start with the discrete ones. To describe these outcomes we use a mathematical function f(x) that gives the probability of each possible outcome in a trial or a number of trials.

For discrete distributions this function is called the Probability Mass Function (PMF), which is simply a list of the probabilities of the outcomes (and can also be represented graphically). A PMF with support S (the set of possible outcomes) has three properties:



1. $P(X = x) = f(x) > 0$ if $x \in S$


2. $\sum_{x\in S}f(x) = 1$


3. $P(X \in A) = \sum_{x\in A}f(x)$


Property 3 means that to find the probability of any event A, you sum the probabilities of the x values in A.

Example:

In a class we count the number of siblings of each student. Let X be the number of siblings (a random variable), taking values in {0, 1, 2, 3}. We obtained this result:

PMF = 


|X      |0    |1    |2    |3    |
|:------|:----|:----|:----|:----|
|p(X)   |0.35 |0.40 |0.20 |0.05 |



In [None]:
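# probabilities of having 0, 1, 2 and 3 siblings
x = c(0.35,0.40,0.20,0.05)
# type = 'h' draws one vertical bar per outcome
plot(0:3, x, type = 'h', xlab = 'siblings', ylab = 'p(X)')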
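
The three PMF properties can be checked numerically on the sibling table. The following is a minimal sketch; the names `outcomes`, `pmf`, `A` and `p_A` are chosen here for illustration, and the event A ("at least two siblings") is a hypothetical example.

```r
# PMF of the sibling example, one probability per outcome
outcomes <- 0:3
pmf <- c(0.35, 0.40, 0.20, 0.05)

# Property 1: every outcome in S has positive probability
all(pmf > 0)

# Property 2: the probabilities sum to 1
# (all.equal tolerates floating-point rounding)
isTRUE(all.equal(sum(pmf), 1))

# Property 3: P(X in A) is the sum of f(x) over the x values in A
A <- c(2, 3)  # hypothetical event: at least two siblings
p_A <- sum(pmf[outcomes %in% A])
p_A
```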

Another way to represent a discrete distribution is with a step function called the Cumulative Distribution Function (CDF).

The CDF is formally defined as $F_x(t) = P(X \leq t)$

Note that the CDF is a function of the threshold t, not of the random variable itself: for each value t it returns the probability that X takes a value of at most t.

For a discrete random variable this is the sum of the PMF over all outcomes that do not exceed t:

$$F_x(t) = \sum_{x_j \leq t} p(x_j)$$

In [None]:
cumsum(x)
# plot against the outcomes 0:3 so the steps sit at the right x values
plot(0:3, cumsum(x), type = 's')
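
To evaluate the CDF at an arbitrary threshold t, one option is to wrap the cumulative sums in a step function. This is a sketch using base R's `stepfun()`; the names `outcomes`, `pmf` and `cdf` are illustrative and not part of the original example.

```r
outcomes <- 0:3
pmf <- c(0.35, 0.40, 0.20, 0.05)

# stepfun() takes the jump points and the value on each interval;
# it needs one more value than jump points (the first is the level left of 0)
cdf <- stepfun(outcomes, c(0, cumsum(pmf)))

cdf(-1)   # no probability mass below 0
cdf(1)    # P(X <= 1) = 0.35 + 0.40
cdf(1.5)  # same as cdf(1): the CDF is flat between outcomes
cdf(3)    # all of the probability mass
```

Because the function is defined for any real t, it makes explicit that the CDF only jumps at the outcomes and stays constant in between.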

#### Bernoulli Trial

A Bernoulli trial is a random experiment with two outcomes, e.g. success/failure, yes/no, on/off. In this experiment the probability of success or failure doesn't change from trial to trial.


## Bernoulli Distribution

The Bernoulli distribution is the probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 - p.

The distribution of heads and tails in coin tossing is an example of a Bernoulli distribution with p = q = 1/2. The Bernoulli distribution is the simplest discrete distribution, and it is the building block for other more complicated discrete distributions.

$$\mu = p$$

$$\sigma^2 = p(1-p)$$
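
The mean p and the variance p(1-p) of a Bernoulli variable follow directly from its two-point PMF. The sketch below computes them from the definition and compares them with a simulation; the value p = 0.3, the seed and the number of draws are arbitrary choices made for illustration.

```r
p <- 0.3               # hypothetical success probability
values <- c(0, 1)
pmf <- c(1 - p, p)     # P(X = 0) = q, P(X = 1) = p

# mean and variance computed from the definition
mu <- sum(values * pmf)
sigma2 <- sum((values - mu)^2 * pmf)
c(mean = mu, variance = sigma2)

# simulated Bernoulli trials: a binomial with size = 1
set.seed(1)
draws <- rbinom(10000, size = 1, prob = p)
c(mean = mean(draws), variance = var(draws))
```

The simulated values should land close to the exact ones, with the gap shrinking as the number of draws grows.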

---

## Binomial Distribution

Similar to the Bernoulli distribution, the binomial distribution pertains to random experiments with two possible outcomes: Success (S) and Failure (F). Thus, for a random variable X we can assign x = 1 for a success and x = 0 for a failure.

If P(S) = p then P(F) = 1 - p.

The PMF of X for one trial is

$$f_x(x) = p^x (1-p)^{1-x}$$

The binomial model has three properties:

* It uses multiple Bernoulli trials (n times)
* The trials are independent
* P(S) stays the same in every trial

If X counts the number of successes in the n independent trials, then the PMF of X is

$$f_x(x) = {n \choose x} p^x (1-p)^{n-x}$$ 
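
The binomial PMF can be evaluated term by term and compared with R's built-in `dbinom()`. This is a small sketch; the values n = 10 and p = 0.3 are hypothetical and only serve to make the comparison concrete.

```r
n <- 10    # hypothetical number of trials
p <- 0.3   # hypothetical success probability
x <- 0:n   # every possible number of successes

# PMF written out from the formula: choose(n, x) * p^x * (1 - p)^(n - x)
pmf_manual <- choose(n, x) * p^x * (1 - p)^(n - x)

# the same PMF from the built-in density function
pmf_builtin <- dbinom(x, size = n, prob = p)

all.equal(pmf_manual, pmf_builtin)  # the two agree up to rounding
sum(pmf_manual)                     # a valid PMF sums to 1
```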

The mean of the distribution is
$$\mu = np$$

The variance of the distribution is

$$\sigma^2 = np(1-p)$$

so the standard deviation is $\sigma = \sqrt{np(1-p)}$.

Example: consider a four-child family. Each child may be either a boy (B) or a girl (G). For simplicity we suppose that P(B) = P(G) = 1/2 and that the genders of the children are determined independently. 
If we let X count the number of B’s, then X ~ binom(size = 4, prob = 1/2). 



In [None]:
## we can calculate the binomial probability of not having any boys in the family of four
## in R using the function dbinom()

dbinom(0,4,0.5)

How about the probability of having exactly two boys in a family of four? P(X = 2) is

$$f_x(2) = {4 \choose 2} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{2} = \frac{6}{2^4}$$ 

In [None]:
choose(4,2) # number of ways to place two successes in four trials

6/2^4 ##probability of having two boys in a family of 4

dbinom(2,4,0.5)

Find $\mu$ and $\sigma$ for this distribution.
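
One way to approach this is to compute both quantities from the PMF itself and compare them with the closed-form expressions np and the square root of np(1-p). The sketch below does this for the four-child family; the variable names are illustrative.

```r
n <- 4
p <- 0.5
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)

# mean and standard deviation from the definition
mu <- sum(x * pmf)
sigma <- sqrt(sum((x - mu)^2 * pmf))

# compare with the closed-form expressions
c(from_pmf = mu, formula = n * p)
c(from_pmf = sigma, formula = sqrt(n * p * (1 - p)))
```

For this family the expected number of boys is 2 and the standard deviation is 1.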

In [None]:
# let's plot the PMF and CDF for this example

two_boys_pmf <- dbinom(0:4, size = 4, prob = 0.5)
# plot against 0:4 so the x axis shows the number of boys
plot(0:4, two_boys_pmf, type = "h", ylim = c(0,0.5))
points(0:4, two_boys_pmf, pch = 19)

In [None]:
# the CDF of a discrete variable is a step function; plot it against 0:4 boys
plot(0:4, cumsum(two_boys_pmf), type = 's')
#plot(0:4, pbinom(0:4,4,0.5), type = 's')

Exercise: the CDC estimates that 22% of adults in the U.S. smoke.
If we randomly sample 10 individuals from the U.S. population, what is the probability that exactly 5 individuals in the sample smoke?
In this case smoking is the success and not smoking is the failure.
Plot the PMF and CDF for this distribution.

In [None]:
#dbinom(5,size=10,prob=.22)

In [None]:
#plot(pbinom(0:10,10,.22), type = 'l')

### Confidence intervals:

In most cases a single probability estimate is not very informative, because it comes from random trials and says nothing about its own uncertainty. It is better to report a confidence interval (CI); normally we use a 95% CI.

For our previous example, calculate the 95% confidence interval for the proportion of smokers when we observe 5 smokers in our sample of 10.


In [None]:
# Generate a sequence of candidate success probabilities from 0 to 1 in steps of 0.01
se = seq(0,1,by = 0.01)

# probability of observing 5 successes in 10 trials at each candidate probability
a = dbinom(5,10,prob = se)

#combine data
#df = as.data.frame(cbind(se,a))
#df

In [None]:
plot(seq(0,1,by=0.01),a, type = 'l')
abline(h = 0.05, col ='red') # abline() has no xlim argument; the line spans the plot

R can also compute these intervals directly, for example with the binom package:

In [None]:
#install.packages("binom")
library(binom)
binom.confint(5, 10, conf.level = 0.95)
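
If the binom package is not available, two common intervals can be sketched in base R. The code below computes the exact (Clopper-Pearson) interval with `binom.test()`, reproduces it from beta quantiles, and adds the normal-approximation (Wald) interval. The variable names are illustrative, and the Wald interval is only a rough approximation for a sample as small as 10.

```r
successes <- 5
n <- 10
alpha <- 0.05  # 1 - confidence level

# exact (Clopper-Pearson) interval from base R
exact_ci <- binom.test(successes, n, conf.level = 1 - alpha)$conf.int

# the same interval written with beta quantiles
beta_ci <- c(qbeta(alpha / 2, successes, n - successes + 1),
             qbeta(1 - alpha / 2, successes + 1, n - successes))

# normal-approximation (Wald) interval around the sample proportion
p_hat <- successes / n
wald_ci <- p_hat + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)

as.numeric(exact_ci)
beta_ci
wald_ci
```

With only 10 observations both intervals are wide, which reflects how little a sample of this size says about the underlying proportion.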