# Random variables and their distributions

##### Thus, a random variable X assigns a numerical value X(s) to each possible outcome s of the experiment. The randomness comes from the fact that we have a random experiment (with probabilities described by the probability function P); the mapping itself is deterministic.

![Sample Image](/Users/maukanmir/Documents/Machine-Learning/AI-ML-Textbooks/AI-ML-Learning/images/random-var.png)

#### A random variable maps the sample space into the real line. The r.v. X depicted here is defined on a sample space with 6 elements, and has possible values 0, 1, and 4. The randomness comes from choosing a random pebble according to the probability function P for the sample space.
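
#### As a minimal sketch of this picture, the snippet below treats an r.v. as a plain dictionary from outcomes to numbers. The pebble names `s1` to `s6`, the equal probabilities, and the particular assignment of the values 0, 1, and 4 are hypothetical choices made for illustration; the assignment in the figure may differ.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical sample space: six pebbles, assumed equally likely.
P = {s: Fraction(1, 6) for s in ["s1", "s2", "s3", "s4", "s5", "s6"]}

# Hypothetical r.v.: a fixed (deterministic) map from outcomes to real numbers.
X = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 1, "s6": 4}


def pmf_of(rv, prob):
    """P(X = x) is the total probability of the outcomes s with X(s) = x."""
    pmf = defaultdict(Fraction)
    for s, p in prob.items():
        pmf[rv[s]] += p
    return dict(pmf)


pmf_X = pmf_of(X, P)
for x, p in sorted(pmf_X.items()):
    print(f"P(X = {x}) = {p}")
```

#### Nothing in `X` is random: all of the randomness sits in `P`, which is how the pebble gets chosen.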

##### This definition is abstract but fundamental; one of the most important skills to develop when studying probability and statistics is the ability to go back and forth between abstract ideas and concrete examples. Relatedly, it is important to work on recognizing the essential pattern or structure of a problem and how it connects to problems you have studied previously. We will often discuss stories that involve tossing coins or drawing balls from urns because they are simple, convenient scenarios to work with, but many other problems are isomorphic: they have the same essential structure, but in a different guise.

## Bernoulli and Binomial

#### (Bernoulli distribution). An r.v. X is said to have the Bernoulli distribution with parameter p if P(X = 1) = p and P(X = 0) = 1 − p, where 0 < p < 1. We write this as X ~ Bern(p). The symbol ~ is read “is distributed as”.

#### Any r.v. whose possible values are 0 and 1 has a Bern(p) distribution, with p the probability of the r.v. equaling 1. This number p in Bern(p) is called the parameter of the distribution; it determines which specific Bernoulli distribution we have. Thus there is not just one Bernoulli distribution, but rather a family of Bernoulli distributions, indexed by p. For example, if X ~ Bern(1/3), it would be correct but incomplete to say “X is Bernoulli”; to fully specify the distribution of X, we should both say its name (Bernoulli) and its parameter value (1/3), which is the point of the notation X ~ Bern(1/3).

#### (Bernoulli trial). An experiment that can result in either a “success” or a “failure” (but not both) is called a Bernoulli trial. A Bernoulli random variable can be thought of as the indicator of success in a Bernoulli trial: it equals 1 if success occurs and 0 if failure occurs in the trial.
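
#### The indicator view of a Bernoulli trial can be sketched in a few lines of Python. The helper name `bernoulli_trial`, the seed, and the choice p = 1/3 are assumptions made for this illustration only.

```python
import random


def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p and 0 (failure) otherwise."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # Seeded so the run is reproducible
p = 1 / 3
n_trials = 100_000
draws = [bernoulli_trial(p, rng) for _ in range(n_trials)]

# The fraction of 1s estimates P(X = 1) = p.
freq_success = sum(draws) / n_trials
print(f"Estimated P(X = 1): {freq_success:.3f} (p = {p:.3f})")
```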

#### Because of this story, the parameter p is often called the success probability of the Bern(p) distribution. Once we start thinking about Bernoulli trials, it’s hard not to start thinking about what happens when we have more than one Bernoulli trial.

## Binomial
#### (Binomial distribution). Suppose that n independent Bernoulli trials are performed, each with the same success probability p. Let X be the number of successes. The distribution of X is called the Binomial distribution with parameters n and p. We write X ~ Bin(n, p) to mean that X has the Binomial distribution with parameters n and p, where n is a positive integer and 0 < p < 1.
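
#### The text above defines Bin(n, p) through its story rather than a formula. As a sketch, the standard Binomial PMF, P(X = k) = (n choose k) p^k (1 − p)^(n − k) for k = 0, 1, ..., n, can be computed as follows. The function name `binomial_pmf` and the values n = 5, p = 0.5 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): choose which k of the n trials succeed."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")
```

#### The probabilities over k = 0, ..., n should sum to 1, which is a quick sanity check on any PMF.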

## Example 3.7.2

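#### (Random walk). A particle moves n steps on a number line. The particle starts at 0, and at each step it moves 1 unit to the right or to the left, with equal probabilities. Assume all steps are independent. Let Y be the particle’s position after n steps. Find the PMF of Y.

#### Solution: let X be the number of steps to the right, so X ~ Bin(n, 1/2). The remaining n − X steps go to the left, so Y = X − (n − X) = 2X − n. Hence Y takes the values −n, −n + 2, ..., n, and for each such y, P(Y = y) = P(X = (y + n)/2) = (n choose (y + n)/2) (1/2)^n.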


```python
from scipy.special import comb


def random_walk_pmf(n):
    """Return the PMF of the position Y after n steps as a dict {y: P(Y = y)}."""
    pmf = {}

    # Y = 2X - n, where X ~ Bin(n, 1/2) counts the steps to the right,
    # so Y takes the values -n, -n + 2, ..., n.
    for y in range(-n, n + 1, 2):
        k = (y + n) // 2  # Number of steps to the right
        pmf[y] = comb(n, k) * 0.5**n  # P(X = k) for X ~ Bin(n, 1/2)

    return pmf


# Example usage
n_steps = 10  # Number of steps in the random walk
pmf_result = random_walk_pmf(n_steps)

# Print the results
for position, probability in pmf_result.items():
    print(f'P(Y = {position}) = {probability:.4f}')
```


```text
P(Y = -10) = 0.0010
P(Y = -8) = 0.0098
P(Y = -6) = 0.0439
P(Y = -4) = 0.1172
P(Y = -2) = 0.2051
P(Y = 0) = 0.2461
P(Y = 2) = 0.2051
P(Y = 4) = 0.1172
P(Y = 6) = 0.0439
P(Y = 8) = 0.0098
P(Y = 10) = 0.0010
```
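
#### One way to sanity-check these probabilities is to simulate the walk directly and compare relative frequencies with the exact values. The sketch below uses only the standard library; the helper name `simulate_walk`, the seed, and the number of simulated walks are arbitrary assumptions, and the simulated column will only match the exact one approximately.

```python
import random
from collections import Counter
from math import comb


def simulate_walk(n, rng):
    """Final position after n independent +1/-1 steps, each equally likely."""
    return sum(rng.choice((-1, 1)) for _ in range(n))


rng = random.Random(42)  # Seeded so the run is reproducible
n, n_walks = 10, 20_000
counts = Counter(simulate_walk(n, rng) for _ in range(n_walks))

for y in range(-n, n + 1, 2):
    exact = comb(n, (y + n) // 2) * 0.5**n
    print(f"y = {y:3d}  simulated = {counts[y] / n_walks:.4f}  exact = {exact:.4f}")
```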


## Exercises

#### 2. (a) Independent Bernoulli trials are performed, with probability 1/2 of success, until there has been at least one success. Find the PMF of the number of trials performed.

- Let X be the number of trials performed, i.e., the trial on which the first success occurs.
- p is the probability of success on each trial.
- (1-p)^(k-1) is the probability of k-1 failures before the first success.

- P(X = k) = (1-p)^(k-1) * p

- P(X = k) = (1/2)^(k-1) * (1/2)
- P(X = k) = 1/2^k
- for k = 1, 2, 3, ...
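
#### The derivation above can be checked with exact arithmetic. This is a small sketch; the function name `first_success_pmf` is an illustrative assumption, and `Fraction` is used so that the values print as 1/2, 1/4, and so on rather than as rounded decimals.

```python
from fractions import Fraction


def first_success_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success, for k = 1, 2, 3, ..."""
    if k < 1:
        return Fraction(0)
    return (1 - p) ** (k - 1) * p


p = Fraction(1, 2)
for k in range(1, 6):
    print(f"P(X = {k}) = {first_success_pmf(k, p)}")

# The partial sums 1/2 + 1/4 + ... + 1/2^m equal 1 - 1/2^m, which tends to 1.
partial_sum = sum(first_success_pmf(k, p) for k in range(1, 21))
print(f"Sum over k = 1..20: {partial_sum}")
```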

## Geometric Distribution
#### The geometric distribution is used to model the number of trials until the first success in a sequence of independent Bernoulli trials, where each trial has a constant probability of success. The key characteristic of the geometric distribution is that it describes the trials up to and including the first success.

#### One of the unique features of the geometric distribution is its memoryless property. This means that the probability of achieving the first success in future trials does not depend on how many trials have already been performed without success. Mathematically, this can be expressed as:
- P(X > n + k | X > n) = P(X > k) for all nonnegative integers n and k
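
#### The memoryless property can be verified numerically. The sketch below relies on the fact that, when X counts trials up to and including the first success, X > n means the first n trials were all failures, so P(X > n) = (1 − p)^n. The helper names `tail` and `conditional_tail` are assumptions made for this illustration.

```python
from fractions import Fraction


def tail(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n


def conditional_tail(n, k, p):
    """P(X > n + k | X > n) = P(X > n + k) / P(X > n)."""
    return tail(n + k, p) / tail(n, p)


p = Fraction(1, 2)
for n, k in [(0, 3), (2, 3), (10, 3)]:
    print(f"n = {n:2d}, k = {k}: {conditional_tail(n, k, p)} vs P(X > k) = {tail(k, p)}")
```

#### The conditional probability is the same for every n, which is what "memoryless" means here.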

#### 2. (b) Independent Bernoulli trials are performed, with probability 1/2 of success, until there has been at least one success and at least one failure. Find the PMF of the number of trials performed.

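#### The first trial is either a success or a failure, each with probability 1/2. In either case, we then wait for the first occurrence of the opposite outcome, and the number of additional trials needed follows a geometric distribution (counting trials) with p = 1/2.
- The probability of stopping after exactly k trials, for k >= 2, is P(X = k) = (1/2) * (1/2)^(k-1) + (1/2) * (1/2)^(k-1)
- P(X = k) = 2 * (1/2)^k = (1/2)^(k-1), for k = 2, 3, 4, ...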
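
#### As a brute-force check of this answer, one can enumerate all 2^k equally likely success/failure sequences of length k and count those for which the experiment stops exactly on trial k. The function names below are hypothetical, and the enumeration is only practical for small k.

```python
from fractions import Fraction
from itertools import product


def stops_exactly_at(seq):
    """True if seq first contains both a success (1) and a failure (0) on its last trial."""
    return len(set(seq)) == 2 and len(set(seq[:-1])) == 1


def pmf_both_outcomes(k):
    """P(X = k) by brute force over all 2^k equally likely length-k sequences."""
    favorable = sum(stops_exactly_at(seq) for seq in product((0, 1), repeat=k))
    return Fraction(favorable, 2**k)


for k in range(2, 7):
    print(f"P(X = {k}) = {pmf_both_outcomes(k)}   formula: {Fraction(1, 2 ** (k - 1))}")
```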

## Definition of Logarithms

#### A logarithm asks the question: To what power must a given base b be raised to produce a certain number x?

#### Formally, if b^y = x, then log_b(x) = y, where b > 0, b ≠ 1, and x > 0. Here, b is the base of the logarithm, x is the argument, and y is the logarithm of x to base b.
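
#### A short sketch with Python's standard `math` module illustrates the definition. The values b = 2 and x = 8 are arbitrary choices, and because the results are floating-point numbers, the comparison uses `math.isclose` rather than exact equality.

```python
import math

b, x = 2, 8
y = math.log(x, b)  # Logarithm of x to base b
print(f"log_{b}({x}) = {y}")

# Raising the base to the logarithm recovers the argument (up to rounding).
print(math.isclose(b**y, x))

# Dedicated functions exist for the common bases 2 and 10.
print(math.log2(8), math.log10(1000))
```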