## Probability Distribution

- Probability is a mathematical term for the likelihood of an event happening.
- The chance of any event occurring is a number between 0 and 1.
- It is used to anticipate risks and identify ways to manage such risks.
- It is used to make predictions about future events based on their likelihood.
- It is a powerful tool used to incorporate uncertainty in planning and decision-making.

- Example: flipping a coin
    - Possession and starting direction at the beginning of a football game are decided by a coin toss.
    - For a fair coin, the probability of getting the desired outcome is 0.5, or 50%.
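
The 0.5 figure can be checked empirically. The following is a minimal sketch, assuming a fair coin simulated with NumPy; the seed and the number of tosses are arbitrary choices, not values from the text.

```python
import numpy as np

# Simulate fair coin tosses: 1 = heads, 0 = tails (seed chosen arbitrarily)
rng = np.random.default_rng(seed=0)
n_tosses = 100_000
tosses = rng.integers(0, 2, size=n_tosses)

# The relative frequency of heads should be close to the theoretical 0.5
heads_fraction = tosses.mean()
print(f"Fraction of heads in {n_tosses} tosses: {heads_fraction:.4f}")
```

With more tosses, the observed fraction tends to settle closer to 0.5.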

## Random Variable

- It is a type of variable whose value depends on the numerical outcomes of certain random events.
- These variables take on specific values that are determined by the outcomes of random events.
- These variables must be quantifiable and often take the form of real numbers.

### Types of Random Variables
- Discrete random variables take countable values:
    - Number of misprints on a page
    - Number of customers at a theater
    - Number of cars at a gas station
    - Outcome of rolling a die

- Continuous random variables take any value within an interval:
    - Temperature
    - Length
    - Height
    - Time

- A probability distribution is a statistical function that describes the range of possible values and their associated probabilities for a random variable.
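
As an illustration of this definition, the sketch below writes out the distribution of a fair six-sided die (one of the discrete examples listed above). The variable names are illustrative, and fairness of the die is an assumption.

```python
import numpy as np

# Possible values of a fair six-sided die and their probabilities (fairness assumed)
values = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

# A valid probability distribution has non-negative probabilities that sum to 1
total = probabilities.sum()

# Expected value: sum of value * probability
expected_value = (values * probabilities).sum()

for v, p in zip(values, probabilities):
    print(f"P(X = {v}) = {p:.4f}")
print(f"Total probability = {total:.4f}, expected value = {expected_value:.2f}")
```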

## Bernoulli Distribution
- The Bernoulli distribution is a discrete probability distribution that models a single trial with two possible outcomes:
    - 1 indicates success, which occurs with probability p
    - 0 indicates failure, which occurs with probability 1 - p
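
A short sketch of these two outcomes using `scipy.stats.bernoulli`, assuming a success probability of p = 0.6 (an illustrative value). It also shows the mean p and variance p(1 - p) of the distribution.

```python
from scipy.stats import bernoulli

p = 0.6  # assumed probability of success

# Probability mass at each of the two outcomes
prob_success = bernoulli.pmf(1, p)  # P(X = 1) = p
prob_failure = bernoulli.pmf(0, p)  # P(X = 0) = 1 - p

# Mean and variance: p and p * (1 - p)
mean, var = bernoulli.stats(p)

print(f"P(X = 1) = {prob_success:.2f}, P(X = 0) = {prob_failure:.2f}")
print(f"mean = {float(mean):.2f}, variance = {float(var):.2f}")
```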


In [None]:
# plot the Curve for a Bernoulli Distribution

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import bernoulli, gaussian_kde

# Bernoulli distribution parameters
p = 0.6  # Probability of success

# Generating Bernoulli distribution data
data_bernoulli = bernoulli.rvs(p, size=1000)

# Plot histogram
plt.hist(data_bernoulli, bins=np.arange(-0.5, 2, 1), density=True, alpha=0.5, color='pink', edgecolor='black', linewidth=2, align='mid')

# Calculate the KDE (a smooth visual aid only; the Bernoulli distribution itself is discrete)
kde = gaussian_kde(data_bernoulli)
kde_xs = np.linspace(-1, 2, 300)
kde_ys = kde.pdf(kde_xs)

# Plot KDE
plt.plot(kde_xs, kde_ys, color='red')

# Add labels
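plt.xlabel('Bernoulli outcome (0 = failure, 1 = success)')
plt.ylabel('Density')  # density=True normalizes the histogram, so the axis is not a raw frequency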

# Display the plot
plt.show()

## Binomial Distribution
- A binomial distribution gives the probability of obtaining a given number of successes when a trial with two possible outcomes (SUCCESS or FAILURE) is repeated n independent times, each with the same probability of success p.

- Example: 
    - If the probability of a machine producing a defective piece is 0.05, what is the probability that an inspection sample of size 10 contains four or more defectives?
    - Let X denote the random variable indicating the number of defectives in the inspection sample of size 10.
    - Assuming that each piece is an independent Bernoulli trial, X follows a binomial distribution with parameters n = 10 and p = 0.05.
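
The question in this example can be answered numerically. The following is a minimal sketch, assuming n = 10 and p = 0.05 as stated in the question; it uses the survival function `binom.sf`, which returns P(X > k).

```python
from scipy.stats import binom

n = 10    # inspection sample size
p = 0.05  # probability that a single piece is defective

# P(X >= 4) = P(X > 3), which is what the survival function sf(3) returns
prob_four_or_more = binom.sf(3, n, p)

# Equivalent calculation through the cumulative distribution function
prob_check = 1 - binom.cdf(3, n, p)

print(f"P(X >= 4) = {prob_four_or_more:.6f}")
```

The result is roughly 0.001, so four or more defectives in a sample of 10 would be a rare event under these assumptions.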

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom, gaussian_kde

# Parameters for the binomial distribution
n, p = 20, 0.8  # number of trials, probability of success per trial

# Generate binomially distributed data
data_binom = binom.rvs(n, p, size=10000)

# Plot histogram with one bin centered on each integer outcome 0..n
plt.hist(data_binom, bins=np.arange(-0.5, n + 1.5, 1), density=True, alpha=0.5, color='green')

# Calculate the KDE
kde = gaussian_kde(data_binom)
kde_xs = np.linspace(data_binom.min(), data_binom.max(), 301)
kde_ys = kde.pdf(kde_xs)

# Plot KDE
plt.plot(kde_xs, kde_ys, color='blue')

# Add labels
plt.xlabel('Binomial')
plt.ylabel('Density')

# Display the plot
plt.show()

## Determine the probability of r successes
- Consider a random experiment of tossing a biased coin 6 times, where the probability of getting a head is 0.6. If 'getting a head' is considered a 'success', the binomial distribution table contains the probability of r successes for each possible value of r (0 to 6).
- This distribution has a mean equal to np and a variance of np(1-p).
- The scipy.stats module contains various functions for statistical calculations and tests. The pmf() method of scipy.stats.binom returns the probability of r successes for given values of n and p, and its stats() method returns the mean and variance.
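
For reference, the same quantities can be worked out directly from the binomial formula P(X = r) = C(n, r) p^r (1 - p)^(n - r). The sketch below uses only the standard library; the helper name `binomial_pmf` is hypothetical and not part of SciPy.

```python
from math import comb

n, p = 6, 0.6  # number of tosses, probability of a head


def binomial_pmf(r, n, p):
    """Return P(X = r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Probability of r successes for every possible r
pmf_values = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed from the table; they should match np and np(1-p)
mean = sum(r * pr for r, pr in enumerate(pmf_values))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf_values))

print(f"mean = {mean:.4f} (np = {n * p:.4f})")
print(f"variance = {variance:.4f} (np(1-p) = {n * p * (1 - p):.4f})")
```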

In [None]:
from scipy.stats import binom
# Setting the values of n and p
n = 6
p = 0.6
# Defining the list of r values
r_values = list(range(n + 1))
# Obtaining the mean and variance
mean, var = binom.stats(n, p)
# List of pmf values
dist = [binom.pmf(r, n, p) for r in r_values ]
# Printing the table
print("r\tp(r)")
for i in range(n + 1):
    print(str(r_values[i]) + "\t" + str(dist[i]))
# Printing mean and variance
print("mean = "+str(mean))
print("variance = "+str(var))

# The scipy.stats.binom.pmf function returns the value of the probability mass function for given r, n, and p.
# Passing all possible values of r (0 to n) gives the whole distribution.

import matplotlib.pyplot as plt
# Setting the values of n and p
n = 6
p = 0.6
# Defining the list of r values
r_values = list(range(n + 1))
# List of pmf values
dist = [binom.pmf(r, n, p) for r in r_values ]
# Plotting the graph
plt.bar(r_values, dist)
plt.show()

## Poisson Distribution
- The Poisson distribution gives the probability of a given number of events occurring in a fixed interval of time or space, assuming the events occur independently at a constant average rate. It has only one parameter, lambda (λ), which is the mean number of events in that interval.
- If X denotes the random variable indicating the number of occurrences and X follows a Poisson distribution, the probability of X taking a given value (0, 1, 2, ...) depends only on its expected value λ.
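
The dependence on λ can be made explicit through the Poisson formula P(X = k) = e^(-λ) λ^k / k!. The sketch below compares a hand-written version with `scipy.stats.poisson.pmf`; the value λ = 2 and the helper name `poisson_pmf` are illustrative assumptions.

```python
from math import exp, factorial

from scipy.stats import poisson

lam = 2  # assumed mean number of events per interval


def poisson_pmf(k, lam):
    """Return P(X = k) = exp(-lam) * lam**k / k! (hypothetical helper)."""
    return exp(-lam) * lam**k / factorial(k)


# Compare the formula with SciPy for the first few values of k
manual = [poisson_pmf(k, lam) for k in range(6)]
from_scipy = [poisson.pmf(k, lam) for k in range(6)]
for k in range(6):
    print(f"k = {k}: formula = {manual[k]:.6f}, scipy = {from_scipy[k]:.6f}")

# For a Poisson distribution, the mean and the variance both equal lambda
mean, var = poisson.stats(lam)
print(f"mean = {float(mean)}, variance = {float(var)}")
```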


In [None]:
# Plot the curve for a poisson Distribution
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import poisson, gaussian_kde

# Generate Poisson distributed data
data_poisson = poisson.rvs(mu=4, size=10000)

# Plot histogram with one bin centered on each integer count
plt.hist(data_poisson, bins=np.arange(-0.5, data_poisson.max() + 1.5, 1), density=True, alpha=0.5, color='green')

# Calculate the KDE
kde = gaussian_kde(data_poisson)
kde_xs = np.linspace(data_poisson.min(), data_poisson.max(), 301)
kde_ys = kde.pdf(kde_xs)

# Plot KDE
plt.plot(kde_xs, kde_ys, color='blue')

# Add labels
plt.xlabel('Poisson')
plt.ylabel('Density')

# Display the plot
plt.show()

# Determine the probability of finding restaurants
# Suppose you are going for a long drive. The rate of occurrence of good restaurants in a range of 10 miles is 2,
# that is, the mean number of restaurants in a range of 10 miles is 2.
# What is the probability that 0, 1, 2, 3, 4, or 5 restaurants will show up in the next 10 miles?

# The pmf method of scipy.stats.poisson is used to calculate these probabilities
from scipy.stats import poisson
import matplotlib.pyplot as plt

# Random variable representing number of restaurants
# Mean number of occurrences of restaurants in 10 miles is 2
X = [0, 1, 2, 3, 4, 5]
lmbda = 2

# Probability values
poisson_pd = poisson.pmf(X, lmbda)

# Plot the probability distribution
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.plot(X, poisson_pd, 'bo', ms=8, label='poisson pmf')
ax.vlines(X, 0, poisson_pd, colors='b', lw=5, alpha=0.5)
ax.set_ylabel("Probability", fontsize=18)
ax.set_xlabel("X = No. of restaurants", fontsize=18)
ax.set_title("Poisson distribution - No. of restaurants vs probability", fontsize=18)
ax.legend()

# Display the plot
plt.show()

## Continuous Probability Distribution
- Two common types of continuous probability distribution:
    - normal distribution
    - uniform distribution
- Normal Distribution: The normal distribution extends from -∞ to +∞ (minus infinity to plus infinity). The distribution is completely specified by its mean μ and standard deviation σ. The value of the population mean μ can be positive, negative, or zero.
- Standard Normal Distribution: The normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution.
    - Statistical tables give probabilities for Z, that is, the normal distribution with zero mean and unit standard deviation. For other values of the mean and standard deviation, probabilities are obtained using the transformation Z = (X - μ) / σ.
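
The standardizing transformation Z = (X - μ) / σ can be sketched as follows. The values μ = 50, σ = 10, and x = 65 are assumptions chosen for illustration; `norm.cdf` plays the role of the statistical table.

```python
from scipy.stats import norm

mu, sigma = 50, 10  # assumed mean and standard deviation
x = 65              # assumed value of interest

# Standardize: Z = (X - mu) / sigma
z = (x - mu) / sigma

# P(X <= x) looked up on the standard normal distribution
prob_via_z = norm.cdf(z)

# The same probability computed directly on the original scale
prob_direct = norm.cdf(x, loc=mu, scale=sigma)

print(f"z = {z}")
print(f"P(X <= {x}) via Z: {prob_via_z:.4f}, direct: {prob_direct:.4f}")
```

Both routes give the same probability, which is why a single table for Z is enough for any normal distribution.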
                    


In [None]:
# Plot the Curve for a Normal Distribution
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm, gaussian_kde

# Generate normally distributed data
mu = 0      # Mean
sigma = 1   # Standard deviation
data_normal = norm.rvs(mu, sigma, size=10000)

# Plot histogram
plt.hist(data_normal, bins=30, density=True, alpha=0.5, color='blue')

# Calculate the KDE
kde = gaussian_kde(data_normal)
kde_xs = np.linspace(data_normal.min(), data_normal.max(), 301)
kde_ys = kde.pdf(kde_xs)

# Plot KDE
plt.plot(kde_xs, kde_ys, color='red')

# Add labels
plt.xlabel('Normal distribution')
plt.ylabel('Density')

# Display the plot
plt.show()



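# Uniform Distribution
from scipy.stats import uniform

# P(0 <= X <= 8) for X uniformly distributed on [0, 20]
# (loc is the start of the interval, scale is its width), i.e. (8 - 0) / 20
prob = uniform.cdf(x=8, loc=0, scale=20) - uniform.cdf(x=0, loc=0, scale=20)
print(prob)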

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import uniform

# Generate uniformly distributed data
a, b = 0, 10  # Define the start and end points of the uniform distribution
data_uniform = uniform.rvs(a, b-a, size=10000)

# Plot histogram
plt.hist(data_uniform, bins=30, density=True, alpha=0.5, color='blue')

# Calculate the PDF of the Uniform Distribution
uniform_xs = np.linspace(a, b, 301)
uniform_ys = uniform.pdf(uniform_xs, a, b-a)

# Plot PDF
plt.plot(uniform_xs, uniform_ys, color='red')

# Add labels
plt.xlabel('Uniform Distribution')
plt.ylabel('Density')

# Display the plot
plt.show()