# Introduction

Probability distributions are an essential concept in machine learning: they help us model and analyze data. A probability distribution describes how likely a random variable is to take each of its possible values, which gives us a mathematical framework for reasoning about data. In machine learning, we use probability distributions to model various types of data, such as continuous, discrete, binary, and bounded data.

Here are some common probability distributions used in machine learning applications:

- Normal distribution: Also known as the Gaussian distribution, it is one of the most widely used probability distributions. It is used to model continuous variables whose values are spread symmetrically around the mean in a bell-shaped curve. The normal distribution has two parameters, mean (μ) and standard deviation (σ). For example, the heights of people in a population can be modeled approximately by a normal distribution.

- Bernoulli distribution: It is a discrete probability distribution that models the probability of a binary event (success or failure). It has only one parameter, p, which represents the probability of success. For example, the result of a coin flip can be modeled by a Bernoulli distribution.

- Binomial distribution: It is used to model the number of successes in a fixed number of trials of a Bernoulli experiment. It has two parameters, n (number of trials) and p (probability of success). For example, the number of heads in ten coin flips can be modeled by a binomial distribution.

- Poisson distribution: It is used to model the number of occurrences of an event in a fixed interval of time or space. It has one parameter, λ (rate parameter). For example, the number of phone calls received by a call center in an hour can be modeled by a Poisson distribution.

- Exponential distribution: It is used to model the time between two successive events in a Poisson process. It has one parameter, λ (rate parameter). For example, the time between two phone calls in a call center can be modeled by an exponential distribution.

- Beta distribution: It is used to model a random variable that is bounded between 0 and 1, such as a probability or a proportion. It has two shape parameters, α and β. When the beta distribution is used as a prior for a Bernoulli success probability, α and β can be interpreted as pseudo-counts of successes and failures, respectively. For example, the probability of a website user clicking on an advertisement can be modeled by a beta distribution.
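
To make the list above concrete, here is a minimal sketch that draws samples from each of the six distributions with NumPy and compares the sample mean with the theoretical mean. The parameter values, the seed, and the sample size are all assumptions chosen for illustration; they do not come from any particular dataset.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
n_samples = 100_000

# Each entry: (label, samples, theoretical mean). All parameters are illustrative.
checks = [
    ("normal(mu=0, sigma=1)", rng.normal(loc=0.0, scale=1.0, size=n_samples), 0.0),
    # A Bernoulli variable is a binomial variable with a single trial (n=1)
    ("bernoulli(p=0.3)", rng.binomial(n=1, p=0.3, size=n_samples), 0.3),
    ("binomial(n=10, p=0.5)", rng.binomial(n=10, p=0.5, size=n_samples), 10 * 0.5),
    ("poisson(lambda=4)", rng.poisson(lam=4.0, size=n_samples), 4.0),
    # NumPy parameterizes the exponential by scale = 1 / lambda
    ("exponential(lambda=2)", rng.exponential(scale=1 / 2.0, size=n_samples), 1 / 2.0),
    ("beta(alpha=2, beta=5)", rng.beta(a=2.0, b=5.0, size=n_samples), 2.0 / (2.0 + 5.0)),
]

for name, draws, expected_mean in checks:
    print(f"{name:24s} sample mean = {draws.mean():.3f}, theoretical mean = {expected_mean:.3f}")
```

With this many samples, each sample mean should land close to its theoretical value. Note that NumPy's exponential sampler takes the scale 1/λ rather than the rate λ itself.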

These probability distributions have many applications in machine learning, such as:

- In Bayesian inference, the prior and posterior distributions are often chosen from standard families that suit the problem domain, for example a beta prior for a Bernoulli success probability.
- In regression analysis, the residual errors are often assumed to follow a normal distribution.
- In classification problems, the class labels can be modeled by a Bernoulli (two classes) or categorical (several classes) distribution, which are the single-trial cases of the binomial and multinomial distributions.
- In clustering problems, the distribution of the data points can be modeled by a mixture of normal distributions (a Gaussian mixture model).
- In reinforcement learning, the rewards can be modeled by a distribution that matches their type, such as a Poisson distribution for counts or an exponential distribution for waiting times.

Understanding and applying these common probability distributions is crucial for modeling and analyzing data accurately in machine learning applications.
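
As an illustration of the Bayesian inference use case, here is a minimal sketch of a beta prior being updated with Bernoulli observations, using the advertisement-click example from the list of distributions. The click data are simulated, and the true click rate, the sample size, and the uniform Beta(1, 1) prior are assumptions made only for this example.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Hypothetical click data: 1 = click, 0 = no click (assumed true rate of 0.1)
true_rate = 0.1
clicks = rng.binomial(n=1, p=true_rate, size=500)

# Beta(1, 1) prior, i.e. uniform on [0, 1]; chosen here purely for illustration
alpha_prior, beta_prior = 1.0, 1.0

# Conjugate update: add observed successes to alpha and failures to beta
successes = int(clicks.sum())
failures = int(clicks.size - successes)
alpha_post = alpha_prior + successes
beta_post = beta_prior + failures

# The posterior over the click probability is again a beta distribution
posterior = beta(alpha_post, beta_post)
low, high = posterior.interval(0.95)

print(f"clicks: {successes}, no clicks: {failures}")
print(f"posterior mean = {posterior.mean():.3f}")
print(f"95% credible interval = ({low:.3f}, {high:.3f})")
```

Because the beta distribution is conjugate to the Bernoulli likelihood, the update reduces to adding counts, which is why α and β can be read as pseudo-counts of successes and failures.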

Now that we have introduced the concept of probability distributions and their importance in machine learning, let's move on to the next section, Normal Distribution.

## Normal Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# define the parameters of the normal distribution
mu = 0   # mean
sigma = 1   # standard deviation

# generate random data from the normal distribution
data = np.random.normal(mu, sigma, 1000)

# plot the histogram of the data
plt.hist(data, bins=50, density=True, alpha=0.6, color='g')

# plot the probability density function (PDF) of the normal distribution
x_axis = np.arange(-4, 4, 0.001)
plt.plot(x_axis, norm.pdf(x_axis, mu, sigma), color='r', linewidth=2)

# add labels and title to the plot
plt.xlabel('x')
plt.ylabel('Density')
plt.title('Normal Distribution')

# display the plot
plt.show()


This code generates 1000 random data points from a normal distribution with mean 0 and standard deviation 1. Then, it plots a histogram of the data and overlays the probability density function (PDF) of the normal distribution on top of it. The PDF is obtained using the norm.pdf() function from the scipy.stats library, which takes the x-axis values, mean, and standard deviation as input.

The resulting plot shows the bell-shaped curve of the normal distribution, with a peak at the mean and a spread determined by the standard deviation. The area under the curve integrates to 1, representing the total probability of all possible values of the random variable.
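
We can check the claim about the total area numerically. The sketch below integrates the normal PDF over the whole real line with scipy.integrate.quad for a few parameter pairs. The pairs themselves are arbitrary values picked for illustration, and splitting the integral at the mean is just one simple way to keep the numerical integration well behaved.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative (mean, standard deviation) pairs; the values are arbitrary
parameter_pairs = [(0.0, 1.0), (175.0, 10.0), (-3.0, 0.5)]

areas = []
for mu, sigma in parameter_pairs:
    # quad returns (integral value, error estimate); we keep only the value.
    # Splitting at the mean ensures each half contains part of the peak.
    left, _ = quad(norm.pdf, -np.inf, mu, args=(mu, sigma))
    right, _ = quad(norm.pdf, mu, np.inf, args=(mu, sigma))
    areas.append(left + right)
    print(f"mu = {mu:7.1f}, sigma = {sigma:5.1f}: total area = {left + right:.6f}")
```

Whatever the mean and standard deviation, the total area should come out as 1 up to numerical error.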

In machine learning, the normal distribution is used to model continuous variables that are spread symmetrically around the mean in a bell-shaped curve. It is often assumed that the errors in regression models follow a normal distribution, and the likelihood function in Bayesian inference is often chosen to be normal. The central limit theorem also states that the sum of a large number of independent, identically distributed random variables with finite mean and variance, once centered and scaled by its standard deviation, converges in distribution to a normal distribution. This makes the normal distribution an important concept in statistical inference.
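
The central limit theorem can be illustrated with a short simulation. The sketch below sums uniform random variables, which are not bell-shaped on their own, and standardizes the sums. The choice of the Uniform(0, 1) distribution, the number of terms per sum, and the number of sums are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_terms = 50      # number of independent variables in each sum
n_sums = 20_000   # number of sums to simulate

# Uniform(0, 1) has mean 1/2 and variance 1/12
draws = rng.uniform(0.0, 1.0, size=(n_sums, n_terms))
sums = draws.sum(axis=1)

# Standardize each sum: subtract n * mean, divide by sqrt(n * variance)
z = (sums - n_terms * 0.5) / np.sqrt(n_terms / 12.0)

# For a standard normal variable, about 68.27% of values lie within one unit of 0
frac_within_1 = np.mean(np.abs(z) < 1.0)

print(f"mean of standardized sums = {z.mean():.3f}")
print(f"std of standardized sums  = {z.std():.3f}")
print(f"fraction with |z| < 1     = {frac_within_1:.3f}")
```

The standardized sums should have a mean near 0, a standard deviation near 1, and roughly 68% of their values within one unit of 0, as a standard normal variable would.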

### Example 
Here's an example of how the normal distribution can be used to model the heights of people in a population.

Suppose we have a dataset of heights of people in a particular population. We can use the normal distribution to model this data and estimate the mean and standard deviation of the population's height.

We can use the numpy and scipy libraries to work with the normal distribution in Python. First, let's generate some sample data with numpy:

In [1]:
import numpy as np

# Generate 1000 random heights
heights = np.random.normal(loc=175, scale=10, size=1000)


In this example, we generate 1000 random heights with a mean of 175 cm and a standard deviation of 10 cm.

Now, let's plot a histogram of the data to visualize the distribution:

In [None]:
import matplotlib.pyplot as plt

# Plot a histogram of the heights
plt.hist(heights, bins=30, density=True)

# Add a title and labels to the plot
plt.title("Distribution of Heights")
plt.xlabel("Height (cm)")
plt.ylabel("Density")  # density=True normalizes the histogram, so the y-axis is a density, not a count

# Show the plot
plt.show()


This should produce a histogram that shows a bell-shaped curve with the mean around 175 cm.
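
To estimate the mean and standard deviation of the population from the sample, one option is scipy's norm.fit, which returns maximum likelihood estimates of the two parameters. The sketch below regenerates the heights with a seeded generator so that it runs on its own; the seed is an arbitrary choice and is not part of the original example.

```python
import numpy as np
from scipy.stats import norm

# Regenerate sample heights so this sketch is self-contained (seed is arbitrary)
rng = np.random.default_rng(0)
heights = rng.normal(loc=175, scale=10, size=1000)

# Maximum likelihood estimates of the mean (loc) and standard deviation (scale)
mu_hat, sigma_hat = norm.fit(heights)

print(f"estimated mean = {mu_hat:.2f} cm")
print(f"estimated standard deviation = {sigma_hat:.2f} cm")
```

For the normal distribution, these estimates coincide with np.mean(heights) and np.std(heights), and they should be close to the values of 175 cm and 10 cm used to generate the data.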

We can also use the normal distribution to calculate the probability of a person's height falling within a certain range. For example, we can calculate the probability of a person's height being between 165 cm and 185 cm:

In [None]:
from scipy.stats import norm

# Estimate the parameters of the normal distribution from the sample
mu_hat = np.mean(heights)
sigma_hat = np.std(heights)

# P(165 < X < 185) is the difference of the CDF values at the two bounds
cdf_lower = norm.cdf(165, loc=mu_hat, scale=sigma_hat)
cdf_upper = norm.cdf(185, loc=mu_hat, scale=sigma_hat)
probability = cdf_upper - cdf_lower

print("The probability of a person's height being between 165 cm and 185 cm is {:.2f}%.".format(probability*100))


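Because the heights are drawn at random without a fixed seed, the exact figure varies slightly from run to run. The output should look similar to:

The probability of a person's height being between 165 cm and 185 cm is 68.72%.

This means that, under the fitted normal model, roughly 68% of the population falls within the range of 165 cm to 185 cm in height. That agrees with the rule that about 68.27% of a normal distribution lies within one standard deviation of its mean, since 165 cm and 185 cm are one standard deviation (10 cm) below and above the mean of 175 cm.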
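
As a sanity check on this figure, we can compare the probability under the exact generating parameters (mean 175 cm, standard deviation 10 cm) with the fraction of simulated heights that actually fall in the range. The sketch below is self-contained; the seed is an arbitrary assumption added for reproducibility.

```python
import numpy as np
from scipy.stats import norm

# Probability under the exact parameters used to generate the data
exact = norm.cdf(185, loc=175, scale=10) - norm.cdf(165, loc=175, scale=10)

# Fraction of simulated heights that fall inside the range (seed is arbitrary)
rng = np.random.default_rng(0)
heights = rng.normal(loc=175, scale=10, size=1000)
empirical = np.mean((heights > 165) & (heights < 185))

print(f"exact probability  = {exact:.4f}")  # 0.6827
print(f"empirical fraction = {empirical:.4f}")
```

The exact value is the familiar one-standard-deviation probability of about 68.27%, and the empirical fraction should be within a few percentage points of it for a sample of 1000.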