# Statistical Distributions and Probability Estimation

## Probability Density Function (PDF)

**When to Use**: For continuous variables.

**What It Does**: Describes the relative likelihood of a continuous random variable taking values near a given point.

**Important Note**: The value of the PDF at a specific point is not a probability but rather a density. To get a probability, you need to look at an interval (area under the curve).
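
To make the density-versus-probability distinction concrete, here is a minimal sketch using `scipy.stats.norm`. The distribution parameters (mean 0, standard deviation 0.1) are assumptions chosen only so that the density at the peak exceeds 1, which a probability never could.

```python
from scipy.stats import norm

# Assumed distribution for illustration: a narrow normal curve
narrow = norm(loc=0, scale=0.1)

# The PDF value at a point is a density, so it may be larger than 1
density_at_zero = narrow.pdf(0)

# A probability is the area under the curve over an interval,
# computed here as a difference of CDF values
prob_interval = narrow.cdf(0.1) - narrow.cdf(-0.1)

print(f"Density at 0: {density_at_zero:.3f}")
print(f"P(-0.1 <= X <= 0.1): {prob_interval:.3f}")
```

The density is a height of the curve, while the probability is an area, which is why only the second number is guaranteed to lie between 0 and 1.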

**Example**: Heights of people, where height can take any value within a range (like 150 cm to 200 cm).

## Calculating the PDF from a Dataset

To calculate the Probability Density Function (PDF) from a dataset containing real-valued random variables, we can use a technique called Kernel Density Estimation (KDE). Here's an example using Python with a sample dataset.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Sample dataset (seeded so the results are reproducible)
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Calculate the PDF using Kernel Density Estimation
kde = gaussian_kde(data)
x = np.linspace(min(data), max(data), 1000)
pdf = kde(x)

# Plot the PDF
plt.plot(x, pdf, label='Estimated PDF')
plt.hist(data, bins=30, density=True, alpha=0.5, label='Histogram')
plt.title('PDF Estimation using KDE')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.show()
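
To show what KDE does under the hood, the sketch below rebuilds the estimate by hand: it places one Gaussian bump on each data point and averages them. It assumes that `gaussian_kde` uses its default bandwidth, taken here to be `kde.factor` (Scott's rule) times the sample standard deviation; the seed, sample, and grid size are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Illustrative sample (assumed): 1000 draws from a standard normal
rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)

kde = gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 200)

# Bandwidth: Scott's factor times the sample standard deviation
# (assumed to match gaussian_kde's default behaviour for 1-D data)
h = kde.factor * data.std(ddof=1)

# One Gaussian kernel per data point, averaged and rescaled by h
manual_pdf = norm.pdf((x[:, None] - data[None, :]) / h).mean(axis=1) / h

max_diff = np.max(np.abs(manual_pdf - kde(x)))
print(f"Largest difference from gaussian_kde: {max_diff:.2e}")
```

A smaller bandwidth `h` gives a bumpier curve that follows the sample closely, while a larger one gives a smoother curve that may hide real structure.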


## Differentiating Between Probability and Probability Density

**Probability**: This is a measure of the likelihood that a given event will occur. For discrete variables, it's straightforward to calculate the probability of a specific outcome.

Example: The probability of rolling a 3 on a six-sided die is $\frac{1}{6}$.

**Probability Density**: For continuous variables, instead of calculating the probability of a specific outcome, we calculate the probability density. The PDF gives the relative likelihood of values near a point, and the area under it over a range gives the probability of the variable falling within that range.

Example: The height of a person can be any real number. Instead of calculating the probability of a person being exactly 170 cm tall, we look at the probability density around 170 cm.

## Why It Is Not Possible to Calculate the Probability for a Specific Point with Continuous Variables

For continuous variables, the probability of the variable taking any specific exact value is zero. A probability is the area under the PDF over an interval, and a single point is an interval of zero width, so the area above it is zero. Instead, we calculate the probability that the variable falls within an interval.

**Summary**:

* With a few choices, you can calculate the chance of picking any one of them.
* With endless choices, the chance of picking one exact spot is zero, so we ask about ranges instead.
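
The zero-probability argument above can be checked numerically by shrinking an interval around a single value. This sketch assumes, purely for illustration, that heights follow a normal distribution with mean 170 cm and standard deviation 10 cm.

```python
from scipy.stats import norm

# Assumed model for illustration only: heights ~ Normal(170 cm, 10 cm)
heights = norm(loc=170, scale=10)

half_widths = [10, 1, 0.1, 0.01, 0.001]
probabilities = []
for eps in half_widths:
    # Probability of landing within eps of 170 cm
    p = heights.cdf(170 + eps) - heights.cdf(170 - eps)
    probabilities.append(p)
    print(f"P({170 - eps} <= height <= {170 + eps}) = {p:.6f}")

# The density at 170 cm stays fixed while the interval probability shrinks
print(f"Density at 170 cm: {heights.pdf(170):.6f}")
```

As the interval narrows, its probability shrinks toward zero, while the density at 170 cm does not change.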






## Estimating Probabilities for Continuous Variables by Using Interpolation

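To estimate probabilities for continuous variables, we evaluate the PDF on a grid of points and compute the area under it over the interval of interest. Interpolation fills in the curve between the grid points.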

**Continuous Variables and Probabilities**

First, let's remember that for continuous variables, we don't talk about the probability of hitting one exact point. Instead, we talk about the probability of hitting a range of points.

**Interpolation Basics**

Interpolation is a way of estimating values between known data points. Imagine you have a smooth curve (like a line graph), and you know some points on that curve. Interpolation helps you guess the points in between.
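
As a small sketch of this idea, the code below evaluates a KDE on a coarse grid and uses linear interpolation (`np.interp`) to estimate the density at a point between grid values. The seed, grid, and query point `x_new` are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative sample (assumed): 1000 draws from a standard normal
rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)
kde = gaussian_kde(data)

# Known points: the estimated PDF on a coarse grid (spacing 0.25)
x_known = np.linspace(-3, 3, 25)
pdf_known = kde(x_known)

# A query point that lies between two grid points
x_new = 0.1
pdf_interp = np.interp(x_new, x_known, pdf_known)  # linear interpolation
pdf_direct = kde([x_new])[0]                       # direct KDE evaluation

print(f"Interpolated density at {x_new}: {pdf_interp:.4f}")
print(f"Direct KDE density at {x_new}:   {pdf_direct:.4f}")
```

The two values should be close; a finer grid narrows the gap, at the cost of more KDE evaluations.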

In [None]:
from scipy.integrate import simpson  # `simps` was deprecated and later removed from SciPy

# Example interval
a, b = -0.5, 0.5

# PDF values on the full grid x (kde and x come from the earlier cell)
pdf_values = kde(x)

# Find the indices of the grid points that fall inside the interval
indices = np.where((x >= a) & (x <= b))[0]

# Estimate the probability using numerical integration (Simpson's rule)
estimated_probability = simpson(pdf_values[indices], x=x[indices])

print(f"Estimated probability of falling between {a} and {b} is: {estimated_probability}")
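
As a cross-check on the numerical integration above, the sketch below compares Simpson's rule with `gaussian_kde.integrate_box_1d`, which integrates the KDE over an interval directly, and with the true probability for a standard normal. It rebuilds its own sample with an assumed seed, and it uses a grid whose endpoints are exactly `a` and `b` so that no part of the interval is cut off.

```python
import numpy as np
from scipy.integrate import simpson
from scipy.stats import gaussian_kde, norm

# Illustrative sample (assumed): 1000 draws from a standard normal
rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)
kde = gaussian_kde(data)

a, b = -0.5, 0.5

# Grid that starts at a and ends at b exactly
grid = np.linspace(a, b, 201)
simpson_estimate = simpson(kde(grid), x=grid)

# Integral of the KDE over [a, b], computed by SciPy directly
exact_kde_estimate = kde.integrate_box_1d(a, b)

# True probability under the distribution the sample was drawn from
true_probability = norm.cdf(b) - norm.cdf(a)

print(f"Simpson's rule on the KDE: {simpson_estimate:.4f}")
print(f"integrate_box_1d:          {exact_kde_estimate:.4f}")
print(f"True N(0, 1) probability:  {true_probability:.4f}")
```

The first two numbers should agree closely, since both integrate the same estimated curve. Both can differ from the true value because the KDE is itself an estimate built from a finite sample.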
