<a href="https://colab.research.google.com/github/HanifaElahi/Statistical-Analysis/blob/main/Statistical%20Analysis%20Part%20VI%20-%20Estimation%20Techniques.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np

from scipy.stats import norm, beta

# 1. Point Estimation

## Definition

Point estimation provides a single best guess or value for an unknown population parameter (e.g., mean, variance) based on sample data.

## Key Concepts

- Estimator: A rule or formula that computes an estimate from the sample (e.g., the sample mean).
- Bias: Difference between the expected value of the estimator and the true parameter value. An unbiased estimator has zero bias.
- Efficiency: Among unbiased estimators, the one with the smaller variance is more efficient.
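To make bias and efficiency concrete, here is a minimal simulation sketch. The seed, the sample size of 10 and the 20,000 repetitions are arbitrary choices, and the population N(10, 2²) simply mirrors the one used for `data` later in this notebook. Dividing by n (`ddof=0`) underestimates the variance by about σ²/n on average, while dividing by n - 1 (`ddof=1`) removes that bias. For normal data the sample mean and the sample median both target the centre, but the mean has the smaller sampling variance, so it is the more efficient of the two.

```python
import numpy as np

# Simulation settings (arbitrary choices for illustration)
rng = np.random.default_rng(0)
true_var = 4.0           # population variance of N(10, 2^2)
n, n_reps = 10, 20_000   # sample size and number of repeated samples

samples = rng.normal(loc=10, scale=2, size=(n_reps, n))

# Bias: average of the estimates across repetitions minus the true value
mean_var_ddof0 = np.var(samples, axis=1, ddof=0).mean()  # divides by n
mean_var_ddof1 = np.var(samples, axis=1, ddof=1).mean()  # divides by n - 1
print(f"Bias with ddof=0: {mean_var_ddof0 - true_var:+.3f}")  # near -true_var / n = -0.4
print(f"Bias with ddof=1: {mean_var_ddof1 - true_var:+.3f}")  # near 0

# Efficiency: sampling variance of two unbiased estimators of the centre
var_of_means = samples.mean(axis=1).var()
var_of_medians = np.median(samples, axis=1).var()
print(f"Var(sample mean):   {var_of_means:.3f}")  # near 4 / 10 = 0.4
print(f"Var(sample median): {var_of_medians:.3f}")
```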

## Pros

- Simple to compute.
- Directly applicable in many real-world scenarios.

## Cons

- Does not, on its own, convey the variability or uncertainty of the estimate.
- Some estimators, notably the sample mean, are sensitive to outliers.
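A small illustration of outlier sensitivity, using made-up values. A single extreme value drags the sample mean far from the bulk of the data, while the median and a trimmed mean (here `scipy.stats.trim_mean`, cutting 20% from each end) stay put. The specific numbers are hypothetical.

```python
import numpy as np
from scipy.stats import trim_mean

# Hypothetical measurements; the second array contains one data-entry error (1000)
clean = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
with_outlier = np.array([8.0, 9.0, 10.0, 11.0, 1000.0])

for label, values in [("clean", clean), ("with outlier", with_outlier)]:
    print(
        f"{label:>12}: mean = {np.mean(values):.1f}, "
        f"median = {np.median(values):.1f}, "
        f"20% trimmed mean = {trim_mean(values, 0.2):.1f}"
    )
# The mean jumps from 10.0 to 207.6; the median and trimmed mean stay at 10.0
```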

## Use Cases

- Estimating population averages (e.g., average income, test scores).
- Calculating proportions (e.g., proportion of defective products).
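For a proportion, the point estimate is simply the fraction of "successes" in the sample. The inspection results below are hypothetical. Pairing the estimate with its estimated standard error, √(p̂(1 - p̂)/n), is one simple way to attach a measure of uncertainty to a point estimate.

```python
import numpy as np

# Hypothetical inspection results for 20 items: 1 = defective, 0 = acceptable
inspected = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

n_items = inspected.size
p_hat = inspected.mean()                            # sample proportion: 3 / 20
std_error = np.sqrt(p_hat * (1 - p_hat) / n_items)  # estimated standard error of p_hat
print(f"Estimated defect rate: {p_hat:.2f}")        # 0.15
print(f"Standard error:        {std_error:.3f}")
```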



In [2]:
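# Generate sample data: 100 draws from a normal distribution with mean 10 and std 2
# (no random seed is set, so the values differ from run to run)
data = np.random.normal(loc=10, scale=2, size=100)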

In [3]:
# Point estimates: sample mean and sample variance (ddof=1 uses the unbiased n - 1 divisor)
mean_estimate = np.mean(data)
variance_estimate = np.var(data, ddof=1)

In [4]:
print(f"Mean Estimate: {mean_estimate}")
print(f"Variance Estimate: {variance_estimate}")

Mean Estimate: 9.581735304855037
Variance Estimate: 3.169590299565443


# 2. Interval Estimation

## Definition

Interval estimation provides a range of values (a confidence interval) for the unknown parameter. The interval is built by a procedure that, over repeated sampling, captures the true parameter a specified proportion of the time.

## Key Concepts

- Confidence Level: The long-run proportion of intervals, constructed the same way from repeated samples, that contain the true parameter (e.g., 95%).
- Margin of Error: Half-width of the interval.
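The "long-run proportion" reading of the confidence level can be checked by simulation. This sketch assumes a known N(10, 2²) population (an illustrative choice), draws 5,000 samples of size 100, builds the same z-interval used in the code cell of this section for each one, and counts how often it contains the true mean. The empirical coverage should land close to 0.95 (slightly below, since z rather than t is used with an estimated standard deviation).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
true_mean, n, n_reps = 10.0, 100, 5_000
z_crit = norm.ppf(0.975)  # two-sided 95% critical value, about 1.96

samples = rng.normal(loc=true_mean, scale=2, size=(n_reps, n))
means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

# True where the interval [mean - z*SE, mean + z*SE] contains the true mean
covered = (means - z_crit * std_errors <= true_mean) & (true_mean <= means + z_crit * std_errors)
coverage = covered.mean()
print(f"Empirical coverage: {coverage:.3f}")
```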

## Pros

- Accounts for uncertainty in estimation.
- Provides a range for better decision-making.

## Cons

- Relies on assumptions (e.g., approximately normal data, or a sample large enough for the central limit theorem to apply).
- Interval width can be large with small sample sizes.
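How quickly the interval narrows with sample size can be seen by computing the 95% margin of error, t × s / √n, for a few values of n. The sample standard deviation is held fixed at a hypothetical s = 2 so that only n changes. Two effects combine: the 1/√n factor, and the t critical value, which is noticeably larger than 1.96 for small samples (about 2.78 with 4 degrees of freedom) and approaches it as n grows.

```python
import numpy as np
from scipy.stats import t

s = 2.0  # hypothetical sample standard deviation, held fixed for comparison
margins = {}
for n in [5, 20, 100, 1000]:
    # 95% margin of error = t critical value (n - 1 degrees of freedom) * s / sqrt(n)
    margins[n] = t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
    print(f"n = {n:4d}: margin of error = {margins[n]:.3f}")
```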

## Use Cases

- Estimating population means in surveys.
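- Reporting forecast uncertainty (intervals for individual future outcomes are prediction intervals, which are wider than confidence intervals for a mean).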


In [5]:
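# Confidence interval for the mean using the normal (z) approximation,
# which is reasonable here because n = 100
confidence_level = 0.95

n = len(data)

mean = np.mean(data)

# Standard error of the mean: sample standard deviation / sqrt(n)
std_error = np.std(data, ddof=1) / np.sqrt(n)

# Two-sided critical value (about 1.96 for 95% confidence)
z_score = norm.ppf(1 - (1 - confidence_level) / 2)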

In [6]:
confidence_interval = (mean - z_score * std_error, mean + z_score * std_error)

In [7]:
print(f"{confidence_level:.0%} Confidence Interval: {confidence_interval}")

95% Confidence Interval: (9.232796189617835, 9.93067442009224)
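The interval above pairs an estimated standard deviation with the normal critical value. A common alternative, sketched here, uses the Student t critical value with n - 1 degrees of freedom, which is slightly wider and is the usual choice for small samples. `t_confidence_interval` is a hypothetical helper name, and the sample is freshly generated with its own seed, so the numbers will not match the output above.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(sample, confidence=0.95):
    """Two-sided t-interval for the population mean (assumes roughly normal data)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    std_error = stats.sem(sample)  # sample std (ddof=1) / sqrt(n)
    return stats.t.interval(confidence, n - 1, loc=mean, scale=std_error)


# Hypothetical sample drawn from the same N(10, 2^2) population as `data`
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=100)
low, high = t_confidence_interval(sample)
print(f"95% t-interval: ({low:.3f}, {high:.3f})")
```

With n = 100 the t and z intervals differ only slightly; the gap grows as n shrinks.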


# 3. Bayesian Estimation

## Definition

Bayesian estimation uses Bayes' theorem to update the probability of a hypothesis as new evidence is observed. It combines prior beliefs with observed data.

## Key Concepts

- Prior: Initial belief about the parameter.
- Likelihood: Probability (or density) of the observed data given the parameter, viewed as a function of the parameter.
- Posterior: Updated belief after observing data, proportional to prior × likelihood.
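The relation posterior ∝ prior × likelihood can be applied directly on a grid of parameter values, without any closed-form result. This sketch uses the same setting as the binomial example in this section (a Beta(2, 2) prior and 10 successes in 15 trials); the 2001-point grid and the trapezoidal normalization are illustrative choices. The grid posterior should match the conjugate Beta(12, 7) posterior, whose mean is 12/19.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

theta = np.linspace(0.0, 1.0, 2001)     # grid of candidate success probabilities
prior = beta.pdf(theta, 2, 2)           # prior density p(theta)
likelihood = binom.pmf(10, 15, theta)   # p(data | theta) for 10 successes in 15 trials
unnormalized = prior * likelihood       # numerator of Bayes' theorem

# Divide by the evidence (approximated by the trapezoidal rule) so it integrates to 1
posterior_grid = unnormalized / trapezoid(unnormalized, theta)

grid_mean = trapezoid(theta * posterior_grid, theta)
print(f"Grid posterior mean: {grid_mean:.4f}")  # about 12 / 19 = 0.6316
```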

## Pros

- Can incorporate prior knowledge.
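- Can produce stable estimates from small samples, because the prior supplements limited data.

## Cons

- Requires choosing a prior, which can be subjective and, with little data, strongly influences the result.
- Can be computationally intensive when the posterior has no closed form and must be approximated (e.g., with MCMC).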
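The influence of the prior, and how it fades as data accumulate, can be seen with the Beta-Binomial posterior mean (a + successes) / (a + b + trials). The three priors below are illustrative choices, and the larger dataset (1000 of 1500) is hypothetical, keeping the same 2/3 success rate as the 10-of-15 example. With 15 trials the priors give noticeably different answers; with 1500 they nearly agree.

```python
def beta_binomial_posterior_mean(a, b, successes, trials):
    """Posterior mean of p under a Beta(a, b) prior after `successes` in `trials`."""
    return (a + successes) / (a + b + trials)


priors = {"Beta(1, 1)": (1, 1), "Beta(2, 2)": (2, 2), "Beta(20, 20)": (20, 20)}
small_sample, large_sample = {}, {}
for label, (a, b) in priors.items():
    small_sample[label] = beta_binomial_posterior_mean(a, b, 10, 15)      # n = 15
    large_sample[label] = beta_binomial_posterior_mean(a, b, 1000, 1500)  # n = 1500, same rate
    print(f"{label:>12}: n=15 -> {small_sample[label]:.3f}, n=1500 -> {large_sample[label]:.3f}")
```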

## Use Cases

- Machine learning (e.g., Bayesian inference in probabilistic models).
- Medical diagnosis (updating probabilities based on test results).

In [8]:
# Bayesian estimation of a binomial success probability:
# Beta(2, 2) prior, then 10 successes observed in 15 trials
alpha_prior = 2
beta_prior = 2
successes = 10
trials = 15

In [9]:
# Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)
alpha_post = alpha_prior + successes
beta_post = beta_prior + (trials - successes)

In [10]:
posterior = beta(alpha_post, beta_post)
posterior_mean = posterior.mean()

In [11]:
print(f"Posterior Mean: {posterior_mean}")

Posterior Mean: 0.631578947368421
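Besides its mean, the posterior distribution yields an interval estimate directly. A minimal sketch: the equal-tailed 95% credible interval of Beta(12, 7), the posterior from the cells above, taken as its 2.5% and 97.5% quantiles via the frozen distribution's `interval` method. Unlike a confidence interval, it can be read as "given this prior and these data, the parameter lies in this range with 95% probability".

```python
from scipy.stats import beta

# Posterior from the example above: Beta(2 + 10, 2 + 5) = Beta(12, 7)
posterior = beta(12, 7)

# Equal-tailed 95% credible interval: the 2.5% and 97.5% posterior quantiles
lower, upper = posterior.interval(0.95)
print(f"Posterior mean: {posterior.mean():.4f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```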


# 4. Monte Carlo Estimation

## Definition

Monte Carlo estimation uses random sampling to approximate quantities that are hard to compute exactly, such as probabilities, expected values, and integrals; it is also used in optimization.
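As a sketch of Monte Carlo integration: an integral over [0, 1] equals the expected value of the integrand at a uniform random point, so averaging the integrand over many uniform draws estimates it, and the sample standard deviation gives a standard error. The integrand exp(-x²), the seed, and the 100,000 draws are illustrative choices; `scipy.integrate.quad` provides a deterministic reference value (about 0.7468).

```python
import numpy as np
from scipy.integrate import quad


def f(x):
    """Integrand: exp(-x^2)."""
    return np.exp(-x**2)


# Integral of f over [0, 1] equals E[f(U)] for U ~ Uniform(0, 1)
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)
values = f(u)
mc_estimate = values.mean()
mc_std_error = values.std(ddof=1) / np.sqrt(u.size)  # standard error of the average

# Deterministic numerical integration for comparison
reference, _ = quad(f, 0.0, 1.0)
print(f"Monte Carlo: {mc_estimate:.4f} (standard error {mc_std_error:.4f})")
print(f"quad:        {reference:.4f}")
```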

## Key Concepts

- Random Sampling: Generating random inputs to simulate outcomes.
- Convergence: Accuracy improves as the number of samples N increases; the standard error shrinks roughly in proportion to 1/√N.
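The 1/√N behaviour can be observed by repeating the π estimate from this section many times at several sample sizes and measuring the spread of the results. The 200 repetitions and the seed are arbitrary. Each tenfold increase in N should shrink the spread by about √10 ≈ 3.2; theory gives 4·√(p(1 - p)/N) with p = π/4, roughly 1.64/√N.

```python
import numpy as np

rng = np.random.default_rng(42)
n_reps = 200  # independent repetitions of the whole estimate at each sample size
spreads = {}
for n in [100, 1_000, 10_000]:
    points = rng.random((n_reps, n, 2))      # uniform points in the unit square
    inside = (points**2).sum(axis=2) <= 1.0  # is each point inside the quarter circle?
    estimates = 4 * inside.mean(axis=1)      # one estimate of pi per repetition
    spreads[n] = estimates.std()
    print(f"N = {n:6d}: std of estimate = {spreads[n]:.4f}")
```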

## Pros

- Handles complex, non-analytical problems.
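- Its convergence rate does not depend on the number of dimensions, which makes it practical for high-dimensional problems where grid-based methods break down.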

## Cons

- Computationally expensive when high accuracy is required.
- Converges slowly: halving the error takes roughly four times as many samples.

## Use Cases

- Estimating π.
- Option pricing in financial models.
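A compact sketch of Monte Carlo option pricing, under assumptions stated here rather than taken from the text above: a European call on a non-dividend-paying stock whose price follows geometric Brownian motion under the risk-neutral measure. All contract and market parameters are hypothetical. The price is the discounted average payoff over simulated terminal prices, and for this simple case the closed-form Black-Scholes formula (about 8.02 for these inputs) is available as a check.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical inputs: spot, strike, risk-free rate, volatility, maturity in years
S0, K, r, sigma, T = 100.0, 105.0, 0.05, 0.2, 1.0

# Monte Carlo: simulate terminal prices S_T under risk-neutral geometric Brownian motion
rng = np.random.default_rng(7)
z = rng.standard_normal(200_000)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
discounted_payoffs = np.exp(-r * T) * np.maximum(ST - K, 0.0)
mc_price = discounted_payoffs.mean()
mc_se = discounted_payoffs.std(ddof=1) / np.sqrt(z.size)  # Monte Carlo standard error

# Closed-form Black-Scholes call price for comparison
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
bs_price = S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(f"Monte Carlo price:   {mc_price:.3f} (standard error {mc_se:.3f})")
print(f"Black-Scholes price: {bs_price:.3f}")
```

Monte Carlo earns its keep for path-dependent or multi-asset payoffs where no such closed form exists; here the formula only serves to validate the simulation.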

In [12]:
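# Estimating π using Monte Carlo
np.random.seed(42)
n_samples = 10000
# Draw points uniformly in the unit square [0, 1) x [0, 1)
points = np.random.rand(n_samples, 2)
# Count points inside the quarter circle of radius 1 (distance from the origin <= 1)
inside_circle = np.sum(np.linalg.norm(points, axis=1) <= 1)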

In [13]:
# The quarter circle covers π/4 of the square's area, so π ≈ 4 × (fraction of points inside)
pi_estimate = 4 * inside_circle / n_samples

In [14]:
print(f"Monte Carlo Estimate of π: {pi_estimate}")

Monte Carlo Estimate of π: 3.1508
