# Objectives

----------------------------------------

## 1. Descriptive Statistics vs. Inferential Statistics

- Statistics is a set of methods and measures for collecting, cleaning and analyzing data and for making quantitative statements about a population


- Two types of statistical methods are used in analyzing data: **descriptive statistics** and **inferential statistics**:

  * Descriptive statistics describes and summarizes data (e.g. calculating the average temperature in Berlin in January)
  
  * Inferential statistics allows you to make predictions (a.k.a. inferences) from that data (e.g. predicting the temperature in Berlin tomorrow). Inferential statistics uses statistical models to draw conclusions about a population from a sample, for example by comparing your sample data to other samples or to previous research

<div>
<img src="attachment:population_sample.png" width="400"/>
</div>


<div>
<img src="attachment:descriptive_stats.png" width="600"/>
</div>

### Sampling

`Sampling with replacement` means that when we select an individual data point from the population to include in a sample, we return that data point back to the population before selecting the next one. In other words, each time we draw a data point for a sample, it is put back into the population, and the population remains unchanged throughout the sampling process.



`Sampling without replacement` is another method of selecting data points from a population to form a sample. In this sampling technique, each time a data point is selected from the population to be included in the sample, it is not returned or "put back" into the population before the next selection is made. As a result, each data point can only be selected once in the sample.
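A minimal sketch of both schemes using NumPy's `Generator.choice`, where the `replace` flag switches between them. The toy population of ten integers, the sample size of 8 and the seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy "population" of 10 labelled data points
population = np.arange(10)

rng = np.random.default_rng(seed=0)

# With replacement: each draw is put back, so the same point can appear more than once
with_replacement = rng.choice(population, size=8, replace=True)

# Without replacement: each point can be drawn at most once,
# so the sample size cannot exceed the population size
without_replacement = rng.choice(population, size=8, replace=False)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("distinct values without replacement:", len(np.unique(without_replacement)))
```

Note that `np.random.choice`, used in the Central Limit Theorem cells of this notebook, samples with replacement by default.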



### What is a Random Variable?

- A random variable is a variable whose value is determined by the outcome of a random process, so it can take on multiple different values, each with an associated probability

- Examples include the number you get when you roll a die, or the height of a randomly chosen person living in Berlin
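To make the die example concrete, here is a small simulation sketch: we treat X as the face shown by a fair six-sided die and estimate P(X = x) for each value from relative frequencies. The number of rolls and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# X = the face shown by a fair six-sided die; each face 1..6 has probability 1/6
rolls = rng.integers(low=1, high=7, size=60_000)  # `high` is exclusive

# Estimate the probability of each value of X from its relative frequency
values, counts = np.unique(rolls, return_counts=True)
for value, freq in zip(values, counts / len(rolls)):
    print(f"P(X = {value}) ≈ {freq:.3f}  (exact: {1/6:.3f})")
```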



### What is Probability and why is it useful?

- Probability is the long-term chance that a certain outcome will occur from some random process
- Some real life examples of the use of probabilities are:

**Finance:** By estimating the chance that a given financial asset will fall within a specific range, it’s possible to develop trading strategies to capture that predicted outcome

**Weather forecast:** Meteorologists can’t predict exactly what the weather will be, so they use tools and instruments to determine the likelihood that it will rain, snow or hail. They also examine historical databases to estimate high and low temperatures and probable weather patterns for that day or week

**Insurance:** Probability plays an important role in analyzing insurance policies to determine which plans are best for customers and what deductible amounts they need

## 2. Central Limit Theorem 

[Central Limit Theorem](https://www.youtube.com/watch?v=YAlJCEDH2uY)

The Central Limit Theorem (CLT) is a fundamental concept in statistics that helps us understand how the averages of many random samples from a population tend to behave.

In simple terms, the Central Limit Theorem says that if we take many random samples from a population (with finite variance) and calculate the average of each sample, those sample averages will approximately follow a bell-shaped curve, known as a normal distribution. This holds no matter what the original population looks like, and the approximation improves as the sample size grows (a common rule of thumb is a sample size of at least 30).

#### Example:

- Take a population, like the heights of all people in a country.
- Draw many random groups of people from that population, e.g. groups of 50 people each time.
- Find the average height of each group.
- Plot all those averages on a graph.
- What you'll see is that the graph of those averages will form a bell-shaped curve.




In [None]:
import numpy as np
import matplotlib.pyplot as plt

import seaborn as sns
import pandas as pd

from scipy.stats import poisson

In [None]:
# Population parameters
population_mean = 200
population_stddev = 15
population_size = 10000

# Number of samples to draw
num_samples = 1000

# Sample size for each sample
sample_size = 30

In [None]:
# Function to generate the population data
def generate_population():
    return np.random.normal(population_mean, population_stddev, population_size)

In [None]:
# Function to take random samples from the population and calculate sample means
def sample_means():
    population_data = generate_population()
    # np.random.choice samples with replacement by default
    samples = [np.random.choice(population_data, size=sample_size) for _ in range(num_samples)]
    means = [sample.mean() for sample in samples]
    return means

In [None]:
# Generate sample means
sample_means_data = sample_means()

In [None]:
# Plot the distribution of sample means
plt.hist(sample_means_data, bins=50, density=True, alpha=0.7)
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.title("Distribution of Sample Means (Central Limit Theorem)")
plt.grid(True)
plt.show()
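The population above is itself normal, so its sample means would be normal even without the CLT. A sketch with a clearly non-normal population shows the theorem doing real work. We assume an exponential population with mean 10 (strongly right-skewed) and measure shape with a simple skewness statistic (the mean cubed z-score, 0 for a symmetric distribution); the helper `skewness` and all sizes are illustrative choices.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(seed=2)

# Assumed population: exponential with mean 10, which is strongly right-skewed
skewed_population = rng.exponential(scale=10.0, size=100_000)

# Draw 2,000 samples of 30 points each (with replacement) and take each sample's mean
n_per_sample = 30
draws = rng.choice(skewed_population, size=(2_000, n_per_sample), replace=True)
skewed_sample_means = draws.mean(axis=1)

skew_pop = skewness(skewed_population)
skew_means = skewness(skewed_sample_means)
print(f"skewness of population:   {skew_pop:.2f}")    # an exponential has skewness 2
print(f"skewness of sample means: {skew_means:.2f}")  # much closer to 0 (more bell-shaped)

# The CLT also predicts the spread of the sample means: sigma / sqrt(n)
predicted_se = skewed_population.std() / np.sqrt(n_per_sample)
print(f"std of sample means: {skewed_sample_means.std():.2f} (predicted {predicted_se:.2f})")
```

Plotting `skewed_sample_means` with the same `plt.hist(..., bins=50, density=True)` call used above shows the bell shape, even though a histogram of `skewed_population` does not.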

### Why are distribution functions important?

#### 1. Characterizing Uncertainty
   - By specifying the distribution of a variable, we gain insight into the range of possible values it can take and the likelihood of each value occurring.
#### 2. Statistical Inference
   - Distribution functions are the basis of statistical inference, which involves drawing conclusions about a population based on samples.
#### 3. Modeling and Simulations
   - In many scientific, engineering and business applications, it’s necessary to model and simulate complex systems that involve uncertain variables. Distribution functions allow practitioners to model randomness accurately, enabling realistic simulations and predictions.
#### 4. Risk Assessment and Management
   - Distribution functions are widely used in risk assessment and management to quantify the potential impact of uncertain events. By analyzing the distribution of possible outcomes, decision-makers can make informed choices to minimize risks and optimize strategies.
#### 5. Statistical Testing and Hypothesis Testing
   - Distribution functions are central to hypothesis testing, where researchers compare observed data to expected distributions.
#### 6. Machine Learning and Data Analysis
   - Distribution functions are used in various machine learning and data analysis techniques. They help in understanding the underlying data distribution, selecting appropriate models and evaluating model performance.
#### 7. Predictive Analytics
   - Distribution functions enable predictive analytics by providing insights into future outcomes based on historical data. This is valuable for forecasting demand, predicting customer behavior and making strategic business decisions.
    




### Types of functions

* **PMF (Probability Mass Function)**: a mathematical formula that gives the *probability of drawing a specific value from a discrete distribution.*

* **PDF (Probability Density Function)**: a mathematical formula that gives the *probability density of different values across a continuous space.* A density is not itself a probability; probabilities are obtained by integrating the density over an interval.

* **CDF (Cumulative Distribution Function)**: a mathematical formula that gives the *probability of drawing a sample less than or equal to a certain value.*
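A short sketch of the three function types using `scipy.stats` (the same module this notebook imports `poisson` from). The specific distributions and arguments (10 fair coin tosses, a standard normal, the cut-off 1.96) are illustrative choices.

```python
from scipy.stats import binom, norm

# PMF: probability of exactly 3 heads in 10 fair coin tosses (discrete)
pmf_3_heads = binom.pmf(3, n=10, p=0.5)          # 120/1024 ≈ 0.117

# PDF: density of a standard normal at x = 0 (a density, not a probability)
density_at_0 = norm.pdf(0, loc=0, scale=1)

# CDF: probability that a standard normal draw is <= 1.96
cdf_196 = norm.cdf(1.96, loc=0, scale=1)

# A CDF also exists for discrete variables: P(at most 3 heads in 10 tosses)
cdf_at_most_3 = binom.cdf(3, n=10, p=0.5)

print(pmf_3_heads, density_at_0, cdf_196, cdf_at_most_3)
```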

### What is a Probability Distribution and why is it helpful?

- A probability distribution lists all possible outcomes of a random variable together with their probabilities (for a continuous random variable, it is described by a density function, and probabilities are assigned to ranges of values)

- Probability distributions help us model random processes or situations happening in real life

- Many probability distribution models exist for different situations, and the key point is to select the one that fits your data

- Modeling a random variable with a specific probability distribution lets us easily calculate the probability of certain events (using closed-form equations or simulations)

## Common distributions

### Example 1: Normal Distribution

- Normal distribution, also known as Gaussian distribution, is a probability distribution that is characterized by a bell-shaped curve. In a normal distribution, the data is symmetrically distributed around the mean, with the majority of values concentrated near the center and fewer values further away from the mean. The curve is smooth and continuous, and it extends to infinity in both positive and negative directions.



- The normal distribution is fully defined by two parameters: the mean (μ) and the standard deviation (σ). The mean represents the center of the distribution, and the standard deviation controls the spread or variability of the data points around the mean. The probability of an observation falling within a certain range of values can be calculated using the properties of the normal distribution.

![normal_dist.png](attachment:normal_dist.png)
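The probability that a normal observation falls in a range [a, b] is CDF(b) - CDF(a). A sketch using `scipy.stats.norm` reproduces the well-known 68-95-99.7 rule for a standard normal (μ = 0, σ = 1):

```python
from scipy.stats import norm

mu, sigma = 0, 1  # standard normal

# P(mu - k*sigma <= X <= mu + k*sigma) = CDF(mu + k*sigma) - CDF(mu - k*sigma)
probs = {}
for k in (1, 2, 3):
    probs[k] = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(f"P(within {k} sigma of the mean) = {probs[k]:.4f}")  # ≈ 68%, 95%, 99.7%
```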

In [None]:
# draw 10,000 samples from a standard normal distribution (mean 0, std 1)
mu = 0
sigma = 1

normal_samples = np.random.normal(mu, sigma, size=10_000)

In [None]:
# let's plot the normalized histogram

sns.displot(normal_samples, kde=True, stat="probability");

### Example 2: Bernoulli Distribution

The Bernoulli distribution is the simplest and most basic probability distribution, describing a single Bernoulli trial, an experiment with only two possible outcomes: success (usually denoted as "1") with probability "p" and failure (usually denoted as "0") with probability "q" (where "q = 1 - p").

![bernoulli_dist.png](attachment:bernoulli_dist.png)

In [None]:
# we can use the numpy random module to sample from many distributions

np.random.seed(42)

random_numbers = np.random.random(size=10)
heads = random_numbers > 0.5
print(f"Outcome of 10 fair coin flips: {heads}")


In [None]:
print(f"Number of heads = {np.sum(heads)}")

#### We can now use that model to calculate the probability that we get 5 heads when we toss a coin 5 times

In [None]:
n_all_heads = 0  # initialize number of 5-heads trials

for i in range(100_000):
    heads = np.random.random(size=5) > 0.5
    n_heads = np.sum(heads)
    if n_heads == 5:
        n_all_heads += 1
                
prob = n_all_heads/100_000  # calculate the estimated probability of 5-heads in 5 tosses

print(f"Probability of getting all heads in 5 fair coin tosses = {prob}")

In [None]:
# calculating that probability using the pmf equation

np.power(0.5, 5)

### Example 3: Binomial Distribution

The binomial distribution is a probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials. A Bernoulli trial is an experiment with two possible outcomes: success (usually denoted as "1") and failure (usually denoted as "0"). Each trial is assumed to have the same probability of success, denoted by "p."

![binomial_dist.png](attachment:binomial_dist.png)

#### The binomial distribution is commonly used in various fields, such as:

- **Quality control and manufacturing processes:** to determine the probability of defects or errors in a production run.
- **Biostatistics:** to model the probability of disease occurrence or treatment success.
- **Finance:** to calculate the probability of successful trades or investments.
- **Polling and surveys:** to estimate the proportion of a population with a particular characteristic based on a sample.
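The binomial PMF has a closed form: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, i.e. the number of ways to choose which k of the n trials succeed, times the probability of any one such sequence. A sketch that implements the formula directly (the helper `binomial_pmf` is our own, hypothetical name) and checks it against `scipy.stats.binom.pmf`:

```python
from math import comb

from scipy.stats import binom

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): ways to choose the k successes,
    times the probability of any one such sequence."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # 10 fair coin tosses, as in the sampling example below
hand_pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
scipy_pmf = [binom.pmf(k, n, p) for k in range(n + 1)]

for k in range(n + 1):
    print(f"P(X = {k:2d}) = {hand_pmf[k]:.4f}   scipy: {scipy_pmf[k]:.4f}")
```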

#### We can recreate that PMF plot using sampling and a barplot

In [None]:
# sampling from a binomial distribution of 10 fair coin tosses
np.random.seed(42)

n_tails = np.random.binomial(10, 0.5, size=1_000_000)

# wrap the samples in a pandas Series so we can count how often each outcome occurs
n_tails = pd.Series(n_tails)

In [None]:
n_tails.value_counts()

In [None]:
probabilities = n_tails.value_counts(normalize=True)

probabilities

In [None]:
sns.barplot(x = probabilities.index, y = probabilities.values, color="r")
plt.xlabel("number of tails in a 10 fair coin toss")
plt.ylim([0.0, 1.0])

plt.show()

In summary, the main difference between the Bernoulli and Binomial distributions is that the Bernoulli distribution models a single trial with two possible outcomes, while the Binomial distribution models the number of successes in a fixed number of independent trials, each with two possible outcomes. The binomial distribution is an extension of the Bernoulli distribution to multiple trials.
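The "extension to multiple trials" can be checked by simulation: summing n independent Bernoulli(p) trials should give the same distribution as drawing directly from Binomial(n, p), with mean np and variance np(1-p). A sketch, assuming n = 10 fair coin tosses and 100,000 repetitions:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_trials, p = 10, 0.5

# 100,000 experiments, each made of 10 Bernoulli(p) trials (1 = success, 0 = failure)
bernoulli_trials = (rng.random(size=(100_000, n_trials)) < p).astype(int)
successes_from_bernoulli = bernoulli_trials.sum(axis=1)

# The same experiment drawn directly from a Binomial(n, p)
successes_from_binomial = rng.binomial(n_trials, p, size=100_000)

# Both should have mean close to n*p = 5 and variance close to n*p*(1-p) = 2.5
print(successes_from_bernoulli.mean(), successes_from_binomial.mean())
print(successes_from_bernoulli.var(), successes_from_binomial.var())
```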


### Example 4: Poisson Distribution

The Poisson distribution describes the number of events occurring in a fixed time interval, assuming the events happen independently of each other at a constant average rate. For example, think of the number of customers entering a store every 15 minutes. We treat the 15 minutes as a fixed value (the unit time), so it does not need to appear in the rest of the calculations.

In this scenario, there is an average number of customers entering per unit time, called the rate. The rate is denoted λ (lambda), and it is the only parameter of the Poisson distribution.
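With rate λ, the Poisson PMF is $P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$. A sketch assuming λ = 3 customers per 15-minute interval, comparing the hand-written formula (the helper `poisson_pmf` is our own, hypothetical name) with `scipy.stats.poisson`, which this notebook already imports:

```python
from math import exp, factorial

from scipy.stats import poisson

lam = 3.0  # assumed: on average 3 customers per 15-minute interval

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k! for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

for k in range(8):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}   scipy: {poisson.pmf(k, lam):.4f}")

# Probability of more than 5 customers in one interval: 1 - P(X <= 5)
p_more_than_5 = poisson.sf(5, lam)  # sf = survival function = 1 - cdf
print(f"P(X > 5) = {p_more_than_5:.4f}")
```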

In [None]:
# Parameters for the Poisson distribution
lambda_param = 3.0  # The average number of events in a fixed interval

# Generate random numbers from Poisson distribution
num_samples = 1000
random_poisson = np.random.poisson(lam=lambda_param, size=num_samples)

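# Plot a histogram of the generated random numbers, with one unit-width bin per integer
# so that the bar heights are relative frequencies (estimates of the PMF)
bins = np.arange(random_poisson.max() + 2) - 0.5
plt.hist(random_poisson, bins=bins, density=True, alpha=0.6, color='b', edgecolor='black')
plt.xlabel('Number of events per interval')
plt.ylabel('Probability Mass')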
plt.title('Poisson Distribution')
sns.despine()
plt.show()

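### Example 5: Exponential Distribution
The exponential distribution is related to the Poisson distribution. Whereas the Poisson distribution describes the number of events per unit time, the exponential distribution describes the waiting time between consecutive events. If events occur at an average rate λ per unit time, the waiting times follow an exponential distribution with mean 1/λ, which is why the code below passes `scale=1/lambda_param` to NumPy.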

In [None]:
# Parameters for the exponential distribution
lambda_param = 0.5 

# Generate random numbers from exponential distribution
num_samples = 1000
random_exponential = np.random.exponential(scale=1/lambda_param, size=num_samples)

# Plot a histogram of the generated random numbers
plt.hist(random_exponential, bins=30, density=True, alpha=0.6, color='g', edgecolor='black')
plt.xlabel('Waiting time between events')
plt.ylabel('Probability Density')
plt.title('Exponential Distribution')
sns.despine()
plt.show()
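Two properties make the exponential distribution easy to work with: the mean waiting time is 1/λ, and the probability of waiting longer than t is $P(T > t) = e^{-\lambda t}$. A sketch checking both by simulation, assuming λ = 0.5 events per unit time (as in the cell above) and t = 3:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
rate = 0.5  # assumed: on average 0.5 events per unit time

# NumPy parameterizes the exponential by its scale = 1 / rate
waits = rng.exponential(scale=1 / rate, size=200_000)

# The mean waiting time should be close to 1 / rate = 2 time units
print(f"mean waiting time: {waits.mean():.3f}")

# P(T > t) = exp(-rate * t); compare the empirical fraction with the formula at t = 3
t = 3.0
print(f"empirical P(T > 3): {np.mean(waits > t):.4f}")
print(f"formula   P(T > 3): {np.exp(-rate * t):.4f}")
```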

--------------------------------

### Other Concepts of Inferential Statistics

The Law of Large Numbers (LLN) is a fundamental principle in probability and statistics that describes the behavior of sample averages as the sample size increases. In simple terms, it states that as you take more and more observations from a population, the average of those observations tends to get closer and closer to the population's true mean.

The LLN complements the Central Limit Theorem: the LLN says that the sample mean converges to the true mean, while the CLT describes the shape of the distribution of sample means around it (approximately normal) for any population with finite variance, once the sample size is sufficiently large.
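A sketch of the LLN with a fair die, whose true mean is (1 + 2 + ... + 6) / 6 = 3.5: the running average of the rolls drifts toward 3.5 as the number of rolls grows. The number of rolls and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Fair die: the population (true) mean is 3.5
rolls = rng.integers(1, 7, size=100_000)

# Running average after 1, 2, ..., N rolls
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7,} rolls: mean = {running_mean[n - 1]:.4f}")
```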