# Population and Estimated Parameters

When working with data, we distinguish between two broad kinds: **population data** and **sample data**.

The **population** is the entire group that you want to draw conclusions about.

The **sample** is the specific group that you will collect data from. A sample is a subset of the population, so it is smaller than the population, usually far smaller.

In this notebook, we discuss these two types of data and the quantities that describe them.

## Population Parameters

Population parameters are numerical characteristics of a statistical population. For example, in a population of students at a university, the population mean age is a parameter. Population parameters are typically represented by Greek letters.

The two most commonly used population parameters are:
* Population Mean (μ)
* Population Standard Deviation (σ)
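As a minimal sketch of how these two parameters are computed, the example below uses a small, hypothetical population (the ages of all six members of a club, invented for illustration). Because every member is measured, the squared deviations are divided by N, the full population size. The function names are illustrative assumptions, not part of any library.

```python
import math

# Hypothetical population: the ages of ALL six members of a small club.
population_ages = [18, 19, 20, 21, 22, 26]

def population_mean(values):
    """Return mu, the arithmetic mean over every member of the population."""
    return sum(values) / len(values)

def population_std(values):
    """Return sigma, dividing the summed squared deviations by N (not N - 1)."""
    mu = population_mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

mu = population_mean(population_ages)
sigma = population_std(population_ages)
print("mu =", mu)
print("sigma =", round(sigma, 4))
```

NumPy's `np.mean` and `np.std` (with the default `ddof=0`) compute the same quantities.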

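## Sample Statistics

The numerical characteristics of a sample are usually called sample statistics rather than parameters, because they vary from one sample to the next, whereas a population parameter is a fixed value. Sample statistics are used to estimate the population parameters. For example, a sample mean is used to estimate a population mean. Sample statistics are typically represented by Latin letters.

The two most commonly used sample statistics are: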
* Sample Mean (x̅)
* Sample Standard Deviation (s)
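The sketch below computes both quantities for a hypothetical sample of five ages (values invented for illustration). The one detail that differs from the population case is the divisor: the sample standard deviation divides by n - 1 rather than n, which NumPy exposes through `ddof=1`. The variable names `x_bar` and `s` simply mirror the symbols above.

```python
import numpy as np

# Hypothetical sample: ages of five students drawn from a larger population.
sample_ages = np.array([19.0, 21.0, 22.0, 24.0, 29.0])
n = sample_ages.size

# Sample mean: sum of the observations divided by n.
x_bar = sample_ages.mean()

# Sample standard deviation: ddof=1 makes NumPy divide by n - 1.
s = sample_ages.std(ddof=1)

# The same value written out by hand, to make the n - 1 divisor explicit.
s_manual = np.sqrt(((sample_ages - x_bar) ** 2).sum() / (n - 1))

print("x_bar =", x_bar)
print("s =", round(float(s), 4))
```

Dividing by n - 1 makes the result slightly larger than the `ddof=0` value, which compensates for the deviations being measured from the sample mean instead of the unknown population mean.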

## Estimating Population Parameters

It is often impractical or impossible to collect data from every member of a population to compute the population parameters. Instead, we estimate the population parameters using sample data.

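The difference between a sample estimate and the population parameter it estimates is called the sampling error. As the sample size increases, the sampling error tends to shrink: for the sample mean, its typical size is the standard error σ/√n, so larger samples usually land closer to the population parameter, although any single sample can still miss by more than a smaller one did.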
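To see the effect of sample size, the following sketch repeatedly draws samples of several sizes from a synthetic population and averages the absolute sampling error of the mean. The population (normal, mean 20, standard deviation 10, 100,000 values), the seed, the sample sizes, and the helper name `mean_abs_sampling_error` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary fixed seed for repeatability

# Assumed synthetic population; small enough to fit comfortably in memory.
population = rng.normal(loc=20, scale=10, size=100_000)
mu = population.mean()

def mean_abs_sampling_error(n, repeats=500):
    """Average |x_bar - mu| over `repeats` independent samples of size n."""
    errors = [
        abs(rng.choice(population, size=n, replace=False).mean() - mu)
        for _ in range(repeats)
    ]
    return float(np.mean(errors))

errors_by_n = {n: mean_abs_sampling_error(n) for n in (5, 50, 500)}
for n, err in errors_by_n.items():
    print(f"n = {n:>3}: mean absolute sampling error ~ {err:.3f}")
```

Averaging over many repeats matters here: a single sample of 500 can occasionally be further off than a lucky sample of 5, but the typical error falls roughly in proportion to 1/√n.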

## Confidence in Estimates

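The confidence in an estimate is most directly quantified with a confidence interval, a range of plausible values for the parameter. A p-value answers a related question: how compatible the data are with one specific hypothesized value of the parameter. The more data we have, the narrower the confidence interval tends to be, and the more precisely the parameter is pinned down.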
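As a sketch of what a confidence interval for the mean involves, the helper below builds a two-sided t interval by hand: sample mean plus or minus a critical t value times the standard error s/√n. The function name `mean_confidence_interval`, the seed, and the simulated samples are assumptions for illustration; the interval also assumes the data are roughly normal or the sample is reasonably large.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(sample, confidence=0.95):
    """Two-sided t interval for the mean: x_bar +/- t_crit * s / sqrt(n)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    x_bar = sample.mean()
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    # Critical value leaving (1 - confidence) / 2 in each tail, n - 1 degrees of freedom.
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * standard_error
    return x_bar - margin, x_bar + margin

rng = np.random.default_rng(seed=1)  # arbitrary fixed seed
for n in (5, 50, 500):
    sample = rng.normal(loc=20, scale=10, size=n)  # assumed synthetic data
    low, high = mean_confidence_interval(sample)
    print(f"n = {n:>3}: 95% CI = ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

The interval narrows with larger n for two reasons: the standard error shrinks with √n, and the critical t value moves toward the normal value of about 1.96 as the degrees of freedom grow.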

By reporting each estimate together with a measure of its uncertainty, we make it possible to judge whether the result of a future experiment agrees with ours.

In the following sections, we will discuss how to estimate the population mean and standard deviation, and how to quantify our confidence in these estimates.


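```python
# Required libraries
import numpy as np
import scipy.stats as stats

# Population parameters: simulate a large population and compute its mean and standard deviation.
# Note: 240 million float64 values occupy about 1.9 GB of memory.
population = np.random.normal(loc=20, scale=10, size=240000000)
population_mean = np.mean(population)
population_std = np.std(population)  # default ddof=0: divide by N for a full population
print('Population Mean:', population_mean)
print('Population Standard Deviation:', population_std)

# Sample statistics: draw 5 values from the population without replacement.
sample = np.random.choice(population, size=5, replace=False)
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)  # ddof=1: divide by n - 1 for a sample
print('\nSample Mean:', sample_mean)
print('Sample Standard Deviation:', sample_std)

# Confidence in estimates: 95% t interval for the mean.
# The confidence level is passed positionally because its keyword was renamed
# from `alpha` to `confidence` in recent SciPy versions.
confidence_interval = stats.t.interval(0.95, df=len(sample)-1, loc=sample_mean, scale=stats.sem(sample))
print('\n95% Confidence Interval for the Mean:', confidence_interval)
```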

```
Population Mean: 20.000405885476287
Population Standard Deviation: 9.999807730818732

Sample Mean: 24.91416941378636
Sample Standard Deviation: 8.433982678671331

95% Confidence Interval for the Mean: (14.441996760206004, 35.38634206736672)
```


