## Population and Sample

| Type of analysis | Sampling with replacement | Sampling without replacement |
|-----------|-----------|-----------|
| Estimating a parameter (statistic) | ✅ Used in simulations and resampling methods (e.g., bootstrap). | ✅ Used in surveys and studies. |
| Computing the population parameter | ❌ Not used, because each member of the population is counted exactly once. | ✅ Only when the entire population is studied (e.g., a census). |
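To illustrate the table, the sketch below uses the same 10 values as the population defined next. Drawing all N values without replacement is just a census in shuffled order, so its mean is exactly the population mean. A size-N draw with replacement is a bootstrap resample: some values repeat and others are left out, so its mean is a statistic that changes from resample to resample. The `random_state` values are arbitrary choices for this sketch.

```python
import pandas as pd

# Same 10-value population used in this notebook
population = pd.Series([47, 48, 85, 20, 19, 13, 72, 16, 50, 60], name='Population')
N = len(population)

# Without replacement, n = N: every unit appears exactly once (a census)
census = population.sample(N, replace=False, random_state=0)
print(sorted(census) == sorted(population))   # True: same values, different order
print(census.mean())                          # 43.0, the population mean

# With replacement, n = N: a bootstrap resample, duplicates allowed
bootstrap = population.sample(N, replace=True, random_state=0)
print(bootstrap.index.duplicated().any())     # whether any unit was drawn twice
print(bootstrap.mean())                       # varies with the resample
```

With 10 draws from 10 values, the chance that a resample has no repeated unit is 10!/10^10, about 0.036%, so a bootstrap resample almost always contains duplicates.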

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Create a population DataFrame with 10 values

data = pd.DataFrame()
data['Population'] = [47, 48, 85, 20, 19, 13, 72, 16, 50, 60]

Sampling is random, so your results may differ from the output shown below.
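If you want draws that are identical from run to run, `Series.sample` accepts a `random_state` argument. A minimal sketch (the seed 42 is an arbitrary choice):

```python
import pandas as pd

data = pd.DataFrame({'Population': [47, 48, 85, 20, 19, 13, 72, 16, 50, 60]})

# The same random_state reproduces the same draw
first = data['Population'].sample(5, replace=True, random_state=42)
second = data['Population'].sample(5, replace=True, random_state=42)
print(first.equals(second))  # True

# Without random_state (as in the cells below), each call gives a new draw
print(data['Population'].sample(5, replace=True).tolist())
```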

In [3]:
# Draw sample with replacement, size=5 from Population

a_sample_with_replacement = data['Population'].sample(5, replace=True)  # replace=True: sampling with replacement
print(a_sample_with_replacement)

2    85
7    16
7    16
2    85
1    48
Name: Population, dtype: int64


In [4]:
# Draw sample without replacement, size=5 from Population

a_sample_without_replacement = data['Population'].sample(5, replace=False)  # replace=False: sampling without replacement
print(a_sample_without_replacement)

1    48
9    60
6    72
0    47
8    50
Name: Population, dtype: int64
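The two outputs above show the practical difference: the sample with replacement repeats index 2 and index 7, while the sample without replacement cannot. The sketch below checks this directly and also shows that a sample larger than the population is only possible with replacement (pandas raises a `ValueError` otherwise; the exact message depends on the pandas/NumPy version).

```python
import pandas as pd

data = pd.DataFrame({'Population': [47, 48, 85, 20, 19, 13, 72, 16, 50, 60]})

# Without replacement, each unit (index label) can be drawn at most once
without = data['Population'].sample(5, replace=False)
print(without.index.is_unique)   # True

# With replacement, n may exceed the population size N = 10
larger = data['Population'].sample(15, replace=True)
print(len(larger))               # 15

# Without replacement, n > N is impossible
try:
    data['Population'].sample(15, replace=False)
except ValueError as err:
    print('ValueError:', err)
```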


## Parameters and Statistics

**Population parameters**
$$
\begin{aligned}
\mu &= \frac{1}{N} \sum_{i=1}^{N} X_i \\
\sigma^2 &= \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2 \\
\sigma &= \sqrt{\sigma^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2}
\end{aligned}
$$
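As a check on the formulas, the sketch below evaluates μ, σ², and σ term by term with NumPy on the same 10 values and compares them with `np.var`/`np.std`, whose default `ddof=0` uses the same N denominator.

```python
import numpy as np

X = np.array([47, 48, 85, 20, 19, 13, 72, 16, 50, 60])
N = X.size

mu = X.sum() / N                       # population mean
sigma2 = ((X - mu) ** 2).sum() / N     # population variance: divide by N
sigma = np.sqrt(sigma2)                # population standard deviation

print(mu, sigma2, sigma)
# NumPy's default ddof=0 matches the N denominator
print(np.isclose(sigma2, np.var(X)), np.isclose(sigma, np.std(X)))
```

By hand: the values sum to 430, so μ = 43; the squared deviations sum to 5718, so σ² = 5718 / 10 = 571.8.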

In [5]:
# Population parameters: the "sample" here is the entire population, taken without replacement
population_mean = data['Population'].mean()
population_var = data['Population'].var(ddof=0)   # ddof=0 -> divide by N (population denominator)
population_std = data['Population'].std(ddof=0)   # ddof=0 -> population standard deviation

print('Population mean is', population_mean)
print('Population variance is', population_var)
print('Population standard deviation is', population_std)

Population mean is 43.0
Population variance is 571.8
Population standard deviation is 23.912339910598458


**Expected output:** Population mean is 43.0, population variance is 571.8.


Sampling is random, so your results may differ from the output shown below.

**Sample statistics**  
$$
\begin{aligned}
\bar{X} &= \frac{1}{n} \sum_{i=1}^{n} X_i \\
s^2 &= \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \\
s &= \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
$$
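A minimal sketch of the sample formulas, using the five values drawn without replacement earlier as a fixed sample so the result is repeatable. Note that pandas `Series.var`/`std` default to `ddof=1`, while NumPy's `np.var`/`np.std` default to `ddof=0`, so the denominator must be set explicitly when mixing the two.

```python
import numpy as np

# Fixed sample: the values drawn without replacement above
x = np.array([48, 60, 72, 47, 50])
n = x.size

x_bar = x.sum() / n                        # sample mean
s2 = ((x - x_bar) ** 2).sum() / (n - 1)    # sample variance with Bessel's correction (n - 1)
s = np.sqrt(s2)                            # sample standard deviation

print(x_bar, s2, s)
print(np.isclose(s2, np.var(x, ddof=1)))   # matches NumPy with ddof=1
# The n - 1 version equals the n version rescaled by n / (n - 1)
print(np.isclose(s2, np.var(x, ddof=0) * n / (n - 1)))
```

Here x̄ = 277 / 5 = 55.4 and the squared deviations sum to 451.2, so s² = 451.2 / 4 = 112.8 (dividing by n instead would give 90.24).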

In [6]:
# Calculate the sample mean, variance, and standard deviation for a sample of size 10
# The sample is random, so you will get a different mean and variance every time you run this cell

a_sample = data['Population'].sample(10, replace=True)
sample_mean = a_sample.mean()
sample_var = a_sample.var(ddof=1)   # ddof=1 -> divide by n-1 (sample denominator, Bessel's correction)
sample_std = a_sample.std(ddof=1)
print('Sample mean is', sample_mean)
print('Sample variance is', sample_var)
print('Sample standard deviation is', sample_std)

Sample mean is 28.2
Sample variance is 374.6222222222222
Sample standard deviation is 19.3551600929112


**Average of an unbiased estimator**

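An estimator is unbiased when its expected value equals the population parameter it estimates. In practice, if we repeat the experiment many times and average the estimates, the average approaches the true value of the parameter. The cells below check this for the variance: dividing by n tends to underestimate the population variance, while dividing by n-1 does not.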
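For independent draws, which is what sampling with replacement gives, a standard result quantifies the bias: dividing by n shrinks the expected variance by the factor (n-1)/n, while s² is unbiased. Plugging in the population variance σ² = 571.8 computed above and the sample size n = 50 used in the cells below gives the value the n-denominator average should hover around:

$$
\begin{aligned}
\mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right] &= \frac{n-1}{n}\,\sigma^2 \\
\mathbb{E}\left[s^2\right] &= \sigma^2 \\
\frac{n-1}{n}\,\sigma^2 &= \frac{49}{50} \times 571.8 = 560.364 \quad (n = 50)
\end{aligned}
$$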

In [7]:
sample_length = 500   # number of repeated samples
# Variance of each size-50 sample, dividing by n (ddof=0)
sample_variance_collection0 = [data['Population'].sample(50, replace=True).var(ddof=0)
                               for _ in range(sample_length)]

In [8]:
sample_length = 500   # number of repeated samples
# Variance of each size-50 sample, dividing by n-1 (ddof=1)
sample_variance_collection1 = [data['Population'].sample(50, replace=True).var(ddof=1)
                               for _ in range(sample_length)]

In [9]:
print('Population variance is', data['Population'].var(ddof=0))
print('Average of sample variance with n is', pd.Series(sample_variance_collection0).mean())
print('Average of sample variance with n-1 is', pd.Series(sample_variance_collection1).mean())

Population variance is 571.8
Average of sample variance with n is 562.0496648000001
Average of sample variance with n-1 is 568.7300236734694
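With n = 50 the expected gap between the two averages is only 2% (the factor 49/50), so with 500 repetitions it is easy to confuse bias with random noise. The sketch below repeats the experiment with a smaller sample size, n = 5 (an arbitrary choice), where the n-denominator average should sit near (4/5) × 571.8 = 457.44. It swaps `Series.sample` for NumPy's `Generator.choice` so that 200,000 repetitions run quickly; the seed and repetition count are assumptions of this sketch.

```python
import numpy as np

population = np.array([47, 48, 85, 20, 19, 13, 72, 16, 50, 60])
sigma2 = population.var()              # ddof=0: population variance (571.8)

rng = np.random.default_rng(0)         # seeded so the sketch is repeatable
n, repetitions = 5, 200_000

# Each row is one sample of size n drawn with replacement
samples = rng.choice(population, size=(repetitions, n), replace=True)

avg_biased = samples.var(axis=1, ddof=0).mean()     # divide by n
avg_unbiased = samples.var(axis=1, ddof=1).mean()   # divide by n - 1

print('Population variance:           ', sigma2)
print('Theory for the n version:      ', (n - 1) / n * sigma2)
print('Average with n denominator:    ', avg_biased)
print('Average with n - 1 denominator:', avg_unbiased)
```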
