# Population

_Reference_:
1. John A. Rice, _Mathematical Statistics and Data Analysis_, 3rd Edition, pp. 200-202.
2. Eric W. Weisstein, "Population." From _MathWorld_, A Wolfram Web Resource. http://mathworld.wolfram.com/Population.html

A **statistical population**, as defined in Wolfram MathWorld, is _a finite and actually existing group of objects which, although possibly large, can be enumerated in theory (e.g., people living in the United States)._ <br>

We will assume that the population is of finite size _**N**_ and that there is a numerical value of interest associated with each element of the population. Let these numerical values be denoted $x_{1}, x_{2}, x_{3}, \ldots, x_{N}$. <br><br> Let us go through some of the population parameters computed from these values.

### 1. Population Mean

**Population Mean** or **average** is represented by $\mu$ and is defined as <br><br>
$$ \mu = \frac{1}{N} \sum_{i=1}^N x_{i}$$
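As a minimal sketch, the formula translates directly to plain Python. The population below is a hypothetical set of N = 5 values chosen only for illustration; it is not taken from any dataset.

```python
# Hypothetical toy population of N = 5 numerical values (illustrative only)
population = [2.0, 4.0, 4.0, 5.0, 10.0]

N = len(population)
# mu = (1/N) * (x_1 + x_2 + ... + x_N)
mu = sum(population) / N

print(mu)  # 5.0
```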

### 2. Population Variance


**Population Variance**, represented by $\sigma^{2}$, is defined as <br><br>

$$ \sigma^{2} = \frac{1}{N} \sum_{i=1}^N \left( x_{i} - \mu \right)^{2}$$ <br><br>
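A minimal sketch of this definition in plain Python, again on a small hypothetical population (illustrative values only). The standard-library `statistics.pvariance` also divides by $N$, so it serves as a cross-check.

```python
import statistics

# Hypothetical toy population (illustrative values only)
population = [2.0, 4.0, 4.0, 5.0, 10.0]
N = len(population)
mu = sum(population) / N

# sigma^2 = (1/N) * sum of squared deviations from the population mean
sigma_sq = sum((x - mu) ** 2 for x in population) / N

# statistics.pvariance computes the population variance (divisor N)
assert abs(sigma_sq - statistics.pvariance(population)) < 1e-12
print(sigma_sq)  # 7.2
```

Here the deviations from $\mu = 5$ are $-3, -1, -1, 0, 5$, whose squares sum to $36$, and $36 / 5 = 7.2$.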

Expanding the square in the definition of $\sigma^{2}$ gives the equivalent form <br><br>

$$ \sigma^{2} = \frac{1}{N} \sum_{i=1}^N x_{i}^{2} - \mu^{2} $$
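The intermediate steps of this expansion are shown below; the only fact used besides algebra is the definition of the mean, $\frac{1}{N} \sum_{i=1}^N x_{i} = \mu$.

$$
\begin{aligned}
\sigma^{2} &= \frac{1}{N} \sum_{i=1}^N \left( x_{i}^{2} - 2\mu x_{i} + \mu^{2} \right) \\
&= \frac{1}{N} \sum_{i=1}^N x_{i}^{2} - 2\mu \cdot \frac{1}{N} \sum_{i=1}^N x_{i} + \frac{1}{N} \cdot N\mu^{2} \\
&= \frac{1}{N} \sum_{i=1}^N x_{i}^{2} - 2\mu^{2} + \mu^{2} \\
&= \frac{1}{N} \sum_{i=1}^N x_{i}^{2} - \mu^{2}
\end{aligned}
$$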

### 3. Population Standard Deviation

**Population Standard Deviation**, represented by $\sigma$, is the square root of the Population Variance: $\sigma = \sqrt{\sigma^{2}}$.
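A short NumPy sketch, assuming a hypothetical toy population, that checks the definition of $\sigma^{2}$ against its expanded form and takes the square root to get $\sigma$. NumPy's `np.std` uses `ddof=0` by default, i.e. it divides by $N$, so it matches the population version.

```python
import numpy as np

# Hypothetical toy population (illustrative values only)
x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
N = x.size
mu = x.sum() / N

# Definition: mean of squared deviations from mu
var_def = np.sum((x - mu) ** 2) / N
# Expanded form: mean of squares minus squared mean
var_short = np.sum(x ** 2) / N - mu ** 2

# Standard deviation is the square root of the variance
sigma = np.sqrt(var_def)

# Both variance forms agree up to floating-point rounding
assert np.isclose(var_def, var_short)
# np.std defaults to ddof=0 (divides by N), the population standard deviation
assert np.isclose(sigma, np.std(x))
print(var_def, sigma)
```

The expanded form is convenient for hand calculation, but in floating point it subtracts two nearly equal numbers when $\mu^{2}$ is large compared with $\sigma^{2}$, which can lose precision; computing from the deviations $x_{i} - \mu$ avoids that.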

---
Note: 
---
In real-world scenarios, we usually do not have data for the entire population, only a sample drawn from it.
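To make the distinction concrete, here is a minimal sketch that draws a simple random sample (without replacement) from a synthetic population. The population, its size of 1000, the sample size of 50 and the seed are all hypothetical choices for illustration. The sample mean is an estimate of the population mean and will generally not equal it exactly.

```python
import numpy as np

# Hypothetical synthetic population of N = 1000 values (illustrative only)
rng = np.random.default_rng(seed=0)
population = rng.normal(loc=0.0, scale=1.0, size=1000)

# Simple random sample of n = 50 elements, drawn without replacement
n = 50
sample = rng.choice(population, size=n, replace=False)

print("population mean:", population.mean())
print("sample mean:    ", sample.mean())
```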

### 3. Example

To make population mean and population variance concrete, let us work through the calculations on a real dataset, the **Istanbul Stock Exchange** dataset. It is available from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/ISTANBUL+STOCK+EXCHANGE, and a description of the data can be found at https://www2.1010data.com/documentationcenter/beta/Tutorials/MachineLearningExamples/IstanbulDataSet.html

#### Load necessary libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

#### Read the data

```python
# read the data into a DataFrame (local CSV copy of the dataset)
istanbul_df = pd.read_csv('data/istanbul_stock_exchange.csv', sep=',')

# attributes (columns) in the dataset along with their data types
print(istanbul_df.dtypes, '\n')
```

```
date        object
ISE1       float64
ISE2       float64
SP         float64
DAX        float64
FTSE       float64
NIKKEI     float64
BOVESPA    float64
EU         float64
EM         float64
dtype: object
```



```python
# let's look at the first few records
print(istanbul_df.head())
```

```
       date      ISE1      ISE2        SP       DAX      FTSE    NIKKEI  \
0  5-Jan-09  0.035754  0.038376 -0.004679  0.002193  0.003894  0.000000   
1  6-Jan-09  0.025426  0.031813  0.007787  0.008455  0.012866  0.004162   
2  7-Jan-09 -0.028862 -0.026353 -0.030469 -0.017833 -0.028735  0.017293   
3  8-Jan-09 -0.062208 -0.084716  0.003391 -0.011726 -0.000466 -0.040061   
4  9-Jan-09  0.009860  0.009658 -0.021533 -0.019873 -0.012710 -0.004474   

    BOVESPA        EU        EM  
0  0.031190  0.012698  0.028524  
1  0.018920  0.011341  0.008773  
2 -0.035899 -0.017073 -0.020015  
3  0.028283 -0.005561 -0.019424  
4 -0.009764 -0.010989 -0.007802  
```


#### Summary statistics of the dataset

```python
# dataset summary
istanbul_df.describe()
```

| | ISE1 | ISE2 | SP | DAX | FTSE | NIKKEI | BOVESPA | EU | EM |
|---|---|---|---|---|---|---|---|---|---|
| count | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 |
| mean | 0.001629 | 0.001552 | 0.000643 | 0.000721 | 0.00051 | 0.000308 | 0.000935 | 0.000471 | 0.000936 |
| std | 0.016264 | 0.021122 | 0.014093 | 0.014557 | 0.012656 | 0.01485 | 0.015751 | 0.01299 | 0.010501 |
| min | -0.062208 | -0.084716 | -0.054262 | -0.052331 | -0.054816 | -0.050448 | -0.053849 | -0.048817 | -0.038564 |
| 25% | -0.006669 | -0.009753 | -0.004675 | -0.006212 | -0.005808 | -0.007407 | -0.007215 | -0.005952 | -0.004911 |
| 50% | 0.002189 | 0.002643 | 0.000876 | 0.000887 | 0.000409 | 0.0 | 0.000279 | 0.000196 | 0.001077 |
| 75% | 0.010584 | 0.013809 | 0.006706 | 0.008224 | 0.007428 | 0.007882 | 0.008881 | 0.007792 | 0.006423 |
| max | 0.068952 | 0.100621 | 0.068366 | 0.058951 | 0.050323 | 0.061229 | 0.063792 | 0.067042 | 0.047805 |


For simplicity, we will not use the entire dataset, only the returns of the Financial Times Stock Exchange 100 Index, stored in the column **FTSE**.

```python
# retrieve returns of FTSE
ftse_df = istanbul_df[['date', 'FTSE']]

# let's look at some of the returns
print(ftse_df.head(), '\n')

# print summary of FTSE
print(ftse_df.describe())
```

```
       date      FTSE
0  5-Jan-09  0.003894
1  6-Jan-09  0.012866
2  7-Jan-09 -0.028735
3  8-Jan-09 -0.000466
4  9-Jan-09 -0.012710 

             FTSE
count  536.000000
mean     0.000510
std      0.012656
min     -0.054816
25%     -0.005808
50%      0.000409
75%      0.007428
max      0.050323
```


**Assumption** <br>

Throughout these notes, we will treat the returns of the Financial Times Stock Exchange 100 Index (FTSE) as our population. Although, as noted earlier, data for an entire population is usually not available in real-world scenarios, we make this assumption here so that the calculations that follow stay easy to understand.

Let **FTSE** represent our population of returns of the Financial Times Stock Exchange 100 Index, and let us calculate its population mean, variance, and standard deviation.

#### 3.1 Population Mean $\mu$

$$ \mu = \frac{1}{N} \sum_{i=1}^N x_{i}$$

```python
# population mean: mu = (1/N) * sum of x_i
N = ftse_df.shape[0]
summation_x_i = np.sum(ftse_df.FTSE)
population_mean = summation_x_i / N

print('Population Mean: ', population_mean)
```

```
Population Mean:  0.000510277348880597
```


#### 3.2 Population Variance $\sigma^{2}$

$$ \sigma^{2} = \frac{1}{N} \sum_{i=1}^N \left( x_{i} - \mu \right)^{2}$$ 

```python
# population variance: sigma^2 = (1/N) * sum of (x_i - mu)^2
summation_sqrd = np.sum(np.square(ftse_df.FTSE - population_mean))
population_var = summation_sqrd / N

print('Population variance: ', population_var)
```

```
Population variance:  0.00015986764021781228
```


#### 3.3 Population Standard Deviation $\sigma$

$$ \sigma = \sqrt{\sigma^{2}} $$

```python
# population standard deviation: sigma = sqrt(sigma^2)
population_std = np.sqrt(population_var)

print('Population Standard Deviation: ', population_std)
```

```
Population Standard Deviation:  0.01264387757840973
```


---
Analysis:
---

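The population mean computed above matches the `mean` in the FTSE summary (0.000510). The population standard deviation (0.012644) is slightly smaller than the summary's `std` (0.012656), because `describe()` reports the sample standard deviation, which divides by $N - 1$ instead of $N$. Rescaling confirms this: $0.012644 \times \sqrt{536/535} \approx 0.012656$.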
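The difference comes from pandas' `ddof` ("delta degrees of freedom") parameter: `Series.std()` defaults to `ddof=1` (divisor $N - 1$), while `ddof=0` gives the population version. A minimal sketch on a hypothetical toy series (illustrative values, not FTSE data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy series standing in for a column of returns (illustrative only)
s = pd.Series([2.0, 4.0, 4.0, 5.0, 10.0])
N = len(s)

pop_std = s.std(ddof=0)   # divisor N: population standard deviation
sample_std = s.std()      # pandas default ddof=1: divisor N - 1, as in describe()

# The two differ by the factor sqrt(N / (N - 1))
assert np.isclose(sample_std, pop_std * np.sqrt(N / (N - 1)))
# np.std defaults to ddof=0, matching the population version
assert np.isclose(pop_std, np.std(s.to_numpy()))
print(pop_std, sample_std)
```

Applied to the notebook data, `ftse_df.FTSE.std(ddof=0)` would be expected to reproduce `population_std`, and `ftse_df.FTSE.var(ddof=0)` to reproduce `population_var`.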