# Tasks
This is my proposed solution to the given assessment. The objective is to research the two implementations of the standard deviation function in the Microsoft Excel function library, **STDEV.P** and **STDEV.S**, and highlight the differences between them. The second part uses **numpy** to perform a simulation demonstrating that the **STDEV.S** calculation gives a better estimate of the population standard deviation when it is computed from a sample.

## Research
Consulting the Microsoft Excel documentation [1,2], I investigated why two different formulas are needed for calculating the standard deviation and in what context each one is used.

#### STDEV.S

> Estimates standard deviation based on a sample (ignores logical values and text in the sample).

#### STDEV.P

> Calculates standard deviation based on the entire population given as arguments (ignores logical values and text).

On further investigation [3], I established that *STDEV.P* should be used when the data covers an entire population. For example: a class of 20 students where data has been collected for all 20.

By extension, *STDEV.S* should be used when the data is only a sample of a larger population. For example: a class of 20 students where data was obtained for only 12 of them.

Therefore, *STDEV.S* applies a "correction" (dividing by n - 1 rather than n, known as Bessel's correction) when the data collected is only a sample of a larger population [4].
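
Written out, the two formulas differ only in the divisor. The notation below is conventional rather than quoted from the Excel pages: $x_i$ are the data values, $N$ is the population size with mean $\mu$, and $n$ is the sample size with sample mean $\bar{x}$.

$$
\text{STDEV.P:}\quad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
\text{STDEV.S:}\quad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

For the same set of values, the STDEV.S result is therefore always the larger of the two, by a factor of $\sqrt{n/(n-1)}$.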

[1] STDEV.S function - https://support.microsoft.com/en-us/office/stdev-s-function-7d69cf97-0c1f-4acf-be27-f3e83904cc23

[2] STDEV.P function - https://support.microsoft.com/en-us/office/stdev-p-function-6e917c05-31a0-496f-ade7-4f4e7462f285

[3] Standard Deviation - https://en.wikipedia.org/wiki/Standard_deviation

[4] Standard Deviation and Variance - https://www.mathsisfun.com/data/standard-deviation.html


## Application of numpy

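*numpy.std*(a, axis = None, dtype = None, ddof = 0) (abbreviated signature) : Computes the standard deviation of the given data (array elements) along the specified axis [5]. The divisor is N - ddof, so with the default ddof = 0 the result is the population standard deviation.

**Standard Deviation** (SD) measures how widely the values in a data set are spread around their mean [6].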

[5] NumPy Docs - https://numpy.org/doc/stable/reference/generated/numpy.std.html#:~:text=The%20standard%20deviation%20is%20the,N%20%3D%20len(x)%20.

[6] numpy.std() in Python - https://www.geeksforgeeks.org/numpy-std-in-python/
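
As a brief illustration of how the two Excel functions map onto `numpy.std`, the sketch below uses the `ddof` parameter described in the NumPy documentation [5]. Because the divisor is `N - ddof`, `ddof=0` matches STDEV.P and `ddof=1` matches STDEV.S. The variable names and the data are illustrative only.

```python
import numpy as np

# Illustrative data only (not taken from any real data set)
data = np.array([20, 2, 7, 1, 34])

# ddof=0 (the default): divide by N, the population formula (Excel STDEV.P)
population_sd = np.std(data, ddof=0)

# ddof=1: divide by N - 1, the sample formula (Excel STDEV.S)
sample_sd = np.std(data, ddof=1)

print("population formula (ddof=0):", population_sd)
print("sample formula (ddof=1):", sample_sd)

# The two results differ only by the factor sqrt(N / (N - 1))
n = len(data)
print(np.isclose(sample_sd, population_sd * np.sqrt(n / (n - 1))))
```

Which value is appropriate depends on whether `data` is the whole population or only a sample drawn from it.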


In [1]:
# Python program illustrating STDEV.P
# numpy.std() with its default ddof=0 divides by N
import numpy as np

# 1D array holding the whole population
pop = [20, 2, 7, 1, 34]

print("pop : ", pop)
print("std of pop : ", np.std(pop))

# float32 carries fewer significant digits than the default float64
print("\nLower precision with float32")
print("std of pop : ", np.std(pop, dtype=np.float32))

print("\nDefault precision with float64")
print("std of pop : ", np.std(pop, dtype=np.float64))

pop :  [20, 2, 7, 1, 34]
std of pop :  12.576167937809991

Lower precision with float32
std of pop :  12.576168

Default precision with float64
std of pop :  12.576167937809991


In [2]:
# Python program illustrating STDEV.S
# numpy.std() with ddof=1 divides by N - 1 instead of N
# Keep only the first 3 of the 5 values, so this is a sample of the population
# Results are rounded to 4 decimal places for display
# 1D array
sample = [20, 2, 7]

print("sample : ", sample)
print("std of sample : ", round(float(np.std(sample, ddof=1)), 4))

print("\nLower precision with float32")
print("std of sample : ", round(float(np.std(sample, ddof=1, dtype=np.float32)), 4))

print("\nDefault precision with float64")
print("std of sample : ", round(float(np.std(sample, ddof=1, dtype=np.float64)), 4))

sample :  [20, 2, 7]
std of sample :  9.2916

Lower precision with float32
std of sample :  9.2916

Default precision with float64
std of sample :  9.2916


In [3]:
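# Manual implementation of STDEV.P
# define the population as a numpy array
arr = np.array([20, 2, 7, 1, 34])

# population standard deviation: divide the sum of squared deviations by N
np.sqrt(np.sum((arr - np.mean(arr))**2) / len(arr))

12.576167937809991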

In [4]:
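# Manual implementation of STDEV.S
# Keep only the first 3 of the 5 values, so this is a sample of the population
# 1D array
arr = np.array([20, 2, 7])

# sample standard deviation: divide by N - 1
# the parentheses matter: "/ len(arr) - 1" would divide by N and then subtract 1
sample_std = np.sqrt(np.sum((arr - np.mean(arr))**2) / (len(arr) - 1))
round(float(sample_std), 4)

9.2916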
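
The simulation named in the task can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the synthetic, normally distributed population, the sample size of 5, the 10,000 repeated samples and the seed are all choices made here for illustration, and none of them come from the assessment.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Assumed synthetic population: 100,000 values, mean 50, spread 10
population = rng.normal(loc=50, scale=10, size=100_000)
true_sd = np.std(population)  # whole population, so divide by N (STDEV.P)

# Draw many small samples from the population (with replacement)
n_trials, sample_size = 10_000, 5
samples = rng.choice(population, size=(n_trials, sample_size))

# Estimate the standard deviation of each sample both ways
est_p = np.std(samples, axis=1, ddof=0)  # STDEV.P formula applied to a sample
est_s = np.std(samples, axis=1, ddof=1)  # STDEV.S formula applied to a sample

mean_p = est_p.mean()
mean_s = est_s.mean()

print("population SD:           ", true_sd)
print("mean STDEV.P-style guess:", mean_p)
print("mean STDEV.S-style guess:", mean_s)
```

On average the `ddof=1` estimate should land closer to the population value than the `ddof=0` estimate. It still tends to sit slightly below it, because the n - 1 correction removes the bias in the variance rather than in its square root.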