# Task [1] - Calculating the Square Root of 2

## Simple calculation using the "**" Operator accurate to 15 decimals

In [1]:
def sqrt2Simp(n):
    # Raising a number to the power 1/2 is the same as taking its square root.
    n = n**(1/2)
    # Print with up to 100 significant digits; this shows the exact value of the stored float.
    print("{:.100}".format(n))
    # Print with the default float formatting.
    print(n)
    
sqrt2Simp(2)

1.4142135623730951454746218587388284504413604736328125
1.4142135623730951
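
The result above is limited by the precision of a 64-bit float, which is about 15 to 17 significant digits. The digits printed after that point are the exact expansion of the stored binary value, not further digits of the square root of 2. As a minimal sketch of one way to get more correct digits, assuming Python's standard decimal module is acceptable for this task (the function name sqrt2_decimal and the default of 50 digits are illustrative choices, not part of the original notebook):

```python
from decimal import Decimal, localcontext

def sqrt2_decimal(digits=50):
    """Return the square root of 2 as a Decimal with `digits` significant digits."""
    # localcontext() keeps the precision change from leaking into other code.
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(2).sqrt()

root = sqrt2_decimal()
print(root)
# Squaring the result should give a value very close to 2.
print(root * root)
```

Comparing this output with the float output shows where the two start to differ.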


## Calculating using the Babylonian Method accurate to 10 decimals

In [2]:
def sqrt2B1(n):
    """
    A function to calculate the square root of a number n.
    """
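    # r is the current estimate of the root; x holds n / r (0 before the first pass).
    r = n
    x = 0
    # Tolerance: stop once r and n / r agree to within 1e-10.
    a = 0.0000000001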

    # Loop until we're happy with the accuracy.
    while abs(r - x) > a:
        r = (r + x) / 2
        x = n / r
    print("{:.100}".format(r))
    # Return the (approximate) square root of n.
    return r

sqrt2B1(2)

1.4142135623746898698271934335934929549694061279296875


1.4142135623746899
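
The Babylonian method repeatedly replaces an estimate r with the average of r and n / r. After the first step these two values lie on either side of the true root, so the gap between them bounds the error. The following is a minimal sketch of the same loop for positive n, with the tolerance exposed as a parameter and the iterations counted. The name babylonian_sqrt, the tol parameter and the iteration counter are illustrative additions, and math.sqrt is used only as a reference value.

```python
import math

def babylonian_sqrt(n, tol=1e-10):
    """Approximate the square root of a positive number n.

    Returns the estimate and the number of averaging steps taken.
    """
    r = float(n)
    iterations = 0
    # Keep averaging r and n / r until the two agree to within tol.
    while abs(r - n / r) > tol:
        r = (r + n / r) / 2
        iterations += 1
    return r, iterations

root, steps = babylonian_sqrt(2)
print(f"Estimate:   {root:.15f} after {steps} steps")
print(f"math.sqrt:  {math.sqrt(2):.15f}")
print(f"Difference: {abs(root - math.sqrt(2)):.2e}")
```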

# Task [2] - Chi-Squared Test

## Contingency Table
***

The table provided for this task is an example of a Contingency Table.[1]

***
- .[1] Contingency Tables, https://en.wikipedia.org/wiki/Contingency_table
***
###### Wikipedia example of a Contingency Table

<img style="float: left;" src="images/ContigencyTable.PNG">




In [3]:
"""
Display the contingency table as a DataFrame for ease of reading.
"""

# Data frames
import pandas as pd 

data = {'': ['White collar', 'Blue collar', 'No collar'],
        'A':[90, 30, 30],
        'B':[60, 50, 40], 
        'C':[104, 51, 45], 
        'D':[95, 20, 35]}

# Use the unnamed first column (the row labels) as the index
df = pd.DataFrame(data).set_index('')

#Total sum per column: 
df.loc['Total']= df.sum(axis=0)

#Total sum per row: 
df['Total'] = df.sum(axis=1)

df

| | A | B | C | D | Total |
|---|---|---|---|---|---|
| White collar | 90.0 | 60.0 | 104.0 | 95.0 | 349.0 |
| Blue collar | 30.0 | 50.0 | 51.0 | 20.0 | 151.0 |
| No collar | 30.0 | 40.0 | 45.0 | 35.0 | 150.0 |
| Total | 150.0 | 150.0 | 200.0 | 150.0 | 650.0 |


## Calculating the Chi-Square and P values
***
scipy.stats contains multiple functions to calculate the Chi-Square value.[1]

Because the table given is an example of a Contingency Table, I thought it appropriate to use the contingency function chi2_contingency(observed, correction=True, lambda_=None).[2] Only the observed argument is needed in this case.

This function returns the test statistic, the p-value, the degrees of freedom and the expected frequencies. Only the first two are necessary for this task.


***
- .[1] Documentation for scipy.stats, https://docs.scipy.org/doc/scipy/reference/stats.html  
- .[2] Function for Chi-square test of independence of variables in a contingency table, https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html#scipy.stats.chi2_contingency
***
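
To make clear what chi2_contingency computes for this table, the statistic can be reproduced by hand. Each expected frequency is (row total x column total) / grand total, the statistic is the sum of (observed - expected)^2 / expected over all cells, and the degrees of freedom are (rows - 1) x (columns - 1). The following is a sketch using only the standard library. The helper names are hypothetical, and the p-value helper relies on a closed form of the chi-squared upper-tail probability that holds only when the degrees of freedom are even, which is the case here (6). For general use scipy.stats remains the safer choice.

```python
import math

def chi_squared(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

def p_value_even_dof(stat, dof):
    """Upper-tail chi-squared probability; valid only for even, positive dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires even, positive degrees of freedom")
    half = stat / 2
    terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * terms

# Same counts as the table above, one row per column A-D.
obs_data = [[90, 30, 30],
            [60, 50, 40],
            [104, 51, 45],
            [95, 20, 35]]

stat, dof = chi_squared(obs_data)
p = p_value_even_dof(stat, dof)
print(f"Chi-squared: {stat:0.4f}, dof: {dof}, p-value: {p:0.6f}")
```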

In [4]:
#Statistics package
import scipy.stats as ss

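# Observed data: one row per column of the table above (A, B, C, D),
# with values for White collar, Blue collar and No collar.
# Transposing the table does not change the test result.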
obsData = [[90, 30, 30],
         [60, 50, 40],
         [104, 51, 45],
         [95, 20, 35]]

#chi2_contingency - Chi-square test of independence of variables in a contingency table
stat, p, dof, expected  = ss.chi2_contingency(obsData)

print(f"Approximate Chi-squared value: {stat:0.1f}")
print('Actual Chi-squared value:', stat)

print('\nP-Value:', p)
print(f"P_Value: {p:0.2f}")


Approximate Chi-squared value: 24.6
Actual Chi-squared value: 24.571202858582602

P-Value: 0.0004098425861096692
P_Value: 0.00


***
# Task [3] - Excel's functions for Standard Deviation

## Setting data sets of Population and Samples
***

### STDEV.P
***

Returns the standard deviation of a population, for use when the full data set is known. An example of this would be a data set from a census of the entire population of a country.

STDEV.P calculates the standard deviation of a data set using the "n" method, which divides the sum of squared deviations from the mean by the number of values "N" before taking the square root:

<img style="float: left;" src="images/StandardDeviation.gif" width="200" height="100"/>


### STDEV.S
***

Returns an estimate of the standard deviation of a population based on a sample, for use when only part of the data set is known. An example of this would be a survey of a subset of the people in a country, such as those in one region.

STDEV.S uses a similar formula to the one above, but instead of dividing by "N" it divides by "N-1". This is known as "Bessel's Correction".[1]

- .[1] Bessel's Correction, https://en.wikipedia.org/wiki/Bessel%27s_correction
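
As a small worked example of the two formulas, Python's standard statistics module provides pstdev (divides by N, like STDEV.P) and stdev (divides by N-1, like STDEV.S). The data set below is made up purely for illustration. Its mean is 5 and its squared deviations sum to 32, so the two results are sqrt(32/8) = 2 and sqrt(32/7), roughly 2.138.

```python
import math
import statistics

# Illustrative values only; not taken from the task.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(data)  # divides by N
sample_sd = statistics.stdev(data)       # divides by N - 1

print(f"STDEV.P equivalent: {population_sd:0.4f}")
print(f"STDEV.S equivalent: {sample_sd:0.4f}")

# The two differ only by the factor sqrt(N / (N - 1)).
n = len(data)
print(math.isclose(sample_sd, population_sd * math.sqrt(n / (n - 1))))
```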


In [70]:
'''
Using numpy to create a random data set for the population.
'''

# Efficient numerical arrays.
import numpy as np

# Setting the mean, standard deviation and size of population
m_a, s_a = 1.0, 0.4
N = 50

# Population data set
a = np.random.normal(m_a, s_a, N)

# Printing the population data set
print("Population A:")
print(a)

Population A:
[0.38842921 1.01034183 0.89190977 1.15935193 0.88460741 0.27655353
 0.53408832 1.05053786 0.86727588 0.85831187 0.98727665 0.83758418
 0.65576175 1.56249442 0.9072126  0.63322598 1.59382523 1.13234558
 0.73824033 1.03518097 1.27045912 0.78162246 0.9029945  0.8736088
 0.64913574 1.74346171 1.72052288 0.47077446 0.93687274 0.64952033
 0.67610036 1.240677   1.32013074 0.05043781 0.70226566 1.22024025
 0.38996632 1.46974039 1.38866417 1.15645885 1.21439177 0.89902013
 0.46468832 0.98940124 0.55036177 1.29123743 0.9916612  1.07341992
 1.03317712 1.37083124]


In [71]:
'''
Standard deviation of the population using the equivalent of Excel's STDEV.P.
'''
# np.std divides by N by default (ddof=0), which matches STDEV.P
sd = np.std(a)

print(f"Standard Deviation of Population A: {sd:0.4f}")

Standard Deviation of Population A: 0.3666


In [72]:
'''
Getting random sample set from population.
'''

# Setting size of sample
n1 = 30

# Sample data set from population
sample_1 = np.random.choice(a, n1, replace=False)

# Printing the sample data set
print("Sample 1:")
print(sample_1)

Sample 1:
[0.65576175 0.9029945  1.22024025 1.05053786 0.64913574 0.63322598
 1.46974039 1.01034183 0.64952033 1.56249442 1.59382523 1.07341992
 1.240677   0.67610036 0.89190977 0.38842921 1.15645885 1.32013074
 0.98940124 1.27045912 0.70226566 1.21439177 0.47077446 0.78162246
 1.37083124 0.38996632 1.03518097 1.29123743 0.98727665 0.89902013]


In [73]:
'''
Standard deviation of sample 1 using the equivalent of Excel's STDEV.P and STDEV.S.
'''

# Standard deviation using STDEV.P
sdp_samp_1 = np.sqrt(np.sum((sample_1 - np.mean(sample_1))**2)/len(sample_1))
# Standard deviation using STDEV.S
sds_samp_1 = np.sqrt(np.sum((sample_1 - np.mean(sample_1))**2)/(len(sample_1)-1))


print("Sample 1:")
print(f"Standard Deviation using STDEV.P: {sdp_samp_1:0.4f}")
print(f"Standard Deviation using STDEV.S: {sds_samp_1:0.4f}")

Sample 1:
Standard Deviation using STDEV.P: 0.3299
Standard Deviation using STDEV.S: 0.3355


In [75]:
'''
Getting second random sample set from population.
'''

# Setting size of sample
n2 = 20

# Sample data set from population
sample_2 = np.random.choice(a, n2, replace=False)

# Printing the sample data set
print("Sample 2:")
print(sample_2)

Sample 2:
[0.38842921 0.83758418 1.240677   1.29123743 0.89902013 1.46974039
 0.27655353 1.03518097 0.73824033 0.88460741 0.67610036 1.22024025
 0.64913574 1.15645885 1.21439177 0.98727665 0.64952033 0.9072126
 1.59382523 0.38996632]


In [77]:
'''
Standard deviation of sample 2 using the equivalent of Excel's STDEV.P and STDEV.S.
'''

# Standard deviation using STDEV.P
sdp_samp_2 = np.sqrt(np.sum((sample_2 - np.mean(sample_2))**2)/len(sample_2))
# Standard deviation using STDEV.S
sds_samp_2 = np.sqrt(np.sum((sample_2 - np.mean(sample_2))**2)/(len(sample_2)-1))

print("Sample 2:")
print(f"Standard Deviation using STDEV.P: {sdp_samp_2:0.4f}")
print(f"Standard Deviation using STDEV.S: {sds_samp_2:0.4f}")  

Sample 2:
Standard Deviation using STDEV.P: 0.3516
Standard Deviation using STDEV.S: 0.3607
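
The manual formulas in the cells above can also be written with the ddof ("delta degrees of freedom") argument of np.std, which divides by N - ddof. A minimal sketch, using a small made-up array rather than the random samples above so that it runs on its own:

```python
import numpy as np

# Illustrative values only; not one of the random samples above.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Manual versions, written the same way as in the cells above.
manual_p = np.sqrt(np.sum((values - np.mean(values))**2) / len(values))
manual_s = np.sqrt(np.sum((values - np.mean(values))**2) / (len(values) - 1))

# ddof=0 (the default) divides by N; ddof=1 divides by N - 1.
std_p = np.std(values, ddof=0)
std_s = np.std(values, ddof=1)

print(f"STDEV.P: manual {manual_p:0.4f}, np.std(ddof=0) {std_p:0.4f}")
print(f"STDEV.S: manual {manual_s:0.4f}, np.std(ddof=1) {std_s:0.4f}")
```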


### Conclusion
***

In this example, the Standard Deviation of the entire population using STDEV.P is "0.3666".

- First sample: STDEV.P = "0.3299", STDEV.S = "0.3355".
- Second sample: STDEV.P = "0.3516", STDEV.S = "0.3607".

In both of these situations, using STDEV.S gave us a standard deviation that was closer to the population's true standard deviation.
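
Two samples are not enough to show that this holds in general, so the following is a rough simulation sketch and not part of the original analysis. It assumes values drawn from a normal distribution with the same mean (1.0) and standard deviation (0.4) used above. The sample size of 5 and the 20,000 repetitions are hypothetical choices made only to make the effect easy to see. It averages both estimates over many samples and compares them with the true value.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

true_sd = 0.4
sample_size = 5
trials = 20000

total_p = 0.0
total_s = 0.0
for _ in range(trials):
    sample = [random.gauss(1.0, true_sd) for _ in range(sample_size)]
    mean = sum(sample) / sample_size
    squared_devs = sum((v - mean) ** 2 for v in sample)
    total_p += math.sqrt(squared_devs / sample_size)        # STDEV.P style
    total_s += math.sqrt(squared_devs / (sample_size - 1))  # STDEV.S style

mean_p = total_p / trials
mean_s = total_s / trials

print(f"True standard deviation: {true_sd:0.4f}")
print(f"Average STDEV.P result:  {mean_p:0.4f}")
print(f"Average STDEV.S result:  {mean_s:0.4f}")
```

Bessel's correction removes the bias from the variance estimate. The standard deviation from STDEV.S still tends to come out slightly low, only less so than STDEV.P.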

# End