# Task 1 - Calculating the square root of 2

[4]*The square root of 2 or root 2 is represented using the square root symbol √ and written as √2 whose value is 1.414. This value is widely used in mathematics. Root 2 is an irrational number as it cannot be expressed as a fraction and has an infinite number of decimals. So, the exact value of the root of 2 cannot be determined.*

## Newton's Method:
[2]
![Newton's Method](./images/newtons_method.png)

Newton's method for calculating square roots is an iterative root-finding algorithm that produces successively better approximations of the root. Starting from an initial approximation ${\alpha}_{0}$ of $\sqrt{N}$, each step computes ${\alpha}_{k+1} = \frac{1}{2}\left({\alpha}_{k} + \frac{N}{{\alpha}_{k}}\right)$. If the initial approximation is suitably chosen, the process converges quickly and accurate approximations to $\sqrt{N}$ are obtained after only a few iterations. However, if extended multiple-precision approximations to $\sqrt{N}$ are sought, the computation time increases rapidly because of the time required to divide ${N}$ by a many-digit number. Dutka [2] notes that floating-point division generally takes at least twice as long as floating-point multiplication in double-precision computations.

![Iterational Accuracy](./images/iterations.png)

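The figure above shows the approximation after each iteration: ${\alpha}_{0}=1$, ${\alpha}_{1}=1.5$, and so on for 5 iterations. The number of correct digits roughly doubles with each step. A standard Python float carries 53 bits of precision (about 15 to 17 significant decimal digits), so a result accurate to 100 decimal places needs integer or arbitrary-precision arithmetic rather than floats.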
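The convergence of the iteration can be checked exactly with rational arithmetic. The following is a minimal sketch, not part of the original notebook: the helper name `newton_iterates` is hypothetical, and `fractions.Fraction` is used only so that no rounding hides the behaviour of the iterates.

```python
from fractions import Fraction

def newton_iterates(N, steps, start=1):
    """Return [a_0, ..., a_steps] for the update a_{k+1} = (a_k + N / a_k) / 2."""
    a = Fraction(start)
    iterates = [a]
    for _ in range(steps):
        a = (a + N / a) / 2  # exact rational arithmetic, no rounding
        iterates.append(a)
    return iterates

iterates = newton_iterates(2, 5)
for k, a in enumerate(iterates):
    # Show each iterate as a fraction together with the size of a^2 - 2.
    print(k, a, float(abs(a * a - 2)))
```

The printed error shrinks roughly quadratically, which is why only a handful of iterations are needed once the arithmetic itself is precise enough.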

### References
[1] The square root of 2; Ian McLoughlin; https://web.microsoftstream.com/video/214c8379-7c67-45b5-910d-39ec5d269223<br/>
[2] The square root of 2 to 1 million decimals; Jacques Dutka; https://www.jstor.org/stable/2004359?seq=1&cid=pdf-reference<br/>
[3] Methods of Computing Square Roots; Wikipedia; https://en.wikipedia.org/wiki/Methods_of_computing_square_roots<br/>
[4] Square Root of 2; Byju's Classes; https://byjus.com/maths/square-root-of-2/

In [3]:
def sqrt2(N, digits=100):

    # Work in integers: a float holds only about 16 significant digits, so scale N by
    # 10^(2*digits) and find the integer square root, which carries `digits` decimals.
    n = N * 10**(2 * digits)

    # Initial estimation of the scaled root. N * 10^digits is never below it for N >= 1.
    a = N * 10**digits

    # Newton's step in integer arithmetic. The estimates decrease towards the root,
    # so stop as soon as a step no longer lowers the estimate.
    while True:
        b = (a + n // a) // 2
        if b >= a:
            break
        a = b

    return a

ans = sqrt2(2)

# Convert to a string and insert the decimal point after the leading digit.
s = str(ans)
ans = s[:1] + '.' + s[1:]

In [8]:
print("The Square Root of 2 to 100 decimals =\n %s" % ans)

The Square Root of 2 to 100 decimals =
 1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727


# Task 2 - Pearson's chi-squared test

Pearson's chi-squared test is a statistical hypothesis test used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table. The formula below measures the difference between the observed and expected counts in a table of values.

$\chi^2 = \mathbf{N} \sum \limits _{i=1} ^n \frac{(O _i / \mathbf{N} - p _i)^2}{p _i}$

Where:

$\chi^2$ = Pearson's cumulative test statistic.<br>
$O _i$ = the number of observations of type i.<br>
$\mathbf{N}$ = total number of observations.<br>
$E_i = \mathbf{N} p_i$ = the expected (theoretical) count of type i, asserted by the null hypothesis that the fraction of type i in the population is $p_i$<br>
$n$ = the number of cells in the table

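The chi-squared statistic can then be used to calculate a p-value by comparing the value of the statistic to a chi-squared distribution. The number of degrees of freedom is equal to the number of cells $n$ minus the reduction in degrees of freedom (a count of constraints, not to be confused with the p-value). For a contingency table with $r$ rows and $c$ columns, this gives $(r-1)(c-1)$ degrees of freedom.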

Applied to this table of values:

![Table](./images/table_of_values.png)

We obtain a chi-squared value of **24.5712028585826**<br>
And a $p$ value of **0.0004098425861096696**
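The statistic can also be reproduced by hand from the observed counts. The sketch below is an illustration rather than the notebook's own method: it assumes the table holds the same counts as the `obs` array used in the code cell of this task, computes each expected count as row total times column total divided by the grand total, and uses a closed-form tail probability that is valid only when the number of degrees of freedom is even.

```python
import math

# Observed counts, assumed to match the `obs` array in this task's code cell.
obs = [[90, 60, 104, 95], [30, 50, 51, 20], [30, 40, 45, 35]]

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]
total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell of the table.
chi2_stat = 0.0
for i, row in enumerate(obs):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2_stat += (observed - expected) ** 2 / expected

# Degrees of freedom for an r x c contingency table.
dof = (len(obs) - 1) * (len(obs[0]) - 1)

# Upper tail of the chi-squared distribution for EVEN dof only:
# P(X > x) = exp(-x/2) * sum over k < dof/2 of (x/2)^k / k!
half = chi2_stat / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

print(chi2_stat, dof, p_value)
```

For tables with an odd number of degrees of freedom the closed form does not apply, and a library routine such as `scipy.stats.chi2.sf` would be the natural choice.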

### References:
[1] Chi-squared test; Wikipedia;
https://en.wikipedia.org/wiki/Chi-squared_test<br>
[2] Pearson's Chi-squared test; Wikipedia;
https://en.wikipedia.org/wiki/Pearson's_chi-squared_test<br>
[3] Chi-Square Procedures for the Analysis of Categorical Frequency Data; Richard Lowry; https://web.archive.org/web/20171022032306/http://vassarstats.net:80/textbook/ch8pt1.html <br>

In [5]:
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the table of values.
obs = np.array([[90, 60, 104, 95], [30, 50, 51, 20], [30, 40, 45, 35]])

# lambda_="pearson" selects Pearson's chi-squared statistic.
chi2, p, dof, expected = chi2_contingency(obs, lambda_="pearson")

# Chi-squared statistic and p-value.
print(chi2, p)

24.5712028585826 0.0004098425861096696


# Task 3 - Research of STDEV.P vs STDEV.S

### Brief:
*The standard deviation of an array of numbers x is
calculated using numpy as **`np.sqrt(np.sum((x - np.mean(x))**2)/len(x))`** .
However, Microsoft Excel has **two different versions** of the standard deviation
calculation, **STDEV.P and STDEV.S** . The STDEV.P function performs the above
calculation but in the STDEV.S calculation **the division is by len(x)-1** rather
than **len(x)**. Research these Excel functions, writing a note in a Markdown cell
about the difference between them. Then use numpy to perform a simulation
demonstrating that the STDEV.S calculation is a better estimate for the standard
deviation of a population when performed on a sample. Note that part of this task
is to figure out the terminology in the previous sentence.*

## Differences between STDEV.P & STDEV.S

### STDEV.P:
STDEV.P is an Excel function used to calculate the standard deviation of an entire population. A population data set contains all members of a specified group, that is, the entire list of possible data values. The formula divides by the count **n**.

For example, the population may be "ALL people living in the US."

### STDEV.S:
STDEV.S is an Excel function used to estimate the standard deviation of a population from a sample of it. A sample data set contains a part, or a subset, of a population, so the size of a sample is always less than the size of the population from which it is taken. The formula divides by **n-1**.

Example: The sample may be "SOME people living in the US."

### Differences:
The only difference between the formulae is the divisor: the sample standard deviation divides by n-1 instead of n. Dividing by n-1 makes the sample variance an unbiased estimate of the population variance. Because the divisor is smaller, the sample standard deviation is always a **larger** number than the n-based value for the same data.
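Written out, the two calculations look as follows. The symbols $\mu$ (population mean), $\bar{x}$ (sample mean), $N$ (population size) and $n$ (sample size) are notation assumed here for illustration; they are not taken from Excel's documentation.

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} \qquad \text{(STDEV.P)}$$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad \text{(STDEV.S)}$$

The sample formula measures deviations from $\bar{x}$, which is itself computed from the sample and so sits closer to the sample values than $\mu$ does. The squared deviations therefore come out slightly too small on average, and the smaller divisor $n-1$ compensates for that.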

The example below applies both formulae to a sample taken from a small population.


In [10]:
import numpy as np
import random

# Create dataset of 100 random integers between 1 and 20.
x = []
n = 100
numLow = 1
numHigh = 20

for i in range (0, n):
    x.append(random.randint(numLow, numHigh))
x.sort()

# Calculating the population standard deviation (divide by n).
stdevp = np.sqrt(np.sum((x - np.mean(x))**2)/len(x))
print("Entire population with STDEV.P = %1.3f" % stdevp)

# Create a sample set of x, using 39 of the 100 sorted values.
y = x[1:10] + x[30:40] + x[60:70] + x[90:100]

# Calculating the un-biased sample standard deviation (divide by n-1).
stdevs_unbiased = np.sqrt(np.sum((y - np.mean(y))**2)/(len(y)-1))
print("Sample population with unbiased STDEV.S = %1.3f" % stdevs_unbiased)

# Calculating the biased sample standard deviation (divide by n).
stdevs_biased = np.sqrt(np.sum((y - np.mean(y))**2)/len(y))
print("Sample population with biased STDEV.S = %1.3f" % stdevs_biased)

Entire population with STDEV.P = 6.022
Sample population with unbiased STDEV.S = 6.828
Sample population with biased STDEV.S = 6.740


### Results:

Dividing by n-1 gives a slightly larger value (6.828) than dividing by n (6.740), as expected. In this run both values overestimate the population standard deviation (6.022), because the sample is built from slices of the sorted data and so over-represents the extreme values. A single sample like this cannot show which formula is better. The n-1 correction matters on average: over many random samples, dividing by n underestimates the population variance by a factor of (n-1)/n, and dividing by n-1 removes that bias.
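A simulation in the sense of the brief draws many random samples from a known population and compares the two formulae on average. The sketch below is one way to do that with numpy, under stated assumptions: the population of 100,000 integers between 1 and 20, the sample size of 10, the 20,000 trials and the seed are all arbitrary choices made for illustration, and sampling is done with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability

# Hypothetical population: 100,000 integers between 1 and 20 inclusive.
population = rng.integers(1, 21, size=100_000)
pop_var = np.var(population)  # divides by N, i.e. STDEV.P squared
pop_std = np.sqrt(pop_var)

# Draw many small random samples (with replacement), one sample per row.
n, trials = 10, 20_000
samples = rng.choice(population, size=(trials, n))

# ddof=0 divides by n (STDEV.P formula); ddof=1 divides by n-1 (STDEV.S formula).
var_n = np.var(samples, axis=1, ddof=0)
var_n1 = np.var(samples, axis=1, ddof=1)

print("population variance       :", pop_var)
print("mean variance, divide n   :", var_n.mean())
print("mean variance, divide n-1 :", var_n1.mean())
print("population std            :", pop_std)
print("mean STDEV.P on samples   :", np.sqrt(var_n).mean())
print("mean STDEV.S on samples   :", np.sqrt(var_n1).mean())
```

Averaged over the trials, the n-1 variance should land close to the population variance while the n version falls short by roughly the factor (n-1)/n. The same ordering carries over to the standard deviations, which is the sense in which STDEV.S is the better estimate when only a sample is available.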

### References:
* Population VS Sample Data; MathBitsNotebook.com; http://mathbitsnotebook.com/Algebra1/StatisticsData/STPopSample.html
* Measures of Spread; MathBitsNotebook.com; http://mathbitsnotebook.com/Algebra1/StatisticsData/STSpread.html
* Why we divide by n-1 for unbiased sample variance; Sal Khan; https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance