# Tasks
These are my solutions to the "Tasks 2020" assessment as part of my fourth year module "Emerging Technologies".
***

### Task 1: Write a Python function called ```sqrt2``` that calculates and prints to the screen the square root of 2 to 100 decimal places. Your code should not depend on any module from the standard library or otherwise. You should research the task first and include references and a description of your algorithm.

The square root of 2 can be calculated by using the Babylonian method [1] [2].

$$ x_{0} \approx \sqrt{S}, $$ <br>
$$ x_{n+1} = \frac{1}{2} \left (x_{n} + \frac{S}{x_{n}} \right), $$ <br>
$$ \sqrt{S} = \lim_{n \to \infty} x_{n}. $$

The Babylonian method starts from an initial guess $ x_{0} $ for the square root of $ S $, the number whose square root is required. The formula is then applied repeatedly, each iteration giving a better approximation of the square root, until the approximation no longer changes from one iteration to the next.

Python floats hold only about 15 to 17 significant decimal digits, so a float cannot represent a number to 100 decimal places [3]. To print 100 digits anyway, the answer is multiplied by 10**200, formatted as a string, and converted to a list so that a decimal point can be inserted [4].
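As a rough illustration of this limitation (a sketch that is not part of the original solution), the snippet below compares Python's arbitrary-precision ```int``` type with ```float```. The variable names are chosen here for illustration only.

```python
# Python integers are exact at any size.
big = 10 ** 16
ints_equal = (big + 1 == big)
print(ints_equal)        # False: the two integers differ

# A float has a 53-bit significand, so neighbouring large integers collapse
# onto the same floating-point value.
floats_equal = (float(big + 1) == float(big))
print(floats_equal)      # True: both round to the same float

# Scaling an integer by a power of 10 keeps every digit exact.
digit_count = len(str(2 * 10 ** 200))
print(digit_count)       # 201
```

The scaled integer keeps all 201 digits, whereas a float of the same magnitude would only be reliable in its first 15 to 17 digits.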

#### Example: Square Root of 2
Make an initial guess of 1.2 and apply the formula to get a better approximation of the square root. $ S $ is 2.

$ x_{0} \approx 1.2 $

$ x_{1} = \frac{1}{2} \left ({1.2} + \frac{2}{{1.2}} \right) = 1.433 $

$ x_{2} = \frac{1}{2} \left ({1.433} + \frac{2}{{1.433}} \right) = 1.414 $

$ x_{3} = \frac{1}{2} \left ({1.414} + \frac{2}{{1.414}} \right) = 1.414 $

Hence $ \sqrt{2} \approx 1.414 $
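The worked example above can be reproduced with a few lines of Python. This is a minimal sketch using ordinary floats; the helper name ```babylonian_steps``` is hypothetical and only serves to list the successive approximations.

```python
def babylonian_steps(S, x0, steps):
    """Return [x0, x1, ..., x_steps] produced by the Babylonian formula."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        # x_{n+1} = (x_n + S / x_n) / 2
        xs.append((x + S / x) / 2)
    return xs

approximations = babylonian_steps(2, 1.2, 3)
for n, x in enumerate(approximations):
    print(f"x_{n} = {x:.3f}")
```

Rounded to three decimal places, the values match the hand calculation: 1.200, 1.433, 1.414 and 1.414.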

In [103]:
def sqrt2(S):
    """
    A function to calculate the square root of 2 using the Babylonian method.
    """

    # Guess of the approximation of the square root.
    guess = S / 2.0

    # Add 1 to the guess.
    x = guess + 1

    # Loop until the guess and x are the same.
    while(guess != x):
        # x becomes the value of the guess.
        x = guess

        # Formula is applied.
        guess = (guess + (S / guess)) / 2    

    # The task is to print the square root of 2 to 100 decimal places, not the root of every number, so the formatting below only applies when S is 2.
    if (S == 2):
        # Increase the value of guess.
        guess = guess * (10 ** 200)

        # Get the guess without scientific notation.
        answer = "{0:.0f}".format(guess)

        # Convert the float to a string and then a list.
        answer = str(answer)
        answer = list(answer)

        # Then insert a decimal point.
        answer.insert(1, ".")

        # Combine all the digits.
        answer = "".join(answer)

        # Set answer to 100 decimal places.
        answer = answer[0:102]

        return answer

    # For any number other than 2.
    else:
        return guess

print("The square root of 2 is " + sqrt2(2))

The square root of 2 is 1.4142135623730948729028552009655273607485832256919402696392729172311702355446324552662687021007677738


In [None]:
# Test the function on 100.
print("The square root of 100 is {:.100f}".format(sqrt2(100)))

In [None]:
# Test the function on 5.
print("The square root of 5 is {:.100f}".format(sqrt2(5)))

In [None]:
# Test the function on 3.
print("The square root of 3 is {:.100f}".format(sqrt2(3)))

In [None]:
# Test the function on 932.
print("The square root of 932 is {:.100f}".format(sqrt2(932)))

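Because the iteration above runs in floating point, only the first 15 or so significant digits of the printed result are correct. A better way to calculate the square root of a number to 100 decimal places is to do the whole calculation in integer arithmetic, as follows: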

In [4]:
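def sqrt2V2(S):
    """
    Return the square root of the positive integer S as a string with
    100 decimal places, using only integer arithmetic.
    """
    # Scale S so that the integer square root of x carries 100 decimal digits.
    x = S * 10 ** 200

    # Initial guess.
    r = x

    def test_diffs(x, r):
        # Compare how close r**2 is to x against its neighbours r - 1 and r + 1.
        d0 = abs(x - r**2)
        dm = abs(x - (r-1)**2)
        dp = abs(x - (r+1)**2)
        # True when r is at least as close to the root as both neighbours.
        minimised = d0 <= dm and d0 <= dp
        # True when r + 1 is closer than r - 1, i.e. r is below the best value.
        below_min = dp < dm
        return minimised, below_min

    while True:
        oldr = r
        # Babylonian step using floor division so that r stays an integer.
        r = (r + x // r) // 2

        minimised, below_min = test_diffs(x, r)
        if minimised:
            break

        # If the iteration has stalled, move one step towards the best value.
        if r == oldr:
            if below_min:
                r += 1
            else:
                r -= 1
            minimised, _ = test_diffs(x, r)
            if minimised:
                break

    # Split r into its integer part and 100 zero-padded decimal digits.
    return f"{r // 10**100}.{r % 10**100:0100d}"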

print(sqrt2V2(2))

1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727
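The two printed results can be compared using integer arithmetic alone. The sketch below is an assumption-laden check rather than part of the original solution: the helper name ```digits_are_correct``` is hypothetical, and "correct" is taken to mean within one unit in the last decimal place of the true square root.

```python
def digits_are_correct(digits, S=2):
    """Check a decimal string such as "1.4142..." against the square root of
    the integer S, to within one unit in the last decimal place."""
    whole, frac = digits.split(".")
    places = len(frac)
    r = int(whole + frac)          # the digits as one large integer
    x = S * 10 ** (2 * places)     # S scaled so that it is comparable to r**2
    return (r - 1) ** 2 < x < (r + 1) ** 2

# Output of the float-based sqrt2 function.
float_result = "1.4142135623730948729028552009655273607485832256919402696392729172311702355446324552662687021007677738"
# Output of the integer-based sqrt2V2 function.
integer_result = "1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727"

float_ok = digits_are_correct(float_result)
integer_ok = digits_are_correct(integer_result)
print(float_ok)    # False
print(integer_ok)  # True
```

Only the integer-based result passes, since the float-based digits diverge from the true value after roughly the fifteenth significant digit.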


#### References
[1] Methods of computing square roots; Babylonian method; https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method

[2] Python Math: Computing square roots using the Babylonian method; Python Math: Exercise-18 with Solution; https://www.w3resource.com/python-exercises/math/python-math-exercise-18.php

[3] 15. Floating Point Arithmetic: Issues and Limitations; https://docs.python.org/3/tutorial/floatingpoint.html

[4] Is there a way to create more decimal points on Python without importing a library/module?; https://stackoverflow.com/a/64278569

***
### Task 2: The Chi-squared test for independence is a statistical hypothesis test like a t-test. It is used to analyse whether two categorical variables are independent. The Wikipedia article gives the table below as an example [4], stating the Chi-squared value based on it is approximately 24.6. Use ```scipy.stats``` to verify this value and calculate the associated p value. You should include a short note with references justifying your analysis in a markdown cell.

The second task is to verify the chi-squared value of approximately 24.6 using the data from the table below.

|              	| A   	| B   	| C   	| D   	| total 	|
|--------------	|-----	|-----	|-----	|-----	|-------	|
| White collar 	| 90  	| 60  	| 104 	| 95  	| 349   	|
| Blue collar  	| 30  	| 50  	| 51  	| 20  	| 151   	|
| No collar    	| 30  	| 40  	| 45  	| 35  	| 150   	|
| Total        	| 150 	| 150 	| 200 	| 150 	| 650   	|

The null hypothesis is that each person's neighborhood of residence is independent of the person's occupational classification.

A chi-square ($ \chi^2 $) statistic measures how well the values expected under a model compare to the actual observed data. The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a fair coin meet these criteria [5].

A p-value is used in hypothesis testing to help decide whether to reject the null hypothesis. It is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis [8].

The formula to calculate the expected value for a cell is: [6]

$$ E_{ij} = \frac{R_iC_j}{N} $$

Where
<br>$ E_{ij} $ = expected value for the cell in row $i$ and column $j$
<br>$ R_i $ = total of row $i$
<br>$ C_j $ = total of column $j$
<br>$ N $ = grand total of all observations

The chi-squared formula is: [7]

$$ \chi^2 = \sum \limits_{i=1}^n \frac{(O_i - E_i)^2}{E_i} $$

Where
<br>$ \chi^2 $ = chi squared
<br>$ O_i $ = observed value in cell $i$
<br>$ E_i $ = expected value in cell $i$
<br>$ n $ = number of cells in the table

#### Example: Calculating the Expected Value of White Collar Workers for Column A
Calculating the expected value of white collar workers for column A would be as follows:

$ E = \frac{349 \times 150}{650} $

$ E \approx 80.5385 $

This cell's contribution to the chi-squared statistic uses the observed value $ O $ = 90 and the expected value $ E $ = 80.5385:

$ \frac{(O - E)^2}{E} = \frac{(90 - 80.5385)^2}{80.5385} $

$ \frac{(O - E)^2}{E} \approx 1.1115 $

Doing this for all 12 cells and adding the contributions together gives a chi-squared value of approximately 24.6.
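The cell-by-cell calculation can be sketched in plain Python, without any library, as a cross-check on the hand-worked example. The variable names are chosen here for illustration.

```python
# Observed counts from the table
# (rows: white, blue and no collar; columns: neighbourhoods A to D).
observed = [
    [90, 60, 104, 95],
    [30, 50, 51, 20],
    [30, 40, 45, 35],
]

row_totals = [sum(row) for row in observed]          # R_i
col_totals = [sum(col) for col in zip(*observed)]    # C_j
N = sum(row_totals)                                  # grand total

chi2_manual = 0.0
for i, row in enumerate(observed):
    for j, O in enumerate(row):
        E = row_totals[i] * col_totals[j] / N        # expected value E_ij
        chi2_manual += (O - E) ** 2 / E              # this cell's contribution

print(round(chi2_manual, 4))
```

The total comes to roughly 24.57, in line with the value of approximately 24.6 quoted in the task.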

In [112]:
# Code to get chi-squared statistics for the data in the table.
import scipy.stats as stats

# Populate the arrays.
whiteCollar = [90, 60, 104, 95]
blueCollar = [30, 50, 51, 20]
noCollar = [30, 40, 45, 35]

data = [whiteCollar, blueCollar, noCollar]

chi2, p, dof, ex = stats.chi2_contingency(data)

# Print out the values.
print("chi2: ", chi2)
print("p-value: ", p)
print("degrees of freedom: ", dof)
print("expected frequencies:")
print(ex)

chi2:  24.5712028585826
p-value:  0.0004098425861096696
degrees of freedom:  6
expected frequencies:
[[ 80.53846154  80.53846154 107.38461538  80.53846154]
 [ 34.84615385  34.84615385  46.46153846  34.84615385]
 [ 34.61538462  34.61538462  46.15384615  34.61538462]]


The p-value of the data is 0.0004098425861096696. As this p-value is less than 0.05, the result is statistically significant. Therefore, we reject the null hypothesis that neighborhood of residence is independent of occupational classification [9].
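As an independent check on the p-value, the upper-tail probability of the chi-squared distribution has a simple closed form when the number of degrees of freedom is even, which is the case here (6). The sketch below is not part of the original analysis; the function name ```chi2_sf_even_dof``` is hypothetical, and the standard library ```math``` module is used only for the exponential.

```python
import math

def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) of the chi-squared distribution.

    Uses the closed form exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)**k / k!,
    which holds only when dof is a positive even integer.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2
    term = 1.0    # (x/2)**k / k! for k = 0
    total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Chi-squared value and degrees of freedom reported above.
p_check = chi2_sf_even_dof(24.5712028585826, 6)
print(p_check)
```

The result is approximately 0.00041, agreeing with the p-value returned by ```stats.chi2_contingency```.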

#### References
[5] Chi-Square (χ2) Statistic Definition; What Is a Chi-Square Statistic?; https://www.investopedia.com/terms/c/chi-square-statistic.asp

[6] The chi-square test; Getting expected values; https://web.stanford.edu/class/psych252/cheatsheets/chisquare.html

[7] https://www.gstatic.com/education/formulas/images_long_sheet/en/chi_squared_test.svg

[8] P Value Definition; https://www.statisticshowto.com/p-value/

[9] What a p-value tells you about statistical significance; https://www.simplypsychology.org/p-value.html

### Task 3: The standard deviation of an array of numbers ```x``` is calculated using ```numpy``` as ```np.sqrt(np.sum((x - np.mean(x))**2)/len(x))```. However, Microsoft Excel has two different versions of the standard deviation calculation, ```STDDEV.P``` and ```STDDEV.S```. The ```STDDEV.P``` function performs the above calculation but in the ```STDDEV.S``` calculation the division is by ```len(x)-1``` rather than ```len(x)```. Research these Excel functions, writing a note in a Markdown cell about the difference between them. Then use ```numpy``` to perform a simulation demonstrating that the ```STDDEV.S``` calculation is a better estimate for the standard deviation of a population when performed on a sample. Note that part of this task is to figure out the terminology in the previous sentence.