### Tasks Assignment
1. Write a Python function called sqrt2 that calculates and prints to the screen the square root of 2 to 100 decimal places.   <br>Your code should not depend on any module from the standard library or otherwise. You should research the task first and include references and a description of your algorithm  

2. Use scipy.stats to verify the Chi-squared value of approximately 24.6 for the table below (see section 2) and calculate the associated p value. You should include a short note with references justifying your analysis in a markdown cell.

3. Research the STDEV.P and STDEV.S Excel functions, writing a note in a Markdown cell about the difference between them. Then use numpy to perform a simulation demonstrating that the STDEV.S calculation is a better estimate for the standard deviation of a population when performed on a sample.

<img src="banner_copyright_PeaLeafDesign.JPG" style="width:900px;">

<b>---------------------------------------------------------------------------------------------------------------------------------------------------------
## Task 1
</b>

## Background

###  $ \sqrt{2} $ is about 1.41, so what is the big deal?

The square root of $x$ is a number $r$ such that $r^2=x$ <br>

Simple examples are <br>
square root of 9 = 3 (3 $*$ 3 = 9)<br>
square root of 2916 = 54 (54 $*$ 54 =2916)<br>

The square root of 2 is exciting because, although it is "roughly 1.41", its decimal expansion never ends. In 2016 Ron Watkins computed it to 10 trillion digits.<br>

That's because $ \sqrt{2} $ is <b>irrational</b>. 
<br>
A number is irrational when it cannot be expressed as a fraction of two integers; its decimal expansion therefore goes on forever and never becomes periodic.<br>

The implication of irrationality, in terms of computing, is that the number cannot be stored exactly in a variable: any finite representation holds only an approximation.<br>

The challenge of this task is not only to calculate the value without a Python module but also to produce exactly one hundred correct decimal places.<br>


### What is a Python function and how does it work?

Functions in computing are an example of encapsulation where one or many lines of complex code and instructions can be wrapped up into one command. A function can be re-used between programmes, where the use of parameters can allow it to be used in different circumstances as required. <br>
<br>
To create a Python function, define it with the keyword `def`, followed by the name you are giving it.<br>
`def sqrt(x):`<br>
Brackets come immediately after the function name and list the parameters, that is, the inputs the function accepts.<br>
The indented body then describes what happens to the arguments passed into the function.<br>
`ans = x ** 0.5`<br>
Indentation is important because Python knows what code belongs to a function by how it is indented.<br>
The `return` keyword hands the result back to the caller. It does not print anything itself; a notebook displays the returned value only when the call is the last expression in a cell.<br>
`return ans`<br>
To call the function, type its name followed by brackets, with the argument between the brackets.<br>
`sqrt(7)`<br>
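Putting those pieces together, a minimal sketch of the example function looks like this. It is only an illustration of the syntax described above, and the variable name `result` is my own choice.

```python
def sqrt(x):
    """Return the square root of x using the power operator."""
    ans = x ** 0.5
    return ans  # hands the value back to the caller; nothing is printed here


result = sqrt(9)  # the returned value is stored, not displayed
print(result)     # printing is a separate, explicit step
```

Keeping `return` and `print` separate means the same function can be reused where the value is needed for further calculation rather than for display.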


### Research carried out to produce sqrt2 function  

Having established the syntax, the function needs an instruction of what to do.<br>
The function can be divided into two separate parts: (1) calculate the square root, and (2) ensure a decimal precision of 100 places and print the result to the screen.<br> 

####  Calculate the square root
The code needs to be a raw python statement (with no reference to a python library) in order to meet the requirements of the assignment.<br>

To calculate the square root of $x$, raise it to the power of 0.5: <br>
`x ** 0.5`
(Source: [Py Point youtube channel](https://www.youtube.com/watch?v=6red7dqIY-c) "how to find square root in python without math")

In [6]:
2**0.5

1.4142135623730951

The accuracy of this calculation can be checked by squaring the result.

In [4]:
1.4142135623730951**2

2.0000000000000004

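The answer is not exactly 2 because a Python float holds only about 16 significant digits, so the stored value is an approximation of $ \sqrt{2} $. This is a consequence of finite floating-point precision rather than a proof of irrationality.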

#### Print to the screen to 100 decimal places

It can be observed above that the default output is 16 digits after the decimal point.<br>
This is because, in order for Python to manage the number of digits, only an approximation to the true decimal value of the binary approximation stored by the machine is printed to the screen.<br>
How can I force a total of 100 to be printed?

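The format() built-in function is useful only up to a point. It prints 52 decimal places and then pads the output with zeros, and only the first 15 of those decimal places agree with $ \sqrt{2} $. The remaining digits are the exact decimal expansion of the binary approximation stored by the machine, not further digits of the square root.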

In [17]:
format(2**0.5, '.50f')

'1.41421356237309514547462185873882845044136047363281'

In [31]:
#Example of an attempt to use format() to print 100 decimals
#After 52 decimal places the stored value is exhausted and the output is padded with 0
format(2**0.5, '.100f')

'1.4142135623730951454746218587388284504413604736328125000000000000000000000000000000000000000000000000'
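The zeros can be explained with a short check that uses only built-ins. A float is stored as an exact fraction with a power-of-two denominator, so its decimal expansion must terminate. The sketch below is my own illustration (the names `n`, `d`, `scaled` and `exact` are not from any source) and it only explains the output above; it is not a way of obtaining more digits of the square root.

```python
# A float is stored as an exact fraction n / d, where d is a power of two.
n, d = (2 ** 0.5).as_integer_ratio()

# Scale by 10**52 and divide using integers only, so no rounding occurs.
scaled, remainder = divmod(n * 10 ** 52, d)

digits = str(scaled)
exact = digits[0] + "." + digits[1:]

print(exact)      # the exact decimal value of the stored float
print(remainder)  # 0 means the expansion terminates within 52 places
```

Because the remainder is zero, every decimal place after the 52nd really is 0 for the stored value, which is why format() has nothing further to show.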

#### Source a pre-existing algorithm that does both at once

At this point, I concluded that there is a lot more to the task than simply formatting the output of `x ** 0.5`.<br>
A more complex algorithm is required.

User casevh at [stackoverflow.com](https://stackoverflow.com/questions/5187664/generating-digits-of-square-root-of-2) provides code to generate the digits of the square root of an integer to any level of precision.

In [39]:
#code verbatim from https://stackoverflow.com/questions/5187664/generating-digits-of-square-root-of-2
def sqroot(a, digits):
    a = a * (10**(2*digits))
    x_prev = 0
    x_next = 1 * (10**digits)
    while x_prev != x_next:
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) >> 1
    return x_next

It can be observed below that the function produces the digits but further manipulation is required for them to appear in the correct format.

In [40]:
sqroot(2,100)

14142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727

#### Validate the answer

Validate that the number produced by casevh's algorithm is correct by cross-referencing it with another source.
I chose the website [Astronomy Picture of the Day](https://apod.nasa.gov/htmltest/rjn_dig.html), a hobby website featuring some wonderful retro HTML published by two professional astronomers.

In [13]:
x = sqroot(2,100)#casevh result
y = 14142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727#apod 100digits
ans = x-y
ans

0
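A second check, which does not rely on an outside source, is to square the result using integer arithmetic. The sketch below repeats casevh's function so that it runs on its own; the names `r` and `target` are my own. If `r` is the correct truncated value, then `r` squared must not exceed 2 followed by 200 zeros, while `r + 1` squared must exceed it.

```python
def sqroot(a, digits):
    # Same integer iteration as casevh's function quoted earlier.
    a = a * (10 ** (2 * digits))
    x_prev = 0
    x_next = 1 * (10 ** digits)
    while x_prev != x_next:
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) >> 1
    return x_next


r = sqroot(2, 100)
target = 2 * 10 ** 200

# r is the floor of sqrt(target) exactly when r^2 <= target < (r + 1)^2.
print(r * r <= target < (r + 1) * (r + 1))
```

Python integers have arbitrary precision, so this comparison is exact and is not affected by the floating-point limits seen earlier.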

#### Formatting numbers with python

Formatting the float output with format() will not work, as previously noted.<br>
The str() built-in function is used to produce a string version of the integer returned by sqroot(), which I can manipulate using Python string slicing.

In [41]:
my_100digits=str(sqroot(2,100))

In [42]:
my_100digits

'14142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727'

In [12]:
print(my_100digits)

14142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727


Use slicing to print the character at index 0, then a decimal point, and then the remaining 100 digits (indices 1 to 100). The pieces are joined with `+` so that no space appears after the decimal point.

In [26]:
print(my_100digits[:1]+"."+my_100digits[1:101])

1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727


I have found a series of methods that will print the square root of 2 to 100 decimal places to the screen. They now need to be assembled in a logical manner into a Python function called sqrt2. 

### Description of the algorithm
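The loop can be read as Newton's method (also known as the Babylonian method) carried out with integers only. This is my own interpretation of the code rather than a description taken from the source answer. To obtain $\sqrt{a}$ to $d$ decimal places, the function works with the scaled integer $N = a \cdot 10^{2d}$, because

$$\left\lfloor \sqrt{a \cdot 10^{2d}} \right\rfloor = \left\lfloor 10^{d}\sqrt{a} \right\rfloor$$

Applying Newton's method to $f(x) = x^2 - N$ gives the update

$$x_{n+1} = x_n - \frac{x_n^2 - N}{2x_n} = \frac{1}{2}\left(x_n + \frac{N}{x_n}\right)$$

In the code the division is floor division (`//`) and the halving is a right shift (`>> 1`), so the update actually performed is

$$x_{n+1} = \left\lfloor \frac{x_n + \lfloor N / x_n \rfloor}{2} \right\rfloor, \qquad x_0 = 10^{d}$$

The loop stops when two successive estimates are equal. For $a = 2$ and $d = 100$ the result is a 101-digit integer whose first digit is the integer part of $\sqrt{2}$ and whose remaining 100 digits are the decimal places.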

In [36]:
def sqrt2(a, digits):
    # Scale a by 10**(2*digits) so that the integer square root of the result
    # holds the first `digits` decimal places of the square root of a.
    a = a * (10**(2*digits))
    x_prev = 0
    x_next = 1 * (10**digits)  # initial guess
    # Repeat the integer update until the estimate stops changing.
    while x_prev != x_next:
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) >> 1
    my_101digits = str(x_next)  # stringify once the loop has converged
    # Insert the decimal point after the first digit and print the result.
    print(my_101digits[:1] + "." + my_101digits[1:digits + 1])

In [37]:
#Call the function with the parameters 2 and 100: the number whose square root is required and the decimal precision required.
sqrt2(2,100)

1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727

## References Task 1

Root 2 Video- Numberphile youtube channel [Online] https://www.youtube.com/watch?v=5sKah3pJnHI  Accessed October 11, 2020

how to find square root in python without math||how to find square root in python Video on Py point youtube channel [Online] https://www.youtube.com/watch?v=6red7dqIY-c   Accessed October 11, 2020


https://stackoverflow.com/questions/64295245/how-to-get-the-square-root-of-a-number-to-100-decimal-places-without-using-any-l [Online]   Accessed October 11, 2020

   
    
https://en.wikipedia.org/wiki/Square_root_of_2   [Online]   Accessed October 11, 2020


https://mathworld.wolfram.com/SquareRoot.html   [Online]   Accessed October 11, 2020


https://mathworld.wolfram.com/IrrationalNumber.html  [Online]   Accessed October 11, 2020

10,000 digits of Square Root of 2  [Online]
https://nerdparadise.com/math/reference/2sqrt10000
Accessed October 23, 2020

User: casevh
https://stackoverflow.com/questions/5187664/generating-digits-of-square-root-of-2 [Online]
Accessed October 23, 2020

MCLOUGHLIN, IAN 2020. Machine Learning and Statistics - Functions in functions (Video Lecture) [Online] 
Available from: https://learnonline.gmit.ie/

RONQUILLO, ALEX. The Python Square Root Function [Online]
https://realpython.com/python-square-root-function/
Accessed October 23, 2020

The Python Tutorial  [Online]
https://docs.python.org/3/tutorial/index.html
Accessed 17 October, 2020

<b>---------------------------------------------------------------------------------------------------------------------------------------------------------
## Task 2
</b>

|   	|   A	|   B   |  	C   | D	| Total |
|---	|---	|---	|---	|---	|--- |
|  White collar 	|  90 	|   60	|   104	|   95	|349 |
|  Blue collar 	|   30	|   50	|   51	|   20	|151 |
|  No collar 	|   30	|   40	|   45	|   35	|150 |
|  Total     |     150  |   150    |  200     |   150    | 650|

![](table.jpg)

## Background

The table above is from the Chi-squared test [wikipedia article](https://en.wikipedia.org/wiki/Chi-squared_test).  It is investigated in the article in order to explore and demonstrate
>The null hypothesis ... that each person's neighborhood of residence is independent of the person's occupational classification

The purpose of my task is to analyse the dataset using the `scipy.stats` Python package, reproduce the Chi-squared value quoted in the article (approximately 24.6) and calculate the associated $p$ value. 

## The Code

In [4]:
import scipy.stats
from scipy.stats import chisquare
#scipy.stats.chisquare(f_obs, f_exp=None, ddof=0, axis=0)
#chisquare([90,30,30,60,50,40,104,51,45,95,20,35]) #first incomplete array
chisquare([90,30,30,60,50,40,104,51,45,95,20,35], f_exp=[80.54,34.85,34.61,80.54,34.85,34.61,107.38,46.46,46.15,80.54,34.85,34.61])


Power_divergenceResult(statistic=24.570833837270943, pvalue=0.010529919253757828)

## Summary of Task 2

SciPy.stats
----
SciPy documentation sets out that the chisquare() function will calculate a one-way chi-square test and return the $p$ value when the following parameters are passed in: *def chisquare(f_obs, f_exp=None, ddof=0, axis=0):*<br>
<br>
**Parameters**<br>
f_obs : array_like.  Observed frequencies in each category.<br>
f_exp : array_like, optional.  Expected frequencies in each category. <br> 
ddof : int, optional.  Default is 0.<br> 
axis : int or None, optional. Default is 0.<br> 
**Returns**<br>
chisq : float or ndarray. The chi-squared test statistic. <br>
p : float or ndarray.  The p-value of the test. <br> 
<br>

Input
---
The Wikipedia table provides a random sample of recorded (observed) data, but no expected frequencies are given.<br>
The "delta degrees of freedom" (*ddof*) were left at the default of 0, which treats the twelve cells as a one-way test with 12 − 1 = 11 degrees of freedom. For a 3 × 4 contingency table the degrees of freedom are (3 − 1)(4 − 1) = 6, which would correspond to `ddof=5`.<br>
The *axis* parameter is left at its default of 0; because the observed values are passed as a single one-dimensional array, they are treated as one data set.<br>
<br>
Calling the function with only *f_obs* gave a result (156.6215384) that was very different from the one on the wiki page, because when *f_exp* is omitted the categories are assumed to be equally likely.<br> The *f_exp* parameter, although optional in the function documentation, is required to get an accurate result in this instance.<br>
The original article provides an equation to estimate the expected number of white collar workers in Neighbourhood A: <br> the total sample of residents in Neighbourhood A (150) <br>
is multiplied by the proportion of white collar workers in the entire sample (349/650), giving approximately 80.54.<br>
![](formula.jpg)
I used this formula to provide an array of expected frequencies <br>[80.54,34.85,34.61,80.54,34.85,34.61,107.38,46.46,46.15,80.54,34.85,34.61] <br>and passed them into the function.<br>

Output
---
This additional data produced a statistic of 24.570833837270943.<br>
My result rounds to 24.6 and so corresponds with the value published on wikipedia.org. This is expected, because the statistic is computed from the same observed and expected frequencies using the same formula.
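The expected frequencies can also be computed directly from the table instead of being typed in by hand. The following is a minimal sketch in plain Python, assuming the observed counts are laid out row by row as in the table; the variable names are my own.

```python
observed = [
    [90, 60, 104, 95],  # White collar
    [30, 50, 51, 20],   # Blue collar
    [30, 40, 45, 35],   # No collar
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 2) for value in row])
```

Computing the values this way avoids transcription and rounding slips, since each cell follows from the row and column totals.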

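The $p$ value reported by `chisquare` is low (0.01), but it is computed with 11 degrees of freedom. With the 6 degrees of freedom appropriate to a 3 × 4 table, the $p$ value for a statistic of 24.57 is approximately 0.0004. Either value is below the conventional 0.05 threshold, which indicates the null hypothesis can be rejected. According to [Admond Lee](https://towardsdatascience.com/p-values-explained-by-data-scientist-f40a746cfc8)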
> The lower the p-value, the more surprising the evidence is, the more ridiculous our null hypothesis looks.
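For a test of independence on a two-way table, `scipy.stats` also provides `chi2_contingency`, which works out the expected frequencies and the degrees of freedom from the table itself. The sketch below is offered as a cross-check on the analysis above, not as the method used in the Wikipedia article; the variable names are my own.

```python
from scipy.stats import chi2_contingency

observed = [
    [90, 60, 104, 95],  # White collar
    [30, 50, 51, 20],   # Blue collar
    [30, 40, 45, 35],   # No collar
]

# Returns the statistic, the p value, the degrees of freedom and the
# table of expected frequencies under the null hypothesis of independence.
stat, p, dof, expected = chi2_contingency(observed)

print("statistic:", round(stat, 2))
print("degrees of freedom:", dof)
print("p value:", p)
```

The degrees of freedom here come from the shape of the table, (rows − 1) × (columns − 1), so no `ddof` adjustment is needed.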

In lectures covering t-tests and statistics as part of this module, an important principle put across by Dr Ian McLoughlin is that it is easy to run the code that produces statistics - the hard part for a data analyst is to really understand the input and interpret the output.<br>

There is much more work that can be done on this data set, but that is a "task" for another day.

## References Task 2

Chi-squared test https://en.wikipedia.org/wiki/Chi-squared_test [Online] Accessed November 08, 2020<br>
MCLOUGHLIN, IAN 2020. Machine Learning and Statistics - From t-tests to ANOVA (Video Lecture) [Online] Available from: https://learnonline.gmit.ie/<br>
LEE, ADMOND 2019. P-values Explained By Data Scientist https://towardsdatascience.com/p-values-explained-by-data-scientist-f40a746cfc8 [Online] Accessed November 08, 2020<br>
scipy.stats.chisquare https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html [Online] Accessed November 08, 2020<br>


<b>---------------------------------------------------------------------------------------------------------------------------------------------------------
## Task 3
</b>

### What are STDEV.P and STDEV.S Excel functions, and what is the difference between them?

The older STDEV and STDEVP functions in Microsoft Excel were replaced by STDEV.S and STDEV.P respectively in Excel 2010.
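In formula form, and assuming the standard definitions of the population and sample standard deviation (the "n" and "n − 1" methods), the two functions differ only in the divisor:

$$\text{STDEV.P} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}} \qquad \text{STDEV.S} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$

Here $x_1, \dots, x_n$ are the data values and $\bar{x}$ is their mean. STDEV.P is intended for data that represents the entire population, while STDEV.S is intended for a sample drawn from a larger population. In numpy the same choice is made with the `ddof` argument of `std`: `ddof=0` divides by $n$ and `ddof=1` divides by $n - 1$.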

### Use of numpy to perform a simulation demonstrating that the STDEV.S calculation is a better estimate for the standard deviation of a population when performed on a sample.
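One way such a simulation could be set up is sketched below. The population parameters, sample size, number of samples and seed are all assumptions chosen for illustration. Many small samples are drawn from a normal population with a known standard deviation, and the average of each estimator is compared with the true value.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

true_sd = 2.0        # population standard deviation (assumed for this demo)
sample_size = 5      # small samples make the difference easy to see
n_samples = 100_000  # number of simulated samples

# Each row is one sample drawn from a normal population.
samples = rng.normal(loc=10.0, scale=true_sd, size=(n_samples, sample_size))

# ddof=0 divides by n (like STDEV.P); ddof=1 divides by n - 1 (like STDEV.S).
var_p = samples.var(axis=1, ddof=0)
var_s = samples.var(axis=1, ddof=1)
sd_p = samples.std(axis=1, ddof=0)
sd_s = samples.std(axis=1, ddof=1)

print("true variance:", true_sd ** 2)
print("mean variance, ddof=0:", var_p.mean())
print("mean variance, ddof=1:", var_s.mean())
print("true sd:", true_sd)
print("mean sd, ddof=0:", sd_p.mean())
print("mean sd, ddof=1:", sd_s.mean())
```

On average the `ddof=1` variance lands close to the true variance while the `ddof=0` variance falls short of it. Both standard deviation estimates fall somewhat below the true value for small samples, but the `ddof=1` estimate is the closer of the two.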

## References Task 3

STDEV.S function https://support.microsoft.com/en-us/office/stdev-s-function-7d69cf97-0c1f-4acf-be27-f3e83904cc23 [Online] Accessed November 18, 2020<br>
STDEV.P function https://support.microsoft.com/en-us/office/stdev-p-function-6e917c05-31a0-496f-ade7-4f4e7462f285 [Online] Accessed November 18, 2020<br>

<b>---------------------------------------------------------------------------------------------------------------------------------------------------------
## Task 4
</b>