# Machine learning and statistics Assessment 2020


In [18]:
import scipy.stats
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency

### 1. Python function to calculate square root of 2 to 100 decimal places

Our assignment here is to write a Python function, called sqrt2, that calculates and prints the square root of 2 to 100 decimal places. This has to be done without using any modules.

First I tried the easiest method I could think of, which was:

In [2]:
def sqrt2():
    x = 2 ** 0.5
    print(x)
    
sqrt2()

1.4142135623730951


Unfortunately, Python printed only 16 decimal places, so let's try to get Python to print 100 decimals using the format() function.

In [20]:
def sqrt2():
    x = 2 ** 0.5
    print("{:.100f}".format(x))
    
sqrt2()

1.4142135623730951454746218587388284504413604736328125000000000000000000000000000000000000000000000000


Well, we got a bit further: 52 decimals this time, after which Python just shows a bunch of zeros.
We know that the square root of 2 is an irrational number, which can never be represented exactly by a finite number of digits, so this output cannot be correct. I also checked what the correct value should be and found the following, which already differs from our output at the 16th decimal place: 1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727
https://nerdparadise.com/math/reference/2sqrt10000

So, it seems we haven't gotten much closer to what we are looking for. This simple formula is not going to do; it is time to really delve into the subject and find a solution.
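The tail of zeros has a simple explanation: a Python float stores a binary fraction close to the square root of 2, not the square root itself. Below is a minimal sketch, using only built-ins (`float.as_integer_ratio()` and integer arithmetic), that recovers the exact value stored for `2 ** 0.5`. The variable names are my own and purely illustrative.

```python
# The float 2 ** 0.5 is stored exactly as the fraction num / den
num, den = (2 ** 0.5).as_integer_ratio()
print(den == 2 ** 52)  # the denominator is a power of two

# Expand num / den into decimals using integers only
scaled = num * 10 ** 52
digits = scaled // den
remainder = scaled % den

# Insert the decimal point after the first digit
exact = f"{digits // 10**52}.{digits % 10**52:052d}"
print(exact)
print(remainder == 0)  # True means the expansion terminates after 52 decimals
```

Because the stored value is a fraction with a power-of-two denominator, its decimal expansion is finite, and format() simply pads the rest with zeros.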

#### Newton's Method

When looking for a way to calculate the square root by hand, I came across Newton's method.
Basically, this method takes an initial guess at the square root and then repeatedly applies a simple formula to get closer and closer, until the difference between the last two guesses is sufficiently small.
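For f(x) = x**2 - 2, Newton's general update x - f(x) / f'(x) simplifies to 0.5 * (x + 2 / x). Here is a minimal sketch of the first few iterations, assuming a starting guess of 2; `newton_step` and `guesses` are illustrative names, not part of the assignment.

```python
def newton_step(x):
    """One Newton update for f(x) = x**2 - 2, whose derivative is 2 * x."""
    # Algebraically the same as 0.5 * (x + 2 / x)
    return x - (x * x - 2) / (2 * x)


x = 2.0
guesses = [x]

# A handful of steps is enough to reach the limit of float precision
for _ in range(6):
    x = newton_step(x)
    guesses.append(x)

for g in guesses:
    print(g)
```

The number of correct digits roughly doubles with every step, which is why so few iterations are needed.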


In [144]:
# Return the square root of 2 as a string, using Newton's method
def squareRoot():

    # Initial guess
    x = 2
    # Tolerance: stop once two successive guesses differ by less than this
    l = 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

    # Keep iterating until the break below is reached
    while True:

        # Move closer to the square root, using Newton's update for x**2 - 2
        root = 0.5 * (x + (2 / x))

        # If the last two guesses differ by less than l, stop. abs() makes the difference non-negative
        if abs(root - x) < l:
            break

        # The new guess becomes the current guess
        x = root

    # Format with 100 decimals
    s = format(root, ".100f")

    return s

# Loosely adapted from: https://www.geeksforgeeks.org/find-root-of-a-number-using-newtons-method/

In [145]:
squareRoot()

'1.4142135623730949234300169337075203657150268554687500000000000000000000000000000000000000000000000000'

Newton's method works, but the result is still limited by floating-point precision, so Python won't give us the square root of 2 to 100 digits this way.

After spending hours trying to figure out a way, I found something promising on Stack Overflow. What if we find the integer square root of 2 * 10**200 (which has 101 digits), and then format the result so that the decimal point ends up in the right place?
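To make the scaling idea concrete, here is a small sketch with only 5 decimals instead of 100. For such a small number the float square root is accurate enough, so it is used here purely for illustration; `k`, `n` and `text` are my own names.

```python
k = 5                      # number of decimals wanted
n = 2 * 10 ** (2 * k)      # 2 followed by 2k zeros

# The square root of n is sqrt(2) * 10**k, so its integer part holds the digits
r = int(n ** 0.5)

# Confirm with integers only that r is the floor of the square root
print(r * r <= n < (r + 1) ** 2)

# Integer part before the point, k zero-padded decimals after it
text = f"{r // 10**k}.{r % 10**k:0{k}d}"
print(text)  # 1.41421
```

With k = 100 the float square root is no longer precise enough, which is why the integer square root has to be computed with integer arithmetic instead.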

In [184]:
# Print the square root of 2 to 100 decimal places, using Newton's method on integers

def sqrt2():

    # Work with the integer 2 * 10**200, whose square root is sqrt(2) * 10**100
    x = 2 * 10 ** 200

    # Initial guess
    r = x

    def difference(x, r):
        # Distance from x to the squares of r and of its two neighbours
        d1 = abs(x - r**2)
        d2 = abs(x - (r-1)**2)
        d3 = abs(x - (r+1)**2)
        # True if r is at least as close to the root as both neighbours
        minimised = d1 <= d2 and d1 <= d3
        # True if r + 1 is closer than r - 1, meaning r is still too small
        below_min = d3 < d2
        return minimised, below_min

    while True:
        oldr = r
        # Newton's update, using integer (floor) division only
        r = (r + x // r) // 2

        minimised, below_min = difference(x, r)
        if minimised:
            break

        # If the update has stalled, nudge r one step towards the root
        if r == oldr:
            if below_min:
                r += 1
            else:
                r -= 1
            minimised, _ = difference(x, r)
            if minimised:
                break

    # Integer part, decimal point, then 100 zero-padded decimals
    print(f'{r // 10**100}.{r % 10**100:0100d}')

# Adapted from https://stackoverflow.com/questions/64278117/is-there-a-way-to-create-more-decimal-points-on-python-without-importing-a-libra

In [185]:
sqrt2()

1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727
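The printed value can be checked without any modules, because Python integers have arbitrary precision. A minimal sketch, assuming the digits are pasted in from the output above; the names `printed`, `r` and `x` are illustrative.

```python
# Value printed by sqrt2(), copied from the output above
printed = "1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727"

# Count the decimals after the point
decimals = printed.split(".")[1]
print(len(decimals))

# Drop the point to get the candidate integer square root of 2 * 10**200
r = int(printed.replace(".", ""))
x = 2 * 10 ** 200

# r is the floor of the square root exactly when r**2 <= x < (r + 1)**2
is_floor_root = r * r <= x < (r + 1) ** 2
print(is_floor_root)
```

If the last line prints True, every one of the 100 decimals is a correct digit of the square root of 2.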


### 2. Chi-squared test

A chi-squared test tells us whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table (https://en.wikipedia.org/wiki/Chi-squared_test).

First, I need to create a Pandas DataFrame from the table, so I can perform the Chi-squared test on it. I got some help on how to create a DataFrame and how to rename the rows here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html

In [11]:
df = pd.DataFrame(np.array([[90, 60, 104, 95], [30, 50, 51, 20], [30, 40, 45, 35]]), columns=["A", "B", "C", "D"])
# rename() returns a new DataFrame, so assign the result back to df
df = df.rename(index={0: "White collar", 1: "Blue collar", 2: "No collar"})
df


               A   B    C   D
White collar  90  60  104  95
Blue collar   30  50   51  20
No collar     30  40   45  35


In [13]:
chi2, p, dof, expected = chi2_contingency(df) # Learned how to do this here: https://stackoverflow.com/questions/43963606/python-pandas-chi-squared-test-of-independence

In [14]:
chi2 # Display Chi-squared value

24.5712028585826

In [15]:
p # Display the associated p-value

0.0004098425861096696

In [16]:
dof # display the degrees of freedom

6

In [17]:
expected # Display the expected values if neighbourhood of residence is independent of occupation.

array([[ 80.53846154,  80.53846154, 107.38461538,  80.53846154],
       [ 34.84615385,  34.84615385,  46.46153846,  34.84615385],
       [ 34.61538462,  34.61538462,  46.15384615,  34.61538462]])

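With a p-value of about 0.0004, it is extremely unlikely that random chance alone can explain the differences between the observed and the expected values, so the data give strong evidence that neighbourhood of residence and occupation are not independent.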
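The expected values and the chi-squared statistic can also be reproduced by hand with NumPy, which helps show what chi2_contingency does for a table of this size. This is a minimal sketch of the standard Pearson formulas; the variable names (`observed`, `expected_manual`, and so on) are my own.

```python
import numpy as np

# Observed counts: rows are occupations, columns are neighbourhoods A to D
observed = np.array([[90, 60, 104, 95], [30, 50, 51, 20], [30, 40, 45, 35]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
total = observed.sum()

# Under independence, each expected count is row total * column total / grand total
expected_manual = np.outer(row_totals, col_totals) / total

# Pearson's statistic: sum of (observed - expected)**2 / expected over all cells
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected_manual)
print(chi2_manual, dof_manual)
```

These values should agree with the expected array, the chi-squared value and the degrees of freedom returned by chi2_contingency above.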