# Tasks 2020

These are my solutions to the Tasks assessment for the module Machine Learning and Statistics at GMIT. The author is Alexandra Macuga (G00376287@gmit.ie).

***

### Task 1: Calculate a square root

***

#### Instructions

Write a Python function called **sqrt2** that calculates and prints to the screen the square root of 2 to 100 decimal places. Your code should not depend on any module from the standard library or otherwise. You should research the task first and include references and a description of your algorithm.

By the standard library, we mean the modules and packages that come as standard with Python. Anything built-in that can be used without an import statement can be used.

#### Research

In mathematics, a square root of a number $x$ is a number $y$ such that 

$$y^2 = x$$

in other words, a number $y$ whose square (the result of multiplying the number by itself, or $y \times y$) is $x$.[1]

The square root of 2, or the one-half power of 2, written in mathematics as $\sqrt{2}$ or $2^{1/2}$, is the positive algebraic number that, when multiplied by itself, equals the number 2.[2] Strictly speaking, it is the principal square root of 2, to distinguish it from the negative number with the same property.

Geometrically, the square root of 2 is the length of a diagonal across a square with sides of one unit of length;[2] this follows from the Pythagorean theorem. It was probably the first number known to be irrational.[2] The fraction

$$ \frac{99} {70} (\approx 1.4142857)$$

is sometimes used as a good rational approximation with a reasonably small denominator.
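
As a quick check of how close this fraction is, the sketch below compares $(99/70)^2$ with 2 using exact integer arithmetic. The variable names are chosen here purely for illustration.

```python
# Compare (99/70)**2 with 2 without any floating-point rounding.
numerator, denominator = 99, 70

square_of_numerator = numerator * numerator        # 99 * 99
twice_square_of_denominator = 2 * denominator ** 2  # 2 * 70 * 70

# If 99/70 were exactly sqrt(2), these two integers would be equal.
excess = square_of_numerator - twice_square_of_denominator
print(square_of_numerator, twice_square_of_denominator, excess)
```

The two integers differ by 1, so $(99/70)^2 = 2 + \frac{1}{4900}$, which is why the fraction slightly overestimates $\sqrt{2}$.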

![title](img/img2.png)

We can calculate the square root of a number using Newton's method [3, 4]. To find the square root $z$ of a number $x$, we start from an initial guess for $z$ and repeatedly apply the following update until the value stops changing appreciably.

$$ z = z - \frac{z^2 - x} {2z} $$
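
As a brief sketch of where this update comes from: Newton's method looks for a root of a function $f$ by repeatedly following the tangent line. Taking $f(z) = z^2 - x$, whose positive root is the square root of $x$, and its derivative $f'(z) = 2z$, one step of the method reads as follows. The subscripted notation $z_n$ for the $n$-th guess is introduced here only for illustration.

$$ z_{n+1} = z_n - \frac{f(z_n)}{f'(z_n)} = z_n - \frac{z_n^2 - x}{2 z_n} = \frac{1}{2}\left(z_n + \frac{x}{z_n}\right) $$

The last form shows that each new guess is the average of the current guess and $x$ divided by the current guess.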


[1] Wikipedia; Square root; https://en.wikipedia.org/wiki/Square_root

[2] Wikipedia; Square root of 2; https://en.wikipedia.org/wiki/Square_root_of_2

[3] A Tour of Go; Exercise: Loops and Functions; https://tour.golang.org/flowcontrol/8

[4] Wikipedia; Newton's method; https://en.wikipedia.org/wiki/Newton%27s_method


#### Solution

In [1]:
def sqrt2(x):
    """
    A function to calculate the square root of a number x.
    """
    # Initial guess for the square root z.
    z = x / 2
    # Loop until we're happy with the accuracy.
    while abs(x - (z * z)) > 0.000001:
        # Calculate a better guess for the square root.
        z -= (z*z - x) / (2 * z)
    # Format the (approximate) square root of x with 100 decimal places.
    # A float holds only about 15 to 17 significant digits, so the later digits are not accurate.
    z = ("%.100f" % z)
    return z

#### Tests of the function

Here we test the function with some known values.

In [2]:
# Test the function on 100.
sqrt2(100)

'10.0000000001074464961448029498569667339324951171875000000000000000000000000000000000000000000000000000'

In [3]:
# Compare the square root of 2 from my function sqrt2 with the results from the math and decimal modules
# Adapted from: https://stackoverflow.com/a/4733196

import math
from decimal import getcontext, Decimal

a =  sqrt2(2)

b = math.sqrt(2)
b = format(b, '.100f')

getcontext().prec = 100
c = Decimal(2).sqrt()

print("Sqrt2: ", a,  "\n", "Math: ", b, "\n", "Decimal: ", c)

Sqrt2:  1.4142135623746898698271934335934929549694061279296875000000000000000000000000000000000000000000000000 
 Math:  1.4142135623730951454746218587388284504413604736328125000000000000000000000000000000000000000000000000 
 Decimal:  1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641573


In [4]:
# Check whether any two of the results are equal.
# Note: a and b are strings while c is a Decimal, so only a == b could ever be True here.
a == b or b == c or a == c

False
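
The float-based results above agree with the Decimal value only to roughly 12 to 16 significant digits, because a Python float cannot hold 100 decimal places. One possible way to get all 100 digits without any import is to apply the same Newton update to Python's arbitrary-precision integers: the integer square root of $2 \times 10^{200}$ contains the digits of $\sqrt{2}$ to 100 places. The function name `sqrt2_digits` and its `places` parameter are hypothetical, introduced for this sketch only, and the result is truncated rather than rounded.

```python
def sqrt2_digits(places=100):
    """Return the square root of 2 truncated to `places` decimal places, as a string."""
    # Scale 2 by 10**(2 * places) so that its integer square root holds the digits.
    n = 2 * 10 ** (2 * places)
    # Initial guess above the true root, so the integer iterates only decrease.
    z = 2 * 10 ** places
    while True:
        # Newton update z - (z*z - n) / (2*z), rewritten with integer division.
        better = (z + n // z) // 2
        if better >= z:
            # No further decrease: z is the integer square root of n.
            break
        z = better
    digits = str(z)
    # The first digit is the integer part; the rest are the decimal places.
    return digits[0] + "." + digits[1:]


print(sqrt2_digits())
```

Because only built-in integers and strings are used, this stays within the constraint of not importing any module.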

***

### Task 2: The Chi-squared test

***

#### Instructions

The Chi-squared test for independence is a statistical hypothesis test like a t-test. It is used to analyse whether two categorical variables are independent. The Wikipedia article gives the table below as an example [5], stating that the Chi-squared value based on it is approximately *24.6*. Use **scipy.stats** to verify this value and calculate the associated *p* value. You should include a short note with references justifying your analysis in a markdown cell.

![Table](img/img1.png)

[5] Wikipedia; Chi-squared test; https://en.wikipedia.org/w/index.php?title=Chi-squared_test&oldid=983024096 (accessed 1 November 2020)

#### Research
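
As a rough sketch of what the test statistic measures: for each cell of the table, the expected count under independence is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The code below computes this by hand in plain Python. The counts are transcribed here from the example in the cited Wikipedia article, on the assumption that they match the table image above. The p value is not computed in this sketch; `scipy.stats.chi2_contingency` returns the statistic together with the p value, the degrees of freedom and the expected counts.

```python
# Observed counts, assumed to match the table image
# (rows: white collar, blue collar, no collar; columns: neighbourhoods A to D).
observed = [
    [90, 60, 104, 95],
    [30, 50, 51, 20],
    [30, 40, 45, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if occupation and neighbourhood were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, dof)
```

The printed statistic should be close to the value of 24.6 quoted in the instructions.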

#### Solution

***

### Task 3: The standard deviation

***

#### Instructions

The standard deviation of an array of numbers $x$ is calculated using **numpy** as 

`np.sqrt(np.sum((x - np.mean(x))**2)/len(x))`

However, Microsoft Excel has two different versions of the standard deviation calculation, **STDEV.P** and **STDEV.S**. The **STDEV.P** function performs the above calculation, but in the **STDEV.S** calculation the division is by `len(x) - 1` rather than `len(x)`. Research these Excel functions, writing a note in a Markdown cell about the difference between them. Then use **numpy** to perform a simulation demonstrating that the **STDEV.S** calculation is a better estimate for the standard deviation of a population when performed on a sample. Note that part of this task is to figure out the terminology in the previous sentence.

#### Research
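
Here a population is the full set of values of interest and a sample is a subset drawn from it. A minimal sketch of the kind of simulation the task describes is given below. It uses only the standard `random` module rather than **numpy** so that it runs without extra packages; with numpy, `np.std(x, ddof=0)` and `np.std(x, ddof=1)` would correspond to the two calculations. The function names, the sample size of 5, the number of trials and the seed are all assumptions made for illustration.

```python
import random


def stdev_p(values):
    """Standard deviation dividing by n (the STDEV.P calculation)."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


def stdev_s(values):
    """Standard deviation dividing by n - 1 (the STDEV.S calculation)."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5


def simulate(trials=10000, sample_size=5, true_sd=1.0, seed=0):
    """Average both estimates over many small samples from a normal population."""
    rng = random.Random(seed)
    total_p = total_s = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(sample_size)]
        total_p += stdev_p(sample)
        total_s += stdev_s(sample)
    return total_p / trials, total_s / trials


mean_p, mean_s = simulate()
print("Average STDEV.P estimate:", mean_p)
print("Average STDEV.S estimate:", mean_s)
```

With small samples one would expect both averages to fall below the true standard deviation of 1, with the `n - 1` version landing noticeably closer to it.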

#### Solution

***

### Task 4: Make predictions

***

#### Instructions

Use **scikit-learn** to apply *k*-means clustering to Fisher’s famous Iris data set. You will easily obtain a copy of the data set online. Explain in a Markdown cell how your code works and how accurate it might be, and then explain how your model could be used to make predictions of species of iris.

#### Research
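
To illustrate the idea behind *k*-means and how a fitted model makes predictions, the sketch below implements the basic algorithm (often called Lloyd's algorithm) in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The data are hypothetical toy points, not real Iris measurements, and the helper names `nearest` and `k_means` are invented for this example. In scikit-learn the `KMeans` class plays this role through its `fit` and `predict` methods.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids


# Hypothetical toy measurements (not Iris data): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids = k_means(points, centroids=[points[0], points[3]])

# Predicting for a new point means assigning it to the nearest centroid.
print(nearest((0.9, 1.1), centroids))
print(nearest((4.5, 5.5), centroids))
```

Because *k*-means is unsupervised, the cluster indices it produces are arbitrary; to predict a species, each cluster would first have to be matched to the species that dominates it.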

#### Solution