# Tasks

Machine Learning

Winter 2023/24

by James Connolly (G00232918)

***

## Task 1

> Square roots are difficult to calculate. In Python, you typically use the power operator (a double asterisk) or a package such as `math`. In this task, you should write a function `sqrt(x)` to approximate the square root of a floating point number `x` without using the power operator or a package.

> Rather, you should start with an initial guess for the square root, called $z_0$. You then repeatedly improve it using the following formula, until the difference between the previous guess $z_i$ and the next guess $z_{i+1}$ is less than some threshold, say 0.01.

$$ z_{i+1} = z_i - \frac{z_i \times z_i - x}{2z_i} $$


In [4]:
def sqrt(x):
    # Initial guess for the square root
    z = 1.0

    # Apply a fixed number of Newton iterations instead of testing a
    # stopping threshold; 100 steps is ample for inputs such as 3.
    for i in range(100):
        # Newton's method for a better approximation
        z = z - (((z*z)-x)/(2*z))

    # z should now be a good approximation for the square root
    return z

In [5]:
# test the function on 3.
sqrt(3)

1.7320508075688774

In [6]:
# Check Python's value for square root of 3.
3**0.5

1.7320508075688772
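
A single test value says little about how the iteration behaves for other inputs. The sketch below is a rough check only: it re-declares the same fixed-iteration loop under the hypothetical name `newton_sqrt` and compares it with `math.sqrt` on a few arbitrarily chosen values. The `math` package is used purely for checking, not inside the function itself.

```python
import math


def newton_sqrt(x, iterations=100):
    """Approximate sqrt(x) with a fixed number of Newton steps (illustrative helper)."""
    z = 1.0  # starting guess of 1.0, matching the cell above
    for _ in range(iterations):
        # Newton's update for f(z) = z*z - x
        z = z - ((z * z - x) / (2 * z))
    return z


# Arbitrary positive test inputs (assumed for illustration only).
for value in [0.25, 2.0, 3.0, 10.0, 12345.678]:
    approx = newton_sqrt(value)
    agrees = math.isclose(approx, math.sqrt(value), rel_tol=1e-12)
    print(value, approx, agrees)
```

Very large inputs may need more than 100 iterations from a starting guess of 1.0, because the early steps only roughly halve the estimate.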

In [21]:
### Alternative answer

def sqrt1(x):

    # Starting point is 'x / 2.0', which is a reasonable estimate based on reference.
    z = x / 2.0
    
    
    # Set a threshold for stop criteria
    # when the difference between consecutive approximations is less than or 
    # equal to this threshold.
    threshold = 0.01

    while True:
        # Use Newton's method to compute a better approximation of the square root.
        z_next = z - ((z * z - x) / (2 * z))
        
        # Check if the absolute difference between the current and next approximation is within the threshold.
        # If it is, we consider 'z_next' to be a good approximation and return it.
        if abs(z_next - z) <= threshold:
            return z_next  
        
        # Update 'z' with the new approximation for the next iteration.
        z = z_next
        

# Test the sqrt1 function on the number 3.
result = sqrt1(3)
print(result)  

# Check with Python Square root operator
python_sqrt = 3**0.5
print(python_sqrt)


1.7320508100147276
1.7320508075688772


Reference - https://www.rookieslab.com/posts/finding-square-root-using-guess-and-check-algorithm-in-python
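
The threshold of 0.01 bounds the size of the last step, not the error in the returned value, which is why the result above already agrees with Python's value to about eight decimal places. The following sketch is a variation on `sqrt1`, not a replacement for it: the name `sqrt_with_tolerance`, the step counter, and the guards for zero and negative input are all assumptions added for illustration. It shows how the number of Newton steps changes as the threshold is tightened.

```python
def sqrt_with_tolerance(x, threshold=0.01):
    """Newton iteration stopped by a step-size threshold.

    Returns a tuple (estimate, steps). Illustrative helper, not part of
    the original answer.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        # A starting guess of 0 would cause a division by zero below.
        return 0.0, 0

    z = x / 2.0  # same starting point as sqrt1
    steps = 0
    while True:
        z_next = z - ((z * z - x) / (2 * z))
        steps += 1
        # Stop once consecutive approximations are close enough.
        if abs(z_next - z) <= threshold:
            return z_next, steps
        z = z_next


for threshold in (0.01, 1e-6, 1e-12):
    estimate, steps = sqrt_with_tolerance(3, threshold)
    print(threshold, steps, estimate)
```

Because Newton's method roughly doubles the number of correct digits per step once it is close to the root, tightening the threshold by many orders of magnitude costs only a step or two more.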

### Notes

***

1. The calculation $z^2 - x$ is exactly zero when $z$ is the square root of $x$. It is greater than zero when $z$ is too big and less than zero when $z$ is too small. Thus $(z^2 - x)^2$ is a good candidate for a cost function.
2. The derivative of the numerator $z^2 - x$ with respect to $z$ is $2z$. This is the denominator of the fraction in the formula from the question.
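
The two notes above can be tied together by writing the update as Newton's root-finding step. This is a sketch of the standard derivation; the symbol $f$ is introduced here for convenience and does not appear in the question.

$$ f(z) = z^2 - x, \qquad f'(z) = 2z $$

$$ z_{i+1} = z_i - \frac{f(z_i)}{f'(z_i)} = z_i - \frac{z_i^2 - x}{2z_i} $$

As a worked step, taking $x = 3$ and $z_0 = 1.5$ gives

$$ z_1 = 1.5 - \frac{1.5^2 - 3}{2 \times 1.5} = 1.5 + 0.25 = 1.75 $$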

***
## Task 1 End

## Task 2

> Consider the below contingency table based on a survey asking respondents whether they prefer coffee or tea and whether they prefer plain or chocolate biscuits. Use `scipy.stats` to perform a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this instance.

|        | Chocolate | Plain |
|--------|-----------|-------|
| Coffee | 43        | 57    |
| Tea    | 56        | 45    |




In [12]:
# Importing the necessary libraries
import pandas as pd
import numpy as np
import scipy.stats as ss

# Defining the variables as Categorical
drink = pd.Categorical(['Coffee', 'Tea'], categories=['Coffee', 'Tea']) 
biscuit = pd.Categorical(['Chocolate', 'Plain'], categories=['Chocolate', 'Plain'])  

# converting the data to a NumPy array
data = np.array([[43, 57], [56, 45]])
# Creating the cross tabulation
cross_tab = pd.DataFrame(data, index=drink, columns=biscuit)  

# Performing the chi-squared test (for a 2x2 table, SciPy applies
# Yates' continuity correction by default)
chi2, p, dof, expected = ss.chi2_contingency(cross_tab)

# Output the results
# Chi-squared statistic
print("Chi-squared statistic:", chi2)
# p-value
print("P-value:", p)
# degrees of freedom
print("Degrees of freedom:", dof)
# expected frequency table
print("Expected frequencies table:")
print(expected)


Chi-squared statistic: 2.6359100836554257
P-value: 0.10447218120907394
Degrees of freedom: 1
Expected frequencies table:
[[49.25373134 50.74626866]
 [49.74626866 51.25373134]]


Based on these results, there is no statistically significant evidence of an association between drink preference and biscuit preference: the p-value (about 0.104) is greater than the conventional significance level of 0.05, so the null hypothesis of independence is not rejected.
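
As a cross-check, the printed statistic can be reproduced by hand using only the standard library. This is a sketch under stated assumptions: it uses the 2x2 counts from the code cell, applies Yates' continuity correction (which `chi2_contingency` applies by default when there is one degree of freedom), and relies on a closed form for the p-value that holds only for one degree of freedom. The variable names are illustrative.

```python
import math

# Observed counts: rows are Coffee, Tea; columns are Chocolate, Plain.
observed = [[43, 57], [56, 45]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-squared statistic with Yates' continuity correction: each |O - E| is
# reduced by 0.5 before squaring. Every |O - E| here exceeds 0.5, so no
# cell needs special handling.
chi2_yates = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom, the chi-squared survival function is
# erfc(sqrt(x / 2)), which gives the p-value without SciPy.
p_value = math.erfc(math.sqrt(chi2_yates / 2))

print("Expected frequencies:", expected)
print("Chi-squared (Yates):", chi2_yates)
print("P-value:", p_value)
```

The values should agree with the SciPy output above up to floating-point rounding, which confirms that the reported statistic includes the continuity correction.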

References
* How to set up the dataframe for cross tab - https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
* Understanding pd.categorical - https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Categorical.html
* Chi-squared example - https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html
* Analysing the results of the p-value - https://study.com/skill/learn/how-to-interpret-the-p-value-for-the-chi-square-test-for-goodness-of-fit-explanation.html

***
## Task 2 End