# Tasks: Machine Learning and Statistics

Winter 2023/2024

Author: Daria Sep

***

## Task 1

### Description

***

Square roots are difficult to calculate. In Python, you typically use the power operator (a double asterisk) or a package such
as `math`. In this task, you should write a function `sqrt(x)` to approximate the square root of a floating point number `x` without
using the power operator or a package.

Rather, you should use Newton's method. Start with an initial guess for the square root called $z_0$. You then repeatedly
improve it using the following formula until the difference between some previous guess $z_i$ and the next $z_{i+1}$ is less than some threshold, say 0.01:

$$ z_{i+1} = z_i - \frac{z_i*z_i - x}{2z_i} $$
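As a worked restatement (not part of the original task text): the update is Newton's general rule $z_{i+1} = z_i - f(z_i)/f'(z_i)$ applied to $f(z) = z^2 - x$, whose positive root is $\sqrt{x}$. Substituting $f'(z) = 2z$ gives the formula above, which can also be written as the average of $z_i$ and $x / z_i$:

$$ z_{i+1} = z_i - \frac{f(z_i)}{f'(z_i)} = z_i - \frac{z_i^2 - x}{2z_i} = \frac{1}{2}\left(z_i + \frac{x}{z_i}\right) $$

For example, with $x = 24$ and the initial guess $z_0 = x/2 = 12$, the first step gives $z_1 = \frac{1}{2}(12 + 24/12) = 7$.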

### Solution

***

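```python
def sqrt(x):
    """Approximate the square root of x using Newton's method."""
    # Guard against inputs for which the iteration is undefined or never converges
    if x < 0:
        raise ValueError("sqrt() is not defined for negative numbers")
    if x == 0:
        return 0.0
    # Initial guess for the square root
    z = x / 2.0
    # Threshold on the change between successive approximations
    threshold = 0.01
    
    while True:
        # Calculating the next approximation using Newton's method formula
        z_next = z - (z * z - x) / (2 * z)
        
        # Stopping once successive approximations differ by less than the threshold
        if abs(z_next - z) < threshold:
            break
        
        # Updating the current approximation
        z = z_next
    
    # Returns the guess before the final step (z_next would be slightly more accurate)
    return z
```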

### Tests

***

```python
# Test the function on 101
x = 101.0
result = sqrt(x)
print(f"The square root of {x} is approximately {result}, rounded to 2 decimal places is {result:.2f}")
```

```
The square root of 101.0 is approximately 10.049925395190327, rounded to 2 decimal places is 10.05
```


```python
# Test the function on 13
x = 13
result = sqrt(x)
print(f"The square root of {x} is approximately {result}, rounded to 2 decimal places is {result:.2f}")
```

```
The square root of 13 is approximately 3.6058779145461, rounded to 2 decimal places is 3.61
```


```python
# Test the function on 24
x = 24
result = sqrt(x)
print(f"The square root of {x} is approximately {result}, rounded to 2 decimal places is {result:.2f}")
```

```
The square root of 24 is approximately 4.908512720156556, rounded to 2 decimal places is 4.91
```
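The tests above print the approximations but do not check them against a reference. A minimal sketch of such a check follows, assuming it is acceptable to use `math.sqrt` in testing only (the task forbids it inside the approximation itself). To stay self-contained it repeats the solution's iteration under the hypothetical name `newton_sqrt`, so it does not shadow `sqrt`.

```python
import math

def newton_sqrt(x):
    """Copy of the solution's Newton iteration (positive x assumed), renamed to avoid shadowing sqrt."""
    z = x / 2.0
    threshold = 0.01
    while True:
        z_next = z - (z * z - x) / (2 * z)
        if abs(z_next - z) < threshold:
            break
        z = z_next
    return z

# math.sqrt is used only as a reference value for testing, not inside the approximation
errors = {}
for x in [101.0, 13, 24]:
    errors[x] = abs(newton_sqrt(x) - math.sqrt(x))
    print(f"x = {x}: approximation = {newton_sqrt(x):.6f}, absolute error = {errors[x]:.6f}")
```

For `x = 24` the error is about 0.0095, just under 0.01. The loop stops when the change between guesses falls below the threshold and returns the guess from before that last step, so the threshold limits the step size rather than directly guaranteeing the error.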


### Notes

***

The code above defines a Python function `sqrt(x)` that approximates the square root of a non-negative number `x` using Newton's method. Starting from an initial guess, it repeatedly refines the approximation until two successive guesses differ by less than a fixed threshold. Newton's method is a general numerical root-finding technique commonly used in mathematics and engineering.

1. The function `sqrt(x)` takes a single argument `x`.
2. A variable `z` is initialized to half of `x` as the initial guess for the square root.
3. A `threshold` of `0.01` is set: once successive approximations change by less than this amount, the approximation is considered close enough to the actual square root.
4. A `while` loop runs until this level of accuracy is reached.
5. The next approximation `z_next` is computed from the current approximation `z` and the input `x` using Newton's formula.
6. If the absolute difference between `z_next` and `z` is below `threshold`, the loop breaks; otherwise, `z` is set to `z_next` for the next iteration.
7. The current approximation `z` is returned as the square root of `x`.
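To see the steps above in action, here is a small sketch that records every guess instead of only the final one. The generator name `newton_iterates` is hypothetical (not part of the solution); it uses the same initial guess, update formula, and stopping rule.

```python
import math

def newton_iterates(x, threshold=0.01):
    """Yield every guess the Newton iteration produces, starting from x / 2."""
    z = x / 2.0
    yield z
    while True:
        z_next = z - (z * z - x) / (2 * z)
        yield z_next
        if abs(z_next - z) < threshold:
            return
        z = z_next

guesses = list(newton_iterates(24))
for i, z in enumerate(guesses):
    # Once z is close to the root, the error roughly squares at each step
    print(f"z_{i} = {z:.9f}, error = {abs(z - math.sqrt(24)):.2e}")
```

The value returned by `sqrt(24)` corresponds to the second-to-last element of `guesses`; the last element is the extra step that triggered the stopping rule.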

### References

***

Agrawal U. (2022). *Find root of a number using Newton’s method.* Available online at <https://www.geeksforgeeks.org/find-root-of-a-number-using-newtons-method/>

Markdown Guide (n.d.). *Basic Syntax - The Markdown elements outlined in the original design document.* Available online at <https://www.markdownguide.org/basic-syntax/#overview>

Strang G., Herman E. (2016). *Newton’s Method. Calculus Volume 1.* Available online at <https://math.libretexts.org/Bookshelves/Calculus/Calculus_(OpenStax)/04%3A_Applications_of_Derivatives/4.09%3A_Newtons_Method>

***

## Task 2

### Description

***

Consider the contingency table below, based on a survey asking respondents whether they prefer coffee or tea and whether they
prefer plain or chocolate biscuits. Use `scipy.stats` to perform a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this instance.

| Drink/Biscuit| Chocolate | Plain |
| :----: | :----: | :----: |
| Coffee | 43 | 57 |
| Tea | 56 | 45 |
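
For reference, a sketch of what the test computes, assuming the standard Pearson statistic without continuity correction. For an $r \times c$ table with observed counts $O_{ij}$, row totals $R_i$, column totals $C_j$ and grand total $N$:

$$ E_{ij} = \frac{R_i C_j}{N}, \qquad \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \text{dof} = (r-1)(c-1) $$

For this table, $R = (100, 101)$, $C = (99, 102)$ and $N = 201$, so for example $E_{\text{Coffee, Chocolate}} = 100 \cdot 99 / 201 \approx 49.25$, and $\text{dof} = (2-1)(2-1) = 1$.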


### Solution

***

#### Imports

```python
import pandas as pd
import random
import scipy.stats as ss
```

#### Data

```python
# One [drink, biscuit] pair per respondent, repeated by the cell count in the table
coffee_chocolate = [['Coffee', 'Chocolate']] * 43
tea_chocolate = [['Tea', 'Chocolate']] * 56
coffee_plain = [['Coffee', 'Plain']] * 57
tea_plain = [['Tea', 'Plain']] * 45

# Merging the four lists
raw_data = coffee_chocolate + coffee_plain + tea_chocolate + tea_plain

# Shuffling the data (no seed is set, so the row order differs between runs)
random.shuffle(raw_data)

# Unzipping the pairs into a tuple of drinks and a tuple of biscuits
drink, biscuit = zip(*raw_data)

# Creating a dataframe
df = pd.DataFrame({'drink': drink, 'biscuit': biscuit})

df
```

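| | drink | biscuit |
| :----: | :----: | :----: |
| 0 | Tea | Chocolate |
| 1 | Tea | Chocolate |
| 2 | Coffee | Plain |
| 3 | Tea | Plain |
| 4 | Tea | Plain |
| ... | ... | ... |
| 196 | Coffee | Plain |
| 197 | Tea | Plain |
| 198 | Coffee | Chocolate |
| 199 | Coffee | Plain |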


#### Contingency table

```python
cross = ss.contingency.crosstab(df['drink'], df['biscuit'])
cross
```

```
CrosstabResult(elements=(array(['Coffee', 'Tea'], dtype=object), array(['Chocolate', 'Plain'], dtype=object)), count=array([[43, 57],
       [56, 45]]))
```

```python
cross.count
```

```
array([[43, 57],
       [56, 45]])
```
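pandas also provides `pd.crosstab`, which builds the same table directly as a labelled DataFrame. The sketch below rebuilds the survey data in a self-contained way; the `counts` dictionary, the names `df_alt` and `table`, and the fixed seed are assumptions added here for illustration and reproducibility, not part of the original solution.

```python
import random

import pandas as pd

# Cell counts from the task's contingency table
counts = {("Coffee", "Chocolate"): 43, ("Coffee", "Plain"): 57,
          ("Tea", "Chocolate"): 56, ("Tea", "Plain"): 45}

# One (drink, biscuit) row per respondent
rows = [pair for pair, n in counts.items() for _ in range(n)]
random.seed(0)  # assumption: fixed seed only so the shuffle is reproducible
random.shuffle(rows)

df_alt = pd.DataFrame(rows, columns=["drink", "biscuit"])

# Count each drink/biscuit combination; rows and columns are sorted by label
table = pd.crosstab(df_alt["drink"], df_alt["biscuit"])
print(table)
```

Whatever the shuffled order, the counts recovered by `pd.crosstab` match the original table, which is why shuffling does not affect the test.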

#### Chi-Square Test

```python
# correction=False disables Yates' continuity correction, which SciPy applies by default when dof == 1
chi2, p, dof, expected = ss.chi2_contingency(cross.count, correction=False)
print(f"Chi-Square Statistic: {chi2}")
print(f"P-value: {p}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies Table:")
print(expected)
```

```
Chi-Square Statistic: 3.113937364324669
P-value: 0.07762509678333357
Degrees of Freedom: 1
Expected Frequencies Table:
[[49.25373134 50.74626866]
 [49.74626866 51.25373134]]
```
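As a cross-check, the statistic, degrees of freedom, and p-value can be computed by hand with NumPy from the observed counts. This is a minimal sketch of the Pearson test without continuity correction (matching `correction=False` above); the `_manual` variable names are chosen here so they do not overwrite the results of the previous cell.

```python
import numpy as np
import scipy.stats as ss

# Observed counts (rows: Coffee, Tea; columns: Chocolate, Plain)
observed = np.array([[43, 57], [56, 45]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson statistic (no continuity correction) and its degrees of freedom
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# The p-value is the upper-tail probability of the chi-squared distribution
p_manual = ss.chi2.sf(chi2_manual, dof_manual)
print(chi2_manual, p_manual, dof_manual)
print(expected_manual)
```

All expected counts are around 50, comfortably above the usual rule-of-thumb minimum of 5 for the chi-squared approximation.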


#### Interpretation

```python
# H0 (null hypothesis): there is no association between drink preference and biscuit preference.

alpha = 0.05
print("P-value is " + str(p))


if p <= alpha:
    print('Reject H0 - There is a significant association between "drink" and "biscuit".')
else:
    print('Fail to reject H0 - There is no significant evidence of an association between "drink" and "biscuit".')
```

```
P-value is 0.07762509678333357
Fail to reject H0 - There is no significant evidence of an association between "drink" and "biscuit".
```
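The solution passes `correction=False`, whereas `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom. A small sketch comparing both choices on the same counts (the variable names here are illustrative):

```python
from scipy.stats import chi2_contingency

observed = [[43, 57], [56, 45]]
alpha = 0.05

# Pearson statistic without continuity correction, as in the solution above
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)
# SciPy's default: Yates' continuity correction, applied because dof == 1 here
stat_yates, p_yates, _, _ = chi2_contingency(observed)

for label, p_value in [("no correction", p_plain), ("Yates' correction", p_yates)]:
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"{label}: p = {p_value:.4f} -> {decision}")
```

The correction makes the test more conservative (smaller statistic, larger p-value). Here both p-values exceed 0.05, so the conclusion does not depend on this choice.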


### Notes

***

1. Four lists (`coffee_chocolate`, `tea_chocolate`, `coffee_plain`, and `tea_plain`) are created. Each contains one `[drink, biscuit]` pair per respondent, repeated as many times as the corresponding cell count in the table.
2. These four lists are merged into a single list called `raw_data`.
3. The order of elements in `raw_data` is shuffled using `random.shuffle(raw_data)`, randomizing the row order without changing the counts.
4. The `zip(*raw_data)` operation transposes the list of pairs into two tuples, one of drinks and one of biscuits.
5. A pandas DataFrame `df` is created from these two tuples, with "drink" and "biscuit" as its columns.
6. A contingency table `cross` is created using `ss.contingency.crosstab()` to count the occurrences of each combination of "drink" and "biscuit" in the DataFrame.
7. A chi-squared test of independence is performed on the contingency table with `correction=False` (no Yates' continuity correction), and the test statistic, p-value, degrees of freedom, and expected frequencies are printed.
8. The significance level (`alpha`) is set to 0.05.
9. The p-value is compared with `alpha` to determine whether there is a significant association between "drink" and "biscuit".
10. Since the p-value (about 0.078) is greater than 0.05, H0 is not rejected: the data provide no significant evidence of an association between "drink" and "biscuit" at the 5% level. This does not prove that the two preferences are independent.

### References

***

Brownlee J. (2019). *A Gentle Introduction to the Chi-Squared Test for Machine Learning.* Available online at <https://machinelearningmastery.com/chi-squared-test-for-machine-learning/>

GeeksForGeeks (n.d.). *Python – Pearson’s Chi-Square Test.* Available online at <https://www.geeksforgeeks.org/python-pearsons-chi-square-test/>

Markdown Guide (n.d.). *Extended Syntax - Advanced features that build on the basic Markdown syntax.* Available online at <https://www.markdownguide.org/extended-syntax/#markdown-processors>

Mulani S. (2021). *Chi-square test in Python — All you need to know!!* Available online at <https://www.askpython.com/python/examples/chi-square-test>

***

## End