Machine Learning and Statistics

Winter 2023

by Ioan Domsa

***

## Task 1

***

> Square roots are difficult to calculate. In Python, you typically use the power operator (a double asterisk) or a package such
as `math`. In this task you should write a function `sqrt(x)` to approximate the square root of a floating point number x without
using the power operator or a package.

> Rather, you should use Newton’s method. Start with an initial guess for the square root called $z_0$. You then repeatedly
improve it using the following formula, until the difference between the previous guess $z_i$ and the next guess $z_{i+1}$
is less than some threshold, say 0.01.

$$ z_{i+1} = z_i - \frac{z_i \times z_i - x}{2 z_i} $$


In [1]:
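def sqrt(x):
    """Approximate the square root of x using Newton's method."""
    # Negative numbers have no real square root.
    if x < 0:
        raise ValueError("x must be non-negative")
    # Zero is its own square root; returning early avoids dividing by zero below.
    if x == 0:
        return 0.0
    # Initial guess for the square root.
    z1 = x / 2.0
    # Threshold: stop once successive guesses differ by no more than this.
    t = 0.0000001

    # Loop until we are accurate enough.
    while True:
        # Newton's method update.
        z2 = z1 - ((z1 * z1) - x) / (2 * z1)
        # Check the threshold.
        if abs(z2 - z1) <= t:
            break
        z1 = z2
    return z2

x = 3
result = sqrt(x)
print(result)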

1.7320508075688772


In [2]:
# Test function
sqrt(3)

1.7320508075688772

In [3]:
# Check Python's value for square root of 3
3**0.5

1.7320508075688772
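
The single check above can be extended to several inputs. The sketch below is a minimal illustration rather than the notebook's own code: `newton_sqrt` is a hypothetical stand-alone copy of the update rule for positive `x`, and the default tolerance and the list of test values are assumptions chosen for the example.

```python
def newton_sqrt(x, tol=1e-7):
    """Hypothetical helper: Newton's method for the square root of x > 0."""
    z = x / 2.0  # same initial guess as the function above
    while True:
        z_next = z - (z * z - x) / (2 * z)
        if abs(z_next - z) <= tol:
            return z_next
        z = z_next


# Compare against Python's power operator for a few positive inputs.
for x in [0.25, 2, 3, 10, 12345.678]:
    approx = newton_sqrt(x)
    exact = x ** 0.5
    print(x, approx, exact, abs(approx - exact))
```

The last column printed is the absolute difference between the two values, which gives a quick view of how close the approximation is for each input.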

### Notes

***

1. The calculation $z^2 - x$ is exactly zero when $z$ is the square root of $x$. It is greater than zero when $z$ is too big and less than zero when $z$ is too small. Thus $(z^2 - x)^2$ is a good candidate for a cost function.

2. The derivative of the numerator $z^2 - x$ with respect to $z$ is $2z$. That is the denominator of the fraction in the formula from the question.
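
The two notes can be tied together in a short derivation. Writing $f(z) = z^2 - x$ (the symbol $f$ is introduced here only for illustration), the general Newton step $z - f(z)/f'(z)$ gives the formula from the question:

$$ z_{i+1} = z_i - \frac{f(z_i)}{f'(z_i)} = z_i - \frac{z_i^2 - x}{2 z_i} = \frac{1}{2}\left(z_i + \frac{x}{z_i}\right) $$

The last form shows that each new guess is the average of $z_i$ and $x / z_i$.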

***

## Task 2

***

> Consider the below contingency table based on a survey asking respondents whether they prefer coffee or tea and whether they prefer plain or chocolate biscuits.

> Use `scipy.stats` to perform a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this instance.

|           |        | **Biscuit** |       |
|:---------:|:------:|:-----------:|:-----:|
|           |        | Chocolate   | Plain |
| **Drink** | Coffee | 43          | 57    |
|           | Tea    | 56          | 45    |

In [4]:
# Data frames.
import pandas as pd

# Shuffles.
import random

# Statistics.
import scipy.stats as ss

In [5]:
# 43 coffee drinkers who preferred chocolate biscuits
coffee_choc = [["coffee", "chocolate"]] * 43

# Show
coffee_choc

[['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 
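
One caveat about building the rows with `*`, sketched below on the assumption that the rows might later be modified: list repetition copies references, so all 43 entries are the same inner list object. That is harmless in this notebook because the rows are only read, but a list comprehension gives independent rows. The name `independent` is introduced only for this illustration.

```python
# Repetition with * repeats a reference to one inner list.
coffee_choc = [["coffee", "chocolate"]] * 43
print(coffee_choc[0] is coffee_choc[1])

# A comprehension builds a new inner list for every row.
independent = [["coffee", "chocolate"] for _ in range(43)]
print(independent[0] is independent[1])

# Both approaches produce lists that compare equal.
print(coffee_choc == independent)
```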

In [6]:
# 56 tea drinkers who preferred chocolate biscuits
tea_choc = [["tea", "chocolate"]] * 56

# Show
tea_choc

[['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'c

In [7]:
# 57 coffee drinkers who preferred plain biscuits
coffee_plain = [["coffee", "plain"]] * 57

# Show
coffee_plain

[['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['coffee'

In [8]:
# 45 tea drinkers who preferred plain biscuits
tea_plain = [["tea", "plain"]] * 45

# Show
tea_plain

[['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain']]

In [9]:
# Raw data, merge the four lists

raw_data = coffee_choc + coffee_plain + tea_choc + tea_plain

# show raw data
raw_data

[['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 

In [10]:
# Shuffle the data

random.shuffle(raw_data)

# show raw_data
raw_data

[['coffee', 'chocolate'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['coffee', 'plain'],
 ['tea', 'plain'],
 ['coffee', 'chocolate'],
 ['tea', 'chocolate'],
 ['coffee', 'plain'],
 ['tea', 'plain'],
 ['coffee', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['coffee', 'plain'],
 ['tea', 'chocolate'],
 ['coffee', 'plain'],
 ['tea', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'plain'],
 ['tea', 'plain'],
 ['tea', 'plain'],
 ['coffee', 'chocolate'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['coffee', 'chocolate'],
 ['coffee', 'chocolate'],
 ['tea', 'plain'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['tea', 'chocolate'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['coffee', 'plain'],
 ['coffee', 'plain'],
 ['tea', 'plain'],
 ['tea', 'chocolate'],
 ['coffee',
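
`random.shuffle` reorders the list in place, so the four cell counts are unchanged and only the row order differs. The sketch below checks this with `collections.Counter`. The seed value `42` and the names `rows`, `rng`, `before`, and `after` are assumptions made for the illustration; the notebook itself shuffles without a seed.

```python
import random
from collections import Counter

# Rebuild the 201 survey rows as tuples so they can be counted.
rows = (
    [("coffee", "chocolate")] * 43
    + [("coffee", "plain")] * 57
    + [("tea", "chocolate")] * 56
    + [("tea", "plain")] * 45
)
before = Counter(rows)

# A seeded generator makes the shuffle repeatable between runs.
rng = random.Random(42)
rng.shuffle(rows)

# Shuffling changes the order only, never the counts.
after = Counter(rows)
print(before == after)
print(len(rows))
```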

In [11]:
# Zip the list - make the rows columns and the columns rows
# Interchange the outer and inner lists
drink, biscuit = list(zip(*raw_data))

# Show drink, biscuit
drink, biscuit

(('coffee',
  'tea',
  'tea',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'tea',
  'tea',
  'tea',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'coffee',
  'tea',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'tea',
  'tea',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'coffee',
  'tea',
  'coffee',
  'coffee',
  'coffee',
  'tea',
  'tea',
  'tea',
  'coffee',
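
To make the transposition concrete, here is a small sketch on three made-up rows (the list `pairs` is illustrative only and is not part of the survey data). `zip(*pairs)` unpacks the rows as separate arguments and regroups their first and second items.

```python
# Three illustrative [drink, biscuit] rows.
pairs = [["coffee", "chocolate"], ["tea", "plain"], ["tea", "chocolate"]]

# Unpack the rows into zip, which groups items by position.
drinks, biscuits = zip(*pairs)

print(drinks)    # ('coffee', 'tea', 'tea')
print(biscuits)  # ('chocolate', 'plain', 'chocolate')
```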
 

In [12]:
# create a data frame
df = pd.DataFrame({"drink": drink, "biscuit": biscuit})

# show
df

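|     | drink  | biscuit   |
|----:|:-------|:----------|
| 0   | coffee | chocolate |
| 1   | tea    | plain     |
| 2   | tea    | plain     |
| 3   | tea    | chocolate |
| 4   | coffee | plain     |
| ... | ...    | ...       |
| 196 | tea    | chocolate |
| 197 | coffee | plain     |
| 198 | coffee | plain     |
| 199 | coffee | plain     |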


In [13]:
# perform cross tab contingency
cross = ss.contingency.crosstab(df["drink"], df["biscuit"])

# show
cross

((array(['coffee', 'tea'], dtype=object),
  array(['chocolate', 'plain'], dtype=object)),
 array([[43, 57],
        [56, 45]]))
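
As a cross-check on the table above, the same counts can be tallied with the standard library alone. This is a sketch, not a replacement for `ss.contingency.crosstab`; the names `rows`, `tally`, and `table` are assumptions introduced here, and the rows are rebuilt from the task's counts rather than taken from `df`.

```python
from collections import Counter

# Rebuild the survey rows from the counts in the task's table.
rows = (
    [("coffee", "chocolate")] * 43
    + [("coffee", "plain")] * 57
    + [("tea", "chocolate")] * 56
    + [("tea", "plain")] * 45
)

# Count each (drink, biscuit) pair.
tally = Counter(rows)

# Arrange the counts with drinks as rows and biscuits as columns,
# each sorted alphabetically.
drinks = sorted({d for d, _ in rows})
biscuits = sorted({b for _, b in rows})
table = [[tally[(d, b)] for b in biscuits] for d in drinks]

print(drinks, biscuits)
print(table)
```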

In [14]:
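# The counts.
# crosstab returned a plain tuple here, so the counts are its second item;
# cross.count is just the built-in tuple.count method.
cross[1]

array([[43, 57],
       [56, 45]])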

In [15]:
# The first variable and the second.
# The unique values of each variable are the first item of the tuple.
first, second = cross[0]

# Show.
first, second

(array(['coffee', 'tea'], dtype=object),
 array(['chocolate', 'plain'], dtype=object))

In [None]:
# Do the statistics.
# Pass the array of counts (second item of the crosstab tuple), not tuple.count.
result = ss.chi2_contingency(cross[1], correction=False)

# Show.
result

In [None]:
# The expected frequencies if independent.
# Index 3 of the returned result is the expected-frequency array.
result[3]

In [None]:
# Preferred chocolate biscuits irrespective of drink.
99 / 201

0.4925373134328358

In [None]:
# If there is no relationship between drink and biscuit,
# then the proportion of coffee drinkers who like chocolate
# biscuits should match the overall proportion.
100 * (99 / 201)

49.25373134328358

In [None]:
# If there is no relationship between drink and biscuit,
# then the proportion of plain-biscuit fans who are tea
# drinkers should match the overall proportion of tea drinkers.
102 * (101 / 201)

51.25373134328359
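
The expected counts above are the ingredients of the chi-squared statistic. The following sketch works through the whole calculation by hand, without a continuity correction, using only the standard library. The variable names are illustrative, and the p-value line relies on the fact that a 2×2 table has one degree of freedom, for which the chi-squared survival function equals `erfc(sqrt(chi2 / 2))`. It is a cross-check under those assumptions, not a substitute for `ss.chi2_contingency`.

```python
import math

# Observed counts: rows are coffee, tea; columns are chocolate, plain.
observed = [[43, 57], [56, 45]]

row_totals = [sum(row) for row in observed]        # coffee, tea
col_totals = [sum(col) for col in zip(*observed)]  # chocolate, plain
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson's chi-squared statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1) = 1 for a 2x2 table.
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For one degree of freedom the p-value is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)
print(chi2, dof, p_value)
```

The printed p-value can then be compared with a chosen significance level (0.05 is a common choice) to judge whether the data give evidence of an association between drink and biscuit preference.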

***
## End
