<p align="center">
  <img src="https://www.edvancer.in/wp-content/uploads/2016/01/ML-vs.-stats1.png">
</p>

## <div align="center">Machine Learning and Statistics: Tasks</div>
### <div align="center">Author: Sean Elliott</div>

----

In [1]:
# Data frames.
import pandas as pd

# Statistics.
import scipy.stats as ss

# For shuffling the data.
import random


## Task 1 
Square roots are difficult to calculate. In Python, you typically use the power operator (a double asterisk) or a package such as 'math'. In this task, you should write a function 'sqrt(x)' to approximate the square root of a floating-point number 'x' without using the power operator or a package.
Rather, you should use Newton's method. Start with an initial guess for the square root called $z_0$. You then repeatedly improve it using the following formula, until the difference between the previous guess $z_i$ and the next guess $z_{i+1}$ is less than some threshold, say 0.01.

$$z_{i+1} = z_i - \frac{z_i * z_i - x}{2z_i} $$

'*' denotes multiplication


In [2]:
# First attempt at writing code for the square root.
def sqrt(x):
  # First guess for the square root.
  z = x / 4.0
  # Loop for a fixed number of iterations rather than testing a threshold.
  for i in range(1000):
    # Newton's method update.
    z = z - (((z * z) - x) / (2 * z))
  # Return z, which should be a good approximation of the square root.
  return z

In [3]:
# Test the function created above.
sqrt(15)

3.8729833462074166

In [4]:
# Compare with Python's built-in power operator.
15**0.5

3.872983346207417
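
The task statement asks for the iteration to stop once successive guesses differ by less than a threshold, whereas the function above runs a fixed 1000 iterations. A minimal sketch of the threshold-based variant might look like the following. The name `sqrt_newton` and the parameters `tol` and `z0` are illustrative assumptions, not part of the task.

```python
def sqrt_newton(x, tol=0.01, z0=None):
    """Approximate the square root of x with Newton's method.

    sqrt_newton, tol and z0 are hypothetical names chosen for this sketch.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        # Avoid dividing by zero in the update step.
        return 0.0
    # Initial guess: x / 4.0, as in the cell above, unless one is supplied.
    z = x / 4.0 if z0 is None else z0
    while True:
        # Newton's method update from the formula in the task statement.
        z_next = z - (z * z - x) / (2 * z)
        # Stop once two successive guesses are closer than tol.
        if abs(z_next - z) < tol:
            return z_next
        z = z
        z = z_next


print(sqrt_newton(15))
```

With the default threshold this stops after only a few iterations, so the result agrees with `15**0.5` to fewer digits than the 1000-iteration version; a smaller `tol` tightens the approximation.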

## References: 

- https://medium.com/@shouke.wei/how-to-embed-an-image-size-and-align-it-in-the-jupyter-notebook-542a2e4e2c98 (Date Accessed: 26/09/2023 19:42)
- https://saturncloud.io/blog/how-to-position-embedded-images-in-jupyter-notebooks-using-markdown/ (Date Accessed: 26/09/2023 19:47)


***

## Task 2 

Consider the contingency table below, based on a survey asking respondents whether they prefer coffee or tea and whether they prefer plain or chocolate biscuits.

| | Chocolate | Plain |
|---|---|---|
| Coffee | 43 | 57 |
| Tea | 56 | 45 |

Use scipy.stats to perform a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this instance.


In [5]:
# Create the data represented in the table so that it can be fed into the program.

coffee_choc = [['Coffee','Chocolate']] * 43
coffee_plain = [['Coffee','Plain']] * 57
tea_choc = [['Tea','Chocolate']] * 56
tea_plain = [['Tea','Plain']] * 45

# Store the four value sets above in one variable, 'data'.
data = coffee_choc + coffee_plain + tea_choc + tea_plain

In [6]:
# Shuffle the order of the rows. This doesn't alter the results; it only ensures that the data doesn't look contrived.
random.shuffle(data)

In [7]:
# Split the [drink, biscuit] pairs into one tuple of drinks and one tuple of biscuits.
drink, biscuit = list(zip(*data))

drink, biscuit

(('Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  ...

In [8]:
# Create the dataframe.
df = pd.DataFrame({'drink': drink, 'biscuit': biscuit})

# Print out the dataframe to ensure it is running as expected.
df


| | drink | biscuit |
|---|---|---|
| 0 | Coffee | Plain |
| 1 | Tea | Plain |
| 2 | Coffee | Plain |
| 3 | Coffee | Chocolate |
| 4 | Tea | Plain |
| ... | ... | ... |
| 196 | Tea | Chocolate |
| 197 | Coffee | Chocolate |
| 198 | Tea | Chocolate |
| 199 | Coffee | Plain |


In [9]:
# Build the contingency table by counting each (drink, biscuit) combination.
cross = ss.contingency.crosstab(df['drink'], df['biscuit'])

# Show.
cross

CrosstabResult(elements=(array(['Coffee', 'Tea'], dtype=object), array(['Chocolate', 'Plain'], dtype=object)), count=array([[43, 57],
       [56, 45]]))
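
The same table can also be built with pandas alone. The following is a minimal sketch, assuming `pandas.crosstab` as an alternative to `ss.contingency.crosstab`; the names `rows`, `df_demo`, `table` and `observed` are illustrative and not taken from the cells above.

```python
import pandas as pd

# Rebuild the survey responses from the four cell counts used in this task.
rows = (
    [("Coffee", "Chocolate")] * 43
    + [("Coffee", "Plain")] * 57
    + [("Tea", "Chocolate")] * 56
    + [("Tea", "Plain")] * 45
)
df_demo = pd.DataFrame(rows, columns=["drink", "biscuit"])

# pandas.crosstab counts how often each (drink, biscuit) pair occurs.
table = pd.crosstab(df_demo["drink"], df_demo["biscuit"])
print(table)

# The underlying NumPy array can be passed to scipy.stats.chi2_contingency.
observed = table.to_numpy()
```

The result is a labelled DataFrame, which can be easier to read than the separate `elements` and `count` fields, at the cost of needing `to_numpy()` before the test.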

In [10]:
# Unpack the row labels (drinks) and column labels (biscuits) of the table.
first, second = cross.elements

# Show the arrays.
first, second

(array(['Coffee', 'Tea'], dtype=object),
 array(['Chocolate', 'Plain'], dtype=object))

In [11]:
# The observed counts.
cross.count

array([[43, 57],
       [56, 45]])

In [12]:
# Chi-squared test of independence, without Yates' continuity correction.
result = ss.chi2_contingency(cross.count, correction=False)

# Show.
result

Chi2ContingencyResult(statistic=3.113937364324669, pvalue=0.07762509678333357, dof=1, expected_freq=array([[49.25373134, 50.74626866],
       [49.74626866, 51.25373134]]))
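
The task asks whether there is evidence of an association, so the p-value needs to be compared with a significance level. The sketch below assumes the conventional 5% level, which is not stated in the task; `alpha` and `conclusion` are illustrative names.

```python
import scipy.stats as ss

# Observed counts: rows are Coffee, Tea; columns are Chocolate, Plain.
observed = [[43, 57], [56, 45]]
alpha = 0.05  # assumed significance level, not given in the task

# Tuple unpacking works for both older and newer SciPy result types.
stat, p_value, dof, expected = ss.chi2_contingency(observed, correction=False)

if p_value < alpha:
    conclusion = "evidence of an association"
else:
    conclusion = "no significant evidence of an association"

print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4f}: {conclusion}")
```

With a p-value of roughly 0.078, the data do not reject independence at the assumed 5% level, although they would at a 10% level.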

In [13]:
# The expected frequencies if drink and biscuit preference are independent.
result.expected_freq

array([[49.25373134, 50.74626866],
       [49.74626866, 51.25373134]])
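
Under independence, each expected count is the row total times the column total, divided by the grand total. A short sketch of that calculation with NumPy follows; the variable names are illustrative.

```python
import numpy as np

# Observed counts: rows are Coffee, Tea; columns are Chocolate, Plain.
observed = np.array([[43, 57], [56, 45]])

row_totals = observed.sum(axis=1)  # respondents per drink
col_totals = observed.sum(axis=0)  # respondents per biscuit
n = observed.sum()                 # all respondents

# Expected count for each cell: row total * column total / grand total.
expected_by_hand = np.outer(row_totals, col_totals) / n
print(expected_by_hand)
```

This should reproduce `result.expected_freq` shown above.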

In [14]:
# Difference between the observed and expected counts.
cross.count - result.expected_freq

array([[-6.25373134,  6.25373134],
       [ 6.25373134, -6.25373134]])

In [15]:
# Squared differences.
(cross.count - result.expected_freq)**2

array([[39.10915571, 39.10915571],
       [39.10915571, 39.10915571]])

In [16]:
# Each cell's contribution to the chi-squared statistic.
(cross.count - result.expected_freq)**2 / result.expected_freq

array([[0.79403437, 0.77068042],
       [0.78617265, 0.76304992]])

In [17]:
# Sum the contributions to recover the chi-squared statistic.
((cross.count - result.expected_freq)**2 / result.expected_freq).sum()

3.113937364324669
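
The cells above compute $\chi^2 = \sum (O - E)^2 / E$ by hand. The p-value can be recovered from that statistic in the same spirit, as the upper tail of a chi-squared distribution. The sketch below assumes `scipy.stats.chi2.sf` for the tail probability; the variable names are illustrative.

```python
import scipy.stats as ss

statistic = 3.113937364324669  # value computed in the cell above
dof = (2 - 1) * (2 - 1)        # (rows - 1) * (columns - 1) for a 2x2 table

# Survival function: probability of a statistic at least this large
# if drink and biscuit preference were independent.
p_value = ss.chi2.sf(statistic, dof)
print(p_value)
```

The value should agree with the `pvalue` field reported by `chi2_contingency` earlier.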