## Tasks

Machine Learning and Statistics

ATU 

Winter 2023/24

Lecturer: Ian McLoughlin

Author: Jamie Tohall


***

<br/>

### Task 1

> Square roots are difficult to calculate. In Python, you typically use the power operator (a double asterisk) or a package such as 'math'. In this task, you should write a function 'sqrt(x)' to approximate the square root of a floating point number 'x' without using the power operator or a package.
>
> Rather, you should use Newton’s method. Start with an initial guess for the square root called $z_0$. You then repeatedly improve it using the following formula, until the difference between some previous guess $z_i$ and the next $z_{i+1}$ is less than some threshold, say 0.01.

$$ z_{i+1} = z_i - \frac{z_i \times z_i - x}{2 z_i} $$

In [238]:
# Define the function.
def sqrt(x):
    # Initial guess for the square root.
    z = x / 4.0

    # Iterate a fixed number of times for accuracy (range increased from 100 to 1000).
    for i in range(1000):
        # Newton's method: refine the approximation.
        z = z - (((z * z) - x) / (2 * z))

    # z is now a close approximation of the square root.
    return z

In [239]:
# Testing the outcome of the function

sqrt(11)

3.3166247903554

In [240]:
# Using Python to compare results

11**0.5

3.3166247903554
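
The task statement describes stopping once successive guesses differ by less than a threshold, whereas the function above runs a fixed number of iterations. A minimal sketch of the threshold-based variant follows. The name `sqrt_newton` and the parameters `tol` and `max_iter` are assumptions introduced here for illustration, and the guards for zero and negative input are additions rather than part of the task.

```python
def sqrt_newton(x, tol=0.01, max_iter=1000):
    """Approximate the square root of x with Newton's method.

    Hypothetical helper: stops when two successive guesses differ
    by less than tol, or after max_iter iterations.
    """
    # Added guards (assumptions): avoid division by zero and negative input.
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0

    # Same initial guess as the function above.
    z = x / 4.0

    for _ in range(max_iter):
        # One Newton step: z_{i+1} = z_i - (z_i * z_i - x) / (2 * z_i).
        z_next = z - (z * z - x) / (2 * z)

        # Stop once the improvement falls below the threshold.
        if abs(z_next - z) < tol:
            return z_next

        z = z_next

    return z


approx = sqrt_newton(11)
```

With the default threshold of 0.01 the loop ends after only a few steps, so the result is close to, but not identical to, the fixed-iteration value shown above.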

***

<br/>

### Task 2

> Consider the below contingency table based on a survey asking respondents whether they prefer coffee or tea and whether they prefer plain or chocolate biscuits. Use 'scipy.stats' to perform a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this instance.

<br/>


<center>

|         |         | Biscuit   |       |
|---------|---------|-----------|-------|
|         |         |Chocolate  |Plain  |
|**Drink**| Coffee  |   43      |  57   |
|         |  Tea    |   56      |  45   |

</center>


<br/>

### Importing Modules

In [241]:
# Data frames.
import pandas as pd

# Shuffles.
import random

import numpy as np

# Statistics.
import scipy.stats as ss

from scipy.stats import chi2_contingency

from scipy.stats.contingency import crosstab



<br/>

### Raw Data


In [242]:
# 43 people in total preferred the coffee and chocolate biscuit combination

coffee_chocolate = [['Coffee', 'Chocolate']] * 43

# Output
coffee_chocolate

[['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ...

In [243]:
# 56 people preferred tea and chocolate biscuit

tea_chocolate = [['Tea', 'Chocolate']] * 56

# Output
tea_chocolate

[['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ...

In [244]:
# 57 people preferred Coffee with a plain biscuit

coffee_plain = [['Coffee', 'Plain']] * 57

# Output
coffee_plain

[['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Plain'],
 ...

In [245]:
# 45 people preferred tea with a plain biscuit

tea_plain = [['Tea', 'Plain']] * 45

# Output
tea_plain

[['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain']]

<br/>

### Merging raw data

In [246]:
raw_data = coffee_chocolate + tea_chocolate + coffee_plain + tea_plain

raw_data

[['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ...

In [247]:
# Shuffle the data.
random.shuffle(raw_data)

# Show.
raw_data

[['Tea', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Coffee', 'Chocolate'],
 ['Coffee', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Plain'],
 ['Tea', 'Plain'],
 ['Coffee', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Coffee', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Plain'],
 ['Coffee', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Coffee', 'Chocolate'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Coffee', 'Plain'],
 ['Tea', 'Chocolate'],
 ['Tea', 'Plain'],
 ['Coffee', 'Chocolate'],
 ...

In [248]:
# Zip the list - make the rows columns and the columns rows.
# Interchanges the outer and inner lists.
Beverage, Biscuit = list(zip(*raw_data))

# Show.
Beverage, Biscuit

(('Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Coffee',
  'Tea',
  'Tea',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  'Coffee',
  'Tea',
  'Coffee',
  'Coffee',
  ...

In [249]:
# Create a data frame with two lists, 'beverage' and 'biscuit'.
df = pd.DataFrame({'Beverage': Beverage, 'Biscuit': Biscuit})

# Show
df

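|     | Beverage | Biscuit   |
|-----|----------|-----------|
| 0   | Tea      | Plain     |
| 1   | Tea      | Chocolate |
| 2   | Tea      | Chocolate |
| 3   | Coffee   | Plain     |
| 4   | Tea      | Chocolate |
| ... | ...      | ...       |
| 196 | Tea      | Chocolate |
| 197 | Tea      | Chocolate |
| 198 | Coffee   | Chocolate |
| 199 | Coffee   | Plain     |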


<br/>

### Contingency Table

In [271]:
# Build the contingency table with pandas crosstab.
# margins=True adds the 'All' row and column totals; the result is stored in Table.
Table = pd.crosstab(index=df['Beverage'], columns=df['Biscuit'], margins=True)

# Show 
Table

| Beverage | Chocolate | Plain | All |
|----------|-----------|-------|-----|
| Coffee   | 43        | 57    | 100 |
| Tea      | 56        | 45    | 101 |
| All      | 99        | 102   | 201 |


In [272]:
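# Note: Table includes the 'All' margins, so this call treats it as a 3x3 table.
# The statistic is unaffected (margin cells equal their expected values), but the
# degrees of freedom are 4 rather than 1, so the p-value is larger than for the 2x2 counts.
chisquare = ss.chi2_contingency(Table, correction=False)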

chisquare

(3.113937364324669,
 0.5389425856850661,
 4,
 array([[ 49.25373134,  50.74626866, 100.        ],
        [ 49.74626866,  51.25373134, 101.        ],
        [ 99.        , 102.        , 201.        ]]))

In [268]:
# Perform Crosstabs Contingency.
cross = ss.contingency.crosstab(df['Beverage'], df['Biscuit'])

# Show.
cross

((array(['Coffee', 'Tea'], dtype=object),
  array(['Chocolate', 'Plain'], dtype=object)),
 array([[43, 57],
        [56, 45]]))

In [261]:
# Unpack the result: the unique labels of each variable, and the matrix of counts.
first, second = cross

# Show.
first, second

((array(['Coffee', 'Tea'], dtype=object),
  array(['Chocolate', 'Plain'], dtype=object)),
 array([[43, 57],
        [56, 45]]))


<br/>

### Statistical Test

In [253]:

data = pd.DataFrame({'Biscuit': ['Chocolate', 'Plain'],
                     'Coffee': [43, 57],
                     'Tea': [56, 45]})


data.set_index('Biscuit', inplace=True)

In [254]:
# Unpack the test statistic, p-value, degrees of freedom and expected counts.
chi2, p, dof, expected = ss.chi2_contingency(data)

# Show.
print("Chi-squared value:", chi2)
print("P-value:", p)

Chi-squared value: 2.6359100836554257
P-value: 0.10447218120907394
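
As a check on what the test computes, the statistic can be worked out by hand from the 2x2 counts. The sketch below uses only the standard library. The function name `chi_squared` and its `yates` flag are assumptions for illustration. The continuity correction is written in its simple form (subtracting 0.5 from each absolute difference), which is adequate here because every difference exceeds 0.5. The p-value line relies on the identity that, for one degree of freedom, the chi-squared tail probability equals `erfc(sqrt(x / 2))`.

```python
import math

# Observed counts: rows are Coffee and Tea, columns are Chocolate and Plain.
observed = [[43, 57],
            [56, 45]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed, expected, yates=False):
    """Sum of (|O - E| - adjust)^2 / E over all cells (hypothetical helper)."""
    adjust = 0.5 if yates else 0.0
    return sum((abs(o - e) - adjust) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


stat_plain = chi_squared(observed, expected)              # no correction
stat_yates = chi_squared(observed, expected, yates=True)  # with continuity correction

# Tail probability for 1 degree of freedom.
p_yates = math.erfc(math.sqrt(stat_yates / 2))

print(stat_plain, stat_yates, p_yates)
```

Up to floating-point rounding, `stat_plain` should match the uncorrected statistic reported earlier (about 3.114), while `stat_yates` and `p_yates` should match the corrected values printed above (about 2.636 and 0.104).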



<br/>

### Results

***

<br/>

### Task 3

> Perform a t-test on the famous penguins data set to investigate whether there is evidence of a significant difference in the body mass of male and female gentoo penguins.

Introduction

Background on penguins dataset


In [None]:
# Read in dataset

#Load penguin dataset
df = pd.read_csv('')

#Show 
df
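
Since the data set path above is still blank, the following only sketches the statistic a two-sample t-test is built on, using the standard library. It computes Welch's form of the t statistic (which does not assume equal variances); the function name `welch_t` is an assumption, and the two short lists are toy values for illustration, not penguin measurements.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (hypothetical helper)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)

    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)

    # Standard error of the difference between the two means.
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))

    return (mean_a - mean_b) / standard_error


# Toy values only, not real measurements.
toy_a = [1.0, 2.0, 3.0]
toy_b = [2.0, 3.0, 4.0]

t_stat = welch_t(toy_a, toy_b)
print(t_stat)
```

A statistic far from zero, relative to the appropriate t distribution, would count as evidence of a difference in means; the p-value calculation is left to a statistics library.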


***

<br/>

### Task 4

> Using the famous iris data set, suggest whether the setosa class is easily separable from the other two classes. Provide evidence for your answer.


***

<br/>

### Task 5

> Perform Principal Component Analysis on the iris data set, reducing the number of dimensions to two. Explain the purpose of the analysis and your results. 

***

<br/>

### References

Task 1

1. https://atlantictu-my.sharepoint.com/:v:/r/personal/ian_mcloughlin_atu_ie/Documents/student_shares/machine_learnning_and_statistics/1_general/t01v11_task_one_and_repo.mkv?csf=1&web=1&e=kuIJoM&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19

2. https://go.dev/tour/flowcontrol/8

3. https://math.mit.edu/~stevenj/18.335/newton-sqrt.pdf

<br/>

Task 2

1. https://en.wikipedia.org/wiki/Chi-squared_test

2. https://www.jmp.com/en_be/statistics-knowledge-portal/chi-square-test.html

3. https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/8-chi-squared-tests

4. https://statistics.laerd.com/spss-tutorials/chi-square-test-for-association-using-spss-statistics.php

5. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html

6. https://medium.com/swlh/how-to-run-chi-square-test-in-python-4e9f5d10249d


<br/>

Task 3

1. https://www.researchgate.net/publication/361755492_Data_Analysis_Using_Statistical_Methods_Case_Study_of_Categorizing_the_Species_of_Penguin

2.

Task 4

Task 5