# Task 1
***
#### Write a function sqrt(x) to approximate the square root of a floating point number x without using the power operator or a package

In this code I have used recursion, where a function calls itself in its own body until a stopping condition is met. An initial guess at the square root is made (half of the input number) and checked for accuracy by squaring it and comparing the result with the original number. If the square is within a small tolerance of the number, in this case 0.0001, the guess is returned. If the guess isn't accurate enough, a new guess is calculated as the average of the current guess and the number divided by the current guess, and the function calls itself with that new guess. The programme keeps performing this calculation recursively until the guess comes within the set tolerance, at which point the recursion and the function end.
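The rule for creating a new guess is the Newton-Raphson method applied to $f(x) = x^2 - a$, where $a$ is the number whose root is wanted and $x_n$ is the current guess (these symbols are notation introduced here for illustration, not names from the code):

$$
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - a}{2x_n} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right)
$$

For $a = 64$ and a starting guess of $x_0 = 32$, the guesses are approximately $17,\ 10.38,\ 8.27,\ 8.0045,\ 8.000001$, so the tolerance of 0.0001 is reached on the fifth update.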

I understand that there are schools of thought that recursion is a poor approach in Python: it is perceived as slower than an iterative solution, and because Python's recursion depth is limited to 1000 by default, a deep recursion fails with a `RecursionError` (although the limit can be raised with `sys.setrecursionlimit`). Having researched opinions on the use of recursion, I felt that it was appropriate to use it for this particular task.
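For comparison, the same guess-and-improve rule can be written as a loop, which avoids the recursion limit altogether. This is only a sketch under stated assumptions: the names `sqrt_iterative` and `max_steps` are hypothetical, and the cap on the number of steps is an added safeguard rather than part of the original approach.

```python
def sqrt_iterative(x, tol=0.0001, max_steps=100):
    """Approximate the square root of a non-negative number with a loop."""
    if x < 0:
        raise ValueError("cannot take the square root of a negative number")
    if x == 0:
        return 0.0
    guess = x / 2  # same starting guess as the recursive version
    for _ in range(max_steps):  # hypothetical cap so the loop always ends
        if abs(guess * guess - x) <= tol:
            break
        guess = (guess + x / guess) / 2  # average of the guess and x / guess
    return guess


print(round(sqrt_iterative(64.0), 2))
print(round(sqrt_iterative(2.25), 2))
```

The loop performs exactly the same arithmetic as the recursive calls, so the choice between the two is mainly one of style for inputs that converge in a handful of steps.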

In [5]:
# read in the number and set up the values needed for the calculation
num = float(input("Enter a number: "))  # float, so non-integer input is accepted
given = num/2  # initial guess
tol = 0.0001 # set the tolerance

# this function calls itself with an improved guess until the tolerance is reached
def sqRoot(x):
    if abs(x * x - num) <= tol:
        return x
    x = (x + num/x)/2  # Newton-Raphson update
    return sqRoot(x)

# call the function
root = sqRoot(given)

# round the result to two decimal places for display
roundRoot = round(root, 2)

print("The square root of {} is approximately {}".format(num,roundRoot))

Enter a number:  64


The square root of 64.0 is approximately 8.0
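The cell above relies on the global variables `num` and `tol` and reads its input from the keyboard. Since the task asks for a function `sqrt(x)`, a self-contained variant might look like the sketch below. The optional `guess` and `tol` parameters and the handling of zero and negative input are assumptions added for illustration.

```python
def sqrt(x, guess=None, tol=0.0001):
    """Recursively approximate the square root of a non-negative float."""
    if x < 0:
        raise ValueError("cannot take the square root of a negative number")
    if x == 0:
        return 0.0
    if guess is None:
        guess = x / 2  # first call: start from half of x
    if abs(guess * guess - x) <= tol:
        return guess  # close enough, stop recursing
    # otherwise recurse with the improved guess
    return sqrt(x, (guess + x / guess) / 2, tol)


print(round(sqrt(64.0), 2))
print(round(sqrt(2.0), 2))
```

Passing the guess and tolerance as arguments keeps all state inside the function, so it can be called repeatedly with different numbers.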


## REFERENCES:
* https://hackernoon.com/calculating-the-square-root-of-a-number-using-the-newton-raphson-method-a-how-to-guide-yr4e32zo
* https://data-flair.training/blogs/python-function/
* https://stackoverflow.com/questions/16005123/how-can-i-make-a-recursive-square-root-in-python#:~:text=The%20basic%20strategy%20for%20a,the%20true%20root%20to%20return.
* https://stackoverflow.com/questions/48823833/simple-program-to-find-squre-root-using-recursion/48823931
* https://beapython.dev/2020/05/14/is-recursion-bad-in-python/#:~:text=Recursion%20can%20be%20considered%20bad,calls%20on%20the%20call%20stack.
* https://stackoverflow.com/questions/4278327/danger-of-recursive-functions  

# Task 2
***
#### Consider the below contingency table based on a survey asking respondents whether they prefer coffee or tea and whether they prefer plain or chocolate biscuits. Use scipy.stats to perform a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this instance.


|        | Chocolate | Plain |
|--------|-----------|-------|
| Coffee | 43        | 57    |
| Tea    | 56        | 45    |


In [None]:
from scipy.stats import chi2_contingency

# Define the contingency table with labels
contingency_table = [
    ['Drink', 'Biscuit', 'Count'],  # Column labels
    ['Coffee', 'Chocolate', 43],    # Row 1
    ['Coffee', 'Plain', 57],        # Row 2
    ['Tea', 'Chocolate', 56],       # Row 3
    ['Tea', 'Plain', 45],           # Row 4
]

# Extracting the actual data (excluding column labels)
data = [row[2] for row in contingency_table[1:]]

# Perform chi-squared test
chi2, p, _, _ = chi2_contingency([data[:2], data[2:]])

# Print results
print(f"Chi-squared statistic: {chi2}")
print(f"P-value: {p}")

# Interpret the results
alpha = 0.05
print("\nSignificance Test:")
if p <= alpha:
    print("There is evidence of an association between drink preference and biscuit preference.")
else:
    print("There is no evidence of an association between drink preference and biscuit preference.")


### Method

First I created a contingency table using a list of lists. Column labels are ['Drink', 'Biscuit', 'Count'] and the rows represent different combinations of drink and biscuit preferences, with corresponding counts.

I then extracted the actual count values from the rows of the contingency table and excluded the column labels from the data.

I used `chi2_contingency` from `scipy.stats` to perform a chi-squared test on the data. The function expects a 2D array of observed counts, so I split the data into two lists, one per drink: `data[:2]` holds the Coffee counts (chocolate, plain) and `data[2:]` holds the Tea counts (chocolate, plain).
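As an illustration of the 2D input, the table can also be passed directly as a nested list with one inner list per drink. The sketch below additionally unpacks the degrees of freedom and the expected counts, which `chi2_contingency` returns alongside the statistic and p-value; the variable names are chosen here for illustration.

```python
from scipy.stats import chi2_contingency

# Rows are drinks, columns are biscuit types (chocolate, plain)
observed = [
    [43, 57],  # Coffee
    [56, 45],  # Tea
]

chi2, p, dof, expected = chi2_contingency(observed)

print(f"Degrees of freedom: {dof}")
print(f"Expected counts under independence:\n{expected.round(2)}")
```

The expected counts are what each cell would hold if drink and biscuit preference were independent, given the same row and column totals.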

Having obtained the chi-squared statistic and p-value from the test, I print both values.

The chi-squared statistic is used to quantify the difference between the observed and expected frequencies in a contingency table. In the context of the chi-squared test of independence, the statistic is used to assess whether there is a significant association between the categorical variables represented in the table.
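To make the comparison of observed and expected frequencies concrete, the statistic for this table can be worked out by hand. The sketch below uses only the standard library and makes two assumptions explicit: it applies Yates' continuity correction (which `chi2_contingency` applies by default to 2x2 tables), and it converts the statistic to a p-value with an identity that holds only for one degree of freedom.

```python
import math

observed = [[43, 57], [56, 45]]  # rows: Coffee, Tea; columns: chocolate, plain

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell if the two variables were independent
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson's statistic with Yates' correction: shrink each |O - E| by 0.5
chi2 = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For 1 degree of freedom only: P(X > chi2) = erfc(sqrt(chi2 / 2))
p = math.erfc(math.sqrt(chi2 / 2))

print(f"Chi-squared statistic: {chi2:.4f}")
print(f"P-value: {p:.4f}")
```

The result should agree with the statistic and p-value that `chi2_contingency` reports for the same table.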

If the p-value is less than or equal to the significance level (`alpha`), there is evidence to reject the null hypothesis that the two variables are independent, which indicates an association. Otherwise, the null hypothesis is not rejected and the test provides no evidence of an association. For my purposes I have used the standard significance level of 0.05.

### Finding
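The p-value here is 0.10447218120907394, which is above the significance level of 0.05. The null hypothesis of independence is therefore not rejected: the survey provides no evidence of an association between a person's preferred drink (tea or coffee) and the type of biscuit they prefer (chocolate or plain). This is not the same as showing that no association exists. Note that for a 2x2 table such as this one, `chi2_contingency` applies Yates' continuity correction by default, so the p-value includes that correction.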


## REFERENCES:
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html
* https://www.geeksforgeeks.org/python-pearsons-chi-square-test/
* https://analyticsindiamag.com/a-beginners-guide-to-chi-square-test-in-python-from-scratch/
* https://stats.stackexchange.com/questions/104468/understanding-the-chi-squared-test-and-the-chi-squared-distribution

# Task 3
***
#### Perform a t-test on the famous penguins data set to investigate whether there is evidence of a significant difference in the body mass of male and female gentoo penguins

In [1]:
import pandas as pd
from scipy.stats import ttest_ind

# Load the penguins dataset
penguins = pd.read_csv('penguins.csv')

# Filter data for only Gentoo penguins
gentoo_penguins = penguins[penguins['species'] == 'Gentoo']

# Separate data for male and female Gentoo penguins
male_data = gentoo_penguins[gentoo_penguins['sex'] == 'Male']['body_mass_g']
female_data = gentoo_penguins[gentoo_penguins['sex'] == 'Female']['body_mass_g']

# Perform t-test
t_stat, p_value = ttest_ind(male_data, female_data)

# Print the results
print(f'T-statistic: {t_stat}')
print(f'P-value: {p_value}')

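# Check for statistical significance
alpha = 0.05
if pd.isna(p_value):
    # nan < alpha is False, so an undefined result must be caught before the comparison
    print('The t-test could not be computed: check that both groups are non-empty and contain no missing values.')
elif p_value < alpha:
    print('There is evidence of a significant difference in body mass between male and female Gentoo penguins.')
else:
    print("There is no evidence of a significant difference in body mass between male and female Gentoo penguins.")


T-statistic: nan
P-value: nan
The t-test could not be computed: check that both groups are non-empty and contain no missing values.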
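
A `nan` statistic and p-value mean the test was not actually computed, so no conclusion about the penguins can be drawn from this run. Two common causes are a group that ends up empty because the labels in the file do not match the filter exactly (for example `MALE` rather than `Male`), and missing values in `body_mass_g`. Which of these applies to `penguins.csv` is not known here, so the sketch below reproduces both on a small made-up table (hypothetical values, not the real measurements) and shows one way to guard against them.

```python
import math

import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy data, NOT the real penguins measurements
toy = pd.DataFrame({
    "sex": ["MALE", "MALE", "MALE", "FEMALE", "FEMALE", "FEMALE", None],
    "body_mass_g": [5400.0, 5600.0, None, 4600.0, 4700.0, 4800.0, 5000.0],
})

# Cause 1: the labels are upper case, so filtering on 'Male' matches nothing
print(len(toy[toy["sex"] == "Male"]))  # 0 rows

# Normalise the labels before filtering
sex = toy["sex"].str.capitalize()
male = toy.loc[sex == "Male", "body_mass_g"]
female = toy.loc[sex == "Female", "body_mass_g"]

# Cause 2: a single missing value makes the whole result nan
t_raw, p_raw = ttest_ind(male, female)
print(f"With missing values: t = {t_raw}, p = {p_raw}")

# Dropping missing values first gives a usable result
t_clean, p_clean = ttest_ind(male.dropna(), female.dropna())
print(f"After dropna: t = {t_clean:.3f}, p = {p_clean:.4f}")
```

Passing `nan_policy='omit'` to `ttest_ind` is an alternative to calling `dropna()` on each group. Printing the size of each group before running the test is a quick way to confirm that the filters matched any rows at all.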
