# Tasks

Machine Learning and Statistics

Winter 2023/2024

Author: Sofiia Meteliuk


## Task 1

*Square roots are difficult to calculate. In Python, you typically
use the power operator (a double asterisk) or a package such
as `math`.*

*In this task, you should write a function `sqrt(x)` to
approximate the square root of a floating point number x without
using the power operator or a package. Rather, you should use Newton's method. Start with an
initial guess for the square root called $z_0$. You then repeatedly
improve it using the following formula, until the difference between some previous guess $z_i$ and the next $z_{i+1}$ is less than some threshold, say 0.01.*

$$ z_{i+1} = z_i - \frac{z_i \times z_i - x}{2 z_i} $$
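The update rule can be read as Newton's method applied to the function $f(z) = z^2 - x$, whose positive root is $\sqrt{x}$. A brief derivation follows; the name $f$ is introduced here only for illustration and does not appear in the task text.

$$ f(z) = z^2 - x, \qquad f'(z) = 2z $$

$$ z_{i+1} = z_i - \frac{f(z_i)}{f'(z_i)} = z_i - \frac{z_i \times z_i - x}{2 z_i} = \frac{1}{2}\left(z_i + \frac{x}{z_i}\right) $$

The last form shows that each new guess is the average of the current guess and $x$ divided by the current guess.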


In [1]:
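def sqrt(x, threshold=0.01, initial_guess=None):
    # A negative number has no real square root, and x = 0 would lead to
    # a division by zero in the update below, so handle both cases up front
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0

    # If no initial guess is provided, start with x/2
    z0 = initial_guess if initial_guess is not None else x / 2.0

    while True:
        # Newton's method formula
        z1 = z0 - (z0 * z0 - x) / (2 * z0)

        # Check if the difference between consecutive guesses is less than the threshold
        if abs(z1 - z0) < threshold:
            return z1

        # Update the guess for the next iteration
        z0 = z1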



In [2]:
# Example usage
x = 25.0
result = sqrt(x)
print(f"The square root of {x} is approximately {result}")


The square root of 25.0 is approximately 5.000000000016778


In [3]:
# Example usage
x = 16
result = sqrt(x)
print(f"The square root of {x} is approximately {result}")


The square root of 16 is approximately 4.0000001858445895
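To see where the trailing digits in this result come from, it can help to print every guess rather than only the last one. The sketch below assumes the same update rule and the same default starting guess of x/2; `sqrt_trace` is a hypothetical helper introduced here for illustration, not part of the task solution.

```python
def sqrt_trace(x, threshold=0.01):
    """Return the list of all guesses made by Newton's method, starting from x / 2."""
    guesses = [x / 2.0]
    while True:
        z0 = guesses[-1]
        # Same Newton update as in the sqrt function
        z1 = z0 - (z0 * z0 - x) / (2 * z0)
        guesses.append(z1)
        # Stop once two consecutive guesses are closer than the threshold
        if abs(z1 - z0) < threshold:
            return guesses


trace = sqrt_trace(16)
for i, z in enumerate(trace):
    print(f"z{i} = {z}")
```

The loop stops as soon as two consecutive guesses differ by less than the threshold, so the final guess is usually much closer to the true root than 0.01, but it is not guaranteed to be exact.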


I think that for scientific purposes it is better to leave the answer as 4.0000001858445895, but if we were programming a calculator, for example, this answer would not satisfy the user.
In this version, I added an optional parameter *decimals* to the sqrt function (renamed `sqrt2`), which specifies the number of decimal places to round the result to. The built-in `round` function is then used to round the result to that number of decimals.

In [4]:
def sqrt2(x, threshold=0.01, initial_guess=None, decimals=2):
    # A negative number has no real square root, and x = 0 would lead to
    # a division by zero in the update below, so handle both cases up front
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0

    # If no initial guess is provided, start with x/2
    z0 = initial_guess if initial_guess is not None else x / 2.0

    while True:
        # Newton's method formula
        z1 = z0 - (z0 * z0 - x) / (2 * z0)

        # Check if the difference between consecutive guesses is less than the threshold
        if abs(z1 - z0) < threshold:
            # Round the result to the specified number of decimals
            return round(z1, decimals)

        # Update the guess for the next iteration
        z0 = z1


In [5]:
# Example usage
x = 25.0
result = sqrt2(x)
print(f"The square root of {x} is approximately {result}")

The square root of 25.0 is approximately 5.0
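One caveat about rounding: the number of decimals shown is only meaningful if the threshold is tight enough to support it. The sketch below is an illustration under that assumption; `newton_sqrt` is a hypothetical stand-in that repeats the same update rule without rounding, so the effect of the threshold can be seen on its own.

```python
def newton_sqrt(x, threshold):
    """Newton's method for the square root of a positive x, without rounding."""
    z0 = x / 2.0
    while True:
        z1 = z0 - (z0 * z0 - x) / (2 * z0)
        if abs(z1 - z0) < threshold:
            return z1
        z0 = z1


# Same input, two different stopping thresholds
coarse = newton_sqrt(2.0, threshold=0.01)
fine = newton_sqrt(2.0, threshold=1e-10)

# Rounding both to six decimals shows that the coarse run is not
# accurate to that many places
print(round(coarse, 6), round(fine, 6))
```

With the default threshold of 0.01, the two rounded values differ in the sixth decimal place, so asking for more decimals generally calls for a smaller threshold as well.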


_____

## Task 2


Consider the contingency table below, based on a survey asking
respondents whether they prefer coffee or tea and whether they
prefer plain or chocolate biscuits. Use `scipy.stats` to perform
a chi-squared test to see whether there is any evidence of an association between drink preference and biscuit preference in this
instance.

|                | Chocolate | Plain |
| -------------- | --------- | ----- |
| **Coffee**     |    43     |   57  |
| **Tea**        |    56     |   45  |


In [6]:
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table
observed_data = np.array([[43, 57],  # Coffee
                          [56, 45]])  # Tea

# Perform chi-squared test
chi2_stat, p_value, _, _ = chi2_contingency(observed_data)

# Output the results
print(f"Chi-squared Statistic: {chi2_stat}")
print(f"P-value: {p_value}")

# Check for significance
alpha = 0.05
if p_value < alpha:
    print("There is evidence of an association between drink preference and biscuit preference.")
else:
    print("There is no significant evidence of an association between drink preference and biscuit preference.")


Chi-squared Statistic: 2.6359100836554257
P-value: 0.10447218120907394
There is no significant evidence of an association between drink preference and biscuit preference.
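As a cross-check, the statistic can be recomputed by hand from the expected counts. The sketch below uses only the standard library and rests on two assumptions: that `chi2_contingency` applies Yates' continuity correction by default for a 2×2 table (its `correction=True` default), and that for one degree of freedom the chi-squared tail probability equals `erfc(sqrt(x / 2))`. The helper name `chi_squared` is hypothetical.

```python
import math

observed = [[43, 57],   # Coffee: chocolate, plain
            [56, 45]]   # Tea: chocolate, plain

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)


def chi_squared(table, yates):
    """Sum (|observed - expected| [- 0.5])^2 / expected over all cells."""
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of drink and biscuit preference
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(obs - expected)
            if yates:
                # Continuity correction: shrink each difference by 0.5
                diff = max(diff - 0.5, 0.0)
            stat += diff * diff / expected
    return stat


corrected = chi_squared(observed, yates=True)
uncorrected = chi_squared(observed, yates=False)

# A 2x2 table has one degree of freedom, so the p-value is erfc(sqrt(x / 2))
p_corrected = math.erfc(math.sqrt(corrected / 2))
p_uncorrected = math.erfc(math.sqrt(uncorrected / 2))

print(f"With correction:    chi2 = {corrected}, p = {p_corrected}")
print(f"Without correction: chi2 = {uncorrected}, p = {p_uncorrected}")
```

The corrected value agrees with the statistic reported above. Without the correction the statistic is larger (roughly 3.11), but the p-value stays above 0.05, so the conclusion does not change.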


________________________________________

## Task 3

Perform a t-test on the famous penguins data set to investigate whether there is evidence of a significant difference in the body
mass of male and female Gentoo penguins.

The data comes from the data repository for seaborn examples: https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv


In [27]:
# Importing necessary libraries
import pandas as pd
from scipy.stats import ttest_ind

# Load the penguins dataset
penguins_df = pd.read_csv('penguins.csv')

# Filter data for Gentoo penguins
gentoo_penguins = penguins_df[penguins_df['species'] == 'Gentoo']

# Separate data for male and female Gentoo penguins
male_body_mass = gentoo_penguins[gentoo_penguins['sex'] == 'MALE']['body_mass_g']
female_body_mass = gentoo_penguins[gentoo_penguins['sex'] == 'FEMALE']['body_mass_g']

# Perform independent two-sample t-test
t_statistic, p_value = ttest_ind(male_body_mass, female_body_mass)

# Display t-statistic and p-value
print(f'T-Statistic: {t_statistic}')
print(f'P-Value: {p_value}')

# Check for statistical significance (common significance level is 0.05)
if p_value < 0.05:
    print('There is evidence of a significant difference in body mass between male and female Gentoo penguins.')
else:
    print('There is no significant difference in body mass between male and female Gentoo penguins.')




T-Statistic: 14.721676481405709
P-Value: 2.133687602018886e-28
There is evidence of a significant difference in body mass between male and female Gentoo penguins.


-  **Understanding the Two-Sample T-Test:**

  The two-sample t-test is a statistical method used to assess if there is a significant difference between the means of two independent groups. It is particularly useful when comparing means of numerical data from two distinct groups.

  -  **Why Use the T-Test:**
    - The t-test helps determine if observed differences between groups are statistically significant or could have occurred by chance.
    - Commonly applied in scientific research and data analysis to assess group-level differences.

  -  **Penguins Dataset T-Test Outcome:**
    - In the penguins dataset, a two-sample t-test was conducted on the body mass of male and female Gentoo penguins.
    - The resulting p-value was compared to a significance level of 0.05.
    - The test gave a t-statistic of about 14.72 and a p-value of about 2.13e-28, which is far below 0.05, so there is strong evidence of a significant difference in mean body mass between male and female Gentoo penguins. The positive t-statistic means that the male sample mean is the larger of the two.
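To make the test statistic less of a black box, the sketch below computes a pooled-variance (Student's) t-statistic by hand, which is what `ttest_ind` calculates with its default `equal_var=True`. The function name `pooled_t_statistic` is hypothetical, and the two small samples are made-up numbers for illustration only; they are not penguin measurements.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Two-sample t-statistic assuming equal variances in both groups."""
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


# Made-up illustrative samples (not taken from the penguins data)
group_a = [5.0, 6.0, 7.0]
group_b = [1.0, 2.0, 3.0]

t = pooled_t_statistic(group_a, group_b)
print(f"t = {t}")
```

A large positive value means the first group's mean is well above the second's relative to the spread within the groups, which is the same reading as for the Gentoo result above.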
