In [None]:
# Q 1 Answer:
"""
To calculate the Pearson correlation coefficient between time spent studying and final exam scores,
we first need paired observations (hours studied, exam score) for each student. With these pairs, we can
calculate the coefficient using the following formula:

r = (nΣxy - ΣxΣy) / sqrt[(nΣx^2 - (Σx)^2)(nΣy^2 - (Σy)^2)]

where:

n is the number of paired observations
x and y are the two variables (time spent studying and final exam scores)
Σxy is the sum of the products of each paired observation (x_i * y_i)
Σx and Σy are the sums of x and y, respectively
Σx^2 and Σy^2 are the sums of the squares of x and y, respectively

Once we have calculated the Pearson correlation coefficient, we can interpret it as follows:

If r is positive, it indicates a positive linear relationship: as one variable increases, the other tends to increase.
If r is negative, it indicates a negative linear relationship: as one variable increases, the other tends to decrease.
If r is close to 0, it indicates little or no linear relationship (a nonlinear relationship may still exist).
The strength of the relationship is given by the absolute value of r, with values closer to 1 indicating a stronger linear relationship.

For example, if the calculated Pearson correlation coefficient is r = 0.8, we would interpret this as a strong positive linear relationship
between the amount of time spent studying and the final exam scores: students who spend more time studying tend to earn higher
scores on the final exam. If the calculated Pearson correlation coefficient is r = -0.2, we would interpret this as a weak negative
linear relationship: students who spend more time studying tend to earn slightly lower scores on the final exam,
but the tendency is weak.



"""
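
The raw-sum formula in the answer above can be sketched in plain Python. This is only an illustration: the function name `pearson_r` and the small dataset of study hours and exam scores are hypothetical, not taken from the question.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the raw-sum formula:
    r = (nΣxy - ΣxΣy) / sqrt[(nΣx^2 - (Σx)^2)(nΣy^2 - (Σy)^2)]
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 observations")

    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations (made up for illustration only)
hours_studied = [2, 4, 5, 7, 8, 10]
exam_scores = [55, 60, 62, 71, 80, 85]

r_manual = pearson_r(hours_studied, exam_scores)
print("Pearson correlation coefficient:", round(r_manual, 4))
```

On real data, `scipy.stats.pearsonr` returns the same coefficient along with a p-value, so a hand-rolled version like this is mainly useful for checking one's understanding of the formula.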

In [None]:
# Q 2 Answer:

"""
To calculate Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level, 
we first need to rank the observations for each variable. We would assign a rank of 1 to the observation with the lowest value, a rank of 2 to 
the next lowest value, and so on. Ties are assigned the average of the ranks they would have received.

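Once we have the ranked data, and provided there are no tied ranks, we can use the following formula to calculate
Spearman's rank correlation coefficient (with ties, compute the Pearson correlation of the ranks instead):

r_s = 1 - 6Σd^2 / (n(n^2 - 1))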

where:

n is the number of paired observations
d is the difference between the ranks of the paired observations

The resulting coefficient, r_s, ranges from -1 to 1, where:

r_s = 1 indicates a perfect increasing monotonic relationship: whenever one variable increases, the other also increases.
r_s = -1 indicates a perfect decreasing monotonic relationship: whenever one variable increases, the other decreases.
r_s = 0 indicates no monotonic relationship between the two variables.

Once we have calculated Spearman's rank correlation coefficient, we can interpret it as follows:

If r_s is positive, it indicates an increasing monotonic relationship: as one variable increases, the other tends to increase.
If r_s is negative, it indicates a decreasing monotonic relationship: as one variable increases, the other tends to decrease.
If r_s is close to 0, it indicates little or no monotonic relationship between the two variables.
The strength of the relationship is given by the absolute value of r_s, with values closer to 1 indicating a stronger monotonic relationship.

For example, if the calculated Spearman's rank correlation coefficient is r_s = 0.75, we would interpret this as a strong positive monotonic
relationship between the amount of sleep individuals get each night and their overall job satisfaction level: individuals who get
more sleep tend to report higher levels of job satisfaction. If the calculated Spearman's rank correlation coefficient is r_s = -0.25,
we would interpret this as a weak negative monotonic relationship: individuals who get more sleep tend to report
slightly lower levels of job satisfaction, but the tendency is weak.
"""
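
The ranking step and the d^2 formula above can be sketched as follows. The helper names (`average_ranks`, `spearman_no_ties`) and the sleep and satisfaction values are hypothetical. The data is chosen to have no ties, since the shortcut formula is exact only in that case.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_no_ties(x, y):
    """Spearman's r_s = 1 - 6Σd^2 / (n(n^2 - 1)); assumes no tied ranks."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical paired observations without ties (made up for illustration only)
sleep_hours = [5.0, 6.5, 7.0, 8.0, 6.0, 7.5]
job_satisfaction = [3, 6, 5, 9, 4, 8]

r_s = spearman_no_ties(sleep_hours, job_satisfaction)
print("Spearman's rank correlation:", round(r_s, 4))
```

When the data contains ties, `scipy.stats.spearmanr` is the safer choice, because it computes the Pearson correlation of the average ranks rather than relying on the shortcut formula.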

In [3]:
# Q 3 Answer:
import pandas as pd
import scipy.stats as stats

# create a sample dataset
df = pd.DataFrame({
    'hours_of_exercise': [3, 5, 2, 7, 4, 1, 6, 2, 5, 3, 4, 6, 1, 2, 5, 3, 7, 4, 6, 2, 5, 3, 4, 6, 2, 5, 3, 4, 6, 2, 5, 3, 4, 6, 2, 5, 3, 4, 6, 
                          2, 5, 3, 4, 6, 2, 5, 3, 4, 6,3],
    'bmi': [21.3, 26.5, 22.7, 28.1, 24.9, 18.7, 27.5, 22.6, 25.8, 21.2, 24.8, 26.9, 17.9, 20.8, 24.1, 21.7, 27.8, 23.9, 27.3, 22.1, 25.4, 21.4, 25.0, 27.0, 22.3, 25.9,
            24.7, 26.8, 22.8, 25.6, 21.9, 24.2, 26.6, 23.0, 25.1, 21.5, 24.9, 26.1, 22.5, 25.3, 21.1, 24.3, 26.3, 23.2, 25.7, 22.4, 24.5, 26.4, 23.5, 25.2]
})

# calculate the Pearson correlation coefficient
pearson_corr, _ = stats.pearsonr(df['hours_of_exercise'], df['bmi'])
print("Pearson correlation coefficient:", pearson_corr)

# calculate the Spearman's rank correlation
spearman_corr, _ = stats.spearmanr(df['hours_of_exercise'], df['bmi'])
print("Spearman's rank correlation:", spearman_corr)


Pearson correlation coefficient: 0.465861920504224
Spearman's rank correlation: 0.41800710595043267


In [4]:
# Q 4 Answer:
import pandas as pd

# create a DataFrame with X and Y variables
df = pd.DataFrame({'X': [3, 2, 5, 1, 4, 3, 2, 4, 5, 1],
                   'Y': [2, 3, 1, 4, 1, 2, 3, 1, 2, 4]})

# calculate the Pearson correlation coefficient
r = df['X'].corr(df['Y'], method='pearson')

print('Pearson correlation coefficient:', r)


Pearson correlation coefficient: -0.899954085146515
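
As a sanity check on the value printed above, the same coefficient can be recomputed by hand from deviation scores, using only the X and Y values from the cell. The variable names below are illustrative.

```python
import math

# Same X and Y values as in the DataFrame above
x = [3, 2, 5, 1, 4, 3, 2, 4, 5, 1]
y = [2, 3, 1, 4, 1, 2, 3, 1, 2, 4]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and squared deviations from the means
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

# r = S_xy / sqrt(S_xx * S_yy)
r_check = s_xy / math.sqrt(s_xx * s_yy)
print("Pearson r from deviation scores:", r_check)
```

The result agrees with the pandas output up to floating-point rounding, indicating a strong negative linear relationship between X and Y.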


In [None]:
# Q 5 Answer:
"""
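Pearson's and Spearman's correlation coefficients are not appropriate for age and soft drink
preference, because age is a continuous variable while soft drink preference is a nominal categorical variable with no natural ordering.
To examine the relationship between these variables, we could group age into categories and apply a chi-squared test of independence
to the resulting contingency table, or compare the age distributions across preference groups with a one-way ANOVA or Kruskal-Wallis test.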

"""
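
The contingency-table approach can be sketched in plain Python. Everything in this example is an assumption made for illustration: the survey responses, the drink names, and the age-group cut-offs are all hypothetical. The sketch computes the chi-squared statistic and Cramér's V (an effect size between 0 and 1). In practice, `scipy.stats.chi2_contingency` would also supply the p-value.

```python
import math
from collections import Counter

# Hypothetical survey responses: (age in years, preferred soft drink)
responses = [
    (19, "Cola"), (22, "Cola"), (24, "Lemon"), (27, "Cola"),
    (31, "Lemon"), (35, "Orange"), (38, "Lemon"), (42, "Orange"),
    (47, "Orange"), (51, "Lemon"), (58, "Orange"), (63, "Orange"),
]

def age_group(age):
    """Bin continuous age into categories (cut-offs are an arbitrary assumption)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

# Build the contingency table and its margins
counts = Counter((age_group(age), drink) for age, drink in responses)
row_totals = Counter(age_group(age) for age, _ in responses)
col_totals = Counter(drink for _, drink in responses)
n = len(responses)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi2 = 0.0
for group in row_totals:
    for drink in col_totals:
        expected = row_totals[group] * col_totals[drink] / n
        observed = counts[(group, drink)]  # Counter returns 0 for empty cells
        chi2 += (observed - expected) ** 2 / expected

dof = (len(row_totals) - 1) * (len(col_totals) - 1)
cramers_v = math.sqrt(chi2 / (n * min(len(row_totals) - 1, len(col_totals) - 1)))

print("chi-squared statistic:", round(chi2, 3), "with", dof, "degrees of freedom")
print("Cramér's V:", round(cramers_v, 3))
```

A sample this small has expected cell counts far below what the chi-squared approximation needs, so the numbers only demonstrate the mechanics. A real survey would need many more respondents per cell.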

In [6]:
# Q 6 Answer:
import numpy as np

# sample data
x = [10, 12, 8, 14, 9, 11, 15, 12, 13, 16, 11, 10, 12, 14, 15, 13, 10, 11, 12, 13, 16, 15, 14, 12, 11, 9, 8, 10, 11, 12]
y = [3, 4, 2, 5, 3, 4, 6, 5, 5, 7, 4, 3, 4, 5, 6, 5, 3, 4, 4, 5, 7, 6, 5, 4, 4, 2, 2, 3, 4, 5]

# calculate Pearson correlation coefficient
# np.corrcoef returns the 2x2 correlation matrix; r is its off-diagonal entry
r = np.corrcoef(x, y)[0, 1]

print(f"Pearson correlation coefficient: {r:.8f}")


Pearson correlation coefficient: 0.96815001
