Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient, we divide the covariance of the two variables by the product of their standard deviations. Let's assume that we have n data points for studying time and final exam scores, denoted by x and y respectively. Then, the formula for the Pearson correlation coefficient, denoted by r, is:

r = (1/n) * Σ[(xᵢ - x̄) * (yᵢ - ȳ)] / (sx * sy)

where Σ is the sum over all i, xᵢ and yᵢ are the i-th values of x and y, respectively, x̄ and ȳ are the means of x and y, and sx and sy are the standard deviations of x and y, respectively. Because the covariance is divided by n, sx and sy must also be computed with a divisor of n (population standard deviations); using n - 1 consistently in both places gives the same r.
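
The formula above can be sketched directly in NumPy. The study-time and score values below are hypothetical, invented only for illustration (the question supplies no data), and the helper name `pearson_r` is likewise an assumption.

```python
import numpy as np

# Hypothetical data: hours studied and exam scores for five students
hours_studied = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
exam_scores = np.array([55.0, 62.0, 70.0, 78.0, 91.0])


def pearson_r(x, y):
    """Pearson r as covariance divided by the product of standard deviations."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    covariance = np.mean(x_dev * y_dev)  # (1/n) * sum of products of deviations
    # np.std uses a divisor of n by default, matching the 1/n above
    return covariance / (x.std() * y.std())


r_manual = pearson_r(hours_studied, exam_scores)
# Cross-check against NumPy's built-in correlation matrix
r_numpy = np.corrcoef(hours_studied, exam_scores)[0, 1]

print('Manual r:', r_manual)
print('NumPy r: ', r_numpy)
```

The two values should agree up to floating-point error, which is a quick way to confirm that the means and standard deviations were paired correctly.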

Interpretation of the result:

The Pearson correlation coefficient is a value between -1 and 1. A value of 1 indicates a perfect positive linear relationship between the two variables, while a value of -1 indicates a perfect negative linear relationship. A value of 0 indicates no linear relationship between the variables.

If the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores is positive, it means that there is a positive linear relationship between the two variables. In other words, as the amount of time students spend studying increases, their final exam scores also tend to increase. A value close to 1 indicates a strong positive correlation, while a value close to 0 indicates a weak positive correlation.

On the other hand, if the Pearson correlation coefficient is negative, it means that there is a negative linear relationship between the two variables. In other words, as the amount of time students spend studying increases, their final exam scores tend to decrease. A value close to -1 indicates a strong negative correlation, while a value close to 0 indicates a weak negative correlation.

In summary, the Pearson correlation coefficient allows us to quantify the strength and direction of the linear relationship between two variables. In the context of studying time and final exam scores, a positive correlation would suggest that students who study more tend to perform better on the exam, while a negative correlation would suggest the opposite.

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

To calculate the Spearman's rank correlation, we need to convert the original data into ranks and then compute the Pearson correlation coefficient on the ranks. The Spearman's rank correlation, denoted by ρ, ranges between -1 and 1, where a value of 1 indicates a perfect monotonic increasing relationship between the two variables, a value of -1 indicates a perfect monotonic decreasing relationship, and a value of 0 indicates no monotonic relationship.

To compute the Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level, we need to:

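1. Rank the values of each variable separately, from lowest to highest, giving tied values the average of the ranks they span.
2. Compute the Pearson correlation coefficient on the ranks.
3. Equivalently, when there are no ties, calculate the difference dᵢ between the two ranks of each observation and use ρ = 1 - 6 * Σdᵢ² / (n * (n² - 1)).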
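
The steps above can be sketched as follows. The sleep and satisfaction values are hypothetical, chosen without ties so that the rank-difference shortcut ρ = 1 - 6 * Σd² / (n * (n² - 1)) also applies. The result is compared with `scipy.stats.spearmanr` as a sanity check.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Hypothetical data: hours of sleep and job satisfaction (1 to 10), no ties
sleep_hours = np.array([5.5, 6.0, 7.5, 8.0, 6.5, 7.0])
job_satisfaction = np.array([3, 5, 8, 7, 4, 9])

# Rank each variable separately (rankdata averages the ranks of ties)
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson correlation computed on the ranks
rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)

# Shortcut based on rank differences, valid only when there are no ties
d = sleep_ranks - satisfaction_ranks
n = len(d)
rho_shortcut = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Library result for comparison
rho_scipy, _ = spearmanr(sleep_hours, job_satisfaction)

print('Pearson on ranks:', rho_from_ranks)
print('Shortcut formula:', rho_shortcut)
print('scipy spearmanr: ', rho_scipy)
```

All three values should coincide here. With tied ranks, only the Pearson-on-ranks route and `spearmanr` would be expected to agree.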

Interpretation of the result:

If the Spearman's rank correlation is positive, it means that there is a monotonic increasing relationship between the two variables. In other words, as the amount of sleep individuals get each night increases, their overall job satisfaction level tends to increase as well. A value close to 1 indicates a strong positive rank correlation, while a value close to 0 indicates a weak positive rank correlation.

On the other hand, if the Spearman's rank correlation is negative, it means that there is a monotonic decreasing relationship between the two variables. In other words, as the amount of sleep individuals get each night increases, their overall job satisfaction level tends to decrease. A value close to -1 indicates a strong negative rank correlation, while a value close to 0 indicates a weak negative rank correlation.

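If the Spearman's rank correlation is close to 0, there is little or no monotonic relationship between the two variables. This can happen when the variables are unrelated, or when they are related in a non-monotonic way, for example if job satisfaction rises with sleep up to a point and then falls. Because only ranks are used, outliers affect Spearman's correlation less than they affect Pearson's.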
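
As a small illustration of a near-zero Spearman correlation despite a clear dependence, the sketch below uses a hypothetical, perfectly U-shaped relationship (y = x² on values symmetric around zero). The data are invented for this purpose and do not come from the sleep study.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical U-shaped relationship: y depends on x, but not monotonically
x = np.arange(-3, 4)  # -3, -2, ..., 3
y = x ** 2

rho_u_shape, _ = spearmanr(x, y)
print('Spearman rank correlation for y = x^2:', rho_u_shape)
```

The decreasing half and the increasing half cancel out, so the coefficient is essentially zero even though y is fully determined by x. A value near 0 therefore rules out a monotonic trend, not every kind of relationship.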

In summary, the Spearman's rank correlation allows us to measure the strength and direction of the monotonic relationship between two variables, in this case, the amount of sleep individuals get each night and their overall job satisfaction level. A positive correlation would suggest that individuals who get more sleep tend to have higher job satisfaction levels, while a negative correlation would suggest the opposite.

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [2]:
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Create a DataFrame with a small illustrative sample (4 rows, not the full 50 participants)
df = pd.DataFrame({
    'hours_of_exercise': [5, 8, 2, 4],
    'bmi': [22, 25, 20, 24]
})

# Calculate the Pearson correlation coefficient
pearson_corr, _ = pearsonr(df['hours_of_exercise'], df['bmi'])
print('Pearson correlation coefficient:', pearson_corr)

# Calculate the Spearman's rank correlation
spearman_corr, _ = spearmanr(df['hours_of_exercise'], df['bmi'])
print('Spearman rank correlation:', spearman_corr)


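Pearson correlation coefficient: 0.8268106308031116
Spearman rank correlation: 0.7999999999999999

Comparison of the results: both coefficients are positive and close in value (about 0.83 and 0.80), so in this small illustrative sample the relationship looks both roughly linear and monotonic, with higher exercise hours going together with higher BMI. With only four observations this is not reliable evidence. The two measures would be expected to diverge more if the relationship were monotonic but curved, or if the data contained outliers.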
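
To show how the two coefficients can differ, here is a minimal sketch on hypothetical data where y grows exponentially with x. The relationship is perfectly monotonic but far from linear. These values are invented for illustration and are unrelated to the exercise and BMI sample.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: strictly increasing but strongly curved relationship
x = np.arange(1, 11)
y = np.exp(x)

pearson_curved, _ = pearsonr(x, y)
spearman_curved, _ = spearmanr(x, y)

print('Pearson correlation coefficient:', pearson_curved)
print('Spearman rank correlation:', spearman_curved)
```

Spearman's coefficient reaches 1 because the ordering of y matches the ordering of x exactly, while Pearson's coefficient stays noticeably below 1 because the points do not lie on a straight line.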


Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [3]:
import pandas as pd
from scipy.stats import pearsonr

# Create a DataFrame with a small illustrative sample (4 rows, not the full 50 participants)
df = pd.DataFrame({
    'hours_of_tv': [2, 3, 1, 4],
    'physical_activity': [20, 15, 25, 18]
})

# Calculate the Pearson correlation coefficient
corr, _ = pearsonr(df['hours_of_tv'], df['physical_activity'])
print('Pearson correlation coefficient:', corr)


Pearson correlation coefficient: -0.7985836518841365
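
As a cross-check, the same coefficient can be obtained without SciPy. The sketch below reuses the four values from the cell above and compares pandas' `Series.corr` (Pearson by default) with NumPy's `corrcoef`.

```python
import numpy as np
import pandas as pd

# Same four illustrative observations as in the cell above
df = pd.DataFrame({
    'hours_of_tv': [2, 3, 1, 4],
    'physical_activity': [20, 15, 25, 18],
})

# Series.corr uses method='pearson' unless told otherwise
r_pandas = df['hours_of_tv'].corr(df['physical_activity'])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(df['hours_of_tv'], df['physical_activity'])[0, 1]

print('pandas:', r_pandas)
print('NumPy: ', r_numpy)
```

Both should match the `pearsonr` result. The negative sign indicates that, in this sample, more hours of television go together with lower physical activity.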


Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.



In [4]:
import pandas as pd
from scipy.stats import pearsonr

# Create a DataFrame with a small illustrative sample (4 rows, not the full 30 representatives)
df = pd.DataFrame({
    'sales_calls_per_day': [20, 25, 30, 15],
    'sales_per_week': [10, 12, 15, 8]
})

# Calculate the Pearson correlation coefficient
corr, _ = pearsonr(df['sales_calls_per_day'], df['sales_per_week'])
print('Pearson correlation coefficient:', corr)


Pearson correlation coefficient: 0.9943767126843689
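
The second value returned by `pearsonr`, discarded as `_` in the cell above, is the two-sided p-value for the null hypothesis of no correlation. A minimal sketch that keeps it, using the same four illustrative observations:

```python
from scipy.stats import pearsonr

# Same four illustrative observations as in the cell above
sales_calls_per_day = [20, 25, 30, 15]
sales_per_week = [10, 12, 15, 8]

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(sales_calls_per_day, sales_per_week)

print('Pearson correlation coefficient:', r)
print('Two-sided p-value:', p_value)
```

With so few observations the p-value is worth reporting alongside r, since even a large coefficient can come from a sample too small to support firm conclusions.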
