#### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.


The Pearson correlation coefficient is computed as:

$$
r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - \left(\sum X\right)^2\right)\left(n\sum Y^2 - \left(\sum Y\right)^2\right)}}
$$

where:

- $n$ is the sample size
- $X$ and $Y$ are the two variables (time spent studying and final exam scores)
- $\sum$ denotes the sum over all $n$ observations
- $\sum XY$ is the sum of the products of corresponding values of $X$ and $Y$
- $\sum X^2$ and $\sum Y^2$ are the sums of the squared values of $X$ and $Y$, respectively

Here's an example calculation in Python:

In [1]:
import numpy as np

# create sample data
time_spent_studying = np.array([5, 10, 15, 20, 25])
final_exam_scores = np.array([50, 60, 70, 80, 90])

# calculate the correlation coefficient
n = len(time_spent_studying)
sum_x = np.sum(time_spent_studying)
sum_y = np.sum(final_exam_scores)
sum_xy = np.sum(time_spent_studying * final_exam_scores)
sum_x_squared = np.sum(time_spent_studying ** 2)
sum_y_squared = np.sum(final_exam_scores ** 2)

numerator = (n * sum_xy) - (sum_x * sum_y)
denominator = np.sqrt((n * sum_x_squared - sum_x ** 2) * (n * sum_y_squared - sum_y ** 2))

r = numerator / denominator

print("Pearson correlation coefficient:", r)


Pearson correlation coefficient: 1.0
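
As a worked check, the same arithmetic can be done by hand. The sample data give n = 5, ΣX = 75, ΣY = 350, ΣXY = 5750, ΣX² = 1375 and ΣY² = 25500, and substituting these into the formula reproduces the printed value:

$$
r = \frac{5 \cdot 5750 - 75 \cdot 350}{\sqrt{\left(5 \cdot 1375 - 75^2\right)\left(5 \cdot 25500 - 350^2\right)}}
  = \frac{28750 - 26250}{\sqrt{(6875 - 5625)(127500 - 122500)}}
  = \frac{2500}{\sqrt{1250 \cdot 5000}}
  = \frac{2500}{2500} = 1
$$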


Interpreting the result: The Pearson correlation coefficient is a value between -1 and 1 that indicates the strength and direction of the linear relationship between two variables. In this case the coefficient is exactly 1, which indicates a perfect positive linear relationship: every data point lies on a single straight line (in this sample, score = 2 × time + 40). Students who studied longer scored higher, and each additional unit of study time corresponds to the same 2-point gain. Real data would almost never produce exactly 1; the value here reflects the idealized sample data.
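
The computational formula can be wrapped in a small reusable function and cross-checked against NumPy's np.corrcoef. This is a sketch: the function name pearson_r and the scattered score data are hypothetical, chosen so that r is below 1.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson r via (n*sum(XY) - sum(X)*sum(Y)) / sqrt((n*sum(X^2) - sum(X)^2) * (n*sum(Y^2) - sum(Y)^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x ** 2) - np.sum(x) ** 2) * (n * np.sum(y ** 2) - np.sum(y) ** 2)
    )
    return numerator / denominator


# hypothetical data with some scatter (not from the original example)
hours = np.array([5, 10, 15, 20, 25])
scores = np.array([55, 58, 72, 76, 93])

r_manual = pearson_r(hours, scores)
r_numpy = np.corrcoef(hours, scores)[0, 1]
# r does not depend on units: measuring study time in minutes gives the same value
r_minutes = pearson_r(hours * 60, scores)

print(r_manual, r_numpy, r_minutes)
```

All three values agree (about 0.97 for these made-up scores). For data with large magnitudes, the sums-of-squares form can lose precision to floating-point cancellation, which is one reason np.cov (used by np.corrcoef) subtracts the mean before multiplying.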

#### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.


To calculate Spearman's rank correlation between the two variables "amount of sleep" and "job satisfaction level", we first assign ranks to the data points of each variable (tied values receive the average of the ranks they span). Spearman's rank correlation coefficient is then the Pearson correlation coefficient computed on those ranks. Here's an example calculation in Python:

In [2]:
import numpy as np
from scipy.stats import rankdata, spearmanr

# create sample data
amount_of_sleep = np.array([7, 8, 6, 5, 9, 7])
job_satisfaction_level = np.array([6, 8, 4, 5, 9, 7])

# assign ranks to the data points for each variable;
# rankdata gives tied values (the two 7s in amount_of_sleep) the average of their ranks
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction_level)

# calculate the Spearman's rank correlation coefficient
# (spearmanr also ranks its inputs internally, so passing the raw arrays gives the same value)
rho, p_value = spearmanr(sleep_ranks, satisfaction_ranks)

print("Spearman's rank correlation coefficient:", round(rho, 3))


Spearman's rank correlation coefficient: 0.928


Interpreting the result: Spearman's rank correlation coefficient is a value between -1 and 1 that indicates the strength and direction of the monotonic relationship between two variables. In this case the coefficient is about 0.928, which indicates a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level: people who sleep more tend to report higher job satisfaction. Because the coefficient uses only ranks, it does not require this relationship to be linear. With only six observations, however, the estimate is very uncertain.
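
Spearman's coefficient can be reproduced from its definition as the Pearson correlation of the ranks, with tied values given average ranks. The sketch below also evaluates the familiar shortcut formula 1 - 6Σd²/(n(n² - 1)), which is exact only when there are no ties; with the tied 7s in amount_of_sleep it gives a slightly different value.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

amount_of_sleep = np.array([7, 8, 6, 5, 9, 7])
job_satisfaction_level = np.array([6, 8, 4, 5, 9, 7])

# average ranks: the two 7s in amount_of_sleep both get rank 3.5
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction_level)

# Spearman's rho = Pearson correlation of the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]
rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction_level)

# shortcut formula based on rank differences: exact only when there are no ties
n = len(amount_of_sleep)
d = sleep_ranks - satisfaction_ranks
rho_shortcut = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

print(sleep_ranks)
print(round(rho_from_ranks, 4), round(rho_scipy, 4), round(rho_shortcut, 4))
```

The rank-based Pearson value matches spearmanr (about 0.928), while the shortcut formula gives about 0.929 because it ignores the tie correction.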

#### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.


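To calculate the Pearson correlation coefficient and Spearman's rank correlation coefficient between the number of hours of exercise per week and BMI, we can use the pearsonr and spearmanr functions from SciPy. Here is example code (note that the illustrative arrays contain 48 observations rather than the 50 described in the question):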


In [9]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# create sample data
hours_of_exercise = np.array([5, 2, 3, 6, 1, 2, 5, 4, 2, 6, 3, 4, 5, 6, 7, 2, 3, 1,
                              6, 5, 2, 3, 7, 4, 6, 2, 5, 4, 2, 6, 3, 4, 5, 6, 7, 2, 3, 1,
                              6, 5, 2, 3, 7, 4, 6, 1, 2, 5])
bmi = np.array([23.2, 25.1, 26.8, 30.5, 28.1, 27.2, 31.6, 29.5, 24.8, 26.9, 30.3, 25.7,
                27.4, 29.2, 23.5, 28.9, 31.8, 25.3, 26.4, 28.5, 29.9, 27.6, 26.2, 25.8,
                30.1, 28.7, 24.9, 26.7, 31.7, 29.4, 24.7, 26.8, 30.4, 28.3, 27.1, 31.5,
                29.3, 23.4, 28.8, 31.9, 25.2, 26.3, 28.4, 29.8, 27.5, 26.1, 25.7, 30.2])

# calculate Pearson correlation coefficient
r, p_value = pearsonr(hours_of_exercise, bmi)
print("Pearson correlation coefficient:", r)

# calculate Spearman's rank correlation coefficient
rho, p_value = spearmanr(hours_of_exercise, bmi)
print("Spearman's rank correlation coefficient:", rho)


Pearson correlation coefficient: 0.10984628294438262
Spearman's rank correlation coefficient: 0.14380057641615937


Interpreting the results: The Pearson correlation coefficient is about 0.11, which indicates a very weak positive linear relationship between the number of hours of exercise per week and BMI. Taken at face value, BMI tends to increase very slightly as weekly exercise increases, but a coefficient this close to 0 means there is essentially no linear relationship in this sample.

Spearman's rank correlation coefficient is about 0.14, which likewise indicates a very weak positive monotonic relationship. It is slightly higher than the Pearson coefficient, but the difference is small and both measures lead to the same conclusion: in these data, hours of exercise say little about BMI, whether the relationship is assumed to be linear (Pearson) or only monotonic (Spearman). The two values differ because Spearman's coefficient works on ranks (the many tied exercise values share average ranks) rather than on the raw values.
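
To see when the two coefficients can disagree sharply, consider a relationship that is perfectly monotonic but strongly non-linear. The data below are hypothetical (y = e^x) and are not part of the exercise/BMI study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# hypothetical data: y always increases with x, but not along a straight line
x = np.arange(1, 11)
y = np.exp(x)

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)

print("Pearson:", round(r, 3))
print("Spearman:", round(rho, 3))
```

Spearman's coefficient is 1 because the ranks of x and y match exactly, while Pearson's r is well below 1 because the points do not lie on a line. When the two coefficients are close, as in the exercise/BMI results, the choice between them changes little.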

#### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.



In [12]:
import numpy as np

# Create arrays for the hours of television watched and physical activity levels
tv_hours = np.array([3, 2, 4, 1, 5, 3, 2, 1, 4, 2, 3, 1, 5, 4, 2, 3, 1, 2, 4, 5, 3, 1, 2, 4, 5, 3, 2, 1, 4, 5, 3, 2, 1, 4, 5, 2, 3, 1, 5, 4, 2, 1, 3, 5, 4, 2, 3, 1, 4, 5])
phys_activity = np.array([5, 8, 6, 9, 4, 6, 8, 9, 5, 7, 6, 9, 4, 5, 7, 6, 9, 8, 5, 4, 7, 6, 5, 4, 6, 9, 7, 5, 4, 6, 9, 8, 7, 5, 4, 6, 9, 7, 8, 5, 4, 6, 9, 8, 7, 5, 4, 6, 9, 7])

# Calculate the Pearson correlation coefficient
corr_coef = np.corrcoef(tv_hours, phys_activity)[0, 1]

print("Pearson correlation coefficient:", corr_coef)


Pearson correlation coefficient: -0.36105936290811486


The Pearson correlation coefficient ranges from -1 to 1: -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship. A positive coefficient would mean that people who watch more television per day tend to be more physically active; a negative coefficient means that they tend to be less physically active. The closer the coefficient is to -1 or 1, the stronger the linear relationship; values near 0 indicate little or no linear relationship.

Here r ≈ -0.36, a negative correlation of weak-to-moderate strength: participants who watch more television tend to report lower physical activity, but the relationship is far from perfect. Since r² ≈ 0.13, television hours account for only about 13% of the variation in physical activity in this sample.
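
The interpretation can be made explicit with a small helper that reports the direction, a strength label and r². The cut-offs 0.1, 0.3 and 0.5 follow Cohen's commonly cited rule of thumb for small, medium and large correlations; they are a convention rather than a fixed rule, and the function name describe_r is hypothetical.

```python
def describe_r(r):
    """Return (direction, strength, r_squared) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    # Cohen's conventional benchmarks: 0.1 small, 0.3 medium, 0.5 large
    if size >= 0.5:
        strength = "large"
    elif size >= 0.3:
        strength = "medium"
    elif size >= 0.1:
        strength = "small"
    else:
        strength = "negligible"
    return direction, strength, r ** 2


# value printed for the TV / physical-activity data above
print(describe_r(-0.36105936290811486))
```

Under these benchmarks r ≈ -0.36 counts as a medium-sized negative correlation, with r² ≈ 0.13.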

#### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

![image.png](attachment:image.png)

To calculate the correlation coefficient between age and preference for soft drink brand, we need to convert the categorical variable (soft drink brand) to a numerical one. We can use label encoding for this purpose. Then, we can calculate the Pearson correlation coefficient using Python's NumPy library:

In [13]:
import numpy as np

# create the data as NumPy arrays
age = np.array([25, 42, 37, 19, 31, 28])
brand = np.array([0, 1, 2, 0, 1, 0]) # 0 = Coke, 1 = Pepsi, 2 = Mountain Dew

# calculate the Pearson correlation coefficient
corr_coef = np.corrcoef(age, brand)[0, 1]
print("The Pearson correlation coefficient is:", corr_coef)


The Pearson correlation coefficient is: 0.7587035441865057


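The coefficient of about 0.76 is positive and fairly large, but it should be interpreted with great caution. Brand is a nominal variable, so the codes 0, 1 and 2 are arbitrary: assigning them to the brands in a different order could change both the size and the sign of the coefficient. The result therefore does not show that age is a strong predictor of soft drink brand preference. For a categorical variable like this, a measure that does not depend on the label order, such as the correlation ratio (eta) or a one-way ANOVA comparing mean age across brands, is more appropriate. With only six respondents, any such estimate is also very uncertain.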
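
A short sketch makes the encoding problem concrete. It recomputes Pearson's r after swapping the codes for Coke and Pepsi, and then computes the correlation ratio eta, the square root of the between-brand share of the total variation in age, which does not depend on how the brands are numbered. The helper names are hypothetical, and the brand labels are taken from the code comment above (0 = Coke, 1 = Pepsi, 2 = Mountain Dew).

```python
import numpy as np

age = np.array([25, 42, 37, 19, 31, 28])
brand_names = np.array(["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"])


def pearson_with_codes(codes):
    """Label-encode brand_names with the given {name: code} mapping and correlate with age."""
    encoded = np.array([codes[name] for name in brand_names])
    return np.corrcoef(age, encoded)[0, 1]


def correlation_ratio(categories, values):
    """Eta = sqrt(between-group sum of squares / total sum of squares)."""
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = sum(
        values[categories == c].size * (values[categories == c].mean() - grand_mean) ** 2
        for c in np.unique(categories)
    )
    return np.sqrt(ss_between / ss_total)


r_original = pearson_with_codes({"Coke": 0, "Pepsi": 1, "Mountain Dew": 2})
r_relabelled = pearson_with_codes({"Coke": 1, "Pepsi": 0, "Mountain Dew": 2})
eta = correlation_ratio(brand_names, age)

print(round(r_original, 3), round(r_relabelled, 3), round(eta, 3))
```

With the original codes r is about 0.76; swapping Coke and Pepsi gives about -0.18, while eta (about 0.84) is the same under any labelling. Eta always lies between 0 and 1 and has no sign, which suits a variable like brand that has no natural direction.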

#### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [14]:
import numpy as np

# create the data as NumPy arrays
calls_per_day = np.array([30, 20, 35, 25, 15, 40, 30, 20, 25, 15, 30, 20, 35, 25, 15, 40, 30, 20, 25, 15, 30, 20, 35, 25, 15, 40, 30, 20, 25, 15])
sales_per_week = np.array([10, 6, 12, 9, 5, 14, 10, 6, 9, 5, 10, 6, 12, 9, 5, 14, 10, 6, 9, 5, 10, 6, 12, 9, 5, 14, 10, 6, 9, 5])

# calculate the Pearson correlation coefficient
corr_coef = np.corrcoef(calls_per_day, sales_per_week)[0, 1]
print("The Pearson correlation coefficient is:", corr_coef)


The Pearson correlation coefficient is: 0.9903414488498195


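Since the correlation coefficient (about 0.99) is close to 1, there is a very strong positive linear correlation between the number of sales calls made per day and the number of sales made per week: representatives who make more calls per day tend to make more sales per week. Correlation alone does not show that extra calls cause extra sales, but the relationship could still be useful to the company when setting targets and evaluating the performance of its sales representatives. Note also that the sample arrays repeat the same ten (calls, sales) pairs three times; the repetition does not change r, but it makes the evidence look stronger than ten distinct observations would justify.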
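
The sample arrays above consist of the same ten (calls, sales) pairs repeated three times. The sketch below uses scipy.stats.pearsonr to check that the repetition leaves r unchanged while shrinking the p-value, which is why duplicated rows overstate how strong the evidence is.

```python
import numpy as np
from scipy.stats import pearsonr

# the ten distinct (calls per day, sales per week) pairs from the example
calls_10 = np.array([30, 20, 35, 25, 15, 40, 30, 20, 25, 15])
sales_10 = np.array([10, 6, 12, 9, 5, 14, 10, 6, 9, 5])

# the original 30-element arrays are these ten pairs repeated three times
calls_30 = np.tile(calls_10, 3)
sales_30 = np.tile(sales_10, 3)

r_10, p_10 = pearsonr(calls_10, sales_10)
r_30, p_30 = pearsonr(calls_30, sales_30)

print(f"10 distinct rows: r = {r_10:.4f}, p = {p_10:.2e}")
print(f"30 repeated rows: r = {r_30:.4f}, p = {p_30:.2e}")
```

Both lines report r ≈ 0.9903, but the p-value for the 30 repeated rows is far smaller, even though no new information was added.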