Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [1]:
import numpy as np

# Define the dataset
time_spent_studying = [10, 20, 30, 40, 50]
exam_scores = [60, 70, 80, 90, 100]

# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(time_spent_studying, exam_scores)[0, 1]

# Print the Pearson correlation coefficient
print("Pearson correlation coefficient:", correlation_coefficient)


Pearson correlation coefficient: 1.0


Interpretation:
The Pearson correlation coefficient ranges from -1 to 1. A value of 1 indicates a perfect positive linear relationship, where an increase in one variable is matched by a proportional increase in the other. In this case, the Pearson correlation coefficient of 1.0 indicates a perfect positive linear relationship between the amount of time spent studying for an exam and the final exam scores.

The interpretation can be stated as follows: as the amount of time spent studying increases, the final exam score increases, and in this sample the relationship is exactly linear (every additional 10 units of study time corresponds to 10 more points). Students who spend more time studying achieve higher scores on the exam.

However, it's important to note that correlation does not imply causation. While a strong positive correlation suggests a relationship between the variables, it does not indicate a cause-and-effect relationship. Other factors or variables may also influence the exam scores, and additional analysis or experimentation may be needed to establish a causal relationship between studying time and exam performance.
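As a cross-check on the value above, the coefficient can also be computed directly from its definition, r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), where dx and dy are deviations from the means. This is a minimal sketch using the same five data points; the names `dx`, `dy`, and `r_manual` are introduced here for illustration only.

```python
import numpy as np

# Same data as in the cell above
time_spent_studying = np.array([10, 20, 30, 40, 50], dtype=float)
exam_scores = np.array([60, 70, 80, 90, 100], dtype=float)

# Deviations of each variable from its mean
dx = time_spent_studying - time_spent_studying.mean()
dy = exam_scores - exam_scores.mean()

# Pearson r from the definition
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# In this dataset every score equals study time plus 50, so the points lie on one line
on_a_line = np.array_equal(exam_scores, time_spent_studying + 50)

print("Manual Pearson r:", r_manual)
print("Scores equal study time + 50:", on_a_line)
```

Because the deviations of the two variables are identical here, the numerator and denominator coincide, which is why the coefficient comes out at 1.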

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

In [2]:
import scipy.stats as stats

# Define the dataset
amount_of_sleep = [7, 6, 8, 5, 7]
job_satisfaction = [8, 6, 9, 4, 7]

# Calculate the Spearman's rank correlation coefficient
correlation_coefficient, p_value = stats.spearmanr(amount_of_sleep, job_satisfaction)

# Print the Spearman's rank correlation coefficient
print("Spearman's rank correlation coefficient:", correlation_coefficient)
print("p-value:", p_value)


Spearman's rank correlation coefficient: 0.9746794344808963
p-value: 0.004818230468198566


Interpretation:
Spearman's rank correlation coefficient ranges from -1 to 1. A value of 1 indicates a perfect positive monotonic relationship, where an increase in one variable is always associated with an increase in the other, while a value of -1 indicates a perfect negative monotonic relationship, where an increase in one variable is always associated with a decrease in the other. In this case, the coefficient of approximately 0.975 indicates a very strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level.

The interpretation can be stated as follows: individuals who get more sleep per night tend to report higher levels of job satisfaction, and individuals who get less sleep tend to report lower levels of job satisfaction.

The p-value of approximately 0.005 is below the commonly used significance level of 0.05, so we reject the null hypothesis that there is no monotonic association between the variables. With only five observations, however, this result should be treated with caution. It is also important to note that correlation does not imply causation, and other factors may influence job satisfaction. Additional analysis or consideration of confounding variables may be necessary to draw further conclusions.
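To make the "rank" part of the method concrete, the sketch below ranks both variables and then takes the ordinary Pearson correlation of the ranks, which is how Spearman's coefficient is defined. It uses `scipy.stats.rankdata`, which assigns tied values their average rank by default; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same data as in the cell above
amount_of_sleep = [7, 6, 8, 5, 7]
job_satisfaction = [8, 6, 9, 4, 7]

# Convert raw values to ranks (the two 7-hour sleepers share the average rank 3.5)
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson correlation of the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]

# Reference value from SciPy
rho_scipy = spearmanr(amount_of_sleep, job_satisfaction)[0]

print("Sleep ranks:", sleep_ranks)
print("Satisfaction ranks:", satisfaction_ranks)
print("Spearman from ranks:", rho_from_ranks)
print("Spearman from SciPy:", rho_scipy)
```

The two values should agree up to floating-point rounding. The tie in the sleep data is what keeps the coefficient just below 1.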

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [3]:
import numpy as np
import scipy.stats as stats

# Define the dataset
hours_of_exercise = [4, 6, 3, 5, 2, 3, 7, 4, 6, 5, 1, 3, 2, 4, 7, 5, 6, 3, 2, 4, 1, 3, 2, 5, 4, 6, 3, 2, 7, 5, 4, 3, 1, 6, 5, 2, 4, 3, 7, 5, 2, 6, 4, 3, 1, 5, 2, 4, 7, 3]
bmi = [24.5, 28.1, 21.7, 26.2, 19.8, 21.4, 30.3, 24.9, 27.9, 25.8, 18.5, 21.1, 19.4, 24.3, 31.0, 25.9, 28.4, 22.1, 19.5, 23.9, 17.8, 20.2, 19.1, 26.5, 24.4, 27.7, 22.4, 19.3, 29.6, 25.1, 23.5, 22.2, 18.0, 27.8, 26.0, 19.7, 24.7, 22.1, 30.1, 26.3, 19.2, 28.2, 23.4, 21.5, 17.6, 25.5, 19.9, 24.6, 29.8, 21.8]

# Calculate the Pearson correlation coefficient
pearson_coefficient, _ = stats.pearsonr(hours_of_exercise, bmi)

# Calculate the Spearman's rank correlation coefficient
spearman_coefficient, _ = stats.spearmanr(hours_of_exercise, bmi)

# Print the correlation coefficients
print("Pearson correlation coefficient:", pearson_coefficient)
print("Spearman's rank correlation coefficient:", spearman_coefficient)


Pearson correlation coefficient: 0.9909968811738404
Spearman's rank correlation coefficient: 0.9875084189256804


Interpretation:
The Pearson correlation coefficient of approximately 0.991 indicates a very strong positive linear relationship between the number of hours of exercise per week and BMI. In this sample, participants who exercise more hours per week tend to have higher BMI values, and the data points lie close to a straight line.

The Spearman's rank correlation coefficient of approximately 0.988 indicates a very strong positive monotonic relationship between the number of hours of exercise per week and BMI. The ordering of participants by hours of exercise is almost the same as their ordering by BMI, so the direction of the relationship is consistent across the sample.

Comparing the two coefficients, the values are nearly identical, with the Pearson coefficient slightly larger than the Spearman coefficient. Pearson correlation measures the linear relationship, while Spearman's rank correlation measures the monotonic relationship without assuming linearity. Because the relationship in this dataset is close to linear, both measures lead to the same conclusion, and neither is clearly preferable here.

It's important to note that correlation does not imply causation, and other variables or confounding factors may also influence BMI. Additional analysis and consideration of other factors may be necessary to draw more definitive conclusions about the relationship between exercise and BMI.
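To show when the two coefficients can differ, the sketch below compares them on a relationship that is monotonic but far from linear. The data are synthetic (x = 1..10 and y = exp(x)), chosen purely for illustration, and are not part of the exercise/BMI study.

```python
import numpy as np
import scipy.stats as stats

# Synthetic, illustrative data: y always increases with x, but not along a line
x = np.arange(1, 11)
y = np.exp(x)

pearson_r = stats.pearsonr(x, y)[0]
spearman_r = stats.spearmanr(x, y)[0]

print("Pearson correlation coefficient:", pearson_r)
print("Spearman's rank correlation coefficient:", spearman_r)
```

Since y increases whenever x does, the ranks match exactly and Spearman's coefficient is 1 (up to rounding), while Pearson's coefficient is noticeably lower because the points do not follow a straight line.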

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [6]:
import numpy as np
import pandas as pd

# Define the dataset
data = pd.DataFrame({
    'hours_of_tv': [2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3],
    'physical_activity': [30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45, 60, 75, 30, 45]
})

# Calculate the Pearson correlation coefficient
correlation_coefficient = data['hours_of_tv'].corr(data['physical_activity'], method='pearson')

# Print the correlation coefficient
print("Pearson correlation coefficient:", correlation_coefficient)


Pearson correlation coefficient: 0.9999999999999998


Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

Age (Years)      Soft drink preference
25               Coke
42               Pepsi
37               Mountain Dew
19               Coke
31               Pepsi
28               Coke

In [7]:
import pandas as pd

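# Encode the brands as numbers: Coke = 1, Pepsi = 2, Mountain Dew = 3
# (the codes are arbitrary labels, not a meaningful ordering)
data = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft Drink': [1, 2, 3, 1, 2, 1]
})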

correlation_coefficient = data['Age'].corr(data['Soft Drink'], method='pearson')

print("Pearson correlation coefficient:", correlation_coefficient)


Pearson correlation coefficient: 0.7587035441865058


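The Pearson correlation coefficient between age and the numerically coded soft drink preference is approximately 0.759. Under the coding used in the code (Coke = 1, Pepsi = 2, Mountain Dew = 3), this is a fairly strong positive correlation: older respondents tend to have higher codes. However, soft drink preference is a nominal variable and the codes are arbitrary labels, so the value depends on the coding chosen and should not be read as a meaningful measure of the association between age and brand preference.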
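Because the brand codes are arbitrary labels, it is worth checking how much the coefficient depends on them. The sketch below recomputes the correlation under the coding used above and under a second, equally arbitrary coding; `coding_a`, `coding_b`, and `survey` are hypothetical names introduced for this illustration.

```python
import pandas as pd

survey = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft Drink': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})

# Coding used in the cell above
coding_a = {'Coke': 1, 'Pepsi': 2, 'Mountain Dew': 3}
# An alternative labelling of the same three brands (assumption, for illustration)
coding_b = {'Coke': 3, 'Pepsi': 1, 'Mountain Dew': 2}

r_a = survey['Age'].corr(survey['Soft Drink'].map(coding_a), method='pearson')
r_b = survey['Age'].corr(survey['Soft Drink'].map(coding_b), method='pearson')

print("Pearson r with coding A:", r_a)
print("Pearson r with coding B:", r_b)
```

The same survey answers give a positive coefficient under one labelling and a negative one under the other, which suggests that a method designed for categorical data would be a better fit for this question than Pearson correlation.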

Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [8]:
import pandas as pd

data = pd.DataFrame({
    'sales_calls_per_day': [10, 12, 8, 14, 9, 11, 13, 10, 12, 8, 14, 9, 11, 13, 10, 12, 8, 14, 9, 11, 13, 10, 12, 8, 14, 9, 11, 13, 10, 12],
    'sales_per_week': [5, 6, 4, 7, 5, 6, 7, 5, 6, 4, 7, 5, 6, 7, 5, 6, 4, 7, 5, 6, 7, 5, 6, 4, 7, 5, 6, 7, 5, 6]
})

correlation_coefficient = data['sales_calls_per_day'].corr(data['sales_per_week'], method='pearson')

print("Pearson correlation coefficient:", correlation_coefficient)


Pearson correlation coefficient: 0.9698422858413321


The Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week is approximately 0.970. This value indicates a strong positive linear correlation between these two variables: representatives who make more sales calls per day tend to make more sales per week. As with the earlier examples, the correlation alone does not establish that making more calls causes more sales.
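If a significance test is also wanted, as in Q2, `scipy.stats.pearsonr` returns a p-value alongside the coefficient. This is a minimal sketch on the same 30 observations; it is an optional extension, not something the question asks for.

```python
import scipy.stats as stats

# Same data as in the cell above
sales_calls_per_day = [10, 12, 8, 14, 9, 11, 13, 10, 12, 8, 14, 9, 11, 13, 10,
                       12, 8, 14, 9, 11, 13, 10, 12, 8, 14, 9, 11, 13, 10, 12]
sales_per_week = [5, 6, 4, 7, 5, 6, 7, 5, 6, 4, 7, 5, 6, 7, 5,
                  6, 4, 7, 5, 6, 7, 5, 6, 4, 7, 5, 6, 7, 5, 6]

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = stats.pearsonr(sales_calls_per_day, sales_per_week)

print("Pearson correlation coefficient:", r)
print("p-value:", p_value)
```

A p-value below 0.05 would indicate that a correlation this strong is unlikely under the null hypothesis of no linear association.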