In [12]:
"""
Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

Example:
- Study time (in hours): [5, 3, 8, 6, 7]
- Exam score (out of 100): [85, 70, 95, 80, 90]

We can calculate the Pearson correlation coefficient using the formula:
r = Σ [(xᵢ - x̄)(yᵢ - ȳ)] / √[ Σ (xᵢ - x̄)² * Σ (yᵢ - ȳ)² ]
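
As a cross-check, the formula can be evaluated term by term with NumPy and compared against `pearsonr`. This is only a minimal sketch: the names `dx`, `dy`, `r_manual` and `r_scipy` are illustrative and not part of the original solution.

```python
import numpy as np
from scipy.stats import pearsonr

study_time = np.array([5, 3, 8, 6, 7])
exam_scores = np.array([85, 70, 95, 80, 90])

# Deviations of each observation from its sample mean
dx = study_time - study_time.mean()
dy = exam_scores - exam_scores.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Library result for comparison; the two should agree up to rounding
r_scipy, _ = pearsonr(study_time, exam_scores)
print(r_manual, r_scipy)
```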

Code:
"""
import numpy as np
from scipy.stats import pearsonr

# Data
study_time = np.array([5, 3, 8, 6, 7])
exam_scores = np.array([85, 70, 95, 80, 90])

# Pearson correlation
corr, _ = pearsonr(study_time, exam_scores)
print("Pearson Correlation Coefficient:", corr)
"""
Output: Pearson Correlation Coefficient: 0.9324324324324325

Interpretation:
A Pearson correlation coefficient of about 0.93 indicates a very strong positive linear relationship between study time and exam score in this sample: students who studied longer tended to score higher. With only five observations the estimate is imprecise, and a correlation alone does not show that studying causes the higher scores.

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

Example:
- Sleep (hours per night): [7, 6, 8, 5, 9]
- Job satisfaction (scale 1-10): [5, 7, 8, 6, 9]

Spearman's rank correlation is calculated by ranking the values and calculating the Pearson correlation on the ranks.
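
The two steps described above (rank, then correlate the ranks) can be sketched as follows. This assumes `scipy.stats.rankdata` with its default average-rank handling of ties; the names `sleep_ranks`, `satisfaction_ranks`, `rho_manual` and `rho_scipy` are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

sleep = np.array([7, 6, 8, 5, 9])
job_satisfaction = np.array([5, 7, 8, 6, 9])

# Step 1: replace each value by its rank (tied values would share the average rank)
sleep_ranks = rankdata(sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Step 2: Pearson correlation computed on the ranks
rho_manual, _ = pearsonr(sleep_ranks, satisfaction_ranks)

# Library result for comparison
rho_scipy, _ = spearmanr(sleep, job_satisfaction)
print(sleep_ranks, satisfaction_ranks)
print(rho_manual, rho_scipy)
```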

Code:
"""
from scipy.stats import spearmanr

# Data
sleep = np.array([7, 6, 8, 5, 9])
job_satisfaction = np.array([5, 7, 8, 6, 9])

# Spearman's rank correlation
rank_corr, _ = spearmanr(sleep, job_satisfaction)
print("Spearman's Rank Correlation Coefficient:", rank_corr)
"""
Output: Spearman's Rank Correlation Coefficient: 0.7

Interpretation:
A Spearman's rank correlation coefficient of 0.7 indicates a strong, but not perfect, positive monotonic relationship between sleep and job satisfaction: people who sleep more tend to report higher satisfaction, although the two rankings do not match exactly (for example, the person with 7 hours of sleep reports the lowest satisfaction).

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

Example (an illustrative sample of 10 participants, not the full 50):
- Exercise (hours per week): [3, 4, 5, 2, 6, 7, 8, 1, 2, 9]
- BMI: [24, 22, 26, 28, 22, 25, 30, 31, 32, 23]

Code:
"""
# Data
exercise_hours = np.array([3, 4, 5, 2, 6, 7, 8, 1, 2, 9])
bmi = np.array([24, 22, 26, 28, 22, 25, 30, 31, 32, 23])

# Pearson correlation
pearson_corr, _ = pearsonr(exercise_hours, bmi)

# Spearman's rank correlation
spearman_corr, _ = spearmanr(exercise_hours, bmi)

print("Pearson Correlation Coefficient:", pearson_corr)
print("Spearman's Rank Correlation Coefficient:", spearman_corr)
"""
Output:
Pearson Correlation Coefficient: -0.4435174612365479
Spearman's Rank Correlation Coefficient: -0.47560975609756095

Interpretation:
- The Pearson coefficient of about -0.44 suggests a weak-to-moderate negative linear relationship between exercise and BMI.
- The Spearman coefficient of about -0.48 suggests a negative monotonic relationship of similar strength: as exercise increases, BMI tends to decrease, whether or not the trend is linear.
- The two coefficients are close, so replacing the values by their ranks changes the picture very little for this sample.
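
To see when the two coefficients can diverge, here is a small sketch on synthetic data (not taken from the exercise/BMI example; `x` and `y` are made up for illustration) in which the relationship is perfectly monotonic but curved.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic, purely illustrative data: y always increases with x, but along a curve
x = np.arange(1, 11)
y = x ** 3

r_linear, _ = pearsonr(x, y)
rho_monotonic, _ = spearmanr(x, y)

# Spearman sees a perfect monotonic relationship; Pearson is lower because the trend is not a straight line
print(r_linear, rho_monotonic)
```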

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

Example (an illustrative sample of 10 participants, not the full 50):
- TV hours per day: [2, 3, 5, 1, 2, 6, 3, 4, 7, 8]
- Physical activity (hours per day): [1, 1, 2, 3, 1, 0, 1, 2, 0, 0]

Code:
"""
# Data
tv_hours = np.array([2, 3, 5, 1, 2, 6, 3, 4, 7, 8])
physical_activity = np.array([1, 1, 2, 3, 1, 0, 1, 2, 0, 0])

# Pearson correlation
pearson_corr_tv, _ = pearsonr(tv_hours, physical_activity)
print("Pearson Correlation Coefficient:", pearson_corr_tv)
"""
Output: Pearson Correlation Coefficient: -0.6758801312994019

Interpretation:
The Pearson correlation coefficient of about -0.68 indicates a moderate-to-strong negative linear relationship between hours spent watching television and hours of physical activity: in this sample, people who watch more TV tend to be less active.
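
The second value returned by `pearsonr`, which the code above discards as `_`, is a two-sided p-value for the null hypothesis of no correlation. With a sample this small it can be worth inspecting. A minimal sketch on the same 10-point example data (the names `r` and `p_value` are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

tv_hours = np.array([2, 3, 5, 1, 2, 6, 3, 4, 7, 8])
physical_activity = np.array([1, 1, 2, 3, 1, 0, 1, 2, 0, 0])

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(tv_hours, physical_activity)
print(f"r = {r:.3f}, two-sided p-value = {p_value:.3f}")
```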

Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:
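- Age (years): [25, 42, 37, 19, 31, 28]
- Preference for Soft Drink: [1, 2, 2, 1, 3, 3]  # 1: Coke, 2: Pepsi, 3: Mountain Dew

Choice of method:
Brand preference is a nominal (categorical) variable: the codes 1, 2, 3 are labels, not an ordering. A rank correlation is only meaningful if the coding is treated as an ordered scale. Under that assumption, **Spearman's Rank Correlation** can be computed as shown here; for genuinely unordered categories, a method designed for categorical data (for example, comparing mean age across brands) is more appropriate.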

Code:
"""
# Data
age = np.array([25, 42, 37, 19, 31, 28])
soft_drink_preference = np.array([1, 2, 2, 1, 3, 3])

# Spearman's rank correlation
spearman_corr_age_drink, _ = spearmanr(age, soft_drink_preference)
print("Spearman's Rank Correlation Coefficient:", spearman_corr_age_drink)
"""
Output: Spearman's Rank Correlation Coefficient: 0.4780914437337575

Interpretation:
The Spearman's rank correlation of about 0.48 suggests a moderate positive monotonic association between age and the coded preference: in this sample, older respondents tend to have higher codes. Because the codes are brand labels, this says little about an ordered preference, and with only six respondents the estimate is very uncertain.
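
One way to check how much the result depends on the numeric coding is to relabel the brands and recompute. The sketch below uses a hypothetical alternative coding (`coded_swapped`, with Coke and Pepsi exchanged) that is not part of the survey; it only illustrates that the same answers can yield a different coefficient.

```python
import numpy as np
from scipy.stats import spearmanr

age = np.array([25, 42, 37, 19, 31, 28])

# Original coding: 1 = Coke, 2 = Pepsi, 3 = Mountain Dew
coded_original = np.array([1, 2, 2, 1, 3, 3])
# Hypothetical recoding of the same answers: 1 = Pepsi, 2 = Coke, 3 = Mountain Dew
coded_swapped = np.array([2, 1, 1, 2, 3, 3])

rho_original, _ = spearmanr(age, coded_original)
rho_swapped, _ = spearmanr(age, coded_swapped)

# Same survey answers, different labels, different coefficients
print(rho_original, rho_swapped)
```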

Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

Example (an illustrative sample of 10 representatives, not the full 30):
- Sales calls per day: [5, 7, 8, 6, 9, 4, 5, 10, 6, 7]
- Sales per week: [15, 20, 18, 16, 22, 12, 14, 25, 17, 19]

Code:
"""
# Data
sales_calls = np.array([5, 7, 8, 6, 9, 4, 5, 10, 6, 7])
sales_per_week = np.array([15, 20, 18, 16, 22, 12, 14, 25, 17, 19])

# Pearson correlation
pearson_corr_sales, _ = pearsonr(sales_calls, sales_per_week)
print("Pearson Correlation Coefficient:", pearson_corr_sales)
"""
Output: Pearson Correlation Coefficient: 0.9609635102099583

Interpretation:
The Pearson correlation coefficient of about 0.96 suggests a very strong positive linear relationship between the number of sales calls made per day and the number of sales made per week: representatives who make more calls tend to close more sales.
"""


Pearson Correlation Coefficient: 0.9324324324324325
Spearman's Rank Correlation Coefficient: 0.7
Pearson Correlation Coefficient: -0.4435174612365479
Spearman's Rank Correlation Coefficient: -0.47560975609756095
Pearson Correlation Coefficient: -0.6758801312994019
Spearman's Rank Correlation Coefficient: 0.4780914437337575
Pearson Correlation Coefficient: 0.9609635102099583

