Q1. The Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [2]:
import numpy as np
from scipy.stats import pearsonr

study_time = np.array([2, 3, 4, 5, 6, 7, 8])
exam_scores = np.array([50, 55, 60, 65, 70, 75, 80])

corr, p_value = pearsonr(study_time, exam_scores)
print("Pearson Correlation Coefficient:", corr)
print("P-value:", p_value)
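# Interpretation: a coefficient close to +1 indicates a strong positive linear relationship
# (more study time → higher scores). Here r is exactly 1.0 because the sample data are
# perfectly linear: every extra hour of study adds exactly 5 points to the score.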

Pearson Correlation Coefficient: 1.0
P-value: 0.0
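
To make the definition concrete, the coefficient can also be computed by hand as r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below is only an illustration: `pearson_manual` is a hypothetical helper written for this note, not a SciPy function, and it reuses the same study-time data as the cell above.

```python
import numpy as np

# Same data as in the Q1 cell
study_time = np.array([2, 3, 4, 5, 6, 7, 8])
exam_scores = np.array([50, 55, 60, 65, 70, 75, 80])


def pearson_manual(x, y):
    """Pearson r from its definition (hypothetical helper, not part of SciPy)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


r_manual = pearson_manual(study_time, exam_scores)
print("Manual Pearson r:", r_manual)
```

For this dataset the hand computation should agree with `pearsonr`, since both follow the same formula.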


Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

In [3]:
from scipy.stats import spearmanr

sleep_hours = np.array([6, 5, 8, 7, 4, 9, 6])
job_satisfaction = np.array([6, 4, 9, 8, 3, 10, 7])

corr, p_value = spearmanr(sleep_hours, job_satisfaction)
print("Spearman Correlation Coefficient:", corr)
print("P-value:", p_value)
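# Interpretation: a Spearman correlation close to ±1 means the ranks of the two variables
# increase or decrease together. Here rho ≈ 0.99, a near-perfect positive monotonic relationship.
# It falls just short of 1 because the two people who sleep 6 hours are tied on sleep
# but report different satisfaction levels (6 and 7).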

Spearman Correlation Coefficient: 0.991031208965115
P-value: 1.4561252916128956e-05
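
Spearman's coefficient is the Pearson coefficient applied to ranks rather than raw values. The following sketch, which assumes `scipy.stats.rankdata` with its default average-rank handling of ties, checks that on the same sleep and satisfaction data.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Same data as in the Q2 cell
sleep_hours = np.array([6, 5, 8, 7, 4, 9, 6])
job_satisfaction = np.array([6, 4, 9, 8, 3, 10, 7])

# rankdata gives tied values their average rank
# (the two 6-hour sleepers share rank 3.5)
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)
rho_scipy, _ = spearmanr(sleep_hours, job_satisfaction)

print("Pearson on ranks:", rho_from_ranks)
print("spearmanr:       ", rho_scipy)
```

The two values should match up to floating-point rounding.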


Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [4]:
# Simulated data
np.random.seed(0)
exercise_hours = np.random.randint(1, 10, 50)
bmi = 30 - exercise_hours + np.random.normal(0, 2, 50)

pearson_corr = pearsonr(exercise_hours, bmi)
spearman_corr = spearmanr(exercise_hours, bmi)

print("Pearson Correlation:", pearson_corr)
print("Spearman Correlation:", spearman_corr)
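# Interpretation: Pearson measures linear association; Spearman measures monotonic association.
# Both are strongly negative here (Pearson ≈ -0.77, Spearman ≈ -0.81): more exercise goes with lower BMI.
# They agree closely because the simulated relationship is linear with normal noise. A larger gap
# between the two would point to outliers or to a monotonic but nonlinear relationship.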

Pearson Correlation: PearsonRResult(statistic=-0.7685903620946289, pvalue=7.17366440149127e-11)
Spearman Correlation: SignificanceResult(statistic=-0.8074147357981044, pvalue=1.40532564984223e-12)
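
The two coefficients in Q3 are close because the simulated data are linear. To illustrate when they diverge, here is a small sketch on made-up data (not from the study): `y = exp(x)` is strictly increasing, so the ranks agree perfectly, but the relationship is far from a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative data only: a monotonic but strongly nonlinear relationship
x = np.arange(1, 11)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print("Pearson: ", pearson_r)     # noticeably below 1: the trend is not linear
print("Spearman:", spearman_rho)  # ranks agree exactly, so rho is 1
```

Comparing the two numbers is therefore a quick check on whether a strong association is also a linear one.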


Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [5]:
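# Simulated data: no real dataset is given. The random state continues from
# np.random.seed(0) in the Q3 cell, so the numbers depend on running the cells in order.
tv_hours = np.random.randint(1, 10, 50)
physical_activity = 10 - tv_hours + np.random.normal(0, 1, 50)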

corr, p = pearsonr(tv_hours, physical_activity)
print("Pearson Correlation Coefficient:", corr)
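# Interpretation: r ≈ -0.95 is a very strong negative linear relationship: participants who watch
# more TV report less physical activity. The strength is built into the simulation
# (activity = 10 - tv_hours + noise), so it says nothing about real survey data.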

Pearson Correlation Coefficient: -0.9523201950567017


Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:


  Age (Years):           25,   42,    37,           19,   31,    28
  Soft drink preference: Coke, Pepsi, Mountain dew, Coke, Pepsi, Coke


In [7]:
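# Drink preference is a nominal (unordered) categorical variable with three levels, so a
# correlation coefficient is not well defined for it. Point-biserial correlation applies only
# to a two-level category; for three levels, one-way ANOVA or Kruskal-Wallis is more appropriate.
# The label-encoded Pearson value computed below depends on the arbitrary alphabetical coding
# (Coke=0, Mountain dew=1, Pepsi=2) and should not be read as the strength of a relationship.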

import pandas as pd
from sklearn.preprocessing import LabelEncoder

age = [25, 42, 37, 19, 31, 28]
soft_drink = ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke']

df = pd.DataFrame({'Age': age, 'Drink': soft_drink})

# Label encode drink preference
encoder = LabelEncoder()
df['Drink_encoded'] = encoder.fit_transform(df['Drink'])

corr, p = pearsonr(df['Age'], df['Drink_encoded'])
print("Correlation between Age and Drink preference:", corr)


Correlation between Age and Drink preference: 0.7691751415594736
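
Because the label-encoded value above depends on the order of the codes, a comparison of ages across the three drink groups is a safer reading of the question. The sketch below uses `scipy.stats.f_oneway` (one-way ANOVA), which the comment in the cell mentions but the cell does not run. It is an illustration only: with six respondents and a single Mountain dew drinker, the test has very little power.

```python
import pandas as pd
from scipy.stats import f_oneway

# Same survey data as in the Q5 cell
age = [25, 42, 37, 19, 31, 28]
soft_drink = ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke']
df = pd.DataFrame({'Age': age, 'Drink': soft_drink})

# Group sizes and mean age per drink; no ordering of the drinks is assumed
print(df.groupby('Drink')['Age'].agg(['count', 'mean']))

# One array of ages per drink, passed to the ANOVA as separate samples
groups = [g['Age'].to_numpy() for _, g in df.groupby('Drink')]
f_stat, p_value = f_oneway(*groups)

print("F statistic:", f_stat)
print("P-value:", p_value)
```

A small p-value would suggest that mean age differs between at least two drink groups; unlike the encoded correlation, the result does not change if the drinks are relabeled.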


Q6. A company is interested in examining the relationship between the number of sales calls made per day 
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [8]:
# Simulated data: no real dataset is given
sales_calls = np.random.randint(5, 15, 30)
sales_made = sales_calls * 2 + np.random.normal(0, 3, 30)  # Simulate a linear relation with noise

corr, p = pearsonr(sales_calls, sales_made)
print("Pearson Correlation Coefficient:", corr)
# Interpretation: r ≈ 0.89 is a strong positive linear relationship: representatives who make
# more calls per day tend to close more sales per week (more calls → more sales).

Pearson Correlation Coefficient: 0.8941247512388018
