Q1. The Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [1]:
import numpy as np
from scipy.stats import pearsonr

# Data
study_time = np.array([2, 3, 5, 7, 9, 10])
exam_scores = np.array([50, 55, 65, 70, 85, 90])

# Compute Pearson correlation coefficient
correlation_coefficient, _ = pearsonr(study_time, exam_scores)

print(f"Pearson Correlation Coefficient: {correlation_coefficient:.2f}")


Pearson Correlation Coefficient: 0.99
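
Interpretation: a coefficient of 0.99 indicates a very strong positive linear relationship in this sample, so students who studied longer tended to score higher. With only six observations this describes association, not causation. As a cross-check, the sketch below recomputes r from its definition (sum of co-deviations divided by the product of the root sums of squares) on the same arrays. The helper name `pearson_from_definition` is illustrative and not part of SciPy.

```python
import numpy as np

def pearson_from_definition(x, y):
    """Pearson r computed directly from deviations about the means."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Same data as the cell above
study_time = np.array([2, 3, 5, 7, 9, 10])
exam_scores = np.array([50, 55, 65, 70, 85, 90])

r_manual = pearson_from_definition(study_time, exam_scores)
r_numpy = float(np.corrcoef(study_time, exam_scores)[0, 1])

print(f"Manual r: {r_manual:.2f}")
print(f"np.corrcoef r: {r_numpy:.2f}")
```

Both values should agree with the `pearsonr` result to within floating-point error.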


Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.



In [2]:
import numpy as np
from scipy.stats import spearmanr

# Data
sleep_hours = np.array([4, 5, 6, 7, 8, 9])
job_satisfaction = np.array([3, 4, 7, 8, 9, 10])

# Compute Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(sleep_hours, job_satisfaction)

print(f"Spearman's Rank Correlation Coefficient: {spearman_corr:.2f}")


Spearman's Rank Correlation Coefficient: 1.00
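
Interpretation: a coefficient of 1.00 means the two rankings agree perfectly. Every increase in sleep hours comes with an increase in job satisfaction, even though the size of the increase varies. Spearman's coefficient is Pearson's r applied to ranks, which the following sketch makes explicit using `scipy.stats.rankdata` on the same arrays; the variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata

# Same data as the cell above
sleep_hours = np.array([4, 5, 6, 7, 8, 9])
job_satisfaction = np.array([3, 4, 7, 8, 9, 10])

# Replace each value by its rank (ties would receive average ranks)
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson correlation of the ranks is Spearman's rank correlation
rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)

print(f"Sleep ranks: {sleep_ranks}")
print(f"Satisfaction ranks: {satisfaction_ranks}")
print(f"Pearson on ranks: {rho_from_ranks:.2f}")
```

Because both rank vectors are identical here, the correlation of the ranks is 1.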


Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.



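Interpretation:

In our generated data, we expect negative correlations, since BMI is constructed to fall as exercise increases.

Pearson’s r captures only the linear part of the relationship, so it may be weaker if the relationship isn’t strictly linear.
Spearman’s ρ depends only on ranks and will likely be stronger if BMI decreases consistently as exercise increases.
With the seed used below, both coefficients come out weakly negative (r = -0.10, ρ = -0.05), because the random spread in BMI (a 17-unit range) is much larger than the 0.3-per-hour effect of exercise.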

In [3]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Set random seed for reproducibility
np.random.seed(42)

# Generate random data for 50 participants
exercise_hours = np.random.uniform(0, 10, 50)  # Exercise hours (0 to 10 hours per week)
bmi_values = np.random.uniform(18, 35, 50) - 0.3 * exercise_hours  # BMI decreasing with more exercise

# Compute Pearson correlation
pearson_corr, _ = pearsonr(exercise_hours, bmi_values)

# Compute Spearman's rank correlation
spearman_corr, _ = spearmanr(exercise_hours, bmi_values)

print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")
print(f"Spearman's Rank Correlation Coefficient: {spearman_corr:.2f}")


Pearson Correlation Coefficient: -0.10
Spearman's Rank Correlation Coefficient: -0.05
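
To show when the two coefficients genuinely diverge, here is a small sketch on made-up values that are monotonic but far from linear (x from 1 to 10 and y = exp(x)). These values are an assumption for illustration only and are not part of the exercise/BMI data.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative data: y always increases with x, but not along a straight line
x = np.arange(1, 11)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print(f"Pearson r: {pearson_r:.2f}")
print(f"Spearman rho: {spearman_rho:.2f}")
```

Spearman's coefficient reaches 1 because the ordering is perfectly preserved, while Pearson's r stays clearly below 1 because a straight line fits the curve poorly.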


Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [4]:
import numpy as np
from scipy.stats import pearsonr

# Set random seed for reproducibility
np.random.seed(42)

# Generate random data for 50 participants
tv_hours = np.random.uniform(0, 6, 50)  # Hours of TV watched per day (0 to 6 hours)
physical_activity = np.random.uniform(0, 10, 50) - 0.5 * tv_hours  # Physical activity decreases with more TV

# Compute Pearson correlation coefficient
pearson_corr, _ = pearsonr(tv_hours, physical_activity)

pearson_corr


np.float64(-0.21547654463138363)
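
Interpretation: a coefficient of about -0.22 is a weak negative linear relationship. As a sanity check on that size, the sketch below works out the population correlation implied by the generating model in the cell above (activity = noise - 0.5 * tv_hours, with independent uniform draws), using the fact that a uniform variable on [a, b] has variance (b - a)^2 / 12. The helper `uniform_variance` is a name introduced here for illustration.

```python
import math

def uniform_variance(low, high):
    """Variance of a continuous uniform distribution on [low, high]."""
    return (high - low) ** 2 / 12.0

var_tv = uniform_variance(0, 6)       # variance of tv_hours
var_noise = uniform_variance(0, 10)   # variance of the independent noise term
slope = -0.5                          # coefficient on tv_hours in the model

# Cov(X, E + slope * X) = slope * Var(X) when X and E are independent
cov_xy = slope * var_tv
var_activity = var_noise + slope ** 2 * var_tv

population_r = cov_xy / math.sqrt(var_tv * var_activity)
print(f"Population correlation implied by the model: {population_r:.3f}")
```

The implied value is about -0.29, so a sample estimate of -0.22 from 50 draws is within ordinary sampling variation.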

Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

In [5]:
import pandas as pd
from scipy.stats import chi2_contingency

# Sample data
data = {'Age': [25, 42, 37, 19, 31, 28], 
        'Soft_Drink': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']}

df = pd.DataFrame(data)

# Create age bins; pd.cut intervals are right-inclusive: (18, 25], (25, 35], (35, 45]
df['Age_Group'] = pd.cut(df['Age'], bins=[18, 25, 35, 45], labels=['18-25', '26-35', '36-45'])

# Create contingency table
contingency_table = pd.crosstab(df['Age_Group'], df['Soft_Drink'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-Square Statistic: {chi2:.2f}")
print(f"P-Value: {p:.4f}")


Chi-Square Statistic: 5.00
P-Value: 0.2873
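
Interpretation: with p = 0.2873 the test does not reject independence between age group and drink preference at the 0.05 level. The chi-square approximation is commonly considered reliable only when expected cell counts are around 5 or more, which six respondents cannot provide. The sketch below rebuilds the contingency table by hand from the six rows above (rows are age groups, columns are Coke, Mountain Dew, Pepsi) and inspects the expected counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: 18-25, 26-35, 36-45; columns: Coke, Mountain Dew, Pepsi
observed = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
])

chi2, p, dof, expected = chi2_contingency(observed)

min_expected = expected.min()
share_below_5 = (expected < 5).mean()

print(f"Chi-Square Statistic: {chi2:.2f}, dof: {dof}, P-Value: {p:.4f}")
print(f"Smallest expected count: {min_expected:.2f}")
print(f"Share of cells with expected count below 5: {share_below_5:.0%}")
```

Every expected count is below 5, so the p-value should be read as illustrative rather than as evidence either way.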


In [6]:
from scipy.stats import f_oneway

# Group ages by soft drink preference
coke_ages = df[df['Soft_Drink'] == 'Coke']['Age']
pepsi_ages = df[df['Soft_Drink'] == 'Pepsi']['Age']
dew_ages = df[df['Soft_Drink'] == 'Mountain Dew']['Age']

# Perform ANOVA test
anova_result = f_oneway(coke_ages, pepsi_ages, dew_ages)

print(f"ANOVA P-Value: {anova_result.pvalue:.4f}")


ANOVA P-Value: 0.1631
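
Interpretation: with p = 0.1631 the one-way ANOVA does not show a significant difference in mean age between the three preference groups, and the Mountain Dew group contains a single respondent. To make the test less of a black box, this sketch recomputes the F statistic from the between-group and within-group sums of squares, using the ages from the data above. The dictionary layout and variable names are choices made for this example.

```python
import numpy as np
from scipy.stats import f

# Ages grouped by soft drink preference, taken from the sample data above
groups = {
    "Coke": np.array([25, 19, 28]),
    "Pepsi": np.array([42, 31]),
    "Mountain Dew": np.array([37]),
}

all_ages = np.concatenate(list(groups.values()))
grand_mean = all_ages.mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_between = len(groups) - 1
df_within = len(all_ages) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
p_value = f.sf(f_stat, df_between, df_within)

print(f"F statistic: {f_stat:.4f}")
print(f"ANOVA P-Value: {p_value:.4f}")
```

The p-value matches the `f_oneway` result, since both use the same F distribution with 2 and 3 degrees of freedom.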


Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [7]:
# Import necessary libraries
import numpy as np
from scipy.stats import pearsonr

# Set random seed for reproducibility
np.random.seed(42)

# Generate random data for 30 sales representatives
sales_calls_per_day = np.random.randint(5, 30, 30)  # Number of sales calls made per day (5 to 29; randint excludes the upper bound)
sales_per_week = np.random.randint(10, 100, 30) + 2 * sales_calls_per_day  # Sales per week

# Compute Pearson correlation coefficient
pearson_corr, _ = pearsonr(sales_calls_per_day, sales_per_week)

pearson_corr


np.float64(0.4176122039520441)
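
Interpretation: a coefficient of about 0.42 is a moderate positive linear relationship between daily calls and weekly sales. To give a sense of how precise that estimate is with 30 representatives, the sketch below computes an approximate 95% confidence interval with the Fisher z-transformation. That method assumes roughly bivariate normal data, which the uniform integers generated here do not satisfy, so treat the interval as a rough guide only.

```python
import math
from scipy.stats import norm

r = 0.4176122039520441  # value reported by the cell above
n = 30                  # number of sales representatives

# Fisher z-transformation: atanh(r) is approximately normal with SE 1/sqrt(n - 3)
z = math.atanh(r)
standard_error = 1.0 / math.sqrt(n - 3)
z_critical = norm.ppf(0.975)  # two-sided 95% interval

ci_low = math.tanh(z - z_critical * standard_error)
ci_high = math.tanh(z + z_critical * standard_error)

print(f"Approximate 95% CI for r: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is wide, running from just above 0 to roughly 0.68, which reflects how little 30 observations pin down a correlation.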