In [1]:
# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
# you have collected data on the amount of time students spend studying for an exam and their final exam
# scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [2]:
# The Pearson correlation coefficient, often denoted as \(r\), is a measure of the linear relationship between two 
# variables. It ranges from -1 to 1, where:

# - \(r = 1\) indicates a perfect positive linear relationship.
# - \(r = -1\) indicates a perfect negative linear relationship.
# - \(r = 0\) indicates no linear relationship.

# The formula for calculating the Pearson correlation coefficient (\(r\)) between two variables \(X\) and \(Y\) is as follows:

# \[ r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} \]

# where:
# - \(X_i\) and \(Y_i\) are the individual data points.
# - \(\bar{X}\) and \(\bar{Y}\) are the means of variables \(X\) and \(Y\), respectively.
# - \(n\) is the number of data points.
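
# The formula above can be evaluated term by term. The following is a minimal sketch using NumPy on the five-student
# sample from the example below; the helper name `pearson_r` is hypothetical and is not a library function.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed directly from the definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations X_i - mean(X)
    dy = y - y.mean()  # deviations Y_i - mean(Y)
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

time_spent = [5, 8, 6, 10, 7]
exam_scores = [75, 92, 80, 98, 88]

r_manual = pearson_r(time_spent, exam_scores)
r_numpy = np.corrcoef(time_spent, exam_scores)[0, 1]
print(f"r from the formula:  {r_manual:.3f}")
print(f"r from np.corrcoef: {r_numpy:.3f}")
```

# Both values should agree up to floating-point error, since np.corrcoef evaluates the same quantity.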

# **Interpretation:**
# - If \(r\) is close to 1, it suggests a strong positive linear relationship. As one variable increases, the other tends to increase.
# - If \(r\) is close to -1, it suggests a strong negative linear relationship. As one variable increases, the other tends to decrease.
# - If \(r\) is close to 0, it suggests little to no linear relationship between the variables.

# **Example:**
# Let's assume you have the following data on the amount of time students spend studying (in hours) and their final exam scores:

# ```plaintext
# | Time Spent Studying | Exam Score |
# |----------------------|------------|
# |          5           |     75     |
# |          8           |     92     |
# |          6           |     80     |
# |         10           |     98     |
# |          7           |     88     |
# ```

# Now, let's calculate the Pearson correlation coefficient and interpret the result:

# For this data, \(r \approx 0.979\), which indicates a very strong positive linear relationship between the amount 
# of time students spend studying and their final exam scores: as study time increases, exam scores tend to increase.
# With only five observations, this estimate is imprecise, and a correlation alone does not establish causation.

In [3]:
import numpy as np

# Data
time_spent = np.array([5, 8, 6, 10, 7])
exam_scores = np.array([75, 92, 80, 98, 88])

# Calculate Pearson correlation coefficient
r = np.corrcoef(time_spent, exam_scores)[0, 1]

print(f"Pearson correlation coefficient (r): {r:.3f}")

Pearson correlation coefficient (r): 0.979


In [4]:
# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
# Suppose you have collected data on the amount of sleep individuals get each night and their overall job
# satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
# variables and interpret the result.

In [5]:
# Spearman's rank correlation coefficient (\(\rho\)) is a measure of the monotonic relationship between two variables. 
# It assesses whether there is a consistent monotonic trend (increasing or decreasing) between the ranks of the two variables,
# regardless of the specific values.

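# When no ranks are tied, Spearman's rank correlation coefficient can be calculated with the following formula
# (with tied values, \(\rho\) is instead the Pearson correlation of the average ranks):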

# \[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]

# where:
# - \(d_i\) is the difference between the ranks of corresponding pairs of observations.
# - \(n\) is the number of pairs of observations.
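
# As a rough sketch, the formula can be applied by hand to the sleep and job satisfaction sample from the example below.
# Ranks are obtained with scipy.stats.rankdata, which assigns average ranks to ties. The sample contains one tie
# (two people sleep 7 hours), so the shortcut formula is only approximate here and may differ slightly from spearmanr.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

amount_of_sleep = [7, 6, 8, 5, 7]
job_satisfaction = [8, 6, 9, 4, 7]

# Average ranks; the two tied 7s share the mean of their rank positions
rank_sleep = rankdata(amount_of_sleep)
rank_satisfaction = rankdata(job_satisfaction)

d = rank_sleep - rank_satisfaction  # rank differences d_i
n = len(d)
rho_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction)
print(f"rho from the d_i formula: {rho_formula:.4f}")
print(f"rho from spearmanr:       {rho_scipy:.4f}")
```

# Any small gap between the two numbers comes from the tie; without ties they coincide.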

# The interpretation of \(\rho\) is similar to that of the Pearson correlation coefficient:
# - \(\rho = 1\) indicates a perfect monotonic increasing relationship.
# - \(\rho = -1\) indicates a perfect monotonic decreasing relationship.
# - \(\rho = 0\) indicates no monotonic relationship.

# **Example:**
# Let's assume you have the following data on the amount of sleep individuals get each night and their overall 
# job satisfaction level on a scale of 1 to 10:

# ```plaintext
# | Amount of Sleep | Job Satisfaction |
# |------------------|-------------------|
# |        7         |         8         |
# |        6         |         6         |
# |        8         |         9         |
# |        5         |         4         |
# |        7         |         7         |
# ```

# Now, let's calculate Spearman's rank correlation coefficient and interpret the result:

# For this data, \(\rho \approx 0.975\). This indicates a very strong positive monotonic relationship 
# between the amount of sleep individuals get and their overall job satisfaction: individuals who sleep more
# tend to report higher job satisfaction.

In [6]:
from scipy.stats import spearmanr

# Data
amount_of_sleep = [7, 6, 8, 5, 7]
job_satisfaction = [8, 6, 9, 4, 7]

# Calculate Spearman's rank correlation coefficient
rho, _ = spearmanr(amount_of_sleep, job_satisfaction)

print(f"Spearman's rank correlation coefficient (rho): {rho:.3f}")

Spearman's rank correlation coefficient (rho): 0.975


In [7]:
# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
# exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
# for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
# between these two variables and compare the results.

In [8]:
# Let's assume you have the data on the number of hours of exercise per week and the body mass index (BMI) for 50 participants.
# We'll calculate both the Pearson correlation coefficient (\(r\)) and the Spearman's rank correlation coefficient (\(\rho\)).

# **Pearson Correlation Coefficient (\(r\)):**
# The Pearson correlation coefficient measures the linear relationship between two variables. It ranges from -1 to 1, where:
# - \(r = 1\) indicates a perfect positive linear relationship.
# - \(r = -1\) indicates a perfect negative linear relationship.
# - \(r = 0\) indicates no linear relationship.

# The formula for calculating \(r\) is as follows:

# \[ r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} \]

# **Spearman's Rank Correlation Coefficient (\(\rho\)):**
# Spearman's rank correlation coefficient measures the monotonic relationship between two variables. It assesses whether there is a 
# consistent monotonic trend (increasing or decreasing) between the ranks of the two variables. It is computed by ranking
# each variable separately and taking the Pearson correlation of the ranks.
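
# The link between the two coefficients can be checked numerically. This is a small sketch on made-up data (the seed,
# sample size, and cubic relationship are arbitrary assumptions): Spearman's rho from spearmanr should match the
# Pearson correlation of the ranks produced by scipy.stats.rankdata.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
x = rng.normal(size=50)
y = x**3 + rng.normal(scale=0.5, size=50)  # made-up nonlinear relationship

rho_direct, _ = spearmanr(x, y)
rho_via_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(f"spearmanr(x, y):            {rho_direct:.6f}")
print(f"pearsonr(rank(x), rank(y)): {rho_via_ranks:.6f}")
```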

# Let's calculate both coefficients and compare the results using Python:


# **Interpretation:**
# - If \(r\) and \(\rho\) are both close to 1, it indicates a strong positive relationship.
# - If \(r\) and \(\rho\) are both close to -1, it indicates a strong negative relationship.
# - If \(r\) and \(\rho\) are both close to 0, it indicates little to no linear or monotonic relationship.

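# Comparing the two coefficients helps characterize the relationship between hours of exercise per week and BMI:
# - Similar values of \(r\) and \(\rho\) are consistent with a roughly linear relationship.
# - A value of \(|\rho|\) clearly larger than \(|r|\) suggests a relationship that is monotonic but not linear, or outliers that distort \(r\).
# For the simulated data in the next cell, \(r = 0.931\) and \(\rho = 0.938\) are nearly equal, as expected for data generated
# from a linear model with noise.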
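
# To see how the two coefficients can diverge, here is a small sketch on made-up data (not the exercise/BMI data):
# y grows exponentially with x, so the relationship is perfectly monotonic but far from linear.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11)
y = np.exp(x)  # strictly increasing in x, but strongly curved

r_nl, _ = pearsonr(x, y)
rho_nl, _ = spearmanr(x, y)

print(f"Pearson r:    {r_nl:.3f}")
print(f"Spearman rho: {rho_nl:.3f}")
```

# Because the ranks of x and y agree exactly, rho equals 1, while r falls well below 1 due to the curvature.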

In [9]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Simulated data (replace this with your actual data)
np.random.seed(42)
hours_of_exercise = np.random.randint(1, 10, 50)  # Example: random integer hours of exercise (1 to 9)
bmi = 25 + 2 * hours_of_exercise + np.random.normal(0, 2, 50)  # Simulated BMI: linear in exercise plus noise (arbitrary positive slope, not realistic)

# Calculate Pearson correlation coefficient
r, _ = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation coefficient
rho, _ = spearmanr(hours_of_exercise, bmi)

print(f"Pearson correlation coefficient (r): {r:.3f}")
print(f"Spearman's rank correlation coefficient (rho): {rho:.3f}")

Pearson correlation coefficient (r): 0.931
Spearman's rank correlation coefficient (rho): 0.938


In [10]:
# Q4. A researcher is interested in examining the relationship between the number of hours individuals
# spend watching television per day and their level of physical activity. The researcher collected data on
# both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
# these two variables.

In [11]:
# The Pearson correlation coefficient (\(r\)) is used to measure the linear relationship between two continuous variables. 
# It ranges from -1 to 1, where:

# - \(r = 1\) indicates a perfect positive linear relationship.
# - \(r = -1\) indicates a perfect negative linear relationship.
# - \(r = 0\) indicates no linear relationship.

# The formula for calculating \(r\) is as follows:

# \[ r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} \]

# where:
# - \(X_i\) and \(Y_i\) are the individual data points.
# - \(\bar{X}\) and \(\bar{Y}\) are the means of variables \(X\) and \(Y\), respectively.
# - \(n\) is the number of data points.

# Let's assume you have the data on the number of hours individuals spend watching television per day and their level of physical activity
# for 50 participants. Here's how you can calculate the Pearson correlation coefficient using Python:


# **Interpretation:**
# - If \(r\) is close to 1, it suggests a strong positive linear relationship. As the number of hours spent watching TV increases, the level of
# physical activity tends to increase.
# - If \(r\) is close to -1, it suggests a strong negative linear relationship. As the number of hours spent watching TV increases, the level of 
# physical activity tends to decrease.
# - If \(r\) is close to 0, it suggests little to no linear relationship between the two variables.
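
# The negative case can be illustrated with a small simulation. This is only a sketch: the seed, the intercept of 10,
# the slope of -1.5, and the noise level are all assumptions chosen for illustration, not estimates from real data.
# The second value returned by pearsonr is the p-value for the null hypothesis of no correlation.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)  # arbitrary seed
hours_of_tv = rng.uniform(1, 5, 50)

# Assumption: activity drops by 1.5 units per extra hour of TV, plus Gaussian noise
physical_activity = 10 - 1.5 * hours_of_tv + rng.normal(0, 1, 50)

r_neg, p_value = pearsonr(hours_of_tv, physical_activity)
print(f"r = {r_neg:.3f}, p-value = {p_value:.3g}")
```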

In [12]:
import numpy as np
from scipy.stats import pearsonr

# Simulated data (replace this with your actual data)
np.random.seed(42)
hours_of_tv = np.random.uniform(1, 5, 50)  # Example: random hours of TV per day
physical_activity = 3 * hours_of_tv + np.random.normal(0, 1, 50)  # Simulated physical activity: linear in TV hours plus noise (positive slope chosen arbitrarily)

# Calculate Pearson correlation coefficient
r, _ = pearsonr(hours_of_tv, physical_activity)

print(f"Pearson correlation coefficient (r): {r:.3f}")

Pearson correlation coefficient (r): 0.966


In [13]:
# Q5. A survey was conducted to examine the relationship between age and preference for a particular
# brand of soft drink. The survey results are shown below:
#     Age(Years) : 25,42,37,19,31,28
#     Soft drink Preference : Coke,Pepsi,Mountain dew,Coke,Pepsi,Coke

In [14]:
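# Age is numeric, but soft drink preference is a nominal variable: the brands have no natural order. Spearman's rank correlation
# coefficient (\(\rho\)) requires both variables to be at least ordinal, so it can be applied here only after imposing an
# ordering on the brands, and the result depends on that ordering. The calculation in the next cell ranks the brands in
# lexicographic (here alphabetical) order, so the resulting \(\rho\) illustrates the mechanics of the calculation rather than
# a meaningful measure of association between age and brand preference.

# When no ranks are tied, the formula for Spearman's rank correlation coefficient is: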

# \[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]

# where:
# - \( d_i \) is the difference between the ranks of corresponding pairs of observations.
# - \( n \) is the number of pairs of observations.

# In your case, you have the following data:

# \[ \text{Age (Years)}: 25, 42, 37, 19, 31, 28 \]
# \[ \text{Soft drink Preference}: \text{Coke, Pepsi, Mountain Dew, Coke, Pepsi, Coke} \]

# Now, let's calculate Spearman's rank correlation coefficient using Python:

# **Interpretation:**
# - If \(\rho\) is close to 1, it indicates a strong monotonic increasing relationship.
# - If \(\rho\) is close to -1, it indicates a strong monotonic decreasing relationship.
# - If \(\rho\) is close to 0, it suggests little to no monotonic relationship.
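
# Because the brands are nominal, \(\rho\) depends on how they are ordered. The sketch below recomputes \(\rho\) for the survey
# data under two orderings; both orderings are arbitrary assumptions, and neither reflects a real ranking of the brands.

```python
from scipy.stats import spearmanr

age = [25, 42, 37, 19, 31, 28]
preference = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']

# Two arbitrary ways of ordering the same three brands (assumptions for illustration)
orderings = {
    "alphabetical": ['Coke', 'Mountain Dew', 'Pepsi'],
    "alternative": ['Pepsi', 'Coke', 'Mountain Dew'],
}

results = {}
for label, order in orderings.items():
    codes = [order.index(brand) for brand in preference]  # brand -> position in this ordering
    rho, _ = spearmanr(age, codes)
    results[label] = rho
    print(f"{label}: rho = {rho:.3f}")
```

# The two orderings give different values of \(\rho\), which is why the coefficient should not be read as a property of the data alone.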

In [15]:
import pandas as pd
from scipy.stats import spearmanr

# Given data
age = [25, 42, 37, 19, 31, 28]
soft_drink_preference = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']

# Create a DataFrame for easy handling
df = pd.DataFrame({'Age': age, 'Soft Drink Preference': soft_drink_preference})

# Rank the data (strings are ranked in lexicographic order: Coke < Mountain Dew < Pepsi, an arbitrary ordering)
df['Age Rank'] = df['Age'].rank()
df['Preference Rank'] = df['Soft Drink Preference'].rank()

# Calculate Spearman's rank correlation coefficient
rho, _ = spearmanr(df['Age Rank'], df['Preference Rank'])

print(f"Spearman's rank correlation coefficient (rho): {rho:.3f}")

Spearman's rank correlation coefficient (rho): 0.833


In [16]:
# Q6. A company is interested in examining the relationship between the number of sales calls made per day
# and the number of sales made per week. The company collected data on both variables from a sample of
# 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [17]:
# To examine the relationship between the number of sales calls made per day and the number of sales made per week,
# you can calculate the Pearson correlation coefficient (\(r\)). This coefficient measures the linear relationship between two continuous variables.

# The formula for calculating \(r\) is as follows:

# \[ r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} \]

# where:
# - \(X_i\) and \(Y_i\) are the individual data points.
# - \(\bar{X}\) and \(\bar{Y}\) are the means of variables \(X\) and \(Y\), respectively.
# - \(n\) is the number of data points.
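
# The same formula can be read as the sample covariance of \(X\) and \(Y\) divided by the product of their sample standard
# deviations, because the \(n - 1\) factors cancel. A minimal sketch on made-up sales data (the seed and the variable names
# `calls` and `sales` are assumptions for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)  # arbitrary seed
calls = rng.integers(20, 50, 30)           # made-up sales calls per day
sales = 5 * calls + rng.normal(0, 10, 30)  # made-up sales per week

cov_xy = np.cov(calls, sales)[0, 1]  # sample covariance (np.cov uses ddof=1 by default)
r_cov = cov_xy / (np.std(calls, ddof=1) * np.std(sales, ddof=1))

r_scipy, _ = pearsonr(calls, sales)
print(f"r from covariance form: {r_cov:.6f}")
print(f"r from pearsonr:        {r_scipy:.6f}")
```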

# Let's assume you have the data on the number of sales calls made per day and the number of sales made per week for 30 sales 
# representatives. Here's how you can calculate the Pearson correlation coefficient using Python:



# **Interpretation:**
# - If \(r\) is close to 1, it suggests a strong positive linear relationship. As the number of sales calls made per day increases,
# the number of sales made per week tends to increase.
# - If \(r\) is close to -1, it suggests a strong negative linear relationship. As the number of sales calls made per day increases, 
# the number of sales made per week tends to decrease.
# - If \(r\) is close to 0, it suggests little to no linear relationship between the two variables.



In [18]:
import numpy as np
from scipy.stats import pearsonr

# Simulated data (replace this with your actual data)
np.random.seed(42)
sales_calls_per_day = np.random.randint(20, 50, 30)  # Example: random sales calls per day
sales_per_week = 5 * sales_calls_per_day + np.random.normal(0, 10, 30)  # Simulated sales per week based on sales calls

# Calculate Pearson correlation coefficient
r, _ = pearsonr(sales_calls_per_day, sales_per_week)

print(f"Pearson correlation coefficient (r): {r:.3f}")

Pearson correlation coefficient (r): 0.977
