# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [1]:
# Ans: 1 

import numpy as np

# Sample data
time_spent_studying = [5, 10, 15, 20, 25]
final_exam_scores = [70, 85, 90, 95, 100]

# Calculate Pearson correlation coefficient
correlation_coefficient = np.corrcoef(time_spent_studying, final_exam_scores)[0, 1]

print("Pearson correlation coefficient:", correlation_coefficient)

Pearson correlation coefficient: 0.9615239476408232


# Interpretation of the Pearson correlation coefficient:

The Pearson correlation coefficient is a value between -1 and 1 that measures the strength and direction of the linear relationship between two variables. Here the coefficient is approximately 0.962, which indicates a strong positive linear relationship between the amount of time students spend studying and their final exam scores.

- A coefficient close to 1 (in this case, 0.962) suggests a strong positive linear correlation: as the amount of time spent studying increases, the final exam scores tend to increase as well.

- A coefficient of 0 indicates no linear correlation between the variables.

- A coefficient close to -1 would indicate a strong negative linear correlation, meaning that as the amount of time spent studying increases, the final exam scores tend to decrease.

Remember that the Pearson correlation coefficient measures only linear relationships, so it can miss a strong nonlinear one. Correlation also does not imply causation: a high coefficient does not prove that studying directly causes higher exam scores, because other factors could be at play. With only five observations, the estimate is also quite uncertain. Always consider the context and other information when interpreting correlation results.
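To make the definition concrete, here is a minimal sketch that computes the coefficient directly from deviations about the mean and cross-checks it against `np.corrcoef`. It reuses the sample data from the answer above; the names `x_dev`, `y_dev`, `r_manual` and `r_numpy` are illustrative and not part of the original answer.

```python
import numpy as np

# Same sample data as in the answer above
time_spent_studying = np.array([5, 10, 15, 20, 25], dtype=float)
final_exam_scores = np.array([70, 85, 90, 95, 100], dtype=float)

# Deviation of each observation from its variable's mean
x_dev = time_spent_studying - time_spent_studying.mean()
y_dev = final_exam_scores - final_exam_scores.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# Cross-check against NumPy's correlation matrix
r_numpy = np.corrcoef(time_spent_studying, final_exam_scores)[0, 1]

print("Manual r:", r_manual)
print("NumPy r: ", r_numpy)
```

For this data the three sums are 350, 250 and 530, so r = 350 / sqrt(250 * 530), which is approximately 0.9615 and agrees with the printed output.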

# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

In [2]:
# Ans: 2 

from scipy.stats import spearmanr

# Sample data
amount_of_sleep = [7, 6, 8, 5, 6, 7, 9, 8, 5, 4]
job_satisfaction = [8, 6, 9, 5, 7, 8, 9, 7, 4, 3]

# Calculate Spearman's rank correlation
correlation_coefficient, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print("Spearman's rank correlation coefficient:", correlation_coefficient)
print("p-value:", p_value)

Spearman's rank correlation coefficient: 0.913317070609606
p-value: 0.00022223155042393495


# Interpretation of Spearman's rank correlation:

Spearman's rank correlation measures the strength and direction of the monotonic relationship between two variables, whether or not that relationship is linear. Here the coefficient is approximately 0.913, which indicates a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction.

- A coefficient close to 1 (in this case, 0.913) indicates a strong positive monotonic correlation: as the amount of sleep increases, job satisfaction tends to increase as well.

- A coefficient close to -1 would indicate a strong negative monotonic correlation: as the amount of sleep increases, job satisfaction tends to decrease.

- A coefficient close to 0 suggests little to no monotonic correlation between the variables.

The p-value indicates the statistical significance of the correlation coefficient. In this example, the p-value is approximately 0.0002, well below the conventional 0.05 threshold. The observed correlation is therefore statistically significant: a rank correlation this strong would be unlikely to arise by chance if sleep and job satisfaction were unrelated.
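One way to see what Spearman's coefficient measures is to note that it equals the Pearson correlation of the ranked data. The sketch below checks this on the same sample data, using `scipy.stats.rankdata` (whose default assigns tied values the average of their ranks); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same sample data as in the answer above
amount_of_sleep = [7, 6, 8, 5, 6, 7, 9, 8, 5, 4]
job_satisfaction = [8, 6, 9, 5, 7, 8, 9, 7, 4, 3]

# Replace each value by its rank; tied values share the average of their ranks
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson correlation computed on the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]

# Spearman's coefficient as computed by SciPy
rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction)

print("Pearson on ranks:", rho_from_ranks)
print("spearmanr:       ", rho_scipy)
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves the coefficient unchanged.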

# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [3]:
# Ans: 3 

from scipy.stats import pearsonr, spearmanr

# Sample data (the lists below contain 49 paired observations)
hours_of_exercise = [3, 4, 5, 2, 6, 3, 7, 2, 4, 5, 1, 6, 2, 3, 4, 5, 2, 6, 3, 7,
                     2, 4, 5, 1, 6, 2, 3, 4, 5, 2, 6, 3, 7, 2, 4, 5, 1, 6, 2, 3,
                     4, 5, 2, 6, 3, 7, 2, 4, 5]
bmi = [22, 25, 27, 20, 28, 23, 30, 19, 24, 26, 18, 29, 20, 22, 25, 27, 20, 28,
       23, 30, 19, 24, 26, 18, 29, 20, 22, 25, 27, 20, 28, 23, 30, 19, 24, 26,
       18, 29, 20, 22, 25, 27, 20, 28, 23, 30, 19, 24, 26]

# Calculate Pearson correlation coefficient
pearson_corr_coeff, pearson_p_value = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation
spearman_corr_coeff, spearman_p_value = spearmanr(hours_of_exercise, bmi)

print("Pearson correlation coefficient:", pearson_corr_coeff)
print("Spearman's rank correlation coefficient:", spearman_corr_coeff)

Pearson correlation coefficient: 0.9888841424734955
Spearman's rank correlation coefficient: 0.9899733887842006


# Comparing Pearson and Spearman correlations:

The Pearson correlation coefficient and the Spearman's rank correlation coefficient both indicate the strength and direction of the relationship between two variables. However, they capture different types of relationships:

- **Pearson Correlation:** The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. Here the coefficient is approximately 0.989, which indicates a very strong positive linear relationship between the number of hours of exercise per week and BMI.

- **Spearman's Rank Correlation:** The Spearman's rank correlation coefficient measures the strength and direction of the monotonic relationship between two variables, whether or not it is linear. Here the coefficient is approximately 0.990, which indicates a very strong positive monotonic relationship between the number of hours of exercise per week and BMI.

Both coefficients are very close to 1 and nearly identical, so in this sample BMI rises almost in lockstep with exercise hours, and the relationship is both monotonic and close to linear. The two measures diverge when a relationship is monotonic but curved, or when outliers distort the linear fit; neither is the case here. Note that in this sample data more exercise goes with higher BMI, and correlation alone says nothing about why. It's important to consider the type of data and the context of your study when interpreting correlation results.
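The difference between a linear and a monotonic measure is easiest to see on data where the two disagree. The following sketch uses hypothetical data, not the exercise and BMI sample, in which `y` grows exponentially with `x`: the relationship is perfectly monotonic but far from linear.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y increases with every step in x, but along a curve
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print("Pearson: ", pearson_r)     # noticeably below 1, because the trend is curved
print("Spearman:", spearman_rho)  # 1, because the ordering is perfectly preserved
```

When the two coefficients are close, as in the exercise and BMI answer, a straight line is already a good description of the monotonic trend.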

# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [4]:
# Ans: 4 

import numpy as np

# Sample data
hours_watching_tv = [2, 3, 4, 2, 5, 6, 3, 4, 1, 2, 3, 2, 4, 5, 4, 3, 2, 2, 1, 3,
                     5, 6, 3, 4, 1, 2, 3, 2, 4, 5, 4, 3, 2, 2, 1, 3, 5, 6, 3, 4,
                     1, 2, 3, 2, 4, 5, 4, 3, 2, 2]
physical_activity = [30, 45, 60, 40, 20, 15, 50, 45, 70, 60, 40, 50, 45, 35, 40,
                     50, 60, 70, 80, 50, 30, 20, 55, 40, 75, 60, 45, 55, 40, 35,
                     40, 50, 60, 70, 80, 50, 30, 20, 55, 40, 75, 60, 45, 55, 40,
                     35, 40, 50, 60, 70]

# Calculate Pearson correlation coefficient
correlation_coefficient = np.corrcoef(hours_watching_tv, physical_activity)[0, 1]

print("Pearson correlation coefficient:", correlation_coefficient)

Pearson correlation coefficient: -0.8741954956851389


# Interpretation of the Pearson correlation coefficient:

The Pearson correlation coefficient is a value between -1 and 1 that measures the strength and direction of the linear relationship between two variables. Here the coefficient is approximately -0.874, which indicates a strong negative linear relationship between the number of hours individuals spend watching television per day and their level of physical activity.

- A coefficient close to -1 (in this case, -0.874) suggests a strong negative linear correlation: as the number of hours spent watching television increases, the level of physical activity tends to decrease.

- A coefficient close to 0 indicates little to no linear correlation between the variables.

- A coefficient close to 1 would indicate a strong positive linear correlation, implying that as the number of hours spent watching television increases, the level of physical activity tends to increase.

# Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

Age (years): [25, 42, 37, 19, 31, 28]

Soft drink preference: [Coke, Pepsi, Mountain Dew, Coke, Pepsi, Coke]

In [5]:
# Ans: 5 

from scipy.stats import pearsonr
from sklearn.preprocessing import LabelEncoder

# Sample data
age = [25, 42, 37, 19, 31, 28]
soft_drink_preference = ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']

# Convert categorical variable to numerical using LabelEncoder.
# Codes follow alphabetical order (Coke=0, Mountain Dew=1, Pepsi=2), which is arbitrary.
label_encoder = LabelEncoder()
encoded_preference = label_encoder.fit_transform(soft_drink_preference)

# Calculate Pearson correlation coefficient
correlation_coefficient, _ = pearsonr(age, encoded_preference)

print("Pearson correlation coefficient:", correlation_coefficient)

Pearson correlation coefficient: 0.7691751415594736


# Interpretation of the Pearson correlation coefficient:

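The Pearson correlation coefficient is approximately 0.769, which on its face suggests a fairly strong positive linear relationship between age and the encoded soft drink preference. It should not be read that way, however. Soft drink preference is a nominal variable, and LabelEncoder assigns its integer codes alphabetically (Coke = 0, Mountain Dew = 1, Pepsi = 2), so the ordering is arbitrary and a different encoding would give a different coefficient. With only six respondents, the estimate is also very unstable. For a numeric variable measured against unordered categories, a measure such as the correlation ratio or a one-way ANOVA is more appropriate.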
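Because brand names have no natural order, the integer codes given to them are arbitrary. As a rough sketch, the code below shows that Pearson's r changes when the same brands are coded differently. It then computes the squared correlation ratio (eta squared), which depends only on group membership. The helper `pearson_with_codes` and both code mappings are hypothetical and for illustration only; the correlation ratio is an alternative technique, not the method used in the answer above.

```python
import numpy as np

# Same survey data as in the answer above
age = np.array([25, 42, 37, 19, 31, 28], dtype=float)
soft_drink_preference = np.array(
    ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
)


def pearson_with_codes(code_map):
    """Pearson r between age and the brands under a given integer coding."""
    codes = np.array([code_map[brand] for brand in soft_drink_preference], dtype=float)
    return np.corrcoef(age, codes)[0, 1]


# Two equally arbitrary encodings of the same nominal variable
r_alphabetical = pearson_with_codes({'Coke': 0, 'Mountain Dew': 1, 'Pepsi': 2})
r_alternative = pearson_with_codes({'Coke': 0, 'Pepsi': 1, 'Mountain Dew': 2})

# Squared correlation ratio: share of the variance in age explained by brand
grand_mean = age.mean()
ss_total = ((age - grand_mean) ** 2).sum()
ss_between = 0.0
for brand in np.unique(soft_drink_preference):
    group = age[soft_drink_preference == brand]
    ss_between += len(group) * (group.mean() - grand_mean) ** 2
eta_squared = ss_between / ss_total

print("r with alphabetical codes:", r_alphabetical)
print("r with alternative codes: ", r_alternative)
print("eta squared:              ", eta_squared)
```

The two r values differ even though the underlying data are identical, while eta squared is unaffected by how the brands are labelled. With six respondents and a single Mountain Dew drinker, none of these numbers should be over-interpreted.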

# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [6]:
# Ans: 6 

import numpy as np

# Sample data
sales_calls_per_day = [15, 18, 20, 10, 12, 14, 16, 10, 18, 22,
                       14, 16, 20, 25, 12, 18, 22, 15, 16, 19,
                       14, 16, 18, 10, 12, 14, 16, 10, 18, 22]
sales_per_week = [3, 5, 6, 1, 2, 2, 4, 1, 5, 7,
                  2, 3, 6, 8, 2, 4, 6, 3, 3, 5,
                  2, 4, 5, 1, 2, 2, 4, 1, 5, 7]

# Calculate Pearson correlation coefficient
correlation_coefficient = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

print("Pearson correlation coefficient:", correlation_coefficient)

Pearson correlation coefficient: 0.9755134742680441


# Interpretation of the Pearson correlation coefficient:

The Pearson correlation coefficient is approximately 0.976. This value indicates a very strong positive linear relationship between the number of sales calls made per day and the number of sales made per week.

- A coefficient close to 1 (in this case, 0.976) indicates a strong positive linear correlation: as the number of sales calls made per day increases, the number of sales made per week tends to increase as well.

- A coefficient close to 0 indicates little to no linear correlation between the variables.

- A coefficient close to -1 would indicate a strong negative linear correlation: as the number of sales calls made per day increases, the number of sales made per week tends to decrease.
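A natural follow-up is whether a coefficient from 30 representatives could plausibly have arisen by chance. The sketch below is one way to check this, assuming roughly bivariate normal data. It converts r into a t statistic with n - 2 degrees of freedom and compares the resulting two-sided p-value with the one returned by `scipy.stats.pearsonr`. The names `t_stat`, `p_manual` and `p_scipy` are illustrative.

```python
import numpy as np
from scipy import stats

# Same sample data as in the answer above
sales_calls_per_day = [15, 18, 20, 10, 12, 14, 16, 10, 18, 22,
                       14, 16, 20, 25, 12, 18, 22, 15, 16, 19,
                       14, 16, 18, 10, 12, 14, 16, 10, 18, 22]
sales_per_week = [3, 5, 6, 1, 2, 2, 4, 1, 5, 7,
                  2, 3, 6, 8, 2, 4, 6, 3, 3, 5,
                  2, 4, 5, 1, 2, 2, 4, 1, 5, 7]

n = len(sales_calls_per_day)

# pearsonr returns the coefficient and a two-sided p-value
r, p_scipy = stats.pearsonr(sales_calls_per_day, sales_per_week)

# Under the null hypothesis of zero correlation,
# t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 df
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print("r:", r)
print("t statistic:", t_stat)
print("p-value (manual):", p_manual)
print("p-value (pearsonr):", p_scipy)
```

A very small p-value here says only that the linear association is unlikely to be a sampling artifact; it does not show that making more calls causes more sales.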