# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [1]:
import numpy as np

# Sample data
time_spent = np.array([10, 15, 5, 20, 8])
exam_scores = np.array([85, 90, 70, 95, 80])

# Calculate Pearson correlation coefficient
correlation_coefficient = np.corrcoef(time_spent, exam_scores)[0, 1]

print("Pearson Correlation Coefficient:", correlation_coefficient)


Pearson Correlation Coefficient: 0.9537581963947194


Interpretation of Result:

The Pearson correlation coefficient ranges between -1 and 1. It measures the strength and direction of the linear relationship between the two variables:

A positive coefficient indicates a positive linear relationship: as one variable increases, the other tends to increase as well. The closer the value is to 1, the stronger the relationship.

A negative coefficient indicates a negative linear relationship: as one variable increases, the other tends to decrease. The closer the value is to -1, the stronger the relationship.

A correlation coefficient close to 0 indicates a weak or no linear relationship between the variables.

In this case, the Pearson correlation coefficient is approximately 0.954. This value is positive and close to 1, indicating a strong positive linear relationship between the amount of time students spend studying for the exam and their final exam scores. In other words, students in this sample who spent more time studying tended to achieve higher exam scores. However, keep in mind that correlation does not imply causation, that other factors could also be influencing exam scores, and that five observations are a very small sample.
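
To make the definition concrete, the sketch below recomputes the coefficient directly from the formula r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²) and compares it with `np.corrcoef`. The helper name `pearson_r` is an illustrative assumption and is not part of the original notebook.

```python
import numpy as np

# Same sample data as in the cell above
time_spent = np.array([10, 15, 5, 20, 8])
exam_scores = np.array([85, 90, 70, 95, 80])


def pearson_r(x, y):
    """Pearson r from its definition (hypothetical helper for illustration)."""
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


r_manual = pearson_r(time_spent, exam_scores)
r_numpy = np.corrcoef(time_spent, exam_scores)[0, 1]

# Both values should agree up to floating-point rounding
print("Manual:", r_manual)
print("NumPy: ", r_numpy)
```

Both approaches should give the same value shown in the output above, which is a quick way to check that the interpretation refers to the right number.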







# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

In [2]:
import numpy as np
from scipy.stats import spearmanr

# Sample data
sleep = np.array([6, 7, 5, 8, 6])
job_satisfaction = np.array([8, 7, 4, 9, 6])

# Calculate Spearman's rank correlation
correlation_coefficient, _ = spearmanr(sleep, job_satisfaction)

print("Spearman's Rank Correlation Coefficient:", correlation_coefficient)


Spearman's Rank Correlation Coefficient: 0.8207826816681234


Interpretation of Result:

The Spearman's rank correlation coefficient ranges between -1 and 1. It measures the strength and direction of the monotonic relationship between the two variables:

A positive coefficient indicates a positive monotonic relationship. As one variable increases, the other tends to increase as well, but not necessarily at a constant rate.

A negative coefficient indicates a negative monotonic relationship. As one variable increases, the other tends to decrease, but not necessarily at a constant rate.

A coefficient close to 0 indicates a weak or no monotonic relationship between the variables.

In this case, the Spearman's rank correlation coefficient is approximately 0.821. This positive value suggests a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level: individuals in this sample who get more sleep tend to report higher job satisfaction. The relationship is not perfect, however, and with only five observations other factors could easily be contributing to job satisfaction levels.

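It's important to note that Spearman's rank correlation depends only on the order of the ranks rather than on the actual values of the variables, which makes it suitable for ordinal data (such as a 1 to 10 satisfaction scale), for monotonic but non-linear relationships, and for data that are not normally distributed.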
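
The idea that Spearman's coefficient works on ranks can be sketched as follows: rank each variable (tied values receive the average of their ranks) and compute the Pearson correlation of the ranks. This is a minimal illustration using `scipy.stats.rankdata`, not necessarily how `spearmanr` is implemented internally.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same sample data as in the cell above
sleep = np.array([6, 7, 5, 8, 6])
job_satisfaction = np.array([8, 7, 4, 9, 6])

# Convert values to ranks; the two 6-hour sleepers share the average rank 2.5
sleep_ranks = rankdata(sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson correlation of the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]

# Reference value from SciPy
rho_scipy, _ = spearmanr(sleep, job_satisfaction)

print("From ranks:", rho_from_ranks)
print("spearmanr: ", rho_scipy)
```

The two numbers should match up to floating-point rounding, which shows why only the ordering of the data matters.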

# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [6]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Sample data (replace with actual data)
hours_of_exercise = np.array([3, 5, 2, 4, 6])
bmi = np.array([23.5, 27.2, 21.8, 25.1, 29.7])

# Calculate Pearson correlation
pearson_corr, _ = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

print("Pearson Correlation Coefficient:", pearson_corr)
print("Spearman's Rank Correlation Coefficient:", spearman_corr)


Pearson Correlation Coefficient: 0.9954682053056334
Spearman's Rank Correlation Coefficient: 0.9999999999999999


Comparison of Results:

Pearson correlation measures the linear relationship between two continuous variables, while Spearman's rank correlation measures the monotonic relationship, which makes it more suitable for non-linear relationships or data that are not normally distributed.

Pearson Correlation: Pearson's coefficient ranges between -1 and 1 and describes how closely the points follow a straight line. A positive value indicates a positive linear relationship (more exercise goes with higher BMI), a negative value indicates a negative linear relationship (more exercise goes with lower BMI), and a value close to 0 indicates a weak or no linear relationship. For the sample data above, the coefficient is approximately 0.995, a very strong positive linear relationship.

Spearman's Rank Correlation: Spearman's coefficient also ranges between -1 and 1, but it is computed on the ranks of the values, so it is more robust to outliers and captures any monotonic relationship, linear or not. For the sample data above, the coefficient is 1.0 (up to floating-point rounding), because ordering the participants by hours of exercise produces exactly the same order as ordering them by BMI.

Comparing the two: both coefficients agree that the relationship in this sample is strongly positive. Pearson is slightly below 1 because the points do not lie exactly on a straight line, while Spearman reaches 1 because the ranks match perfectly. Note that these are five placeholder values, not the 50 participants described in the question, so the numbers only illustrate the calculation.
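
The difference between the two coefficients is easier to see on data that are monotonic but clearly non-linear. The sketch below uses made-up values (`x` from 1 to 10 and `y = exp(x)`), which are an assumption for illustration and unrelated to the exercise/BMI sample.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y always increases with x, but far from a straight line
x = np.arange(1, 11)
y = np.exp(x)

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

# Spearman sees a perfect monotonic relationship;
# Pearson is noticeably lower because the relationship is not linear
print("Pearson Correlation Coefficient:", pearson_corr)
print("Spearman's Rank Correlation Coefficient:", spearman_corr)
```

A large gap between the two coefficients is therefore a hint that the relationship is monotonic but not linear, or that a few extreme values are dominating the Pearson result.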

# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [7]:
import numpy as np

# Sample data (replace with actual data)
hours_of_tv = np.array([2, 3, 4, 2, 5])
physical_activity = np.array([30, 25, 15, 20, 10])

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(hours_of_tv, physical_activity)[0, 1]

print("Pearson Correlation Coefficient:", pearson_corr)


Pearson Correlation Coefficient: -0.8488746876271654


Interpretation of Result:

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. It ranges between -1 and 1:

A positive coefficient indicates a positive linear relationship. As one variable increases, the other tends to increase as well.

A negative coefficient indicates a negative linear relationship. As one variable increases, the other tends to decrease.

A coefficient close to 0 indicates a weak or no linear relationship between the variables.

In this case, the Pearson correlation coefficient is approximately -0.849. This value is negative and fairly close to -1, indicating a strong negative linear relationship between the number of hours individuals spend watching television per day and their level of physical activity: in this sample, people who watch more television tend to be less physically active. Because the sample data contain only five placeholder values rather than the 50 participants described in the question, the result illustrates the calculation rather than a finding.
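
The size of the coefficient and its statistical significance are separate questions. As a minimal sketch, `scipy.stats.pearsonr` returns a two-sided p-value alongside the coefficient; the 0.05 threshold in the comment is a common convention and is an assumption here, not something stated in the question.

```python
import numpy as np
from scipy.stats import pearsonr

# Same sample data as in the cell above
hours_of_tv = np.array([2, 3, 4, 2, 5])
physical_activity = np.array([30, 25, 15, 20, 10])

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(hours_of_tv, physical_activity)

print(f"r = {r:.3f}, two-sided p-value = {p_value:.3f}")

# A conventional (assumed) threshold; with only five points even a
# large |r| may fail to reach it
print("Significant at the 0.05 level:", p_value < 0.05)
```

With a sample this small, the p-value should be read with caution; the full set of 50 participants would give a much more reliable test.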

In [8]:
import pandas as pd

# Sample data
data = {
    "Age(Years)": [25, 42, 37, 19, 31, 28],
    "Soft Drink Preference": ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Group data by soft drink preference and calculate mean age
mean_age_by_preference = df.groupby("Soft Drink Preference")["Age(Years)"].mean()

print("Mean Age by Soft Drink Preference:")
print(mean_age_by_preference)


Mean Age by Soft Drink Preference:
Soft Drink Preference
Coke            24.0
Mountain Dew    37.0
Pepsi           36.5
Name: Age(Years), dtype: float64


Interpretation of Results:

The output shows the mean age for each soft drink preference category, which gives an idea of the average age of the individuals who prefer each drink:

Individuals who prefer "Coke" have an average age of 24.0 years.
Individuals who prefer "Mountain Dew" have an average age of 37.0 years (a single respondent).
Individuals who prefer "Pepsi" have an average age of 36.5 years.
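
A group mean is easier to judge when the group size is shown next to it. The sketch below rebuilds the same DataFrame and uses `agg(["mean", "count"])` to report both; the variable name `summary` is an assumption for illustration.

```python
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({
    "Age(Years)": [25, 42, 37, 19, 31, 28],
    "Soft Drink Preference": ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"],
})

# Mean age and number of respondents for each preference
summary = df.groupby("Soft Drink Preference")["Age(Years)"].agg(["mean", "count"])

print(summary)
```

The `count` column shows how many respondents each mean is based on, so a mean that rests on one or two people is not over-interpreted.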

# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [9]:
import numpy as np

# Sample data (replace with actual data)
sales_calls_per_day = np.array([25, 30, 20, 35, 28])
sales_per_week = np.array([10, 15, 8, 18, 12])

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

print("Pearson Correlation Coefficient:", pearson_corr)


Pearson Correlation Coefficient: 0.9802927288673678


Interpretation of Result:

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. It ranges between -1 and 1:

A positive coefficient indicates a positive linear relationship. As one variable increases, the other tends to increase as well.

A negative coefficient indicates a negative linear relationship. As one variable increases, the other tends to decrease.

A coefficient close to 0 indicates a weak or no linear relationship between the variables.

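In this case, the Pearson correlation coefficient is approximately 0.980. This value is positive and very close to 1, indicating a very strong positive linear relationship between the number of sales calls made per day and the number of sales made per week: in this sample, representatives who make more calls tend to close more sales. Because the sample data contain only five placeholder values rather than the 30 sales representatives described in the question, the result illustrates the calculation rather than a finding, and a strong correlation alone does not show that making more calls causes more sales.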
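
The two variables here are measured over different periods (calls per day, sales per week). That does not affect the result, because the Pearson coefficient is unchanged when a variable is multiplied by a positive constant. The sketch below checks this, assuming a hypothetical 5-day working week to convert calls per day into calls per week.

```python
import numpy as np

# Same sample data as in the cell above
sales_calls_per_day = np.array([25, 30, 20, 35, 28])
sales_per_week = np.array([10, 15, 8, 18, 12])

r_original = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

# Hypothetical conversion: assume 5 working days per week
sales_calls_per_week = sales_calls_per_day * 5
r_rescaled = np.corrcoef(sales_calls_per_week, sales_per_week)[0, 1]

# The coefficient does not depend on the units of either variable
print("Calls per day: ", r_original)
print("Calls per week:", r_rescaled)
```

Both values should be identical up to floating-point rounding, so there is no need to put the two variables on the same time scale before computing the correlation.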