Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

The Pearson correlation coefficient (also known as Pearson's r) measures the strength and direction of the linear relationship between two continuous variables. It quantifies how well the data points align along a linear trend. The Pearson correlation coefficient can range from -1 to 1, where:

* 1 indicates a perfect positive linear relationship (as one variable increases, the other increases proportionally).
* -1 indicates a perfect negative linear relationship (as one variable increases, the other decreases proportionally).
* 0 indicates no linear relationship between the variables.

In your scenario, the amount of time students spend studying for an exam and their final exam scores are the two continuous variables. Calculating the Pearson correlation coefficient between these variables will help you understand the strength and direction of their linear relationship.
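
As a minimal sketch of what the coefficient computes, the snippet below evaluates r directly from its definition (the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations) and checks it against NumPy. The helper name `pearson_r` is hypothetical and used only for illustration; the data are the study-time sample used in the next cell.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from its definition (hypothetical helper, for illustration only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Same sample data as the study-time example
time_studied = [10, 8, 5, 12, 6, 9, 7, 11]
exam_scores = [85, 70, 60, 90, 65, 80, 75, 88]

r_manual = pearson_r(time_studied, exam_scores)
r_numpy = np.corrcoef(time_studied, exam_scores)[0, 1]  # off-diagonal entry is r

print("Manual r:", r_manual)
print("NumPy r: ", r_numpy)
```

Both values should agree up to floating-point rounding, which shows that the library call is doing nothing more than this ratio.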

In [1]:
import numpy as np
import pandas as pd

# Sample data: Amount of time studied and exam scores
time_studied = np.array([10, 8, 5, 12, 6, 9, 7, 11])
exam_scores = np.array([85, 70, 60, 90, 65, 80, 75, 88])

# Create a DataFrame
df = pd.DataFrame({'Time_Studied': time_studied, 'Exam_Scores': exam_scores})

# Calculate Pearson correlation coefficient
correlation_coefficient = df['Time_Studied'].corr(df['Exam_Scores'])

print("Pearson Correlation Coefficient:", correlation_coefficient)


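Pearson Correlation Coefficient: 0.9671020604154756

Interpretation: r ≈ 0.97 is close to 1, which indicates a very strong positive linear relationship in this sample: students who studied longer tended to score higher. The sample has only 8 observations, and correlation does not establish causation, so the result should be read with caution.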


Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

Spearman's rank correlation coefficient (often denoted as ρ or r_s) is a non-parametric measure of the monotonic relationship between two variables. It is computed on the ranks of the values rather than the values themselves, and it assesses whether one variable consistently increases or consistently decreases as the other increases. It is therefore suitable for linear and non-linear relationships alike, provided they are monotonic.

In your scenario, you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Spearman's rank correlation can help you understand the strength and direction of the monotonic relationship between these two variables.
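
To make the "rank" part concrete, the following sketch ranks both variables and then takes the ordinary Pearson correlation of the ranks, which should match what `scipy.stats.spearmanr` reports. It uses the same sleep and satisfaction sample as the next cell; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same sample data as the sleep / job satisfaction example
amount_of_sleep = [7, 6, 5, 8, 6, 7, 5, 9]
job_satisfaction = [8, 5, 3, 9, 4, 7, 6, 10]

# rankdata assigns tied values the average of their ranks by default
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Spearman's coefficient is Pearson's r computed on the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]
rho_scipy, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print("Pearson on ranks:", rho_from_ranks)
print("scipy spearmanr: ", rho_scipy)
```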

In [2]:
import pandas as pd

# Sample data: Amount of sleep and job satisfaction levels
amount_of_sleep = [7, 6, 5, 8, 6, 7, 5, 9]
job_satisfaction = [8, 5, 3, 9, 4, 7, 6, 10]

# Create a DataFrame
df = pd.DataFrame({'Amount_of_Sleep': amount_of_sleep, 'Job_Satisfaction': job_satisfaction})

# Calculate Spearman's rank correlation coefficient
spearman_correlation = df['Amount_of_Sleep'].corr(df['Job_Satisfaction'], method='spearman')

print("Spearman's Rank Correlation Coefficient:", spearman_correlation)


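Spearman's Rank Correlation Coefficient: 0.8849947770681914

Interpretation: ρ ≈ 0.88 indicates a strong positive monotonic relationship in this sample: individuals who sleep more tend to report higher job satisfaction, although the two rankings do not line up perfectly.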


Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [3]:
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Sample data: Number of hours of exercise and BMI
hours_of_exercise = np.array([2, 4, 3, 5, 2, 1, 6, 3, 4, 2, 3, 1, 2, 5, 4, 3, 6, 1, 2, 5,
                              3, 4, 2, 1, 4, 6, 3, 2, 5, 4, 1, 3, 6, 2, 5, 4, 1, 3, 2, 6,
                              4, 5, 1, 3, 2, 4, 5, 6, 1, 2])
bmi = np.array([23.5, 27.8, 25.3, 30.2, 22.1, 21.0, 32.7, 24.6, 26.4, 21.9, 27.1, 20.2,
                22.5, 29.8, 28.3, 25.9, 33.1, 19.8, 23.6, 30.5, 24.7, 26.2, 20.8, 18.9,
                27.3, 34.0, 24.8, 21.5, 28.7, 26.6, 18.6, 25.0, 32.2, 21.4, 29.5, 27.0,
                18.0, 25.8, 22.7, 33.5, 28.9, 31.3, 19.5, 26.9, 24.2, 27.6, 30.1, 34.3,
                17.4, 21.8])

# Create a DataFrame
df = pd.DataFrame({'Hours_of_Exercise': hours_of_exercise, 'BMI': bmi})

# Calculate Pearson correlation coefficient
pearson_correlation = df['Hours_of_Exercise'].corr(df['BMI'])

# Calculate Spearman's rank correlation coefficient
spearman_correlation = spearmanr(df['Hours_of_Exercise'], df['BMI']).correlation

print("Pearson Correlation Coefficient:", pearson_correlation)
print("Spearman's Rank Correlation Coefficient:", spearman_correlation)


Pearson Correlation Coefficient: 0.9758056570406863
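Spearman's Rank Correlation Coefficient: 0.976784145996443

Comparison: both coefficients are about 0.98 and nearly identical, which indicates that the relationship in this sample is strongly positive and close to linear, so working with ranks instead of raw values changes little. The positive sign reflects the illustrative sample data and should not be read as an empirical finding about exercise and BMI.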

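The two coefficients agree here because the sample is close to linear. As a hedged illustration of when they diverge, the sketch below uses made-up data (not from the study) in which y grows exponentially with x: the relationship is perfectly monotonic but far from linear, so Spearman's coefficient is 1 while Pearson's is noticeably lower.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: perfectly monotonic, strongly non-linear
x = np.arange(1, 11)
y = np.exp(x)

pearson_value, _ = pearsonr(x, y)
spearman_value, _ = spearmanr(x, y)

print("Pearson: ", pearson_value)
print("Spearman:", spearman_value)
```

A large gap between the two values is therefore a hint that the relationship is monotonic but curved, or that a few extreme values dominate the Pearson calculation.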

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [4]:
import numpy as np
import pandas as pd

# Sample data: Hours of TV watching and level of physical activity
tv_hours = np.array([2, 3, 4, 5, 2, 1, 6, 3, 4, 2, 3, 1, 2, 5, 4, 3, 6, 1, 2, 5,
                     3, 4, 2, 1, 4, 6, 3, 2, 5, 4, 1, 3, 6, 2, 5, 4, 1, 3, 2, 6,
                     4, 5, 1, 3, 2, 4, 5, 6, 1, 2])
physical_activity = np.array([2, 3, 4, 5, 2, 1, 6, 3, 4, 2, 3, 1, 2, 5, 4, 3, 6, 1, 2, 5,
                              3, 4, 2, 1, 4, 6, 3, 2, 5, 4, 1, 3, 6, 2, 5, 4, 1, 3, 2, 6,
                              4, 5, 1, 3, 2, 4, 5, 6, 1, 2])

# Create a DataFrame
df = pd.DataFrame({'TV_Hours': tv_hours, 'Physical_Activity': physical_activity})

# Calculate Pearson correlation coefficient
pearson_correlation = df['TV_Hours'].corr(df['Physical_Activity'])

print("Pearson Correlation Coefficient:", pearson_correlation)


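Pearson Correlation Coefficient: 0.9999999999999999

Note: the coefficient equals 1 (up to floating-point rounding) only because the two sample arrays above are identical, so this result is an artifact of the placeholder data. Distinct measurements for the two variables are needed before the coefficient says anything about their relationship.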


Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

Age (Years)      Soft Drink Preference
25               Coke
42               Pepsi
37               Mountain Dew
19               Coke
31               Pepsi
28               Coke


Age is a continuous variable, while soft drink preference is a nominal categorical variable with three levels (Coke, Pepsi, Mountain Dew). The point-biserial correlation coefficient measures the strength and direction of the linear relationship between a continuous variable and a binary (two-category) variable, so it does not apply directly here: the preference has three categories and no quantitative values, and we can't directly calculate a correlation coefficient from it.

Here's what you can do with the given data:

Convert Categorical Preference to Quantitative Values:
Assign numerical values to the soft drink preferences (e.g., Coke=1, Pepsi=2, Mountain Dew=3) so that you can treat it as a quantitative variable. This coding is arbitrary, because the brands have no natural order, and a different assignment would give a different coefficient.

Calculate the Correlation Coefficient:
Once you've assigned numerical values, you can pass the age and the coded preference to pointbiserialr. With more than two codes this is simply the Pearson correlation on the coded values rather than a true point-biserial coefficient, so treat the result as illustrative only.

In [5]:
import pandas as pd
from scipy.stats import pointbiserialr

# Sample data
data = {
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft_Drink_Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}

df = pd.DataFrame(data)

# Convert soft drink preference to numerical values (arbitrary coding: the brands have no natural order)
preference_mapping = {'Coke': 1, 'Pepsi': 2, 'Mountain Dew': 3}
df['Preference_Num'] = df['Soft_Drink_Preference'].map(preference_mapping)

# Calculate the coefficient (with three codes this is Pearson's r on the coded values, not a true point-biserial)
correlation_coefficient = pointbiserialr(df['Age'], df['Preference_Num']).correlation

print("Point-Biserial Correlation Coefficient:", correlation_coefficient)


Point-Biserial Correlation Coefficient: 0.7587035441865057

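Because the point-biserial coefficient is defined for a two-category variable, one hedged alternative is to recode the preference as a binary indicator for each brand ("prefers this brand" versus "prefers another") and compute a separate coefficient per brand. The sketch below does this with the survey data from the question; the names `one_vs_rest` and `is_brand` are illustrative, and with only six respondents the values are indicative at best.

```python
import pandas as pd
from scipy.stats import pointbiserialr

# Survey data from the question
df = pd.DataFrame({
    'Age': [25, 42, 37, 19, 31, 28],
    'Soft_Drink_Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})

one_vs_rest = {}
for brand in df['Soft_Drink_Preference'].unique():
    # 1 if the respondent prefers this brand, 0 otherwise (a genuinely binary variable)
    is_brand = (df['Soft_Drink_Preference'] == brand).astype(int)
    r, p_value = pointbiserialr(is_brand, df['Age'])
    one_vs_rest[brand] = r
    print(f"{brand} vs. rest: r = {r:.3f}, p = {p_value:.3f}")
```

A negative coefficient for a brand means the respondents who prefer it are younger on average than the rest of the sample, and a positive one means they are older.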

Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [6]:
import numpy as np
import pandas as pd

# Sample data: Number of sales calls per day and number of sales per week
# (note: each array below holds 31 values, one more than the 30 representatives in the question)
sales_calls_per_day = np.array([20, 25, 18, 22, 16, 19, 23, 21, 24, 17, 20, 15, 26, 18, 22,
                                19, 23, 21, 24, 17, 20, 15, 26, 18, 22, 19, 23, 21, 24,
                                17, 20])
sales_per_week = np.array([80, 90, 75, 85, 70, 78, 88, 82, 93, 72, 82, 65, 96, 74, 84,
                           77, 88, 83, 91, 70, 81, 60, 97, 76, 85, 78, 89, 81, 92,
                           73, 86])

# Create a DataFrame
df = pd.DataFrame({'Sales_Calls_Per_Day': sales_calls_per_day, 'Sales_Per_Week': sales_per_week})

# Calculate Pearson correlation coefficient
pearson_correlation = df['Sales_Calls_Per_Day'].corr(df['Sales_Per_Week'])

print("Pearson Correlation Coefficient:", pearson_correlation)


Pearson Correlation Coefficient: 0.9751934992307941
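
The coefficient alone does not say how likely such a value would be if the two variables were unrelated. As a sketch of one way to check this, `scipy.stats.pearsonr` returns a two-sided p-value alongside r. The arrays are the sample data from the cell above; the 0.05 threshold is a common convention and an assumption here, not something the question specifies.

```python
import numpy as np
from scipy.stats import pearsonr

# Same sample data as the sales example above
sales_calls_per_day = np.array([20, 25, 18, 22, 16, 19, 23, 21, 24, 17, 20, 15, 26, 18, 22,
                                19, 23, 21, 24, 17, 20, 15, 26, 18, 22, 19, 23, 21, 24,
                                17, 20])
sales_per_week = np.array([80, 90, 75, 85, 70, 78, 88, 82, 93, 72, 82, 65, 96, 74, 84,
                           77, 88, 83, 91, 70, 81, 60, 97, 76, 85, 78, 89, 81, 92,
                           73, 86])

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(sales_calls_per_day, sales_per_week)

alpha = 0.05  # assumed significance level
print("Pearson r:", r)
print("p-value:  ", p_value)
print("Significant at alpha = 0.05:", p_value < alpha)
```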
