# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.
To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, we would need to have paired data for each student with their study time and exam score.

Assuming we have this data, we can use the following formula to calculate the Pearson correlation coefficient:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of pairs of data, Σxy is the sum of the products of each paired data point, Σx is the sum of the study time data, Σy is the sum of the exam score data, Σx^2 is the sum of the squared study time data, and Σy^2 is the sum of the squared exam score data.
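The formula above can be sketched in a few lines of Python. The study hours and exam scores below are hypothetical values invented for illustration (the question supplies no data), and `pearson_r` is a helper name introduced here; `scipy.stats.pearsonr` is used only as a cross-check.

```python
import math
from scipy.stats import pearsonr

# Hypothetical data: hours studied and exam score for eight students.
study_hours = [2, 4, 5, 6, 8, 9, 11, 12]
exam_scores = [52, 58, 61, 70, 74, 80, 88, 91]

def pearson_r(x, y):
    """Pearson r computed from the raw sums in the formula above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r_manual = pearson_r(study_hours, exam_scores)
r_scipy, p_value = pearsonr(study_hours, exam_scores)
print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}")
```

For this made-up sample, the two values should agree up to floating-point error, and r should come out strongly positive.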

Interpreting the result of the Pearson correlation coefficient calculation depends on the value of r. The value of r ranges from -1 to +1, where -1 indicates a perfect negative linear relationship between the two variables, +1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship.

If the calculated r is close to +1, there is a strong positive linear correlation between study time and exam scores: students who study longer tend to score higher. Conversely, if r is close to -1, there is a strong negative linear correlation: students who study longer tend to score lower. If r is close to 0, there is little or no linear relationship between the two variables; they may still be related in a nonlinear way. In every case, r measures association only and does not show that studying more causes the change in scores.
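One caveat when reading a value of r near 0 is that it only rules out a linear trend. The sketch below uses a hypothetical, deliberately artificial dataset in which y is completely determined by x, yet the Pearson coefficient is essentially zero because the relationship is symmetric rather than linear.

```python
import numpy as np

# Hypothetical data: y depends entirely on x, but not linearly.
x = np.arange(-5, 6)   # -5, -4, ..., 5
y = x ** 2             # perfect quadratic dependence

# Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")
```

A scatter plot is therefore a useful companion to the coefficient before concluding that two variables are unrelated.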


# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.
To calculate Spearman's rank correlation coefficient, we need to first rank both variables, then calculate the difference between the ranks for each data point, and finally calculate the correlation coefficient using the formula:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

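where ρ is the Spearman's rank correlation coefficient, d is the difference between the two ranks of each data point, and n is the number of data points. This shortcut formula is exact only when there are no tied ranks. When ties are present, tied values receive the average of their ranks and ρ is computed as the Pearson correlation of the ranks, which is what `scipy.stats.spearmanr` does.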
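The ranking steps described above can be sketched by hand and checked against SciPy. The sleep and satisfaction values here are hypothetical and chosen to contain no ties, so the shortcut formula applies exactly.

```python
from scipy.stats import rankdata, spearmanr

# Hypothetical tie-free data: hours of sleep and job satisfaction (1 to 10).
sleep = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
satisfaction = [3, 2, 5, 4, 7, 6, 9, 8]

# Step 1: rank each variable (1 = smallest value).
rank_sleep = rankdata(sleep)
rank_satisfaction = rankdata(satisfaction)

# Step 2: rank difference for each individual.
d = rank_sleep - rank_satisfaction

# Step 3: apply rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n = len(sleep)
rho_manual = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Cross-check with SciPy's implementation.
rho_scipy, p_value = spearmanr(sleep, satisfaction)
print(f"manual rho = {rho_manual:.4f}, scipy rho = {rho_scipy:.4f}")
```

Every rank difference in this sample is ±1, so Σd² = 8 and ρ = 1 - 48/504 ≈ 0.905.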


In [2]:
import pandas as pd
from scipy.stats import spearmanr

# Create a sample dataset
data = {
    'sleep': [6, 7, 8, 5, 9, 6, 7, 8, 4, 5],
    'job_satisfaction': [7, 8, 9, 5, 10, 6, 8, 9, 4, 5]
}

df = pd.DataFrame(data)

# Calculate the Spearman's rank correlation coefficient
corr, p_value = spearmanr(df['sleep'], df['job_satisfaction'])

print(f"Spearman's rank correlation coefficient: {corr:.2f}")
print(f"P-value: {p_value:.4f}")


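Spearman's rank correlation coefficient: 1.00
P-value: 0.0000

A coefficient that rounds to 1.00, with a p-value below 0.0001, indicates a near-perfect positive monotonic relationship in this sample dataset: individuals who sleep more also report higher job satisfaction. The data are illustrative, so the result says nothing about a real population, and a strong correlation does not by itself show that more sleep causes higher satisfaction.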


# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.


In [11]:
import pandas as pd
import scipy.stats as stats
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Generate 50 random floats between 4 and 10
rand_nums = np.random.uniform(4, 10, size=50)

print(rand_nums)

# Create a DataFrame with exercise and BMI data for 50 participants
data = pd.DataFrame({
    'Exercise': list(np.random.randint(1,8,50)),
    'BMI': list(np.random.uniform(21,29,50))
})

# Calculate the Pearson correlation coefficient
pearson_corr = data['Exercise'].corr(data['BMI'], method='pearson')

# Calculate the Spearman's rank correlation
spearman_corr = data['Exercise'].corr(data['BMI'], method='spearman')

# Print the results
print(f"Pearson correlation coefficient: {pearson_corr}")
print(f"Spearman's rank correlation: {spearman_corr}")


[6.24724071 9.70428584 8.39196365 7.59195091 4.93611184 4.93596712
 4.34850167 9.19705687 7.60669007 8.24843547 4.12350697 9.81945911
 8.99465584 5.27403466 5.0909498  5.10042706 5.82545346 7.14853859
 6.59167011 5.74737484 7.67111737 4.83696316 5.75286789 6.19817106
 6.73641991 8.71105577 5.19804269 7.08540663 7.55448741 4.27870248
 7.64526911 5.02314474 4.39030956 9.69331322 9.7937922  8.85038409
 5.82768262 4.58603268 8.10539816 6.64091496 4.73222941 6.97106146
 4.20633113 9.45592241 5.55267989 7.97513371 5.87026646 7.12040813
 7.28026168 5.10912673]
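Pearson correlation coefficient: 0.14710261836805238
Spearman's rank correlation: 0.1417742278165337

The two coefficients are nearly equal (about 0.15 and 0.14) and both are close to 0, so this sample shows neither a linear nor a monotonic relationship of any strength between exercise and BMI. That is expected here: `Exercise` and `BMI` were generated independently at random, so any nonzero value reflects chance variation in a sample of 50.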
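The two measures agree when a relationship is roughly linear (or absent), and diverge when it is monotonic but curved. The following sketch uses a hypothetical exponential curve, not exercise or BMI data, to show the difference: Spearman's coefficient depends only on the ordering of the values, while Pearson's is pulled down by the curvature.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y always increases with x, but far from linearly.
x = np.arange(1, 11)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_rho:.3f}")
```

Here Spearman's coefficient is 1 because the ranks match perfectly, while Pearson's is noticeably lower. A large gap between the two on real data is a hint to look for nonlinearity or outliers.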


# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.


In [12]:
import numpy as np
from scipy.stats import pearsonr

# generate random data for the number of hours of TV and physical activity
hours_of_tv = np.random.uniform(0, 6, size=50)
physical_activity = np.random.uniform(0, 10, size=50)

# calculate the Pearson correlation coefficient
corr_coeff, p_value = pearsonr(hours_of_tv, physical_activity)

print("Pearson correlation coefficient: {:.3f}".format(corr_coeff))


Pearson correlation coefficient: 0.001


# Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:
![image.png](attachment:image.png)



In [13]:
import numpy as np
import pandas as pd

# create the dataframe
df = pd.DataFrame({'age': [25, 42, 37, 19, 31, 28],
                   'soft_drink': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']})

# label encode the soft drink column
# (brand is a nominal variable, so the integer codes are arbitrary
# and the resulting coefficient depends on the order of encoding)
df['soft_drink'] = pd.factorize(df['soft_drink'])[0] + 1

# calculate the Pearson correlation coefficient
corr = np.corrcoef(df['age'], df['soft_drink'])[0, 1]

print('Pearson correlation coefficient:', corr)


Pearson correlation coefficient: 0.7587035441865058


# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.
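For the sales-call question (Q6), no data are supplied, so the following is only a minimal sketch on synthetic data. The variable names, the seed, and the assumed positive link between calls and sales are all illustrative choices made here, not the company's figures.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-in for 30 sales representatives; not real data.
calls_per_day = rng.integers(10, 41, size=30)

# Assumption for illustration: weekly sales rise with calls, plus noise.
sales_per_week = 0.5 * calls_per_day + rng.normal(0, 3, size=30)

# Pearson correlation coefficient and its two-sided p-value.
r, p_value = pearsonr(calls_per_day, sales_per_week)
print(f"Pearson correlation coefficient: {r:.3f} (p = {p_value:.4g})")
```

With the company's actual measurements, the two arrays would simply be replaced by the recorded calls per day and sales per week for each representative.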
