### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient between two variables, we need their covariance and their standard deviations. Let's say we have n pairs of data points for the amount of time spent studying (x) and the corresponding exam score (y). The formula for the Pearson correlation coefficient r is:

r = cov(x,y) / (std(x) * std(y))

where cov(x,y) is the covariance between x and y, and std(x) and std(y) are the standard deviations of x and y, respectively.
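
As a minimal sketch of the formula above, the following computes r directly from the covariance and the standard deviations. The study hours and exam scores are hypothetical values invented for illustration (the question supplies no data), and `np.corrcoef` is used only as a cross-check.

```python
import numpy as np

# Hypothetical data, invented for illustration only: hours studied and exam scores
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 58.0, 61.0, 70.0, 74.0, 79.0])

# Sample covariance and sample standard deviations (ddof=1 in both,
# so the normalising factors cancel in the ratio)
cov_xy = np.cov(hours, scores, ddof=1)[0, 1]
std_x = np.std(hours, ddof=1)
std_y = np.std(scores, ddof=1)

# r = cov(x,y) / (std(x) * std(y))
r_manual = cov_xy / (std_x * std_y)

# Cross-check against NumPy's built-in correlation matrix
r_numpy = np.corrcoef(hours, scores)[0, 1]

print("r from the formula:", r_manual)
print("r from np.corrcoef:", r_numpy)
```

The covariance and both standard deviations must use the same divisor (n - 1 here, or n in all three); mixing them gives a value that is off by a factor of n / (n - 1).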

The Pearson correlation coefficient r ranges from -1 to 1, with 0 indicating no linear relationship between the variables, and values close to -1 or 1 indicating a strong linear relationship. A value of -1 indicates a perfect negative linear relationship, while a value of 1 indicates a perfect positive linear relationship.

To interpret the result, let's say we calculated a Pearson correlation coefficient of 0.7 for the relationship between the amount of time spent studying and the exam score. This indicates a strong positive linear relationship between the two variables, which means that students who spend more time studying tend to score higher on the exam. On the other hand, if we had calculated a Pearson correlation coefficient of -0.3, this would indicate a weak negative linear relationship, which means that students who spend more time studying tend to score lower on the exam. However, it is important to note that correlation does not necessarily imply causation, and other factors may be at play that affect the relationship between the two variables.






### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

To calculate Spearman's rank correlation, we need to first assign ranks to the data points for each variable. Let's say we have n pairs of data points for the amount of sleep (x) and job satisfaction level (y). We can then calculate the rank correlation coefficient rs using the following formula:

rs = 1 - (6 * sum(d^2) / (n * (n^2 - 1)))

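where d is the difference between the two ranks of each pair of data points, and n is the number of pairs. This shortcut formula is exact only when neither variable has tied ranks; when ties are present (likely for a 1 to 10 satisfaction scale), rs is instead computed as the Pearson correlation of the ranks.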
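
The formula can be sketched as follows. The sleep and satisfaction values are hypothetical and deliberately chosen so that neither variable contains ties; `scipy.stats.spearmanr` serves as a cross-check.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical data, invented for illustration; neither variable contains ties
sleep_hours = np.array([5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5])
satisfaction = np.array([3, 5, 4, 6, 8, 7, 9])

# Step 1: rank each variable separately (1 = smallest value)
rank_x = rankdata(sleep_hours)
rank_y = rankdata(satisfaction)

# Step 2: rank differences for each pair
d = rank_x - rank_y
n = len(sleep_hours)

# Step 3: rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rs_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Cross-check with SciPy
rs_scipy, _ = spearmanr(sleep_hours, satisfaction)

print("rs from the formula:", rs_formula)
print("rs from spearmanr:  ", rs_scipy)
```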

The Spearman's rank correlation coefficient rs ranges from -1 to 1, with 0 indicating no monotonic relationship between the variables, and values close to -1 or 1 indicating a strong monotonic relationship. A value of -1 indicates a perfect negative monotonic relationship, while a value of 1 indicates a perfect positive monotonic relationship.
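
To illustrate the difference between a monotonic and a linear relationship, here is a small sketch with made-up data in which y always increases with x, but not along a straight line. Spearman's coefficient reaches 1, while Pearson's does not.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Made-up data: y grows exponentially with x, so the relationship is
# perfectly monotonic but clearly not linear
x = np.arange(1, 11)
y = np.exp(x)

rho, _ = spearmanr(x, y)
r, _ = pearsonr(x, y)

print("Spearman rho:", rho)  # ranks agree exactly
print("Pearson r:   ", r)    # noticeably below 1
```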

To interpret the result, let's say we calculated a Spearman's rank correlation coefficient of 0.6 for the relationship between the amount of sleep individuals get and their overall job satisfaction level. This indicates a strong positive monotonic relationship between the two variables, which means that individuals who get more sleep tend to have higher job satisfaction levels. On the other hand, if we had calculated a Spearman's rank correlation coefficient of -0.2, this would indicate a weak negative monotonic relationship, which means that individuals who get more sleep tend to have lower job satisfaction levels. However, as with the Pearson correlation coefficient, it is important to note that correlation does not necessarily imply causation, and other factors may be at play that affect the relationship between the two variables.

### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [1]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# No dataset is provided, so simulate independent random values for hours of exercise and BMI
np.random.seed(123)  # set random seed for reproducibility
hours_exercise = np.random.normal(5, 2, 50)  # mean of 5 hours/week, std dev of 2 hours/week
bmi = np.random.normal(25, 4, 50)  # mean of 25, std dev of 4

# Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_exercise, bmi)
print("Pearson correlation coefficient: ", pearson_corr)

# Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(hours_exercise, bmi)
print("Spearman's rank correlation coefficient: ", spearman_corr)


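Pearson correlation coefficient:  -0.08524059160268607
Spearman's rank correlation coefficient:  -0.08321728691476589

Both coefficients are close to zero and nearly identical (about -0.085 and -0.083), so there is essentially no linear or monotonic relationship in this sample. That is expected here, because the two variables were generated independently of each other rather than measured on real participants.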
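
Both `pearsonr` and `spearmanr` also return a p-value, which the cell above discards with `_`. A minimal sketch, reusing the same simulated data and seed, shows how it could be reported; the 0.05 threshold mentioned in the comment is a common convention and an assumption here, not something the question specifies.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Same simulated (not real) data as in the cell above
np.random.seed(123)
hours_exercise = np.random.normal(5, 2, 50)
bmi = np.random.normal(25, 4, 50)

# Keep the second return value (the two-sided p-value) instead of discarding it
pearson_r, pearson_p = pearsonr(hours_exercise, bmi)
spearman_rho, spearman_p = spearmanr(hours_exercise, bmi)

print(f"Pearson r    = {pearson_r:.3f}, p-value = {pearson_p:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}, p-value = {spearman_p:.3f}")

# Assumed convention: a p-value above 0.05 means the coefficient is not
# statistically distinguishable from zero at the 5% level
```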


### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [2]:
import numpy as np
from scipy.stats import pearsonr

# No dataset is provided, so simulate independent random values for hours of TV watched and level of physical activity
np.random.seed(123)  # set random seed for reproducibility
hours_tv = np.random.normal(3, 1, 50)  # mean of 3 hours/day, std dev of 1 hour/day
physical_activity = np.random.normal(5, 2, 50)  # mean of 5, std dev of 2

# Calculate the Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_tv, physical_activity)

# Print the result
print("Pearson correlation coefficient: ", pearson_corr)


Pearson correlation coefficient:  -0.08524059160268603
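
This value agrees with the Pearson coefficient in Q3 to about 15 decimal places. A likely explanation is that both cells use seed 123 and draw 50 normal values twice, so the simulated variables differ only by a shift and a positive rescaling, and Pearson's r is unchanged by such transformations. The sketch below demonstrates that invariance; the names `z1` and `z2` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

np.random.seed(123)
z1 = np.random.normal(0, 1, 50)  # standard normal draws
z2 = np.random.normal(0, 1, 50)

# Correlation of the raw draws
r_base, _ = pearsonr(z1, z2)

# Shift and rescale each variable by a positive factor
r_scaled, _ = pearsonr(3 + 1 * z1, 5 + 2 * z2)

print("r on raw draws:         ", r_base)
print("r after shift and scale:", r_scaled)  # equal up to floating-point error
```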


### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

    Age (Years)   Soft Drink Preference
    25            Coke
    42            Pepsi
    37            Mountain Dew
    19            Coke
    31            Pepsi
    28            Coke




In [3]:
from scipy.stats import pearsonr, spearmanr

# Define the age and preference data
age = [25, 42, 37, 19, 31, 28]
# Encode preference as 1 for Coke, -1 for Pepsi, and 0 for Mountain Dew.
# Brand is a nominal variable, so this numeric coding is arbitrary and the
# coefficients below depend on it.
preference = [1, -1, 0, 1, -1, 1]

# Pearson correlation coefficient
pearson_corr, _ = pearsonr(age, preference)
print("Pearson correlation coefficient: ", pearson_corr)

# Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(age, preference)
print("Spearman's rank correlation coefficient: ", spearman_corr)


Pearson correlation coefficient:  -0.7691751415594735
Spearman's rank correlation coefficient:  -0.8332380897952965
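
These coefficients should be read with caution: soft drink brand is a nominal variable with no natural order, so the result depends on the numeric codes chosen. As a rough sketch, the code below compares the coding used in the cell above with a second, equally arbitrary coding (`coding_b` is a hypothetical alternative, not part of the original answer).

```python
from scipy.stats import pearsonr

age = [25, 42, 37, 19, 31, 28]
brands = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Coding used in the cell above
coding_a = {"Coke": 1, "Pepsi": -1, "Mountain Dew": 0}
# Hypothetical alternative coding, equally arbitrary
coding_b = {"Coke": 0, "Pepsi": 1, "Mountain Dew": 2}

r_a, _ = pearsonr(age, [coding_a[b] for b in brands])
r_b, _ = pearsonr(age, [coding_b[b] for b in brands])

print("Pearson r with coding A:", r_a)
print("Pearson r with coding B:", r_b)  # the sign flips relative to coding A
```

Because relabelling the brands changes even the sign of the coefficient, neither value says much about how age relates to brand preference.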


### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [4]:
import numpy as np
from scipy.stats import pearsonr

# No dataset is provided, so simulate independent random values for sales calls and sales made
np.random.seed(123)  # set random seed for reproducibility
sales_calls = np.random.randint(20, 50, 30)  # random integers from 20 to 49 calls per day (upper bound is exclusive)
sales_made = np.random.randint(5, 20, 30)  # random integers from 5 to 19 sales per week (upper bound is exclusive)

# Calculate the Pearson correlation coefficient
pearson_corr, _ = pearsonr(sales_calls, sales_made)

# Print the result
print("Pearson correlation coefficient: ", pearson_corr)


Pearson correlation coefficient:  0.19184115116806164
