### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, you can use the following formula:

**Pearson Correlation Coefficient (r) = Σ [(X - X̄)(Y - Ȳ)] / √[Σ(X - X̄)² * Σ(Y - Ȳ)²]**

Where:
- X and Y are the paired observations for each student (study time and exam score).
- X̄ and Ȳ are the means of X and Y, respectively.

Interpretation of the result:
- If the Pearson correlation coefficient (r) is close to 1, it indicates a strong positive linear relationship. In this context, it would mean that students who spend more time studying tend to have higher exam scores.
- If r is close to -1, it indicates a strong negative linear relationship, suggesting that students who spend more time studying tend to have lower exam scores.
- If r is close to 0, it means there is little to no linear relationship between study time and exam scores. The two variables may still be related in a nonlinear way, which r does not capture.

The Pearson correlation coefficient quantifies the strength and direction of the linear relationship between study time and exam scores. On its own it does not establish statistical significance or causation: significance depends on the sample size and is assessed with a hypothesis test, for example the p-value that `scipy.stats.pearsonr` returns alongside r.
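The question does not include a dataset, so the following is a minimal sketch of the formula applied to made-up numbers. The `study_hours` and `exam_scores` values and the `pearson_r` helper are illustrative assumptions, not data from the exercise.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed directly from the definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical data for five students
study_hours = [2, 4, 6, 8, 10]
exam_scores = [55, 60, 70, 80, 85]

r = pearson_r(study_hours, exam_scores)
print(f"Pearson correlation coefficient: {r:.3f}")  # prints 0.992
```

For these made-up values r is about 0.99, a strong positive linear relationship: students who studied longer scored higher.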

### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

To calculate Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level, follow these steps:

1. Rank the data: Rank each variable separately from lowest to highest, so that the ranks replace the original values. Tied values each receive the average of the ranks they span.

2. For each individual, calculate the difference d between their sleep rank and their job satisfaction rank.

3. Square these differences and sum them to get the sum of squared differences (Σd²).

4. Use the formula for Spearman's rank correlation:

   **ρ (rho) = 1 - [(6 * Σd²) / (n * (n² - 1))]**

   Where:
   - n is the number of data points.
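   - Σd² is the sum of squared rank differences.

   This shortcut formula is exact only when there are no tied ranks. With ties, which are likely on a 1 to 10 satisfaction scale, compute the Pearson correlation of the ranks instead, which is what `scipy.stats.spearmanr` does.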

Interpretation of the result:
- Spearman's rank correlation (ρ) ranges from -1 to 1.
- A positive ρ close to 1 indicates a strong positive monotonic relationship, implying that individuals who get more sleep tend to report higher job satisfaction.
- A negative ρ close to -1 suggests a strong negative monotonic relationship, indicating that individuals who get more sleep tend to report lower job satisfaction.
- A ρ close to 0 means there's little to no monotonic relationship between the two variables, suggesting that job satisfaction neither consistently rises nor consistently falls as sleep increases.

Spearman's rank correlation is particularly useful when the relationship between variables is monotonic but not necessarily linear. It quantifies the strength and direction of this relationship based on the ranks of the data.
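The ranking steps and the formula can be sketched as follows. No dataset is given in the question, so `sleep_hours` and `satisfaction` are made-up values chosen to have no ties, and the helper names are illustrative.

```python
def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_shortcut(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact only without ties."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data for six individuals (no tied values)
sleep_hours = [5.0, 6.0, 6.5, 7.0, 7.5, 8.0]
satisfaction = [3, 5, 4, 7, 8, 9]

rho = spearman_shortcut(sleep_hours, satisfaction)
print(f"Spearman's rank correlation: {rho:.3f}")  # prints 0.943
```

Here only two individuals swap places between the two rankings, so Σd² = 2 and ρ = 1 - 12/210 ≈ 0.94, a strong positive monotonic relationship in this made-up sample.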

### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr, spearmanr

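# Simulate a sample dataset of 50 participants. The two variables are drawn
# independently at random, so no real relationship is built in.
hours_of_exercise = np.random.randint(1, 10, 50)  # integers 1-9 (upper bound is exclusive)
bmi = np.random.randint(18, 35, 50)  # integers 18-34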

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_of_exercise, bmi)

# Calculate Spearman's rank correlation
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

print(f"Pearson correlation coefficient: {pearson_corr:.2f}")
print(f"Spearman's rank correlation: {spearman_corr:.2f}")

Pearson correlation coefficient: -0.04
Spearman's rank correlation: -0.04


Both coefficients range from -1 to 1: a value of -1 indicates a perfect negative relationship, 0 indicates no relationship, and 1 indicates a perfect positive relationship. The Pearson correlation coefficient measures the linear relationship between the two variables, while Spearman's rank correlation measures the monotonic relationship, computed from the ranks of the data.

In the run shown above, both coefficients are -0.04, so there is essentially no linear or monotonic relationship between hours of exercise and BMI. That is expected here, because the two variables were generated independently at random; the exact values change from run to run because no random seed is set. The two coefficients agree because the data contain no monotonic-but-nonlinear pattern and no extreme outliers, which are the situations in which Pearson and Spearman typically diverge.

### Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

In [2]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr

# Simulate a sample dataset of 50 participants (independent random integers 1-9,
# so no real relationship is built in)
hours_of_tv = np.random.randint(1, 10, 50)
physical_activity = np.random.randint(1, 10, 50)

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(hours_of_tv, physical_activity)

print(f"Pearson correlation coefficient: {pearson_corr:.2f}")

Pearson correlation coefficient: 0.12


The Pearson correlation coefficient measures the linear relationship between two variables and ranges from -1 to 1. A value of -1 indicates a perfect negative linear relationship, a value of 0 indicates no linear relationship, and a value of 1 indicates a perfect positive linear relationship.

Because the data are generated at random without a fixed seed, the coefficient changes from run to run. The run shown above gave 0.12, a very weak positive value, which suggests little to no linear relationship between the number of hours individuals spend watching television per day and their level of physical activity in this simulated sample of 50 participants.

### Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

In [3]:
data = {
    'age': [25, 42, 37, 19, 31, 28],
    'soft drink preference': ['coke', 'pepsi', 'mountain dew', 'coke', 'pepsi', 'coke']
}

In [4]:
df = pd.DataFrame(data)

In [5]:
df

| | age | soft drink preference |
|---|---|---|
| 0 | 25 | coke |
| 1 | 42 | pepsi |
| 2 | 37 | mountain dew |
| 3 | 19 | coke |
| 4 | 31 | pepsi |
| 5 | 28 | coke |


In [6]:
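# numeric_only=True (pandas 1.5+) skips the text column; pandas 2.0+ raises an error without it
df.corr(numeric_only=True)

| | age |
|---|---|
| age | 1.0 |

The correlation matrix contains only `age`, whose correlation with itself is 1.0, so it says nothing about brand preference. Soft drink preference is a nominal (unordered categorical) variable, so neither the Pearson nor the Spearman coefficient applies to it. An association between a numeric variable and a nominal one is usually assessed by comparing the numeric variable across the groups, for example with a one-way ANOVA or a Kruskal-Wallis test. With only six responses, any such result would be tentative at best.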
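Because soft drink preference is a category rather than a number, one way to describe its association with age is the correlation ratio η² (eta squared): the share of the variance in age that is explained by brand group. This measure is swapped in here as an illustration and is not part of the original exercise. The sketch uses the six survey responses shown above; the variable names are illustrative.

```python
from collections import defaultdict

ages = [25, 42, 37, 19, 31, 28]
preferences = ['coke', 'pepsi', 'mountain dew', 'coke', 'pepsi', 'coke']

# Group the ages by preferred brand
groups = defaultdict(list)
for age, brand in zip(ages, preferences):
    groups[brand].append(age)

overall_mean = sum(ages) / len(ages)

# Between-group sum of squares: how far each group mean is from the overall mean
ss_between = sum(
    len(vals) * (sum(vals) / len(vals) - overall_mean) ** 2
    for vals in groups.values()
)
# Total sum of squares: overall variability in age
ss_total = sum((age - overall_mean) ** 2 for age in ages)

eta_squared = ss_between / ss_total
print(f"Correlation ratio (eta squared): {eta_squared:.2f}")  # prints 0.70
```

An η² of about 0.70 says that brand group accounts for most of the variation in age in this sample (the coke drinkers are the youngest), but with six respondents and a group of size one this is a description of the sample, not evidence of a general pattern.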


### Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [7]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr

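# Simulate a sample dataset of 30 sales representatives (independent random
# integers 1-9, so no real relationship is built in)
sales_calls_per_day = np.random.randint(1, 10, 30)
sales_per_week = np.random.randint(1, 10, 30)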

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(sales_calls_per_day, sales_per_week)

print(f"Pearson correlation coefficient: {pearson_corr:.2f}")

Pearson correlation coefficient: 0.03


The Pearson correlation coefficient measures the linear relationship between two variables and ranges from -1 to 1. A value of -1 indicates a perfect negative linear relationship, a value of 0 indicates no linear relationship, and a value of 1 indicates a perfect positive linear relationship.

Because the data are generated at random without a fixed seed, the coefficient changes from run to run. The run shown above gave 0.03, which suggests little to no linear relationship between the number of sales calls made per day and the number of sales made per week in this simulated sample of 30 sales representatives.