Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

Ans. The Pearson correlation coefficient, denoted by \(r\), is a measure of the linear relationship between two variables. It ranges from -1 to 1, where:

- \(r = 1\) indicates a perfect positive linear relationship,
- \(r = -1\) indicates a perfect negative linear relationship,
- \(r = 0\) indicates no linear relationship.
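
For reference, the sample coefficient is usually written as below. The symbols are notation introduced here, not taken from the question: \(n\) is the number of paired observations \((x_i, y_i)\), and \(\bar{x}\), \(\bar{y}\) are the sample means.

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

The numerator measures how the two variables vary together, and the denominator rescales that quantity so the result always lies between -1 and 1.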


**Example:**

Suppose you collected data on the number of hours students spent studying (\(X\)) and their final exam scores (\(Y\)) for a sample of students:

```plaintext
| Hours Studied (X) | Exam Scores (Y)  |
|-------------------|------------------|
| 10                | 75               |
| 15                | 85               |
| 20                | 90               |
| 25                | 95               |
| 30                | 80               |
```









```python
import numpy as np

# Sample data: hours studied (X) and exam scores (Y)
X = np.array([10, 15, 20, 25, 30])
Y = np.array([75, 85, 90, 95, 80])

# Calculate means
mean_X = np.mean(X)
mean_Y = np.mean(Y)

# Calculate Pearson correlation coefficient:
# sum of co-deviations divided by the square root of the product of the sums of squared deviations
numerator = np.sum((X - mean_X) * (Y - mean_Y))
denominator = np.sqrt(np.sum((X - mean_X)**2) * np.sum((Y - mean_Y)**2))
r = numerator / denominator

print(f"Pearson Correlation Coefficient (r): {r}")
```

```plaintext
Pearson Correlation Coefficient (r): 0.4
```


**Interpretation:**

Here \(r = 0.4\), which indicates a moderate positive linear relationship: exam scores tend to rise as study hours increase. The relationship is far from perfect because the student who studied longest (30 hours) scored only 80, below the scores recorded at 15, 20, and 25 hours. In general, the sign of \(r\) gives the direction of the linear relationship and its magnitude gives the strength; a value close to zero indicates a weak or absent linear relationship. With only five observations, this estimate is very uncertain and describes the sample rather than students in general.
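
As a cross-check on the manual calculation, the same five observations can be passed to library routines. This is a minimal sketch using `scipy.stats.pearsonr` and `numpy.corrcoef`; the variable names `hours` and `scores` are introduced here for readability.

```python
import numpy as np
from scipy.stats import pearsonr

# Same five observations as in the example table
hours = np.array([10, 15, 20, 25, 30])
scores = np.array([75, 85, 90, 95, 80])

# pearsonr returns the coefficient and a two-sided p-value
r_scipy, p_value = pearsonr(hours, scores)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(hours, scores)[0, 1]

print(f"scipy r = {r_scipy:.4f}, numpy r = {r_numpy:.4f}, p-value = {p_value:.3f}")
```

Both routines agree with the hand calculation (0.4). The p-value is well above 0.05, so with five points the correlation cannot be distinguished from zero.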

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

Ans. Spearman's rank correlation coefficient, denoted by \( \rho \) (rho), is a measure of the monotonic relationship between two variables. It assesses whether there is a consistent increase or decrease in one variable corresponding to an increase or decrease in the other variable. Spearman's rank correlation is based on the ranks of the data rather than the actual values.
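
In symbols, Spearman's coefficient is the Pearson coefficient applied to the ranks. The notation \(R(x_i)\), \(R(y_i)\) for the ranks and \(d_i\) for their difference is introduced here for illustration:

\[
\rho = r\big(R(X),\, R(Y)\big), \qquad d_i = R(x_i) - R(y_i)
\]

When no values are tied, this reduces to the familiar shortcut

\[
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
\]

With tied values (which receive the average of the ranks they span), the shortcut is only approximate and the rank-based Pearson form should be used.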


**Example:**

Suppose you collected data on the amount of sleep individuals get each night (\(X\)) and their overall job satisfaction level on a scale of 1 to 10 (\(Y\)) for a sample of individuals:

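```plaintext
| Sleep Hours (X)  | Job Satisfaction (Y)  |
|------------------|-----------------------|
| 7                | 8                     |
| 6                | 5                     |
| 8                | 9                     |
| 5                | 3                     |
| 7                | 7                     |
```

```python
import numpy as np
from scipy.stats import spearmanr

# Sample data: nightly sleep hours (X) and job satisfaction on a 1-10 scale (Y)
X = np.array([7, 6, 8, 5, 7])
Y = np.array([8, 5, 9, 3, 7])

# Calculate Spearman's rank correlation coefficient
# (spearmanr also returns a p-value, which is not printed here)
rho, p_value = spearmanr(X, Y)

print(f"Spearman's Rank Correlation Coefficient (rho): {rho}")
```

```plaintext
Spearman's Rank Correlation Coefficient (rho): 0.9746794344808963
```

**Interpretation:**

Here \( \rho \approx 0.97 \), which indicates a very strong positive monotonic relationship: in this sample, individuals who sleep more almost always report higher job satisfaction. The value falls just short of 1 because the two individuals who both sleep 7 hours are tied on sleep but report different satisfaction levels (8 and 7). With only five observations, this describes the sample and is not strong evidence about a wider population.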
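
To make the role of the ranks explicit, the coefficient can be reproduced by ranking each variable and applying Pearson's formula to the ranks. The sketch below assumes average ranks for ties (the default of `scipy.stats.rankdata`) and also evaluates the no-ties shortcut \(1 - 6\sum d_i^2 / (n(n^2 - 1))\) for comparison; the variable names are introduced here.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

sleep = np.array([7, 6, 8, 5, 7])
satisfaction = np.array([8, 5, 9, 3, 7])

# Average ranks: the two 7-hour sleepers share ranks 3 and 4, so each gets 3.5
sleep_ranks = rankdata(sleep)
satisfaction_ranks = rankdata(satisfaction)

# Spearman's rho is Pearson's r computed on the ranks
rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)
rho_scipy, _ = spearmanr(sleep, satisfaction)

# Shortcut formula, exact only when there are no ties
d = sleep_ranks - satisfaction_ranks
n = len(sleep)
rho_shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(f"rho from ranks: {rho_from_ranks:.4f}")
print(f"rho from scipy: {rho_scipy:.4f}")
print(f"rho shortcut:   {rho_shortcut:.4f}")
```

The first two values agree (about 0.9747), while the shortcut gives 0.975; the small gap comes from the tie in the sleep data.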


Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

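Ans. Both coefficients can be computed with `scipy.stats`: `pearsonr` gives the Pearson correlation coefficient (\(r\)) and `spearmanr` gives Spearman's rank correlation coefficient (\(\rho\)). The 50 observations are not listed in full here, so the code below uses randomly generated placeholder values.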

**Sample Data:**
Let's assume you have the following data for the number of hours of exercise per week (\(X\)) and the body mass index (BMI) (\(Y\)) for 50 participants:

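```plaintext
| Hours of Exercise (X) | BMI (Y)  |
|-----------------------|----------|
| 3                     | 24       |
| 5                     | 22       |
| 1                     | 26       |
| 6                     | 21       |
| 4                     | 23       |
| ...                   | ...      |
```

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Placeholder data: independent random draws, no seed set (replace with the actual 50 observations)
X = np.random.randint(1, 10, size=50)   # hours of exercise per week
Y = np.random.uniform(20, 30, size=50)  # BMI

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(X, Y)

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(X, Y)

print(f"Pearson Correlation Coefficient (r): {pearson_corr}")
print(f"Spearman's Rank Correlation Coefficient (rho): {spearman_corr}")
```

```plaintext
Pearson Correlation Coefficient (r): 0.12138602559977776
Spearman's Rank Correlation Coefficient (rho): 0.1360557912497069
```

**Comparison:**

Because the placeholder \(X\) and \(Y\) are drawn independently and without a fixed seed, both coefficients come out close to zero (here \(r \approx 0.12\) and \(\rho \approx 0.14\)) and will change on every run. The two values are close to each other, which is expected when the data contain neither a nonlinear trend nor influential outliers. With real data, a Spearman coefficient clearly larger in magnitude than the Pearson coefficient would suggest a relationship that is monotonic but not linear, or outliers that distort the Pearson value.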
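
To see when the two coefficients agree and when they differ, the sketch below first uses only the five rows visible in the sample table (the other 45 participants are elided, so this is not the full-sample answer) and then a hypothetical, noise-free curve in which BMI falls with exercise but levels off. The curve `22 + 8 * exp(-0.5 * hours)` and all variable names are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# The five rows shown in the sample table; here BMI falls by exactly 1 per extra hour
exercise = np.array([3, 5, 1, 6, 4])
bmi = np.array([24, 22, 26, 21, 23])
r_rows, _ = pearsonr(exercise, bmi)
rho_rows, _ = spearmanr(exercise, bmi)

# Hypothetical curve: strictly decreasing but not linear (illustration only, not real data)
hours = np.linspace(0, 10, 50)
bmi_curve = 22 + 8 * np.exp(-0.5 * hours)
r_curve, _ = pearsonr(hours, bmi_curve)
rho_curve, _ = spearmanr(hours, bmi_curve)

print(f"Visible rows:       r = {r_rows:.3f}, rho = {rho_rows:.3f}")
print(f"Hypothetical curve: r = {r_curve:.3f}, rho = {rho_curve:.3f}")
```

On the visible rows the relationship is exactly linear, so both coefficients equal -1. On the curved data Spearman stays at -1 because the ordering is perfectly preserved, while Pearson is noticeably weaker because a straight line fits the curve imperfectly.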


Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

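Ans. The Pearson correlation coefficient (\( r \)) between the number of hours individuals spend watching television per day and their level of physical activity can be computed with `pearsonr` from `scipy.stats`. The 50 observations are not listed in full here, so the code uses randomly generated placeholder values.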

**Sample Data:**
Assuming you have collected data on the number of hours of television watching (\( X \)) and the level of physical activity (\( Y \)) for 50 participants:

```plaintext
| Hours of TV (X)  | Physical Activity (Y)  |
|------------------|------------------------|
| 2                | 5                      |
| 1                | 7                      |
| 3                | 4                      |
| 4                | 3                      |
| 2                | 6                      |
| ...              | ...                    |
```



```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder data: independent random draws, no seed set (replace with actual data)
X = np.random.randint(1, 5, size=50)  # hours of TV watching per day
Y = np.random.randint(3, 8, size=50)  # physical activity level

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(X, Y)

print(f"Pearson Correlation Coefficient (r): {pearson_corr}")
```

```plaintext
Pearson Correlation Coefficient (r): -0.2013310375591204
```



The value above comes from randomly generated placeholder data, so it carries no meaning and changes on every run; replace the placeholders with the 50 actual observations for a meaningful analysis. The resulting Pearson correlation coefficient (\( r \)) measures the strength and direction of the linear relationship between the number of hours individuals spend watching television and their level of physical activity.

**Interpretation:**
- If \( r > 0 \): Positive correlation (as one variable increases, the other tends to increase).
- If \( r < 0 \): Negative correlation (as one variable increases, the other tends to decrease).
- If \( r = 0 \): No linear correlation.
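
Before reading much into the sign of \( r \), it helps to check whether the value is distinguishable from zero at the given sample size. The sketch below applies the standard t-test for a correlation coefficient (test statistic \( t = r\sqrt{(n-2)/(1-r^2)} \) with \( n-2 \) degrees of freedom) to the coefficient printed for the placeholder data; the test assumes roughly bivariate normal data, and the variable names are introduced here.

```python
import numpy as np
from scipy import stats

r = -0.2013310375591204  # coefficient printed for the random placeholder data
n = 50                   # number of participants

# t statistic for testing the null hypothesis of zero correlation
t_stat = r * np.sqrt((n - 2) / (1 - r**2))

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"t = {t_stat:.3f}, two-sided p-value = {p_value:.3f}")
```

The p-value is well above 0.05, so this negative coefficient is consistent with no linear relationship at all, as expected for independently generated placeholder values.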


Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

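Ans. The Pearson correlation coefficient (\( r \)) between the number of sales calls made per day and the number of sales made per week for a sample of 30 sales representatives can be computed with `pearsonr` from `scipy.stats`. The 30 observations are not listed in full here, so the code uses randomly generated placeholder values.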

**Sample Data:**
Assuming you have collected data on the number of sales calls made per day (\( X \)) and the number of sales made per week (\( Y \)):

```plaintext
| Sales Calls per Day (X)  | Sales per Week (Y)  |
|--------------------------|---------------------|
| 20                       | 100                 |
| 25                       | 120                 |
| 18                       | 90                  |
| 30                       | 150                 |
| ...                      | ...                 |
```

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder data: independent random draws, no seed set (replace with actual data)
X = np.random.randint(15, 35, size=30)   # sales calls per day
Y = np.random.randint(80, 160, size=30)  # sales per week

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(X, Y)

print(f"Pearson Correlation Coefficient (r): {pearson_corr}")
```

```plaintext
Pearson Correlation Coefficient (r): -0.14080372671830896
```


The value above comes from randomly generated placeholder data, so it carries no meaning and changes on every run; replace the placeholders with the 30 actual observations for a meaningful analysis. The resulting Pearson correlation coefficient (\( r \)) indicates the strength and direction of the linear relationship between the number of sales calls made per day and the number of sales made per week.

**Interpretation:**
- If \( r > 0 \): Positive correlation (as the number of sales calls increases, the number of sales tends to increase).
- If \( r < 0 \): Negative correlation (as the number of sales calls increases, the number of sales tends to decrease).
- If \( r = 0 \): No linear correlation.
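
As a rough illustration only, the four rows that are visible in the sample table can be passed to `pearsonr` directly. The remaining 26 representatives are elided in the table, so this is not the answer for the full sample; the variable names `calls_per_day` and `sales_per_week` are introduced here for readability.

```python
import numpy as np
from scipy.stats import pearsonr

# Only the four rows shown in the sample table; the other 26 are not listed
calls_per_day = np.array([20, 25, 18, 30])
sales_per_week = np.array([100, 120, 90, 150])

r_visible, p_visible = pearsonr(calls_per_day, sales_per_week)

print(f"Pearson r on the visible rows: {r_visible:.3f}")
```

On these four rows the coefficient is close to +1, because sales rise almost in proportion to calls. Four points are far too few to generalize to all 30 representatives.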