In [1]:
# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
# you have collected data on the amount of time students spend studying for an exam and their final exam
# scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

# To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, and interpret the result, follow these steps:

# 1. **Collect Data:**
#    Assume you have collected data for a sample of students where you have pairs of values: the amount of time spent studying (in hours) and their corresponding final exam scores.

# 2. **Compute Pearson Correlation Coefficient:**
#    The Pearson correlation coefficient \( r \) is computed using the formula:

#    \[
#    r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}
#    \]

#    Where:
#    - \( X_i \) and \( Y_i \) are the individual data points (time spent studying and exam scores, respectively).
#    - \( \bar{X} \) and \( \bar{Y} \) are the means of the time spent studying and exam scores, respectively.

# 3. **Interpret the Result:**
#    - If \( r = 1 \): There is a perfect positive linear relationship. As time spent studying increases, exam scores also increase proportionally.
#    - If \( r = -1 \): There is a perfect negative linear relationship. As time spent studying increases, exam scores decrease proportionally.
#    - If \( r = 0 \): There is no linear relationship between the variables.

# Let's assume we have the following data:

# - Time spent studying (in hours): [10, 5, 20, 15, 8]
# - Exam scores: [85, 65, 95, 80, 70]

# We can calculate the Pearson correlation coefficient in Python using NumPy:

# ```python
# import numpy as np

# # Example data
# time_spent_studying = np.array([10, 5, 20, 15, 8])
# exam_scores = np.array([85, 65, 95, 80, 70])

# # Calculate Pearson correlation coefficient
# r = np.corrcoef(time_spent_studying, exam_scores)[0, 1]

# print(f"Pearson correlation coefficient: {r}")
# ```
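
# As a cross-check, the deviation formula above can be evaluated term by term. The sketch below is illustrative rather than part of the original answer; the helper name `pearson_from_deviations` is hypothetical, and the data are the example values listed above.

```python
import numpy as np


def pearson_from_deviations(x, y):
    """Evaluate r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))


time_spent_studying = [10, 5, 20, 15, 8]
exam_scores = [85, 65, 95, 80, 70]

r_manual = pearson_from_deviations(time_spent_studying, exam_scores)
r_numpy = np.corrcoef(time_spent_studying, exam_scores)[0, 1]
print(f"manual: {r_manual:.4f}, np.corrcoef: {r_numpy:.4f}")
```

# Both routes should agree at roughly 0.89 for this data set.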

# ### Interpretation of Example Results:
# For the example data above, the code prints \( r \approx 0.89 \).

# - \( r \approx 0.89 \) indicates a strong positive linear relationship between time spent studying and exam scores.
# - This suggests that as students spend more time studying, their exam scores tend to be higher.

# ### Conclusion:
# The Pearson correlation coefficient helps quantify the strength and direction of the linear relationship between two variables. It's a valuable metric for understanding how changes in one variable relate to changes in another, which is essential in various fields, including education, economics, and social sciences.

In [None]:
Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

To calculate Spearman's rank correlation coefficient between the amount of sleep individuals get each night and their overall job satisfaction level, and interpret the result, follow these steps:

1. **Collect Data:**
   Assume you have collected data for a sample of individuals where you have pairs of values: the amount of sleep (in hours) and their corresponding job satisfaction level (on a scale of 1 to 10).

2. **Assign Ranks:**
   - **Step 1:** Rank the data for each variable separately: the smallest value gets rank 1, the next smallest gets rank 2, and so on. Tied values each receive the average of the ranks they would otherwise occupy.
   - **Step 2:** Calculate the difference in ranks for each pair of observations.

3. **Compute Spearman's Rank Correlation Coefficient:**
   Spearman's rank correlation coefficient \( \rho \) is calculated using the formula:

   \[
   \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
   \]

   Where:
   - \( d_i \) is the difference between the ranks of corresponding pairs.
   - \( n \) is the number of pairs of observations.

   This shortcut formula is exact only when there are no tied values. With ties, \( \rho \) is the Pearson correlation coefficient of the ranks, which is what Python libraries such as `scipy.stats` compute directly.

4. **Interpret the Result:**
   - \( \rho = 1 \): There is a perfect positive monotonic relationship. Whenever one variable increases, the other also increases, though not necessarily by a proportional amount.
   - \( \rho = -1 \): There is a perfect negative monotonic relationship. Whenever one variable increases, the other decreases.
   - \( \rho = 0 \): There is no monotonic association between the variables.

Let's assume we have the following data:

- Amount of sleep (hours): [7, 6, 8, 5, 6]
- Job satisfaction (scale of 1-10): [8, 6, 9, 4, 7]

We can calculate Spearman's rank correlation coefficient in Python using `scipy.stats.spearmanr`:

```python
from scipy.stats import spearmanr

# Example data
amount_of_sleep = [7, 6, 8, 5, 6]
job_satisfaction = [8, 6, 9, 4, 7]

# Calculate Spearman's rank correlation coefficient
rho, p_value = spearmanr(amount_of_sleep, job_satisfaction)

print(f"Spearman's rank correlation coefficient: {rho}")
```
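
The ranking steps above can also be reproduced by hand. The sketch below is illustrative and not part of the original answer; it uses `scipy.stats.rankdata`, which by default gives tied values the average of their ranks, and compares the shortcut formula with the Pearson correlation of the ranks.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

amount_of_sleep = [7, 6, 8, 5, 6]
job_satisfaction = [8, 6, 9, 4, 7]

# Rank each variable separately; the two 6-hour sleepers share rank 2.5.
sleep_ranks = rankdata(amount_of_sleep)
satisfaction_ranks = rankdata(job_satisfaction)

# Shortcut formula based on squared rank differences.
d = sleep_ranks - satisfaction_ranks
n = len(d)
rho_shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Pearson correlation of the ranks, which remains valid when ties are present.
rho_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]

rho_scipy, _ = spearmanr(amount_of_sleep, job_satisfaction)
print(f"shortcut: {rho_shortcut:.4f}, ranks: {rho_ranks:.4f}, scipy: {rho_scipy:.4f}")
```

Because the sleep data contain a tie, the shortcut value (0.975) differs slightly from the tie-aware value (about 0.9747) that `spearmanr` reports.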

### Interpretation of Example Results:
For the example data above, the code prints \( \rho \approx 0.97 \).

- \( \rho \approx 0.97 \) indicates a very strong positive monotonic relationship between amount of sleep and job satisfaction.
- This suggests that as individuals get more sleep, their job satisfaction tends to be higher.

### Conclusion:
Spearman's rank correlation coefficient is useful when the relationship between variables is not necessarily linear but can be monotonic (increasing or decreasing). It's particularly valuable when dealing with ordinal or ranked data, where exact measurements may not be available or meaningful.
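
To illustrate the difference between monotonic and linear association, here is a small sketch with made-up data. The cubic relationship is an assumption chosen for illustration, not data from the question.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y always increases with x, but along a curve.
x = np.arange(1, 11)
y = x**3

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
print(f"Pearson r: {r:.3f}, Spearman rho: {rho:.3f}")
```

Spearman's \( \rho \) is 1 here because `y` increases whenever `x` does, while Pearson's \( r \) falls below 1 because the relationship is curved rather than linear.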

In [None]:
# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
# exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
# for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
# between these two variables and compare the results.

# To examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults, we will calculate both the Pearson correlation coefficient and the Spearman's rank correlation coefficient. Let's go through the steps and calculations for each:

# ### Step-by-Step Calculation:

# Assume we have collected the following hypothetical data for 50 participants:

# - Hours of exercise per week (X): [3, 5, 2, 4, 6, ..., 3]  (50 values)
# - Body Mass Index (BMI) (Y): [25, 27, 30, 28, 26, ..., 29]  (50 values)

# #### 1. Pearson Correlation Coefficient:

# The Pearson correlation coefficient measures the linear relationship between two variables. Let's calculate it using NumPy and SciPy:

# ```python
# import numpy as np
# from scipy.stats import pearsonr

# # Example data (hypothetical)
# hours_of_exercise = np.array([3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1, 4, 6, 3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1, 4, 6, 3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1, 4, 6, 3, 5, 2, 4, 6, 5, 7, 3])
# # bmi = np.array([25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30, 25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30, 25, 27, 30, 28, 
# ```
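
# The listing above is cut off before the BMI array is complete, so the comparison itself is sketched below on separate, synthetic data. Everything in this block is an assumption made for illustration: the seed, the uniform distribution of exercise hours, and the roughly linear downward trend in BMI are not taken from the study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data for 50 participants (assumed for illustration only).
rng = np.random.default_rng(0)
hours_of_exercise = rng.uniform(0, 10, size=50)
bmi = 30 - 0.6 * hours_of_exercise + rng.normal(0, 1.5, size=50)

# Pearson measures linear association; Spearman measures monotonic association.
r, r_p = pearsonr(hours_of_exercise, bmi)
rho, rho_p = spearmanr(hours_of_exercise, bmi)

print(f"Pearson r:    {r:.3f} (p = {r_p:.3g})")
print(f"Spearman rho: {rho:.3f} (p = {rho_p:.3g})")
```

# With data like these, where the trend is roughly linear, the two coefficients should be similar. A large gap between them would point to a monotonic but nonlinear relationship or to the influence of outliers.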

In [None]:
Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity based on a sample of 50 participants, follow these steps:

### Step-by-Step Calculation:

Assume we have collected the following hypothetical data for 50 participants:

- Hours of television watching per day (X): [3, 5, 2, 4, 6, ..., 3]  (50 values)
- Level of physical activity (Y): [25, 27, 30, 28, 26, ..., 27]  (50 values)

#### 1. Calculate the Pearson Correlation Coefficient:

The Pearson correlation coefficient \( r \) is calculated using the following formula:

\[
r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}
\]

Where:
- \( n \) is the number of observations (in this case, 50).
- \( X \) and \( Y \) are the variables of interest (hours of television watching and level of physical activity, respectively).
- \( \sum \) denotes the sum of the values.

Let's calculate the Pearson correlation coefficient in Python using NumPy arrays and `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy.stats import pearsonr

# Example data (hypothetical)
hours_of_tv = np.array([3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1, 4, 6, 3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1, 4, 6, 3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1, 4, 6, 3, 5, 2, 4, 6, 5, 7, 3])
physical_activity = np.array([25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30, 25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30, 25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30, 25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30, 25, 27])

# Calculate Pearson correlation coefficient
r, _ = pearsonr(hours_of_tv, physical_activity)

print(f"Pearson correlation coefficient: {r}")
```
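
The raw-sum formula above can also be evaluated directly. This sketch is illustrative rather than part of the original answer; the helper name `pearson_from_sums` is hypothetical, and it is applied to only the first twelve pairs of the hypothetical data to keep the listing short.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_from_sums(x, y):
    """Evaluate the raw-sum form of the Pearson correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x**2) - np.sum(x) ** 2) * (n * np.sum(y**2) - np.sum(y) ** 2)
    )
    return numerator / denominator


# First twelve pairs of the hypothetical data above.
hours_of_tv = [3, 5, 2, 4, 6, 5, 7, 3, 4, 5, 2, 1]
physical_activity = [25, 27, 30, 28, 26, 29, 24, 26, 28, 27, 29, 30]

r_manual = pearson_from_sums(hours_of_tv, physical_activity)
r_scipy, _ = pearsonr(hours_of_tv, physical_activity)
print(f"manual: {r_manual:.6f}, scipy: {r_scipy:.6f}")
```

The two values should match to floating-point precision, since `pearsonr` computes the same quantity.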

### Interpretation of Results:

- The Pearson correlation coefficient \( r \) ranges from -1 to 1:
  - \( r = 1 \): Perfect positive correlation (as hours of television watching increase, physical activity level increases).
  - \( r = -1 \): Perfect negative correlation (as hours of television watching increase, physical activity level decreases).
  - \( r = 0 \): No linear correlation between the variables.

- Positive \( r \) values indicate a positive relationship (more television watching correlates with higher physical activity levels), while negative \( r \) values indicate a negative relationship (more television watching correlates with lower physical activity levels).

### Conclusion:

By calculating the Pearson correlation coefficient, you can quantitatively assess the strength and direction of the linear relationship between hours of television watching per day and level of physical activity in the sample of 50 participants. This helps in understanding how these two variables are related in the context of the study.

In [None]:
Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week based on data collected from a sample of 30 sales representatives, follow these steps:

### Step-by-Step Calculation:

Assume we have collected the following hypothetical data for 30 sales representatives:

- Sales calls per day (X): [10, 15, 8, 12, 14, ..., 13]  (30 values)
- Sales per week (Y): [5, 7, 4, 6, 8, ..., 7]  (30 values)

#### 1. Calculate the Pearson Correlation Coefficient:

The Pearson correlation coefficient \( r \) is calculated using the formula:

\[
r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}
\]

Where:
- \( n \) is the number of observations (in this case, 30).
- \( X \) and \( Y \) are the variables of interest (sales calls per day and sales per week, respectively).
- \( \sum \) denotes the sum of the values.

Let's calculate the Pearson correlation coefficient in Python using NumPy arrays and `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy.stats import pearsonr

# Example data (hypothetical)
sales_calls_per_day = np.array([10, 15, 8, 12, 14, 18, 9, 13, 16, 11, 7, 10, 15, 8, 12, 14, 18, 9, 13, 16, 11, 7, 10, 15, 8, 12, 14, 18, 9, 13])
sales_per_week = np.array([5, 7, 4, 6, 8, 9, 6, 7, 8, 5, 3, 5, 7, 4, 6, 8, 9, 6, 7, 8, 5, 3, 5, 7, 4, 6, 8, 9, 6, 7])

# Calculate Pearson correlation coefficient
r, _ = pearsonr(sales_calls_per_day, sales_per_week)

print(f"Pearson correlation coefficient: {r}")
```
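
`pearsonr` also returns a p-value, which the code above discards with `_`. The sketch below, an illustration rather than part of the original answer, keeps it for the first eleven pairs of the hypothetical data. The 0.05 threshold is a common convention assumed here, not something stated in the question.

```python
from scipy.stats import pearsonr

# First eleven pairs of the hypothetical data above.
sales_calls_per_day = [10, 15, 8, 12, 14, 18, 9, 13, 16, 11, 7]
sales_per_week = [5, 7, 4, 6, 8, 9, 6, 7, 8, 5, 3]

# The second return value is the p-value for the null hypothesis of no correlation.
r, p_value = pearsonr(sales_calls_per_day, sales_per_week)
print(f"r = {r:.3f}, p-value = {p_value:.4g}")

alpha = 0.05  # assumed significance level
if p_value < alpha:
    print("The correlation is statistically significant at the 5% level.")
else:
    print("The correlation is not statistically significant at the 5% level.")
```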

### Interpretation of Results:

- The Pearson correlation coefficient \( r \) ranges from -1 to 1:
  - \( r = 1 \): Perfect positive correlation (as sales calls per day increase, sales per week increase).
  - \( r = -1 \): Perfect negative correlation (as sales calls per day increase, sales per week decrease).
  - \( r = 0 \): No linear correlation between the variables.

- Positive \( r \) values indicate a positive relationship (more sales calls per day correlate with more sales per week), while negative \( r \) values indicate a negative relationship (more sales calls per day correlate with fewer sales per week).

### Conclusion:

By calculating the Pearson correlation coefficient, you can quantitatively assess the strength and direction of the linear relationship between the number of sales calls made per day and the number of sales made per week in the sample of 30 sales representatives. This provides insights into how these two variables are associated in the context of the company's sales activities.