# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient (denoted as \( r \)) between the time students spend studying and their exam scores, we use the formula:

\[
r = \frac{\sum (X - \overline{X})(Y - \overline{Y})}{\sqrt{\sum (X - \overline{X})^2 \sum (Y - \overline{Y})^2}}
\]

Where:
- \( X \) is the time spent studying,
- \( Y \) is the exam score,
- \( \overline{X} \) and \( \overline{Y} \) are the means of \( X \) and \( Y \), respectively.

### Steps:
1. Compute the means \( \overline{X} \) and \( \overline{Y} \).
2. For each pair of values, calculate \( (X_i - \overline{X})(Y_i - \overline{Y}) \).
3. Sum these values.
4. Compute the square root of the sum of squared differences for both \( X \) and \( Y \).
5. Divide the result of step 3 by the product of the square roots from step 4.
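
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `pearson_r` and the five data pairs are assumptions invented here, since the question supplies no data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via the deviation-from-mean formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    # Step 1: means of X and Y
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Steps 2-3: sum of the products of paired deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Step 4: square roots of the sums of squared deviations
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one variable is constant")
    # Step 5: divide by the product of the two square roots
    return sxy / (sx * sy)

# Hypothetical data, invented for illustration only
study_hours = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 80]

r = pearson_r(study_hours, exam_scores)
print(round(r, 3))
```

For this made-up sample, \( r \) comes out close to 1, which would be read as a strong positive linear relationship between study time and exam score.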

The value of \( r \) ranges from -1 to 1:
- \( r = 1 \) indicates a perfect positive linear relationship.
- \( r = -1 \) indicates a perfect negative linear relationship.
- \( r = 0 \) indicates no linear relationship.

If the computed \( r \) is close to 1, it suggests that as study time increases, exam scores also increase in a linear fashion. If \( r \) is close to 0, it suggests little to no linear relationship between study time and exam scores.

# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

To calculate Spearman's rank correlation coefficient (denoted as \( \rho \)) between the amount of sleep individuals get and their job satisfaction level, we follow these steps:

### Steps:
1. **Rank the data**:
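   - Assign ranks to both variables (sleep time and job satisfaction). If there are ties, assign the average of the ranks to the tied values. Note that the shortcut formula in step 3 is exact only when there are no ties; with ties, compute the Pearson correlation of the two rank columns instead.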
   
2. **Compute the differences in ranks**:
   - For each data pair, compute the difference between the ranks of the two variables, denoted as \( d_i \).

3. **Apply the Spearman rank correlation formula**:
   \[
   \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
   \]
   Where:
   - \( d_i \) is the difference between the ranks of the \( i \)-th pair,
   - \( n \) is the number of data points.
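
A minimal Python sketch of the ranking and formula steps above follows. The helper names `average_ranks` and `spearman_rho_no_ties` and the six data pairs are assumptions made for illustration; the sample deliberately has no ties, so the shortcut formula applies exactly.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho_no_ties(xs, ys):
    """Shortcut formula 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data, invented for illustration only (no ties in either list)
sleep_hours = [5.0, 6.5, 7.0, 8.0, 6.0, 7.5]
satisfaction = [3, 6, 5, 9, 4, 8]

rho = spearman_rho_no_ties(sleep_hours, satisfaction)
print(round(rho, 3))
```

Only two of the six pairs differ in rank here (each by one position), so \( \sum d_i^2 = 2 \) and \( \rho \) is close to 1.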

### Interpretation:
- \( \rho = 1 \) indicates a perfect positive monotonic relationship (as one variable increases, the other also increases consistently).
- \( \rho = -1 \) indicates a perfect negative monotonic relationship (as one variable increases, the other decreases consistently).
- \( \rho = 0 \) suggests no monotonic relationship.

If \( \rho \) is close to 1, this means individuals who get more sleep tend to have higher job satisfaction. Conversely, if \( \rho \) is close to -1, more sleep is associated with lower job satisfaction. A value near 0 would indicate no consistent relationship between sleep and job satisfaction.

# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

To calculate both the Pearson correlation coefficient and Spearman's rank correlation between hours of exercise per week and BMI for 50 participants, we need the actual data. However, here's how you would proceed with the calculation and comparison:

### 1. **Pearson Correlation Coefficient**:
   - The **Pearson correlation coefficient** measures the linear relationship between exercise hours and BMI. Use the formula:

   \[
   r = \frac{\sum (X - \overline{X})(Y - \overline{Y})}{\sqrt{\sum (X - \overline{X})^2 \sum (Y - \overline{Y})^2}}
   \]

   Where:
   - \( X \) is the number of exercise hours,
   - \( Y \) is the BMI values,
   - \( \overline{X} \) and \( \overline{Y} \) are the means of \( X \) and \( Y \).

   Pearson's correlation will tell you the strength and direction of the **linear** relationship between the two variables.

### 2. **Spearman's Rank Correlation**:
   - The **Spearman rank correlation** measures the **monotonic** relationship between exercise hours and BMI. It can capture a relationship that consistently increases or decreases even when that relationship is not linear. When no ranks are tied, the formula is:

   \[
   \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
   \]

   Where:
   - \( d_i \) is the difference between the rank of exercise hours and the rank of BMI for the \( i \)-th participant,
   - \( n \) is the number of participants (50 in this case).

   Spearman's correlation assesses whether an **increase in one variable corresponds to an increase or decrease in the other**, but without assuming a specific linear relationship.

### 3. **Comparison**:
   - **Pearson correlation** is appropriate if the relationship between exercise and BMI is **linear**. If \( r \) is close to 1 or -1, it suggests a strong linear relationship, while \( r = 0 \) implies no linear relationship.
   - **Spearman correlation** will measure whether the two variables have a consistent **monotonic relationship** (e.g., BMI consistently decreases or increases with more exercise, even if the rate of change is not constant).
   
If both coefficients are similar and large in magnitude, it suggests a strong relationship that is approximately **linear** (and therefore also monotonic). If Spearman's is strong but Pearson's is noticeably weaker, this indicates a **non-linear but monotonic** relationship. If both are close to 0, there is little to no linear or monotonic relationship between exercise and BMI.
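
The non-linear but monotonic case can be illustrated with a short Python sketch. The six (exercise hours, BMI) pairs below are hypothetical, chosen so that BMI always falls as exercise rises but flattens out; the helper names are also assumptions. Spearman's coefficient is computed here as the Pearson correlation of the ranks, which agrees with the shortcut formula when there are no ties and remains valid when there are.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2
            for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank lists."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data, invented for illustration only:
# BMI drops quickly at first, then levels off (monotonic but not linear)
exercise_hours = [1, 2, 3, 4, 5, 6]
bmi = [35.0, 28.0, 25.0, 24.0, 23.5, 23.2]

r = pearson_r(exercise_hours, bmi)
rho = spearman_rho(exercise_hours, bmi)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

On this made-up sample Spearman's \( \rho \) reaches -1 because the ordering is perfectly reversed, while Pearson's \( r \) is weaker in magnitude because the points do not lie on a straight line.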


# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient between the number of hours spent watching television per day and the level of physical activity, we would follow the same steps as outlined earlier. Here’s the process you’d apply:

### 1. **Formula for Pearson Correlation Coefficient**:
   \[
   r = \frac{\sum (X - \overline{X})(Y - \overline{Y})}{\sqrt{\sum (X - \overline{X})^2 \sum (Y - \overline{Y})^2}}
   \]
   Where:
   - \( X \) is the number of hours spent watching television per day,
   - \( Y \) is the level of physical activity,
   - \( \overline{X} \) and \( \overline{Y} \) are the means of \( X \) and \( Y \).

### 2. **Steps**:
   1. **Compute the means** \( \overline{X} \) and \( \overline{Y} \) for television hours and physical activity.
   2. **For each pair** of values (television hours, physical activity), calculate \( (X_i - \overline{X})(Y_i - \overline{Y}) \).
   3. **Sum** these values.
   4. **Calculate the squared differences** \( (X_i - \overline{X})^2 \) and \( (Y_i - \overline{Y})^2 \), then sum them.
   5. **Divide the sum** of the product of deviations from step 3 by the square root of the product of the sums from step 4.
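
As a worked illustration of these steps, take five hypothetical (TV hours, activity score) pairs, invented here because the study data is not given: \( (1, 10), (2, 8), (3, 6), (4, 5), (5, 1) \). The means are \( \overline{X} = 3 \) and \( \overline{Y} = 6 \), so:

\[
\sum (X - \overline{X})(Y - \overline{Y}) = (-2)(4) + (-1)(2) + (0)(0) + (1)(-1) + (2)(-5) = -21
\]

\[
\sum (X - \overline{X})^2 = 4 + 1 + 0 + 1 + 4 = 10, \qquad \sum (Y - \overline{Y})^2 = 16 + 4 + 0 + 1 + 25 = 46
\]

\[
r = \frac{-21}{\sqrt{10 \times 46}} = \frac{-21}{\sqrt{460}} \approx -0.979
\]

For this made-up sample, \( r \) is close to -1: more television time goes with less physical activity in a nearly linear way.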

### 3. **Interpretation**:
   - \( r = 1 \) indicates a perfect positive linear relationship (as TV hours increase, physical activity increases).
   - \( r = -1 \) indicates a perfect negative linear relationship (as TV hours increase, physical activity decreases).
   - \( r = 0 \) suggests no linear relationship.

Without the actual data for the 50 participants, a numeric value of \( r \) cannot be computed; once the data is available, the steps above give the coefficient directly.

# Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

The survey data on age and soft drink preference is:

| Age (Years) | Soft Drink Preference |
|-------------|-----------------------|
| 25          | Coke                  |
| 42          | Pepsi                 |
| 37          | Mountain Dew          |
| 19          | Coke                  |
| 31          | Pepsi                 |
| 28          | Coke                  |

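Because soft drink preference is a nominal (categorical) variable with no natural order, the Pearson and Spearman coefficients do not apply to it directly. The data can instead be analyzed by: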

1. Counting the frequency of each soft drink preference.
2. Checking if there's a visible pattern between age groups and their soft drink preferences.
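
Both steps can be sketched in Python using the six survey rows from the table. The variable names are assumptions, and with only six responses any pattern should be treated as descriptive rather than conclusive.

```python
from collections import Counter, defaultdict

# Survey rows copied from the table: (age in years, preferred soft drink)
survey = [
    (25, "Coke"),
    (42, "Pepsi"),
    (37, "Mountain Dew"),
    (19, "Coke"),
    (31, "Pepsi"),
    (28, "Coke"),
]

# Step 1: frequency of each preference
counts = Counter(drink for _, drink in survey)

# Step 2: group ages by preference and compare the mean age of each group
ages_by_drink = defaultdict(list)
for age, drink in survey:
    ages_by_drink[drink].append(age)
mean_age = {drink: sum(ages) / len(ages) for drink, ages in ages_by_drink.items()}

for drink, count in counts.most_common():
    print(f"{drink}: n = {count}, mean age = {mean_age[drink]:.1f}")
```

In this sample the three Coke respondents are the youngest (19, 25, and 28), while the Pepsi and Mountain Dew respondents are all over 30.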


# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient \( r \) between the number of sales calls made per day and the number of sales made per week, the steps are as follows:

### Formula for Pearson Correlation Coefficient:
\[
r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}
\]
Where:
- \( n \) is the number of data points (in this case, 30).
- \( x \) represents the number of sales calls made per day.
- \( y \) represents the number of sales made per week.
- \( \Sigma xy \) is the sum of the product of corresponding x and y values.
- \( \Sigma x \) is the sum of x values.
- \( \Sigma y \) is the sum of y values.
- \( \Sigma x^2 \) is the sum of squared x values.
- \( \Sigma y^2 \) is the sum of squared y values.

### Steps:
1. **Collect the data**: The company collected data on 30 sales representatives. We need both variables (sales calls per day and sales per week).
2. **Compute the necessary sums**:
   - Sum of all \( x \) values (sales calls per day).
   - Sum of all \( y \) values (sales made per week).
   - Sum of the product of \( x \) and \( y \) (i.e., \( x_i \times y_i \) for each representative).
   - Sum of squares of \( x \) values and \( y \) values.

3. **Substitute these sums into the formula** to get the Pearson correlation coefficient.
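
The computational (raw sums) formula above can be sketched as follows. The function name `pearson_from_sums` and the data are assumptions: five hypothetical representatives stand in for the 30 in the study, whose figures are not given.

```python
import math

def pearson_from_sums(xs, ys):
    """Pearson r from the raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

# Hypothetical data, invented for illustration only
calls_per_day = [10, 15, 20, 25, 30]
sales_per_week = [2, 3, 5, 4, 6]

r = pearson_from_sums(calls_per_day, sales_per_week)
print(round(r, 3))
```

For these made-up numbers the sums are \( \Sigma x = 100 \), \( \Sigma y = 20 \), \( \Sigma xy = 445 \), \( \Sigma x^2 = 2250 \), and \( \Sigma y^2 = 90 \), giving \( r = 225 / 250 = 0.9 \). This form is algebraically equivalent to the deviation-from-mean formula, though it can lose precision when the values are large relative to their spread.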

Without the actual data for the 30 sales representatives, the precise value of \( r \) cannot be calculated.