## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

### Ans :
Pearson Correlation Coefficient Calculation
The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. The formula is:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

Where:
* $x_i$ and $y_i$ are the individual data points of the two variables (e.g., study time and exam scores),
* $\bar{x}$ and $\bar{y}$ are the means of the two variables.

Steps to Calculate:
1. Collect Data: Example dataset: Study time (in hours) vs Exam scores (out of 100)

* Study Time: [2, 3, 4, 5, 6]
*  Exam Scores: [50, 60, 70, 80, 90]
  
2. Calculate the Pearson Correlation Coefficient:

In [4]:
import numpy as np

# Data
study_time = np.array([2, 3, 4, 5, 6])
exam_scores = np.array([50, 60, 70, 80, 90])

# Calculate Pearson correlation coefficient
correlation = np.corrcoef(study_time, exam_scores)[0, 1]
print("Pearson correlation coefficient:", correlation)


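Pearson correlation coefficient: 1.0

Interpretation: r = 1.0 indicates a perfect positive linear relationship in this sample. The example data are exactly linear by construction (score = 30 + 10 × hours), so every additional hour of study corresponds to 10 more points.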
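As a cross-check, the coefficient can also be computed directly from its definition (sum of products of deviations, divided by the square root of the product of the summed squared deviations). The following is a minimal sketch that reuses the example data from the cell above; the names `dx`, `dy`, `r_manual`, and `r_numpy` are illustrative choices and not part of the original answer.

```python
import numpy as np

# Same illustrative data as in the cell above
study_time = np.array([2, 3, 4, 5, 6], dtype=float)
exam_scores = np.array([50, 60, 70, 80, 90], dtype=float)

# Deviations of each observation from its variable's mean
dx = study_time - study_time.mean()
dy = exam_scores - exam_scores.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Library result for comparison
r_numpy = np.corrcoef(study_time, exam_scores)[0, 1]

print("Manual computation:", r_manual)
print("np.corrcoef:       ", r_numpy)
```

Both values should agree up to floating-point rounding.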


## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

### Ans :

Spearman's Rank Correlation
Spearman's rank correlation coefficient measures the monotonic relationship between two variables, i.e., whether one variable tends to increase as the other increases (or decreases), regardless of whether the relationship is linear.

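Steps to Calculate:

1. Rank the Data: Assign ranks to each value of both variables (sleep and job satisfaction). Tied values receive the average of the ranks they would otherwise occupy.
2. Calculate the Spearman's rank correlation coefficient using the formula:

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

Where:
* $d_i$ is the difference between the ranks of each pair of values,
* $n$ is the number of data points.

This shortcut formula is exact only when there are no tied ranks. When ties are present, $\rho$ is the Pearson correlation of the ranks, which is what `spearmanr` computes.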

In [9]:
from scipy.stats import spearmanr

# Data: Sleep (hours) and Job Satisfaction (1-10 scale)
sleep_hours = [7, 6, 5, 8, 7]
job_satisfaction = [6, 5, 4, 9, 7]

# Calculate Spearman's rank correlation
correlation, _ = spearmanr(sleep_hours, job_satisfaction)
print("Spearman's rank correlation coefficient:", correlation)


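Spearman's rank correlation coefficient: 0.9746794344808963

Interpretation: ρ ≈ 0.97 indicates a very strong positive monotonic relationship in this sample: individuals who sleep more tend to report higher job satisfaction. With only five observations, the estimate should be treated as illustrative.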
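The ranking step can be made explicit. The sketch below reuses the example data from the cell above and compares the rank-difference formula with the Pearson correlation of the ranks. It assumes average ranks for ties (the default of `scipy.stats.rankdata`); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same illustrative data as in the cell above
sleep_hours = [7, 6, 5, 8, 7]
job_satisfaction = [6, 5, 4, 9, 7]

# Average ranks: the two 7-hour values share rank 3.5
rank_sleep = rankdata(sleep_hours)
rank_job = rankdata(job_satisfaction)

# Rank-difference formula (exact only without ties)
d = rank_sleep - rank_job
n = len(sleep_hours)
rho_shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Pearson correlation of the ranks (valid with ties)
rho_ranks = np.corrcoef(rank_sleep, rank_job)[0, 1]

# Library result for comparison
rho_scipy, _ = spearmanr(sleep_hours, job_satisfaction)

print("Rank-difference formula:", rho_shortcut)
print("Pearson on ranks:       ", rho_ranks)
print("scipy spearmanr:        ", rho_scipy)
```

Because `sleep_hours` contains a tie, the rank-difference formula gives 0.975, slightly different from the 0.9747 reported by `spearmanr`, which matches the Pearson correlation of the ranks.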


## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

### Ans :

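Pearson vs. Spearman Correlation:

* Pearson correlation measures the linear relationship between two variables.
* Spearman rank correlation measures the monotonic relationship between two variables, that is, whether one tends to increase (or decrease) as the other increases, regardless of whether the relationship is linear.

Steps to Calculate:

1. Pearson Correlation: Compute r on the raw exercise-hours and BMI values to quantify the linear relationship.
2. Spearman Rank Correlation: Compute ρ on the ranks of the same values to quantify the monotonic relationship.
3. Compare: Similar values suggest the relationship is approximately linear. A Spearman coefficient clearly larger in magnitude than the Pearson coefficient suggests a monotonic but nonlinear relationship, or outliers that distort the Pearson estimate.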
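The question does not list the 50 observations, so the sketch below uses synthetic data purely to show the mechanics. The generating model (BMI decreasing by 0.8 per exercise hour, plus Gaussian noise), the seed, and all variable names are assumptions made for illustration; the printed values say nothing about the real study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data for illustration only; NOT the study's measurements
rng = np.random.default_rng(0)
n_participants = 50
exercise_hours = rng.uniform(0, 10, size=n_participants)
# Assumed relationship: BMI falls as weekly exercise rises, with noise
bmi = 30 - 0.8 * exercise_hours + rng.normal(0, 1.5, size=n_participants)

# Each function returns the coefficient and a two-sided p-value
pearson_r, pearson_p = pearsonr(exercise_hours, bmi)
spearman_rho, spearman_p = spearmanr(exercise_hours, bmi)

print("Pearson r:    ", pearson_r)
print("Spearman rho: ", spearman_rho)
```

With real data, replace `exercise_hours` and `bmi` with the 50 measured values and compare the two coefficients as described in the steps above.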

## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

### Ans :


In [15]:
import numpy as np
from scipy.stats import pearsonr

# Illustrative sample data (40 observations): hours of TV watched per day and physical activity level
tv_hours = np.array([2, 3, 4, 5, 6, 7, 2, 1, 3, 5, 4, 6, 3, 4, 5, 7, 2, 3, 6, 4, 
                    5, 6, 3, 2, 4, 5, 7, 6, 4, 3, 6, 7, 5, 4, 2, 3, 4, 6, 5, 3])
activity_level = np.array([4, 3, 2, 1, 0, 0, 4, 5, 3, 1, 2, 0, 3, 2, 1, 0, 4, 3, 0, 2, 
                           1, 0, 3, 4, 1, 2, 0, 0, 3, 2, 1, 0, 2, 3, 4, 2, 3, 1, 0, 2])

# Calculate Pearson correlation coefficient
correlation, _ = pearsonr(tv_hours, activity_level)

print("Pearson correlation coefficient:", correlation)


Pearson correlation coefficient: -0.9254575538258921


## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

* Age (years): 25, 42, 37, 19, 31, 28
* Soft drink preference: Coke, Pepsi, Mountain Dew, Coke, Pepsi, Coke

### Ans :
To examine the relationship between age and soft drink preference, the categorical variable (soft drink preference) must first be encoded numerically so that a correlation can be calculated. Because brand is a nominal variable with no natural order, any integer coding is arbitrary and the resulting coefficient depends on the coding chosen, so it should be interpreted with caution.

Step-by-Step Process:

1. Data:

* Age: [25, 42, 37, 19, 31, 28]
* Soft drink preference: ["coke", "pepsi", "mountain dew", "coke", "pepsi", "coke"]

2. Encode Soft Drink Preference: Transform the categorical values (coke, pepsi, mountain dew) into numerical values. For example:

* coke = 1
* pepsi = 2
* mountain dew = 3

The encoded values look like this:

* Age: [25, 42, 37, 19, 31, 28]
* Encoded Preference: [1, 2, 3, 1, 2, 1]

3. Calculate the Pearson Correlation Coefficient between age and encoded preference using the formula:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$


In [17]:
import numpy as np
from scipy.stats import pearsonr

# Data: Age and encoded soft drink preference
age = np.array([25, 42, 37, 19, 31, 28])
encoded_preference = np.array([1, 2, 3, 1, 2, 1])  # Coke=1, Pepsi=2, Mountain Dew=3

# Calculate Pearson correlation coefficient
correlation, _ = pearsonr(age, encoded_preference)

print("Pearson correlation coefficient:", correlation)


Pearson correlation coefficient: 0.7587035441865058
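Since the brands have no natural order, it is worth checking how much the result depends on the integer labels. The sketch below recomputes the coefficient under a second, equally arbitrary coding. `coding_b` and the helper `correlation_for` are hypothetical names introduced only for this illustration.

```python
import numpy as np
from scipy.stats import pearsonr

age = np.array([25, 42, 37, 19, 31, 28])
preference = ["coke", "pepsi", "mountain dew", "coke", "pepsi", "coke"]

# Two equally arbitrary integer codings of the same nominal variable
coding_a = {"coke": 1, "pepsi": 2, "mountain dew": 3}  # coding used in the cell above
coding_b = {"coke": 2, "pepsi": 1, "mountain dew": 3}  # hypothetical alternative


def correlation_for(coding):
    """Encode the brands with the given mapping and return Pearson's r with age."""
    encoded = np.array([coding[brand] for brand in preference])
    r, _ = pearsonr(age, encoded)
    return r


r_a = correlation_for(coding_a)
r_b = correlation_for(coding_b)

print("r with coding_a:", r_a)
print("r with coding_b:", r_b)
```

Swapping the labels of coke and pepsi turns the strong positive coefficient into a weak negative one, which shows that the value reflects the chosen coding as much as the data.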


## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

### Ans :
To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week, follow these steps:

Step-by-Step Process:

1. Data: Two variables are needed:

* Number of sales calls made per day (let's call it X).
* Number of sales made per week (let's call it Y).

2. Pearson Correlation Formula:

$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$$

Where:
* $X_i$ and $Y_i$ are individual data points.
* $\bar{X}$ and $\bar{Y}$ are the means of X and Y.

In [19]:
import numpy as np
from scipy.stats import pearsonr

# Sample data: Number of sales calls made per day (X) and number of sales made per week (Y)
sales_calls = np.array([20, 15, 30, 25, 18, 24, 22, 35, 28, 40, 18, 20, 30, 27, 33, 26, 31, 25, 22, 23,
                       29, 32, 21, 30, 19, 35, 28, 30, 22, 26])

sales_made = np.array([8, 6, 12, 10, 7, 9, 8, 14, 10, 15, 7, 8, 12, 11, 13, 9, 12, 10, 8, 8,
                       11, 13, 9, 11, 7, 14, 12, 11, 9, 10])

# Calculate Pearson correlation coefficient
correlation, _ = pearsonr(sales_calls, sales_made)

print("Pearson correlation coefficient:", correlation)


Pearson correlation coefficient: 0.9730742808546509
