### Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
### you have collected data on the amount of time students spend studying for an exam and their final exam
### scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.






The Pearson correlation coefficient, often denoted as \(r\), is a measure of the linear relationship between two variables. It ranges from -1 to 1, where:

- \(r = 1\) indicates a perfect positive linear relationship,
- \(r = -1\) indicates a perfect negative linear relationship, and
- \(r = 0\) indicates no linear relationship.

To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, you can use the following formula:

\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \cdot \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]

Where:
- \(X_i\) and \(Y_i\) are individual data points.
- \(\bar{X}\) and \(\bar{Y}\) are the means of the two variables.
- \(n\) is the number of data points.
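
As a rough sketch, the formula above can be evaluated term by term with NumPy and compared against `np.corrcoef`. The helper name `pearson_r` is an illustrative assumption, not a library function, and the data are the study-time sample used elsewhere in this answer.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed directly from the definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations X_i - mean(X)
    dy = y - y.mean()  # deviations Y_i - mean(Y)
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

study_hours = [10, 15, 8, 12, 14]
exam_scores = [85, 90, 75, 88, 92]

r_manual = pearson_r(study_hours, exam_scores)
r_numpy = np.corrcoef(study_hours, exam_scores)[0, 1]
print(r_manual, r_numpy)  # the two values should agree up to rounding
```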

The code cell at the end of this answer works through a small example using Python with pandas and NumPy.



Interpretation of Result:
- If the Pearson correlation coefficient (\(r\)) is close to 1, it indicates a strong positive linear relationship, suggesting that as the amount of time spent studying increases, the final exam scores tend to increase.
- If \(r\) is close to -1, it indicates a strong negative linear relationship, suggesting that as the amount of time spent studying increases, the final exam scores tend to decrease.
- If \(r\) is close to 0, it suggests a weak or no linear relationship between the two variables.

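The sign of the correlation coefficient (\(r\)) indicates the direction of the relationship (positive or negative), and the magnitude indicates its strength. For the sample data in the code cell below, \(r \approx 0.92\), a strong positive linear relationship: in this sample, students who studied longer tended to score higher.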

In [1]:

import pandas as pd
import numpy as np

# Sample data
data = pd.DataFrame({
    'Study Time (hours)': [10, 15, 8, 12, 14],
    'Exam Scores': [85, 90, 75, 88, 92]
})

# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(data['Study Time (hours)'], data['Exam Scores'])[0, 1]

# Display the correlation coefficient
print(f"Pearson Correlation Coefficient: {correlation_coefficient}")


Pearson Correlation Coefficient: 0.916117368659589
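
As a possible cross-check, `scipy.stats.pearsonr` returns the same coefficient together with a two-sided p-value. The sketch below reuses the five sample observations; with so few data points the p-value should be read cautiously.

```python
from scipy.stats import pearsonr

study_hours = [10, 15, 8, 12, 14]
exam_scores = [85, 90, 75, 88, 92]

# pearsonr returns the coefficient and a two-sided p-value for the
# null hypothesis that the two variables are uncorrelated
r, p_value = pearsonr(study_hours, exam_scores)
print(f"r = {r:.4f}, two-sided p-value = {p_value:.4f}")
```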


### Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
### Suppose you have collected data on the amount of sleep individuals get each night and their overall job
### satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
### variables and interpret the result.

In [2]:
import pandas as pd
from scipy.stats import spearmanr

# Sample data
data = pd.DataFrame({
    'Sleep Hours': [7, 5, 8, 6, 7],
    'Job Satisfaction': [8, 4, 9, 6, 7]
})

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(data['Sleep Hours'], data['Job Satisfaction'])

# Display the correlation coefficient
print(f"Spearman's Rank Correlation Coefficient: {spearman_corr}")


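Spearman's Rank Correlation Coefficient: 0.9746794344808963

Interpretation of Result: a coefficient of about 0.97 indicates a strong positive monotonic relationship. In this sample, individuals who sleep more tend to report higher job satisfaction.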
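
Spearman's coefficient is the Pearson coefficient computed on the ranks of the data. A minimal sketch of that equivalence, using the same sleep and satisfaction values and `scipy.stats.rankdata` (which assigns average ranks to ties):

```python
from scipy.stats import pearsonr, rankdata, spearmanr

sleep_hours = [7, 5, 8, 6, 7]
job_satisfaction = [8, 4, 9, 6, 7]

# The two 7-hour sleepers are tied, so each receives the average rank 3.5
sleep_ranks = rankdata(sleep_hours)
satisfaction_ranks = rankdata(job_satisfaction)

# Pearson on the ranks should match spearmanr on the raw values
rho_from_ranks, _ = pearsonr(sleep_ranks, satisfaction_ranks)
rho_scipy, _ = spearmanr(sleep_hours, job_satisfaction)
print(rho_from_ranks, rho_scipy)
```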


### Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
### exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
### for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
### between these two variables and compare the results.

In [3]:
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Sample data
data = pd.DataFrame({
    'Hours of Exercise per Week': [3, 5, 2, 4, 6, 1, 3, 2, 5, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2, 4, 6, 1, 3, 2],
    'BMI': [22, 25, 21, 24, 26, 20, 22, 21, 24, 23, 25, 19, 22, 21, 23, 26, 18, 22, 21, 23, 25, 17, 21, 20, 23, 25, 17, 21, 20, 23, 25, 17, 21, 20, 23, 25, 17, 21, 20, 23, 25, 17, 21, 20, 23, 25, 17, 21, 20]
})

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(data['Hours of Exercise per Week'], data['BMI'])

# Calculate Spearman's rank correlation coefficient
spearman_corr, _ = spearmanr(data['Hours of Exercise per Week'], data['BMI'])

# Display the correlation coefficients
print(f"Pearson Correlation Coefficient: {pearson_corr}")
print(f"Spearman's Rank Correlation Coefficient: {spearman_corr}")


Pearson Correlation Coefficient: 0.955111668744694
Spearman's Rank Correlation Coefficient: 0.9754112616123711


Comparison of Results:

- If both \(r\) and \(\rho\) are close to 1, it indicates a strong positive relationship (linear for \(r\), monotonic for \(\rho\)).
- If both \(r\) and \(\rho\) are close to -1, it indicates a strong negative relationship (linear for \(r\), monotonic for \(\rho\)).
- If \(r\) and \(\rho\) differ noticeably, it suggests a nonlinear relationship or the presence of outliers. Spearman's rank correlation is less sensitive to outliers and captures monotonic relationships that are not linear.

Interpretation of Results:

- If both coefficients are close to 1, it suggests a strong positive linear or monotonic relationship, indicating that as the number of hours of exercise per week increases, BMI tends to increase.
- If both coefficients are close to -1, it suggests a strong negative linear or monotonic relationship, indicating that as the number of hours of exercise per week increases, BMI tends to decrease.
- If the coefficients are close to 0, it suggests a weak or no relationship between the two variables.

For the sample data above, \(r \approx 0.96\) and \(\rho \approx 0.98\). Both are close to 1, so BMI tends to rise with weekly exercise hours in this sample, and the near agreement of the two coefficients suggests the relationship is close to linear.

Compare the magnitude and sign of both coefficients to understand the nature of the relationship between the number of hours of exercise per week and BMI in your sample.
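
A small synthetic case may help show when the two coefficients diverge. The data below are an illustrative assumption (an exponential curve, not the exercise/BMI sample): the relationship is perfectly monotonic but strongly nonlinear, so Spearman's coefficient is 1 while Pearson's is noticeably lower.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y grows exponentially with x
x = np.arange(1, 11)
y = np.exp(x)  # strictly increasing, but far from a straight line

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)
print(f"Pearson: {pearson_corr:.3f}, Spearman: {spearman_corr:.3f}")
```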

### Q4. A researcher is interested in examining the relationship between the number of hours individuals
### spend watching television per day and their level of physical activity. The researcher collected data on
### both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
### these two variables.

In [4]:
import pandas as pd
from scipy.stats import pearsonr

# Sample data
data = pd.DataFrame({
    'Hours of TV per Day': [2, 3, 1, 4, 2, 5, 1, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3, 2, 4, 1, 5, 3],
    'Physical Activity Level': [3, 4, 2, 5, 3, 1, 2, 4, 3, 5, 2, 1, 4, 3, 5, 1, 2, 4, 3, 5, 1, 2, 4, 3, 5, 1, 2, 4, 3, 5, 1, 2, 4, 3, 5, 1, 2, 4, 3, 5, 1, 2, 4, 3, 5, 1, 2, 4]
})

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(data['Hours of TV per Day'], data['Physical Activity Level'])

# Display the correlation coefficient
print(f"Pearson Correlation Coefficient: {pearson_corr}")


Pearson Correlation Coefficient: 0.3212140564990487


### Q5. A survey was conducted to examine the relationship between age and preference for a particular
### brand of soft drink. The survey results are shown below:
    
![image.png](attachment:image.png)

In [19]:
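import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Survey data: respondent age and preferred soft drink brand
data = {'age': [25, 42, 37, 19, 31, 28],
        'soft drink preference': ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke']}
df = pd.DataFrame(data)

# Encode the brand names as integers so that corr() can include the column.
# LabelEncoder codes labels alphabetically (Coke=0, Mountain dew=1, Pepsi=2);
# the brands have no natural order, so the coefficient depends on this arbitrary coding.
lb = LabelEncoder()
df['new soft drink preference'] = lb.fit_transform(df['soft drink preference'])
df = df.drop('soft drink preference', axis=1)

# Pearson correlation matrix of the numeric columns
corr = df.corr()
print(corr)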

                                age  new soft drink preference
age                        1.000000                   0.769175
new soft drink preference  0.769175                   1.000000
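
Because soft drink brand is a nominal variable, the coefficient above depends on which integer each brand happens to receive. The sketch below illustrates this by comparing the alphabetical coding (the one `LabelEncoder` produces for these labels) with a second coding that is a hypothetical choice made purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 42, 37, 19, 31, 28],
    'soft drink preference': ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke'],
})

# Alphabetical coding, matching what LabelEncoder assigns to these labels
alphabetical = {'Coke': 0, 'Mountain dew': 1, 'Pepsi': 2}
# An equally arbitrary alternative coding (hypothetical, for illustration only)
alternative = {'Pepsi': 0, 'Coke': 1, 'Mountain dew': 2}

# Series.corr uses the Pearson method by default
r_alphabetical = df['age'].corr(df['soft drink preference'].map(alphabetical))
r_alternative = df['age'].corr(df['soft drink preference'].map(alternative))
print(f"alphabetical coding: {r_alphabetical:.6f}")
print(f"alternative coding:  {r_alternative:.6f}")
```

The two codings give different coefficients, even differing in sign, so the value for a label-encoded nominal variable should not be read as the strength of a real trend.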


### Q6. A company is interested in examining the relationship between the number of sales calls made per day
### and the number of sales made per week. The company collected data on both variables from a sample of
### 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [5]:
import pandas as pd
from scipy.stats import pearsonr

# Sample data
data = pd.DataFrame({
    'Sales Calls per Day': [25, 30, 20, 35, 28, 40, 22, 32, 27, 38, 21, 36, 30, 39, 24, 33, 26, 37, 29, 31, 23, 34, 19, 37, 26, 29, 33, 31, 28, 35],
    'Sales per Week': [150, 180, 120, 210, 170, 240, 130, 190, 160, 220, 140, 200, 180, 230, 140, 200, 150, 220, 160, 180, 140, 210, 120, 220, 150, 160, 200, 190, 180, 210]
})

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(data['Sales Calls per Day'], data['Sales per Week'])

# Display the correlation coefficient
print(f"Pearson Correlation Coefficient: {pearson_corr}")


Pearson Correlation Coefficient: 0.9823030790377343
