#Q1

To calculate the Pearson correlation coefficient (Pearson's r) between the time students spend studying for an exam and their final exam scores, you need:

1. A dataset of paired values, one pair per student: the time spent studying and the final exam score.

2. The mean of the study times and the mean of the exam scores.

3. The standard deviations of the study times and the exam scores (or, equivalently, the sums of squared deviations from the means that appear in the formula below).

4. The number of data points (students) in the dataset.

With these values, Pearson's r is given by:

\[ r = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sqrt{\sum{(X_i - \bar{X})^2}\sum{(Y_i - \bar{Y})^2}}}\]

Where:
- \(X_i\) and \(Y_i\) are the individual data points (amount of time studying and exam scores, respectively).
- \(\bar{X}\) and \(\bar{Y}\) are the means of the amount of time spent studying and the final exam scores.
- The summations are taken over all data points in your dataset.
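
As a rough illustration, the formula can be evaluated step by step in plain Python. The function name `pearson_r` is a hypothetical helper introduced here, and the five data pairs are the example values used in the code cell for this question. This is a sketch of the arithmetic, not a replacement for `scipy.stats.pearsonr`.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from the deviation-based formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of paired deviations from the means
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations
    # (this is zero, and the division fails, if either variable is constant)
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return numerator / math.sqrt(ss_x * ss_y)

time_studied = [10, 20, 30, 40, 50]
exam_scores = [60, 70, 80, 90, 100]
r = pearson_r(time_studied, exam_scores)
print(f"r = {r:.4f}")
```

Because every score in this example lies exactly on a straight line through the study times, the deviations of the two variables are proportional and r comes out as 1.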

Interpretation of the result:
- If the Pearson correlation coefficient (r) is close to 1, there is a strong positive linear relationship: as study time increases, final exam scores tend to increase as well.

- If r is close to -1, there is a strong negative linear relationship: as study time increases, final exam scores tend to decrease.

- If r is close to 0, there is little or no linear relationship between study time and final exam scores. A nonlinear relationship may still exist, since r measures only linear association.

It's important to note that correlation does not imply causation. A strong correlation does not prove that studying more causes higher exam scores; it only shows that there is a statistical relationship between the two variables.

To make a more robust interpretation, you should also consider the context, study design, and other factors that might be influencing the results.

In [3]:
from scipy import stats

# Example data: time studied and final exam score for five students
time_studied = [10, 20, 30, 40, 50]
exam_scores = [60, 70, 80, 90, 100]

correlation_coefficient, p_value = stats.pearsonr(time_studied, exam_scores)

if correlation_coefficient > 0:
    interpretation = "There is a positive linear relationship between time spent studying and exam scores."
elif correlation_coefficient < 0:
    interpretation = "There is a negative linear relationship between time spent studying and exam scores."
else:
    interpretation = "There is no linear relationship between time spent studying and exam scores."

print(f"Pearson Correlation Coefficient: {correlation_coefficient}")
print(f"P-Value: {p_value}")
print(interpretation)

Pearson Correlation Coefficient: 1.0
P-Value: 0.0
There is a positive linear relationship between time spent studying and exam scores.


In [1]:
#Q2

from scipy import stats

# Example data: hours of sleep and job satisfaction rating for ten people
sleep = [7, 6, 8, 6, 7, 8, 6, 5, 9, 7]
job_satisfaction = [6, 7, 9, 6, 7, 8, 5, 4, 9, 6]

correlation_coefficient, p_value = stats.spearmanr(sleep, job_satisfaction)

if correlation_coefficient > 0:
    interpretation = "There is a positive monotonic relationship between the amount of sleep and job satisfaction."
elif correlation_coefficient < 0:
    interpretation = "There is a negative monotonic relationship between the amount of sleep and job satisfaction."
else:
    interpretation = "There is no monotonic relationship between the amount of sleep and job satisfaction."

print(f"Spearman's Rank Correlation: {correlation_coefficient}")
print(f"P-Value: {p_value}")
print(interpretation)

Spearman's Rank Correlation: 0.8476574910073985
P-Value: 0.0019524334957064622
There is a positive monotonic relationship between the amount of sleep and job satisfaction.
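
Spearman's rank correlation is Pearson's r applied to the ranks of the data, with tied values sharing the average of their rank positions. The following is a minimal plain-Python sketch of that idea on the same sleep and job satisfaction values; `average_ranks` and `spearman_rho` are hypothetical helper names introduced only for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based positions start+1 .. end+1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson's r computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_rx = sum(rx) / len(rx)
    mean_ry = sum(ry) / len(ry)
    numerator = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    denominator = math.sqrt(
        sum((a - mean_rx) ** 2 for a in rx) * sum((b - mean_ry) ** 2 for b in ry)
    )
    return numerator / denominator

sleep = [7, 6, 8, 6, 7, 8, 6, 5, 9, 7]
job_satisfaction = [6, 7, 9, 6, 7, 8, 5, 4, 9, 6]
rho = spearman_rho(sleep, job_satisfaction)
print(f"rho = {rho:.4f}")
```

The result should agree with the coefficient reported by `stats.spearmanr` for the same data (about 0.848).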


In [4]:
#Q3

import numpy as np
from scipy import stats

exercise_hours = [3, 5, 7, 4, 6, 2, 1, 3, 5, 6, 2, 4, 7, 1, 3, 2, 6, 5, 4, 1, 7, 2, 3, 5, 4, 6, 1, 2, 4, 5, 7, 3, 6, 2, 1, 7, 4, 5, 2, 3, 1, 6, 5, 4, 7, 2, 3, 1, 5]
bmi = [25, 27, 29, 28, 30, 24, 22, 26, 27, 29, 23, 28, 31, 22, 26, 24, 30, 28, 27, 22, 31, 23, 26, 28, 29, 30, 21, 23, 27, 28, 31, 25, 30, 23, 21, 30, 28, 27, 22, 26, 24, 32, 29, 28, 31, 24, 26, 22, 28]

pearson_corr, pearson_p_value = stats.pearsonr(exercise_hours, bmi)
spearman_corr, spearman_p_value = stats.spearmanr(exercise_hours, bmi)

print(f"Pearson Correlation Coefficient: {pearson_corr}")
print(f"Pearson Correlation P-Value: {pearson_p_value}")
if pearson_corr > 0:
    print("Interpretation (Pearson): There is a positive linear relationship between exercise hours and BMI.")
elif pearson_corr < 0:
    print("Interpretation (Pearson): There is a negative linear relationship between exercise hours and BMI.")
else:
    print("Interpretation (Pearson): There is no linear relationship between exercise hours and BMI.")

print(f"\nSpearman's Rank Correlation: {spearman_corr}")
print(f"Spearman's Rank Correlation P-Value: {spearman_p_value}")
if spearman_corr > 0:
    print("Interpretation (Spearman): There is a positive monotonic relationship between exercise hours and BMI.")
elif spearman_corr < 0:
    print("Interpretation (Spearman): There is a negative monotonic relationship between exercise hours and BMI.")
else:
    print("Interpretation (Spearman): There is no monotonic relationship between exercise hours and BMI.")

Pearson Correlation Coefficient: 0.9501743691866473
Pearson Correlation P-Value: 1.9574732055341274e-25
Interpretation (Pearson): There is a positive linear relationship between exercise hours and BMI.

Spearman's Rank Correlation: 0.9538844821623907
Spearman's Rank Correlation P-Value: 3.3093470761906774e-26
Interpretation (Spearman): There is a positive monotonic relationship between exercise hours and BMI.


In [7]:
#Q4

import random
from scipy import stats

# Generate example data for hours of TV watching and level of physical activity.
# No random seed is set, so the values (and the printed output) change on every run.
tv_hours = [random.uniform(0, 5) for _ in range(50)]  # Random values between 0 and 5
physical_activity = [random.uniform(1, 10) for _ in range(50)]  # Random values between 1 and 10

# Calculate the Pearson correlation coefficient
correlation_coefficient, p_value = stats.pearsonr(tv_hours, physical_activity)

# Interpret the result
if correlation_coefficient > 0:
    interpretation = "There is a positive linear relationship between the number of hours watching TV and the level of physical activity."
elif correlation_coefficient < 0:
    interpretation = "There is a negative linear relationship between the number of hours watching TV and the level of physical activity."
else:
    interpretation = "There is no linear relationship between the number of hours watching TV and the level of physical activity."

print(f"Pearson Correlation Coefficient: {correlation_coefficient}")
print(f"P-Value: {p_value}")
print(interpretation)


Pearson Correlation Coefficient: -0.0339545881932681
P-Value: 0.8149151535987661
There is a negative linear relationship between the number of hours watching TV and the level of physical activity.
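
The sign of the coefficient alone says little here: r is about -0.03 and the p-value is about 0.81, so the data give no real evidence of any linear relationship, which is expected for two independently generated random variables. A minimal sketch of an interpretation helper that also looks at the p-value and the magnitude of r is given below. The function name `describe_correlation`, the significance level of 0.05, and the 0.3 / 0.7 strength cutoffs are assumptions chosen for illustration (common rules of thumb, not fixed standards).

```python
def describe_correlation(r, p_value, alpha=0.05):
    """Summarize a correlation using its p-value and magnitude, not only its sign.

    alpha and the 0.3 / 0.7 cutoffs are assumed rule-of-thumb values.
    """
    # First check whether the result is distinguishable from zero at level alpha
    if p_value >= alpha:
        return "no statistically significant linear relationship"
    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear relationship"

# Values printed for the TV-watching example
print(describe_correlation(-0.0339545881932681, 0.8149151535987661))
# Values printed for the exercise/BMI example (Pearson)
print(describe_correlation(0.9501743691866473, 1.9574732055341274e-25))
```

Checking the p-value first keeps a coefficient that is negative only by chance from being reported as a negative relationship.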


#Q5

The dataset for this question contains the ages and soft drink preferences of only a few individuals. A sample that small is not enough to draw meaningful conclusions about the relationship between age and preference for a particular brand; a larger and more diverse sample is needed.

With a larger dataset, for example from a survey of more respondents, the analysis would typically start with summary statistics (such as the age distribution within each brand group) and then apply a statistical test suited to the data, such as a chi-square test of independence on age group versus preferred brand, to check whether the association between age and soft drink preference is statistically significant.
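
One test that could fit an age-versus-brand question is a chi-square test of independence on a table of age group by preferred brand. The sketch below computes only the test statistic and the degrees of freedom in plain Python. The counts, the age bands, and the function name `chi_square_statistic` are hypothetical and invented purely to show the mechanics; they are not the data for this question.

```python
# Hypothetical counts, invented for illustration only.
# Rows: age groups; columns: three soft drink brands.
observed = [
    [20, 15, 5],   # younger respondents
    [10, 20, 10],  # middle age group
    [5, 10, 25],   # older respondents
]

def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if age group and brand were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

chi2, dof = chi_square_statistic(observed)
print(f"chi-square = {chi2:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared against the chi-square distribution with the given degrees of freedom; `scipy.stats.chi2_contingency` performs the whole test, including the p-value, in one call.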

In [8]:
#Q6

from scipy import stats

# Example data for sales calls per day and sales per week
sales_calls_per_day = [10, 15, 12, 8, 14, 11, 9, 13, 16, 12, 10, 15, 14, 12, 9, 11, 8, 13, 16, 10, 15, 12, 8, 14, 11, 9, 13, 16, 12, 10]
sales_per_week = [25, 30, 28, 22, 29, 27, 23, 26, 31, 28, 26, 32, 30, 28, 24, 27, 23, 26, 31, 25, 30, 28, 22, 29, 27, 23, 26, 31, 28, 26]

# Calculate the Pearson correlation coefficient
correlation_coefficient, p_value = stats.pearsonr(sales_calls_per_day, sales_per_week)

# Interpret the result
if correlation_coefficient > 0:
    interpretation = "There is a positive linear relationship between the number of sales calls made per day and the number of sales made per week."
elif correlation_coefficient < 0:
    interpretation = "There is a negative linear relationship between the number of sales calls made per day and the number of sales made per week."
else:
    interpretation = "There is no linear relationship between the number of sales calls made per day and the number of sales made per week."

print(f"Pearson Correlation Coefficient: {correlation_coefficient}")
print(f"P-Value: {p_value}")
print(interpretation)


Pearson Correlation Coefficient: 0.9328437458536273
P-Value: 6.144764137473217e-14
There is a positive linear relationship between the number of sales calls made per day and the number of sales made per week.
