# Q1. Ans

To calculate the Pearson correlation coefficient between two variables, you need a dataset with observations for both variables. Assuming you have a dataset with the variables "Study Time" and "Exam Scores," you can use the Pearson correlation formula to calculate the coefficient. Here's an example using Python:

Interpretation of the result:
The Pearson correlation coefficient between Study Time and Exam Scores is exactly 1.0, which indicates a perfect positive linear relationship between the two variables. In this sample data every exam score equals the study time plus 50, so all five points lie on a single straight line with positive slope: as the amount of time spent studying increases, the final exam score increases by the same amount.

In [1]:
import numpy as np

# Assume you have a dataset with variables Study Time and Exam Scores
study_time = np.array([10, 20, 30, 40, 50])  # Time spent studying in hours
exam_scores = np.array([60, 70, 80, 90, 100])  # Final exam scores

# Calculate the Pearson correlation coefficient
pearson_coefficient = np.corrcoef(study_time, exam_scores)[0, 1]

# Print the correlation coefficient
print("Pearson correlation coefficient:", pearson_coefficient)


Pearson correlation coefficient: 1.0
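
To make the "Pearson correlation formula" mentioned above concrete, here is a minimal sketch that computes the coefficient directly from its definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations) and compares it with `np.corrcoef`. The helper name `pearson_r` is an illustrative assumption, not part of NumPy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r from its definition (hypothetical helper, for illustration only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Same sample data as in the cell above
study_time = [10, 20, 30, 40, 50]
exam_scores = [60, 70, 80, 90, 100]

r_manual = pearson_r(study_time, exam_scores)
r_numpy = np.corrcoef(study_time, exam_scores)[0, 1]

print("Manual formula:", r_manual)
print("np.corrcoef:   ", r_numpy)
```

Both values should agree up to floating-point rounding, which is a quick way to check that the library call is doing what the formula describes.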


# Q2. Ans

To calculate the Spearman's rank correlation coefficient between two variables, you need a dataset with observations for both variables. Assuming you have a dataset with the variables "Sleep Duration" and "Job Satisfaction," you can use the Spearman's rank correlation formula to calculate the coefficient. Here's an example using Python:

Interpretation of the result:
The Spearman's rank correlation coefficient between Sleep Duration and Job Satisfaction is approximately -0.10. This indicates a very weak negative monotonic relationship between the two variables: in this sample, knowing how participants rank on Sleep Duration tells you almost nothing about how they rank on Job Satisfaction.

The p-value is approximately 0.873, which is far greater than the significance level of 0.05. There is insufficient evidence to reject the null hypothesis that there is no monotonic relationship between Sleep Duration and Job Satisfaction. With only five observations, the test also has very little power to detect a relationship.

In [2]:
import numpy as np
from scipy.stats import spearmanr

# Assume you have a dataset with variables Sleep Duration and Job Satisfaction
sleep_duration = np.array([7, 6, 8, 5, 9])  # Sleep duration in hours
job_satisfaction = np.array([9, 7, 8, 6, 5])  # Job satisfaction level (1 to 10)

# Calculate the Spearman's rank correlation coefficient
spearman_coefficient, p_value = spearmanr(sleep_duration, job_satisfaction)

# Print the correlation coefficient
print("Spearman's rank correlation coefficient:", spearman_coefficient)
print("p-value:", p_value)


Spearman's rank correlation coefficient: -0.09999999999999999
p-value: 0.8728885715695383
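
As a rough sketch of what `spearmanr` computes, the coefficient can be reproduced in two ways: as the Pearson correlation of the ranks, and with the shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which only applies when there are no tied values (true for this sample). The variable names below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same sample data as in the cell above
sleep_duration = np.array([7, 6, 8, 5, 9])
job_satisfaction = np.array([9, 7, 8, 6, 5])

# Replace each value by its rank (1 = smallest)
sleep_ranks = rankdata(sleep_duration)
satisfaction_ranks = rankdata(job_satisfaction)

# Route 1: Pearson correlation of the ranks
rho_from_ranks = np.corrcoef(sleep_ranks, satisfaction_ranks)[0, 1]

# Route 2: shortcut formula, valid only when there are no ties
d = sleep_ranks - satisfaction_ranks
n = len(d)
rho_shortcut = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Reference value from SciPy
rho_scipy, _ = spearmanr(sleep_duration, job_satisfaction)

print("Pearson on ranks:", rho_from_ranks)
print("Shortcut formula:", rho_shortcut)
print("scipy spearmanr: ", rho_scipy)
```

Here the squared rank differences sum to 22, so the shortcut formula gives 1 - 132/120 = -0.1, matching the printed output above.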


# Q3. Ans

To calculate the Pearson correlation coefficient and the Spearman's rank correlation coefficient between the number of hours of exercise per week and body mass index (BMI), you need a dataset with observations for both variables. Assuming you have a dataset with the variables "Exercise Hours" and "BMI" for 50 participants, you can use Python to calculate both correlation coefficients. Here's an example:

Interpretation of the results:
The Pearson correlation coefficient between the number of hours of exercise per week and BMI is approximately 0.990, indicating a very strong positive linear relationship between the two variables. In this sample data, BMI rises almost in lockstep with the number of hours of exercise per week.

The p-value for the Pearson correlation coefficient is approximately 2.8e-43, which is far below the significance level of 0.05. This indicates that the correlation is statistically significant, and we can reject the null hypothesis of no linear correlation.

The Spearman's rank correlation coefficient between the number of hours of exercise per week and BMI is approximately 0.996, indicating a very strong positive monotonic relationship between the two variables: participants who rank higher on exercise hours almost always rank higher on BMI as well.

The p-value for the Spearman's rank correlation coefficient is approximately 1.1e-53, which is far below the significance level of 0.05. This indicates that the rank correlation is statistically significant, and we can reject the null hypothesis of no monotonic relationship.

In [3]:
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Assume you have a dataset with variables Exercise Hours and BMI (the sample arrays below hold 51 values each)
exercise_hours = np.array([3, 5, 4, 2, 6, 3, 4, 2, 1, 5, 3, 2, 4, 6, 1, 2, 3, 4, 5, 2,
                          1, 3, 5, 4, 2, 6, 3, 4, 2, 1, 5, 3, 2, 4, 6, 1, 2, 3, 4, 5, 2,
                          1, 3, 5, 4, 2, 6, 3, 4, 2, 1])
bmi = np.array([25, 27, 26, 23, 29, 25, 26, 23, 22, 28, 24, 23, 26, 29, 22, 23, 25, 26, 28, 23,
                22, 24, 27, 26, 23, 29, 25, 26, 23, 22, 28, 24, 23, 26, 29, 22, 23, 25, 26, 28, 23,
                22, 24, 27, 26, 23, 29, 25, 26, 23, 22])

# Calculate the Pearson correlation coefficient
pearson_coefficient, pearson_p_value = pearsonr(exercise_hours, bmi)

# Calculate the Spearman's rank correlation coefficient
spearman_coefficient, spearman_p_value = spearmanr(exercise_hours, bmi)

# Print the correlation coefficients and p-values
print("Pearson correlation coefficient:", pearson_coefficient)
print("Pearson p-value:", pearson_p_value)
print("Spearman's rank correlation coefficient:", spearman_coefficient)
print("Spearman p-value:", spearman_p_value)


Pearson correlation coefficient: 0.9899334162711821
Pearson p-value: 2.824896566540722e-43
Spearman's rank correlation coefficient: 0.9962278953744328
Spearman p-value: 1.090567206572145e-53
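
The two coefficients answer slightly different questions, which the following sketch tries to illustrate on made-up data (an assumption for demonstration, unrelated to the exercise and BMI sample): when `y` grows exponentially with `x`, the relationship is perfectly monotonic but not linear, so Spearman's coefficient is 1 while Pearson's is noticeably lower.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y increases with x, but along a curve rather than a line
x = np.arange(1, 11)
y = np.exp(x)

pearson_coefficient, _ = pearsonr(x, y)
spearman_coefficient, _ = spearmanr(x, y)

print("Pearson correlation coefficient:", pearson_coefficient)
print("Spearman's rank correlation coefficient:", spearman_coefficient)
```

When both coefficients are close to each other, as in the exercise and BMI output above, the relationship is close to linear as well as monotonic.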


# Q4. Ans

To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity, you need a dataset with observations for both variables. Assuming you have a dataset with the variables "TV Hours" and "Physical Activity Level" for 50 participants, you can use Python to calculate the correlation coefficient. Here's an example:

Interpretation of the result:
In the sample data below, every participant's physical activity level equals 7 minus their TV hours, so the Pearson correlation coefficient is -1.0, a perfect negative linear relationship. As the number of hours spent watching television per day increases, the level of physical activity decreases by exactly the same amount.

The correlation coefficient value ranges between -1 and 1, where -1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 indicates no linear correlation.

It is important to note that correlation does not imply causation, and additional analysis and considerations are needed to understand the underlying factors influencing the observed relationship.

In [4]:
import numpy as np
from scipy.stats import pearsonr

# Assume you have a dataset with variables TV Hours and Physical Activity Level for 50 participants
tv_hours = np.array([2, 3, 1, 4, 2, 3, 2, 1, 3, 2, 4, 1, 3, 2, 2, 1, 2, 3, 1, 4,
                     2, 3, 2, 1, 3, 2, 4, 1, 3, 2, 2, 1, 2, 3, 1, 4, 2, 3, 2, 1,
                     3, 2, 4, 1, 3, 2, 2, 1, 2, 3, 1, 4])
physical_activity = np.array([5, 4, 6, 3, 5, 4, 5, 6, 4, 5, 3, 6, 4, 5, 5, 6, 5, 4, 6, 3,
                              5, 4, 5, 6, 4, 5, 3, 6, 4, 5, 5, 6, 5, 4, 6, 3, 5, 4, 5, 6,
                              4, 5, 3, 6, 4, 5, 5, 6, 5, 4, 6, 3, 5, 4])

# The hand-typed arrays have different lengths (52 and 54 values), which makes
# pearsonr raise "ValueError: x and y must have the same length".
# Assumption: keep the first 50 pairs, one per participant.
n_participants = 50
tv_hours = tv_hours[:n_participants]
physical_activity = physical_activity[:n_participants]

# Calculate the Pearson correlation coefficient
pearson_coefficient, _ = pearsonr(tv_hours, physical_activity)

# Print the correlation coefficient, rounded to hide floating-point noise
print("Pearson correlation coefficient:", round(pearson_coefficient, 3))


Pearson correlation coefficient: -1.0
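
Hand-typed arrays easily end up with different lengths, and `pearsonr` refuses such input. Below is a minimal sketch of a guard that reports both lengths before computing anything. The helper name `safe_pearson` and the six-pair sample are hypothetical and only serve as an illustration.

```python
import numpy as np
from scipy.stats import pearsonr

def safe_pearson(x, y):
    """Check that x and y pair up one-to-one, then return (r, p-value).

    Hypothetical helper for illustration; not part of SciPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError(
            f"length mismatch: x has {x.size} values, y has {y.size} values"
        )
    r, p_value = pearsonr(x, y)
    return float(r), float(p_value)

# Hypothetical sample: six participants, one value of each variable per person
tv_sample = [2, 3, 1, 4, 2, 3]
activity_sample = [5, 4, 6, 3, 4, 5]

r, p_value = safe_pearson(tv_sample, activity_sample)
print("Pearson correlation coefficient:", r)

# A mismatched pair now fails with a message that names both lengths
try:
    safe_pearson([1, 2, 3], [1, 2])
except ValueError as err:
    print("Caught:", err)
```

Reporting the two lengths makes it easier to see which array needs correcting, rather than silently truncating data that may be misaligned.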

# Q5. Ans

In [5]:
import pandas as pd
from scipy.stats import chi2_contingency

# Create a pandas DataFrame with the survey results
data = {
    'Age': [25, 42, 37, 19, 31, 28],
    'Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}
df = pd.DataFrame(data)

# Perform cross-tabulation
cross_tab = pd.crosstab(df['Age'], df['Preference'])

# Perform the chi-square test
chi2, p_value, _, _ = chi2_contingency(cross_tab)

# Print the chi-square test results
print("Chi-square test statistic:", chi2)
print("p-value:", p_value)


Chi-square test statistic: 12.000000000000002
p-value: 0.28505650031663116


Interpretation of the results:
The chi-square test statistic is approximately 12.0 with 10 degrees of freedom, and the p-value is approximately 0.285. Since the p-value (0.285) is greater than the typical significance level of 0.05, we fail to reject the null hypothesis. This indicates that there is not enough evidence to conclude that there is a significant relationship between age and preference for a particular brand of soft drink based on the given survey data.

This result should be treated with caution. Because the cross-tabulation uses raw ages, each of the six respondents forms a separate row, and every expected cell count is below 1, far under the usual guideline of at least 5 per cell, so the chi-square approximation is unreliable here. Grouping ages into a few bands and conducting a larger-scale survey with more diverse participants, while controlling for other factors, could provide more reliable insights into the relationship between age and soft drink preference.
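
A chi-square test is normally run on categories rather than raw ages. The sketch below groups the same six respondents into two age bands with `pd.cut` before cross-tabulating. The cut point at 30 and the band labels are assumptions chosen only for illustration, and the sample remains far too small for a trustworthy test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Same survey data as in the cell above
df = pd.DataFrame({
    "Age": [25, 42, 37, 19, 31, 28],
    "Preference": ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"],
})

# Hypothetical age bands: (0, 30] and (30, 100]
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 100],
                        labels=["30 or under", "over 30"])

# Cross-tabulate age band against preference
cross_tab = pd.crosstab(df["AgeGroup"], df["Preference"])

# chi2_contingency also returns the degrees of freedom and expected counts
chi2, p_value, dof, expected = chi2_contingency(cross_tab)

print(cross_tab)
print("Chi-square test statistic:", chi2)
print("Degrees of freedom:", dof)
print("p-value:", p_value)
print("Smallest expected count:", expected.min())
```

Printing the smallest expected count is a quick check on whether the test's assumptions hold. With six respondents it stays well below 5, so the p-value here is still not reliable.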

# Q6. Ans

In [6]:
import numpy as np

# Data: Number of sales calls made per day and number of sales made per week
sales_calls_per_day = [10, 12, 8, 15, 9, 11, 13, 7, 10, 14, 12, 9, 11, 13, 8, 10, 15, 9, 12, 11, 14, 8, 10, 13, 9, 12, 11, 15, 8, 10]
sales_per_week = [3, 4, 2, 5, 3, 4, 4, 2, 3, 4, 4, 3, 4, 4, 2, 3, 5, 3, 4, 4, 4, 2, 3, 4, 3, 4, 4, 5, 2, 3]

# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

# Print the Pearson correlation coefficient
print("Pearson correlation coefficient:", correlation_coefficient)


Pearson correlation coefficient: 0.9363695952607618


Interpretation of the results:
The Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week is approximately 0.94. This indicates a very strong positive linear correlation between the two variables, suggesting that sales representatives who make more sales calls per day also tend to make more sales per week.
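
`np.corrcoef` returns only the coefficient. As a small sketch of how the strength claim could be backed up, `scipy.stats.pearsonr` also returns a p-value for the null hypothesis of no linear correlation, and squaring the coefficient gives the share of variance in weekly sales that a linear fit on daily calls would account for. The variable names are illustrative.

```python
from scipy.stats import pearsonr

# Same data as in the cell above
sales_calls_per_day = [10, 12, 8, 15, 9, 11, 13, 7, 10, 14, 12, 9, 11, 13, 8,
                       10, 15, 9, 12, 11, 14, 8, 10, 13, 9, 12, 11, 15, 8, 10]
sales_per_week = [3, 4, 2, 5, 3, 4, 4, 2, 3, 4, 4, 3, 4, 4, 2,
                  3, 5, 3, 4, 4, 4, 2, 3, 4, 3, 4, 4, 5, 2, 3]

# Coefficient and p-value in one call
r, p_value = pearsonr(sales_calls_per_day, sales_per_week)

# Coefficient of determination for a simple linear fit
r_squared = r ** 2

print("Pearson correlation coefficient:", r)
print("p-value:", p_value)
print("r squared:", r_squared)
```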