In [1]:
# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
# you have collected data on the amount of time students spend studying for an exam and their final exam
# scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.





# To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, we need the individual data points for each variable. The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. It ranges between -1 and 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship.

# Let's assume we have the following sample data for the time spent studying (in hours) and the corresponding final exam scores (out of 100):

# ```
# Time Spent Studying: [10, 8, 12, 7, 9, 6]
# Final Exam Scores: [80, 75, 90, 70, 85, 65]
# ```

# To calculate the Pearson correlation coefficient between the two variables, we can use the formula:

# ```
# correlation_coefficient = cov(Time Spent Studying, Final Exam Scores) / (std(Time Spent Studying) * std(Final Exam Scores))
# ```

# where:
# - `cov(Time Spent Studying, Final Exam Scores)` is the covariance between the two variables.
# - `std(Time Spent Studying)` is the standard deviation of the time spent studying.
# - `std(Final Exam Scores)` is the standard deviation of the final exam scores.

# Let's perform the calculation:


import numpy as np

# Sample data
time_spent_studying = [10, 8, 12, 7, 9, 6]
final_exam_scores = [80, 75, 90, 70, 85, 65]

# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(time_spent_studying, final_exam_scores)[0, 1]

print("Pearson Correlation Coefficient:", correlation_coefficient)
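
# As a cross-check, the covariance/standard-deviation formula given above can be evaluated step by step. This is a minimal sketch, assuming sample statistics (ddof=1) for both the covariance and the standard deviations; the variable names are illustrative.

```python
import numpy as np

# Same sample data as above
time_spent_studying = np.array([10, 8, 12, 7, 9, 6], dtype=float)
final_exam_scores = np.array([80, 75, 90, 70, 85, 65], dtype=float)

# Sample covariance: off-diagonal entry of the 2x2 covariance matrix
covariance = np.cov(time_spent_studying, final_exam_scores, ddof=1)[0, 1]

# Sample standard deviations, using the same ddof as the covariance
std_time = np.std(time_spent_studying, ddof=1)
std_scores = np.std(final_exam_scores, ddof=1)

# Pearson r = cov(x, y) / (std(x) * std(y))
manual_r = covariance / (std_time * std_scores)
library_r = np.corrcoef(time_spent_studying, final_exam_scores)[0, 1]

print("Manual Pearson r:", manual_r)
print("np.corrcoef r:", library_r)
```

# The two values should agree up to floating-point rounding, provided the same ddof is used for the covariance and both standard deviations.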



# Interpretation:

# The calculated Pearson correlation coefficient between the time spent studying and the final exam scores is approximately 0.940. This value indicates a strong positive linear relationship between the two variables: as the amount of time spent studying increases, the final exam scores tend to increase as well.

# With only six observations the estimate is imprecise, so it should be read as suggestive rather than conclusive. Within this sample, students who spent more time studying tended to achieve higher scores on their final exams.

# It's important to remember that correlation does not imply causation. While there is a strong correlation between studying time and exam scores, other factors may also contribute to the final exam results. Therefore, further analysis and consideration of other variables and potential confounding factors are necessary to establish causal relationships.
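
# Whether a coefficient is statistically distinguishable from zero is a separate question from its size. One way to check, sketched here under the assumption that SciPy is installed, is `scipy.stats.pearsonr`, which returns the coefficient together with a two-sided p-value.

```python
from scipy.stats import pearsonr

# Same sample data as above
time_spent_studying = [10, 8, 12, 7, 9, 6]
final_exam_scores = [80, 75, 90, 70, 85, 65]

# pearsonr tests the null hypothesis that the true correlation is zero
r_value, p_value = pearsonr(time_spent_studying, final_exam_scores)

print("Pearson r:", r_value)
print("Two-sided p-value:", p_value)
```

# The p-value relies on distributional assumptions that six data points cannot really confirm, so it is best treated as a rough guide.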

Pearson Correlation Coefficient: 0.9402561526802475


In [2]:
# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
# Suppose you have collected data on the amount of sleep individuals get each night and their overall job
# satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
# variables and interpret the result.






# To calculate Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10, we need the individual data points for each variable. Spearman's rank correlation measures the strength and direction of the monotonic relationship between two variables, which means it assesses whether one variable tends to increase or decrease with the other, without assuming a linear relationship.

# Let's assume we have the following sample data for the amount of sleep and corresponding job satisfaction levels:

# ```
# Amount of Sleep: [7, 6, 8, 5, 6, 9]
# Job Satisfaction: [6, 5, 8, 4, 6, 9]
# ```

# To calculate Spearman's rank correlation, we first rank the values for each variable and then apply the formula below, which is exact only when there are no tied values:

# ```
# spearman_rank_correlation = 1 - (6 * Σ(d^2) / (n * (n^2 - 1)))
# ```

# where:
# - `d` is the difference between the ranks of corresponding values.
# - `n` is the number of data points.

# Let's perform the calculation:


import numpy as np

# Sample data
amount_of_sleep = [7, 6, 8, 5, 6, 9]
job_satisfaction = [6, 5, 8, 4, 6, 9]

# Rank the values for each variable (0-based ranks; argsort gives tied values distinct ranks instead of averaging them)
sleep_ranks = np.argsort(np.argsort(amount_of_sleep))
satisfaction_ranks = np.argsort(np.argsort(job_satisfaction))

# Calculate the differences between the ranks
differences = sleep_ranks - satisfaction_ranks

# Calculate the Spearman's rank correlation
n = len(amount_of_sleep)
spearman_rank_correlation = 1 - (6 * np.sum(differences**2) / (n * (n**2 - 1)))

print("Spearman's Rank Correlation:", spearman_rank_correlation)
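
# Both sample lists contain tied values (two 6s each), which the double-argsort ranking does not average. A tie-aware variant, sketched here on the assumption that SciPy is installed, ranks with `scipy.stats.rankdata` and takes the Pearson correlation of the ranks; `scipy.stats.spearmanr` should give the same value.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Same sample data as above
amount_of_sleep = [7, 6, 8, 5, 6, 9]
job_satisfaction = [6, 5, 8, 4, 6, 9]

# Average ranks: tied values share the mean of the ranks they occupy
sleep_avg_ranks = rankdata(amount_of_sleep)
satisfaction_avg_ranks = rankdata(job_satisfaction)

# Spearman's rho is the Pearson correlation of the ranks
rho_from_ranks = np.corrcoef(sleep_avg_ranks, satisfaction_avg_ranks)[0, 1]

# spearmanr applies the same average-rank treatment internally
rho_scipy, rho_p_value = spearmanr(amount_of_sleep, job_satisfaction)

print("Tie-aware rho (ranks + Pearson):", rho_from_ranks)
print("Tie-aware rho (spearmanr):", rho_scipy)
```

# With ties present, this value differs slightly from the shortcut formula; without ties the two approaches coincide.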




# Interpretation:

# The calculated Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level is approximately 0.943. This positive value indicates a very strong positive monotonic relationship between the two variables. Because both lists contain tied values (two 6s each), the shortcut formula is only approximate here; a tie-aware calculation based on average ranks gives a slightly higher value of about 0.956.

# In other words, individuals who report higher job satisfaction tend to get more sleep, and those with lower job satisfaction tend to get less. The monotonic relationship means the direction of change is consistent: as one variable increases, the other tends to increase as well.

# A value of 0.943 suggests a strong positive association between sleep and job satisfaction in this small sample of six. However, correlation does not imply causation. Other factors may contribute to both sleep patterns and job satisfaction, so further analysis that accounts for potential confounding factors is necessary before drawing causal conclusions.

Spearman's Rank Correlation: 0.9428571428571428


In [3]:
# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
# exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
# for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
# between these two variables and compare the results.



# To calculate the Pearson correlation coefficient and the Spearman's rank correlation between the number of hours of exercise per week and body mass index (BMI) in a sample of 50 participants, we need the individual data points for each variable. As mentioned earlier, the Pearson correlation coefficient measures the strength and direction of the linear relationship, while the Spearman's rank correlation assesses the monotonic relationship between the two variables.

# Let's assume we have the following sample data for the number of hours of exercise per week and the corresponding BMI values (the lists below contain 48 illustrative pairs rather than the full 50):

# ```
# Hours of Exercise: [3, 5, 2, 4, 6, 3, 2, 1, 5, 4, 6, 3, 1, 4, 5, 2, 3, 4, 2, 5, 4, 3, 1, 2, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1]
# BMI: [24.5, 26.3, 21.8, 25.1, 27.5, 23.4, 20.9, 18.6, 26.7, 25.3, 28.2, 24.1, 18.1, 25.7, 26.8, 21.9, 23.8, 25.4, 21.5, 26.0, 25.2, 23.7, 17.9, 20.8, 27.9, 26.4, 24.8, 23.3, 20.7, 18.3, 28.5, 26.2, 24.9, 23.5, 21.0, 19.0, 28.3, 26.1, 25.0, 23.6, 21.2, 18.8, 29.0, 26.9, 25.5, 23.9, 21.3, 19.2]
# ```

# To calculate the Pearson correlation coefficient, we can use the `numpy.corrcoef()` function, and for the Spearman's rank correlation, we will use the `scipy.stats.spearmanr()` function.

# Let's perform the calculations and compare the results:


import numpy as np
from scipy.stats import spearmanr

# Sample data
hours_of_exercise = [3, 5, 2, 4, 6, 3, 2, 1, 5, 4, 6, 3, 1, 4, 5, 2, 3, 4, 2, 5, 4, 3, 1, 2, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1]
bmi = [24.5, 26.3, 21.8, 25.1, 27.5, 23.4, 20.9, 18.6, 26.7, 25.3, 28.2, 24.1, 18.1, 25.7, 26.8, 21.9, 23.8, 25.4, 21.5, 26.0, 25.2, 23.7, 17.9, 20.8, 27.9, 26.4, 24.8, 23.3, 20.7, 18.3, 28.5, 26.2, 24.9, 23.5, 21.0, 19.0, 28.3, 26.1, 25.0, 23.6, 21.2, 18.8, 29.0, 26.9, 25.5, 23.9, 21.3, 19.2]

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(hours_of_exercise, bmi)[0, 1]

# Calculate Spearman's rank correlation
spearman_corr, _ = spearmanr(hours_of_exercise, bmi)

print("Pearson Correlation Coefficient:", pearson_corr)
print("Spearman's Rank Correlation:", spearman_corr)



# Comparison:

# The calculated Pearson correlation coefficient between the number of hours of exercise per week and BMI is approximately 0.980, and the Spearman's rank correlation is approximately 0.985. Both coefficients are positive and close to 1, indicating that BMI rises with exercise hours in this sample.

# Interpretation:

# - Pearson Correlation: The Pearson correlation coefficient of 0.980 indicates a very strong positive linear relationship between the number of hours of exercise and BMI. In this illustrative sample, participants who exercise more hours per week have higher BMI values on average.

# - Spearman's Rank Correlation: The Spearman's rank correlation of 0.985 indicates a very strong positive monotonic relationship: ranking participants by exercise hours yields almost the same ordering as ranking them by BMI. A high Spearman value does not by itself imply that the relationship is non-linear.

# Overall, the two coefficients agree closely, which suggests that the relationship is approximately linear as well as monotonic. Spearman's rank correlation, being a non-parametric measure, is the more appropriate choice when the relationship is monotonic but not linear, or when outliers are present. The positive sign runs against the usual expectation that more exercise goes with lower BMI; it reflects the assumed sample values, and correlation alone says nothing about causation.
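
# The difference between the two measures is easiest to see on data that is monotonic but clearly not linear. The sketch below uses hypothetical values (y = exp(x), not taken from the study) and assumes SciPy is installed.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: y grows exponentially with x (monotonic, not linear)
x_demo = np.arange(1, 11)
y_demo = np.exp(x_demo)

# Pearson measures how well a straight line fits the raw values
pearson_demo = np.corrcoef(x_demo, y_demo)[0, 1]

# Spearman only looks at the ordering, which is identical for x and y
spearman_demo, _ = spearmanr(x_demo, y_demo)

print("Pearson (exponential data):", pearson_demo)
print("Spearman (exponential data):", spearman_demo)
```

# Spearman reaches 1 because the ordering is perfectly preserved, while Pearson stays below 1 because the points do not lie on a straight line.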


Pearson Correlation Coefficient: 0.9796186736366211
Spearman's Rank Correlation: 0.9853739335759332


In [None]:
# Q4. A researcher is interested in examining the relationship between the number of hours individuals
# spend watching television per day and their level of physical activity. The researcher collected data on
# both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
# these two variables.






# To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity, we need the individual data points for each variable. The Pearson correlation coefficient measures the strength and direction of the linear relationship between two continuous variables.

# Let's assume we have the following sample data for the number of hours spent watching television per day and the corresponding level of physical activity (both variables measured on a scale from 1 to 10):

# ```
# Hours of Television: [3, 5, 2, 4, 6, 3, 2, 1, 5, 4, 6, 3, 1, 4, 5, 2, 3, 4, 2, 5, 4, 3, 1, 2, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1]
# Physical Activity: [6, 5, 8, 4, 6, 9, 7, 5, 4, 6, 7, 5, 4, 8, 6, 6, 5, 5, 7, 6, 7, 6, 6, 5, 7, 6, 6, 5, 4, 8, 6, 5, 5, 6, 7, 6, 6, 5, 7, 6, 6, 5, 4, 8, 6, 5, 5, 6, 7]
# ```

# To calculate the Pearson correlation coefficient, we can use the `numpy.corrcoef()` function:


import numpy as np

# Sample data
hours_of_television = [3, 5, 2, 4, 6, 3, 2, 1, 5, 4, 6, 3, 1, 4, 5, 2, 3, 4, 2, 5, 4, 3, 1, 2, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1]
physical_activity = [6, 5, 8, 4, 6, 9, 7, 5, 4, 6, 7, 5, 4, 8, 6, 6, 5, 5, 7, 6, 7, 6, 6, 5, 7, 6, 6, 5, 4, 8, 6, 5, 5, 6, 7, 6, 6, 5, 7, 6, 6, 5, 4, 8, 6, 5, 5, 6, 7]

# The two lists differ in length (48 vs. 49 values), so passing them to np.corrcoef as-is raises a ValueError.
# Assumption: keep only the first 48 paired observations; which entry is actually surplus is unknown.
n = min(len(hours_of_television), len(physical_activity))

# Calculate Pearson correlation coefficient on the paired observations
pearson_corr = np.corrcoef(hours_of_television[:n], physical_activity[:n])[0, 1]

print("Pearson Correlation Coefficient:", pearson_corr)

# Interpretation:

# With the lists truncated to 48 pairs, the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity is close to zero (about -0.01). This sample therefore shows essentially no linear relationship between the two variables.

# A clearly negative coefficient would indicate that people who watch more television tend to be less physically active. A value this close to zero does not support that conclusion for these data, although it also does not rule out a non-linear relationship.

# It's important to note that correlation does not imply causation, and there may be other factors influencing both television viewing habits and physical activity levels among the participants.

In [None]:
Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

Age (Years)	Soft Drink Preference
25	Coke
42	Pepsi
37	Mountain Dew
19	Coke
31	Pepsi
28	Coke


To analyze the relationship between age and soft drink preference based on the survey results, we can use various statistical methods. Let's explore some common approaches:

Frequency Table: We can start by creating a frequency table to see the distribution of soft drink preferences across different age groups.

Age Group	Coke	Pepsi	Mountain Dew
19-25	2	0	0
26-30	1	0	0
31-35	0	1	0
36-40	0	0	1
41-45	0	1	0
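
A table of this kind can be tallied directly from the six survey rows. The following is a minimal sketch in plain Python; the bin edges and the helper name `age_group` are assumptions chosen to match the age groups listed above.

```python
from collections import Counter

# Survey rows from the question
ages = [25, 42, 37, 19, 31, 28]
preferences = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Assumed age bins (inclusive bounds)
age_bins = [(19, 25), (26, 30), (31, 35), (36, 40), (41, 45)]

def age_group(age):
    """Return the label of the bin that contains `age`."""
    for low, high in age_bins:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the assumed bins")

# Count each (age group, brand) combination
counts = Counter((age_group(a), p) for a, p in zip(ages, preferences))

brands = ["Coke", "Pepsi", "Mountain Dew"]
print("Age Group", *brands, sep="\t")
for low, high in age_bins:
    label = f"{low}-{high}"
    print(label, *(counts[(label, b)] for b in brands), sep="\t")
```

Missing combinations simply count as 0, because a `Counter` returns 0 for keys it has not seen.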
Bar Plot: We can visualize the frequency table using a bar plot to see the preference distribution for different age groups.

Chi-Square Test: To determine if there is a statistically significant association between age group and soft drink preference, we can perform a chi-square test of independence. The test assesses whether the observed frequencies in the frequency table differ significantly from what we would expect by chance. With only six respondents, however, every expected cell count is far below 5, so the chi-square approximation is unreliable here and the test is mentioned only as the general approach.

Contingency Table: We can also create a 2x2 contingency table to analyze the relationship between age (categorized into two groups, e.g., 19-30 and 31-45) and soft drink preference (Coke and Non-Coke). For a sample this small, Fisher's exact test is the better choice for checking independence.
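
As an illustration of the 2x2 approach, the sketch below builds the table from the survey rows and runs Fisher's exact test with `scipy.stats.fisher_exact`. The split at age 30 and the Coke versus Non-Coke grouping are assumptions made for the example, and SciPy is assumed to be installed.

```python
from scipy.stats import fisher_exact

# Survey rows from the question
ages = [25, 42, 37, 19, 31, 28]
preferences = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Rows: age <= 30, age > 30 (assumed split); columns: Coke, Non-Coke
table = [[0, 0], [0, 0]]
for age, preference in zip(ages, preferences):
    row = 0 if age <= 30 else 1
    col = 0 if preference == "Coke" else 1
    table[row][col] += 1

# Two-sided exact test of independence between age group and preference
odds_ratio, fisher_p = fisher_exact(table, alternative="two-sided")

print("Contingency table:", table)
print("Fisher exact p-value:", fisher_p)
```

Even though the split separates the two groups perfectly, six respondents are too few for the test to reach the conventional 0.05 threshold.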

Please note that these are just some common methods to analyze the relationship between age and soft drink preference based on the given survey results. The choice of analysis depends on the research question and the hypotheses being tested.



In [8]:
# Q6. A company is interested in examining the relationship between the number of sales calls made per day
# and the number of sales made per week. The company collected data on both variables from a sample of
# 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.





# To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week for the sample of 30 sales representatives, you will need the data for both variables.

# Let's assume you have the following sample data:

# ```
# Sales Calls per Day: [12, 15, 10, 13, 11, 14, 18, 16, 20, 17, 19, 22, 21, 14, 13, 11, 10, 9, 8, 7, 6, 10, 15, 12, 17, 20, 19, 16, 14, 13]
# Sales per Week: [3, 4, 2, 3, 2, 4, 5, 4, 6, 5, 6, 7, 6, 4, 3, 2, 2, 1, 1, 1, 1, 2, 3, 3, 5, 6, 5, 4, 4, 3]
# ```

# Now, you can use the `numpy.corrcoef()` function to calculate the Pearson correlation coefficient:


import numpy as np

# Sample data
sales_calls_per_day = [12, 15, 10, 13, 11, 14, 18, 16, 20, 17, 19, 22, 21, 14, 13, 11, 10, 9, 8, 7, 6, 10, 15, 12, 17, 20, 19, 16, 14, 13]
sales_per_week = [3, 4, 2, 3, 2, 4, 5, 4, 6, 5, 6, 7, 6, 4, 3, 2, 2, 1, 1, 1, 1, 2, 3, 3, 5, 6, 5, 4, 4, 3]

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

print("Pearson Correlation Coefficient:", pearson_corr)


# Interpretation:

# The calculated Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week is approximately 0.977. This positive value indicates a very strong positive linear relationship between the two variables.

# In other words, sales representatives who make more sales calls per day tend to achieve higher sales per week. The positive correlation means that as the number of sales calls made per day increases, the number of sales made per week tends to increase as well.

# Keep in mind that correlation does not imply causation, and other factors may also influence the relationship between sales calls and weekly sales. Nevertheless, this positive correlation can be useful for the company to understand how sales representatives' activity levels relate to their overall sales performance.

Pearson Correlation Coefficient: 0.977366367372984
