### Question1

In [None]:
# To calculate the Pearson correlation coefficient between two variables (studying time and exam scores), you need the actual data values for both variables. Once you have the data, you can use the following formula to calculate the Pearson correlation coefficient:

# $$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

# Where:

#    $n$ is the number of data points.
#    $x_i$ and $y_i$ are the individual data points for studying time and exam scores, respectively.
#    $\bar{x}$ and $\bar{y}$ are the means of the studying time and exam scores, respectively.
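
# A minimal sketch of this calculation in plain Python. The function name `pearson_r` and the study-time and exam-score values are hypothetical, chosen only to illustrate the arithmetic; they are not data from the question.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of paired deviations from the means
    cov_sum = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov_sum / math.sqrt(ss_x * ss_y)

# Hypothetical illustration data (assumed, not from the question)
study_hours = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 61, 70, 74]

r = pearson_r(study_hours, exam_scores)
print(f"r = {r:.3f}")
```

# The square root in the denominator is what keeps r between -1 and 1.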

# The Pearson correlation coefficient (r) ranges from -1 to 1, where:

#    r=1: Perfect positive linear correlation (as one variable increases, the other variable increases proportionally).
#    r=−1: Perfect negative linear correlation (as one variable increases, the other variable decreases proportionally).
#    r=0: No linear correlation (variables are not linearly related).

#Interpretation:

#    If r is close to 1, it indicates a strong positive linear relationship between studying time and exam scores. This means that as students spend more time studying, their exam scores tend to increase.
#    If r is close to -1, it indicates a strong negative linear relationship. This might suggest that as studying time increases, exam scores tend to decrease, which could be counterintuitive.
#    If r is close to 0, it indicates no strong linear relationship between studying time and exam scores. In this case, the amount of time spent studying does not show a consistent pattern of influencing exam scores.

# Keep in mind that while the Pearson correlation coefficient measures linear relationships, it does not capture non-linear relationships between variables. Additionally, correlation does not imply causation, so even if a strong correlation is observed, it's important to consider other factors that might be influencing the relationship between studying time and exam scores.
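
# To illustrate the point about non-linear relationships, here is a small sketch with hypothetical values where y is completely determined by x (y = x^2), yet the Pearson coefficient is zero because the relationship is not linear. The helper `pearson_r` is an illustrative name, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov_sum / math.sqrt(ss_x * ss_y)

# Hypothetical data: a symmetric parabola, so positive and negative
# deviation products cancel out exactly
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r_nonlinear = pearson_r(x, y)
print("r for y = x^2:", r_nonlinear)
```

# A coefficient near 0 therefore rules out a linear trend only; plotting the data is still worthwhile.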

### Question2

In [None]:
# Spearman's rank correlation measures the strength and direction of the monotonic relationship between two variables. A relationship is monotonic when one variable consistently increases (or consistently decreases) as the other increases, though not necessarily at a constant rate. This makes Spearman's rank correlation suitable for non-linear relationships, provided they are monotonic.

# To calculate Spearman's rank correlation, you need to rank the data points of both variables and then compute the correlation based on the ranks. Here's how you can calculate it:

#    Rank the Data:
#        Rank both the sleep data and the job satisfaction data independently, assigning ranks from 1 to n (the number of data points) based on their values, where the smallest value gets a rank of 1. Tied values conventionally share the average of the ranks they span.

#    Calculate the Difference in Ranks (d):
#        For each data point, calculate the difference in ranks between the two variables (d = Rank of Sleep − Rank of Job Satisfaction).

#    Calculate Spearman's Rank Correlation (rs):
#        Use the formula $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where n is the number of data points. This shortcut formula is exact only when there are no tied ranks.
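
# The three steps above can be sketched in plain Python as follows. The names `rank` and `spearman_rs` and the sleep and job-satisfaction values are hypothetical; the sketch assumes no tied values, which is the case where the shortcut formula applies.

```python
def rank(values):
    """Assign ranks 1..n, smallest value first. Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    # Indices of the values in ascending order of value
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation via the 1 - 6*sum(d^2) / (n(n^2 - 1)) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    rank_x, rank_y = rank(x), rank(y)               # step 1: rank each variable
    d_squared = sum((rx - ry) ** 2                  # step 2: rank differences, squared
                    for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1)) # step 3: apply the formula

# Hypothetical illustration data (assumed, not from the question)
sleep_hours = [6.0, 7.5, 5.0, 8.0, 6.5]
job_satisfaction = [5, 8, 4, 7, 6]

rs = spearman_rs(sleep_hours, job_satisfaction)
print("ranks (sleep):       ", rank(sleep_hours))
print("ranks (satisfaction):", rank(job_satisfaction))
print(f"rs = {rs:.3f}")
```

# With tied values, a common approach is to assign average ranks and compute the Pearson coefficient of the ranks instead.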

#Interpretation:

#    If rs is close to 1, it indicates a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction. This means that as the amount of sleep increases, job satisfaction tends to increase as well, and vice versa.
#    If rs is close to -1, it indicates a strong negative monotonic relationship. In this case, as the amount of sleep increases, job satisfaction tends to decrease, and vice versa.
#    If rs is close to 0, it indicates little to no monotonic relationship between the variables.

# Spearman's rank correlation is a non-parametric measure, which means it doesn't assume any specific distribution for the data and is suitable for variables that may not follow a linear pattern. However, like other correlation measures, it doesn't imply causation. If a significant correlation is found, further analysis and domain knowledge are needed to understand the nature of the relationship.

### Question3

In [None]:
# To calculate both the Pearson correlation coefficient and the Spearman's rank correlation between the number of hours of exercise per week and body mass index (BMI) for a sample of 50 participants, you need the actual data values for both variables. Let's assume you have the following data:

# Number of Hours of Exercise: [5, 3, 7, 2, 6, ...] (50 values)
# BMI: [22.5, 25.1, 30.0, 26.8, 28.5, ...] (50 values)

# Here's how you can calculate both correlations and compare the results:

import numpy as np
from scipy.stats import pearsonr, spearmanr

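# Sample data: only the first five of the 50 values are shown (replace with the full lists).
# A literal `...` inside a list is Python's Ellipsis object, not a number, so it is left out here.
exercise_hours = [5, 3, 7, 2, 6]
bmi = [22.5, 25.1, 30.0, 26.8, 28.5]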

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(exercise_hours, bmi)

# Calculate Spearman's rank correlation
spearman_corr, _ = spearmanr(exercise_hours, bmi)

print("Pearson Correlation:", pearson_corr)
print("Spearman's Rank Correlation:", spearman_corr)

#Comparison and Interpretation:

#    Pearson Correlation:
#        Pearson correlation measures the linear relationship between two continuous variables. It's suitable for assessing the strength and direction of a linear relationship.
#        If the Pearson correlation coefficient is close to 1, it indicates a strong positive linear relationship between exercise hours and BMI, implying that as exercise hours increase, BMI tends to increase.
#        If it's close to -1, it indicates a strong negative linear relationship, suggesting that as exercise hours increase, BMI tends to decrease.
#        If it's close to 0, there's little to no linear relationship between the variables.

#    Spearman's Rank Correlation:
#        Spearman's rank correlation measures the strength and direction of a monotonic relationship. Because it works on ranks rather than raw values, it also captures non-linear relationships, as long as they are monotonic.
#        A positive Spearman's rank correlation suggests that as exercise hours increase, BMI tends to increase monotonically.
#        A negative Spearman's rank correlation suggests the opposite: as exercise hours increase, BMI tends to decrease monotonically.
#        A Spearman's rank correlation close to 0 indicates little to no monotonic relationship.

# Comparing the Results:

#    If both Pearson and Spearman correlations have similar values and directions, it suggests a consistent linear or monotonic relationship between exercise hours and BMI.
#    If they have different values or directions, it might indicate a non-linear relationship or the presence of outliers that affect the linear relationship but not the monotonic relationship.
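
# As a rough illustration of how the two coefficients can diverge, the sketch below uses hypothetical values in which BMI falls steadily with exercise but levels off. The relationship is perfectly monotonic but not linear. The helpers `pearson_r` and `rank` are illustrative names written in plain Python; Spearman's coefficient is obtained as the Pearson coefficient of the ranks.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov_sum / math.sqrt(ss_x * ss_y)

def rank(values):
    """Ranks 1..n, smallest value first. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

# Hypothetical values (assumed for illustration only)
exercise_hours = [1, 2, 3, 4, 5, 6]
bmi = [34.0, 28.0, 25.0, 23.5, 22.8, 22.5]

pearson_corr = pearson_r(exercise_hours, bmi)
# Spearman's coefficient is the Pearson coefficient computed on the ranks
spearman_corr = pearson_r(rank(exercise_hours), rank(bmi))

print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
```

# Here Spearman reaches -1 because the ordering is perfectly reversed, while Pearson stays above -1 because the points do not lie on a straight line.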

# In summary, calculating both the Pearson correlation coefficient and Spearman's rank correlation provides a comprehensive understanding of the relationship between exercise hours and BMI, considering both linear and monotonic aspects.

### Question4

In [None]:
# To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity, you need the actual data values for both variables. Let's assume you have the following data:

# Hours of TV Watching: [2, 3, 4, 5, 1, ...] (50 values)
# Physical Activity Level: [120, 90, 80, 60, 150, ...] (50 values)

# Here's how you can calculate the Pearson correlation coefficient using Python:

import numpy as np
from scipy.stats import pearsonr

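# Sample data: only the first five of the 50 values are shown (replace with the full lists).
# A literal `...` inside a list is Python's Ellipsis object, not a number, so it is left out here.
tv_hours = [2, 3, 4, 5, 1]
physical_activity = [120, 90, 80, 60, 150]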

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(tv_hours, physical_activity)

print("Pearson Correlation:", pearson_corr)

# Interpretation:

#    The Pearson correlation coefficient will be a value between -1 and 1.
#    If the Pearson correlation coefficient is close to 1, it suggests a strong positive linear relationship between the number of hours individuals spend watching television per day and their level of physical activity. This means that as TV watching hours increase, physical activity tends to increase.
#    If the coefficient is close to -1, it suggests a strong negative linear relationship. This would imply that as TV watching hours increase, physical activity tends to decrease.
#    If the coefficient is close to 0, it suggests little to no linear relationship between the two variables.

# Keep in mind that correlation does not imply causation. While a significant correlation might indicate a relationship between TV watching and physical activity, it doesn't necessarily mean that one variable is causing changes in the other. Other factors could be influencing the observed relationship. Further analysis and interpretation are required to draw meaningful conclusions.

### Question5

In [None]:
# Let's analyze the relationship between age and preference for a particular brand of soft drink.

# The dataset:

| Age | Soft Drink Preference |
|-----|-----------------------|
| 25  | Coke                  |
| 42  | Pepsi                 |
| 37  | Mountain Dew          |
| 19  | Coke                  |
| 31  | Pepsi                 |
| 28  | Coke                  |

# To analyze this data, we can follow these steps:

#    Visualize the Data:
#        Create a bar plot or any suitable visualization to show the distribution of soft drink preferences among different age groups.

#    Observe Patterns:
#        Examine if there are any trends or patterns in the data. For example, do certain age groups tend to prefer a specific brand of soft drink more than others?

#    Calculate Summary Statistics:
#        Calculate summary statistics for each brand of soft drink, such as the average age of respondents who prefer each brand.

#    Interpretation:
#        Based on the visualization and summary statistics, you can make observations about the relationship between age and soft drink preference. Are there any noticeable trends or differences among age groups in terms of their brand preferences?
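
# The summary-statistics step can be sketched as follows, using the six rows from the table above. Because brand preference is a categorical variable, the sketch groups ages by brand rather than computing a correlation coefficient. The variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# The six (age, preference) rows from the table above
responses = [
    (25, "Coke"),
    (42, "Pepsi"),
    (37, "Mountain Dew"),
    (19, "Coke"),
    (31, "Pepsi"),
    (28, "Coke"),
]

# Group respondent ages by preferred brand
ages_by_brand = defaultdict(list)
for age, brand in responses:
    ages_by_brand[brand].append(age)

# Count and mean age per brand
summary = {
    brand: {"count": len(ages), "mean_age": mean(ages)}
    for brand, ages in ages_by_brand.items()
}

for brand, stats in summary.items():
    print(f"{brand}: n={stats['count']}, mean age={stats['mean_age']:.1f}")
```

# With only one to three respondents per brand, these averages describe the sample and should not be generalized.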

# Keep in mind that with only six respondents, any conclusion from this dataset is preliminary. A more comprehensive analysis would need a larger sample and, ideally, other relevant variables that could influence soft drink preferences (e.g., taste preferences, marketing exposure, etc.).

### Question6

In [None]:
# To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week, you need the actual data values for both variables. Let's assume you have the following data:

# Sales Calls per Day: [20, 25, 18, 30, 22, ...] (30 values)
# Sales per Week: [5, 8, 4, 10, 6, ...] (30 values)

# Here's how you can calculate the Pearson correlation coefficient using Python:

import numpy as np
from scipy.stats import pearsonr

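# Sample data: only the first five of the 30 values are shown (replace with the full lists).
# A literal `...` inside a list is Python's Ellipsis object, not a number, so it is left out here.
sales_calls_per_day = [20, 25, 18, 30, 22]
sales_per_week = [5, 8, 4, 10, 6]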

# Calculate Pearson correlation coefficient
pearson_corr, _ = pearsonr(sales_calls_per_day, sales_per_week)

print("Pearson Correlation:", pearson_corr)

# Interpretation:

#    The Pearson correlation coefficient will be a value between -1 and 1.
#    If the Pearson correlation coefficient is close to 1, it suggests a strong positive linear relationship between the number of sales calls made per day and the number of sales made per week. This means that as the number of sales calls increases, the number of sales tends to increase as well.
#    If the coefficient is close to -1, it suggests a strong negative linear relationship. This would mean that more sales calls tend to go together with fewer sales, which would be counterintuitive.
#    If the coefficient is close to 0, it suggests little to no linear relationship between the two variables.

# Keep in mind that correlation does not imply causation. While a significant correlation might indicate a relationship between sales calls and sales, it doesn't necessarily mean that one variable is causing changes in the other. Other factors could be influencing the observed relationship. Further analysis and interpretation are required to draw meaningful conclusions.