## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

The Pearson correlation coefficient, often denoted as r, is a measure of the linear relationship between two continuous
variables. It quantifies the strength and direction of the linear association between the variables. The correlation
coefficient can take values between -1 and 1, where:

    ~r=1 indicates a perfect positive linear relationship (as one variable increases, the other increases proportionally).
    ~r=−1 indicates a perfect negative linear relationship (as one variable increases, the other decreases proportionally).
    ~r=0 indicates no linear relationship; the variables are not linearly correlated.
    
To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their 
final exam scores, you can use the following formula:

            r = ∑(Xi − Xˉ)(Yi − Yˉ) / √( ∑(Xi − Xˉ)² ⋅ ∑(Yi − Yˉ)² )

Where:

    ~Xi and Yi are individual data points for the two variables.
    ~Xˉ and Yˉ are the means (average values) of the two variables.
    
Here's a simplified explanation of the steps to calculate and interpret the Pearson correlation coefficient:

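1.Calculate the mean (Xˉ) and standard deviation (SX) of the amount of time students spend studying and the mean (Yˉ) and
  standard deviation (SY) of their final exam scores. Use the same divisor (N here) for the standard deviations as for the
  covariance in step 2, so that r stays between -1 and 1.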

2.Calculate the covariance between the two variables (Cov(X,Y)):

        Cov(X,Y)= ∑(Xi − Xˉ)(Yi − Yˉ) / N
        
3.Calculate the Pearson correlation coefficient (r):

        r = Cov(X,Y) / (SX⋅SY)

4.Interpret the result:

    ~If r is close to 1, it indicates a strong positive linear relationship, meaning that as the amount of time spent
     studying increases, final exam scores tend to increase proportionally.
    ~If r is close to -1, it indicates a strong negative linear relationship, meaning that as the amount of time spent 
     studying increases, final exam scores tend to decrease proportionally.
    ~If r is close to 0, it suggests little or no linear relationship between the two variables.
    
For example, if you calculate r and find that it is 0.75, you could interpret this as a strong positive linear relationship,
suggesting that students who spend more time studying tend to achieve higher final exam scores.

However, keep in mind that correlation does not imply causation. A high correlation does not necessarily mean that studying
more causes higher exam scores; other factors may be involved. Additionally, Pearson correlation measures only linear
relationships and may not capture more complex dependencies between variables.
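The deviation-product formula above can be sketched in a few lines of Python using only the standard library. The data values and the names `pearson_r`, `hours_studied` and `exam_scores` are hypothetical, chosen for illustration; they are not part of the original exercise.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the deviation-product formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
hours_studied = [2, 4, 5, 7, 8, 10]
exam_scores = [55, 60, 68, 74, 79, 90]

r = pearson_r(hours_studied, exam_scores)
print(f"r = {r:.3f}")
```

For this made-up sample r is close to 1, which would be read as a strong positive linear relationship between study time and exam score.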

## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.

Spearman's rank correlation coefficient, often denoted as ρ (rho), is a non-parametric measure of the monotonic relationship
between two variables. Unlike the Pearson correlation coefficient, Spearman's rank correlation does not assume linearity; it 
assesses whether there is a consistent monotonic relationship between the variables, which means that as one variable
increases, the other tends to consistently increase or decrease.

Here's how you can calculate Spearman's rank correlation and interpret the result for the relationship between the amount of 
sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10:

1.Rank the Data:

    ~First, rank the values for both variables independently. Assign a rank of 1 to the smallest value, 2 to the second
     smallest, and so on. If there are ties (i.e., two or more values are the same), assign each tied value the average rank.
        
2.Calculate the Difference in Ranks:

    ~For each data point, calculate the difference in ranks between the two variables.
    
3.Square the Differences:

    ~Square the differences in ranks.
    
4.Calculate Spearman's Rank Correlation:

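    ~Use the following formula to calculate Spearman's rank correlation (ρ). It is exact when there are no tied ranks; with
     ties, compute the Pearson correlation of the ranks instead:
            ρ = 1 − 6∑d² / (n(n² − 1))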

Where:

        ~d represents the differences in ranks between the two variables for each data point.
        ~n is the number of data points.
        
5.Interpret the Result:

    ~If ρ is close to 1, it indicates a strong positive monotonic relationship, meaning that as the amount of sleep
     individuals get each night increases, their job satisfaction tends to increase monotonically.
    ~If ρ is close to -1, it indicates a strong negative monotonic relationship, meaning that as the amount of sleep
     individuals get each night increases, their job satisfaction tends to decrease monotonically.
    ~If ρ is close to 0, it suggests little or no monotonic relationship between the two variables.
    
For example, if you calculate ρ and find that it is approximately -0.85, you could interpret this as a strong negative
monotonic relationship. This suggests that individuals who get more sleep tend to report lower job satisfaction, and
those who get less sleep tend to report higher job satisfaction.

Spearman's rank correlation is a useful measure when dealing with ordinal data, with relationships that are monotonic but
not linear, or with data that is not normally distributed. It assesses the strength and direction of monotonic associations,
making it suitable for a wide range of data types and relationships.
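The ranking, difference and formula steps above can be sketched as follows. This is a minimal illustration using only the standard library; the helper names `average_ranks` and `spearman_rho_no_ties` and the sample values are hypothetical, and the shortcut formula is applied to data that was chosen to contain no ties.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data with no tied values, for illustration only
sleep_hours = [5.0, 6.5, 7.0, 8.0, 6.0, 7.5]
satisfaction = [3, 6, 5, 9, 4, 8]

rho = spearman_rho_no_ties(sleep_hours, satisfaction)
print(f"rho = {rho:.3f}")
```

Job satisfaction on a 1 to 10 scale will usually contain ties in real data; in that case the Pearson correlation of the average ranks is the safer computation.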

## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

To examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults, 
you can calculate both the Pearson correlation coefficient (r) and the Spearman's rank correlation coefficient (ρ). These two
measures will provide insights into the linear and monotonic relationships between the variables, respectively.

Here are the steps to calculate and compare both correlation coefficients:

Step 1: Collect and Organize Data

Collect data on the number of hours of exercise per week and the BMI for 50 participants.

Step 2: Calculate Pearson Correlation (r)

Use the Pearson correlation formula:

        r = ∑(Xi − Xˉ)(Yi − Yˉ) / √( ∑(Xi − Xˉ)² ⋅ ∑(Yi − Yˉ)² )

Where:

    ~Xi represents the number of hours of exercise for participant i.
    ~Yi represents the BMI for participant i.
    ~Xˉ is the mean number of hours of exercise.
    ~Yˉ is the mean BMI.
    
Step 3: Calculate Spearman's Rank Correlation (ρ)

    1.Rank the data for both variables independently.
    2.Calculate the differences in ranks between the two variables (d).
    3.Square the differences (d²).

Calculate ρ using the formula:

    ρ = 1 − 6∑d² / (n(n² − 1))

Where:

    ~n is the number of participants (in this case, 50).
    ~d represents the differences in ranks.
    
Step 4: Interpret the Results

    ~If r is close to 1, it suggests a strong positive linear relationship, indicating that as the number of hours of
     exercise per week increases, BMI tends to increase linearly.

    ~If r is close to -1, it suggests a strong negative linear relationship, indicating that as the number of hours of
     exercise per week increases, BMI tends to decrease linearly.

    ~If r is close to 0, it suggests little or no linear relationship between the variables.

    ~If ρ is close to 1, it suggests a strong positive monotonic relationship, indicating that as the number of hours of
     exercise per week increases, BMI tends to increase monotonically.

    ~If ρ is close to -1, it suggests a strong negative monotonic relationship, indicating that as the number of hours of
     exercise per week increases, BMI tends to decrease monotonically.

    ~If ρ is close to 0, it suggests little or no monotonic relationship between the variables.

Comparing the two correlation coefficients:

    ~If r and ρ are similar in magnitude and sign, the relationship is consistent across both measures and is likely to be
     approximately linear.
    ~If r and ρ differ substantially, the relationship may be monotonic but not linear, or outliers may be affecting r
     more strongly than ρ.
        
Interpreting the results will depend on the calculated values of r and ρ in your specific dataset.
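To make the comparison concrete, here is a small sketch that computes both coefficients on the same data. The data are hypothetical (10 points rather than the 50 participants in the question) and were constructed so that BMI falls steeply at first and then levels off, a monotonic but non-linear pattern. Spearman's ρ is computed here as the Pearson correlation of the average ranks, which also handles ties.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of deviation products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: BMI decreases with exercise, but not along a straight line
exercise_hours = list(range(10))
bmi = [22 + 10 * 0.6 ** h for h in exercise_hours]

r = pearson_r(exercise_hours, bmi)
rho = spearman_rho(exercise_hours, bmi)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because BMI strictly decreases as exercise increases, ρ reaches -1, while r is noticeably weaker in magnitude because the points do not lie on a straight line.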

## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient (r) between the number of hours individuals spend watching television per 
day and their level of physical activity, you can follow these steps:

Step 1: Collect and Organize Data

    ~Collect data on the number of hours individuals spend watching television per day and their level of physical activity
     for 50 participants.

Step 2: Calculate the Mean and Standard Deviation

    ~Calculate the mean (Xˉ) and standard deviation (SX) for the number of hours of television watching and the mean (Yˉ) and
     standard deviation (SY) for the level of physical activity.

Step 3: Calculate the Covariance

    ~Calculate the covariance (Cov(X,Y)) between the number of hours of television watching and the level of physical 
     activity using the formula:

             Cov(X,Y) = (1/N) ∑ (Xi − Xˉ)(Yi − Yˉ), summed over i = 1, ..., N

Where:

    ~Xi represents the number of hours of television watching for participant i.
    ~Yi represents the level of physical activity for participant i.
    ~N is the number of participants (in this case, 50).
    ~Xˉ and Yˉ are the means calculated in Step 2.
    
Step 4: Calculate the Pearson Correlation Coefficient

Use the following formula to calculate the Pearson correlation coefficient (r):

        r = Cov(X,Y) / (SX⋅SY)

Where:

    ~SX is the standard deviation of the number of hours of television watching.
    ~SY is the standard deviation of the level of physical activity.
    
Step 5: Interpret the Result

    ~If r is close to 1, it indicates a strong positive linear relationship, suggesting that as the number of hours
     individuals spend watching television per day increases, their level of physical activity tends to increase linearly.
    ~If r is close to -1, it indicates a strong negative linear relationship, suggesting that as the number of hours
     individuals spend watching television per day increases, their level of physical activity tends to decrease linearly.
    ~If r is close to 0, it suggests little or no linear relationship between the two variables.
    
Interpret the calculated value of r based on the data you have collected. Keep in mind that Pearson correlation measures the
strength and direction of linear association, so it may not capture more complex or non-linear relationships between the 
variables.
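Steps 2 to 4 can be sketched with Python's `statistics` module. The values below are hypothetical (six participants instead of 50), and physical activity is assumed to be measured in minutes per day, which the question does not specify. The sketch also checks that dividing by N − 1 instead of N gives the same r, provided the same divisor is used for the covariance and both standard deviations.

```python
import statistics

# Hypothetical data, for illustration only
tv_hours = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
activity_minutes = [60, 50, 45, 40, 20, 15]  # assumed unit: minutes per day

n = len(tv_hours)

# Step 2: means and population standard deviations (divisor N)
mean_x = statistics.fmean(tv_hours)
mean_y = statistics.fmean(activity_minutes)
s_x = statistics.pstdev(tv_hours)
s_y = statistics.pstdev(activity_minutes)

# Step 3: population covariance (divisor N)
sum_products = sum((x - mean_x) * (y - mean_y)
                   for x, y in zip(tv_hours, activity_minutes))
cov_xy = sum_products / n

# Step 4: Pearson correlation coefficient
r = cov_xy / (s_x * s_y)

# Same calculation with the sample versions (divisor N - 1 throughout)
r_sample = (sum_products / (n - 1)) / (
    statistics.stdev(tv_hours) * statistics.stdev(activity_minutes)
)

print(f"r (divisor N)     = {r:.4f}")
print(f"r (divisor N - 1) = {r_sample:.4f}")
```

For this made-up sample r is close to -1, which would be read as more television time going with less physical activity.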

## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

The survey data consists of respondents' ages and their preferred brand of soft drink. To analyze the relationship between
age and brand preference, you can follow these steps:

1.Organize the Data: First, organize the data by separating the ages and brand preferences:
    
Ages (Years)   Brand Preference
25              Coke
42              Pepsi
37              Mountain Dew
19              Coke
31              Pepsi
28              Coke

2.Group Data by Brand: Group the data by brand preference to see how many respondents prefer each brand:

    ~Coke: 25, 19, 28
    ~Pepsi: 42, 31
    ~Mountain Dew: 37
    
3.Calculate Summary Statistics: Calculate summary statistics for the ages within each brand group. You can calculate the
 mean (average) age for each brand group to get an idea of the central tendency of ages for each brand preference.

    ~Mean age for Coke preference: (25 + 19 + 28) / 3 = 24
    ~Mean age for Pepsi preference: (42 + 31) / 2 = 36.5
    ~Mean age for Mountain Dew preference: 37
    
4.Analyze the Data: Based on the calculated mean ages for each brand preference group, you can make some observations:

    ~On average, respondents who prefer Coke have a mean age of 24 years.
    ~On average, respondents who prefer Pepsi have a mean age of 36.5 years.
    ~Respondents who prefer Mountain Dew have a mean age of 37 years.
    
Interpret the Relationship: From the data, it appears that respondents who prefer Pepsi tend to be older on average than
those who prefer Coke. However, with only six respondents this is a descriptive comparison, not evidence of a causal or
statistically reliable relationship. Age and brand preference may both be influenced by other factors that this simple
analysis does not account for.

To conduct a more rigorous analysis, note that brand preference is a nominal (unordered) variable, so Pearson and Spearman
correlations do not apply to it directly. You may instead consider a one-way ANOVA or a Kruskal-Wallis test to compare ages
across brand groups, or a chi-squared test of independence after grouping ages into categories, depending on your research
question and the data you have available. These tests can indicate whether the observed differences in age between brand
groups are statistically significant.
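The grouping and mean-age steps above can be reproduced with a short script. The ages and brands are the six survey rows listed in the table; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Survey rows from the table above: (age in years, preferred brand)
survey = [
    (25, "Coke"),
    (42, "Pepsi"),
    (37, "Mountain Dew"),
    (19, "Coke"),
    (31, "Pepsi"),
    (28, "Coke"),
]

# Step 2: group ages by brand preference
ages_by_brand = defaultdict(list)
for age, brand in survey:
    ages_by_brand[brand].append(age)

# Step 3: mean age within each brand group
mean_age = {brand: mean(ages) for brand, ages in ages_by_brand.items()}

for brand, ages in ages_by_brand.items():
    print(f"{brand}: ages {ages}, mean age {mean_age[brand]}")
```

The Mountain Dew group contains a single respondent, so its mean is just that respondent's age.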


## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient (r) between the number of sales calls made per day and the number of sales
made per week for 30 sales representatives, you can follow these steps:

Step 1: Collect and Organize Data

    ~Collect data on the number of sales calls made per day and the number of sales made per week for each of the 30 sales
     representatives.

Step 2: Calculate the Mean and Standard Deviation

    ~Calculate the mean (Xˉ) and standard deviation (SX) for the number of sales calls made per day and the mean (Yˉ) and 
     standard deviation (SY) for the number of sales made per week.

Step 3: Calculate the Covariance

    ~Calculate the covariance (Cov(X,Y)) between the number of sales calls made per day and the number of sales made per week
     using the formula:

                Cov(X,Y) = (1/N) ∑ (Xi − Xˉ)(Yi − Yˉ), summed over i = 1, ..., N

Where:

    ~Xi represents the number of sales calls made per day for sales representative i.
    ~Yi represents the number of sales made per week for sales representative i.
    ~N is the number of sales representatives (30 in this case).
    ~Xˉ and Yˉ are the means calculated in Step 2.
    
Step 4: Calculate the Pearson Correlation Coefficient

Use the following formula to calculate the Pearson correlation coefficient (r):

        r = Cov(X,Y) / (SX⋅SY)

Where:

    ~SX is the standard deviation of the number of sales calls made per day.
    ~SY is the standard deviation of the number of sales made per week.
    
Step 5: Interpret the Result

Interpret the calculated value of r based on the data you have collected. The Pearson correlation coefficient measures the
strength and direction of the linear relationship between the two variables. Here are some possible interpretations:

    ~If r is close to 1, it suggests a strong positive linear relationship, indicating that as the number of sales calls made
     per day increases, the number of sales made per week tends to increase linearly.
    ~If r is close to -1, it suggests a strong negative linear relationship, indicating that as the number of sales calls 
     made per day increases, the number of sales made per week tends to decrease linearly.
    ~If r is close to 0, it suggests little or no linear relationship between the two variables.

Interpreting the results will depend on the calculated value of r in your specific dataset. A statistically significant r
indicates that the observed linear relationship between sales calls and sales is unlikely to be due to chance alone, while
the magnitude of r indicates how strong that relationship is.
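One common way to check whether r is statistically significant is the t statistic t = r⋅√((n − 2) / (1 − r²)), which is compared against a t distribution with n − 2 degrees of freedom. The sketch below assumes a hypothetical r of 0.5 for the 30 representatives; the critical value of about 2.048 (two-sided, 5% level, 28 degrees of freedom) is taken from standard t tables rather than computed.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n = 30             # number of sales representatives, from the question
r_example = 0.5    # hypothetical value of r, for illustration only

t = t_statistic(r_example, n)

# Approximate two-sided 5% critical value for 28 degrees of freedom (from t tables)
critical_value = 2.048

print(f"t = {t:.3f} with {n - 2} degrees of freedom")
print("significant at the 5% level" if abs(t) > critical_value
      else "not significant at the 5% level")
```

With these assumed numbers t exceeds the critical value, so an r of 0.5 from 30 representatives would be judged significant at the 5% level.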