## Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

```python
import pandas as pd

# Hours studied and final exam marks for five students
df = pd.DataFrame({
    'Time_Studied': [4, 3, 5, 7, 6],
    'Marks_Obtained': [60, 45, 70, 90, 83]
})
df.corr(method="pearson")
```

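|                | Time_Studied | Marks_Obtained |
|----------------|--------------|----------------|
| Time_Studied   | 1.0          | 0.993678       |
| Marks_Obtained | 0.993678     | 1.0            |

A coefficient of about 0.99 indicates a very strong positive linear relationship: in this sample, students who studied longer tended to score higher. Correlation alone does not show that studying caused the higher scores, and five observations is a very small sample.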
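As a cross-check, the coefficient can be computed directly from its definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). A minimal NumPy sketch on the same five data points:

```python
import numpy as np

# Data from Q1
time_studied = np.array([4, 3, 5, 7, 6], dtype=float)
marks = np.array([60, 45, 70, 90, 83], dtype=float)

# Deviations from each variable's mean
dx = time_studied - time_studied.mean()
dy = marks - marks.mean()

# r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2))
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print('Pearson r:', r)
```

This should agree with the value pandas reports, since `DataFrame.corr(method="pearson")` computes the same quantity.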


## Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.


```python
import pandas as pd

# Sample data
data = {'sleep': [8, 7, 6, 5, 9, 7, 8, 6, 4, 9],
        'job_satisfaction': [7, 6, 5, 4, 8, 6, 7, 5, 3, 9]}
df = pd.DataFrame(data)

# Calculate the Spearman correlation coefficient
corr_coef = df['sleep'].corr(df['job_satisfaction'], method='spearman')
print('Spearman correlation coefficient:', corr_coef)
```

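```
Spearman correlation coefficient: 0.996908802495909
```

A coefficient of about 0.997 indicates an almost perfect positive monotonic relationship: in this sample, people who sleep more consistently report higher job satisfaction.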
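Spearman's ρ is the Pearson correlation computed on ranks. The sketch below assumes average ranks for ties (`method='average'` in `DataFrame.rank`) and reproduces the value above. It also evaluates the textbook shortcut 1 - 6Σd² / (n(n² - 1)), which is exact only when there are no ties, so with the tied values here it differs slightly.

```python
import pandas as pd

# Same sample as Q2
df = pd.DataFrame({
    'sleep': [8, 7, 6, 5, 9, 7, 8, 6, 4, 9],
    'job_satisfaction': [7, 6, 5, 4, 8, 6, 7, 5, 3, 9],
})

# Rank each column; tied values share the average of their ranks
ranks = df.rank(method='average')

# Spearman's rho = Pearson correlation of the ranks (Series.corr defaults to Pearson)
rho = ranks['sleep'].corr(ranks['job_satisfaction'])

# Textbook shortcut based on squared rank differences (exact only without ties)
n = len(df)
d_squared = ((ranks['sleep'] - ranks['job_satisfaction']) ** 2).sum()
rho_shortcut = 1 - 6 * d_squared / (n * (n ** 2 - 1))

print('Spearman rho (Pearson on ranks):', rho)
print('Shortcut formula:', rho_shortcut)
```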


## Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.

```python
import pandas as pd

# Illustrative sample of 10 participants (the question describes 50)
data = {'exercise_hours': [3, 2, 4, 1, 5, 3, 2, 4, 1, 5],
        'bmi': [22.5, 24.3, 21.7, 27.8, 19.5, 23.1, 25.6, 20.4, 28.2, 18.9]}
df = pd.DataFrame(data)

# Calculate the Pearson correlation coefficient
df.corr(method='pearson')
```



|                | exercise_hours | bmi       |
|----------------|----------------|-----------|
| exercise_hours | 1.0            | -0.981848 |
| bmi            | -0.981848      | 1.0       |


```python
# Calculate Spearman's rank correlation
df.corr(method='spearman')
```

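|                | exercise_hours | bmi       |
|----------------|----------------|-----------|
| exercise_hours | 1.0            | -0.984732 |
| bmi            | -0.984732      | 1.0       |

Both coefficients are strongly negative (Pearson r ≈ -0.982, Spearman ρ ≈ -0.985): in this sample, more weekly exercise goes with lower BMI. Because the two values are nearly equal, the relationship here is close to linear as well as monotonic. A Spearman coefficient clearly larger in magnitude than Pearson's would typically point to a monotonic but curved relationship or to outliers.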
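pandas reports only the coefficients. A sketch using `scipy.stats.pearsonr` and `scipy.stats.spearmanr` (assuming SciPy is installed) adds two-sided p-values for the Q3 sample. The second part uses a small hypothetical dataset, y = eˣ, that is perfectly monotonic but strongly curved, to show when the two coefficients part ways.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Q3 sample (10 participants)
exercise_hours = [3, 2, 4, 1, 5, 3, 2, 4, 1, 5]
bmi = [22.5, 24.3, 21.7, 27.8, 19.5, 23.1, 25.6, 20.4, 28.2, 18.9]

# Each function returns the coefficient and a two-sided p-value
r, p_r = pearsonr(exercise_hours, bmi)
rho, p_rho = spearmanr(exercise_hours, bmi)
print(f'Pearson r = {r:.4f} (p = {p_r:.2g})')
print(f'Spearman rho = {rho:.4f} (p = {p_rho:.2g})')

# Hypothetical curved data: the ranks agree perfectly, so rho = 1,
# while Pearson r stays well below 1 because the relationship is not linear
x = np.arange(1, 11)
y = np.exp(x)
r_curve, _ = pearsonr(x, y)
rho_curve, _ = spearmanr(x, y)
print(f'Curved data: Pearson r = {r_curve:.3f}, Spearman rho = {rho_curve:.3f}')
```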


## Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.

```python
import numpy as np
import pandas as pd

# Generate synthetic data for TV hours and physical activity.
# No seed is set, so the values (and the coefficient) change on every run.
# The two columns are drawn independently, so their true correlation is 0.
tv_hours = np.random.normal(4, 1.5, 50)
physical_activity = np.random.normal(5, 2, 50)

# Sample data
data = {'tv_hours': tv_hours,
        'physical_activity': physical_activity}
df = pd.DataFrame(data)

# Calculate the Pearson correlation coefficient
pearson_coef = df['tv_hours'].corr(df['physical_activity'], method='pearson')

print('Pearson correlation coefficient:', pearson_coef)
df.head()
```

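```
Pearson correlation coefficient: 0.13988372880756791
```

|   | tv_hours | physical_activity |
|---|----------|-------------------|
| 0 | 5.147582 | 4.35228           |
| 1 | 2.756517 | 4.623406          |
| 2 | 3.011273 | 3.199983          |
| 3 | 4.916685 | 3.137996          |
| 4 | 3.78398  | 2.554526          |

A coefficient of about 0.14 is a weak positive correlation. Since the two columns are generated independently, the true correlation is zero and this value reflects sampling noise; rerunning the cell gives a different value.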
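To see how large r can get by chance alone, the sketch below simulates many independent samples of 50 people drawn from the same distributions as in Q4 and checks how often |r| is at least as large as the 0.1399 printed above. The seed (0) and the number of repetitions (2000) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the simulation is reproducible
n_reps, n = 2000, 50

# Each row is one simulated study: TV hours and activity drawn independently
tv = rng.normal(4, 1.5, size=(n_reps, n))
activity = rng.normal(5, 2, size=(n_reps, n))

# Row-wise Pearson r from centred data
tv_c = tv - tv.mean(axis=1, keepdims=True)
act_c = activity - activity.mean(axis=1, keepdims=True)
r = (tv_c * act_c).sum(axis=1) / np.sqrt((tv_c ** 2).sum(axis=1) * (act_c ** 2).sum(axis=1))

observed = 0.1399  # coefficient printed by the Q4 run
share = np.mean(np.abs(r) >= observed)
print(f'Std. dev. of r under independence: {r.std():.3f}')
print(f'Share of studies with |r| >= {observed}: {share:.2f}')
```

For independent normal variables the spread of r is roughly 1/√(n - 1) ≈ 0.14 at n = 50, so a coefficient of this size is well within what chance alone produces.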


## Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

![Capture.PNG](attachment:ed7bc12b-7e46-4156-a9bb-9af68ad11a2f.PNG)

```python
import pandas as pd

df = pd.DataFrame({
    "Age(Years)": [25, 42, 37, 19, 31, 28],
    "Soft Drink Preference": ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
})
df.head()
```

|   | Age(Years) | Soft Drink Preference |
|---|------------|-----------------------|
| 0 | 25         | Coke                  |
| 1 | 42         | Pepsi                 |
| 2 | 37         | Mountain Dew          |
| 3 | 19         | Coke                  |
| 4 | 31         | Pepsi                 |


```python
# Pearson's r needs numeric inputs, so the categorical column is first encoded as integers.
# Caveat: LabelEncoder assigns codes in alphabetical order (Coke=0, Mountain Dew=1, Pepsi=2).
# That order is arbitrary for a nominal variable, so the resulting r depends on the coding.
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df_new = df.drop(columns='Soft Drink Preference')
df_new['Soft Drink Preference Enc'] = encoder.fit_transform(df['Soft Drink Preference'])
df_new.head()
```

|   | Age(Years) | Soft Drink Preference Enc |
|---|------------|---------------------------|
| 0 | 25         | 0                         |
| 1 | 42         | 2                         |
| 2 | 37         | 1                         |
| 3 | 19         | 0                         |
| 4 | 31         | 2                         |


```python
df_new.corr(method='pearson')
```

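|                           | Age(Years) | Soft Drink Preference Enc |
|---------------------------|------------|---------------------------|
| Age(Years)                | 1.0        | 0.769175                  |
| Soft Drink Preference Enc | 0.769175   | 1.0                       |

The value of about 0.77 should not be read as a strong positive association. Soft drink preference is a nominal variable, and the integer codes only reflect alphabetical order; encoding the same categories in a different order gives a different r. Pearson's r is meaningful only when both variables are numeric.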
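The dependence on the coding can be checked directly. The sketch below recomputes r with LabelEncoder's alphabetical coding and with a hypothetical alternative order, then computes the correlation ratio η, a measure of association between a categorical and a numeric variable that does not depend on how the categories are numbered. Using η here is a suggested alternative, not part of the original solution.

```python
import pandas as pd

# Survey data from Q5
df = pd.DataFrame({
    "Age(Years)": [25, 42, 37, 19, 31, 28],
    "Soft Drink Preference": ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke'],
})
age = df["Age(Years)"]
pref = df["Soft Drink Preference"]

# Two integer codings of the same nominal variable:
# the first matches LabelEncoder's alphabetical order, the second is a hypothetical reordering
alphabetical = {'Coke': 0, 'Mountain Dew': 1, 'Pepsi': 2}
alternative = {'Mountain Dew': 0, 'Coke': 1, 'Pepsi': 2}

r_alpha = age.corr(pref.map(alphabetical))
r_alt = age.corr(pref.map(alternative))

# Correlation ratio (eta): square root of the share of age variance
# explained by which drink a person prefers
grand_mean = age.mean()
group_means = age.groupby(pref).transform('mean')
ss_between = ((group_means - grand_mean) ** 2).sum()
ss_total = ((age - grand_mean) ** 2).sum()
eta = (ss_between / ss_total) ** 0.5

print(f"Pearson r, alphabetical coding: {r_alpha:.4f}")
print(f"Pearson r, alternative coding:  {r_alt:.4f}")
print(f"Correlation ratio eta:          {eta:.4f}")
```

With this data, reordering the codes drops r from about 0.77 to about 0.18, while η is the same for any coding.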


## Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

```python
import pandas as pd
import numpy as np

# Generate synthetic data for sales calls per day and sales made per week.
# The seed makes the run reproducible; the two columns are drawn independently,
# so no real relationship between them is built in.
np.random.seed(123)
num_calls = np.random.randint(50, 100, size=30)
num_sales = np.random.randint(5, 15, size=30)

# Sample data
data = {'num_sales_calls': num_calls, 'num_sales_made': num_sales}
df = pd.DataFrame(data)
df.head()
```

|   | num_sales_calls | num_sales_made |
|---|-----------------|----------------|
| 0 | 95              | 7              |
| 1 | 52              | 9              |
| 2 | 78              | 13             |
| 3 | 84              | 5              |
| 4 | 88              | 12             |


```python
df.corr(method="pearson")
```

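|                 | num_sales_calls | num_sales_made |
|-----------------|-----------------|----------------|
| num_sales_calls | 1.0             | -0.209154      |
| num_sales_made  | -0.209154       | 1.0            |

A coefficient of about -0.21 is a weak negative correlation. Because the two columns were generated independently at random, it reflects sampling noise rather than a real relationship between calls and sales.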
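To judge whether r ≈ -0.21 is distinguishable from zero with only 30 representatives, `scipy.stats.pearsonr` also returns a two-sided p-value. A sketch that regenerates the same seeded data, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import pearsonr

# Regenerate the Q6 data with the same seed and calls
np.random.seed(123)
num_calls = np.random.randint(50, 100, size=30)
num_sales = np.random.randint(5, 15, size=30)

# Coefficient plus two-sided p-value for the null hypothesis of zero correlation
r, p_value = pearsonr(num_calls, num_sales)
print(f'Pearson r = {r:.4f}, p-value = {p_value:.3f}')
```

A p-value well above conventional thresholds such as 0.05 is consistent with the data having been generated without any built-in relationship.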
