1. You are conducting a study to explore the relationships between various factors and their potential correlation with the test scores of a group of individuals. You have gathered data from a group of individuals, and the table below represents the hours spent on different activities, along with their corresponding test scores. Your task is to analyze the data and determine the type of correlation, if any, between each variable and the test scores, without using built-in functions for calculating correlation.


In [2]:
import numpy as np
import pandas as pd
import math

In [3]:
# Compute the Pearson correlation coefficient between two features of the dataset.
# Takes two equal-length sequences x and y and returns r.

def corre_coeff(x, y):

    n = len(x)

    # Calculate sums
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2, sum_y2 = 0, 0
    sum_xy = 0

    # Accumulate the sums of squares and the sum of cross-products in one pass
    for xi, yi in zip(x, y):
        sum_x2 += xi ** 2
        sum_y2 += yi ** 2
        sum_xy += xi * yi

    numerator = n * sum_xy - sum_x * sum_y
    denominator = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5

    correlation_coefficient = numerator / denominator

    return correlation_coefficient


def make_matrix(df):
    
    # Column names and the number of features
    features = df.columns
    n = len(features)

    # Initialize an n x n matrix of zeros
    corr_matrix = [[0] * n for _ in range(n)]

    for i in range(n):
        for j in range(n):
            if i <= j:
                corr = corre_coeff(df[features[i]], df[features[j]])
                corr_matrix[i][j] = corr
                corr_matrix[j][i] = corr

    return corr_matrix
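
The raw-sums formula used in corre_coeff is algebraically equivalent to the mean-centred definition of Pearson's r (the sum of products of deviations divided by the square root of the product of the summed squared deviations). As a sanity check, the sketch below computes both forms on two columns of the dataset and prints them side by side. The function names pearson_raw_sums and pearson_centered are hypothetical helpers introduced only for this illustration; they are not part of the notebook's solution.

```python
import math

def pearson_raw_sums(x, y):
    """Raw-sums form, mirroring the formula used in corre_coeff."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_centered(x, y):
    """Mean-centred form: summed cross-deviations over the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Two columns from the dataset used in this notebook
hours_tv = [4, 3, 2, 1, 0, 0]
test_score = [65, 70, 75, 80, 85, 90]

r_raw = pearson_raw_sums(hours_tv, test_score)
r_centered = pearson_centered(hours_tv, test_score)
print(r_raw, r_centered)
```

Both values should agree up to floating-point rounding. The raw-sums form needs only running totals, while the centred form is usually less sensitive to rounding when the values are large.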


In [4]:
#Store the initial data in a dictionary.
d = {'Hours studied':[2, 3, 4, 5, 6, 7], 'Hours Watching T.V': [4, 3, 2, 1, 0, 0], 'Outdoor Activity Time':[2, 4, 6, 8, 10, 12],'Hours Listening to Music':[2,3,4,1,5,0], 'Water Consumed':[5,6,5,6,4,5], 'Test Score':[65,70,75,80,85,90]}
df = pd.DataFrame(d)
df

In [5]:
#Correlation matrix for the above dataset
corr_matrix = make_matrix(df)
print("The Correlation matrix for the above data is:\n", np.array(corr_matrix))

The Correlation matrix for the above data is:
 [[ 1.         -0.98198051  1.         -0.2        -0.3550358   1.        ]
 [-0.98198051  1.         -0.98198051  0.06546537  0.3796283  -0.98198051]
 [ 1.         -0.98198051  1.         -0.2        -0.3550358   1.        ]
 [-0.2         0.06546537 -0.2         1.         -0.49705012 -0.2       ]
 [-0.3550358   0.3796283  -0.3550358  -0.49705012  1.         -0.3550358 ]
 [ 1.         -0.98198051  1.         -0.2        -0.3550358   1.        ]]


Based on the provided data, calculate and interpret the correlation coefficient for each variable in relation to the test scores. Identify the variables that show a positive correlation, a negative correlation, and no significant correlation with the test scores.


In [9]:

corr_df = pd.DataFrame(corr_matrix, columns=df.columns, index=df.columns)
corr_df

                          Hours studied  Hours Watching T.V  Outdoor Activity Time  Hours Listening to Music  Water Consumed  Test Score
Hours studied                       1.0           -0.981981                    1.0                      -0.2       -0.355036         1.0
Hours Watching T.V            -0.981981                 1.0              -0.981981                  0.065465        0.379628   -0.981981
Outdoor Activity Time               1.0           -0.981981                    1.0                      -0.2       -0.355036         1.0
Hours Listening to Music           -0.2            0.065465                   -0.2                       1.0        -0.49705        -0.2
Water Consumed                -0.355036            0.379628              -0.355036                  -0.49705             1.0   -0.355036
Test Score                          1.0           -0.981981                    1.0                      -0.2       -0.355036         1.0


In [30]:
corr_with_test_scores = corr_df['Test Score'].sort_values(ascending=False)

# Identify positive correlation
positive_corr = corr_with_test_scores[corr_with_test_scores > 0]
print("Columns with positive correlation with Test Score:\n", positive_corr)
# Identify negative correlation
negative_corr = corr_with_test_scores[corr_with_test_scores < 0]
print("Columns with negative correlation with Test Score:\n", negative_corr)

Columns with positive correlation with Test Score:
 Hours studied            1.0
Outdoor Activity Time    1.0
Test Score               1.0
Name: Test Score, dtype: float64
Columns with negative correlation with Test Score:
 Hours Listening to Music   -0.200000
Water Consumed             -0.355036
Hours Watching T.V         -0.981981
Name: Test Score, dtype: float64
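
The split above only separates positive from negative coefficients, whereas the question also asks for variables with no significant correlation. A minimal sketch of one way to add that third group is shown below. The cut-off of 0.3 on |r| is an assumption chosen for illustration, not a value given in the exercise, and classify is a hypothetical helper. The coefficients are copied from the output above, with the Test Score self-correlation left out.

```python
# Coefficients with Test Score, copied from the output above
corr_with_test_scores = {
    "Hours studied": 1.0,
    "Outdoor Activity Time": 1.0,
    "Hours Listening to Music": -0.2,
    "Water Consumed": -0.355036,
    "Hours Watching T.V": -0.981981,
}

# Assumed cut-off: |r| below this value is treated as "no clear correlation"
WEAK_THRESHOLD = 0.3

def classify(r, threshold=WEAK_THRESHOLD):
    """Label a coefficient by its sign, unless its magnitude is below the threshold."""
    if abs(r) < threshold:
        return "no clear correlation"
    return "positive" if r > 0 else "negative"

labels = {name: classify(r) for name, r in corr_with_test_scores.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

With this assumed threshold, only Hours Listening to Music falls into the "no clear correlation" group. A stricter cut-off such as 0.5 would move Water Consumed there as well.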


Explain why there might be a positive correlation between the "Hours Studied" variable and the test scores. Provide a brief discussion on how this information could be valuable for improving academic performance.


Explanation: The positive correlation between "Hours Studied" and test scores (r = 1.0 in this sample) means that students who studied longer scored higher. This relationship is intuitive: more study time plausibly leads to a better understanding of the material, better retention, and stronger problem-solving skills.

Because the number of study hours and the test score are positively correlated, we can help students by encouraging them to study for more hours so that they score well.
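
A coefficient of exactly 1.0 means the six (hours, score) pairs lie on a single straight line. The sketch below fits that line by ordinary least squares using plain Python, in keeping with the exercise's no-built-ins constraint, and reports the largest residual. The variable names are chosen for this illustration only.

```python
hours_studied = [2, 3, 4, 5, 6, 7]
test_score = [65, 70, 75, 80, 85, 90]

n = len(hours_studied)
mean_x = sum(hours_studied) / n
mean_y = sum(test_score) / n

# Ordinary least-squares slope and intercept for: score = intercept + slope * hours
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours_studied, test_score))
s_xx = sum((x - mean_x) ** 2 for x in hours_studied)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Distance of each observed score from the fitted line
residuals = [y - (intercept + slope * x) for x, y in zip(hours_studied, test_score)]
max_residual = max(abs(e) for e in residuals)
print(slope, intercept, max_residual)
```

For this dataset the fit gives a slope of 5 points per additional hour and an intercept of 55, with residuals of zero, which is why r comes out as exactly 1.0.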

Calculate and interpret the correlation coefficient between "Hours Watching TV" and test scores. Explain the implications of this correlation in terms of academic achievement and time management.

In [31]:
print("Correlation coefficient between Hours Watching TV and test scores", corr_df['Test Score'].loc['Hours Watching T.V'])

Correlation coefficient between Hours Watching TV and test scores -0.9819805060619657


Explanation: Hours Watching T.V has a strong negative correlation with test scores (r ≈ -0.98): students who spent more time watching TV tended to score markedly lower.
In terms of time management, we can guide students to reduce the time they spend watching TV so that they have more time to study and fewer distractions.

Calculate and interpret the correlation coefficient between "Hours Listening to Music" and test scores. Discuss the potential impact of this correlation on concentration and study habits.

In [33]:
print("Correlation coefficient between Hours Listening Music and test scores", corr_df['Test Score'].loc['Hours Listening to Music'])

Correlation coefficient between Hours Listening Music and test scores -0.2


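Explanation: The correlation between Hours Listening to Music and test scores is weakly negative (r = -0.2), which implies that, in this sample, listening time has very little bearing on a student's test score. The data therefore give no clear evidence that listening to music either helps or harms concentration and study habits.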
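
One common way to back up the phrase "little significance" is a t-test on the coefficient. This is an addition for illustration and not something the exercise asks for. The sketch below uses the standard statistic t = r * sqrt((n - 2) / (1 - r^2)) and compares it with 2.776, the two-sided 5% critical value of Student's t for 4 degrees of freedom. The helper name t_statistic is hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n = 6              # number of individuals in the dataset
r_music = -0.2     # correlation of Hours Listening to Music with Test Score
T_CRITICAL = 2.776 # two-sided 5% critical value of Student's t, 4 degrees of freedom

t_music = t_statistic(r_music, n)
is_significant = abs(t_music) > T_CRITICAL
print(round(t_music, 3), is_significant)
```

Here |t| is about 0.41, far below the critical value, so with only six observations a coefficient of -0.2 cannot be distinguished from zero.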

Calculate and interpret the correlation coefficient between "Water Consumed" and test scores. Provide a potential explanation for the observed correlation and its relevance to cognitive function.

In [35]:
print("Correlation coefficient between Water Consumed and test scores", corr_df['Test Score'].loc['Water Consumed'])

Correlation coefficient between Water Consumed and test scores -0.3550358012483632


Explanation: The coefficient of −0.355 indicates a weak-to-moderate negative correlation between water consumption and test scores. This suggests that as water consumption increases, test scores tend to decrease slightly.
Generally, hydration is crucial for maintaining cognitive function, attention, and memory. Because the observed correlation is nevertheless negative, it points to an area where further investigation is needed to understand how water consumption interacts with other factors influencing academic performance.
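
With only six individuals, a coefficient of this size can depend heavily on a single observation. The sketch below illustrates this by recomputing r six times, leaving out one individual each time. The helper pearson_r is a hypothetical restatement of the same raw-sums formula as corre_coeff, repeated here so the example runs on its own.

```python
import math

def pearson_r(x, y):
    """Pearson r via the same raw-sums formula as corre_coeff (hypothetical helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

water = [5, 6, 5, 6, 4, 5]
test_score = [65, 70, 75, 80, 85, 90]

r_full = pearson_r(water, test_score)

# Recompute r six times, each time leaving out one individual
leave_one_out = []
for i in range(len(water)):
    x = water[:i] + water[i + 1:]
    y = test_score[:i] + test_score[i + 1:]
    leave_one_out.append(pearson_r(x, y))

print("all six individuals:", round(r_full, 3))
print("leave-one-out range:", round(min(leave_one_out), 3), "to", round(max(leave_one_out), 3))
```

The leave-one-out values range from roughly -0.57 to -0.09, so the sign stays negative but the strength is not stable, which supports treating this correlation cautiously.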

Calculate and interpret the correlation coefficient between "Outdoor Activity Time" and test scores. Discuss how physical activity might influence academic performance.

In [36]:
print("Correlation coefficient between Outdoor Activity Time and test scores", corr_df['Test Score'].loc['Outdoor Activity Time'])

Correlation coefficient between Outdoor Activity Time and test scores 1.0


Explanation: A correlation coefficient of 1 between "Outdoor Activity Time" and test scores indicates a perfect positive correlation. This means that as the amount of time spent on outdoor activities increases, test scores increase in a perfectly linear relationship.

Regular physical activity can enhance concentration and attention, which can lead to better performance in academic tasks.
Outdoor activities can also boost overall energy levels and reduce fatigue, leading to improved performance in both physical and cognitive tasks.