# Correlation Study
This notebook explores the correlation between features in the dataset.
We visualize how different variables relate to each other using a correlation matrix and heatmap.

## Load the Dataset
Load the cleaned dataset to analyze correlations.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import streamlit as st  # Ensure Streamlit is imported for UI rendering

# Load dataset
data = pd.read_csv("../data/final_cleaned_train.csv")  # Ensure path is correct

# Display the first few rows
st.write("### Sample Data", data.head())  


## Correlation Matrix
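Correlation measures the strength and direction of the linear relationship between two numerical features, on a scale from -1 to 1 (pandas computes the Pearson coefficient by default).
A correlation matrix holds this value for every pair of features, which helps identify the variables that are most strongly associated with each other.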
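
As a rough illustration of what the matrix contains, the sketch below runs the same `corr(numeric_only=True)` call on a small made-up table. The `toy` DataFrame and its column names are hypothetical and are not taken from the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y_up": [2, 4, 6, 8, 10],            # rises linearly with x
    "y_down": [10, 8, 6, 4, 2],          # falls linearly as x rises
    "label": ["a", "b", "c", "d", "e"],  # non-numeric, so it is left out
})

# Pairwise Pearson correlation of the numeric columns only
toy_corr = toy.corr(numeric_only=True)
print(toy_corr.round(2))
```

Each cell of `toy_corr` is the correlation between the row feature and the column feature, so the diagonal is always 1 and the matrix is symmetric.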


In [None]:
# Compute correlation matrix
correlation_matrix = data.corr(numeric_only=True)  # Ensures only numerical columns are used

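# Draw the correlation heatmap on an explicit figure and axes
fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', ax=ax)

# Render heatmap in Streamlit; pass the figure explicitly,
# since calling st.pyplot() with no argument is deprecated
st.pyplot(fig)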


## Key Takeaways from the Correlation Matrix
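- Features with high positive correlation tend to rise and fall together. This shows association, not that one feature causes the other.
- Features with high negative correlation move in opposite directions.
- Features with correlation near zero have little linear relationship, but they are not necessarily independent, because a nonlinear dependence can still exist.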
- Understanding correlation helps in feature selection for machine learning models.
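
One way to act on these takeaways is sketched below, under stated assumptions. The helper `strongly_correlated_pairs`, its `threshold` default of 0.8, and the `demo` DataFrame are all hypothetical names and values chosen for illustration; they are not part of this notebook's dataset or of any library. The sketch lists feature pairs whose absolute correlation exceeds a threshold, which can flag redundant features during feature selection, and it also computes the correlation between `x` and `x_sq`, a pair that is fully dependent but not linearly related.

```python
import numpy as np
import pandas as pd

def strongly_correlated_pairs(df, threshold=0.8):
    """Return feature pairs whose absolute Pearson correlation is >= threshold.

    Hypothetical helper for illustration; the 0.8 default is an arbitrary choice.
    """
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle so each pair appears once and the diagonal is skipped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    strong = pairs[pairs.abs() >= threshold]
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)

# Hypothetical demo data: x_sq depends entirely on x, but not linearly
x = np.arange(-3, 4)
demo = pd.DataFrame({"x": x, "x_sq": x ** 2, "two_x": 2 * x + 1})

result = strongly_correlated_pairs(demo)
print(result)

# Correlation between x and its square, a nonlinear dependence
x_vs_square = demo["x"].corr(demo["x_sq"])
print(x_vs_square)
```

On the real data, the same helper could be called as `strongly_correlated_pairs(data)`. Whether to drop one feature from a flagged pair remains a modeling decision that the correlation value alone does not settle.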
