1. Are there any variables that do not provide information?
student_id: every value is unique, so it is an identifier with no predictive value.
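
One way to check this claim is to look for columns with exactly one distinct value per row. The sketch below assumes a pandas DataFrame. The helper name `identifier_like_columns` and the sample rows are hypothetical. This is only a heuristic: a continuous float column can also happen to be all-unique, so treat any hit as a candidate to inspect rather than an automatic drop.

```python
import pandas as pd


def identifier_like_columns(frame: pd.DataFrame) -> list[str]:
    """Return columns whose values are all distinct (one unique value per row).

    Such columns label rows but carry no pattern a model could generalize from.
    Hypothetical helper, not part of the original notebook.
    """
    n_rows = len(frame)
    return [col for col in frame.columns if frame[col].nunique(dropna=False) == n_rows]


# Hypothetical sample rows: column names mirror the dataset, values are made up
sample = pd.DataFrame({
    "student_id": ["S1000", "S1001", "S1002", "S1003"],
    "study_hours_per_day": [0.0, 2.5, 2.5, 4.1],
    "exam_score": [56.2, 72.0, 72.0, 88.5],
})

print(identifier_like_columns(sample))  # ['student_id']
```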

2. If you had to eliminate variables, which ones would you remove and why?
student_id: it is only a row identifier, so it does not contribute to the analysis.

3. Are there any variables with unusual data?
study_hours_per_day: the minimum is 0.0, which is possible (a student who does not study at all) but worth flagging for a check.

sleep_hours: the minimum is 3.2 hours, which is low but plausible for stressed students.

mental_health_rating: the range is 1–10, which is reasonable for a rating scale.
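
Unusual values like these can be flagged systematically. The sketch below uses Tukey's fences (values outside Q1 − 1.5·IQR and Q3 + 1.5·IQR, the same rule boxplot whiskers use) alongside a simple domain check for the rating scale. The helper names and all sample values are hypothetical; only the 3.2-hour minimum mirrors the text above, and the out-of-range rating of 12 is invented purely to show the check firing.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values that fall outside the Tukey fences (hypothetical helper)."""
    low, high = iqr_bounds(series, k)
    return series[(series < low) | (series > high)]


# Hypothetical sleep_hours values; only the 3.2 minimum comes from the text
sleep_hours = pd.Series([6.1, 6.5, 7.0, 7.2, 6.8, 7.5, 6.9, 3.2], name="sleep_hours")
print(flag_outliers(sleep_hours).tolist())

# Domain check: ratings must lie in 1-10 (the 12 is a made-up bad value)
mental_health_rating = pd.Series([1, 5, 7, 10, 12], name="mental_health_rating")
out_of_range = mental_health_rating[~mental_health_rating.between(1, 10)]
print(out_of_range.tolist())
```

A statistical rule and a domain rule catch different problems: the fences flag values that are rare relative to the rest of the column, while the range check flags values that are impossible regardless of how often they occur.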

4. Are variables in similar ranges?
No. The ranges vary considerably:

exam_score: 18.4–100

attendance_percentage: 56–100

study_hours_per_day: 0–8.3

mental_health_rating: 1–10

exercise_frequency: 0–6
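
The ranges above can be tabulated directly rather than read off one column at a time. A minimal sketch, assuming a pandas DataFrame; the sample uses the minimums and maximums listed above, with made-up middle values. In the notebook itself this would be called on `df_informative`.

```python
import pandas as pd

# Hypothetical rows: min/max taken from the list above, middle values made up
sample = pd.DataFrame({
    "exam_score": [18.4, 65.0, 100.0],
    "attendance_percentage": [56.0, 84.3, 100.0],
    "study_hours_per_day": [0.0, 3.5, 8.3],
    "mental_health_rating": [1, 6, 10],
})

# One row per variable with its minimum, maximum, and span (max - min).
# astype(float) avoids an object dtype after transposing mixed int/float columns.
ranges = sample.agg(["min", "max"]).T.astype(float)
ranges["span"] = ranges["max"] - ranges["min"]
print(ranges.sort_values("span", ascending=False))
```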

5. Does this affect the data analysis?
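Yes. Without standardization or normalization, variables with large ranges (e.g., exam_score) dominate distance-based algorithms such as k-means, and plots that put raw values on a shared axis or color scale (e.g., a heatmap of the raw data) become hard to read. In regression, rescaling does not change ordinary least-squares predictions, but it makes coefficient sizes hard to compare and does affect regularized models. A correlation heatmap is not affected, because Pearson correlation does not change when a variable is shifted or multiplied by a positive constant.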
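
A minimal sketch of the two usual fixes, z-score standardization and min-max normalization, on hypothetical values within the ranges listed above. It shows that before scaling, the Euclidean distance that k-means relies on is driven almost entirely by exam_score, and that the Pearson correlation (what the correlation heatmap in the next cell displays) is unchanged by standardization. The `euclidean` helper is hypothetical; scikit-learn's `StandardScaler` and `MinMaxScaler` are common library alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical rows on the ranges listed above
sample = pd.DataFrame({
    "exam_score": [40.0, 60.0, 80.0, 95.0],
    "study_hours_per_day": [1.0, 2.5, 4.0, 6.0],
})


def euclidean(frame: pd.DataFrame, i: int, j: int) -> float:
    """Euclidean distance between rows i and j, as k-means would compute it."""
    return float(np.linalg.norm(frame.iloc[i] - frame.iloc[j]))


# z-score standardization: each column gets mean 0 and (sample) std 1
standardized = (sample - sample.mean()) / sample.std()

# Min-max normalization: each column is rescaled to [0, 1]
normalized = (sample - sample.min()) / (sample.max() - sample.min())

print(euclidean(sample, 0, 1))        # dominated by exam_score
print(euclidean(standardized, 0, 1))  # both variables contribute

# Pearson correlation is unchanged by this positive linear rescaling
print(np.isclose(sample.corr().iloc[0, 1], standardized.corr().iloc[0, 1]))
```

Standardization is the usual default for k-means and regularized regression; min-max normalization is handy when a bounded [0, 1] range is wanted, but a single extreme value compresses all the others.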

import math

import matplotlib.pyplot as plt
import seaborn as sns

# Assumes `df` (the student dataset) was loaded in an earlier cell.

# Set a consistent plotting style
sns.set_theme(style="whitegrid")

# 1. Drop non-informative columns (identifiers carry no predictive value)
non_informative_columns = ['student_id']
df_informative = df.drop(columns=non_informative_columns)

# 2. Boxplots for numeric variables
numeric_columns = df_informative.select_dtypes(include='number').columns

# Size the subplot grid to the number of numeric columns (3 per row),
# so the loops do not fail if there are more than 9 columns
n_cols = 3
n_rows = math.ceil(len(numeric_columns) / n_cols)

plt.figure(figsize=(15, 3.5 * n_rows))
for i, column in enumerate(numeric_columns, 1):
    plt.subplot(n_rows, n_cols, i)
    sns.boxplot(y=df_informative[column], color="skyblue")
    plt.title(f'Boxplot of {column}')
plt.tight_layout()  # call once, after all subplots exist

# 3. Histograms with a KDE overlay for numeric variables
plt.figure(figsize=(15, 3.5 * n_rows))
for i, column in enumerate(numeric_columns, 1):
    plt.subplot(n_rows, n_cols, i)
    sns.histplot(df_informative[column], bins=20, kde=True, color='mediumseagreen')
    plt.title(f'Histogram of {column}')
    plt.xlabel(column)
plt.tight_layout()

# 4. Correlation heatmap (Pearson correlation is unaffected by variable scale)
correlation_matrix = df_informative.corr(numeric_only=True)
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix,
            annot=True,
            cmap='coolwarm',
            fmt=".2f",
            linewidths=0.5,
            square=True)
plt.title('Heatmap of Numeric Variable Correlations')

plt.show()
