# 0. Setup and Data Information

This notebook collects the shared setup for the project: library imports, plotting defaults, and helper functions reused by the other notebooks.

## Dataset Information (from HTML & Kaggle context)

- **Dataset:** "Student Habits vs Academic Performance" (Synthetic)
- **Source:** Kaggle (Simulated for demonstration)
- **Records:** 1000 students
- **Variables:** 16 (including student ID and target)
- **Target Variable:** `exam_score`

### Variables:

1.  `student_id`: Unique identifier for each student.
2.  `age`: Age of the student (numeric).
3.  `gender`: Gender of the student (categorical: Male, Female, Other).
4.  `study_hours_per_day`: Average hours spent studying per day (numeric).
5.  `social_media_hours`: Average hours spent on social media per day (numeric).
6.  `netflix_hours`: Average hours spent on Netflix/streaming per day (numeric).
7.  `part_time_job`: Whether the student has a part-time job (categorical: Yes, No).
8.  `attendance_percentage`: Class attendance percentage (numeric).
9.  `sleep_hours`: Average hours of sleep per night (numeric).
10. `diet_quality`: Quality of diet (categorical: Good, Fair, Poor).
11. `exercise_frequency`: Days of exercise per week (numeric, e.g., 0-7).
12. `parental_education_level`: Highest education level of parents (categorical: None, High School, Bachelor, Master, etc.).
13. `internet_quality`: Quality of internet connection (categorical: Good, Average, Poor).
14. `mental_health_rating`: Self-rated mental health (numeric scale, e.g., 1-10).
15. `extracurricular_participation`: Participation in extracurricular activities (categorical: Yes, No).
16. `exam_score`: Final exam score (numeric, target variable).
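
Checking a loaded DataFrame against the 16 variables listed above can catch renamed or missing columns early. The sketch below assumes the data is already in a pandas DataFrame; `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced for illustration, and the demo frame is made up rather than taken from the dataset.

```python
import pandas as pd

# Column names copied from the variable list above
EXPECTED_COLUMNS = [
    "student_id", "age", "gender", "study_hours_per_day",
    "social_media_hours", "netflix_hours", "part_time_job",
    "attendance_percentage", "sleep_hours", "diet_quality",
    "exercise_frequency", "parental_education_level", "internet_quality",
    "mental_health_rating", "extracurricular_participation", "exam_score",
]

def check_columns(df):
    """Return (missing, unexpected) column names relative to EXPECTED_COLUMNS."""
    actual = set(df.columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in actual]
    unexpected = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    return missing, unexpected

# Made-up one-row frame with a deliberately misspelled target column
demo = pd.DataFrame({"student_id": ["S1"], "age": [20], "exam_scor": [71.5]})
missing, unexpected = check_columns(demo)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

Both lists come back empty when the frame has exactly the expected columns, so the check can be turned into an assertion at the top of each analysis notebook.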

## Common Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Plotting defaults
plt.style.use('seaborn-v0_8-whitegrid') # Using a seaborn style
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
sns.set_palette("viridis") # A nice color palette
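
The `seaborn-v0_8-whitegrid` style name is available in matplotlib 3.6 and later; earlier releases shipped the same style as `seaborn-whitegrid`. If the notebooks have to run on mixed environments, a small fallback along these lines may help. `use_first_available_style` is a hypothetical helper, not part of the original notebook.

```python
import matplotlib.pyplot as plt

def use_first_available_style(candidates):
    """Apply the first style in `candidates` that this matplotlib install knows.

    Returns the name that was applied, or None if none were available
    (in which case the current style is left unchanged).
    """
    for name in candidates:
        if name in plt.style.available:
            plt.style.use(name)
            return name
    return None

# Try the newer name first, then the pre-3.6 name
chosen = use_first_available_style(
    ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]
)
print("Applied style:", chosen)
```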

## Helper Functions (Example)

This function reproduces the histogram bin calculation from the HTML page's JavaScript. It is only needed when the bins must match that page; otherwise seaborn and matplotlib usually handle binning well with `bins='auto'` or `bins='sqrt'`.

In [None]:
def calculate_histogram_bins_edges(data_series, num_bins_option=None):
    """Calculates bin edges and labels for a histogram, similar to Chart.js logic.

    Returns a tuple (bin_edges, bin_labels): a NumPy array of edges and a
    list of 'start-end' strings, one per bin.
    """
    values = data_series.dropna().to_numpy()
    if len(values) == 0:
        # No data: return an empty edge array and no labels
        return np.array([]), []

    min_val = np.min(values)
    max_val = np.max(values)

    if min_val == max_val:
        # All values are the same: use a single unit-width bin around the value
        return np.array([min_val - 0.5, min_val + 0.5]), [f"{min_val:.1f}"]

    if num_bins_option is None:
        # Default: square-root rule, ceil(sqrt(n)) bins
        num_bins = int(np.ceil(np.sqrt(len(values))))
    else:
        num_bins = int(num_bins_option)

    # Guard against a zero or negative bin count passed by the caller
    num_bins = max(num_bins, 1)

    # Equal-width edges spanning the observed range
    bin_edges = np.linspace(min_val, max_val, num_bins + 1)

    # Create labels like 'start-end', one per bin
    bin_labels = [
        f"{bin_edges[i]:.1f}-{bin_edges[i + 1]:.1f}" for i in range(num_bins)
    ]

    return bin_edges, bin_labels

# Example usage (will be more relevant in EDA notebook)
# sample_data = pd.Series(np.random.rand(100) * 100)
# edges, labels = calculate_histogram_bins_edges(sample_data)
# print(f"Calculated bin edges: {edges}")
# print(f"Calculated bin labels: {labels}")
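
For comparison, NumPy's built-in `bins="sqrt"` rule also produces equal-width edges over the observed range with roughly `ceil(sqrt(n))` bins, so it should agree with the helper's default in the typical case. The sketch below uses synthetic values, not the student dataset, and builds the same `start-end` labels.

```python
import numpy as np

# Synthetic values for illustration only
rng = np.random.default_rng(0)
values = rng.uniform(0, 100, size=50)

# Equal-width edges chosen by NumPy's square-root rule
edges = np.histogram_bin_edges(values, bins="sqrt")

# Count observations per bin using those edges
counts, _ = np.histogram(values, bins=edges)

# Labels of the form 'start-end', one per bin
labels = [f"{lo:.1f}-{hi:.1f}" for lo, hi in zip(edges[:-1], edges[1:])]

for label, count in zip(labels, counts):
    print(f"{label}: {count}")
```

The custom helper remains useful when the labels or an explicit bin count need to match the HTML page; for ordinary plots the NumPy rule passed through `bins='sqrt'` is likely enough.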