<div align="center">

# 📚 Student Performance Analysis
### A Deep Dive into the Factors of Academic Success 🎓
by Alfonso Cifuentes Alonso

<div align="center" style="width: 50%; margin: auto; border-radius: 10px; border: 2px solid #000; box-shadow: 0 4px 8px rgba(0,0,0,0.2);">
<img src="https://images.unsplash.com/photo-1501503069356-3c6b82a17d89?ixlib=rb-1.2.1&auto=format&fit=crop&w=600&q=80" alt="Student Analysis Banner" style="width: 100%; border-radius: 8px;">
</div>

</div>

# 📚 Table of Contents

### 🔎 Preparation and Setup
- [Install Required Packages](#-install-required-packages)
- [Import Libraries](#-import-libraries)
- [Configuration Settings](#-configuration-settings)

### 📊 Initial Data Exploration
- [Load Data](#-load-data)
- [Utility Functions](#-utility-functions)
- [Variable Distribution](#-variable-distribution)

### 📈 Detailed Analyses
- [Study Time and Performance](#-exploratory-data-analysis-eda)
- [Social Media and Grades](#-social-media-hours-vs-grades)
- [Netflix and Performance](#-netflix-hours-vs-grades)
- [Sleep and Performance](#-sleep-hours-vs-grades)
- [Gender and Performance](#-academic-performance-by-gender)
- [Part-Time Work](#-impact-of-part-time-work-on-academic-performance)
- [Diet Quality](#-impact-of-diet-quality-on-academic-performance)
- [Extracurricular Activities](#-academic-performance-and-extracurricular-participation)

### 🔍 Multifactorial Analyses
- [Diet and Study Hours](#-multifactorial-analysis-diet-quality-and-study-hours-vs-grades)
- [Extracurricular Activities and Study](#-group-analysis-extracurricular-activities-and-study-vs-grades)
- [Sleep and Mental Health](#-multidimensional-analysis-sleep-mental-health-and-academic-performance)

#### 🛠️ Install Required Packages

In [None]:
# pip install: command to install Python packages
# Required packages:
# - pandas: for data analysis and manipulation
# - numpy: for numerical calculations
# - matplotlib: for basic data visualization
# - seaborn: for advanced statistical visualization
# - scikit-learn: for machine learning
# - plotly: for interactive plots (plotly.express is bundled with it)
# - scipy: for scientific calculations
# - statsmodels: for regression models (needed by plotly's trendline='ols')
!pip install pandas numpy matplotlib seaborn scikit-learn plotly scipy statsmodels

#### 🔧 Import Libraries

In [None]:
# Import each library with the standard alias used in the data community
import pandas as pd          # pd is the standard alias for pandas
import numpy as np           # np is the standard alias for numpy
import matplotlib.pyplot as plt  # plt is the standard alias for pyplot
import seaborn as sns        # sns is the standard alias for seaborn
import plotly.express as px  # px is the standard alias for plotly express
from scipy import stats      # import the stats module from scipy
import statsmodels.api as sm # sm is the standard alias for statsmodels
import warnings              # module to handle warnings
warnings.filterwarnings("ignore")  # suppress warnings for clean output

### <center>🛠️ CONFIGURATION SETTINGS 🛠️ </center>

In [None]:
# ==== VISUAL STYLE CONFIGURATION ====
# Set the visual style for all plots
sns.set_theme(style="darkgrid")  # Set a dark grid theme
plt.style.use('dark_background')  # Apply dark theme to matplotlib

# Detailed visualization parameter configuration
plt.rcParams.update({
    'figure.figsize': (12, 4),     # Default figure size: 12 units wide x 4 high
    'font.size': 12,               # Default font size
    'axes.grid': True,             # Enable grid on axes
    'grid.alpha': 0.3,             # Grid transparency (0=transparent, 1=opaque)
    'axes.facecolor': '#2A2A2A',   # Plot background color (dark gray)
    'figure.facecolor': '#1A1A1A'  # Figure background color (darker gray)
})

# Set color palette
sns.set_palette("husl")  # HUSL palette: evenly spaced colors in color space

#### 📊 Load Data

In [None]:
# Read the CSV file with pandas and store it in a DataFrame called 'df'
df = pd.read_csv('student_habits_performance.csv')

In [None]:
# Show the first 2 rows of the DataFrame df to see the structure and first records
df.head(2)

In [None]:
# Show general information about the DataFrame df, including number of rows and columns, data types, and non-null values
df.info()

### 🔧 Utility Functions

In [None]:
def unique_values(df, cols):
    """
    Function to display unique values in categorical columns

    Parameters:
    df (DataFrame): The DataFrame to analyze
    cols (list): List of columns to examine

    Returns:
    None - Prints the results to the screen
    """
    for col in cols:
        if df[col].dtype == 'object':  # Check if the column is categorical
            print(f"Unique values in {col}:")
            print(df[col].unique())
            print("\n")
        else:
            print(f"{col} is not an object type column.\n")
# We do not use the first column because it is the student id and not relevant for analysis
unique_values(df, df.select_dtypes(include=['object']).columns[1:])

In [None]:
# Show the shape of the DataFrame df, indicating the number of rows and columns
df.shape

In [None]:
# Percentage of missing values in each column
df.isnull().sum() / len(df) * 100

In [None]:
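# Replace NaN values with "No data" in parental_education_level
# Assign the result back: calling fillna(inplace=True) on a column selection
# triggers a warning in recent pandas versions and may not modify df
df['parental_education_level'] = df['parental_education_level'].fillna('No data')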

In [None]:
# Check for duplicates in the DataFrame df
# This helps identify if there are repeated rows that could affect the analysis
duplicates = df.duplicated().sum()
print(f"Number of duplicated rows: {duplicates}")

In [None]:
# If there are duplicates, remove them
df = df.drop_duplicates()
# Re-check the unique values of the categorical columns after cleaning
unique_values(df, df.select_dtypes(include=['object']).columns[1:])

### 🎯 Variable Distribution

In [None]:
def plot_categorical_distribution(df, col):
    """
    Function to visualize the distribution of categorical variables

    Parameters:
    df (DataFrame): The dataset to analyze
    col (str): Name of the categorical column

    The function creates a bar plot showing:
    - Frequency of each category
    - Uses the 'viridis' color palette
    - Rotates labels 45 degrees for better visualization
    """
    plt.figure(figsize=(10, 5))           # Create new figure of 10x5 units
    sns.countplot(data=df, x=col, palette='viridis')  # Create bar plot
    plt.title(f'Distribution of {col}')   # Dynamic title based on column
    plt.xticks(rotation=45)               # Rotate x-axis labels
    plt.show()                            # Show the plot
    
def plot_numerical_distribution(df, col):
    """
    Function to visualize the distribution of numerical variables

    Parameters:
    df (DataFrame): The dataset to analyze
    col (str): Name of the numerical column

    The function creates a histogram showing:
    - Frequency distribution
    - Density curve (KDE)
    - 30 bins for better detail
    """
    plt.figure(figsize=(10, 5))
    # Create histogram with density curve
    sns.histplot(data=df, x=col, kde=True, color='blue', bins=30)
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Frequency')
    plt.show()

# Separate columns by data type
# Identify categorical columns (object type), skipping the first one (the student id):
# it has a distinct value per student, so its bar plot would be unreadable
categorical_cols = df.select_dtypes(include=['object']).columns[1:]
# Identify numerical columns (numeric type)
numerical_cols = df.select_dtypes(include=[np.number]).columns

# Iterate over each type of column and create visualizations
for col in categorical_cols:
    plot_categorical_distribution(df, col)
for col in numerical_cols:
    plot_numerical_distribution(df, col)

## <center>📊 Exploratory Data Analysis (EDA) 📈</center>

In [None]:
# Create a figure with custom size (10 units wide x 5 high)
# The dark background style is already active from the configuration cell
plt.figure(figsize=(10, 5))

# Create a line plot using seaborn
# data=df: specifies the DataFrame to use
# x='study_hours_per_day': variable for X axis (study hours)
# y='exam_score': variable for Y axis (grades)
# marker='o': adds circular markers at each data point
# color='orange': color of the line and markers
# When several students share the same X value, seaborn plots their mean score
# and shades a confidence interval around it
sns.lineplot(data=df, x='study_hours_per_day', y='exam_score', marker='o', color='orange')

# Add title to the plot
plt.title('Relationship between Study Time and Academic Performance')

# Label X axis
plt.xlabel('Study Time (hours)')

# Label Y axis
plt.ylabel('Academic Performance (Exam Score)')

# Add gray grid with dashed lines
plt.grid(color='gray', linestyle='--', linewidth=0.5)

# Show the plot
plt.show()

# Calculate the correlation coefficient between study hours and grades
# .corr() calculates Pearson correlation between two variables
correlation = df['study_hours_per_day'].corr(df['exam_score'])

# Print the correlation coefficient formatted to 2 decimals
print(f"Correlation between study time and academic performance: {correlation:.2f}")

#### 📊 Analysis of the Relationship between Study Time and Academic Performance 📈

1. 📈 Strong Positive Correlation (r = 0.83)
    - Statistically significant relationship
    - Indicates a strong association between variables
    - r² ≈ 0.69, so about 69% of the variance in grades is associated with study time

2. 📈 Linear Trend
    - Consistent positive slope
    - More study time corresponds to better grades
    - Pattern holds across most of the data range

3. 📊 Data Distribution
    - Higher concentration in the mid-range of study hours
    - Less dispersion at high grades
    - Some outliers at both extremes

4. 💡 Practical Implications
    - Investing time in study yields tangible results
    - On average, each additional hour of study is associated with better performance
    - The relationship shows diminishing returns at the upper extreme

5. ⚠️ Considerations
    - Other factors may influence performance
    - Study quality is as important as quantity
    - A balanced and sustainable approach is recommended
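
The share of variance mentioned in point 1 is the square of the Pearson coefficient, and `.corr()` on its own does not report a p-value. The following is a minimal sketch of both steps using `scipy.stats.pearsonr`. It runs on synthetic data: the `demo` frame and the numbers used to generate it are assumptions for illustration, not the notebook's dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for the real dataset (illustrative values only)
rng = np.random.default_rng(42)
study_hours = rng.uniform(0, 8, size=500)
exam_score = 40 + 6 * study_hours + rng.normal(0, 8, size=500)
demo = pd.DataFrame({'study_hours_per_day': study_hours, 'exam_score': exam_score})

# Pearson correlation and the two-sided p-value for the null hypothesis r = 0
r, p_value = pearsonr(demo['study_hours_per_day'], demo['exam_score'])

# In a simple linear fit, the share of variance associated with the predictor is r squared
r_squared = r ** 2

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, p-value = {p_value:.3g}")
```

On the real data, the same call would take `df['study_hours_per_day']` and `df['exam_score']` in place of the synthetic columns.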

In [None]:
# Relationship between study hours and grades
plt.figure(figsize=(12, 8)) # set figure size
sns.scatterplot(x='study_hours_per_day', y='exam_score', data=df, alpha=0.6) # alpha is point transparency
plt.xlabel('Study Hours per Day') 
plt.ylabel('Exam Score')


# regression line
sns.regplot(x='study_hours_per_day', y='exam_score', data=df, scatter=False, color='red') # scatter=False to not redraw points

# correlation
corr = df['study_hours_per_day'].corr(df['exam_score']) # calculate correlation between the two variables
plt.annotate(f'Correlation: {corr:.2f}', xy=(0.05, 0.95), xycoords='axes fraction',  
             bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="b", lw=1, alpha=0.8)) # annotate correlation in the figure with a box 
plt.show() 
print("\nScatter Plot: Study Hours vs. Grades\n")

#### 📊 Detailed Analysis of the Relationship between Study Time and Academic Performance 📈

1. 📈 Correlation and Statistical Significance
    - Very strong positive correlation (r = 0.83)
    - The coefficient of determination (R² ≈ 0.69) indicates that about 69% of the variability in grades is explained by study hours
    - The relationship is statistically significant and consistent across the data range

2. 📊 Patterns and Trends
    - Clearly linear and upward relationship
    - Greater dispersion in the mid-range of study hours (2-4 hours)
    - Less variability at the distribution extremes
    - Grade increases are more pronounced in the first hours of study

3. 🔍 Segment Analysis
    - Students with 0-2 hours: low performance and high variability
    - Students with 3-5 hours: significant improvement and less dispersion
    - Students with >5 hours: optimal performance but diminishing returns

4. 💡 Practical Implications
    - The first hours of study are the most impactful
    - There is a saturation point where more hours do not guarantee better results
    - An optimal range of 4-6 hours daily is suggested
    - Importance of study quality as well as quantity

5. ⚠️ Important Considerations
    - The relationship may be moderated by other factors:
      * Study method quality
      * Concentration level
      * Mental and physical state
      * Study environment
    - Outliers suggest the influence of unmeasured variables

6. 📝 Data-Based Recommendations
    - Establish consistent study routines
    - Prioritize study quality
    - Monitor performance to identify individual optimal point
    - Consider complementary factors to maximize achievement
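
The segment comparison in point 3 can be checked numerically by binning study hours and summarizing the scores in each bin. This is a rough sketch on synthetic data. The `demo` frame is hypothetical, and the bin edges are an assumption: they are made contiguous (0-2, 2-5, >5) so that no student falls between segments.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (illustrative values only)
rng = np.random.default_rng(0)
hours = rng.uniform(0, 8, size=600)
score = np.clip(35 + 7 * hours + rng.normal(0, 10, size=600), 0, 100)
demo = pd.DataFrame({'study_hours_per_day': hours, 'exam_score': score})

# Hypothetical segment edges; adjust them to the ranges you want to compare
bins = [0, 2, 5, np.inf]
labels = ['0-2 h', '2-5 h', '>5 h']
demo['segment'] = pd.cut(demo['study_hours_per_day'], bins=bins,
                         labels=labels, include_lowest=True)

# The mean shows the performance level, the std shows the dispersion within each segment
summary = demo.groupby('segment', observed=True)['exam_score'].agg(['count', 'mean', 'std'])
print(summary.round(2))
```

Comparing the `std` column across segments is one way to verify claims about higher or lower variability instead of reading them off the scatter plot.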

### 📱 Social Media Hours vs. Grades

In [None]:
# Create an interactive scatter plot using plotly express (px)
# x='social_media_hours': variable for X axis (hours on social media)
# y='exam_score': variable for Y axis (grades)
# title: descriptive plot title
# labels: dictionary to customize axis labels
# trendline='ols': add linear regression trendline
fig = px.scatter(
    df,
    x='social_media_hours',
    y='exam_score',
    title='Relationship between Social Media Time and Academic Performance',
    labels={
        'social_media_hours': 'Social Media Hours per Day',
        'exam_score': 'Exam Score'
    },
    trendline='ols'
)

# Customize point markers:
# size=10: point size
# opacity=0.8: point transparency (0=transparent, 1=opaque)
# line: point border (width=2, color=dark gray)
fig.update_traces(marker=dict(size=10, opacity=0.8, line=dict(width=2, color='DarkSlateGrey')))

# Center the plot title horizontally 
# title_x=0.5 puts the title in the center (0=left, 1=right)
fig.update_layout(title_x=0.5)

# Apply a dark visual theme for better contrast
fig.update_layout(template='plotly_dark')

# Change trendline color to red for emphasis
fig.update_traces(line=dict(color='red'))

# Show the interactive plot
# In Jupyter Notebook, this creates an interactive visualization
fig.show()

#### 📱 Analysis of the Relationship between Social Media Time and Academic Performance 📊

📈 The interactive plot shows the relationship between time spent on social media and grades.

📉 The trendline (in red) indicates a negative relationship between the two variables:

- 📱 As social media time increases, grades tend to decrease
- 🎯 This suggests that social media time may negatively affect academic performance

🔍 However, it is important to consider other factors that may influence this relationship:

- ⏰ Quality of study time
- 🎯 Student motivation
- 📱 Quality of social media used
- 🤳 Nature of activities performed on social media

📊 In-Depth Analysis:

1. ⏰ Quality of study time:
   - Not just about the number of hours
   - How those hours are used is important
   - Study techniques employed can make a difference

2. 🎯 Student motivation:
   - A motivated student can better balance their time
   - Self-discipline plays an important role
   - Self-regulation ability influences social media use

3. 📱 Quality of social media:
   - Some platforms may be more distracting than others
   - Academic use of social media can be beneficial
   - The purpose of use determines its impact

4. 🤳 Nature of activities:
   - Interaction with educational content can be positive
   - Time spent on entertainment may be counterproductive
   - The balance between both types of use is fundamental

💡 This analysis suggests the need to develop strategies for more conscious and balanced use of social media in the academic context.
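
The red trendline is an ordinary least squares fit, and its slope tells how many points the score changes per extra hour. The sketch below shows how to read that slope numerically. It uses `scipy.stats.linregress` as a stand-in for the statsmodels fit that Plotly runs internally, and the data is synthetic and purely illustrative.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic stand-in for the real columns (illustrative values only)
rng = np.random.default_rng(1)
social_media_hours = rng.uniform(0, 6, size=400)
exam_score = 75 - 2.0 * social_media_hours + rng.normal(0, 12, size=400)

# Ordinary least squares fit of exam_score on social_media_hours
fit = linregress(social_media_hours, exam_score)

print(f"slope     = {fit.slope:.2f} points per extra hour")
print(f"intercept = {fit.intercept:.2f}")
print(f"r         = {fit.rvalue:.2f}, p-value = {fit.pvalue:.3g}")
```

A negative slope with a small `r` means the trend exists on average but predicts little about any individual student.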

### 🎬 Netflix Hours vs. Grades 📊

In [None]:
# Create a linear regression plot using seaborn to analyze the relationship between Netflix hours and grades
sns.lmplot(data=df, x='netflix_hours', y='exam_score', aspect=1.5, height=6, 
           scatter_kws={'alpha':0.6},  # Transparency for points in the plot
           line_kws={'color':'red'})  # Regression line color

# Add a title to the plot to describe the analysis
plt.title('Relationship between Netflix Time and Academic Performance')

# Label the X axis to indicate it represents Netflix hours per day
plt.xlabel('Netflix Hours per Day')

# Label the Y axis to indicate it represents exam grades
plt.ylabel('Exam Score')

# Add a grid to the plot for easier data reading
plt.grid(color='gray', linestyle='--', linewidth=0.5)

# Show the generated plot
plt.show()

# Calculate the correlation between Netflix hours and exam grades
corr_netflix = df['netflix_hours'].corr(df['exam_score'])

# Print the correlation value with two decimals, explaining the relationship
print(f"Correlation between Netflix time and academic performance: {corr_netflix:.2f}\n")

### 🎬 Detailed Analysis: Impact of Netflix on Academic Performance

1. 📊 Correlation and Significance
- Weak negative correlation of -0.17
- Slight inverse relationship between Netflix consumption and grades
- About 2.89% of grade variance is associated with Netflix time (R² = 0.17² = 0.0289)

2. 📈 Observed Patterns
- Gentle downward trend in the regression line
- Greater data dispersion in mid-ranges (1-3 hours)
- Low performance clusters at high consumption (>3 hours)
- Outliers mainly at consumption extremes

3. 🔍 Segmentation by Consumption Levels
a) Low consumption (<1 hour/day):
    - Higher concentration of high grades
    - Less variability in performance
    - More predictable pattern

b) Moderate consumption (1-3 hours/day):
    - High grade dispersion
    - Area of greatest variability
    - Possible influence of other factors

c) High consumption (>3 hours/day):
    - Clear trend toward lower grades
    - Less dispersion in low performance
    - Consistent pattern of negative impact

4. 💡 Implications
- Higher consumption is associated with slightly lower performance
- The plot hints at a threshold near 2 hours daily
- The relationship is weak and far from deterministic
- Other factors moderate the impact

5. ⚠️ Additional Considerations
- Quality of content consumed
- Viewing schedules
- Binge-watching patterns
- Interaction with study habits
- Balance with other activities

6. 📝 Recommendations
- Limit consumption to <2 hours daily
- Set specific schedules
- Avoid binge-watching during academic periods
- Implement self-regulation strategies
- Monitor individual impact

### 😴 Sleep Hours vs. Grades 📊

In [None]:
# Set figure size for the plot
plt.figure(figsize=(10, 5))

# Create a line plot to analyze the relationship between sleep hours and grades
# data=df: specifies the DataFrame to use
# x='sleep_hours': variable for X axis (sleep hours)
# y='exam_score': variable for Y axis (grades)
# marker='o': adds circular markers at each data point
# color='yellow': sets the color of the line and points
sns.lineplot(data=df, x='sleep_hours', y='exam_score', marker='o', color='yellow')

# Add a title to the plot to describe the analysis
plt.title('Relationship between Sleep Hours and Academic Performance')

# Label the X axis to indicate it represents sleep hours per day
plt.xlabel('Sleep Hours per Day')

# Label the Y axis to indicate it represents exam grades
plt.ylabel('Exam Score')

# Apply a dark background style to the plot
plt.style.use('dark_background')

# Add a grid to the plot for easier data reading
# color='gray': sets the grid line color
# linestyle='--': uses dashed lines for the grid
# linewidth=0.5: sets the grid line thickness
plt.grid(color='gray', linestyle='--', linewidth=0.5)

# Show the generated plot
plt.show()

# Calculate the correlation between sleep hours and exam grades
# .corr() calculates Pearson correlation between two variables
correlation_sleep = df['sleep_hours'].corr(df['exam_score'])

# Print the correlation value with two decimals, explaining the relationship
print(f"Correlation between sleep hours and academic performance: {correlation_sleep:.2f}\n")

### 😴 Analysis: Impact of Sleep on Academic Performance 📊

1. 📊 Correlation Analysis:
  - Weak negative correlation (-0.12) 
  - Only 1.4% of variance explained by sleep hours
  - Complex non-linear relationship

2. 📈 Observed Patterns:
  - Higher concentration in 6-8 hours
  - High grade dispersion
  - Downward trend after 8 hours

3. 🔍 Segment Analysis:
  a) ⚠️ Insufficient Sleep (<6h):
    - High variability
    - Outliers with high performance
    - Possible sleep sacrifice for study
  
  b) ✅ Optimal Sleep (6-8h):
    - Greater consistency
    - Concentration of high performance
    - Balance between rest and productivity
  
  c) 😴 Excessive Sleep (>8h):
    - Tendency toward lower performance
    - Greater dispersion
    - Possible procrastination

4. 🎯 Key Factors:
  - Quality vs quantity
  - Sleep patterns
  - Time management
  - Stress levels
  - Study habits

5. 💡 Recommendations:
  - Maintain consistent routine
  - Optimize environment
  - Balance activities
  - Monitor quality
  - Manage time effectively

6. ⚠️ Considerations:
  - Unmeasured variables
  - Methodological limitations
  - Contextual factors
  - Need for further studies
  - Validity of findings
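
A Pearson coefficient near zero only rules out a linear trend, so a non-linear pattern such as an optimum around 6-8 hours needs a different check. One option is to compare the R² of a straight-line fit with that of a quadratic fit. This sketch uses synthetic data with an inverted-U shape built in on purpose; the generated values and the `r_squared` helper are assumptions for illustration.

```python
import numpy as np

# Synthetic inverted-U pattern peaking around 7 hours (an assumption for the demo)
rng = np.random.default_rng(3)
sleep_hours = rng.uniform(3, 10, size=500)
exam_score = 75 - 2.0 * (sleep_hours - 7) ** 2 + rng.normal(0, 8, size=500)

def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return 1 - residuals.var() / y.var()

pearson_r = np.corrcoef(sleep_hours, exam_score)[0, 1]
r2_linear = r_squared(sleep_hours, exam_score, 1)
r2_quadratic = r_squared(sleep_hours, exam_score, 2)

print(f"Pearson r      = {pearson_r:.2f}")
print(f"R^2, linear    = {r2_linear:.3f}")
print(f"R^2, quadratic = {r2_quadratic:.3f}")
```

If the quadratic R² on the real data is barely higher than the linear one, the curve adds little and the relationship is better described as weak than as non-linear.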

### üë®‚Äçüë©‚Äçüëß‚Äçüë¶ Academic Performance by Gender üë©‚Äçüéìüë®‚Äçüéì

In [None]:
# Create a violin plot using plotly express, showing the density distribution of the data
# Violin plots are similar to boxplots but show the full data distribution
fig = px.violin(
	df,  # DataFrame with student data
	x='gender',  # Categorical variable for X axis - student gender  
	y='exam_score',  # Numerical variable for Y axis - grades obtained
	# Color by gender for better visual distinction
	color='gender',  
	# Box plot inside the violin for summary statistics
	box=True,  
	# Overlay individual points to see the actual distribution
	points='all',  
	# Descriptive plot title
	title='Distribution of Academic Performance by Gender',
	# Customize labels for better understanding
	labels={
		'gender': 'Gender',
		'exam_score': 'Exam Score'
	},
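	# No animation: all data is shown in a single static frame
	animation_frame=None,  
	# Dark theme for better visualization
	template='plotly_dark',
	# Custom colors, assigned to the gender categories in their order of appearance
	color_discrete_sequence=['#FF69B4', '#4169E1']  # Pink, then blue; the sequence repeats if there are more categories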
)

# Additional plot layout customization
fig.update_layout(
	# Center the title
	title_x=0.5,
	# Set plot size
	width=800,
	height=500,
	# Customize legend
	showlegend=True,
	# Add margin for better visualization
	margin=dict(t=50, b=50, l=50, r=50)
)

# Customize the violins
fig.update_traces(
	# Adjust opacity for better visualization
	opacity=0.7,
	# Customize individual points
	jitter=0.05,  # Point dispersion
	marker_size=3  # Point size
)

# Show the interactive plot in Jupyter Notebook
fig.show()

# Print explanatory message for the plot
print("""
This violin plot shows:
- The full distribution of grades by gender
- Internal box plots with summary statistics (median, quartiles)
- Individual points to see the actual data distribution
- Differences in academic performance between genders
""")

### 📊 Comparative Analysis by Gender

1. 📈 Central Tendencies:
   - 👩 Women show a higher median in grades
   - 📊 The difference suggests systematically higher academic performance in women

2. 📉 Data Dispersion:
   - 👨 Greater variability in men's grades
   - 👩 Women show more consistent and clustered results
   - ⚠️ Outliers are more frequent in the male group

3. 📊 Statistical Interpretation:
   - 📈 The distribution suggests different patterns of study and performance
   - ⚠️ The observed differences have not been formally tested here, so their statistical significance remains to be confirmed
   - 📚 Results are consistent with previous educational studies

4. ⚖️ Important Considerations:
   - 🎯 Results do not imply inherent superiority of one gender
   - 🌍 Sociocultural factors may influence these differences
   - 🔍 Relevant variables include:
     * 📚 Study methods
     * 🎯 Motivational factors
     * 🤝 Social pressure and expectations

5. 💡 Practical Implications:
   - 🎯 Need for gender-specific strategies
   - 📊 Importance of addressing performance gaps
   - 🛠️ Opportunity to develop personalized educational interventions

6. ✅ Recommendations:
   - 🎓 Implement specific support programs
   - 🔍 Investigate underlying factors of differences
   - 📈 Develop strategies to reduce the performance gap
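
A violin plot shows distributions but does not test whether they differ. A rank-based test such as Kruskal-Wallis is one option, since it works for two or more groups and does not assume normal scores. The sketch below runs it on synthetic data. The `demo` frame and the category labels `'Female'` and `'Male'` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

# Synthetic stand-in for the real dataset (illustrative values and labels only)
rng = np.random.default_rng(11)
demo = pd.DataFrame({
    'gender': rng.choice(['Female', 'Male'], size=400),
    'exam_score': rng.normal(70, 15, size=400),
})

# Build one array of scores per gender category
groups = [scores.to_numpy() for _, scores in demo.groupby('gender')['exam_score']]

# Null hypothesis: all groups come from the same distribution
h_stat, p_value = kruskal(*groups)

print(demo.groupby('gender')['exam_score'].median().round(2))
print(f"H = {h_stat:.2f}, p-value = {p_value:.3g}")
```

Because the groups are built with `groupby`, the same code handles a dataset with more than two gender categories without changes.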

### 💼 Impact of Part-Time Work on Academic Performance 📊

In [None]:
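# Create an animated boxplot using plotly express
# px.box() visualizes the distribution of a numerical variable (exam_score)
# grouped by a categorical variable (part_time_job) and animated by study-hour range

# Bin study hours into ranges: animating on the raw continuous column would create
# one frame per distinct value, each with too few students for a meaningful boxplot
# (the bin edges below are an illustrative choice)
range_labels = ['0-2 h', '2-4 h', '4-6 h', '6+ h']
df_anim = df.assign(
    study_hours_range=pd.cut(
        df['study_hours_per_day'],
        bins=[0, 2, 4, 6, np.inf],
        labels=range_labels,
        include_lowest=True
    ).astype(str)
)

fig = px.box(
    df_anim,  # Copy of the data with the binned study hours
    x='part_time_job',  # Categorical variable for X axis (part-time job)
    y='exam_score',  # Numerical variable for Y axis (grades)
    animation_frame='study_hours_range',  # One frame per range of study hours
    category_orders={'study_hours_range': range_labels},  # Play frames from low to high
    range_y=[df['exam_score'].min() - 5, df['exam_score'].max() + 5],  # Keep the Y axis fixed across frames
    title='Comparison of Academic Performance by Part-Time Work and Study Hours',
    labels={
        'part_time_job': 'Part-Time Job',
        'exam_score': 'Exam Score',
        'study_hours_range': 'Study Hours per Day'
    },
    template='plotly_dark'
)

# Center the plot title horizontally
# title_x=0.5 puts the title in the center (0=left, 1=right)
fig.update_layout(title_x=0.5)

# Show the interactive animated plot
# In Jupyter Notebook, this generates an interactive visualization
fig.show()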

# Print an explanatory message about the plot's purpose
print("""
This animated plot shows:
- The comparison of academic performance by part-time work.
- How this relationship changes depending on study hours per day.
- Each animation frame represents a range of study hours.
""")

### 💼 Analysis: Impact of Part-Time Work on Academic Performance 📊

1. 📊 Central Tendencies and Dispersion
    - 📈 No job: higher median (~70 points)
    - 📉 With job: greater variability
    - ⚠️ Outliers in both groups

2. 📈 Statistical Differences
    - 📊 10-15 point gap between groups
    - 📉 Higher frequency of low grades among working students
    - 🎯 High performance more common among non-working students

3. 🔍 Segment Analysis
    a) 📚 No job:
        - ✅ Symmetrical distribution
        - 🎯 Higher percentage >80 points
        - 📊 Less dispersion

    b) 💼 With job:
        - 📉 Skew toward low grades
        - 📊 Concentration in 40-60 points
        - ⭐ Notable exceptional cases

4. ⚖️ Moderating Factors
    - ⏰ Work hours
    - 📅 Schedule flexibility
    - 💼 Type of job
    - ⚡ Time management
    - 🤝 Academic support
    - 💪 Personal motivation

5. 📋 Educational Implications
    - 🎯 Specific support programs
    - ⏰ Flexible schedules
    - 📚 Additional resources
    - ⚡ Skill development
    - 💻 Hybrid modalities

6. ⚠️ Limitations
    - ⏰ Work intensity not measured
    - 💰 Socioeconomic factors
    - 💼 Job variety
    - 😓 Stress and fatigue
    - 🤝 Social impact

7. 💡 Recommendations
    - 🎯 Specialized mentoring
    - 📚 Flexible resources
    - 🤝 Support networks
    - ⚡ Self-discipline
    - ⚖️ Work-study balance
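
The gap between working and non-working students is easier to judge when study time is held roughly constant. A table of median scores by study-hour range and job status gives that view in one place. This sketch builds such a table on synthetic data; the `demo` frame, the `'No'`/`'Yes'` labels, and the bin edges are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (illustrative values and labels only)
rng = np.random.default_rng(5)
n = 600
demo = pd.DataFrame({
    'part_time_job': rng.choice(['No', 'Yes'], size=n, p=[0.75, 0.25]),
    'study_hours_per_day': rng.uniform(0, 8, size=n),
})
demo['exam_score'] = np.clip(
    40 + 6 * demo['study_hours_per_day'] + rng.normal(0, 10, size=n), 0, 100
)

# Hypothetical study-hour ranges
demo['study_range'] = pd.cut(
    demo['study_hours_per_day'],
    bins=[0, 2, 4, 6, np.inf],
    labels=['0-2 h', '2-4 h', '4-6 h', '6+ h'],
    include_lowest=True,
)

# Median exam score for every combination of study range and job status
medians = demo.pivot_table(
    index='study_range', columns='part_time_job',
    values='exam_score', aggfunc='median', observed=True,
)
print(medians.round(1))
```

Reading across a row compares students with similar study time, which separates the effect of having a job from the effect of studying less.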

### 🍎 Impact of Diet Quality on Academic Performance 📊

In [None]:
# Create figure with custom size and dark style
plt.figure(figsize=(12, 6))
plt.style.use('dark_background')

# Create boxplot with enhanced colors
sns.boxplot(
    data=df, 
    x='diet_quality', 
    y='exam_score', 
    palette='husl',
    saturation=0.7,
    width=0.5  # Adjust box width
)

# Overlay swarmplot for better visibility
sns.swarmplot(
    data=df,
    x='diet_quality',
    y='exam_score',
    color='white',
    alpha=0.5,
    size=3  # Reduce point size for less overlap
)

# Customize the plot for better contrast
plt.title('Grade Distribution by Diet Quality', 
          pad=20, 
          fontsize=14,
          color='white')
plt.xlabel('Diet Quality', fontsize=12, color='white')
plt.ylabel('Exam Score', fontsize=12, color='white')

# Improve grid visibility
plt.grid(True, alpha=0.2, color='gray', linestyle='--')

# Adjust Y axis limits for better visualization
plt.ylim(df['exam_score'].min() - 5, df['exam_score'].max() + 5)

# Adjust margins and show the plot
plt.tight_layout()
plt.show()

# Calculate and show more detailed descriptive statistics
# (named diet_stats so it does not shadow the scipy 'stats' module imported earlier)
diet_stats = df.groupby('diet_quality')['exam_score'].describe()
print("\nStatistics by diet quality:")
print(diet_stats.round(2))