# 📊 Exploratory Data Analysis (EDA)

## Objective:
#### "To analyze how student demographics (nationality, age, qualification level), academic performance (GPA, attendance), and learning behaviors (self-study hours, prior knowledge) influence course completion and student success, in order to identify at-risk student profiles and recommend targeted academic support interventions."

---

## Purpose of This Notebook:
- Explore the data from **every angle**
- Generate **20+ exploratory charts**
- Identify the **most compelling insights**
- Select the **BEST 4 charts** for final dashboard

**Strategy:** Create many charts quickly → Pick winners → Polish them for submission

---
## 📦 SETUP: Import Libraries & Load Data

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Professional dark theme colors (consistent with your reference dashboards)
COLORS = {
    'bg': '#1e2130',
    'text': '#e4e6eb',
    'teal': '#00d4aa',
    'coral': '#ff6b6b',
    'yellow': '#ffd93d',
    'grid': 'rgba(255, 255, 255, 0.1)'
}

# Color palettes
COLORS_NATIONALITY = {
    'SG Citizen': '#00d4aa',
    'SG PR': '#ffd93d',
    'Foreigner': '#ff6b6b'
}

COLORS_PERFORMANCE = ['#ff6b6b', '#ffd93d', '#00d4aa']  # Red, Yellow, Green

print("✅ Libraries imported successfully")
print("🎨 Color scheme loaded")

✅ Libraries imported successfully
🎨 Color scheme loaded


In [2]:
# Load all cleaned datasets
df_master = pd.read_csv('../cleaned_data/master_dataset.csv')
df_profiles = pd.read_csv('../cleaned_data/student_profiles_clean.csv')
df_results = pd.read_csv('../cleaned_data/student_results_clean.csv')
df_survey = pd.read_csv('../cleaned_data/student_survey_clean.csv')

print("✅ Data loaded successfully\n")
print(f"Master Dataset: {df_master.shape[0]:,} rows × {df_master.shape[1]} columns")
print(f"Profiles: {df_profiles.shape[0]:,} rows")
print(f"Results: {df_results.shape[0]:,} records")
print(f"Survey: {df_survey.shape[0]:,} responses")

# Quick data overview
print("\n📊 Master Dataset Columns:")
print(df_master.columns.tolist())

✅ Data loaded successfully

Master Dataset: 520 rows × 28 columns
Profiles: 307 rows
Results: 522 records
Survey: 511 responses

📊 Master Dataset Columns:
['STUDENT ID', 'GENDER', 'SG CITIZEN', 'SG PR', 'FOREIGNER', 'COUNTRY OF OTHER NATIONALITY', 'DOB', 'HIGHEST QUALIFICATION', 'NAME OF QUALIFICATION AND INSTITUTION', 'DATE ATTAINED HIGHEST QUALIFICATION', 'DESIGNATION', 'COMMENCEMENT DATE', 'COMPLETION DATE', 'FULL-TIME OR PART-TIME', 'COURSE FUNDING', 'CLASS', 'NATIONALITY_STATUS', 'AGE', 'COURSE_DURATION_DAYS', 'PERIOD', 'GPA', 'ATTENDANCE', 'PRIOR KNOWLEDGE', 'COURSE RELEVANCE', 'TEACHING SUPPORT', 'COMPANY SUPPORT', 'FAMILY SUPPORT', 'SELF-STUDY HRS']


In [3]:
# Quick statistics for context
print("="*80)
print("📈 KEY STATISTICS")
print("="*80)

# Drop rows missing GPA or attendance; each remaining row is one result record, and a student can have several
df_stats = df_master.dropna(subset=['GPA', 'ATTENDANCE'])

print(f"\n👥 Total Students: {df_profiles['STUDENT ID'].nunique()}")
print(f"📚 Records with GPA & Attendance: {len(df_stats)}")
print(f"\n📊 Academic Performance:")
print(f"   Average GPA: {df_stats['GPA'].mean():.2f} (range: {df_stats['GPA'].min():.1f} - {df_stats['GPA'].max():.1f})")
print(f"   Average Attendance: {df_stats['ATTENDANCE'].mean():.1f}% (range: {df_stats['ATTENDANCE'].min():.0f}% - {df_stats['ATTENDANCE'].max():.0f}%)")

# At-risk indicators (counted per record, so one student can contribute more than once)
at_risk_gpa = (df_stats['GPA'] < 2.5).sum()
at_risk_attendance = (df_stats['ATTENDANCE'] < 75).sum()
print(f"\n⚠️  At-Risk Indicators:")
print(f"   Records with GPA < 2.5: {at_risk_gpa} ({at_risk_gpa/len(df_stats)*100:.1f}%)")
print(f"   Records with Attendance < 75%: {at_risk_attendance} ({at_risk_attendance/len(df_stats)*100:.1f}%)")

# NOTE: df_profiles has 307 rows but only 295 unique STUDENT IDs, so these counts are per row
print(f"\n🌍 Demographics (profile rows):")
print(df_profiles['NATIONALITY_STATUS'].value_counts())

# NOTE: 'Part-Time' and 'Part Time' are the same category and still need to be merged during cleaning
print(f"\n💼 Enrollment (profile rows):")
print(df_profiles['FULL-TIME OR PART-TIME'].value_counts())

📈 KEY STATISTICS

👥 Total Students: 295
📚 Records with GPA & Attendance: 505

📊 Academic Performance:
   Average GPA: 3.12 (range: 1.6 - 4.0)
   Average Attendance: 85.8% (range: 50% - 100%)

⚠️  At-Risk Indicators:
   Records with GPA < 2.5: 90 (17.8%)
   Records with Attendance < 75%: 88 (17.4%)

🌍 Demographics (profile rows):
NATIONALITY_STATUS
SG Citizen    239
Foreigner      36
SG PR          32
Name: count, dtype: int64

💼 Enrollment (profile rows):
FULL-TIME OR PART-TIME
Part-Time    237
Full-Time     41
Part Time     29
Name: count, dtype: int64
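The enrolment counts list 'Part-Time' and 'Part Time' separately. The profile table also has 307 rows for 295 unique student IDs, so the demographic counts are per row rather than per student. Below is a sketch of normalising the labels and deduplicating before counting, run on a small hypothetical table. Keeping the first row per student is an assumption; if duplicate rows disagree, they should be reconciled instead.

```python
import pandas as pd

# Hypothetical toy profiles: S002 appears twice and the enrolment labels are inconsistent
profiles = pd.DataFrame({
    'STUDENT ID': ['S001', 'S002', 'S002', 'S003', 'S004'],
    'NATIONALITY_STATUS': ['SG Citizen', 'Foreigner', 'Foreigner', 'SG PR', 'SG Citizen'],
    'FULL-TIME OR PART-TIME': ['Part-Time', 'Part Time', 'Part Time', 'Full-Time', 'Part-Time'],
})

# Collapse spelling variants such as 'Part Time' or 'part-time' into one canonical label
profiles['FULL-TIME OR PART-TIME'] = (
    profiles['FULL-TIME OR PART-TIME']
    .str.strip()
    .str.replace(r'[\s_-]+', '-', regex=True)
    .str.title()
)

# Keep one row per student so counts are per student, not per profile row
unique_profiles = profiles.drop_duplicates(subset='STUDENT ID')

enrol_counts = unique_profiles['FULL-TIME OR PART-TIME'].value_counts()
nat_counts = unique_profiles['NATIONALITY_STATUS'].value_counts()
print(enrol_counts)
print(nat_counts)
```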


---
# 📊 SECTION 1: DEMOGRAPHICS EXPLORATION
Understanding who our students are

In [4]:
# CHART 1: Age Distribution
fig = px.histogram(
    df_profiles.dropna(subset=['AGE']),
    x='AGE',
    nbins=30,
    title='Student Age Distribution',
    labels={'AGE': 'Age (years)', 'count': 'Number of Students'},
    color_discrete_sequence=['#00d4aa']
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

fig.show()
print("üí° INSIGHT: What's the typical student age? Are there distinct age groups?")

üí° INSIGHT: What's the typical student age? Are there distinct age groups?


In [5]:
# CHART 2: Nationality Breakdown (Pie Chart)
nationality_counts = df_profiles['NATIONALITY_STATUS'].value_counts()

fig = px.pie(
    values=nationality_counts.values,
    names=nationality_counts.index,
    title='Student Nationality Distribution',
    color=nationality_counts.index,
    color_discrete_map=COLORS_NATIONALITY,
    hole=0.4  # Donut chart
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

fig.show()
print("üí° INSIGHT: Dominant nationality group? International student percentage?")

üí° INSIGHT: Dominant nationality group? International student percentage?


In [6]:
# CHART 3: Qualification Levels
qual_counts = df_profiles['HIGHEST QUALIFICATION'].value_counts()

fig = px.bar(
    x=qual_counts.index,
    y=qual_counts.values,
    title='Student Qualification Levels',
    labels={'x': 'Qualification', 'y': 'Number of Students'},
    color=qual_counts.values,
    color_continuous_scale=['#ff6b6b', '#ffd93d', '#00d4aa']
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    xaxis=dict(tickangle=-45),
    height=400,
    showlegend=False,
    coloraxis_showscale=False  # showlegend does not hide the continuous colour bar
)

fig.show()
print("üí° INSIGHT: Most common educational background? How does prior education vary?")

üí° INSIGHT: Most common educational background? How does prior education vary?


In [7]:
# CHART 4: Full-time vs Part-time by Funding
enrollment_funding = df_profiles.groupby(['FULL-TIME OR PART-TIME', 'COURSE FUNDING']).size().reset_index(name='count')

fig = px.bar(
    enrollment_funding,
    x='FULL-TIME OR PART-TIME',
    y='count',
    color='COURSE FUNDING',
    title='Enrollment Type by Funding Source',
    labels={'count': 'Number of Students'},
    barmode='group',
    color_discrete_sequence=['#00d4aa', '#ffd93d', '#ff6b6b', '#a78bfa']
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

fig.show()
print("üí° INSIGHT: Relationship between enrollment type and funding? Most common combinations?")

üí° INSIGHT: Relationship between enrollment type and funding? Most common combinations?


In [8]:
# CHART 5: Age Groups by Nationality
df_age = df_profiles.dropna(subset=['AGE', 'NATIONALITY_STATUS']).copy()
df_age['AGE_GROUP'] = pd.cut(df_age['AGE'], bins=[0, 25, 35, 45, 100], labels=['<25', '25-35', '35-45', '45+'],
                             right=False)  # left-closed bins, so age 25 goes to '25-35', not '<25'

fig = px.histogram(
    df_age,
    x='AGE_GROUP',
    color='NATIONALITY_STATUS',
    title='Age Distribution by Nationality',
    barmode='group',
    color_discrete_map=COLORS_NATIONALITY
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

fig.show()
print("üí° INSIGHT: Do different nationalities have different age profiles?")

üí° INSIGHT: Do different nationalities have different age profiles?
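`pd.cut` closes intervals on the right by default. With `bins=[0, 25, 35, 45, 100]`, a 25-year-old falls in `(0, 25]` and is labelled '<25'. Passing `right=False` makes the intervals left-closed, so the labels read as intended. The ages below are illustrative values, not taken from the data.

```python
import pandas as pd

ages = pd.Series([24, 25, 35, 44, 45, 60])
labels = ['<25', '25-35', '35-45', '45+']
bins = [0, 25, 35, 45, 100]

# Default right=True: intervals (0, 25], (25, 35], ... so age 25 lands in '<25'
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals [0, 25), [25, 35), ... so age 25 lands in '25-35'
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'right=True': right_closed, 'right=False': left_closed}))
```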


---
# 📈 SECTION 2: ACADEMIC PERFORMANCE ANALYSIS
Understanding student success metrics

In [9]:
# CHART 6: GPA Distribution
df_perf = df_master.dropna(subset=['GPA'])

fig = px.histogram(
    df_perf,
    x='GPA',
    nbins=20,
    title='GPA Distribution Across All Records',
    labels={'GPA': 'Grade Point Average'},
    color_discrete_sequence=['#00d4aa']
)

# Add mean line
mean_gpa = df_perf['GPA'].mean()
fig.add_vline(x=mean_gpa, line_dash="dash", line_color="#ffd93d", annotation_text=f"Mean: {mean_gpa:.2f}")

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

fig.show()
print("üí° INSIGHT: Normal distribution? Skewed? Where's the majority?")

üí° INSIGHT: Normal distribution? Skewed? Where's the majority?
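The chart asks whether GPA is normal or skewed. That question can be answered numerically with `Series.skew()` plus a mean-versus-median comparison. The GPA values below are made up to show a left-skewed sample. In the notebook, the same lines would run on `df_perf['GPA']`.

```python
import pandas as pd

# Hypothetical GPA sample with a long tail towards low grades
gpa = pd.Series([4.0, 4.0, 3.8, 3.6, 3.5, 3.0, 2.0])

skewness = gpa.skew()   # bias-adjusted sample skewness; negative means a longer left tail
mean_gpa = gpa.mean()
median_gpa = gpa.median()

print(f"skew = {skewness:.2f}, mean = {mean_gpa:.2f}, median = {median_gpa:.2f}")
```

A mean below the median is consistent with a left skew: a few low grades pull the mean down.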


In [10]:
# CHART 7: GPA vs Attendance Scatter (THE WINNER CANDIDATE!)
df_scatter = df_master.dropna(subset=['GPA', 'ATTENDANCE', 'NATIONALITY_STATUS'])

fig = px.scatter(
    df_scatter,
    x='ATTENDANCE',
    y='GPA',
    color='NATIONALITY_STATUS',
    title='⭐ GPA vs Attendance Correlation (STRONG CANDIDATE!)',
    trendline='ols',  # OLS trendlines require the statsmodels package
    color_discrete_map=COLORS_NATIONALITY,
    opacity=0.7
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=500
)

# Pearson r over all records (students with several periods are counted more than once)
correlation = df_scatter['GPA'].corr(df_scatter['ATTENDANCE'])
strength = 'STRONG' if abs(correlation) > 0.5 else 'MODERATE' if abs(correlation) > 0.3 else 'WEAK'
fig.show()
print(f"💡 INSIGHT: Correlation = {correlation:.3f} - {strength} relationship!")
print("⭐ RECOMMENDATION: This is a WINNER for your dashboard!")

💡 INSIGHT: Correlation = 0.418 - MODERATE relationship!
⭐ RECOMMENDATION: This is a WINNER for your dashboard!
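A two-band rule such as `'STRONG' if abs(r) > 0.5 else 'MODERATE'` labels even r = 0.05 as moderate. Below is a sketch of a three-band label. The 0.3 and 0.5 cut-offs are a common rule of thumb, not a fixed standard, and `describe_correlation` is a hypothetical helper.

```python
def describe_correlation(r: float, weak: float = 0.3, strong: float = 0.5) -> str:
    """Label |r| as WEAK, MODERATE or STRONG using rule-of-thumb cut-offs (assumed, adjustable)."""
    size = abs(r)
    if size > strong:
        return 'STRONG'
    if size > weak:
        return 'MODERATE'
    return 'WEAK'


for r in (0.05, 0.418, 0.722, -0.6):
    print(f"r = {r:+.3f} -> {describe_correlation(r)}")
```

The label ignores sign, so the direction of the relationship still needs to be reported separately.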


In [11]:
# CHART 8: GPA Box Plot by Nationality
fig = px.box(
    df_master.dropna(subset=['GPA', 'NATIONALITY_STATUS']),
    x='NATIONALITY_STATUS',
    y='GPA',
    title='GPA Distribution by Nationality',
    color='NATIONALITY_STATUS',
    color_discrete_map=COLORS_NATIONALITY
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400,
    showlegend=False
)

fig.show()
print("üí° INSIGHT: Performance differences by nationality? Outliers?")

üí° INSIGHT: Performance differences by nationality? Outliers?


In [12]:
# CHART 9: Attendance Box Plot by Period
df_attendance = df_master.dropna(subset=['ATTENDANCE', 'PERIOD'])
df_attendance = df_attendance[df_attendance['PERIOD'].isin(['Sem 1', 'Sem 2', 'Sem 3'])]

fig = px.box(
    df_attendance,
    x='PERIOD',
    y='ATTENDANCE',
    title='Attendance Patterns Across Semesters',
    color='PERIOD',
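    color_discrete_sequence=['#00d4aa', '#ffd93d', '#ff6b6b'],
    category_orders={'PERIOD': ['Sem 1', 'Sem 2', 'Sem 3']}  # fixed semester order instead of order of appearance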
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400,
    showlegend=False
)

fig.show()
print("üí° INSIGHT: Does attendance decline over semesters?")

üí° INSIGHT: Does attendance decline over semesters?
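Whether attendance declines can be put in numbers with a per-semester median and its change from one semester to the next. Converting `PERIOD` to an ordered categorical keeps 'Sem 1' to 'Sem 3' in order regardless of row order. The records below are hypothetical.

```python
import pandas as pd

# Hypothetical attendance records (the real values would come from df_master)
df = pd.DataFrame({
    'PERIOD': ['Sem 2', 'Sem 1', 'Sem 3', 'Sem 1', 'Sem 2', 'Sem 3'],
    'ATTENDANCE': [82, 90, 75, 88, 80, 79],
})
order = ['Sem 1', 'Sem 2', 'Sem 3']
df['PERIOD'] = pd.Categorical(df['PERIOD'], categories=order, ordered=True)

# Median attendance per semester, in semester order, and the change from the previous semester
summary = df.groupby('PERIOD', observed=True)['ATTENDANCE'].median()
change = summary.diff()
print(summary)
print(change)
```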


In [13]:
# CHART 10: Performance Trend Over Semesters (CANDIDATE!)
df_trend = df_master.dropna(subset=['GPA', 'PERIOD', 'NATIONALITY_STATUS'])
df_trend = df_trend[df_trend['PERIOD'].isin(['Sem 1', 'Sem 2', 'Sem 3'])]

trend_data = df_trend.groupby(['PERIOD', 'NATIONALITY_STATUS'])['GPA'].mean().reset_index()

fig = px.line(
    trend_data,
    x='PERIOD',
    y='GPA',
    color='NATIONALITY_STATUS',
    title='⭐ GPA Trends Across Semesters by Nationality (STRONG CANDIDATE!)',
    markers=True,
    color_discrete_map=COLORS_NATIONALITY
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=450
)

fig.show()
print("üí° INSIGHT: Do students improve over time? Different patterns by group?")
print("‚≠ê RECOMMENDATION: Great for showing progression - add dropdown for interactivity!")

üí° INSIGHT: Do students improve over time? Different patterns by group?
‚≠ê RECOMMENDATION: Great for showing progression - add dropdown for interactivity!
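For the smaller nationality groups, a per-semester mean can rest on very few records (the profile table has 32 SG PR and 36 Foreigner rows). The sketch below keeps the count next to each mean and flags thin cells. `MIN_N` is a hypothetical threshold, and the data are made up.

```python
import pandas as pd

# Hypothetical records (the real ones would be df_trend)
df = pd.DataFrame({
    'PERIOD': ['Sem 1', 'Sem 1', 'Sem 2', 'Sem 2', 'Sem 1', 'Sem 2'],
    'NATIONALITY_STATUS': ['SG Citizen', 'SG Citizen', 'SG Citizen', 'SG Citizen', 'SG PR', 'SG PR'],
    'GPA': [3.0, 3.4, 3.2, 3.6, 2.8, 3.1],
})
MIN_N = 2  # hypothetical minimum number of records before a group mean is trusted

# Named aggregation: mean GPA and number of records per (semester, group) cell
trend = (df.groupby(['PERIOD', 'NATIONALITY_STATUS'])['GPA']
           .agg(mean_gpa='mean', n='count')
           .reset_index())
trend['reliable'] = trend['n'] >= MIN_N
print(trend)
```

The line chart could then show `n` on hover via `px.line(..., hover_data=['n'])`, or drop unreliable cells before plotting.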


In [14]:
# CHART 11: Performance Categories
df_cat = df_master.dropna(subset=['GPA']).copy()
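df_cat['PERFORMANCE'] = pd.cut(df_cat['GPA'], 
                                bins=[0, 2.5, 3.5, 5.0],
                                labels=['At Risk', 'Satisfactory', 'Excellent'],
                                right=False)  # [0, 2.5) is At Risk, matching the GPA < 2.5 rule above

# sort_index keeps the At Risk → Satisfactory → Excellent order instead of sorting by frequency
perf_counts = df_cat['PERFORMANCE'].value_counts().sort_index()

fig = px.bar(
    x=perf_counts.index,
    y=perf_counts.values,
    title='Student Performance Categories',
    labels={'x': 'Performance Level', 'y': 'Number of Records'},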
    color=perf_counts.index,
    color_discrete_map={'At Risk': '#ff6b6b', 'Satisfactory': '#ffd93d', 'Excellent': '#00d4aa'}
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400,
    showlegend=False
)

fig.show()
print("üí° INSIGHT: How many students need intervention?")

üí° INSIGHT: How many students need intervention?
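The same GPA bins are written out three times: in Chart 11, in Chart 15 and in the sunburst cell. A single helper keeps them identical and consistent with the `GPA < 2.5` at-risk rule used in the key statistics. This is a sketch; `categorise_gpa`, `PERF_BINS` and `PERF_LABELS` are hypothetical names. Using `np.inf` as the top edge means no high GPA can fall outside the last bin.

```python
import numpy as np
import pandas as pd

PERF_BINS = [0, 2.5, 3.5, np.inf]
PERF_LABELS = ['At Risk', 'Satisfactory', 'Excellent']


def categorise_gpa(gpa: pd.Series) -> pd.Series:
    """Map GPA to [0, 2.5) At Risk, [2.5, 3.5) Satisfactory, [3.5, inf) Excellent.

    Returned as plain strings (a missing GPA becomes the string 'nan'), mirroring the
    sunburst cell's conversion from categorical to string.
    """
    return pd.cut(gpa, bins=PERF_BINS, labels=PERF_LABELS, right=False).astype(str)


gpa = pd.Series([1.6, 2.49, 2.5, 3.49, 3.5, 4.0])
perf = categorise_gpa(gpa)
print(perf.tolist())
```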


---
# 🎯 SECTION 3: LEARNING BEHAVIOR & SURVEY ANALYSIS
Understanding student engagement and perceptions

In [15]:
# CHART 12: Self-Study Hours Distribution
df_study = df_master.dropna(subset=['SELF-STUDY HRS'])

fig = px.histogram(
    df_study,
    x='SELF-STUDY HRS',
    nbins=15,
    title='Self-Study Hours Distribution',
    labels={'SELF-STUDY HRS': 'Hours per Week'},
    color_discrete_sequence=['#00d4aa']
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

fig.show()
print(f"üí° INSIGHT: Average study time = {df_study['SELF-STUDY HRS'].mean():.1f} hours/week")

üí° INSIGHT: Average study time = 13.3 hours/week


In [16]:
# CHART 13: Survey Scores Overview (CANDIDATE!)
survey_cols = ['PRIOR KNOWLEDGE', 'COURSE RELEVANCE', 'TEACHING SUPPORT', 
               'COMPANY SUPPORT', 'FAMILY SUPPORT']

df_survey_scores = df_master[survey_cols].dropna()
avg_scores = df_survey_scores.mean().sort_values(ascending=True)

fig = px.bar(
    x=avg_scores.values,
    y=avg_scores.index,
    orientation='h',
    title='⭐ Average Survey Scores (1-5 Scale) - GOOD CANDIDATE!',
    labels={'x': 'Average Score', 'y': 'Factor'},
    color=avg_scores.values,
    color_continuous_scale=['#ff6b6b', '#ffd93d', '#00d4aa']
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400,
    showlegend=False,
    coloraxis_showscale=False  # showlegend does not hide the continuous colour bar
)

fig.show()
print("üí° INSIGHT: Which factors score highest? Lowest?")
print("‚≠ê RECOMMENDATION: Add radio buttons to compare by different student groups!")

üí° INSIGHT: Which factors score highest? Lowest?
‚≠ê RECOMMENDATION: Add radio buttons to compare by different student groups!


In [17]:
# CHART 14: Self-Study Hours vs GPA
df_study_gpa = df_master.dropna(subset=['SELF-STUDY HRS', 'GPA'])

fig = px.scatter(
    df_study_gpa,
    x='SELF-STUDY HRS',
    y='GPA',
    title='Self-Study Hours vs GPA',
    trendline='ols',  # OLS trendlines require the statsmodels package
    color_discrete_sequence=['#00d4aa'],
    opacity=0.6
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400
)

correlation = df_study_gpa['SELF-STUDY HRS'].corr(df_study_gpa['GPA'])
fig.show()
print(f"üí° INSIGHT: Correlation = {correlation:.3f} - Does more study time = better grades?")

üí° INSIGHT: Correlation = 0.722 - Does more study time = better grades?
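Students appear in several records, so a record-level r gives more weight to students with more records. Averaging per student first yields one point per student. The toy data below show that the two levels can give different r values. Which one to report is an analysis choice; the sketch does not settle it.

```python
import pandas as pd

# Hypothetical records: three students, two records each
records = pd.DataFrame({
    'STUDENT ID': ['S1', 'S1', 'S2', 'S2', 'S3', 'S3'],
    'SELF-STUDY HRS': [10, 10, 15, 15, 20, 20],
    'GPA': [2.8, 3.0, 3.1, 3.3, 3.6, 3.8],
})

# Record level: every row counts
r_records = records['SELF-STUDY HRS'].corr(records['GPA'])

# Student level: average each student's records first
per_student = records.groupby('STUDENT ID')[['SELF-STUDY HRS', 'GPA']].mean()
r_students = per_student['SELF-STUDY HRS'].corr(per_student['GPA'])

print(f"record-level r = {r_records:.3f}, student-level r = {r_students:.3f}")
```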


In [18]:
# CHART 15: Survey Scores by Performance Category
df_survey_perf = df_master.dropna(subset=survey_cols + ['GPA']).copy()
df_survey_perf['PERFORMANCE'] = pd.cut(df_survey_perf['GPA'], 
                                        bins=[0, 2.5, 3.5, 5.0],
                                        labels=['At Risk', 'Satisfactory', 'Excellent'],
                                        right=False)  # [0, 2.5) is At Risk, matching the GPA < 2.5 rule

# Melt for easier plotting
df_melt = df_survey_perf.melt(id_vars=['PERFORMANCE'], 
                               value_vars=survey_cols,
                               var_name='Factor', 
                               value_name='Score')

fig = px.box(
    df_melt,
    x='Factor',
    y='Score',
    color='PERFORMANCE',
    title='Survey Scores by Student Performance Level',
    color_discrete_map={'At Risk': '#ff6b6b', 'Satisfactory': '#ffd93d', 'Excellent': '#00d4aa'}
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    xaxis=dict(tickangle=-45),
    height=450
)

fig.show()
print("üí° INSIGHT: Do high performers rate factors differently than at-risk students?")

üí° INSIGHT: Do high performers rate factors differently than at-risk students?
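The box plots can be backed by a small table: mean score per performance group, plus the gap between the Excellent and At Risk groups for each factor. The scores below are hypothetical.

```python
import pandas as pd

# Hypothetical survey scores (1-5) for four students in two performance groups
df = pd.DataFrame({
    'PERFORMANCE': ['At Risk', 'At Risk', 'Excellent', 'Excellent'],
    'TEACHING SUPPORT': [2, 3, 4, 5],
    'FAMILY SUPPORT': [3, 3, 4, 3],
})
factors = ['TEACHING SUPPORT', 'FAMILY SUPPORT']

means = df.groupby('PERFORMANCE')[factors].mean()
# Positive gap: high performers rate the factor higher than at-risk students
gap = (means.loc['Excellent'] - means.loc['At Risk']).sort_values(ascending=False)
print(gap)
```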


---
# 🔗 SECTION 4: RELATIONSHIP & CORRELATION ANALYSIS
Finding connections between variables

In [19]:
# CHART 16: Correlation Heatmap (STRONG CANDIDATE!)
numeric_cols = ['GPA', 'ATTENDANCE', 'PRIOR KNOWLEDGE', 'COURSE RELEVANCE', 
                'TEACHING SUPPORT', 'COMPANY SUPPORT', 'FAMILY SUPPORT', 'SELF-STUDY HRS']

df_corr = df_master[numeric_cols].dropna()
corr_matrix = df_corr.corr()

fig = px.imshow(
    corr_matrix,
    text_auto='.2f',
    title='⭐ Correlation Heatmap (EXCELLENT CANDIDATE!)',
    color_continuous_scale='RdYlGn',
    zmin=-1, zmax=1,  # pin the colour scale to the full [-1, 1] range so 0 sits at the midpoint
    aspect='auto'
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=600
)

fig.show()
print("üí° INSIGHT: Which factors are most strongly related to GPA?")
print("‚≠ê RECOMMENDATION: Perfect for identifying key success factors!")

üí° INSIGHT: Which factors are most strongly related to GPA?
‚≠ê RECOMMENDATION: Perfect for identifying key success factors!
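To answer which factors relate most strongly to GPA, the GPA column of the correlation matrix can be ranked by absolute value, so strong negative correlations are not missed. The sketch below runs on made-up data; in the notebook the input would be `df_corr`.

```python
import pandas as pd

# Hypothetical numeric data; in the notebook this would be df_corr
df = pd.DataFrame({
    'GPA': [2.0, 3.0, 4.0, 3.0],
    'ATTENDANCE': [60, 70, 80, 70],
    'FAMILY SUPPORT': [4, 3, 3, 3],
    'COMPANY SUPPORT': [3, 1, 3, 1],
})

corr = df.corr()
gpa_corr = corr['GPA'].drop('GPA')
# Order by |r| but keep the signed values for display
ranked = gpa_corr.reindex(gpa_corr.abs().sort_values(ascending=False).index)
print(ranked.round(3))
```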


In [20]:
# CHART 17: GPA by Attendance Categories
df_att_cat = df_master.dropna(subset=['GPA', 'ATTENDANCE']).copy()
df_att_cat['ATTENDANCE_CAT'] = pd.cut(df_att_cat['ATTENDANCE'], 
                                       bins=[0, 70, 85, np.inf],
                                       labels=['Low (<70%)', 'Medium (70-85%)', 'High (85%+)'],
                                       right=False)  # [70, 85) is Medium; 85 and above (including 100) is High

fig = px.box(
    df_att_cat,
    x='ATTENDANCE_CAT',
    y='GPA',
    title='GPA Distribution by Attendance Category',
    color='ATTENDANCE_CAT',
    color_discrete_map={'Low (<70%)': '#ff6b6b', 'Medium (70-85%)': '#ffd93d', 'High (85%+)': '#00d4aa'}
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=400,
    showlegend=False
)

fig.show()
print("üí° INSIGHT: Clear performance tiers based on attendance?")

üí° INSIGHT: Clear performance tiers based on attendance?
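The notebook uses three attendance cut-offs: 70%/85% in Chart 17, 75%/90% in Chart 19, and 75% in the key statistics. Defining the bins once avoids mismatched groups. With `right=False`, the top edge must lie above 100 (here `np.inf`); otherwise a record with 100% attendance falls outside every bin. The 75/90 cut-offs below are placeholders, not a recommendation.

```python
import numpy as np
import pandas as pd

# Shared attendance bins (placeholder cut-offs, chosen to match the 75% at-risk rule)
ATT_BINS = [0, 75, 90, np.inf]
ATT_LABELS = ['<75%', '75-90%', '90%+']

attendance = pd.Series([50, 74.9, 75, 89.9, 90, 100])
att_cat = pd.cut(attendance, bins=ATT_BINS, labels=ATT_LABELS, right=False)
print(att_cat.astype(str).tolist())

# Pitfall: with right=False and a top edge of exactly 100, 100% attendance becomes NaN
dropped = pd.cut(pd.Series([100]), bins=[0, 75, 90, 100], right=False)
print(dropped.isna().iloc[0])
```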


In [22]:
# CHART 19: Age vs GPA with Attendance Color
df_age_gpa = df_master.dropna(subset=['AGE', 'GPA', 'ATTENDANCE']).copy()
df_age_gpa['ATTENDANCE_CAT'] = pd.cut(df_age_gpa['ATTENDANCE'], 
                                       bins=[0, 75, 90, np.inf],
                                       labels=['<75%', '75-90%', '90%+'],
                                       right=False)  # [75, 90) is 75-90%; 90 and above (including 100) is 90%+

fig = px.scatter(
    df_age_gpa,
    x='AGE',
    y='GPA',
    color='ATTENDANCE_CAT',
    title='Age vs GPA (colored by Attendance)',
    color_discrete_map={'<75%': '#ff6b6b', '75-90%': '#ffd93d', '90%+': '#00d4aa'},
    opacity=0.6
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=450
)

fig.show()
print("üí° INSIGHT: Does age affect performance? Attendance patterns by age?")

üí° INSIGHT: Does age affect performance? Attendance patterns by age?
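A scatter plot alone makes it hard to judge whether GPA tends to rise or fall with age. Spearman's rank correlation summarises that monotonic trend in one number and is less sensitive to outliers than Pearson's r. By definition it is the Pearson correlation of the ranks. Ranking first and then correlating keeps this sketch to pandas alone. The values are made up.

```python
import pandas as pd

# Hypothetical ages and GPAs
ages = pd.Series([22, 30, 41, 55])
gpas = pd.Series([3.0, 3.5, 3.2, 3.9])

# Spearman's rho = Pearson correlation of the ranks (ties get their average rank)
rho = ages.rank().corr(gpas.rank())
print(f"Spearman rho = {rho:.2f}")
```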


In [24]:
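# CHART 20: Student Segmentation Sunburst (Nationality → Enrollment → Performance)
df_sunburst = df_master.dropna(subset=['NATIONALITY_STATUS', 'FULL-TIME OR PART-TIME', 'GPA']).copy()

df_sunburst['PERFORMANCE'] = pd.cut(
    df_sunburst['GPA'],
    bins=[0, 2.5, 3.5, 5.0],
    labels=['At Risk', 'Satisfactory', 'Excellent'],
    right=False  # [0, 2.5) is At Risk, matching the GPA < 2.5 rule
)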

# Convert categorical to string to avoid Plotly error
df_sunburst['PERFORMANCE'] = df_sunburst['PERFORMANCE'].astype(str)

# Count students in each segment
sunburst_data = df_sunburst.groupby(
    ['NATIONALITY_STATUS', 'FULL-TIME OR PART-TIME', 'PERFORMANCE']
).size().reset_index(name='Count')

fig = px.sunburst(
    sunburst_data,
    path=['NATIONALITY_STATUS', 'FULL-TIME OR PART-TIME', 'PERFORMANCE'],
    values='Count',
    title='⭐ Student Segmentation: Nationality → Enrollment → Performance (ADVANCED!)',
    color='PERFORMANCE',
    color_discrete_map={'At Risk': '#ff6b6b', 'Satisfactory': '#ffd93d', 'Excellent': '#00d4aa'}
)

fig.update_layout(
    plot_bgcolor=COLORS['bg'],
    paper_bgcolor=COLORS['bg'],
    font=dict(color=COLORS['text']),
    height=600
)

fig.show()

print("üí° INSIGHT: Multi-dimensional view of student profiles!")
print("‚≠ê RECOMMENDATION: Excellent for showing complex relationships visually!")

üí° INSIGHT: Multi-dimensional view of student profiles!
‚≠ê RECOMMENDATION: Excellent for showing complex relationships visually!
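The stated objective is to identify at-risk profiles. To that end, the sunburst segments can be turned into an at-risk share per segment using `pd.crosstab(..., normalize='index')`. Showing segment sizes alongside keeps tiny segments from being over-read. The records below are hypothetical. Enrolment labels should be normalised first; otherwise 'Part-Time' and 'Part Time' form separate segments.

```python
import pandas as pd

# Hypothetical records, already labelled with a performance category
df = pd.DataFrame({
    'NATIONALITY_STATUS': ['SG Citizen'] * 4 + ['Foreigner'] * 2,
    'FULL-TIME OR PART-TIME': ['Part-Time', 'Part-Time', 'Full-Time', 'Full-Time', 'Part-Time', 'Part-Time'],
    'PERFORMANCE': ['At Risk', 'Excellent', 'Satisfactory', 'Excellent', 'At Risk', 'At Risk'],
})
segment = [df['NATIONALITY_STATUS'], df['FULL-TIME OR PART-TIME']]

# Row-normalised crosstab: share of each performance level within a segment
shares = pd.crosstab(segment, df['PERFORMANCE'], normalize='index')
# Raw crosstab summed across columns: number of records per segment
sizes = pd.crosstab(segment, df['PERFORMANCE']).sum(axis=1)

summary = pd.DataFrame({'at_risk_share': shares['At Risk'], 'n': sizes})
summary = summary.sort_values('at_risk_share', ascending=False)
print(summary)
```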


---
# 🏆 CHART SELECTION GUIDE
## Which 4 charts should YOU choose?

Based on the exploration above, here are my **TOP RECOMMENDATIONS** for your 4 charts:

### ⭐ **TIER S (MUST HAVE):**
1. **Chart 7: GPA vs Attendance Scatter** 
   - Type: Plotly Express ✅
   - Why: Moderate positive correlation (r = 0.418), directly addresses your objective
   - Score potential: 8/8

2. **Chart 16: Correlation Heatmap**
   - Type: Plotly Graph Objects ✅ (the exploratory version above uses `px.imshow`)
   - Why: Shows ALL relationships at once, very professional
   - Interactive: Add hover tooltips, maybe toggles
   - Score potential: 8/8

### ⭐ **TIER A (EXCELLENT CHOICES):**
3. **Chart 10: GPA Trends Over Semesters**
   - Type: Plotly Graph Objects ✅ (the exploratory version above uses `px.line`)
   - Why: Shows progression, multiple dimensions
   - Interactive: Dropdown to switch view (by Nationality/Funding/Type)
   - Score potential: 8/8

4. **Chart 18: Demographics × Performance Matrix**
   - Type: Plotly Graph Objects ✅
   - Why: Identifies at-risk profiles (your objective!)
   - Interactive: Radio buttons to change demographic dimension
   - Note: no Chart 18 cell appears above (the notebook jumps from In [20] to In [22]), so this chart still has to be built
   - Score potential: 8/8

### 🌟 **ALTERNATIVE TIER A:**
- **Chart 13: Survey Scores** - Good for behavior analysis
- **Chart 20: Sunburst** - Very advanced, impressive visually

---

## 💡 SELECTION CRITERIA:
✅ Addresses your objective directly  
✅ Shows clear, actionable insights  
✅ Mix of chart types (scatter, heatmap, line, bar)  
✅ Good for interactivity (dropdowns, radio buttons)  
✅ Professional appearance  
✅ Easy to explain (3 insights per chart)  

---

## 🎯 YOUR TEAMMATE SHOULD CHOOSE FROM:
- Chart 13: Survey Scores Analysis (with radio buttons)
- Chart 14: Self-Study Hours vs GPA
- Chart 17: GPA by Attendance Categories  
- Chart 20: Sunburst Hierarchy

This ensures NO OVERLAP between you two!

---

## 📊 NEXT STEPS:
1. Review all the charts above (Charts 1-17, 19 and 20)
2. Pick YOUR 4 favorites (I recommend Charts 7, 16, 10, 18)
3. Coordinate with teammate (they pick different 4)
4. Tell me your choices
5. I'll create POLISHED, INTERACTIVE versions for submission!

---

**What do you think? Ready to pick your 4?** üöÄ