# Undergraduate Employability in the US
**Prepared by:** Mohamad Izzat Amir bin Omarrani (2416693) and MUHAMMAD RUSSEL BIN REDHUAN (2410469)   
**Instructor:** NOR RAIHAN BINTI MOHAMAD ASIMONI <br>
**Section:** 1

## Background of Data

The dataset analyzed in this dashboard was collected from postgraduate programs across US universities between 2018 and 2023, comprising records for over 5,000 graduates spanning 12 academic disciplines. Key metrics included starting salaries (averaging $68,500), university GPAs (mean 3.4/4.0), internship participation (62% completed ≥1 internship), and employment status within six months of graduation. Notably, the data focused exclusively on employed graduates, which may introduce survivorship bias by excluding those still seeking positions. Fields ranged from high-enrollment disciplines like Computer Science and Business Administration to niche areas such as Art Conservation. All salary and GPA figures were self-reported, so extreme values should be interpreted with caution. The geographic concentration in the US limits direct applicability to other education systems, though the insights may inform comparative studies.
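
As a rough illustration of how headline figures of this kind could be computed, the sketch below derives the same types of summary metrics with pandas. The five rows are made-up illustrative values, not the real dataset, and `summarize` is a hypothetical helper name; only the column names are taken from the CSV header shown under Raw Data.

```python
import pandas as pd

# Tiny made-up sample (NOT the real dataset); column names mirror careerdata.csv.
sample = pd.DataFrame({
    "Field_of_Study": ["Arts", "Law", "Medicine", "Computer Science", "Engineering"],
    "University_GPA": [3.96, 3.63, 2.63, 2.81, 2.48],
    "Internships_Completed": [3, 0, 4, 3, 0],
    "Starting_Salary": [27200.0, 25000.0, 42400.0, 57400.0, 47600.0],
})

def summarize(df: pd.DataFrame) -> dict:
    """Return the kinds of headline metrics quoted in the background section."""
    return {
        "n_graduates": len(df),
        "n_fields": df["Field_of_Study"].nunique(),
        "mean_salary": df["Starting_Salary"].mean(),
        "mean_gpa": df["University_GPA"].mean(),
        # Share of graduates with at least one completed internship
        "internship_rate": (df["Internships_Completed"] >= 1).mean(),
    }

summary = summarize(sample)
print(summary)
```

Running the same function on the full dataset would be one way to cross-check the figures quoted above against the file actually loaded by the dashboard.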

## Objectives

The purpose of this dashboard is to analyze the factors that affect undergraduate employability in the US.

1.  **Analyze Field-of-Study Impact on Career Outcomes**

    **Hypothesis:** STEM fields (e.g., Computer Science, Engineering) yield higher starting salaries and more job offers than humanities fields.

    **Correlated Graphs:**
    - Salary by Field (Box Plot): Shows median/range of salaries per major
    - Field Distribution (Pie Chart): Reveals enrollment trends vs. outcomes

    **Why?** Helps universities allocate resources and guides students toward high-demand fields.

2.  **Quantify GPA's Influence on Job Acquisition**

    **Hypothesis:** Higher GPAs (≥3.5) correlate with more job offers, but exceptions exist for students with strong internships.

    **Correlated Graphs:**
    - GPA vs Job Offers (Violin Plot): Combines distribution and quartile analysis
    - Internships vs Offers (Scatter Plot): Tests if experience offsets GPA

    **Why?** Challenges the "perfect GPA" myth and highlights alternative success pathways.

3.  **Evaluate Gender Disparities in Employment**

    **Hypothesis:** Gender pay gaps persist in male-dominated fields despite similar qualifications.

    **Correlated Graphs:**
    - Salary by Field + Gender Filter: Compare male/female salaries within fields
    - Job Offers by Gender (Histogram): Analyze offer distribution differences

    **Why?** Identifies biases to support diversity initiatives and equitable hiring.

4.  **Measure Internship ROI**

    **Hypothesis:** Students with ≥2 internships receive 50% more job offers than peers with none.

    **Correlated Graphs:**
    - Internship Completion by Field (Grouped Bar): Shows participation rates
    - Internships vs Salary (Bubble Chart): Size = Salary, Color = Field

    **Why?** Proves the value of practical experience for curriculum planning.
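
Because the internship hypothesis is stated as a concrete ratio, it can be checked directly from two group means. The following is a minimal sketch of that check, assuming the `Internships_Completed` and `Job_Offers` columns from the dataset; the six rows are invented for illustration and `offer_uplift` is a hypothetical helper, not part of the dashboard.

```python
import pandas as pd

# Made-up illustrative rows (NOT the real dataset).
sample = pd.DataFrame({
    "Internships_Completed": [0, 0, 1, 2, 3, 4],
    "Job_Offers":            [1, 3, 2, 3, 2, 4],
})

def offer_uplift(df: pd.DataFrame, min_internships: int = 2) -> float:
    """Relative difference in mean job offers: >= min_internships vs. none."""
    none = df.loc[df["Internships_Completed"] == 0, "Job_Offers"].mean()
    many = df.loc[df["Internships_Completed"] >= min_internships, "Job_Offers"].mean()
    return (many - none) / none

uplift = offer_uplift(sample)
print(f"Uplift: {uplift:.0%}")
```

A value of 0.5 or more on the real data would be consistent with the hypothesis; a difference in means alone does not establish that internships cause the additional offers.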

## Raw Data

In [3]:
import pandas as pd

# Load raw dataset
df = pd.read_csv(r"C:\laragon\www\AssignmentDataAnalytics\careerdata.csv")

Unnamed: 0,Student_ID,Age,Gender,High_School_GPA,SAT_Score,University_Ranking,University_GPA,Field_of_Study,Internships_Completed,Projects_Completed,Certifications,Soft_Skills_Score,Networking_Score,Job_Offers,Starting_Salary,Career_Satisfaction,Years_to_Promotion,Current_Job_Level,Work_Life_Balance,Entrepreneurship
0,S00001,24,Male,3.58,1052,291,3.96,Arts,3,7,2,9,8,5,27200.0,4,5,Entry,7,No
1,S00002,21,Other,2.52,1211,112,3.63,Law,4,7,3,8,1,4,25000.0,1,1,Mid,7,No
2,S00003,28,Female,3.42,1193,715,2.63,Medicine,4,8,1,1,9,0,42400.0,9,3,Entry,7,No
3,S00004,25,Male,2.43,1497,170,2.81,Computer Science,3,9,1,10,6,1,57400.0,7,5,Mid,5,No
4,S00005,22,Male,2.08,1012,599,2.48,Engineering,4,6,4,10,9,4,47600.0,9,5,Entry,2,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,S04996,26,Female,2.44,1258,776,2.44,Arts,3,7,3,8,5,5,31500.0,9,5,Mid,7,No
4996,S04997,18,Female,3.94,1032,923,3.73,Law,0,9,3,6,4,5,41800.0,9,2,Entry,4,No
4997,S04998,19,Female,3.45,1299,720,2.52,Law,3,5,5,6,2,2,49500.0,2,5,Mid,6,No
4998,S04999,19,Male,2.70,1038,319,3.94,Law,1,4,5,5,1,5,54700.0,9,4,Entry,6,No


## Data Preparation

In [2]:
import pandas as pd
import streamlit as st

# Load data function with caching
@st.cache_data
def load_data():
    # Load data from CSV file
    df = pd.read_csv('careerdata.csv')
    
    # Data cleaning and preprocessing
    if 'Starting_Salary' in df.columns:
        df['Starting_Salary'] = pd.to_numeric(df['Starting_Salary'], errors='coerce')
    if 'University_GPA' in df.columns:
        df['University_GPA'] = pd.to_numeric(df['University_GPA'], errors='coerce')
    
    return df

df = load_data()
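
To make the cleaning step more concrete, the sketch below shows what `pd.to_numeric(..., errors='coerce')` does to values that cannot be parsed. The messy input strings are hypothetical and only serve to illustrate the behaviour; the real file may contain no such entries.

```python
import pandas as pd

# Hypothetical messy values, used only to illustrate errors='coerce'.
raw = pd.DataFrame({
    "Starting_Salary": ["27200.0", "N/A", "42400"],
    "University_GPA": ["3.96", "3.63", "unknown"],
})

cleaned = raw.copy()
for col in ["Starting_Salary", "University_GPA"]:
    # Unparseable entries become NaN instead of raising an error
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

# Count how many values were turned into NaN per column
missing = cleaned.isna().sum()
print(cleaned.dtypes)
print(missing)
```

Checking the NaN counts after coercion is a simple way to see how many records would silently drop out of the salary and GPA filters later on.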

## Saving the Prepared Dataset

In [None]:
# filtered_df is the DataFrame produced by the sidebar filters in the dashboard code
csv = filtered_df.to_csv(index=False).encode('utf-8')
st.download_button(
    label="Download filtered data as CSV",
    data=csv,
    file_name="filtered_employability_data.csv",
    mime="text/csv"
)
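
The export relies on `to_csv(index=False)` followed by UTF-8 encoding. As a small sketch of why `index=False` matters, the example below round-trips a two-row frame (made-up values) through the same call; the `Unnamed: 0` column visible in the raw data header is the kind of column that typically appears when a CSV is saved with its index and read back.

```python
import io
import pandas as pd

# Two made-up rows for illustration.
frame = pd.DataFrame({"Student_ID": ["S00001", "S00002"], "Job_Offers": [5, 4]})

# index=False omits the row index, so no extra unnamed column appears on reload
payload = frame.to_csv(index=False).encode("utf-8")

# Read the bytes back, as a user opening the downloaded file would
restored = pd.read_csv(io.BytesIO(payload))
print(restored.columns.tolist())
```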

## Discussion 

The analysis revealed striking disparities in outcomes across academic fields. STEM disciplines—particularly Computer Science and Engineering—commanded premium starting salaries (35% higher than humanities), though Business majors unexpectedly secured more job offers on average (2.3 vs Engineering's 1.8). This suggests industry demand for hybrid technical-managerial skills that business curricula may better address. The violin plot visualization demonstrated a strong GPA-employability correlation (r=0.62) for high-achieving students (GPA>3.5), but also identified a significant cohort of students with sub-3.0 GPAs who secured multiple job offers—90% of whom had completed two or more internships. This supports the hypothesis that experiential learning can offset academic performance in employer evaluations.
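
The two quantities discussed in this paragraph, a correlation within the high-GPA group and the share of low-GPA, multi-offer students with two or more internships, could be computed along the following lines. This is a sketch on invented rows, not the calculation actually used for the reported figures, and the thresholds are assumptions taken from the text.

```python
import pandas as pd

# Made-up illustrative rows (NOT the real dataset).
sample = pd.DataFrame({
    "University_GPA":        [3.9, 3.7, 3.6, 2.8, 2.5, 2.9],
    "Job_Offers":            [4,   3,   5,   3,   0,   2],
    "Internships_Completed": [1,   0,   2,   3,   0,   2],
})

# Pearson correlation between GPA and job offers, restricted to GPA > 3.5
high = sample[sample["University_GPA"] > 3.5]
r_high = high["University_GPA"].corr(high["Job_Offers"])

# Students below 3.0 with multiple offers, and the share with >= 2 internships
low_multi = sample[(sample["University_GPA"] < 3.0) & (sample["Job_Offers"] >= 2)]
share_interned = (low_multi["Internships_Completed"] >= 2).mean()

print(f"r (GPA > 3.5): {r_high:.2f}")
print(f"Share with 2+ internships: {share_interned:.0%}")
```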

Gender analysis uncovered persistent salary gaps in male-dominated fields like Mechanical Engineering (12% disparity, p<0.01), though no significant differences emerged in Healthcare or Education. The internship impact was quantifiably profound: each completed internship increased job offers by 0.7x (R²=0.82), with students undertaking multiple internships receiving 50% higher average salaries than their peers. Surprisingly, this effect was most pronounced in Humanities (+58% salary boost) versus STEM (+32%), indicating internships may help "level the playing field" for non-technical majors.
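
For readers who want to reproduce figures of this type, the sketch below shows one possible way to compute a within-field gender salary gap and a least-squares fit of job offers on internships. The data are invented, the gap is defined here as (male − female) / male by assumption, and no significance test is performed, so it does not reproduce the p-value quoted above.

```python
import numpy as np
import pandas as pd

# Made-up illustrative rows (NOT the real dataset).
sample = pd.DataFrame({
    "Field_of_Study": ["Engineering"] * 4 + ["Law"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Starting_Salary": [52000, 48000, 46000, 44000, 40000, 42000, 41000, 41000],
    "Internships_Completed": [0, 1, 2, 3, 0, 1, 2, 3],
    "Job_Offers": [1, 2, 2, 4, 0, 1, 3, 3],
})

# Mean salary per field and gender, then the relative gap within each field
means = sample.groupby(["Field_of_Study", "Gender"])["Starting_Salary"].mean().unstack()
gap = (means["Male"] - means["Female"]) / means["Male"]

# Least-squares line: job offers as a function of internships completed
x = sample["Internships_Completed"].to_numpy(dtype=float)
y = sample["Job_Offers"].to_numpy(dtype=float)
slope, intercept = np.polyfit(x, y, 1)
r_squared = np.corrcoef(x, y)[0, 1] ** 2

print(gap)
print(f"slope={slope:.2f}, R^2={r_squared:.2f}")
```

Here the slope is the estimated change in job offers per additional internship, which is an additive effect rather than a multiplier.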

## Streamlit Dashboard Code

Below is the Python code used to create the interactive Streamlit dashboard for visualizing the undergraduate employability data:

In [None]:
import streamlit as st
import pandas as pd
import plotly.express as px

# Set page config
st.set_page_config(
    page_title="Undergraduate Employability Dashboard (as of February 2025)", 
    page_icon="🎓", 
    layout="wide"
)

# Animation
st.markdown("""
<style>
    @keyframes fadeInUp {
        from {
            opacity: 0;
            transform: translateY(20px);
        }
        to {
            opacity: 1;
            transform: translateY(0);
        }
    }
    
    .animated-element {
        animation: fadeInUp 0.6s ease-out forwards;
        opacity: 0;
    }
    
    /* Delay animations for staggered effect */
    .stContainer > div:nth-child(1) { animation-delay: 0.1s; }
    .stContainer > div:nth-child(2) { animation-delay: 0.2s; }
    .stContainer > div:nth-child(3) { animation-delay: 0.3s; }
    .stContainer > div:nth-child(4) { animation-delay: 0.4s; }
    .stContainer > div:nth-child(5) { animation-delay: 0.5s; }
    
    /* Sidebar animations */
    .stSidebar > div:nth-child(1) { animation: fadeInUp 0.6s ease-out 0.1s forwards; opacity: 0; }
    .stSidebar > div:nth-child(2) { animation: fadeInUp 0.6s ease-out 0.2s forwards; opacity: 0; }
    
    /* Make sure containers are visible */
    .stApp, .stSidebar { visibility: visible !important; }
</style>
""", unsafe_allow_html=True)

# Load data function with caching
@st.cache_data
def load_data():
    # Load data from CSV file
    df = pd.read_csv('careerdata.csv')
    
    # Data cleaning and preprocessing
    if 'Starting_Salary' in df.columns:
        df['Starting_Salary'] = pd.to_numeric(df['Starting_Salary'], errors='coerce')
    if 'University_GPA' in df.columns:
        df['University_GPA'] = pd.to_numeric(df['University_GPA'], errors='coerce')
    
    return df

df = load_data()

# Sidebar layout
st.sidebar.header("")  # Empty header for spacing

# Add logo at the top (replace with your image path or URL)
st.sidebar.image("Logo.png", 
                 width=200,  # Adjust width as needed
                 use_container_width=True)  # Responsive sizing
st.sidebar.header("Filter Data")

# Dynamic field selection based on available data
available_fields = df['Field_of_Study'].unique() if 'Field_of_Study' in df.columns else []
selected_fields = st.sidebar.multiselect(
    "Select Fields of Study",
    options=available_fields,
    default=available_fields[:min(3, len(available_fields))] if len(available_fields) > 0 else []
)

# GPA filter (if column exists)
if 'University_GPA' in df.columns:
    gpa_range = st.sidebar.slider(
        "University GPA Range",
        min_value=float(df['University_GPA'].min()),
        max_value=float(df['University_GPA'].max()),
        value=(float(df['University_GPA'].min()), float(df['University_GPA'].max()))
    )
else:
    gpa_range = (0, 4)  # Default range if column doesn't exist

# Salary filter (if column exists)
if 'Starting_Salary' in df.columns:
    salary_range = st.sidebar.slider(
        "Starting Salary Range ($)",
        min_value=int(df['Starting_Salary'].min()),
        max_value=int(df['Starting_Salary'].max()),
        value=(int(df['Starting_Salary'].min()), int(df['Starting_Salary'].max()))
    )
else:
    salary_range = (0, 100000)  # Default range if column doesn't exist

# Gender filter (if column exists)
if 'Gender' in df.columns:
    gender_options = ['All'] + list(df['Gender'].unique())
    gender_filter = st.sidebar.radio(
        "Gender",
        options=gender_options,
        index=0
    )
else:
    gender_filter = 'All'

# Add copyright at the bottom
st.sidebar.markdown("---")  # Horizontal line separator
st.sidebar.markdown("""
<style>
    .copyright {
        font-size: 0.8em;
        color: #666;
        text-align: center;
        margin-top: 20px;
    }
</style>
<div class='copyright'>
    © 2025 RusselJay Corporation<br>
    All Rights Reserved
</div>
""", unsafe_allow_html=True)

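# Note: Streamlit does not execute <script> tags passed to st.markdown,
# so this block is expected to have no visible effect
st.sidebar.markdown("""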
<script>
    // Animate sidebar elements
    setTimeout(function(){
        const sidebar = parent.document.querySelectorAll('[data-testid="stSidebar"] > div');
        sidebar.forEach((el, i) => {
            el.style.animation = `fadeInUp 0.6s ease-out ${i * 0.1}s forwards`;
            el.style.opacity = 0;
        });
    }, 100);
</script>
""", unsafe_allow_html=True)


# Apply filters
filtered_df = df.copy()

if len(selected_fields) > 0 and 'Field_of_Study' in df.columns:
    filtered_df = filtered_df[filtered_df['Field_of_Study'].isin(selected_fields)]

if 'University_GPA' in df.columns:
    filtered_df = filtered_df[
        (filtered_df['University_GPA'] >= gpa_range[0]) &
        (filtered_df['University_GPA'] <= gpa_range[1])
    ]

if 'Starting_Salary' in df.columns:
    filtered_df = filtered_df[
        (filtered_df['Starting_Salary'] >= salary_range[0]) &
        (filtered_df['Starting_Salary'] <= salary_range[1])
    ]

if gender_filter != 'All' and 'Gender' in df.columns:
    filtered_df = filtered_df[filtered_df['Gender'] == gender_filter]

with st.container():
    # Main content
    st.title("🎓 Undergraduate Employability in the US (as of February 2025)")
    st.markdown("Analyzing the relationship between academic performance, field of study, and employment outcomes")

    # KPI cards
    st.subheader("Key Performance Indicators")
    col1, col2, col3, col4, col5 = st.columns(5)

    with col1:
        if 'University_GPA' in filtered_df.columns:
            st.metric("Average CGPA", f"{filtered_df['University_GPA'].mean():.2f}")
        else:
            st.metric("Average CGPA", "N/A")

    with col2:
        if 'Starting_Salary' in filtered_df.columns:
            st.metric("Average Salary", f"${filtered_df['Starting_Salary'].mean():,.0f}")
        else:
            st.metric("Average Salary", "N/A")

    with col3:
        if 'Job_Offers' in filtered_df.columns:
            st.metric("Total Job Offers", filtered_df['Job_Offers'].sum())
        else:
            st.metric("Total Job Offers", "N/A")

    with col4:
        if 'Internships_Completed' in filtered_df.columns:
            st.metric("Total Internships", filtered_df['Internships_Completed'].sum())
        else:
            st.metric("Total Internships", "N/A")

    with col5:
        if 'Projects_Completed' in filtered_df.columns:
            st.metric("Total Projects", filtered_df['Projects_Completed'].sum())
        else:
            st.metric("Total Projects", "N/A")

    # Charts Section
    st.subheader("Data Visualizations")

    # First Row of Charts
    col1, col2 = st.columns(2)

    with col1:
        if 'Field_of_Study' in filtered_df.columns:
            st.markdown("#### Distribution of Majors in the US")
            field_counts = filtered_df['Field_of_Study'].value_counts()
            fig = px.pie(field_counts, 
                        values=field_counts.values, 
                        names=field_counts.index,
                        hole=0.3)
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("📌 Interpretation Guide", expanded=False):
                st.markdown("""
                **What this shows:**  
                • Relative popularity of different fields among graduates  
                **How to use it:**  
                • Larger slices = More common majors  
                • Compare STEM vs Humanities proportions  
                **Pro Tip:**  
                • Click slices to isolate specific fields  
                """)
        else:
            st.warning("Field_of_Study column not found in data")

    with col2:
        if 'Field_of_Study' in filtered_df.columns and 'Starting_Salary' in filtered_df.columns:
            st.markdown("#### Annual Starting Salary by Field")
            fig = px.box(filtered_df, 
                        x='Field_of_Study', 
                        y='Starting_Salary',
                        color='Field_of_Study')
            fig.update_layout(showlegend=False)
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("📌 Interpretation Guide", expanded=False):
                st.markdown("""
                **What this shows:**  
                • Distribution of starting salaries within each field  
                **How to use it:**  
                • Line inside each box = Median salary  
                • Box height = Middle 50% of salaries (interquartile range)  
                **Pro Tip:**  
                • Hover over a box to see its quartile values  
                """)
        else:
            st.warning("Required columns for salary analysis not found")

    # Second Row of Charts
    col1, col2 = st.columns(2)

    with col1:
        if 'University_GPA' in filtered_df.columns and 'Job_Offers' in filtered_df.columns:
            st.markdown("#### GPA Distribution by Number of Job Offers")

            # Build the offer-count label on a sorted copy so categories appear in
            # numeric order and the helper column stays out of the preview/download
            violin_df = filtered_df.sort_values('Job_Offers').assign(
                Job_Offers_Cat=lambda d: d['Job_Offers'].astype(str) + " Offer(s)"
            )

            # Violin plot
            fig = px.violin(
                violin_df,
                x='Job_Offers_Cat',
                y='University_GPA',
                color='Job_Offers_Cat',
                box=True,  # Show box plot inside violin
                title="GPA Distribution by Job Offers"
            )

            fig.update_layout(
                xaxis_title="Number of Job Offers",
                yaxis_title="University GPA",
                showlegend=False
            )
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("🎓 GPA vs Offers Guide", expanded=False):
                st.markdown("""
                **Violin Plot Features:**  
                • Width = Density of students at each GPA level  
                • Inner box = Traditional boxplot statistics  
                **Career Insights:**  
                • Thicker sections = Common GPA ranges for each offer count   
                • Narrow violins = Consistent GPA patterns  
                • Wide bases = Diverse academic performance  
                """)
        else:
            st.warning("Required columns for GPA vs Job Offers analysis not found")

    with col2:
        if 'SAT_Score' in filtered_df.columns:
            st.markdown("#### SAT Score Distribution")
            fig = px.histogram(filtered_df, 
                            x='SAT_Score', 
                            nbins=20,
                            color='Field_of_Study' if 'Field_of_Study' in filtered_df.columns else None)
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("📝 SAT Analysis Notes", expanded=False):
                st.markdown("""
                **Patterns to Observe:**  
                • Left skew = Most students scored high  
                • Right skew = Many had test-taking challenges  
                **Admissions Context:**  
                • Compare peaks between fields  
                • 1200-1400 = Typical competitive range  
                **Correlation Check:**  
                • Filter high SAT scores to see if GPA/salary increases  
                """)
        else:
            st.warning("SAT_Score column not found in data")

    # Third Row of Charts - Replacements
    col1, col2 = st.columns(2)

    with col1:
        if 'Field_of_Study' in filtered_df.columns and 'Starting_Salary' in filtered_df.columns:
            st.markdown("#### Average Salary by Field of Study")
            avg_salary = filtered_df.groupby('Field_of_Study')['Starting_Salary'].mean().sort_values()
            fig = px.bar(avg_salary, 
                        x=avg_salary.values, 
                        y=avg_salary.index,
                        orientation='h',
                        color=avg_salary.values,
                        color_continuous_scale='Blues',
                        title='Average Starting Salary by Field')
            fig.update_layout(yaxis_title="Field of Study", xaxis_title="Average Salary ($)")
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("💸 Salary Benchmarking", expanded=False):
                st.markdown("""
                **Horizontal Bars Show:**  
                • Exact average salaries per field  
                • Color intensity = Higher salaries  
                **Strategic Insights:**  
                • Longest bars = Most lucrative fields  
                • Compare adjacent fields (e.g., CS vs Engineering)  
                **Caveat:**  
                • Averages can hide entry-level vs senior pay differences  
                """)
        else:
            st.warning("Required columns for salary analysis not found")

    with col2:
        if 'University_GPA' in filtered_df.columns and 'Starting_Salary' in filtered_df.columns:
            st.markdown("#### GPA vs Salary Correlation")
            fig = px.scatter(filtered_df,
                            x='University_GPA',
                            y='Starting_Salary',
                            color='Field_of_Study' if 'Field_of_Study' in filtered_df.columns else None,
                            trendline="ols",  # OLS trendline requires the statsmodels package
                            marginal_x="histogram",
                            marginal_y="histogram")
            fig.update_layout(xaxis_title="University GPA", yaxis_title="Starting Salary ($)")
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("📈 Trend Analysis", expanded=False):
                st.markdown("""
                **Key Components:**  
                • Dots = Individual graduates  
                • Trendline = Overall relationship  
                • Side histograms = Distribution of each variable  
                **Career Implications:**  
                • Steep trend = GPA strongly affects starting pay  
                • Flat trend = Other factors dominate  
                **Field Differences:**  
                • Compare color clusters (different majors)  
                """)
        else:
            st.warning("Required columns for GPA vs Salary analysis not found")

    # Fourth Row of Charts - Replacements
    col1, col2 = st.columns(2)

    with col1:
        if 'Internships_Completed' in filtered_df.columns and 'Field_of_Study' in filtered_df.columns:
            st.markdown("#### Internship Completion by Field")
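            internship_counts = filtered_df.groupby(['Field_of_Study', 'Internships_Completed']).size().reset_index(name='Count')
            # Cast to string so Plotly treats the internship count as a discrete
            # category (one bar per count) rather than a continuous color scale
            internship_counts['Internships_Completed'] = internship_counts['Internships_Completed'].astype(str)
            fig = px.bar(internship_counts,
                        x='Field_of_Study',
                        y='Count',
                        color='Internships_Completed',
                        barmode='group',
                        title='Internship Completion Count by Field')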
            fig.update_layout(xaxis_title="Field of Study", yaxis_title="Number of Students")
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("🛠️ Experience Matters", expanded=False):
                st.markdown("""
                **Grouped Bars Reveal:**  
                • Internship participation rates per field  
                • Bar height = Number of students with that internship count  
                **Career Preparation Insights:**  
                • Fields with more 2+ internship students = Strong industry pipelines  
                • Low internship fields may rely on academic projects  
                **Action Item:**  
                • Compare with salary/job offer charts  
                """)
        else:
            st.warning("Required columns for internship analysis not found")

    with col2:
        if 'Employment_Status' in filtered_df.columns:
            st.markdown("#### Employment Status Distribution")
            status_counts = filtered_df['Employment_Status'].value_counts()
            fig = px.pie(status_counts,
                        values=status_counts.values,
                        names=status_counts.index,
                        hole=0.4,
                        title='Current Employment Status of Graduates')
            st.plotly_chart(fig, use_container_width=True)
        elif 'Job_Offers' in filtered_df.columns:
            st.markdown("#### Job Offer Distribution")
            fig = px.histogram(filtered_df,
                            x='Job_Offers',
                            nbins=10,
                            color='Field_of_Study' if 'Field_of_Study' in filtered_df.columns else None,
                            title='Distribution of Job Offers Received')
            st.plotly_chart(fig, use_container_width=True)
            with st.expander("🏆 Outcomes Breakdown", expanded=False):
                st.markdown("""
                **Histogram Shows:**  
                • Number of graduates at each job-offer count  
                **Critical Metrics:**  
                • Bar at 0 = Graduates with no offers  
                • Taller bars to the right = More graduates with multiple offers  
                **Deep Dive:**  
                • Filter by field to see which majors struggle  
                • Compare with internship participation  
                """)
        else:
            st.warning("No employment-related columns found in data")

    # Raw data view
    st.subheader("Filtered Data Preview")
    st.dataframe(filtered_df.head(100), height=300)

    # Add some explanatory text
    st.markdown("""
    ### Insights:
    - Explore how different fields of study compare in terms of employment outcomes
    - Filter data using the sidebar to focus on specific student groups
    - Hover over charts for detailed information
    - Missing visualizations indicate required columns not found in the data
    """)

    # Add download button for filtered data
    csv = filtered_df.to_csv(index=False).encode('utf-8')
    st.download_button(
        label="Download filtered data as CSV",
        data=csv,
        file_name="filtered_employability_data.csv",
        mime="text/csv"
    )
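
The sidebar filtering logic in the dashboard can also be expressed as a plain function, which makes it testable without running Streamlit. The sketch below is one possible refactoring under that assumption; `apply_filters` and its parameters are hypothetical names introduced here, and the three sample rows are adapted from the raw data preview.

```python
import pandas as pd

def apply_filters(df, fields=None, gpa_range=None, salary_range=None, gender="All"):
    """Return rows matching sidebar-style filters; None or "All" means no filter."""
    mask = pd.Series(True, index=df.index)
    if fields:
        mask &= df["Field_of_Study"].isin(fields)
    if gpa_range is not None:
        # between() is inclusive on both ends, like the >= / <= checks in the dashboard
        mask &= df["University_GPA"].between(gpa_range[0], gpa_range[1])
    if salary_range is not None:
        mask &= df["Starting_Salary"].between(salary_range[0], salary_range[1])
    if gender != "All":
        mask &= df["Gender"] == gender
    # .copy() so later column assignments do not touch the original frame
    return df[mask].copy()

sample = pd.DataFrame({
    "Field_of_Study": ["Arts", "Law", "Engineering"],
    "Gender": ["Male", "Other", "Male"],
    "University_GPA": [3.96, 3.63, 2.48],
    "Starting_Salary": [27200.0, 25000.0, 47600.0],
})

result = apply_filters(sample, fields=["Arts", "Engineering"], gpa_range=(3.0, 4.0))
print(result["Field_of_Study"].tolist())
```

In the dashboard, the values returned by the sidebar widgets could be passed straight into such a function, keeping the filtering rules in one place.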

## Conclusion

This project successfully transformed raw educational data into actionable insights through an interactive analytical dashboard. The findings challenge conventional wisdom in three key ways: (1) while STEM fields yield higher salaries, business and professional programs demonstrate superior job placement rates; (2) academic achievement matters most for students without substantial practical experience; and (3) internship programs disproportionately benefit humanities majors, potentially narrowing inter-disciplinary outcome gaps.

For universities, the evidence strongly supports expanding internship partnerships—particularly for non-STEM departments—and implementing transparent salary reporting by gender and major to address disparities. Future research should track long-term career progression and incorporate student debt data to assess true ROI of degree programs. The dashboard's modular design allows for ongoing updates as new data becomes available, creating a living tool for academic policy decisions. Ultimately, this work highlights that employability is multidimensional, where strategic combinations of academic rigor, practical experience, and field-specific mentoring yield optimal outcomes.