### Analyzing the Gender Gap in Undergraduate Degrees

This notebook builds interactive plots with Plotly Dash, which serves each app through a Flask web server.

1. I compared the percentage of degrees earned by women and men in line charts. The men's share is computed as 100 minus the women's share. To view a chart, choose a category and then a degree with the radio buttons.
2. I ranked the average percentage of degrees earned by women and men in each field, in descending order, in an interactive bar chart. Drag the slider left or right to choose how many ranks to display.
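Both charts rest on one identity, because the men's share is derived rather than measured. Let $w_t$ be the percentage of a field's degrees earned by women in year $t$, and let $m_t$ be the men's percentage (symbols introduced here for illustration). Over $T$ years:

$$
m_t = 100 - w_t, \qquad
\bar{m} = \frac{1}{T}\sum_{t=1}^{T} m_t = 100 - \frac{1}{T}\sum_{t=1}^{T} w_t = 100 - \bar{w}
$$

It follows that ranking fields by the men's average gives exactly the reverse of the ranking by the women's average.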

### About the Dataset:

The US Department of Education released a dataset containing the percentage of bachelor's degrees granted to women from 1970 to 2012. The dataset used below was cleaned by Randal Olson, a data scientist at the University of Pennsylvania.
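The later cells assume a specific layout. `Year` is the first column, which the ranking cell skips with `iloc[:, 1:]`. Every other column holds a percentage between 0 and 100. The sketch below checks that layout on made-up sample rows rather than the real file. The helper name `looks_like_percentages` is hypothetical.

```python
import io

import pandas as pd

# Made-up rows in the layout the notebook assumes for women-bachelors.csv:
# a 'Year' column first, then one column per degree holding the percentage
# of bachelor's degrees earned by women. NOT the real data.
SAMPLE_CSV = """Year,Biology,Engineering
1970,30.0,1.0
1971,31.0,1.5
"""


def looks_like_percentages(frame: pd.DataFrame) -> bool:
    """Return True if 'Year' is the first column and every other value is in 0-100."""
    if frame.columns[0] != "Year":
        return False
    values = frame.iloc[:, 1:]
    return bool(((values >= 0) & (values <= 100)).all().all())


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(looks_like_percentages(sample))  # True
```

On the real data, call the same check on `df` right after `pd.read_csv`.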

In [None]:
# Install Dash, used to build interactive dashboards in Python
%pip install dash

In [7]:
import pandas as pd
import numpy as np
from dash import Dash, dcc, html, Input, Output
import plotly.graph_objects as go

In [19]:
filepath='women-bachelors.csv'
df = pd.read_csv(filepath, header=0)
df.head()

In [18]:
# Replace '?' placeholders with NaN
df.replace("?", np.nan, inplace=True)

# Count missing values in each column (True = missing).
# This dataset has no missing values.
missing_data = df.isnull()
for column in missing_data.columns.values.tolist():
    print(missing_data[column].value_counts())
    print("")
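The loop above prints one `value_counts()` table per column. A more compact version of the same check calls `isnull().sum()` once and counts missing entries per column. The sketch below runs it on a made-up frame rather than the notebook's `df`. After `replace`, any column that contained '?' still holds strings, so the sketch also converts it with `pd.to_numeric`. Whether the real file needs that step is an assumption you can verify with `df.dtypes`.

```python
import numpy as np
import pandas as pd

# Made-up frame where '?' marks a missing entry (illustrative only).
sample = pd.DataFrame({
    "Year": [1970, 1971, 1972],
    "Biology": ["30.0", "?", "32.0"],
})

# Replace the '?' placeholder with NaN, then count missing values per column.
sample = sample.replace("?", np.nan)
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Columns that held '?' are still strings; convert them to floats.
sample["Biology"] = pd.to_numeric(sample["Biology"])
```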

In [23]:
# Lists for degree categories
Stem = [
    'Agriculture', 'Architecture', 'Biology', 'Computer Science', 'Engineering',
    'Health Professions', 'Math and Statistics', 'Physical Sciences'
]

Humanities_and_Arts = [
    'Art and Performance', 'Communications and Journalism', 'English', 'Foreign Languages'
]

Social_Sciences_and_Professional_Studies = [
    'Business', 'Education', 'Psychology', 'Public Administration', 'Social Sciences and History'
]


# Add a '-men' column for each degree: 100 minus the percentage earned by women
for col in Stem + Humanities_and_Arts + Social_Sciences_and_Professional_Studies:
    if col in df.columns:
        df[col + '-men'] = 100 - df[col]
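The loop adds the men's columns one at a time. An alternative (not the notebook's code) builds all of them in one vectorized step. `DataFrame.rsub` computes `100 - value` for every cell, `add_suffix` renames the columns, and `pd.concat` joins the result. The sketch uses made-up percentages in a frame named `sample`.

```python
import pandas as pd

# Made-up women's percentages for two degrees (illustrative only).
sample = pd.DataFrame({
    "Year": [1970, 1971],
    "Biology": [30.0, 40.0],
    "Engineering": [1.0, 2.5],
})

degree_cols = ["Biology", "Engineering"]

# rsub(100) computes 100 - value in every cell; add_suffix renames the columns.
men = sample[degree_cols].rsub(100).add_suffix("-men")
sample = pd.concat([sample, men], axis=1)
print(sample)
```

Inserting many columns one by one can leave a wide frame fragmented. Building them together and concatenating once avoids that.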

In [39]:
# Interactive visualization of the gender gap using Dash and Plotly

# First app: line charts (Dash runs on an internal Flask server)
app1 = Dash(__name__, external_stylesheets=['https://codepen.io/chriddyp/pen/bWLwgP.css'])

# Map each category name to its list of degrees
all_options = {
    'Stem': Stem,
    'Humanities_and_Arts': Humanities_and_Arts,
    'Social_Sciences_and_Professional_Studies': Social_Sciences_and_Professional_Studies 
}

# Set up the app layout with 2 radio items and 1 graph
app1.layout = html.Div([
    dcc.RadioItems(list(all_options.keys()), 'Stem', id='categories-radio'),
    html.Hr(),
    dcc.RadioItems(id='degrees-radio'),
    html.Hr(),
    dcc.Graph(id='display-selected-plots')
])

# Callbacks are functions that Dash calls automatically whenever an input component changes.
# They can be chained: one callback's output can be another callback's input.

# 1st callback: takes the selected category (e.g. 'Stem') and returns its degrees as radio options

@app1.callback(
    Output('degrees-radio', 'options'),
    Input('categories-radio', 'value')
)
def set_degrees_options(selected_category):
    return [{'label': i, 'value': i} for i in all_options[selected_category]]

# 2nd callback: when the degree options change, selects the first degree by default

@app1.callback(
    Output('degrees-radio', 'value'),
    Input('degrees-radio', 'options')
)
def set_degrees_value(available_options):
    return available_options[0]['value']

# 3rd callback: takes the selected degree and returns its line chart

@app1.callback(
    Output('display-selected-plots', 'figure'),
    Input('categories-radio', 'value'),
    Input('degrees-radio', 'value')
)
def set_display_plots(selected_category, selected_degree):
    return plot_degrees(df, selected_degree)

def plot_degrees(df, degree):
    fig = go.Figure()
    if degree in df.columns:
        fig.add_trace(go.Scatter(x=df['Year'], y=df[degree], mode='lines+markers', name=f'{degree} (Women)'))
        fig.add_trace(go.Scatter(x=df['Year'], y=df[degree + '-men'], mode='lines+markers', name=f'{degree} (Men)', line=dict(dash='dash')))

        fig.update_layout(
            title=f"Percentage of Bachelor's Degrees in {degree} by Gender",
            xaxis_title='Year',
            yaxis_title='Percentage of Degrees (%)',
            legend_title='Gender'
        )
    return fig

# Run the app inline in Jupyter (requires Dash 2.11 or later)
app1.run(jupyter_mode='inline')
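The first two callbacks form a chain. When `categories-radio` changes, Dash runs the first callback, which updates `degrees-radio.options`. That update in turn fires the second callback. The sketch below pulls that logic into plain functions so it can be checked without starting a server. The names `degree_options`, `default_degree`, and the shortened `ALL_OPTIONS` dictionary are hypothetical, not part of the notebook. The empty-list guard is also an addition: the notebook's `set_degrees_value` would raise `IndexError` on an empty list.

```python
from typing import Optional

# Shortened, hypothetical version of the notebook's all_options dictionary.
ALL_OPTIONS = {
    "Stem": ["Agriculture", "Architecture", "Biology"],
    "Humanities_and_Arts": ["Art and Performance", "English"],
}


def degree_options(selected_category: str) -> list:
    """First link in the chain: category -> radio options for its degrees."""
    return [{"label": d, "value": d} for d in ALL_OPTIONS[selected_category]]


def default_degree(available_options: list) -> Optional[str]:
    """Second link: options -> first degree, or None if the list is empty."""
    return available_options[0]["value"] if available_options else None


# Simulate the chain Dash runs when the user picks a category.
options = degree_options("Humanities_and_Arts")
print(default_degree(options))  # Art and Performance
```

With the logic in plain functions like these, the decorated callbacks can stay one-line wrappers.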


In [37]:
# Rank the average percentage for each degree in descending order, shown in an interactive bar chart with a slider

# Average each column over all years. iloc selects by position: ':' keeps all rows, '1:' skips the first column, 'Year'
means = df.iloc[:, 1:].mean(axis=0)
sorted_means = means.sort_values(ascending=False)

# Women's averages: '~' negates the mask, keeping columns without the '-men' suffix
women_averages = sorted_means[~sorted_means.index.str.contains('-men')]

# Men's averages: columns with the '-men' suffix
men_averages = sorted_means[sorted_means.index.str.contains('-men')]

# Create the second app: ranked bar chart
app2 = Dash(__name__)

app2.layout = html.Div([
    html.H4('Interactive plot'),
    dcc.Graph(id="graph"),
    html.P("Number of bars:"),
    dcc.Slider(id="slider", min=2, max=len(women_averages), value=4, step=1),
])

# Callback: the slider value sets how many bars are shown for each gender
@app2.callback(
    Output("graph", "figure"),
    Input("slider", "value"))
def update_bar_chart(size):
    women_data = women_averages.round(2).iloc[:size]
    men_data = men_averages.round(2).iloc[:size]
    # Remove the '-men' suffix from the sliced names so labels match the bar values
    men_trimmed_names = [name.replace('-men', '') for name in men_data.index]

    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=women_data.index, y=women_data, name='Women',
        marker=dict(color='#8FFFC2'), text=women_data,
        hovertemplate='Degree: %{x}<extra></extra>'
    ))
    fig.add_trace(go.Bar(
        x=men_trimmed_names, y=men_data, name='Men',
        marker=dict(color='#8FD6FF'), text=men_data,
        hovertemplate='Degree: %{x}<extra></extra>'
    ))

    fig.update_layout(
        title="Average percentage of degrees earned by women and men, ranked",
        barmode='group'  # Group bars together
    )

    return fig

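# Run the second app inline in Jupyter on a separate port so it doesn't conflict with app1 (requires Dash 2.11 or later)
app2.run(jupyter_mode='inline', port=8051)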
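The ranking in this cell relies on two details worth checking in isolation. First, the men's degree names must be trimmed from the *sliced* series, so the bar labels line up with the bar heights. Second, each men's value is 100 minus the women's, so the men's ranking is the women's ranking reversed. The sketch below uses made-up averages and a hypothetical helper `top_n`. It mirrors the slicing in `update_bar_chart` but is not the notebook's code. It matches the suffix with `str.endswith` rather than `str.contains`, and `str.removesuffix` needs Python 3.9 or later.

```python
import pandas as pd

# Made-up averages (percent earned by women) for four degrees; illustrative only.
women = pd.Series({"Education": 75.0, "Biology": 45.0, "Engineering": 10.0, "English": 65.0})
# Men's averages are 100 minus women's; add_suffix tags their labels with '-men'.
means = pd.concat([women, (100 - women).add_suffix("-men")])

ranked = means.sort_values(ascending=False)
is_men = ranked.index.str.endswith("-men")
women_ranked = ranked[~is_men]
men_ranked = ranked[is_men]


def top_n(size: int) -> tuple:
    """Return women's top `size` averages and men's top `size` degree names."""
    men_top = men_ranked.iloc[:size]
    # Trim names from the sliced series so labels and values stay aligned.
    men_names = [name.removesuffix("-men") for name in men_top.index]
    return women_ranked.iloc[:size], men_names


women_top, men_names = top_n(2)
print(list(women_top.index), men_names)
```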


### Conclusion: 

In the line charts, a few degrees stood out to me. In Biology, Communications and Journalism, and Psychology, the share of degrees earned by women rises over the period while the share earned by men falls. Because the men's share is computed as 100 minus the women's share, the two lines always mirror each other. What stands out in these fields is how large the shift toward women is.

Averaged over all years, women earn the largest share of degrees in Health Professions, Education, and Public Administration. Men earn the largest share in Engineering, Computer Science, and Physical Sciences.

There are some interesting observations to be made from these interactive graphs. Feel free to give them a try and leave any feedback. Enjoy!