## 📘 Data Challenge 11 – Dash App with a Pie or Scatter Plot

Assignment Type: Partner/Group 
Estimated Time: 45–60 minutes

---
### 🎯 Targeted KSBs (Knowledge, Skills, and Behaviors)
S6 – Creates dynamic visualizations using Python (Plotly)

S7 – Designs clear dashboards using Dash

K10 – Chooses appropriate chart types for data and audience

B6 – Applies structured thinking to turn analysis into a dashboard app

---

### 📊 Scenario:
You’re helping an athletic director create a quick dashboard for just **five sports**. She wants to better understand how men’s and women’s revenue compare across a few programs. Your job is to make a simple Dash app with a pie chart or scatter plot showing the difference.

---
### ✅ Your Task (Step-by-Step)

- Load the sports.csv file using pandas.

- Drop missing values in the sports, rev_men, and rev_women columns.

- Pick 5 unique sports (your choice!) and filter the DataFrame to only include those.

    - Example: "Basketball", "Tennis", "Soccer", "Volleyball", "Golf"

- Create a new column called "Total_Revenue" by adding men’s and women’s revenue.

- Create either a pie chart or a scatter plot with Plotly Express and display it in a Dash app.

In [2]:
# Import packages 

import dash
from dash import html, dcc
import pandas as pd
import plotly.express as px

In [3]:
# Load and filter the data
df = pd.read_csv("../data/sports.csv")
df = df[["sports", "rev_men", "rev_women"]].dropna()
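Selecting the three columns before calling `dropna()` works here because only those columns remain. If other columns need to be kept, `dropna(subset=...)` restricts the missing-value check to the named columns. A minimal sketch on a small made-up DataFrame (the values and the `year` column are invented for illustration and are not taken from sports.csv):

```python
import pandas as pd

# Toy data standing in for sports.csv; every value here is made up.
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019],
    "sports": ["Golf", "Tennis", None, "Soccer"],
    "rev_men": [100.0, None, 50.0, 80.0],
    "rev_women": [90.0, 70.0, 40.0, 60.0],
})

# Drop a row only when one of the three named columns is missing.
clean = toy.dropna(subset=["sports", "rev_men", "rev_women"])

print(clean["sports"].tolist())  # ['Golf', 'Soccer']
```

The `year` column survives in `clean`, which is the main difference from selecting the three columns first.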


In [4]:
df["sports"].unique()

array(['Basketball', 'All Track Combined', 'Tennis', 'Golf', 'Soccer',
       'Lacrosse', 'Swimming and Diving', 'Track and Field, X-Country',
       'Track and Field, Indoor', 'Track and Field, Outdoor', 'Skiing',
       'Rodeo', 'Volleyball', 'Archery', 'Wrestling', 'Swimming',
       'Other Sports', 'Water Polo', 'Fencing', 'Gymnastics', 'Rowing',
       'Sailing', 'Ice Hockey', 'Squash', 'Bowling', 'Table Tennis',
       'Equestrian', 'Diving', 'Rifle', 'Beach Volleyball',
       'Weight Lifting'], dtype=object)

In [5]:
df.head()

| | sports | rev_men | rev_women |
|---|---|---|---|
| 1 | Basketball | 1211095.0 | 748833.0 |
| 2 | All Track Combined | 183333.0 | 315574.0 |
| 7 | Tennis | 78274.0 | 131145.0 |
| 11 | Basketball | 4189826.0 | 1966556.0 |
| 13 | Golf | 407728.0 | 346987.0 |


In [6]:
# Pick 5 sports
top5 = ["Basketball", "Tennis", "Golf", "Soccer", "Swimming"]
# Copy the filtered rows so later changes do not touch the original DataFrame
df_5 = df[df["sports"].isin(top5)].copy()

In [7]:
# Create a new column called Total_Revenue that adds men's and women's revenue row by row
# (calling .sum() on each column would put the same grand total in every row)
df_5["Total_Revenue"] = df_5["rev_men"] + df_5["rev_women"]

In [8]:
df_5.head()

| | sports | rev_men | rev_women | Total_Revenue |
|---|---|---|---|---|
| 1 | Basketball | 1211095.0 | 748833.0 | 1959928.0 |
| 7 | Tennis | 78274.0 | 131145.0 | 209419.0 |
| 11 | Basketball | 4189826.0 | 1966556.0 | 6156382.0 |
| 13 | Golf | 407728.0 | 346987.0 | 754715.0 |
| 15 | Soccer | 1062855.0 | 944819.0 | 2007674.0 |
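A sport can appear in many rows (Basketball shows up twice in the first five alone). `px.pie` groups rows that share the same `names` label into one sector, so each slice reflects a per-sport sum. A minimal sketch of that aggregation, using only the five rows shown in the output above rather than the full dataset:

```python
import pandas as pd

# The five rows shown in the df_5.head() output (not the full dataset).
sample = pd.DataFrame({
    "sports": ["Basketball", "Tennis", "Basketball", "Golf", "Soccer"],
    "rev_men": [1211095.0, 78274.0, 4189826.0, 407728.0, 1062855.0],
    "rev_women": [748833.0, 131145.0, 1966556.0, 346987.0, 944819.0],
})
sample["Total_Revenue"] = sample["rev_men"] + sample["rev_women"]

# One row per sport, with revenue summed across all of that sport's rows.
by_sport = sample.groupby("sports", as_index=False)["Total_Revenue"].sum()
print(by_sport)
```

Passing a pre-aggregated frame like `by_sport` to the chart is optional, but it makes the numbers behind each slice easy to inspect.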


In [None]:
# Make your pie chart or scatter plot using Plotly
fig = px.pie(df_5, names="sports", values="Total_Revenue", title="Total Revenue by Sport")
fig.show()


In [12]:
# Make the app. DO NOT RUN THIS CELL YET: it may give you a "port already in use" error if you do.

app = dash.Dash(__name__)
app.title = "My Dash App"

app.layout = html.Div([
    html.H1("Revenue Analysis for 5 Sports", style={'textAlign': 'center',
                                                    "color":"blue"}),
    dcc.Graph(figure=fig)
])

if __name__ == '__main__':
    app.run(debug=True)

### Copy and paste the code in this notebook into a file called `app.py` and run that file. Then go to your localhost address, http://localhost:8050/, to see the updated visual.