# Final Project, Part 2

The purpose of this assignment is to create a "Viz for Experts" with an interactive dashboard interface for exploring your data.

For this submission option, you will submit your work through this Workspace.
    
**Please see Homework Prompt in PrairieLearn interface for more details on the requirements for this assignment.**

A rough outline of elements of code and write-up is shown below:

In [2]:
import pandas as pd
import numpy as np


data_1 = pd.read_csv("Lobbyist_Entity_Client_Data_Daily_20251203.csv")
data_2 = pd.read_csv("Active_Lobbying_Entities_and_Their_Clients_20251203.csv")

df = pd.merge(data_1, data_2, how = "left", on = "CLIENT_ID")


df["ENT_REG_YEAR"] = df["ENT_REG_YEAR_x"]
df["CLIENT_NAME"] = df["CLIENT_NAME_x"]


df = df.drop(columns = ["CLIENT_NAME_y", "ENT_REG_YEAR_y", "LOBBYIST_LNAME", "LOBBYIST_FNAME", "LOBBYIST_MNAME", "LOBBYIST_PHONE", "ENT_ID",
                       "ENT_NAME", "CLIENT_NAME_x", "ENT_REG_YEAR_x", "LOBBYIST_ADDR1", "LOBBYIST_ADDR2", "LOBBYIST_ST_ABBR",
                       "LOBBYIST_ZIP", "ENT_ADDR1", "ENT_ADDR2", "ENT_ST_ABBR", "ENT_ZIP", "CLIENT_ADDR1", "CLIENT_ADDR2",
                       "CLIENT_ST_ABBR", "CLIENT_ZIP"])

# Map misspelled or abbreviated city names to their canonical spelling
city_corrections = {
    "CHICAG": "CHICAGO",
    "CHICAGO,": "CHICAGO",
    "CHICGO": "CHICAGO",
    "FT LAUDERDALE": "FORT LAUDERDALE",
    "FT. LAUDERDALE": "FORT LAUDERDALE",
    "HICKORY  HILLS": "HICKORY HILLS",
}

df["CLIENT_CITY"] = df["CLIENT_CITY"].replace(city_corrections)

df = df[df["CLIENT_STATUS"] == "ACTIVE"]
df.head(20)

|  | LOBBYIST_ID | LOBBYIST_EMAIL | LOBBYIST_CITY | LOBBYIST_STATUS | ENT_CITY | CLIENT_ID | CLIENT_CITY | CLIENT_STATUS | ENTITY_ID | ENTITY_NAME | ENT_REG_YEAR | CLIENT_NAME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 97 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 44.0 | ALL-CIRCO, INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 98 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 6656.0 | JOHN J. MILLNER AND ASSOCIATES INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 99 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 5295.0 | LIZ BROWN-REEVES CONSULTING | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 100 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 44.0 | ALL-CIRCO, INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 101 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 6656.0 | JOHN J. MILLNER AND ASSOCIATES INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 102 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 5295.0 | LIZ BROWN-REEVES CONSULTING | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 103 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 44.0 | ALL-CIRCO, INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 104 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 6656.0 | JOHN J. MILLNER AND ASSOCIATES INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 105 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 5295.0 | LIZ BROWN-REEVES CONSULTING | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |
| 106 | 799 | JKELLY@ALLCIRCO.COM | OAK BROOK | ACTIVE | OAK BROOK | 1022 | SPRINGFIELD | ACTIVE | 44.0 | ALL-CIRCO, INC. | 2021 | MARQUARDT, ROGER C. & COMPANY, INC. |


## Code:

 * An interactive dashboard within your Workspace that helps an expert explore your dataset thoroughly.
 * There should be a "dashboard" type aspect to this - i.e. a linked view exploring your dataset in an interactive way (like in Lab \#4) with [bqplot](https://bqplot.github.io/bqplot/).
 * Do not delete any cells, *just comment them out*. Show your work.



In [3]:
import bqplot as bq
from ipywidgets import interact
import ipywidgets as widgets

city_list = sorted(df["CLIENT_CITY"].dropna().unique())

@interact(
    year = sorted(df["ENT_REG_YEAR"].dropna().unique()),
    city = widgets.Dropdown(options = city_list, description = "City:", value="CHICAGO"))
def top_5_entities(year, city):
    subset = df[df["ENT_REG_YEAR"] == year]
    subset = subset[subset["CLIENT_CITY"] == city]
    grouped = (subset.groupby(["ENTITY_ID", "ENTITY_NAME"])["CLIENT_NAME"]
        .nunique()
        .reset_index()
        .sort_values("CLIENT_NAME", ascending=False)
        .head(5)
    )

    x_sc = bq.OrdinalScale()
    y_sc = bq.LinearScale()

    bars = bq.Bars(x = grouped["ENTITY_NAME"].tolist(), y = grouped["CLIENT_NAME"].tolist(), scales = {"x": x_sc, "y": y_sc})

    x_ax = bq.Axis(scale = x_sc, tick_rotate = 360, label = "Entities")
    y_ax = bq.Axis(scale = y_sc, orientation = "vertical", label = "Client Number")

    fig = bq.Figure(marks = [bars], axes = [x_ax, y_ax], title = f"Top 5 Entities by Number of Clients in {city}, {year}")
    fig.layout.width = "1500px"
    fig.layout.height = "500px"

    display(fig)


interactive(children=(Dropdown(description='year', options=(2021, 2022, 2023, 2024, 2025), value=2021), Dropdo…

In [9]:
import bqplot as bq
import ipywidgets as widgets
from ipywidgets import interact
import numpy as np
import json
from ipywidgets.embed import dependency_state

city_list = sorted(df["CLIENT_CITY"].dropna().unique())

@interact(
    Year = sorted(df["ENT_REG_YEAR"].dropna().unique()),
    City = widgets.Dropdown(options = city_list, description = "City:", value = "CHICAGO"))
def top_5_entities(Year, City):
    subset = df[(df["ENT_REG_YEAR"] == Year) & (df["CLIENT_CITY"] == City)]
    grouped = (subset.groupby(["ENTITY_ID", "ENTITY_NAME"])["CLIENT_NAME"]
               .nunique()
               .reset_index()
               .sort_values("CLIENT_NAME", ascending=False)
               .head(5))

    x_sc = bq.OrdinalScale()
    y_sc = bq.LinearScale()

    bars = bq.Bars(
        x = grouped["ENTITY_NAME"].tolist(),
        y = grouped["CLIENT_NAME"].tolist(),
        scales = {"x": x_sc, "y": y_sc},
        interactions = {"click": "select"},
        selected_style = {"fill": "green"}
    )

    x_ax = bq.Axis(scale = x_sc, tick_rotate = 360, label="Entities")
    y_ax = bq.Axis(scale = y_sc, orientation = "vertical", label="Client Count")

    main_fig = bq.Figure(
        marks = [bars], 
        axes = [x_ax, y_ax],
        title = f"Top 5 Entities in {City} {Year}"
    )
    main_fig.layout.width = "800px"
    main_fig.layout.height = "400px"

    years_x_sc = bq.OrdinalScale()
    years_y_sc = bq.LinearScale()

    detail_bars = bq.Bars(x = [], y = [], scales = {"x": years_x_sc, "y": years_y_sc})

    detail_x_ax = bq.Axis(scale = years_x_sc, tick_rotate = 360, label = "Year")
    detail_y_ax = bq.Axis(scale = years_y_sc, orientation = "vertical", label = "Number of Clients")

    years_bar = bq.Figure(marks = [detail_bars],axes = [detail_x_ax, detail_y_ax])
    years_bar.layout.width = "800px"
    years_bar.layout.height = "400px"

    def on_selection(change):
        sel = change["new"]
        if not sel:
            return
        selected_bar = sel[0]
        
        entity = grouped.iloc[selected_bar]["ENTITY_NAME"]

        detail_df = (df[df["ENTITY_NAME"] == entity]
                     .groupby("ENT_REG_YEAR")["CLIENT_NAME"]
                     .nunique()
                     .reset_index(name = "count")
                     .sort_values("ENT_REG_YEAR"))

        detail_bars.x = detail_df["ENT_REG_YEAR"].astype(str).tolist()
        detail_bars.y = detail_df["count"].tolist()

        years_bar.title = f"{entity}'s clients"

    bars.observe(on_selection, names=["selected"])

    display(widgets.HBox([main_fig, years_bar]))

    widget_state = dependency_state(widgets.HBox([main_fig, years_bar]))
    with open("interactive_graph.json", "w") as f:
        json.dump(widget_state, f)
    print("interactive_graph.json saved!")

interactive(children=(Dropdown(description='Year', options=(2021, 2022, 2023, 2024, 2025), value=2021), Dropdo…

## Prose:

* One paragraph explaining how to use the dashboard you created, to help someone who is not an expert understand your dataset.
* A list of 1 or more contextual datasets you have identified, links to where they reside, and a sentence about why they might be useful in telling the final story.
  * A "contextual dataset" here means a dataset that would add context to your chosen dataset. For example, if your dataset is the Champaign bus routes, some interesting contextual datasets could be the Chicago bus routes, the Springfield bus routes, or the Amtrak routes in Champaign.
  * You do not have to do anything with this dataset at the moment beyond writing a bit about why it would be useful. Looking forward, you will want to include "contextual visualizations" (which you may or may not generate on your own) in your Final Project, Part 3, and identifying a possibly useful dataset is a great way to start looking for contextual visualizations.
* If you have identified your dataset as a "large one" (i.e. larger than the GitHub file upload limit), comment on whether you want to revise your plan for hosting this data. If this does not apply to your dataset, please explicitly state this.
* Additionally, please note that as of writing, it is not possible to embed images within Starboard. Be sure to address how you plan on including your contextual dataset to add context to your main dataset, given that you won't be able to directly embed images if you plan on using Starboard for Part 3.1 of the Final Project.


# Group: Jaylin Chen and York Li
# ------------------------------------
# For this plot, we used two datasets from the Illinois government data website. The first dataset is about
# lobbying entities in the state of Illinois; the second is about their clients. We joined both datasets on
# the client key. After extensive data cleaning, such as dropping extra columns and renaming others,
# we ended up with a dataset of companies, the entities that lobby for them, and the number of lobbyists they have.
# Grouping by company name, we were able to see how many lobbying entities and lobbyists there are for each company.
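
The join-then-count workflow described above can be sketched on toy data. The two frames and their values are made up for illustration; only the column names (`CLIENT_ID`, `CLIENT_NAME`, `ENTITY_NAME`, `LOBBYIST_ID`) mirror the notebook. The sketch assumes the join key is not unique on either side, in which case a left merge repeats rows, so the counts use `nunique` rather than a plain row count.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the notebook.
lobbyists = pd.DataFrame({
    "LOBBYIST_ID": [1, 2, 3],
    "CLIENT_ID": [10, 10, 20],
    "CLIENT_NAME": ["ACME", "ACME", "BETA"],
})
entities = pd.DataFrame({
    "CLIENT_ID": [10, 10, 20],
    "ENTITY_NAME": ["FIRM A", "FIRM B", "FIRM A"],
})

# Left merge on a non-unique key: each lobbyist row is repeated
# once per matching entity row.
merged = pd.merge(lobbyists, entities, how="left", on="CLIENT_ID")

# Count distinct entities and lobbyists per client; nunique is not
# inflated by the repeated rows.
summary = (
    merged.groupby("CLIENT_NAME")
    .agg(entity_count=("ENTITY_NAME", "nunique"),
         lobbyist_count=("LOBBYIST_ID", "nunique"))
    .reset_index()
)

print(len(merged))
print(summary)
```

Comparing `len(merged)` with `len(lobbyists)` is a quick way to see how much the merge multiplied the rows.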

## Plot Summary

Summarize the characteristics of the dataset in words: what does it represent, what are the fields/columns/rows, what data types are they, etc.
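
One way to gather these facts is a per-column summary table. The following is a minimal sketch on a made-up frame: `describe_columns` is a hypothetical helper, not part of the notebook, and the sample rows are invented (only the column names match `df`).

```python
import pandas as pd

def describe_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, non-null count, distinct-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "non_null": frame.notna().sum(),
        "unique": frame.nunique(),  # nunique ignores missing values
    })

# Hypothetical sample rows; column names mirror the notebook's df.
sample = pd.DataFrame({
    "CLIENT_CITY": ["CHICAGO", "SPRINGFIELD", None],
    "ENT_REG_YEAR": [2021, 2022, 2022],
})

col_summary = describe_columns(sample)
print(sample.shape)   # (rows, columns)
print(col_summary)
```

Calling `describe_columns(df)` on the merged frame would give the row and column facts the summary asks for in one table.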

In [5]:
#df["CLIENT_NAME"].nunique()

In [6]:
#client_count = df.groupby("CLIENT_NAME").size().reset_index(name = "lobbist_count").sort_values("lobbist_count", 
                                                                                                  #ascending = False)

# We made two bar graphs. In the main graph (left), the x-axis lists the top five entities for the selected
# city and year, and the y-axis shows each entity's client count. The graphs are linked: selecting a bar in the
# main graph makes the second graph (right) display that entity's client count for each year from 2021 to 2025.
# The x-axis is qualitative (categorical), while the y-axis is quantitative.
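
The link between the two graphs comes down to mapping the selected bar index back to an entity name and recounting that entity's clients per year. Below is a minimal sketch of that step without the widgets. `clients_per_year` is a hypothetical helper and the rows are made up, but the grouping mirrors the `on_selection` callback in the dashboard cell.

```python
import pandas as pd

def clients_per_year(frame, grouped, selected):
    """Map a list of selected bar indices (or None) to per-year client counts."""
    if not selected:
        # Nothing selected: return an empty result with the expected columns.
        return pd.DataFrame(columns=["ENT_REG_YEAR", "count"])
    # The bar order matches the row order of `grouped`, so the index maps to a name.
    entity = grouped.iloc[selected[0]]["ENTITY_NAME"]
    return (frame[frame["ENTITY_NAME"] == entity]
            .groupby("ENT_REG_YEAR")["CLIENT_NAME"]
            .nunique()
            .reset_index(name="count")
            .sort_values("ENT_REG_YEAR"))

# Hypothetical rows; column names mirror the notebook's df.
toy = pd.DataFrame({
    "ENTITY_NAME": ["FIRM A", "FIRM A", "FIRM A", "FIRM B"],
    "CLIENT_NAME": ["ACME", "BETA", "ACME", "ACME"],
    "ENT_REG_YEAR": [2021, 2021, 2022, 2021],
})

# Entities ranked by distinct client count, as in the main bar graph.
top = (toy.groupby("ENTITY_NAME")["CLIENT_NAME"]
       .nunique()
       .reset_index()
       .sort_values("CLIENT_NAME", ascending=False))

detail = clients_per_year(toy, top, [0])  # simulate clicking the first bar
print(detail)
```

Keeping this logic in a plain function makes it possible to check the counts without rendering the figures.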

In [8]:
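# Commented out: main_fig is created inside top_5_entities, so it is not defined at
# notebook scope and this cell raised "NameError: name 'main_fig' is not defined".
# The widget state is saved to interactive_graph.json inside top_5_entities instead.
#import json
#from ipywidgets.embed import embed_minimal_html, dependency_state
#
## Save widget state to JSON
#state = dependency_state(main_fig)
#with open("interactive_graph.json", "w") as f:
#    json.dump(state, f)
#
#print("Saved → interactive_graph.json")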