In [10]:
import pandas as pd
import altair as alt

# Lift Altair's default 5,000-row limit on embedded datasets
alt.data_transformers.disable_max_rows()

url = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/licenses_fall2022.csv"
df = pd.read_csv(url)


df.head()

Unnamed: 0,_id,License Type,Description,License Number,License Status,Business,Title,First Name,Middle,Last Name,...,Specialty/Qualifier,Controlled Substance Schedule,Delegated Controlled Substance Schedule,Ever Disciplined,LastModifiedDate,Case Number,Action,Discipline Start Date,Discipline End Date,Discipline Reason
0,1189509,DETECTIVE BOARD,PERMANENT EMPLOYEE REGISTRATION,129446286.0,NOT RENEWED,N,,EILEEN,,SANTACRUZ,...,,,,N,03/18/2022,,,,,
1,801037,DETECTIVE BOARD,FIREARM CONTROL CARD,229030294.0,NOT RENEWED,N,,DAGMAR,J,NORDLUND,...,,,,N,08/16/2006,,,,,
2,365129,COSMO,LICENSED COSMETOLOGIST,11053076.0,NOT RENEWED,N,,RADOJE,,ZELENOVIC,...,,,,N,05/26/2006,,,,,
3,595427,COSMO,LICENSED COSMETOLOGIST,11295645.0,ACTIVE,N,,BECKY SUE,L,BURROUGHS,...,,,,N,11/12/2021,,,,,
4,653668,COSMO,LICENSED NAIL TECHNICIAN,169006247.0,NOT RENEWED,N,,BILL G,L,LETNER,...,,,,N,05/30/2006,,,,,


In [11]:
df.columns


Index(['_id', 'License Type', 'Description', 'License Number',
       'License Status', 'Business', 'Title', 'First Name', 'Middle',
       'Last Name', 'Prefix', 'Suffix', 'Business Name', 'BusinessDBA',
       'Original Issue Date', 'Effective Date', 'Expiration Date', 'City',
       'State', 'Zip', 'County', 'Specialty/Qualifier',
       'Controlled Substance Schedule',
       'Delegated Controlled Substance Schedule', 'Ever Disciplined',
       'LastModifiedDate', 'Case Number', 'Action', 'Discipline Start Date',
       'Discipline End Date', 'Discipline Reason'],
      dtype='object')

In [15]:
# Count records for every (License Type, License Status) combination
license_counts = df.groupby(['License Type', 'License Status']).size().reset_index(name='Count')

chart1 = alt.Chart(license_counts).mark_bar().encode(
    x=alt.X('License Type:N', sort='-y', axis=alt.Axis(labelAngle=-45)),
    y='Count:Q',
    color='License Status:N',
    tooltip=['License Type', 'License Status', 'Count']
).properties(
    width=600,
    height=400,
    title='License Distribution by Type and Status'
)

chart1.save('chart1.html')
chart1

## Plot 1 Write-Up

This first chart shows how many licenses belong to each license type and how those licenses break down by status. I used a stacked bar chart because bar heights make differences between license types easy to compare. The x-axis lists the license types, sorted by descending count, and the y-axis shows the number of licenses in each group. Color encodes license status, so each bar shows how many licenses of that type are Active, Not Renewed, or in another status, and the tooltip gives the exact count for each segment. Before making the chart, I grouped the data by both License Type and License Status and counted the records in each combination, so the chart plots one pre-computed count per type and status pair.
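
The grouping step described above can be checked on a small made-up table. This is a minimal sketch, assuming pandas is installed; the `toy` DataFrame and its five rows are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real data: five invented license records.
toy = pd.DataFrame({
    "License Type": ["COSMO", "COSMO", "COSMO", "DETECTIVE BOARD", "DETECTIVE BOARD"],
    "License Status": ["ACTIVE", "NOT RENEWED", "NOT RENEWED", "NOT RENEWED", "ACTIVE"],
})

# Same pattern as the notebook cell: one row per (type, status) pair with its count.
toy_counts = (
    toy.groupby(["License Type", "License Status"])
    .size()
    .reset_index(name="Count")
)

# With no missing values in the grouping columns, the counts add up to the row total.
total_matches = toy_counts["Count"].sum() == len(toy)
print(toy_counts)
print("Counts sum to number of rows:", total_matches)
```

Comparing the summed counts against `len(df)` is a quick sanity check on the real data as well. If the two numbers differ, some rows likely have a missing License Type or License Status, because `groupby` leaves those rows out by default.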



In [18]:
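# Count licenses per county, then keep the 15 counties with the most licenses
county_counts = df.groupby('County').size().reset_index(name='Count').nlargest(15, 'Count')

# Point selection keyed on County, so clicking a bar selects that county
selection = alt.selection_point(fields=['County'])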

chart2 = alt.Chart(county_counts).mark_bar().encode(
    x=alt.X('Count:Q'),
    y=alt.Y('County:N', sort='-x'),
    color=alt.condition(selection, alt.value('steelblue'), alt.value('lightgray')),
    tooltip=['County', 'Count']
).add_params(
    selection
).properties(
    width=600,
    height=400,
    title='Top 15 Counties by Number of Licenses (Click to Highlight)'
)

chart2.save('chart2.html')
chart2

## Plot 2 Write-Up

The second chart shows the 15 counties with the most licenses. I used a horizontal bar chart because it leaves room for the county names and makes it easy to compare which counties have the most licenses. The x-axis shows the number of licenses, and the y-axis lists the counties, sorted from the highest count to the lowest. Clicking a bar changes the colors so that the selected county stands out, and hovering over a bar shows a tooltip with the exact count. Before plotting, I grouped the data by County, counted the number of licenses in each one, and then kept the 15 counties with the highest totals.
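
The count-then-keep-the-largest step can be sketched on invented data. This is a minimal illustration, assuming pandas is installed; the `toy` records are hypothetical, and it keeps the top 2 counties rather than 15 so the result is easy to read.

```python
import pandas as pd

# Hypothetical records; None marks a license with no county on file.
toy = pd.DataFrame({
    "County": ["COOK", "COOK", "COOK", "DUPAGE", "DUPAGE", "LAKE", None],
})

# Count licenses per county. groupby skips missing keys by default (dropna=True),
# so the record without a county does not appear in the result.
toy_county_counts = toy.groupby("County").size().reset_index(name="Count")

# Keep only the 2 counties with the largest counts (the notebook keeps 15).
top2 = toy_county_counts.nlargest(2, "Count")
print(top2)
```

One consequence worth keeping in mind: licenses with no County value are not counted anywhere in the chart. Passing `dropna=False` to `groupby` would keep them as their own group if that were wanted.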

The interactivity in my second chart lets you click on a county to highlight it. This makes the visualization clearer because it focuses your attention on the county you want to examine instead of showing all bars equally. The selected bar stays blue while the others fade to gray, which makes its value easier to read against the rest. Because the interaction acts directly on the bars themselves, the reader can pick out a specific county without any extra controls.