In [1]:
import altair as alt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # shorthand

In [2]:
brush = alt.selection_interval(encodings=['x','y'])

### Preprocess the data

In [3]:
data_url = 'https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/licenses_fall2022.csv'
licenses = pd.read_csv(data_url, na_values="0")  # treat "0" entries as missing values

In [4]:
licenses

Unnamed: 0,_id,License Type,Description,License Number,License Status,Business,Title,First Name,Middle,Last Name,...,Specialty/Qualifier,Controlled Substance Schedule,Delegated Controlled Substance Schedule,Ever Disciplined,LastModifiedDate,Case Number,Action,Discipline Start Date,Discipline End Date,Discipline Reason
0,1189509,DETECTIVE BOARD,PERMANENT EMPLOYEE REGISTRATION,129446286,NOT RENEWED,N,,EILEEN,,SANTACRUZ,...,,,,N,03/18/2022,,,,,
1,801037,DETECTIVE BOARD,FIREARM CONTROL CARD,229030294.0,NOT RENEWED,N,,DAGMAR,J,NORDLUND,...,,,,N,08/16/2006,,,,,
2,365129,COSMO,LICENSED COSMETOLOGIST,11053076.0,NOT RENEWED,N,,RADOJE,,ZELENOVIC,...,,,,N,05/26/2006,,,,,
3,595427,COSMO,LICENSED COSMETOLOGIST,11295645.0,ACTIVE,N,,BECKY SUE,L,BURROUGHS,...,,,,N,11/12/2021,,,,,
4,653668,COSMO,LICENSED NAIL TECHNICIAN,169006247,NOT RENEWED,N,,BILL G,L,LETNER,...,,,,N,05/30/2006,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,888281,DETECTIVE BOARD,PERMANENT EMPLOYEE REGISTRATION,129002843.0,NOT RENEWED,N,,JENNIFER,,DARROW,...,,,,N,08/03/2006,,,,,
9996,766623,DETECTIVE BOARD,FIREARM CONTROL CARD,229014180,TERMINATED CARD RETURNED,N,,BRYAN,,WILLIAMS,...,,,,N,08/07/2006,,,,,
9997,399398,COSMO,LICENSED COSMETOLOGIST,11120249,NOT RENEWED,N,,EUGENE,,HENDERSON JR,...,,,,N,05/26/2006,,,,,
9998,486713,COSMO,LICENSED COSMETOLOGIST,11193270,ACTIVE,N,,MAHLON DOUGLAS,,CLIFT,...,,,,N,12/17/2021,,,,,


### Disable the row limit with data_transformers.disable_max_rows

In [5]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [7]:
rect1 = alt.Chart.from_dict({
  "data":{"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/licenses_fall2022.csv"},
  "mark":"bar",
  "height":400,
  "encoding":{
    "x":{"field": "License Type", "type": "ordinal"},
    "y":{"aggregate": "count", "title": "Count", "field": "_id", "type": "quantitative"}
  }
})
rect1

In [7]:
rect1.save('hw10_1.json')

#### The first graph explores the number of licenses of each License Type. The x-axis shows the name of each License Type, which is nominal data (the spec encodes it as ordinal), while the y-axis records the count of licenses of that type, which is quantitative data. We use a bar chart because it clearly and intuitively displays the number of licenses of each type.
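To make the count aggregate concrete, the following minimal sketch tallies records per License Type in plain Python. The five sample rows are hypothetical stand-ins, not rows from the real CSV; each bar height in the chart corresponds to the same kind of per-category tally.

```python
from collections import Counter

# Hypothetical sample records (not taken from the real dataset).
sample_rows = [
    {"_id": 1, "License Type": "COSMO"},
    {"_id": 2, "License Type": "DETECTIVE BOARD"},
    {"_id": 3, "License Type": "COSMO"},
    {"_id": 4, "License Type": "COSMO"},
    {"_id": 5, "License Type": "DETECTIVE BOARD"},
]

# One tally per category: the same grouping that {"aggregate": "count"} applies per bar.
counts_by_type = Counter(row["License Type"] for row in sample_rows)

for license_type, n in sorted(counts_by_type.items()):
    print(f"{license_type}: {n}")
```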
#### For this figure, we changed Altair's data transformer settings because a MaxRowsError appears when we try to create a chart that directly embeds a dataset with more than 5000 rows, so we simply disabled the row limit.
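The row limit concerns data embedded in the chart specification, whereas a specification that references the CSV by URL carries no rows at all. The sketch below illustrates this with a hypothetical helper, `count_bar_spec` (our own name, not part of Altair), that builds the same kind of dictionary this notebook passes to `alt.Chart.from_dict`, using a shared `DATA_URL` constant instead of repeating the literal URL.

```python
import json

# Same CSV location used throughout the notebook, split only for readability.
DATA_URL = (
    "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/"
    "is445_bcubcg_fall2022/main/data/licenses_fall2022.csv"
)


def count_bar_spec(category_field, url=DATA_URL, height=400):
    """Hypothetical helper: Vega-Lite-style dict for a bar chart of counts per category."""
    return {
        "data": {"url": url},  # the data is referenced by URL, not embedded
        "mark": "bar",
        "height": height,
        "encoding": {
            "x": {"field": category_field, "type": "ordinal"},
            "y": {"aggregate": "count", "title": "Count",
                  "field": "_id", "type": "quantitative"},
        },
    }


spec = count_bar_spec("License Type")
print(json.dumps(spec, indent=2))
# In the notebook, this dict could then be passed to alt.Chart.from_dict(spec).
```

Keeping the URL in one constant also means a change of data source only has to be made in one place.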

In [8]:
rect2 = alt.Chart.from_dict({
  "data":{"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/licenses_fall2022.csv"},
  "mark":"rect",
  "height":400,
  "width":300,
  "encoding":{
    "x":{"field":"License Status","type":"ordinal"},
    "y":{"field":"State", "type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
  }
}).add_selection(
    brush
)
rect2

In [9]:
hist2 = alt.Chart.from_dict({
  "data": {"url": "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/licenses_fall2022.csv"},
  "mark":"bar",
  "encoding": {
    "x": {"aggregate": "count", "field": "_id"},
    "y": {"field": "License Type","type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
  }
}).transform_filter(
    brush
)

In [10]:
chart = alt.HConcatChart(hconcat=[rect2,hist2])
chart.save('sidebyside.json')
dashboard = rect2 | hist2
dashboard

#### The heatmap shows how many licenses fall under each combination of state and status, and the bar chart shows how many licenses there are of each type. We used 'ordinal' as the encoding type because it makes it easy for the audience to find the category they are interested in. The heatmap uses a color gradient to show the number of licenses. The bar chart is based on plot 2 from Phoebe Ling's homework #9, but we changed its color encoding to match the heatmap's. Because the heatmap has no place to display the counts as numbers, we use color to express them. We are interested in all kinds of licenses rather than a specific category, so we did not filter the data in Python; we use the original dataset to make the plot.
#### We are not satisfied with knowing only the total number of licenses. We want to understand how the number of licenses varies across states and statuses. The interactive dashboard lets us see the number of licenses of each type for a specific state and status.
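The cross-filtering behind the dashboard can be sketched in plain Python: the brush picks a set of (License Status, State) cells in the heatmap, and the bar chart recounts License Types using only the rows that fall inside those cells. The rows, the brushed cell, and the helper `filter_by_brush` below are hypothetical; Altair performs this filtering inside the rendered chart, not in Python.

```python
from collections import Counter

# Hypothetical records: the field names match the dataset, the values are made up.
rows = [
    {"License Type": "COSMO", "License Status": "ACTIVE", "State": "IL"},
    {"License Type": "COSMO", "License Status": "NOT RENEWED", "State": "IL"},
    {"License Type": "DETECTIVE BOARD", "License Status": "ACTIVE", "State": "IL"},
    {"License Type": "COSMO", "License Status": "ACTIVE", "State": "IN"},
]


def filter_by_brush(records, statuses, states):
    """Keep records whose status AND state both fall inside the brushed region."""
    return [
        r for r in records
        if r["License Status"] in statuses and r["State"] in states
    ]


# Pretend the brush covers the single heatmap cell (ACTIVE, IL).
selected = filter_by_brush(rows, statuses={"ACTIVE"}, states={"IL"})

# The bar chart then counts License Types over the selected rows only.
counts = Counter(r["License Type"] for r in selected)
print(dict(counts))
```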