Link to the dataset used in this project:
https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P1-US-Cities-Population.csv

In [354]:
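import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import bqplot      # interactive figures (Scatter, GridHeatMap)
import ipywidgets  # HBox layout for the linked figures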

In [355]:
df = pd.read_csv('P1-US-Cities-Population.csv', encoding = 'latin1')

In [356]:
# Full state name -> two-letter postal abbreviation
state_abbr_dict = {
    'New York': 'NY', 'California': 'CA', 'Illinois': 'IL', 'Texas': 'TX', 'Pennsylvania': 'PA',
    'Arizona': 'AZ', 'Florida': 'FL', 'Indiana': 'IN', 'Ohio': 'OH', 'North Carolina': 'NC',
    'Washington': 'WA', 'Colorado': 'CO', 'Michigan': 'MI', 'District of Columbia': 'DC', 'Massachusetts': 'MA',
    'Tennessee': 'TN', 'Oregon': 'OR', 'Oklahoma': 'OK', 'Nevada': 'NV', 'Maryland': 'MD', 'Kentucky': 'KY',
    'Wisconsin': 'WI', 'New Mexico': 'NM', 'Missouri': 'MO', 'Georgia': 'GA', 'Virginia': 'VA', 'Nebraska': 'NE',
    'Minnesota': 'MN', 'Kansas': 'KS', 'Louisiana': 'LA', "Hawai'i": 'HI', 'Alaska': 'AK', 'New Jersey': 'NJ',
    'Idaho': 'ID', 'Alabama': 'AL', 'Iowa': 'IA', 'Arkansas': 'AR', 'Utah': 'UT', 'Rhode Island': 'RI', 'South Dakota': 'SD',
    'Mississippi': 'MS', 'Connecticut': 'CT', 'South Carolina': 'SC', 'North Dakota': 'ND', 'Montana': 'MT', 'New Hampshire': 'NH',
}

In [357]:
df['State_ABBR'] = df['State'].map(state_abbr_dict)
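`Series.map` with a dict returns NaN for any key that is not in the dict, so a state whose name is missing from `state_abbr_dict` (or spelled differently, e.g. "Hawaii" versus "Hawai'i") silently gets no abbreviation. A minimal check, shown on a small made-up frame (`demo`, `demo_abbr` are hypothetical names chosen so the notebook's own variables are not overwritten):

```python
import pandas as pd

# Made-up rows; 'Hawaii' is deliberately spelled differently from the dict key
demo_abbr = {'New York': 'NY', 'California': 'CA', "Hawai'i": 'HI'}
demo = pd.DataFrame({'State': ['New York', 'California', 'Hawaii']})

demo['State_ABBR'] = demo['State'].map(demo_abbr)

# State names that had no dictionary entry come back as NaN abbreviations
unmapped = sorted(demo.loc[demo['State_ABBR'].isna(), 'State'].unique())
print(unmapped)  # ['Hawaii']
```

Running the same `isna()` filter on the real `df` would list any state names that still need a dictionary entry.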

In [358]:
# 'Location' holds "<lat>°N <long>°W"; split once and keep both halves
location_parts = df['Location'].str.split(' ')
df['latitude'] = location_parts.str[0]
df['longitude'] = location_parts.str[1]

In [359]:
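# Drop the hemisphere suffix and convert to float
df['latitude'] = df['latitude'].apply(lambda x: x[:x.find('°N')]).astype('float')
# Store western longitudes as negative numbers so that west is plotted on the left
df['longitude'] = -df['longitude'].apply(lambda x: x[:x.find('°W')]).astype('float')
# Keep the number before the first 'p' of the unit label and drop thousands separators
df['2010 population density'] = df['2010 population density'].apply(
    lambda x: x[:x.find('p')].strip().replace(',', '')).astype('float')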
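Slicing up to `str.find` has a quiet failure mode: when the suffix is missing, `find` returns -1 and the slice just drops the last character. A regex-based alternative with `Series.str.extract` turns non-matching rows into NaN instead, which is easier to spot. This is a sketch on made-up rows; the formats `"<lat>°N <long>°W"` and `"<number> per sq mi"` are assumptions inferred from the slicing code above, and western longitudes are stored as negative numbers (the usual sign convention).

```python
import pandas as pd

# Made-up rows in the assumed formats
demo = pd.DataFrame({
    'Location': ['40.66°N 73.94°W', '34.02°N 118.41°W'],
    '2010 population density': ['27,012 per sq mi', '8,092 per sq mi'],
})

# Two capture groups -> a two-column frame (columns 0 and 1); no match -> NaN
coords = demo['Location'].str.extract(r'([\d.]+)°N\s+([\d.]+)°W').astype(float)
demo['latitude'] = coords[0]
demo['longitude'] = -coords[1]  # west of Greenwich -> negative longitude

# First run of digits, commas and dots, then remove the thousands separators
demo['2010 population density'] = (
    demo['2010 population density']
    .str.extract(r'([\d,.]+)')[0]
    .str.replace(',', '', regex=False)
    .astype(float)
)
print(demo[['latitude', 'longitude', '2010 population density']])
```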

In [360]:
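# 10 edges per axis -> a 9 x 9 grid of cells
lat_bins = np.linspace(df['latitude'].min(), df['latitude'].max(), 10)
long_bins = np.linspace(df['longitude'].min(), df['longitude'].max(), 10)
# Sum the 2010 population densities of the cities in each cell.
# np.histogram2d(x, y) returns an array indexed [longitude bin, latitude bin].
hist2d, long_edges, lat_edges = np.histogram2d(df['longitude'],
                                               df['latitude'],
                                               weights=df['2010 population density'],
                                               bins=[long_bins, lat_bins])
# Empty cells become NaN so they are left out of the log color scale
hist2d[hist2d <= 0] = np.nan
hist2d = np.log10(hist2d)
# Bin centers, used as the heat map's row/column coordinates
long_centers = (long_edges[:-1] + long_edges[1:]) / 2
lat_centers = (lat_edges[:-1] + lat_edges[1:]) / 2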
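The orientation of the `np.histogram2d` result matters once it is handed to a grid whose rows are latitude and columns are longitude: the first index follows the first argument (longitude here) and the second index follows the second argument (latitude). A tiny sketch with three made-up cities (all `demo_*` names are hypothetical) shows the layout and why a transpose is needed for a row = latitude grid:

```python
import numpy as np

# Three made-up cities: longitude (west negative), latitude, weight
demo_lon = np.array([-120.0, -120.0, -80.0])
demo_lat = np.array([35.0, 45.0, 35.0])
demo_w = np.array([1.0, 2.0, 4.0])

demo_H, demo_lon_edges, demo_lat_edges = np.histogram2d(
    demo_lon, demo_lat, bins=[2, 2], weights=demo_w)

# demo_H[ix, iy]: ix is the longitude bin, iy the latitude bin.
# Here demo_H == [[1, 2], [4, 0]]: the western column holds weights 1 and 2.
print(demo_H)

# Transposed: rows are latitude bins, columns are longitude bins
print(demo_H.T)
```

With the 9 x 9 grid in this notebook both orientations have the same shape, so passing the untransposed array to a latitude-by-longitude grid would raise no error and would simply mirror the map along its diagonal.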

In [361]:
# Grid cell (latitude bin i, longitude bin j) shown before anything is clicked
i, j = 5, 1

x_scs = bqplot.LinearScale()
y_scs = bqplot.LinearScale()

x_axs = bqplot.Axis(label='2015 estimate', scale=x_scs)
y_axs = bqplot.Axis(label='2010 Census', scale=y_scs, orientation='vertical')

# Cities whose coordinates fall inside cell (i, j); both bounds inclusive
in_cell = ((df['latitude'] >= lat_edges[i]) & (df['latitude'] <= lat_edges[i+1]) &
           (df['longitude'] >= long_edges[j]) & (df['longitude'] <= long_edges[j+1]))
scatters = bqplot.Scatter(x=df['2015 estimate'][in_cell],
                          y=df['2010 Census'][in_cell],
                          scales={'x': x_scs, 'y': y_scs})
fig1 = bqplot.Figure(marks=[scatters], axes=[x_axs, y_axs])
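The same cell filter is needed both here and in the click callback. A small helper keeps the two in sync; `cities_in_cell` is a hypothetical name, not something from the original notebook, and it relies on `Series.between`, whose default includes both bounds, matching the `>=`/`<=` filter. The demo data and edges below are made up.

```python
import numpy as np
import pandas as pd


def cities_in_cell(frame, lat_edges, long_edges, i, j):
    """Rows of `frame` inside latitude bin i and longitude bin j.

    Hypothetical helper. Both bounds are inclusive, so a city lying exactly
    on a shared edge is counted in two neighbouring cells.
    """
    in_lat = frame['latitude'].between(lat_edges[i], lat_edges[i + 1])
    in_long = frame['longitude'].between(long_edges[j], long_edges[j + 1])
    return frame[in_lat & in_long]


# Made-up data; demo_* names avoid overwriting the notebook's own edges
demo = pd.DataFrame({'latitude': [30.5, 41.0], 'longitude': [-110.2, -87.6],
                     '2015 estimate': [100, 200], '2010 Census': [90, 210]})
demo_lat_edges = np.array([30.0, 35.0, 40.0, 45.0])
demo_long_edges = np.array([-120.0, -100.0, -80.0])

cell = cities_in_cell(demo, demo_lat_edges, demo_long_edges, 0, 0)
print(cell['2015 estimate'].tolist())  # [100]
```

In the notebook, the scatter's data could then come from `cities_in_cell(df, lat_edges, long_edges, i, j)['2015 estimate']` (and likewise for `'2010 Census'`), both when the mark is created and inside the callback.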

In [364]:
col_sc = bqplot.ColorScale(scheme='RdYlGn')
x_sc = bqplot.LinearScale() # for numerical data -- longitude
y_sc = bqplot.LinearScale() # numerical data -- latitude


col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')


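# hist2d is indexed [longitude, latitude]; transpose it so rows are latitude and
# columns are longitude, matching row=lat_centers and column=long_centers
heat_map = bqplot.GridHeatMap(color=hist2d.T,
                              row=lat_centers, column=long_centers,
                              scales={'color': col_sc, 'row': y_sc, 'column': x_sc},
                              interactions={'click': 'select'},
                              anchor_style={'fill': 'blue'})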

def on_selected(change):
    # selected holds [row, column] index pairs, i.e. [latitude bin, longitude bin];
    # it can be None when nothing is selected
    selected = change['new']
    if selected is not None and len(selected) == 1:
        i, j = selected[0]
        in_cell = ((df['latitude'] >= lat_edges[i]) & (df['latitude'] <= lat_edges[i+1]) &
                   (df['longitude'] >= long_edges[j]) & (df['longitude'] <= long_edges[j+1]))
        # Redraw the scatter with the cities of the clicked cell
        scatters.x = df['2015 estimate'][in_cell]
        scatters.y = df['2010 Census'][in_cell]

heat_map.observe(on_selected, 'selected')

fig = bqplot.Figure(marks=[heat_map], axes=[col_ax, x_ax, y_ax])

In [365]:
fig.layout.min_width='500px'
fig1.layout.min_width = '500px'
figures = ipywidgets.HBox([fig, fig1])
figures


The interactive dashboard above can be used as follows:
1. To see the populations of cities in a region, for example the one around (30°N, 110°W), click the corresponding grid cell. The cell's color indicates population density (green means dense, red means sparse); strictly, it encodes log10 of the summed 2010 population densities of the cities in that cell.

2. The left and right graphs are linked: when you click the cell around (30°N, 110°W) in the left graph, the right graph shows the 2010 Census population against the 2015 estimated population of each city in that region. How the points scatter around the line `y=x` shows how the populations of cities in that region changed. Because the 2015 estimate is on the x-axis and the 2010 Census on the y-axis, points below the line are cities that grew between 2010 and 2015, points above it are cities that shrank, and points on it stayed roughly the same.
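To make the `y=x` reading concrete, here is a sketch that labels each city in one cell as grew, shrank or unchanged and computes its percentage change. The numbers are made up; in the notebook, the same two columns of `df`, filtered to one cell, could be used instead.

```python
import numpy as np
import pandas as pd

# Made-up populations for three cities in one grid cell
cell_demo = pd.DataFrame({'City': ['A', 'B', 'C'],
                          '2010 Census': [100_000, 250_000, 80_000],
                          '2015 estimate': [110_000, 240_000, 80_000]})

# x = 2015 estimate, y = 2010 Census: a point below y = x means the city grew
growth = cell_demo['2015 estimate'] - cell_demo['2010 Census']
cell_demo['pct_change'] = 100 * growth / cell_demo['2010 Census']
cell_demo['trend'] = np.sign(growth).map({1: 'grew', -1: 'shrank', 0: 'unchanged'})
print(cell_demo)
```

City A (2015 above 2010) sits below the line, city B above it, and city C exactly on it.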

A dataset that would strengthen my storytelling:
This dataset contains city population estimates for every year from 2010 to 2019, covering more years than the dataset currently used. A broader time range should help explain how city populations changed. The dataset can be downloaded from https://www2.census.gov/programs-surveys/popest/tables/2010-2019/cities/totals/SUB-IP-EST2019-ANNRNK.xlsx