Christian Basso

Intro to Data Science

Lab 3: Bokeh Dashboard

September 25, 2023

# Lab 3: Bokeh Dashboard

## Introduction

## Part 1: Display Real Estate on a Scatter Plot

First, many bokeh imports are needed for the various plots and data augmentation we are doing.

In [1]:
from bokeh.layouts import column, row
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, CheckboxButtonGroup, CustomJS, RangeSlider
from bokeh.io import output_notebook
from bokeh.transform import factor_cmap, factor_mark
import pandas as pd

In [2]:
df = pd.read_csv("sacramento.csv")
df.head(10)

Unnamed: 0,street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude,empty_lot,street_type
0,3526 HIGH ST,SACRAMENTO,95838,CA,2,1,836,Residential,Wed May 21 00:00:00 EDT 2008,59222,38.631913,-121.434879,False,ST
1,51 OMAHA CT,SACRAMENTO,95823,CA,3,1,1167,Residential,Wed May 21 00:00:00 EDT 2008,68212,38.478902,-121.431028,False,CT
2,2796 BRANCH ST,SACRAMENTO,95815,CA,2,1,796,Residential,Wed May 21 00:00:00 EDT 2008,68880,38.618305,-121.443839,False,ST
3,2805 JANETTE WAY,SACRAMENTO,95815,CA,2,1,852,Residential,Wed May 21 00:00:00 EDT 2008,69307,38.616835,-121.439146,False,WAY
4,6001 MCMAHON DR,SACRAMENTO,95824,CA,2,1,797,Residential,Wed May 21 00:00:00 EDT 2008,81900,38.51947,-121.435768,False,DR
5,5828 PEPPERMILL CT,SACRAMENTO,95841,CA,3,1,1122,Condo,Wed May 21 00:00:00 EDT 2008,89921,38.662595,-121.327813,False,CT
6,6048 OGDEN NASH WAY,SACRAMENTO,95842,CA,3,2,1104,Residential,Wed May 21 00:00:00 EDT 2008,90895,38.681659,-121.351705,False,WAY
7,2561 19TH AVE,SACRAMENTO,95820,CA,3,1,1177,Residential,Wed May 21 00:00:00 EDT 2008,91002,38.535092,-121.481367,False,AVE
8,11150 TRINITY RIVER DR Unit 114,RANCHO CORDOVA,95670,CA,2,2,941,Condo,Wed May 21 00:00:00 EDT 2008,94905,38.621188,-121.270555,False,DR
9,7325 10TH ST,RIO LINDA,95673,CA,3,2,1146,Residential,Wed May 21 00:00:00 EDT 2008,98937,38.700909,-121.442979,False,ST


### make_plot Method

This method will take in a ColumnDataSource and return the complete plot. The plot has a couple features beyond a normal scatter plot. First, the points are colored by residential type. Additionally, hovering over each point will show various other data for the entry. 

In [3]:
def make_plot(data_source):

    #For the colors
    TYPE = sorted(df["type"].unique())
    MARKERS = ['hex', 'circle_x', 'triangle']

    #Tooltips (hover functionality)
    TOOLTIPS = [
    ("Price", "@price"),
    ("Square Ft.", "@sq__ft"),
    ("Address", "@street"),
    ("Zipcode", "@zip"),
    ("Beds", "@beds"),
    ("Baths", "@baths"),
    ]

    #Make the scatter plot
    p = figure(width=800, height=800, tooltips = TOOLTIPS)
    p.scatter(x='latitude', y='longitude', source=data_source, size=8, fill_alpha = .4, legend_group = 'type',
              marker=factor_mark('type', MARKERS, TYPE),
              color=factor_cmap('type', 'Category10_3', TYPE))
    return p

In [4]:
source = ColumnDataSource(df)
p = make_plot(source)


#Lengend, title, and axses
p.legend.location = "top_left"
p.legend.title = "Property Type"  
p.title.text = "Latitude vs Longitude"
p.xaxis.axis_label = "Latitude"
p.yaxis.axis_label = "Longitude"


In [5]:
output_notebook()

In [6]:
show(p)

The graph above is a graph of latitude vs longitude. Hovering over each point will show various other data for the entry. This allows for easy access to outlier and edge case data.

## Part 2: Refine ColumnDataSource Object based on Search Criteria

### make_dataset method
This function will take in a ColumnDataSource and take in seperate lists for Residential type, price, baths, beds, or square feet to ajust the range seen in the scatter plot. The method will use default arguments for each variable to keep the full range for each unadressed variable. Call this method using make_dataset(type = ("residential", "condo"), square feet = (0, 2000)). 

This method will be used in part 3 to ajust the interactive plot with sliders

First lets see the max of every variable to set default ranges

In [7]:
print(df["price"].max())
print(df["baths"].max())
print(df["beds"].max())
print(df["sq__ft"].max())

884790
5
8
5822


In [8]:
def make_dataset(data, res_type = ("Residential", "Condo", "Multi-Family"), 
                 price = (0, 884790), baths = (0, 5), beds = (0, 8), sqft = (0, 5822)):
    new_df = data.copy()
    new_df = new_df[(new_df['price'] >= price[0]) & (new_df['price'] <= price[1])]
    new_df = new_df[(new_df['baths'] >= baths[0]) & (new_df['baths'] <= baths[1])]
    new_df = new_df[(new_df['beds'] >= beds[0]) & (new_df['beds'] <= beds[1])]
    new_df = new_df[(new_df['sq__ft'] >= sqft[0]) & (new_df['sq__ft'] <= sqft[1])]
    new_df = new_df[new_df['type'].isin(res_type)]
    return ColumnDataSource(new_df)


In [9]:
ranged_source = make_dataset(df, baths = (1, 2), res_type = ("Residential", ""), 
                             price = (50000, 75000), sqft = (1000, 2000), beds = (1, 2))
p = make_plot(ranged_source)
p.legend.location = "top_left"
p.legend.title = "Property Type"  
p.title.text = "Latitude vs Longitude"
p.xaxis.axis_label = "Latitude"
p.yaxis.axis_label = "Longitude"
show(p)

The graph above is a test of the make_dataset method. It is a latitude and logitude graph like the one from part 1, but it only shows properties with the following ranges:
- Price between 50k and 75k
- 1 - 2 Beds
- 1 - 2 Baths
- 1000 - 2000 sqft
- Only Residential

Hovering over each of the points reveals that they fit within the specified ranges.

## Part 3: Add Widgets and Create Interactive Visualization

In [10]:
def modify_doc(doc):
    # Instantiate here the CheckboxGroup and RangeSlider objects
    LABELS = ["Residential", "Condo", "Multi-Family"]
    housing_checkbox_group = CheckboxButtonGroup(labels=LABELS, active=[0, 1, 2])
    slider_price = RangeSlider(start=0, end=884790, value=(0, 884790), step=1000, title="Price")
    slider_baths = RangeSlider(start=0, end=5, value=(0, 5), step=1, title="Baths")
    slider_beds = RangeSlider(start=0, end=8, value=(0, 8), step=1, title="Beds")
    slider_sqft = RangeSlider(start=0, end=5822, value=(0, 5822), step=100, title="Square Feet")

    # Check the update method below to make sure you choose the same identifiers for the objects
    # create the data source by calling the method make_dataset
    source = make_dataset(df)
    # call the method make_plot
    figure_object = make_plot(source)


    # Update function takes three default parameters
    def update(attr, old, new):
        # Get the list of selected types
        selected_types = [housing_checkbox_group.labels[i] for i in housing_checkbox_group.active]
        # Make a new column source according to the selected properties
        source2 = make_dataset(df, res_type=selected_types,
        price=[slider_price.value[0], slider_price.value[1]],
        baths=[slider_baths.value[0], slider_baths.value[1]],
        beds=[slider_beds.value[0], slider_beds.value[1]],
        sqft=[slider_sqft.value[0],slider_sqft.value[1]]
        )
        # Update the data of the main source
        source.data.update(source2.data)


    housing_checkbox_group.on_change('active', update)
    slider_price.on_change('value', update)
    slider_beds.on_change('value', update)
    slider_baths.on_change('value', update)
    slider_sqft.on_change('value', update)
    controls = column(housing_checkbox_group, slider_baths, slider_beds, slider_price, slider_sqft) #pass in to the column the slider objects and the checkbox_group object
    doc.add_root(row(figure_object, controls))
    

In [11]:
import os
os.environ['BOKEH_ALLOW_WS_ORIGIN']='1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li'
os.environ['BOKEH_ALLOW_WS_ORIGIN']='localhost:8888'

In [None]:
show(modify_doc)

ERROR:bokeh.server.views.ws:Refusing websocket connection from Origin 'vscode-webview://1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li';                       use --allow-websocket-origin=1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li or set BOKEH_ALLOW_WS_ORIGIN=1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li to permit this; currently we allow origins {'localhost:8888'}
ERROR:bokeh.server.views.ws:Refusing websocket connection from Origin 'vscode-webview://1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li';                       use --allow-websocket-origin=1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li or set BOKEH_ALLOW_WS_ORIGIN=1b26b1l962rofu7eoubbu4j45dvs3ad2sn2u36577coorbhop2li to permit this; currently we allow origins {'localhost:8888'}
