# Basics of Bokeh

Bokeh is a library for creating interactive data visualizations in a web browser. Its concise, human-readable syntax makes it possible to present data quickly and attractively. If you’ve worked with visualization in Python before, you have likely used Matplotlib. It’s worth briefly mentioning how Bokeh differs from Matplotlib, and when one might be preferred to the other.

Matplotlib has existed since 2002 and has long been a standard of Python data visualization. Bokeh emerged in 2013. This difference in age means that Matplotlib matured long before Bokeh was released; however, in a short period of time, Bokeh has reached a high level of maturity.

The intended uses of Matplotlib and Bokeh are quite different. Matplotlib creates static graphics that are useful for quick and simple visualizations, or for creating publication-quality images. Bokeh creates visualizations for display on the web (whether locally or embedded in a webpage) and, most importantly, the visualizations are meant to be highly interactive. Neither browser-based display nor rich interactivity is a core focus of Matplotlib.

If you would like to explore your data visually and interactively, or to distribute interactive visualizations to a web audience, Bokeh is the library for you! If your main interest is producing finalized visualizations for publication, Matplotlib may be better, although Bokeh does offer a way to create static graphics.

With these differences in mind, as we work through the lesson, I’ll emphasize the interactive aspects that make Bokeh useful for exploring and disseminating historical data and that set it apart from other libraries like Matplotlib.
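To make the contrast concrete, here is a minimal sketch of the kind of interactivity described above: a scatter plot with pan, zoom, and hover tooltips, rendered to a standalone HTML string that could be saved or embedded in a webpage. The five rows are copied from the diamonds data used later in this lesson; the variable names, tool list, and title are illustrative assumptions.

```python
import pandas as pd
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.resources import CDN

# A few rows from the diamonds data set (illustrative sample only)
sample = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "price": [326, 326, 327, 334, 335],
})
source = ColumnDataSource(sample)

# Pan, zoom, and reset tools are attached to the plot's toolbar
plot = figure(width=400, height=400, title="Carat vs. price",
              tools="pan,wheel_zoom,box_zoom,reset")
plot.scatter(x="carat", y="price", source=source, size=10)

# Hovering over a point shows the values stored in the data source
plot.add_tools(HoverTool(tooltips=[("cut", "@cut"), ("price", "@price")]))

# Render a standalone HTML document; BokehJS is loaded from the CDN
html = file_html(plot, CDN, "Carat vs. price")
```

Writing `html` to a file and opening it in a browser displays the plot; in a notebook, calling `show(plot)` after `output_notebook()` renders it inline instead.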

In [1]:
# pandas, Bokeh, and the pandas-bokeh plotting backend
import pandas as pd
import bokeh
import pandas_bokeh
from bokeh.io import output_notebook
from bokeh.plotting import figure, output_file, show

In [2]:
df = pd.read_csv('/Users/anusha/Downloads/diamonds.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,1,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,2,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,3,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,4,0.29,Premium,I,VS2,62.4,58.0,334,4.2,4.23,2.63
4,5,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75


In [3]:
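# Approximate each stone's volume from its x, y, z dimensions, rounded to one decimal place
df['volume'] = (df['x'] * df['y'] * df['z']).round(1)
# Drop the leftover CSV index column and the raw dimensions
df = df.drop(columns=['Unnamed: 0', 'x', 'y', 'z'])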


In [4]:
df.head()

Unnamed: 0,carat,cut,color,clarity,depth,table,price,volume
0,0.23,Ideal,E,SI2,61.5,55.0,326,38.2
1,0.21,Premium,E,SI1,59.8,61.0,326,34.5
2,0.23,Good,E,VS1,56.9,65.0,327,38.1
3,0.29,Premium,I,VS2,62.4,58.0,334,46.7
4,0.31,Good,J,SI2,63.3,58.0,335,51.9
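As a quick check on the derived column, the following sketch repeats the volume calculation on the first two rows of dimensions from the earlier `df.head()` output. The name `small` is illustrative; the values are copied from that output.

```python
import pandas as pd

# First two rows of the dimensions shown in the earlier df.head() output
small = pd.DataFrame({
    "Unnamed: 0": [1, 2],
    "x": [3.95, 3.89],
    "y": [3.98, 3.84],
    "z": [2.43, 2.31],
})

# Product of the three dimensions, rounded to one decimal place
small["volume"] = (small["x"] * small["y"] * small["z"]).round(1)

# Drop the columns that are no longer needed
small = small.drop(columns=["Unnamed: 0", "x", "y", "z"])
print(small["volume"].tolist())
```

The printed values, 38.2 and 34.5, match the first two `volume` entries in the output above.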


In [5]:
pd.set_option('plotting.backend', 'pandas_bokeh')
pandas_bokeh.output_notebook()

In [6]:
df.plot_bokeh(kind = 'scatter', x = 'carat', y = 'price')

In [7]:
df.head().plot_bokeh.bar(x = 'cut', y = 'price', title = 'Cut Vs Price')

In [8]:
test_plot = figure()
test_plot.circle(df['table'], df['price'], line_width=4, line_color="navy")

test_plot.square(df['depth'], df['price'], line_width=4, line_color="red")
output_notebook()

In [9]:
show(test_plot)

In [10]:
test_plot = figure(width=600, height=600)
test_plot.circle(df['price'], df['table'], line_width=4, line_color="red")
test_plot.circle(df['price'], df['depth'], line_width=4, line_color="navy")
output_notebook()

In [11]:
show(test_plot)

In [12]:
df = df.head(50)

In [13]:
test_plot = figure(width=600, height=600)
test_plot.asterisk(df['price'], df['table'], line_width=4, line_color="red")
test_plot.diamond(df['price'], df['depth'], line_width=4, line_color="navy")
output_notebook()

In [14]:
show(test_plot)

In [15]:
plot = figure(width=300, height=300)
plot.annulus(x=df['volume'], y=df['price'], color="#7FC97F",
             inner_radius=0.2, outer_radius=0.5)

show(plot)


In [16]:
test_plot = figure(width=500, height=500)
test_plot.asterisk(df['price'], df['table'], line_width=4, size=20, color="#F0027F")
test_plot.diamond(df['price'], df['depth'], line_width=4, size=20, color="#1C9099")
output_notebook()

In [17]:
show(test_plot)

In [18]:
test_plot = figure(width=600, height=600)
test_plot.circle(df['price'], df['table'], line_width=4, size=20, color="#F0027F")
output_notebook()

In [19]:
show(test_plot)
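Recent Bokeh releases (3.4 and later) emit deprecation warnings for marker-specific methods such as `square()`, `asterisk()`, and `diamond()`, and for `circle()` with a `size` argument. A hedged sketch of the equivalent calls through the generic `scatter()` method follows; the small inline data frame stands in for `df`, and the legend labels are illustrative additions.

```python
import pandas as pd
from bokeh.plotting import figure

# Stand-in for df: the first five rows of the columns plotted above
sample = pd.DataFrame({
    "price": [326, 326, 327, 334, 335],
    "table": [55.0, 61.0, 65.0, 58.0, 58.0],
    "depth": [61.5, 59.8, 56.9, 62.4, 63.3],
})

test_plot = figure(width=500, height=500)

# scatter() takes the marker shape as a parameter instead of a dedicated method
test_plot.scatter(sample["price"], sample["table"], marker="asterisk",
                  line_width=4, size=20, color="#F0027F", legend_label="table")
test_plot.scatter(sample["price"], sample["depth"], marker="diamond",
                  line_width=4, size=20, color="#1C9099", legend_label="depth")
```

As in the cells above, `output_notebook()` followed by `show(test_plot)` displays the result inline.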

In [20]:
# Connecting to Redis

In [21]:
import pandas as pd
import pyarrow as pa
import redis

# Client for a local Redis server on the default port
r = redis.Redis(host='127.0.0.1',
                port=6379,
                decode_responses=True)

In [22]:
r

Redis<ConnectionPool<Connection<host=127.0.0.1,port=6379,db=0>>>

In [23]:
# Recreate the client without decode_responses so binary payloads come back as bytes
pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)

In [24]:
# Storing data in Redis

In [25]:
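def storeInRedis(alias, df):
    # Serialize the DataFrame to bytes with pyarrow, then store it under the given key.
    # Note: pa.serialize is deprecated in pyarrow 2.0 and later.
    df_compressed = pa.serialize(df).to_buffer().to_pybytes()
    res = r.set(alias, df_compressed)
    if res:
        print(f'{alias} cached')

storeInRedis('data', df)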

data cached


In [26]:
# Getting data from Redis

In [27]:
def loadFromRedis(alias):
    # r.get returns None when the key does not exist
    data = r.get(alias)
    if data is None:
        print("No data")
        return None
    return pa.deserialize(data)

In [28]:
storeInRedis('diamond_dataset', df)

diamond_dataset cached


In [29]:
loadFromRedis('diamond_dataset')

Unnamed: 0,carat,cut,color,clarity,depth,table,price,volume
0,0.23,Ideal,E,SI2,61.5,55.0,326,38.2
1,0.21,Premium,E,SI1,59.8,61.0,326,34.5
2,0.23,Good,E,VS1,56.9,65.0,327,38.1
3,0.29,Premium,I,VS2,62.4,58.0,334,46.7
4,0.31,Good,J,SI2,63.3,58.0,335,51.9
5,0.24,Very Good,J,VVS2,62.8,57.0,336,38.7
6,0.24,Very Good,I,VVS1,62.3,57.0,336,38.8
7,0.26,Very Good,H,SI1,61.9,55.0,337,42.3
8,0.22,Fair,E,VS2,65.1,61.0,337,36.4
9,0.23,Very Good,H,VS1,59.4,61.0,338,38.7
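
Because `pa.serialize` and `pa.deserialize` are deprecated in pyarrow 2.0 and later, the same cache-and-restore pattern can be sketched with the standard library's `pickle` module instead. This is an alternative technique, not the approach used in the cells above. `FakeStore` is a hypothetical in-memory stand-in exposing the `set` and `get` methods of a Redis client so the example runs without a server; with a real connection, a `redis.Redis` client (created without `decode_responses=True`) would take its place. The function names `store_frame` and `load_frame` are likewise illustrative. Only unpickle data from a source you trust.

```python
import pickle
import pandas as pd


class FakeStore:
    """Hypothetical in-memory stand-in for a Redis client (set/get only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # Like redis-py, return None for a missing key
        return self._data.get(key)


def store_frame(client, alias, frame):
    """Serialize a DataFrame with pickle and store the bytes under alias."""
    return bool(client.set(alias, pickle.dumps(frame)))


def load_frame(client, alias):
    """Return the cached DataFrame, or None if the key is missing."""
    payload = client.get(alias)
    return None if payload is None else pickle.loads(payload)


client = FakeStore()
frame = pd.DataFrame({"carat": [0.23, 0.21], "price": [326, 326]})
store_frame(client, "diamond_dataset", frame)
restored = load_frame(client, "diamond_dataset")
```

Returning `None` for a missing key, rather than catching every exception, keeps genuine deserialization errors visible.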
