## misc bokeh notes

**Documentation**
- Documentation and sample code for various charts at: https://bokeh.pydata.org/
        
**Time series data**
- *df = pandas.read_csv('fname.csv', **parse_dates=['name']**)* to specify column name containing datetime data when loading dataframe
- *f = figure(**x_axis_type='datetime'**)* to specify that axis contains datetime data when creating figure
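A minimal sketch of what `parse_dates` changes, using a made-up in-memory CSV in place of a real file (the column names `when` and `value` are hypothetical). A column parsed this way is the kind of data a figure created with `x_axis_type='datetime'` expects on its x axis.

```python
import io
import pandas

# Hypothetical in-memory CSV standing in for 'fname.csv'.
csv_text = "when,value\n2014-02-18,75.6\n2014-02-19,75.8\n"

# Without parse_dates the date column is loaded as plain strings.
raw = pandas.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to a datetime type on load.
parsed = pandas.read_csv(io.StringIO(csv_text), parse_dates=['when'])

print(raw['when'].dtype)
print(parsed['when'].dtype)
```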


## misc pandas notes

**Tips to help size up new data:**
- *df = pandas.read_csv('fname.csv')* to load data into dataframe
- *df.ftypes* to list column names, data types, and whether each column is stored sparse or dense (older pandas only: *ftypes* was removed in pandas 1.0; use *df.dtypes* for names and types)
- *len(df)* to get total number of rows of data (not including column names)
- *df.head(n)* to see values in first n rows
- *df.tail(n)* to see values in last n rows (the row index is shown too, so with the default index the last label is the number of rows minus 1)
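A short sketch of these checks on a small made-up dataframe. It uses `df.dtypes`, and adds `df.isna().sum()` as one assumed way to see how fully each column is populated; the column names are hypothetical.

```python
import pandas

# Hypothetical toy data; None marks a missing value.
df = pandas.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'score': [1.5, None, 3.0, 4.5],
})

print(df.dtypes)         # column names and data types
print(len(df))           # number of data rows
print(df.head(2))        # first 2 rows
print(df.tail(2))        # last 2 rows, with their index labels
print(df.isna().sum())   # missing values per column
```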

**getting columns of data:**
- *df['name']* using column name
- *df.iloc[:,index]* using column index
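Both forms return the same column as a Series. A small sketch with made-up data:

```python
import pandas

df = pandas.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

by_name = df['y']             # select by column label
by_position = df.iloc[:, 1]   # select by zero-based column position

print(by_name.equals(by_position))
```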

**operate on columns rather than rows & values:**
- e.g. if *x = df['x']* and *y = df['y']*, then *z = zip(x, y)* pairs the values of x and y for every row in the dataframe
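A sketch of column-wise operations on made-up data: `zip` pairs the two columns row by row, and arithmetic on a column applies to every value without an explicit loop.

```python
import pandas

df = pandas.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
x = df['x']
y = df['y']

pairs = list(zip(x, y))   # one (x, y) tuple per row
total = x + y             # element-wise sum over the whole column
scaled = y / 10           # the scalar is applied to every value

print(pairs)
print(total.tolist())
print(scaled.tolist())
```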

## line chart with .csv data

In [1]:
from bokeh.plotting import figure
from bokeh.io import output_file, show
import pandas

df_bach = pandas.read_csv('bachelors.csv', parse_dates=['Year'])
df_bach.ftypes


Year                             datetime64[ns]:dense
Agriculture                             float64:dense
Architecture                            float64:dense
Art and Performance                     float64:dense
Biology                                 float64:dense
Business                                float64:dense
Communications and Journalism           float64:dense
Computer Science                        float64:dense
Education                               float64:dense
Engineering                             float64:dense
English                                 float64:dense
Foreign Languages                       float64:dense
Health Professions                      float64:dense
Math and Statistics                     float64:dense
Physical Sciences                       float64:dense
Psychology                              float64:dense
Public Administration                   float64:dense
Social Sciences and History             float64:dense
dtype: object

In [180]:
len(df_bach)


42

In [181]:
df_bach.head(3)


| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970-01-01 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
| 1 | 1971-01-01 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
| 2 | 1972-01-01 | 7.42071 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.55452 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |


In [2]:
f = figure(width=1000, x_axis_type="datetime", x_axis_label='Year', #sizing_mode="scale_width",
           height=500, y_axis_label='Engineering Degrees',
           title = 'Engineering Degree Over Time')

f.line('Year', 'Engineering', source=df_bach)

output_file("line.html")
show(f)


## scatter plot with .xlsx data

In [3]:
from bokeh.plotting import figure
from bokeh.io import output_file, show
import pandas

df_xlsx = pandas.read_excel('verlegenhuken.xlsx')
df_xlsx.ftypes


Year           int64:dense
Month          int64:dense
Day            int64:dense
Hour           int64:dense
Temperature    int64:dense
Pressure       int64:dense
dtype: object

In [4]:
len(df_xlsx)

33102

In [5]:
df_xlsx.head(3)


| | Year | Month | Day | Hour | Temperature | Pressure |
|---|---|---|---|---|---|---|
| 0 | 2010 | 10 | 6 | 3 | -30 | 9984 |
| 1 | 2010 | 10 | 6 | 9 | -28 | 9992 |
| 2 | 2010 | 10 | 6 | 10 | -24 | 9987 |


In [6]:
f = figure(width=1000, x_axis_label='Temperature (C)', #sizing_mode="scale_width",
           height=500, y_axis_label='Pressure (hPa)',
           title = 'Temperature and Air Pressure')

x = df_xlsx['Temperature'] / 10
y = df_xlsx['Pressure'] / 10

f.circle(x, y, size=1)

output_file("scatter.html")
show(f)


## time series QCOM stock data (.csv from Yahoo Finance)


In [25]:
from bokeh.io import output_file, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool

import pandas

df_qcom = pandas.read_csv('qcom.csv', parse_dates=['Date'])
df_qcom.ftypes


Date         datetime64[ns]:dense
Open                float64:dense
High                float64:dense
Low                 float64:dense
Close               float64:dense
Adj Close           float64:dense
Volume                int64:dense
dtype: object

In [26]:
len(df_qcom)

1259

In [27]:
df_qcom.head(3)

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2014-02-18 | 76.25 | 76.25 | 75.529999 | 75.599998 | 63.435242 | 8850100 |
| 1 | 2014-02-19 | 75.370003 | 75.989998 | 75.110001 | 75.769997 | 63.577892 | 8527600 |
| 2 | 2014-02-20 | 75.959999 | 76.199997 | 75.68 | 75.949997 | 63.728931 | 7194200 |


In [29]:
df_qcom.max()['Close']


81.599998
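The same value can be read from the column directly, which avoids aggregating every column first. A sketch on made-up prices (not the QCOM data); `idxmax` is shown as one way to find the row where the maximum occurs.

```python
import pandas

# Hypothetical prices standing in for the QCOM data.
df = pandas.DataFrame({
    'Date': pandas.to_datetime(['2014-02-18', '2014-02-19', '2014-02-20']),
    'Close': [75.6, 75.9, 75.7],
})

highest = df['Close'].max()                  # aggregate one column only
lowest = df['Close'].min()
row_of_high = df.loc[df['Close'].idxmax()]   # full row where Close peaks

print(highest, lowest)
print(row_of_high['Date'])
```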

In [74]:
#tooltip columns; computed before building the ColumnDataSource so it includes them
mean_volume = int(df_qcom['Volume'].mean())
#volume as a percentage of the mean volume (100 = average day)
df_qcom['RelativeVol'] = 100 + (df_qcom['Volume'] - mean_volume) / mean_volume * 100
df_qcom['DateStr'] = df_qcom['Date'].dt.strftime('%Y-%m-%d')

#used for tooltip values
cds = ColumnDataSource(df_qcom)

f = figure(width=1200, x_axis_type="datetime", x_axis_label='Date', #sizing_mode="scale_width",
           height=500, y_axis_label='Closing Price ($)',
           title = 'QCOM Closing Price Over Time')

f.line(x='Date', y='Close', line_width=1, line_color='grey', source=cds)
f.circle(x='Date', y='Close', size=3, color='blue', source=cds)

#min-max bounding lines
f.line(x='Date', y=df_qcom['Close'].max(), line_width=1, line_color='green', source=cds)
f.line(x='Date', y=df_qcom['Close'].min(), line_width=1, line_color='red', source=cds)

#f.line(x='Date', y='RelativeVol', source=cds)
#f.vbar(x='Date', top='RelativeVol', width=5, source=cds)

#f.y_range.start = 0
f.xaxis.major_label_orientation = 1.2
f.xgrid.visible = False
f.ygrid.visible = True

#tooltip info
hover = HoverTool(tooltips = [('Date','@DateStr'),
                              ('Close','$@Close'),
                              ('Range','$@Low - $@High'),
                              ('Volume','@Volume (@RelativeVol%)')])
f.add_tools(hover)

output_file("qcom.html")
show(f)
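An alternative sketch for two parts of the cell above, on made-up prices rather than the QCOM data. It swaps in Bokeh's `Span` annotation for the min/max lines and a `datetime` tooltip formatter in place of a separate date-string column. This is not the approach used in the cell, only one assumed variation; `output_file` and `show` are left out so it runs without opening a browser.

```python
import pandas
from bokeh.models import ColumnDataSource, HoverTool, Span
from bokeh.plotting import figure

# Hypothetical prices standing in for qcom.csv.
df = pandas.DataFrame({
    'Date': pandas.to_datetime(['2014-02-18', '2014-02-19', '2014-02-20']),
    'Close': [75.6, 75.9, 75.7],
})
cds = ColumnDataSource(df)

f = figure(x_axis_type='datetime', title='Closing price')
f.line(x='Date', y='Close', source=cds)

# Horizontal reference lines at the highest and lowest close.
high = Span(location=float(df['Close'].max()), dimension='width',
            line_color='green')
low = Span(location=float(df['Close'].min()), dimension='width',
           line_color='red')
f.add_layout(high)
f.add_layout(low)

# Format the datetime column inside the tooltip itself.
hover = HoverTool(
    tooltips=[('Date', '@Date{%F}'), ('Close', '@Close{0.00}')],
    formatters={'@Date': 'datetime'},
)
f.add_tools(hover)
```

A `Span` stretches across the whole plot regardless of the data range, so it needs no x values.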
