<a href="https://colab.research.google.com/github/MazRadwan/data_science/blob/main/1_Introduction_to_Bokeh.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python and Data Visualization 1 - Introduction to Bokeh

**Goal:** The goal of this project is to construct a basic bar/column plot in Python using Bokeh.

**Description:** You are the manager of three schools: Elementary, Middle, and High. Together, the schools cover grade levels 1-12, and each grade level has recorded Male, Female, and Total enrollment. **You want to build a bar/column chart to compare enrollment by grade.** We will use the following tools:
 - *Pandas:* Pandas allows us to pull data into our application, manipulate data within our application, and export it out of our application
 - *Bokeh:* Bokeh allows us to create interactive graphs and visualizations in Python

## Data

Our data is stored in *CSV* (*Comma Separated Values*) files. Take a look at `ClassData.csv`. Each *cell* is separated from the others in its row by commas, and each row sits on its own line. We use Pandas to import the data into Python.
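
Before handing the file to Pandas, it can help to see how raw CSV text splits into cells. The sketch below uses Python's built-in `csv` module on a few rows copied from the DataFrame printout in this notebook. The inline `csv_text` string is an assumption standing in for the real `ClassData.csv`, which is assumed to share the same column layout.

```python
import csv
import io

# A few rows copied from this notebook's DataFrame printout; the real
# ClassData.csv is assumed to use the same columns.
csv_text = """School,Grade,Male,Female,Total
Elementary,1,16,15,31
Middle,6,11,12,23
High,9,13,14,27
"""

# Line breaks split the text into rows; commas split each row into cells.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)       # ['School', 'Grade', 'Male', 'Female', 'Total']
print(records[0])   # ['Elementary', '1', '16', '15', '31']
```

Note that every cell comes back as text here. `pd.read_csv` does the same splitting and additionally converts numeric columns to numbers.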

In [41]:
import pandas as pd                 # Tell Python we will be using the Pandas set of tools, and nickname it to pd so we can type it quicker
df = pd.read_csv("ClassData.csv")   # Create a DataFrame, call it df, and set its value to the content of our CSV data
df.index += 1                       # Tells the DataFrame to start index labels at 1

Pandas also allows us to work with data in *DataFrames*. A DataFrame is simply a 2-D table, with *columns* and *rows*.

In [40]:
print(df)                           # Display our dataframe

        School  Grade  Male  Female  Total
1   Elementary      1    16      15     31
2   Elementary      2    12      15     27
3   Elementary      3    10      18     28
4   Elementary      4    17      13     30
5   Elementary      5    15      15     30
6       Middle      6    11      12     23
7       Middle      7    14      12     26
8       Middle      8    15      11     26
9         High      9    13      14     27
10        High     10    12      16     28
11        High     11    16      14     30
12        High     12    14      14     28


In our DataFrame, we have 5 columns: School, Grade, Male, Female, and Total. We have 12 rows, each given a label 1-12. These numbers on the leftmost side are known as the *index* and are not considered a column. Each index label allows us to uniquely identify each row in our data.
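
As a small illustration of how index labels identify rows, the sketch below rebuilds the first three rows of the table from an inline string (an assumption made so the example runs without `ClassData.csv`) and looks rows up by label and by position. The name `sample` is hypothetical and used only here.

```python
import io
import pandas as pd

# First three rows of the table above, inlined so the example is self-contained.
csv_text = """School,Grade,Male,Female,Total
Elementary,1,16,15,31
Elementary,2,12,15,27
Elementary,3,10,18,28
"""
sample = pd.read_csv(io.StringIO(csv_text))
sample.index += 1                  # relabel rows 1, 2, 3 instead of 0, 1, 2

print(list(sample.columns))        # the index is not listed among the columns
print(sample.loc[2, "Total"])      # look up by index label -> 27
print(sample.iloc[0]["School"])    # look up by position (first row) -> Elementary
```

`loc` uses the index labels shown on the left of the printout, while `iloc` always counts positions from 0, regardless of how the rows are labeled.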

## Basic Bar/Column Plot

**Our task is to build a bar/column chart to compare enrollment by grade**

### Step 1: Bokeh Setup

Bokeh lets us transform data into beautiful visualizations. We need to tell Python we are using it (`from` and `import`), and where the plots should be displayed (`output_notebook`).

In [None]:
from bokeh.plotting import figure, show    # Tells Python we will use figure and show from Bokeh
from bokeh.io import output_notebook       # Tells Python we will need the output_notebook function
from bokeh.models import ColumnDataSource  # We will need this when preparing our data for a bar/column plot

output_notebook()                          # Tells Python to present Bokeh plots in the notebook


### Step 2: Select Data

We need to determine x-axis (horizontal) and y-axis (vertical) values. We want a bar chart to compare enrollment by grade. Each grade will have its own bar, and the height of that bar will represent the total enrollment. Therefore, our x-axis is Grade, and our y-axis is Total (enrollment).

In [8]:
grades = df['Grade'].apply(str)                                    # X-axis is the Grade column; we convert it to strings so that Bokeh treats each grade as a category
totals = df['Total']                                               # Y-axis is the Total column
source = ColumnDataSource(data=dict(grades=grades, totals=totals)) # We combine our grade and total columns in a structure (a ColumnDataSource) which Bokeh will understand

### Step 3: Plot Data

A visual graphic in Bokeh is known as a *figure*. We call `figure` to create one and set its key properties by passing keyword arguments (`name=value`). The code below sets the following properties:
 - `title`: The title of our visualization
 - `x_range`: The labels for each column on the x-axis
 - `y_range`: The upper and lower bounds for the y-axis
 - `x_axis_label`: The title of the x-axis
 - `y_axis_label`: The title of the y-axis
 - `height`: Height of the visualization
 - `width`: Width of the visualization

NOTE: Older Bokeh versions used `plot_height` and `plot_width` to set these dimensions; those arguments are now **deprecated** in favor of `height` and `width`.

In [None]:
visual = figure(title="Total Enrollment by Grade", x_range=grades, y_range=(0,40),
                x_axis_label = "Grade", y_axis_label = "Total Enrollment",
                height=300, width=800)

Now, we can add our columns to the empty figure using `vbar`. We must specify the following properties:
 - `x`: Specifies the x coordinates of the centers of the bars (x-axis data)
 - `top`: Specifies the top point of each bar (y-axis data)
 - `width`: Specifies the width of each bar
 - `legend_field`: Specifies the values that should be used for the legend
 - `source`: Specifies the source of the data (the `source` object created in Step 2)

In [48]:
visual.vbar(x='grades', top='totals', width=0.7, legend_field='grades', source=source)

Finally, we can add customizations to clean up our visualization.

In [39]:
visual.xgrid.grid_line_color = None            # Sets vertical gridlines to transparent, useful for column plot
visual.legend.orientation = "horizontal"       # Tells Python to use a horizontal legend
visual.legend.location = "top_center"          # Tells Python to put the legend at the top and center of the plot

**Importantly, we must call `show(name_of_figure)` in order for it to display**

In [43]:
show(visual)

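NOTE: If `show` reports `E-1006 (NON_MATCHING_DATA_SOURCES_ON_LEGEND_ITEM_RENDERERS)`, the figure most likely holds bars from more than one `vbar` call, each with its own data source but sharing the same legend field. This typically happens when the `vbar` cell is re-run on an existing figure. Re-run the `figure` cell to start from an empty figure, then call `vbar` once.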


Bokeh visualizations are interactive! Experiment with the tools at the side to explore your visualization in greater detail.

### Step 4 (Optional): Colorize the Visualization

To give each bar a different color, we need a list of colors with one entry per bar (12 in total). Tools such as http://vrl.cs.brown.edu/color can generate such a list automatically.

In [49]:
from bokeh.transform import factor_cmap                # We need to import the factor_cmap tool because we will use it to color the bars

colors = ['#256676', '#5cdac5', '#277a35', '#70de63',
          '#333a9e', '#e057e1', '#d0bcfe', '#6a7fd2',
          '#99ceeb', '#6a10a6', '#991c64', '#573f56']

We then use the same code as before, with one addition: we build a color mapping with `factor_cmap('grades', palette=colors, factors=grades)` and pass it to `vbar` as `fill_color`. This tells Bokeh to assign a different color from `colors` to each bar in the plot.

In [55]:
visual_colored = figure(title="Total Enrollment by Grade", x_range=grades, y_range=(0,40),
                       x_axis_label = "Grade", y_axis_label = "Total Enrollment",
                       height=300, width=800)

cmap = factor_cmap('grades', palette=colors, factors=grades)
visual_colored.vbar(
    x='grades',
    top='totals',
    source=source,
    width=0.7,
    legend_field='grades',
    fill_color=cmap
)

visual_colored.xgrid.grid_line_color = None
visual_colored.legend.orientation = 'horizontal'
visual_colored.legend.location = 'top_center'

Finally, we call `show` to display our plot, as before.

In [46]:
show(visual_colored) # Make sure to call show() on your visualization for it to display!
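
Conceptually, the color mapping built with `factor_cmap` pairs each grade label with the palette entry at the same position. The plain-Python sketch below mimics that pairing for illustration only; it is not how Bokeh implements `factor_cmap` internally, and the name `color_for_grade` is hypothetical.

```python
# Illustration only: mimic the factor-to-color pairing in plain Python.
grades = [str(g) for g in range(1, 13)]          # '1' ... '12', as on the x-axis
colors = ['#256676', '#5cdac5', '#277a35', '#70de63',
          '#333a9e', '#e057e1', '#d0bcfe', '#6a7fd2',
          '#99ceeb', '#6a10a6', '#991c64', '#573f56']

# The palette needs one color for every grade label.
assert len(colors) >= len(grades)

# Pair the first grade with the first color, the second with the second, ...
color_for_grade = dict(zip(grades, colors))

print(color_for_grade['1'])    # #256676
print(color_for_grade['12'])   # #573f56
```

Reordering `colors` therefore changes which bar receives which color, without touching the data.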

In [66]:
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.transform import factor_cmap

# Prepare data
grades = df['Grade'].apply(str).tolist()  # Convert grades to strings so Bokeh treats them as x-axis categories
female = df['Female'].tolist()            # Keep female counts numeric; they set the bar heights

source = ColumnDataSource(data=dict(grades=grades, female=female))

# Create the figure
visual_colored = figure(
    title="Total Female Enrollment by Grade",
    x_range=grades,  # Grades as the x-axis
    y_range=(0, max(female) + 10),  # Leave headroom above the tallest bar for the legend
    x_axis_label="Grade",
    y_axis_label="Total Female Enrollment",
    height=300,
    width=800
)

# Create the color map
colors = ["#c9d9d3", "#718dbf", "#e84d60", "#ddb7b1",'#6a7fd2',
          '#99ceeb', '#6a10a6', '#991c64', '#573f56','#277a35', '#70de63',
          '#333a9e', '#e057e1', '#d0bcfe',]  # Define color palette
cmap = factor_cmap('grades', palette=colors, factors=grades)

# Add vertical bars for female enrollment
visual_colored.vbar(
    x='grades',  # Grades on the x-axis
    top='female',  # Total female enrollment on the y-axis
    source=source,
    width=0.7,
    legend_field='grades',  # Use grades for the legend
    fill_color=cmap
)

# Customize grid and legend
visual_colored.xgrid.grid_line_color = None
visual_colored.legend.orientation = 'horizontal'
visual_colored.legend.location = 'top_center'

# Show the plot
show(visual_colored)


## Exercise

To test your understanding, try creating a similar graph to the one above, except plotting Total Female Enrollment by Grade instead. Try using different colors, and experiment with the size, bounds, and column width.
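
When experimenting with the bounds, one option is to derive the upper limit of the y-axis from the data rather than hard-coding it. The sketch below uses the Female column values copied from the DataFrame printout earlier in this notebook; the `headroom` value is an arbitrary assumption, chosen only to leave space for the legend at the top of the plot.

```python
# Female enrollment for grades 1-12, copied from the DataFrame printout.
female = [15, 15, 18, 13, 15, 12, 12, 11, 14, 16, 14, 14]

headroom = 10                           # arbitrary margin (an assumption); tune to taste
y_range = (0, max(female) + headroom)   # lower and upper bounds for the y-axis

print(y_range)                          # (0, 28)
```

With the DataFrame loaded, `df['Female'].max()` gives the same maximum, and the resulting tuple can be passed to `figure` as `y_range`.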

For more information, check out the documentation at: https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html.