In [620]:
# Bokeh library for building the dashboard.

import numpy as np  # used for np.pi in the pie chart cell
import pandas as pd
from bokeh.layouts import column, gridplot, row
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.models.widgets import Div
from bokeh.palettes import Category20, Spectral6
from bokeh.plotting import figure, show
from bokeh.transform import cumsum






# **BOKEH LIBRARY EXPLANATION:**

bokeh.models.HoverTool: A Bokeh model for creating interactive tooltips. It allows you to define tooltips that show additional information when hovering over specific data points in your plots.

bokeh.plotting.figure: The main plotting object in Bokeh. It represents a figure or plot where you can add various glyphs (geometric shapes, such as bars, lines, or patches) to visualize your data.

bokeh.models.ColumnDataSource: A fundamental data structure in Bokeh that holds your data and makes it available to be used in plots. It allows for easy linking between data and visual attributes.

In Bokeh, most (if not all) of the data used to draw glyphs is stored in ColumnDataSource objects. They behave much like a dictionary in which each key-value pair corresponds to a column of the underlying data.

One powerful feature of the ColumnDataSource is that the same source can be shared across multiple plots and widgets. It also facilitates interactivity, such as hover tooltips and selection tools.

Moreover, if you update the data in a ColumnDataSource, every plot and widget that uses it updates automatically, which is useful for building interactive dashboards or apps with Bokeh.
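
The dictionary-like behaviour described above can be sketched as follows. This is a minimal illustration, not part of the dashboard: the values are the first three Bream rows of the dataset, and the variable name `demo_source` is made up for the example.

```python
from bokeh.models import ColumnDataSource

# Illustrative values only: the first three Bream rows of Fish.csv.
demo_source = ColumnDataSource(data={
    "Species": ["Bream", "Bream", "Bream"],
    "Weight": [242.0, 290.0, 340.0],
})

# Columns are looked up by name, as in a dictionary.
column_names = demo_source.column_names
weights = demo_source.data["Weight"]

# Assigning a new dictionary to .data replaces all columns at once.
# Every glyph that reads from demo_source would then use the new values.
demo_source.data = {
    "Species": ["Smelt"],
    "Weight": [12.2],
}
```

Replacing `.data` in a single assignment keeps all columns the same length, which Bokeh expects of a ColumnDataSource.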

bokeh.layouts.row, bokeh.layouts.gridplot, bokeh.layouts.column: Functions for arranging multiple plots or widgets in a grid or in a vertical or horizontal layout. These layout functions help you structure and organize your dashboard.

bokeh.palettes.Category20, bokeh.palettes.Spectral6: Pre-defined color palettes in Bokeh. These palettes provide a range of colors that can be used to differentiate data points in your plots.

bokeh.transform.cumsum: A utility function in Bokeh for cumulative sum transformations. It is often used in conjunction with pie or donut charts to calculate the start and end angles of each segment.
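
To make the start and end angles concrete, here is a plain-Python sketch of the running totals that `cumsum` produces for a pie chart. It does not call Bokeh at all; it only mirrors the arithmetic, using the species counts reported by `value_counts()` later in this notebook.

```python
from itertools import accumulate
from math import isclose, pi

# Species counts as reported by df['Species'].value_counts().
counts = {"Perch": 56, "Bream": 35, "Roach": 20, "Pike": 17,
          "Smelt": 14, "Parkki": 11, "Whitefish": 6}

total = sum(counts.values())
# Each species gets a share of the full circle (2*pi radians).
angles = [n / total * 2 * pi for n in counts.values()]

# cumsum('angle') corresponds to the running total: the end angle of each wedge.
end_angles = list(accumulate(angles))
# cumsum('angle', include_zero=True) shifts the running total to start at 0:
# the start angle of each wedge.
start_angles = [0.0] + end_angles[:-1]

for name, start, end in zip(counts, start_angles, end_angles):
    print(f"{name:<10} {start:5.2f} -> {end:5.2f} rad")
```

Each wedge starts where the previous one ends, and the last wedge closes the circle at 2π.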

bokeh.models.widgets.Div: A Bokeh widget for displaying text or HTML content. It can be used to add titles, subtitles, or any other textual content to your dashboard.

# **Introduction to Bokeh**


Bokeh is a powerful data visualization library for Python. It provides versatile, interactive visualization tools that are particularly useful for web-based dashboards and applications.

Key features of Bokeh include:

Interactivity: Bokeh allows users to create plots that can zoom, pan, select, and more, providing a dynamic user experience.

Versatility: Bokeh is able to produce a wide range of visualizations, from simple scatter plots and line graphs to complex, interactive plots.

High-level and low-level interfaces: Bokeh offers both a high-level interface for creating common types of plots quickly, as well as a low-level interface for creating complex, custom visualizations.

Integration with web technologies: Bokeh plots can be embedded in HTML websites and can interact with other web technologies, like JavaScript and CSS. This makes it particularly suitable for building data visualization into web applications.

Streaming and real-time data: Bokeh supports streaming and real-time data, which is useful for building dashboards that need to display data updates in real-time.

Large datasets: Bokeh can handle large and streaming datasets efficiently.
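
As a small illustration of the web-integration point in the list above, `bokeh.embed.components` returns the HTML fragments needed to place a plot inside an existing page. This is only a sketch: the figure and its data are made up for the example, and the host page must still load the BokehJS resources itself.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# A throwaway figure with made-up data, used only to show the embedding step.
p = figure(height=250, title="Embedding sketch")
p.line([1, 2, 3], [4, 6, 5])

# script: a <script> tag holding the plot data and rendering code
# div:    a <div> placeholder that marks where the plot appears in the page
script, div = components(p)
```

Both strings can then be inserted into an HTML template by whatever web framework serves the page.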

Like any library or tool, whether Bokeh is the "right" tool depends on the specifics of the project, including the requirements of the data, the necessary interactivity, the need for web integration, and the skill set of the team. Other popular data visualization libraries in Python include Matplotlib and Seaborn for static visualizations, and Plotly and Dash for interactive, web-based ones.

The Bokeh figure plotting methods such as vbar, circle, and line can accept data directly in several forms: Python lists, NumPy arrays, pandas Series, or DataFrame columns. This is why plotting code works even without explicitly converting a DataFrame to a ColumnDataSource.

However, using a ColumnDataSource does have several advantages:

Sharing data between glyphs: If you use the same ColumnDataSource across multiple glyphs, or even across multiple plots, any changes to the object are immediately reflected everywhere the ColumnDataSource is used.

Linked selection: If you use the same ColumnDataSource across multiple plots, then selections of points on one plot will also be highlighted in the others.

Expressive hover tooltips: ColumnDataSource allows you to use any of your columns in hover tooltips in an easy and flexible way.

Server applications: When building an application using the Bokeh server, the ColumnDataSource can be efficiently updated in response to user interactions or other events, so that only the necessary parts of the plot update.
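
The sharing and linked-selection points above can be sketched with two scatter plots that read from one ColumnDataSource. The values are the first five Bream rows of the dataset; the figure names and tool choice are assumptions made for this illustration and are not used elsewhere in the notebook.

```python
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# First five Bream rows of Fish.csv, for illustration only.
shared = ColumnDataSource(data={
    "Length1": [23.2, 24.0, 23.9, 26.3, 26.5],
    "Height": [11.52, 12.48, 12.3778, 12.73, 12.444],
    "Width": [4.02, 4.3056, 4.6961, 4.4555, 5.134],
})

tools = "box_select,reset"
left = figure(height=300, width=300, tools=tools, title="Height vs Length1")
right = figure(height=300, width=300, tools=tools, title="Width vs Length1")

# Both glyphs reference the same source object, so points selected with the
# box-select tool in one plot are highlighted in the other as well.
r_left = left.scatter("Length1", "Height", source=shared)
r_right = right.scatter("Length1", "Width", source=shared)

linked = row(left, right)  # pass to show(linked) to view both plots
```

Had each plot been given its own copy of the data, the selections would stay independent.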

Even if you don't use these features in your current plot, it's generally a good idea to use ColumnDataSource in Bokeh, as it's the standard way to pass data into a plot, and it provides the most flexibility.

However, for simple, standalone plots where the above advantages are not needed, passing a pandas DataFrame directly can be simpler and more straightforward.

# **Dictionary:**

**Length1: vertical length in cm**

**Length2: diagonal length in cm**

**Length3: cross length in cm**



In [621]:
#import dataset
df = pd.read_csv('Fish.csv')
df




Unnamed: 0,Species,Weight,Length1,Length2,Length3,Height,Width
0,Bream,242.0,23.2,25.4,30.0,11.5200,4.0200
1,Bream,290.0,24.0,26.3,31.2,12.4800,4.3056
2,Bream,340.0,23.9,26.5,31.1,12.3778,4.6961
3,Bream,363.0,26.3,29.0,33.5,12.7300,4.4555
4,Bream,430.0,26.5,29.0,34.0,12.4440,5.1340
...,...,...,...,...,...,...,...
154,Smelt,12.2,11.5,12.2,13.4,2.0904,1.3936
155,Smelt,13.4,11.7,12.4,13.5,2.4300,1.2690
156,Smelt,12.2,12.1,13.0,13.8,2.2770,1.2558
157,Smelt,19.7,13.2,14.3,15.2,2.8728,2.0672


# **OVERVIEW DATA**

In [622]:
df.isna().sum()

Species    0
Weight     0
Length1    0
Length2    0
Length3    0
Height     0
Width      0
dtype: int64

In [623]:
# Show the value counts of every column, in column order
for col in df.columns:
    display(df[col].value_counts())

Species
Perch        56
Bream        35
Roach        20
Pike         17
Smelt        14
Parkki       11
Whitefish     6
Name: count, dtype: int64

Weight
300.0     6
1000.0    5
500.0     5
120.0     5
700.0     5
         ..
60.0      1
55.0      1
800.0     1
306.0     1
19.9      1
Name: count, Length: 101, dtype: int64

Length1
19.0    6
20.0    5
22.0    4
20.5    4
25.4    3
       ..
33.7    1
25.6    1
24.1    1
22.1    1
13.2    1
Name: count, Length: 116, dtype: int64

Length2
22.0    7
35.0    6
22.5    5
40.0    5
21.0    4
       ..
19.6    1
21.3    1
22.7    1
24.6    1
14.3    1
Name: count, Length: 93, dtype: int64

Length3
23.5    5
25.0    3
22.5    3
34.0    3
45.5    3
       ..
27.9    1
26.8    1
26.7    1
27.2    1
15.2    1
Name: count, Length: 124, dtype: int64

Height
11.1366    2
5.6925     2
2.2139     2
6.1100     2
9.6000     2
          ..
8.8768     1
8.5680     1
9.4850     1
8.3804     1
2.9322     1
Name: count, Length: 154, dtype: int64

Width
3.5250    3
1.1484    2
4.3350    2
4.1440    2
6.1440    2
         ..
3.9060    1
4.4968    1
4.7736    1
5.3550    1
1.8792    1
Name: count, Length: 152, dtype: int64

In [624]:
df.describe()

Unnamed: 0,Weight,Length1,Length2,Length3,Height,Width
count,159.0,159.0,159.0,159.0,159.0,159.0
mean,398.326415,26.24717,28.415723,31.227044,8.970994,4.417486
std,357.978317,9.996441,10.716328,11.610246,4.286208,1.685804
min,0.0,7.5,8.4,8.8,1.7284,1.0476
25%,120.0,19.05,21.0,23.15,5.9448,3.38565
50%,273.0,25.2,27.3,29.4,7.786,4.2485
75%,650.0,32.7,35.5,39.65,12.3659,5.5845
max,1650.0,59.0,63.4,68.0,18.957,8.142


In [625]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159 entries, 0 to 158
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Species  159 non-null    object 
 1   Weight   159 non-null    float64
 2   Length1  159 non-null    float64
 3   Length2  159 non-null    float64
 4   Length3  159 non-null    float64
 5   Height   159 non-null    float64
 6   Width    159 non-null    float64
dtypes: float64(6), object(1)
memory usage: 8.8+ KB


In [626]:
average_weight = df.groupby('Species')['Weight'].mean()

average_height = df.groupby('Species')['Height'].mean()

average_width = df.groupby('Species')['Width'].mean()
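
To see what the grouped means look like, the sketch below applies the same `groupby(...).mean()` pattern to a five-row sample taken from the table shown earlier (three Bream rows and two Smelt rows). The names `sample`, `avg`, and `avg_df` are used only in this illustration.

```python
import pandas as pd

# Five rows copied from the Fish.csv preview above.
sample = pd.DataFrame({
    "Species": ["Bream", "Bream", "Bream", "Smelt", "Smelt"],
    "Weight": [242.0, 290.0, 340.0, 12.2, 13.4],
})

# groupby + mean returns a Series indexed by Species.
avg = sample.groupby("Species")["Weight"].mean()

# reset_index turns the Species index back into a regular column, which is
# the shape the bar charts expect; sorting puts the largest bar first.
avg_df = avg.reset_index().sort_values("Weight", ascending=False)
print(avg_df)
```

The result has one row per species, so the full dataset yields seven rows for each of the three averages.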



# **BARCHART OF AVG_WEIGHT BY SPECIES**

In [627]:

# Convert the series to a DataFrame
df_avg_weight = average_weight.reset_index()

# Sort the DataFrame by the 'Weight' column in descending order
df_avg_weight = df_avg_weight.sort_values('Weight', ascending=False)

# Create a ColumnDataSource from df_avg_weight
source = ColumnDataSource(df_avg_weight)

hover_tool = HoverTool(tooltips=[
    ("Species", "@Species"),
    ("Average Weight", "@Weight")
])

# Initialize Bokeh plot
Avg_weight_species = figure(x_range=df_avg_weight['Species'].values, height=350,
                            title="Average Weights by Species", toolbar_location=None,
                            tools=[hover_tool])

Avg_weight_species.vbar(x='Species', top='Weight', width=0.9, source=source, color='#F7E6C4')

Avg_weight_species.xgrid.grid_line_color = None
Avg_weight_species.y_range.start = 0
Avg_weight_species.y_range.end = df_avg_weight['Weight'].max() + 5 # Add some padding to the y range

Avg_weight_species.xaxis.axis_label = "Species"
Avg_weight_species.yaxis.axis_label = "Weight (g)"  # weight is in grams, not cm

# Center align the title
Avg_weight_species.title.align = "center"
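
The average weights are unrounded floats, so the tooltip can show many decimal places. One possible refinement, sketched here as an optional variant rather than the notebook's own code, is to add a number format in braces after the column name. The name `hover_tool_rounded` is made up for this example.

```python
from bokeh.models import HoverTool

# Same tooltip fields as above, with the weight shown to one decimal place.
hover_tool_rounded = HoverTool(tooltips=[
    ("Species", "@Species"),
    ("Average Weight", "@Weight{0.0}"),  # {0.0} is a numeric display format
])
```

Passing `tools=[hover_tool_rounded]` to `figure` instead of `hover_tool` would apply the formatted tooltip.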




# **PIE CHART OF SPECIES**

In [628]:
pie = df['Species'].value_counts()
data1 = pd.DataFrame({'Species': pie.index, 'value': pie.values})
data1['angle'] = data1['value']/data1['value'].sum() * 2*np.pi
data1['color'] = Category20[len(pie)]

# Create a new Bokeh figure
val_counts= figure(height=350, title="Fish Species Distribution", toolbar_location=None,
           tools="hover", tooltips="@Species: @value", x_range=(-0.5, 1.0))

# Add a wedge glyph to the figure
val_counts.wedge(x=0, y=1, radius=0.35,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='Species', source=data1)

# Modify some properties of the figure
val_counts.axis.axis_label=None
val_counts.axis.visible=False
val_counts.grid.grid_line_color = None

# Center align the title
val_counts.title.align = "center"





# **BARCHART OF AVG_HEIGHT BY SPECIES**

In [629]:
# Convert the series to a DataFrame
df_avg_height = average_height.reset_index()

# Sort the DataFrame by the 'Height' column in descending order
df_avg_height = df_avg_height.sort_values('Height', ascending=False)

# Create a ColumnDataSource from df_avg_height
source1 = ColumnDataSource(df_avg_height)

hover_tool = HoverTool(tooltips=[
    ("Species", "@Species"),
    ("Average Height", "@Height")
])
# Initialize Bokeh plot
Avg_height_species = figure(x_range=df_avg_height['Species'].values, height=350,
                            title="Average Height by Species", toolbar_location=None,
                            tools=[hover_tool])

Avg_height_species.vbar(x='Species', top='Height', width=0.9, source=source1, color='#99DBF5')

Avg_height_species.xgrid.grid_line_color = None
Avg_height_species.y_range.start = 0
Avg_height_species.y_range.end = df_avg_height['Height'].max() + 5 # Add some padding to the y range

Avg_height_species.xaxis.axis_label = "Species"
Avg_height_species.yaxis.axis_label = "Height (cm)"


# Center align the title
Avg_height_species.title.align = "center"



# **BARCHART OF AVG_WIDTH BY SPECIES**

In [630]:
# Convert the series to a DataFrame
df_avg_width = average_width.reset_index()

# Sort the DataFrame by the 'Width' column in descending order
df_avg_width = df_avg_width.sort_values('Width', ascending=False)

# Create a ColumnDataSource from df_avg_width
source2 = ColumnDataSource(df_avg_width)

# Initialize HoverTool
hover_tool = HoverTool(tooltips=[
    ("Species", "@Species"),
    ("Average Width", "@Width")
])

# Initialize Bokeh plot and add HoverTool
Avg_width_species = figure(x_range=df_avg_width['Species'].values, height=350, 
                           title="Average Width by Species", toolbar_location=None, 
                           tools=[hover_tool])

# Add vertical bars to the figure with color '#A0C49D'
Avg_width_species.vbar(x='Species', top='Width', width=0.9, source=source2, color='#A0C49D')

Avg_width_species.xgrid.grid_line_color = None
Avg_width_species.y_range.start = 0
Avg_width_species.y_range.end = df_avg_width['Width'].max() + 5  # Add some padding to the y range

# Add axis labels
Avg_width_species.xaxis.axis_label = "Species"
Avg_width_species.yaxis.axis_label = "Width (cm)"

# Center align the title
Avg_width_species.title.align = "center"





# **COMBINE THIS INTO A DASHBOARD**

In [631]:
# Create a title for the dashboard
title = Div(text='<h1>Fish Supermarket Analysis</h1>', align='center', width=800)


# Add the title to your layout
dashboard = column(title, gridplot([[Avg_height_species, Avg_width_species], [Avg_weight_species, val_counts]]))

# Show the dashboard
show(dashboard)

