Plotly is an open-source Python library for interactive data visualization that makes it simple to explore and understand data. It supports many types of plots, including line charts, scatter plots, histograms, and box plots.


#### Primary modules and their roles:

1. Plotly Express (plotly.express)<br>
Use Cases: Ideal for straightforward tasks like creating line plots, bar charts, scatter plots, and more complex visualizations like treemaps and sunburst charts. It automatically handles much of the layout and design, making it user-friendly for those new to data visualization.
2. Graph Objects (plotly.graph_objects)<br>
Use Cases: Suitable for customized visualizations where specific adjustments to elements like legends, axes, grids, and more are required. It's perfect for advanced use cases that need detailed control or when combining multiple plot types.
3. Figure Factory (plotly.figure_factory)<br>
Use Cases: Useful when you need specialized plots that require additional computation before rendering, such as Gantt charts for project management timelines or detailed annotated heatmaps.
4. Subplots (plotly.subplots)<br>
Use Cases: Provides make_subplots for arranging several traces in a grid of rows and columns within a single figure.
5. I/O (plotly.io)<br>
Overview: This module manages the input/output functionality of Plotly, including rendering and exporting figures to various formats.
Use Cases: Critical for saving figures to disk in formats like PNG, JPEG, SVG, or PDF, or converting figures to JSON format for integration with web applications.
6. Offline (plotly.offline)<br>
Use Cases: Allows embedding interactive plots directly in offline environments like Jupyter notebooks or saving them to include in presentations or static web pages.


In [3]:
%pip install -U kaleido

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [4]:
import plotly.express as px
df = px.data.iris()  # Load a sample dataset
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species')
fig.show()

In [5]:
import plotly.graph_objects as go
fig = go.Figure(data=go.Bar(x=['Apples', 'Bananas', 'Oranges'], y=[400, 300, 500]))
fig.update_layout(title='Fruit Production', xaxis_title='Fruit', yaxis_title='Production')
fig.show()

A dendrogram is a type of diagram that is commonly used to illustrate the arrangement of the clusters produced by hierarchical clustering. Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. The primary goal of a dendrogram is to display these relationships in a tree-like structure that helps interpret the data by showing how individual elements are grouped together in clusters.

Key Features of Dendrograms:
1. Tree Structure: The tree structure of a dendrogram shows how each cluster is composed by branching out into its child nodes, which represent smaller clusters made up of the elements grouped together at each step of the hierarchical clustering process.
2. Height of Splits: The vertical lines represent clusters being split or merged. The height of each split in the tree indicates the relative distance or dissimilarity between separate clusters; taller splits represent greater distances.
3. Leaf Nodes: Each leaf node (at the bottom of the dendrogram) represents an individual data point. The arrangement of leaf nodes gives insight into the data’s underlying structure.
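
To make the "height of splits" idea concrete, the sketch below computes the merge heights directly with SciPy's hierarchical clustering. It assumes SciPy is installed; the five sample points and the choice of `method="complete"` are illustrative assumptions, not something prescribed by this notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five hypothetical 2-D points: two tight pairs and one far-away outlier
points = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]], dtype=float)

# Each row of Z describes one merge:
# [cluster index 1, cluster index 2, merge distance, size of the new cluster]
Z = linkage(points, method="complete")

merge_heights = Z[:, 2]                 # the heights drawn in a dendrogram
cluster_sizes = Z[:, 3].astype(int)     # number of leaves under each merge

for step, (height, size) in enumerate(zip(merge_heights, cluster_sizes), start=1):
    print(f"merge {step}: height={height:.2f}, cluster size={size}")
```

The tight pairs merge first at small heights, and the outlier joins last at the largest height, which is the pattern a dendrogram draws as short and tall branches.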

In [6]:
import plotly.figure_factory as ff
import numpy as np

data = np.random.randn(100, 3)
fig = ff.create_dendrogram(data)
fig.update_layout(width=800, height=500)
fig.show()

In [7]:
import plotly.figure_factory as ff
import numpy as np

# Sample data: random values drawn uniformly from [0, 1)
data = np.random.rand(10,10)
data = np.dot(data, data.T)  # Create a symmetric matrix

# Create a dendrogram
fig = ff.create_dendrogram(data, labels=["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"])
fig.update_layout(width=800, height=500, title_text="Dendrogram of Random Data",
                  xaxis_title="Data Points", yaxis_title="Distance")

# Style enhancements
fig.update_traces(marker=dict(color='royalblue', size=12),
                  line=dict(width=2))
fig.show()


In [8]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=1, cols=2)
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]), row=1, col=1)
fig.add_trace(go.Bar(x=[1, 2, 3], y=[6, 5, 4]), row=1, col=2)
fig.show()


In [9]:
# import plotly.express as px
# import plotly.io as pio

# df = px.data.iris()  # Load a sample dataset
# fig = px.scatter(df, x='sepal_width', y='sepal_length')  # Create a scatter plot
# pio.write_image(fig, 'scatter_plot.png')  # Save the figure as a PNG file


In [10]:
from plotly.offline import plot
import plotly.graph_objects as go

# go.Line is deprecated; a Scatter trace in 'lines' mode draws the same line chart
fig = go.Figure(data=go.Scatter(x=[1, 2, 3], y=[3, 1, 6], mode='lines'))
plot(fig)  # This will create a local HTML file and open it in your browser

'temp-plot.html'

## Installation

In [11]:
# %pip install plotly

## Package Structure of Plotly

There are three main modules in Plotly.

1. plotly.plotly
2. plotly.graph_objects
3. plotly.tools

1. plotly.plotly acts as the interface between the local machine and Plotly. It contains functions that require a response from Plotly’s server. Since Plotly version 4, this functionality lives in the separate chart_studio package (chart_studio.plotly).

2. plotly.graph_objects module contains the objects (Figure, layout, data, and the definitions of plots such as scatter plots and line charts) that are responsible for creating the plots. A figure can be represented either as a dict or as an instance of plotly.graph_objects.Figure; in both cases it is serialized as JSON before being passed to plotly.js.

3. plotly.tools module contains helper functions for working with figures.

Note: plotly.express module can create the entire Figure at once. It uses the graph_objects internally and returns the graph_objects.Figure instance.

The Titanic walkthrough later in this notebook follows three steps:

1. Load the dataset.
2. Explore the dataset.
3. Create visualizations using Plotly.


## Example

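Plot a histogram of the iris dataset with x="sepal_length" and y="petal_width". When y is given, each bar shows an aggregate of petal_width for the rows in that sepal_length bin instead of a row count.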

In [12]:
import plotly.express as px 

# using the iris dataset
df = px.data.iris() 

# plotting the histogram
fig = px.histogram(df, x="sepal_length", y="petal_width") 

# showing the plot
fig.show()


A scatter plot represents individual data points as dots positioned along a horizontal and a vertical axis. The values of two variables are plotted along the X-axis and Y-axis, and the pattern of the resulting points can reveal a correlation between them.

A bubble plot is a scatter plot with bubbles (color-filled circles) whose sizes depend on another variable in the data. It can be created using the scatter() method of plotly.express with the size argument.

Plot a scatter plot for species and petal_width

In [13]:
# plotting the scatter chart
fig = px.scatter(df, x="species", y="petal_width") 

# showing the plot
fig.show()


In [14]:
# plotting the bubble chart
fig = px.scatter(df, x="species", y="petal_width",
                 size="petal_length", color="species")

# showing the plot
fig.show()


Load a sample dataset named tips into a DataFrame called df using Plotly's px.data module.

Plot a violin plot using the day column (x) and the total_bill column (y).

In [15]:
# using the tips dataset
df = px.data.tips() 

# plotting the violin chart
fig = px.violin(df, x="day", y="total_bill")

# showing the plot
fig.show()


In [16]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

### Step 1: Load the Dataset

In [17]:
import pandas as pd
import numpy as np
import plotly.express as px

# Load the dataset
file_path = 'Titanic Data.csv'
titanic_df = pd.read_csv(file_path)

In [18]:
# Display the first few rows of the dataset
titanic_df.head()

| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S | 0 |
| 1 | 2 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 1 |
| 2 | 3 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S | 1 |
| 3 | 4 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | 1 |
| 4 | 5 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S | 0 |


### Step 2: Explore the Dataset

In [19]:
# Display basic information about the dataset
titanic_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Pclass       891 non-null    int64  
 2   Name         891 non-null    object 
 3   Sex          891 non-null    object 
 4   Age          714 non-null    float64
 5   SibSp        891 non-null    int64  
 6   Parch        891 non-null    int64  
 7   Ticket       891 non-null    object 
 8   Fare         891 non-null    float64
 9   Cabin        204 non-null    object 
 10  Embarked     889 non-null    object 
 11  Survived     891 non-null    int64  
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [20]:
# Display summary statistics
titanic_df.describe()

| | PassengerId | Pclass | Age | SibSp | Parch | Fare | Survived |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 | 0.383838 |
| std | 257.353842 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 | 0.486592 |
| min | 1.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 | 0.0 |
| 50% | 446.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 | 0.0 |
| 75% | 668.5 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 | 1.0 |
| max | 891.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 | 1.0 |


### Step 3: Create Visualizations


We'll use Plotly to create various visualizations to uncover insights from the dataset. Here are some examples:

1. **Bar Chart**: Number of passengers by class.
2. **Pie Chart**: Survival rate.
3. **Histogram**: Age distribution of passengers.
4. **Box Plot**: Fare distribution by class.
5. **Scatter Plot**: Age vs. Fare, colored by survival status.

In [21]:
titanic_df.columns

Index(['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch',
       'Ticket', 'Fare', 'Cabin', 'Embarked', 'Survived'],
      dtype='object')

A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables. A correlation matrix is used to summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced analyses.
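
As a small worked example of these properties, the sketch below builds a toy DataFrame (the columns `a`, `b`, and `c` are made up) and computes its Pearson correlation matrix with pandas. Every variable correlates perfectly with itself, and the matrix is symmetric.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # rises in lockstep with a
    "c": [5, 4, 3, 2, 1],   # falls in lockstep with a
})

# DataFrame.corr() computes pairwise Pearson correlation by default
corr = toy.corr()
print(corr)

# The diagonal is all ones and the matrix equals its transpose
print(np.allclose(np.diag(corr.values), 1.0), np.allclose(corr.values, corr.values.T))
```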


In [22]:
# Selecting only the numeric columns for correlation calculation
numeric_cols = titanic_df.select_dtypes(include=[np.number])
correlation_matrix = numeric_cols.corr()

In [23]:
# Creating the heatmap
fig = px.imshow(correlation_matrix,
                text_auto=True,  # Display correlation values on the heatmap
                labels=dict(x="Features", y="Features", color="Correlation"),
                x=correlation_matrix.columns,
                y=correlation_matrix.columns)

# Update layout for better readability
fig.update_layout(title="Correlation Matrix Heatmap",
                  xaxis_title="Features",
                  yaxis_title="Features")

# Show the plot
fig.show()

In [24]:
import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(
        z=correlation_matrix.values,
        x=correlation_matrix.columns,
        y=correlation_matrix.index,
        colorscale='Viridis'))

fig.update_layout(title='Heatmap of Correlations',
                  xaxis_title='Features',
                  yaxis_title='Features')

fig.show(renderer='browser')



#### Bar Chart: Class Distribution Among Survivors

In [25]:
class_survival_counts = titanic_df.groupby(['Pclass', 'Survived']).size().reset_index(name='Counts')
# Cast to string so Plotly Express treats survival status as a discrete category (one bar group per value)
class_survival_counts['Survived'] = class_survival_counts['Survived'].astype(str)
fig = px.bar(class_survival_counts, x='Pclass', y='Counts', color='Survived',
             title='Class Distribution Among Survivors',
             labels={'Pclass': 'Passenger Class', 'Counts': 'Number of Passengers', 'Survived': 'Survival Status'},
             barmode='group',
             category_orders={'Survived': ['0', '1']})
fig.update_layout(xaxis_title='Passenger Class', yaxis_title='Number of Passengers', legend_title='Survived')
#fig.show(renderer='browser')
fig.show()


#### Pie Chart: Survival Rate

In [26]:
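# Pie chart of the survival rate
# Map the 0/1 codes to readable names first; the px `labels` argument renames columns, not values
survival_df = titanic_df.assign(Outcome=titanic_df['Survived'].map({0: 'Did Not Survive', 1: 'Survived'}))
fig = px.pie(survival_df, names='Outcome', title='Survival Rate')
fig.show()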

In [27]:
# Filter data for survivors only
survived_df = titanic_df[titanic_df['Survived'] == 1]

# Create a pie chart to show the proportion of survivors by gender
fig_pie = px.pie(survived_df, names='Sex', title='Proportion of Survivors by Gender',
                 labels={'Sex': 'Gender'})
#fig_pie.show(renderer='browser')
fig_pie.show()

#### Histogram: Age Distribution

In [28]:
# Histogram of the age distribution of passengers
fig = px.histogram(titanic_df, x='Age', title='Age Distribution of Passengers', nbins=20)
fig.show()

#### Box Plot: Fare Distribution by Class


In [29]:
# Box plot of fare distribution by class
fig = px.box(titanic_df, x='Pclass', y='Fare', title='Fare Distribution by Class')
fig.show()


#### Scatter Plot: Age vs. Fare, Colored by Survival Status

In [30]:
# Scatter plot of Age vs. Fare, colored by survival status
fig = px.scatter(titanic_df, x='Age', y='Fare', color='Survived', 
                 title='Age vs. Fare (Colored by Survival Status)')
fig.show()

In [31]:
# Create a 3D scatter plot to visualize the relationship between Age, Fare, and Passenger Class
fig = px.scatter_3d(titanic_df, x='Age', y='Fare', z='Pclass', color='Survived',
                    title='3D Scatter Plot of Age, Fare, and Passenger Class',
                    labels={'Age': 'Age', 'Fare': 'Fare', 'Pclass': 'Passenger Class', 'Survived': 'Survived'})

fig.update_layout(scene=dict(
    xaxis_title='Age',
    yaxis_title='Fare',
    zaxis_title='Passenger Class'),
    legend_title='Survived')
#fig.show(renderer='browser')  # Use 'browser' renderer for VS Code
fig.show()