## What is Plotly?

Plotly is an open-source graphing library for creating interactive, publication-quality graphs. It offers:

- **Interactivity:** Easily zoom, pan, and hover to inspect details.
- **Ease of Use:** High-level Plotly Express API for quick plotting, and lower-level Graph Objects for fine control.
- **Web-Ready Visualizations:** Graphs that render in web browsers, ideal for dashboards and shared reports.
- **Integration:** Works well with Jupyter Notebooks, Python scripts, and web frameworks like Dash.





This tutorial covers the following topics:

- **Introduction to Plotly** – What it is and why it's so powerful.
- **Installation and Setup** – How to install Plotly.
- **Basic Examples with Plotly Express** – Scatter plots, line charts, histograms, and pie charts.
- **Advanced Visualizations** – Box and violin plots, scatterplot matrices (SPLOMs), contour plots, 3D scatter plots, and sunburst charts.
- **Interactive Features and Exporting** – Zooming, panning, hover tooltips, animations, and exporting as HTML.

Let's dive in!

## Installation and Setup

To install Plotly, run one of the following commands in your terminal:

```bash
pip install plotly
```

or using conda:

```bash
conda install -c plotly plotly
```

After installation, you can import Plotly in your Python scripts or Jupyter Notebook.
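Before importing Plotly, it can help to confirm that the package is visible to the Python environment you are running. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name introduced for illustration, not part of Plotly.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Check whether Plotly is available in the current environment
plotly_version = installed_version("plotly")
if plotly_version is None:
    print("Plotly is not installed in this environment.")
else:
    print(f"Plotly version: {plotly_version}")
```

If the first message appears even though you ran the install command, the notebook kernel is probably using a different environment than your terminal.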

## Basic Plotly Express Examples

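We'll start with some basic visualizations using Plotly Express (imported as `px`).

### Scatter Plot Example

Scatter plots show the relationship between two numeric variables. The example below plots sepal width against sepal length for the Iris dataset, sizing each marker by petal length and coloring it by species.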

In [1]:
# Import Plotly Express
import plotly.express as px

# Scatter Plot Example using the Iris dataset
df_iris = px.data.iris()

fig_scatter = px.scatter(
    df_iris, 
    x='sepal_width', 
    y='sepal_length',
    size='petal_length',
    color='species',  
    title='Iris Dataset: Sepal Dimensions by Species', 
    labels={'sepal_width': 'Sepal Width (cm)', 'sepal_length': 'Sepal Length (cm)'}
)

fig_scatter.show()



### Line Chart Example

Line charts are great for visualizing trends over time. In the example below, we use the Gapminder dataset to plot life expectancy over the years for a specific country.

In [2]:
# Line Chart Example using Gapminder data
df_gapminder = px.data.gapminder().query("country=='Canada'")

fig_line = px.line(
    df_gapminder,
    x='year',
    y='lifeExp',
    title='Life Expectancy in Canada Over the Years',
    labels={'lifeExp': 'Life Expectancy', 'year': 'Year'}
)

fig_line.show()

### Histogram Example

Histograms help you understand the distribution of your data. Below is an example of a histogram using the Iris dataset.

In [14]:
# Histogram Example using the Iris dataset
fig_hist = px.histogram(
    df_iris,
    x='sepal_length',
    color='species',
    title='Distribution of Sepal Length by Species',
    labels={'sepal_length': 'Sepal Length (cm)'}
)

fig_hist.show()

### Pie Chart Example

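Pie charts are useful for showing proportions. Here's an example using the Iris dataset to show the proportion of each species. No `values` column is passed, so each row counts once. Because the dataset has 50 samples of each species, the three slices are equal.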

In [15]:
# Pie Chart Example using the Iris dataset
fig_pie = px.pie(
    df_iris,
    names='species',
    title='Proportion of Iris Species'
)

fig_pie.show()

## Visualizations with Box and Violin Plots

Box and violin plots are effective for visualizing data distributions. The examples below use the Iris dataset.

### What is a Box Plot?
A box plot provides a visual summary of a dataset's distribution. It shows key statistical measures such as the median, quartiles, and range, which makes it easy to judge the spread, central tendency, and variability of the data.
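The measures a box plot summarizes can be computed by hand, which makes it clearer what the box and whiskers represent. The sketch below uses made-up sample values and the common 1.5 × IQR convention for flagging outliers. The quartile method chosen here is an assumption, and Plotly's own interpolation may give slightly different quartiles.

```python
import statistics

# Hypothetical sample values, chosen only to illustrate the calculation
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles split the sorted data into four equal parts
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The interquartile range (IQR) is the height of the box
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are conventionally drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Outliers: {outliers}")
```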

In [4]:
# Box Plot Example
fig_box = px.box(
    df_iris, 
    x='species', 
    y='sepal_length', 
    color='species',
    title='Box Plot of Sepal Length by Species'
)
fig_box.show()

### What is a Violin Plot?
A violin plot combines a box plot with a kernel density estimate (KDE):

- The box plot inside the violin shows the basic statistical measures, such as the median, quartiles, and outliers.
- The violin shape itself is a density plot that shows the distribution of the data. A wider part of the violin indicates a higher density of points at that value.
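To see why a wider violin means more data, here is a minimal sketch of a Gaussian kernel density estimate written in plain Python. The sample values and the bandwidth of 0.3 are assumptions for illustration, and `gaussian_kde` is a hypothetical helper, not Plotly's implementation, which chooses its own bandwidth.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each sample contributes a small bell curve centered on itself
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical measurements clustered around 5.0, with one distant value
samples = [4.8, 4.9, 5.0, 5.0, 5.1, 5.2, 7.0]
density = gaussian_kde(samples, bandwidth=0.3)

for x in (5.0, 6.0, 7.0):
    print(f"density at {x}: {density(x):.3f}")
```

The estimate is highest near the cluster at 5.0, where a violin would be widest, and lowest in the gap around 6.0, where it would pinch inward.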

In [6]:
# Violin Plot Example
fig_violin = px.violin(
    df_iris, 
    x='species', 
    y='sepal_length', 
    color='species',
    box=True,  # draw box plot inside the violin plot
    points='all',  # show all points
    title='Violin Plot of Sepal Length by Species'
)
fig_violin.show()

## Scatterplot Matrix (SPLOM)

A scatterplot matrix displays pairwise relationships among multiple variables. Below is an example using the Iris dataset.

In [5]:
# Create a scatterplot matrix (SPLOM)
fig_splom = px.scatter_matrix(
    df_iris,
    dimensions=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
    color='species',
    title='Scatterplot Matrix for Iris Dataset'
)
fig_splom.update_layout(width=1000, height=1000)
fig_splom.show()

## Contour Plots

Density contour plots draw level curves of the two-dimensional point density, which highlights where observations cluster. We'll use the 2007 rows of the Gapminder dataset for this example.

In [17]:
# Use the 2007 rows of the Gapminder data to create a density contour plot.
# A separate name keeps the Canada-only df_gapminder from the line chart intact.
df_gapminder_2007 = px.data.gapminder().query("year==2007")

fig_contour = px.density_contour(
    df_gapminder_2007,
    x='gdpPercap',
    y='lifeExp',
    title='Density Contour Plot: GDP per Capita vs Life Expectancy (2007)',
    labels={'gdpPercap': 'GDP per Capita', 'lifeExp': 'Life Expectancy'}
)

# Add scatter points on top for clarity
fig_contour.add_scatter(
    x=df_gapminder_2007['gdpPercap'],
    y=df_gapminder_2007['lifeExp'],
    mode='markers',
    marker=dict(color='black', size=4),
    name='Data Points'
)

fig_contour.show()

## 3D Scatter Plot

3D scatter plots allow you to visualize data with an extra dimension. We'll create an interactive 3D scatter plot using the Iris dataset.

In [9]:
# 3D Scatter Plot Example using the Iris dataset
fig_3d = px.scatter_3d(
    df_iris,
    x='sepal_length', 
    y='sepal_width', 
    z='petal_length', 
    color='species',
    title='3D Scatter Plot for Iris Dataset'
)

fig_3d.show()

## Sunburst Chart

Sunburst charts visualize hierarchical data as concentric rings, with the top level at the center and each deeper level in a ring further out. The example below uses Plotly's built-in tips dataset to display a hierarchy in this circular layout.

In [12]:
# Load the built-in tips dataset for the sunburst chart
df_sunburst = px.data.tips()

# Create a Sunburst Chart showing the hierarchy: day -> time -> size
fig_sunburst = px.sunburst(
    df_sunburst,
    path=['day', 'time', 'size'],
    values='total_bill',
    title='Sunburst Chart: Hierarchical Breakdown of Tips Data'
)

fig_sunburst.show()

## Treemap in Plotly
A treemap represents hierarchical data using nested rectangles. It’s similar to a sunburst chart, but instead of concentric rings, it uses rectangles. The area of each rectangle is proportional to the value of that category.
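Both treemaps and sunburst charts size each node by the sum of the values beneath it in the hierarchy. The sketch below shows that roll-up in plain Python on a few made-up rows (not the real tips data). It only illustrates the idea and does not reproduce Plotly's internals.

```python
from collections import defaultdict

# Hypothetical rows of (day, time, bill), invented for illustration
rows = [
    ("Sat", "Dinner", 20.0),
    ("Sat", "Dinner", 30.0),
    ("Sun", "Dinner", 25.0),
    ("Thur", "Lunch", 10.0),
    ("Thur", "Dinner", 5.0),
]

# Accumulate a total for every node along each row's path
totals = defaultdict(float)
for day, time, bill in rows:
    totals[(day,)] += bill        # first level of the hierarchy
    totals[(day, time)] += bill   # second level, nested inside the first

for path, total in sorted(totals.items()):
    print(" -> ".join(path), total)
```

Each parent's total equals the sum of its children, which is why the child rectangles exactly fill their parent rectangle.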



In [15]:
# Load the built-in tips dataset for the treemap
df_treemap = px.data.tips()

# Create a Treemap Chart showing the hierarchy: day -> time -> size
fig_treemap = px.treemap(
    df_treemap,
    path=['day', 'time', 'size'],  # Defines the hierarchy
    values='total_bill',  # Determines rectangle size
    title='Treemap: Hierarchical Breakdown of Total Bills'
)

fig_treemap.show()

## Interactive Features and Exporting

Plotly makes your plots interactive by default. You can zoom, pan, and hover over data points for more details. In addition, you can export your figures as HTML files that preserve interactivity:

```python
fig_scatter.write_html("my_interactive_plot.html")  # works for any figure created above
```

This generates an HTML file that can be opened in any web browser.

## Warmup
For this exercise, we'll use Plotly Express to produce two visualizations that show hierarchical information: treemaps and sunburst charts. Please read about these charts at the Data Viz Catalogue: [Treemap](https://datavizcatalogue.com/methods/treemap.html), [Sunburst chart](https://datavizcatalogue.com/methods/sunburst_diagram.html).
Please also go through the Plotly Express documentation and familiarize yourself with the structure of the DataFrame from which these charts are produced: [treemap](https://plotly.com/python/treemaps/), [sunburst chart](https://plotly.com/python/sunburst-charts/).

## Data
- Please download the [Corporate Energy Consumption](https://data.calgary.ca/Environment/Corporate-Energy-Consumption/crbp-innf) dataset from the City of Calgary's open data portal.

## Tasks

- Familiarize yourself with the columns in the dataset and do some cleanup:
    - For this question, we'll work with the year 2023. You may discard the other years.
    - Observe that some of the consumption is reported in 'GJ' units while the rest is in 'kWh' units. To compare values, we need to convert everything to the same unit. Convert the `Total Consumption` reported in 'GJ' to 'kWh' by multiplying the relevant rows by 277.78.
- Use Plotly Express to produce a treemap showing the total consumption for the year 2023 with `Energy Description` at the first (top) level, followed by `Business Unit Desc` at the second level.
- Use Plotly Express to produce a sunburst chart showing the total consumption for the year 2023 with `Energy Description` at the first (innermost) level, followed by `Business Unit Desc` at the second level.
- Compare and contrast the two visualizations and comment on the following:
    - If the task involved quickly identifying all the information at a particular level, which chart would be better?
    - Which chart makes more efficient use of screen real estate?
    - Which chart is easier to read?
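The conversion factor in the cleanup step follows from the unit definitions: 1 GJ is 10^9 joules and 1 kWh is 3.6 × 10^6 joules, so 1 GJ ≈ 277.78 kWh. The sketch below checks that arithmetic in plain Python. The helper `to_kwh` and the sample readings are hypothetical and do not come from the Calgary dataset.

```python
JOULES_PER_GJ = 1e9
JOULES_PER_KWH = 3.6e6
GJ_TO_KWH = JOULES_PER_GJ / JOULES_PER_KWH  # about 277.78


def to_kwh(value, unit):
    """Convert a consumption value to kWh; only 'GJ' and 'kWh' are handled."""
    if unit == "kWh":
        return value
    if unit == "GJ":
        return value * GJ_TO_KWH
    raise ValueError(f"Unexpected unit: {unit!r}")


# Hypothetical (value, unit) readings, invented for illustration
readings = [(2.0, "GJ"), (100.0, "kWh")]
converted = [to_kwh(value, unit) for value, unit in readings]

print(f"1 GJ = {GJ_TO_KWH:.2f} kWh")
print([round(v, 2) for v in converted])
```

Raising an error for unknown units is a deliberate choice here: it surfaces any unit in the data beyond the two expected ones, so such rows are not silently mixed into the totals.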
