# Categorical Data Plots

Now let's discuss using plotly to plot categorical data! There are a few main plot types for this:

* 1) Categorical distribution plots
    * 1. box
    * 2. violin
* 2) Categorical scatter plots
    * 1. strip
* 3) Categorical estimate plots
    * 1. bar
    * 2. histogram

Let's go through examples of each!

In [112]:
# Importing the Plotly Express library for creating visualizations
import plotly.express as px
# wrap is only needed for the optional colorscale listing further down
from textwrap import wrap

In [3]:
# Loading the built-in 'tips' dataset from Plotly
tips=px.data.tips()

## 1) Categorical Distribution Plots
* 1. box
* 2. violin

In this section, we will explore categorical distribution plots like boxplots and violin plots,
which are useful for visualizing the distribution of quantitative data across categorical variables.


### 1. Boxplot

A box plot (or box-and-whisker plot) shows the distribution of quantitative data across categories in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
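
To make the whisker and outlier rule concrete, here is a minimal sketch using only the standard library. It assumes the common 1.5 × IQR convention and a small made-up list of bills rather than the `tips` data. Libraries compute quartiles in slightly different ways, so the numbers on a Plotly figure may not match this sketch exactly.

```python
import statistics

# Hypothetical bill amounts, not taken from the tips dataset
bills = [10, 12, 14, 15, 16, 18, 20, 22, 50]

# Quartiles; the "inclusive" method treats the data as the whole population
q1, median, q3 = statistics.quantiles(bills, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [b for b in bills if lower_fence <= b <= upper_fence]
outliers = [b for b in bills if b < lower_fence or b > upper_fence]

# Whiskers end at the most extreme points that are still inside the fences
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3)             # 14.0 16.0 20.0
print(whisker_low, whisker_high)  # 10 22
print(outliers)                   # [50]
```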

In [4]:
# Creating a box plot for 'total_bill' grouped by 'day' of the week
px.box(data_frame=tips,y="total_bill",x="day",title="Total Bill Distribution by Day")

In [5]:
# Box plot for 'total_bill' grouped by 'time' (Lunch or Dinner)
px.box(data_frame=tips,y="total_bill",x="time",title="Total Bill Distribution by Time")

In [6]:
# Horizontal box plot: 'total_bill' on the x-axis and 'time' on the y-axis (swapping x and y axes)
px.box(data_frame=tips,x="total_bill",y="time",title="Total Bill Distribution by Time")

In [7]:
# Adding a 'color' dimension for 'day' to differentiate the box plot across days
px.box(data_frame=tips,x="total_bill",y="time",title="Total Bill Distribution by Time and Day",color="day")

#### Customizing the Boxplot Colors

In [8]:
# Setting the box color for each 'time' value (Lunch or Dinner) with a discrete color map
px.box(data_frame=tips,
       x="total_bill",
       y="day",
       title="Total Bill Distribution by Day and Time",
       color="time",
       color_discrete_map={"Dinner":"darkred","Lunch":"darkgreen"})



In [9]:
# Creating a box plot for 'total_bill' based on smoker status
px.box(data_frame=tips,x="total_bill",y="smoker",title="Total Bill Distribution by Smoker Status")

In [10]:
# Box plot for 'total_bill' based on gender, with color distinction for smoker status
px.box(data_frame=tips,y="total_bill",x="sex",
       color="smoker",
       color_discrete_map={"Yes":"darkred","No":"darkgreen"},
       title="Total Bill Distribution by Gender and Smoker Status")



#### Changing Color Scheme for a Different Variable

In [11]:
# Box plot for 'tip' based on gender, using a custom color sequence
# (the sequence has a single entry, so both boxes are drawn in green)
px.box(data_frame=tips,y="tip",x="sex",
       color="sex",
       color_discrete_sequence=["green"],
       title="Tip Distribution by Gender")


### 2. Violin Plot
A violin plot plays a similar role to a box-and-whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual data points, the violin plot features a kernel density estimate of the underlying distribution, which conveys more detail about the shape of the distribution than a box plot does.
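
As a rough illustration of what a kernel density estimate is, the sketch below builds one by hand with a Gaussian kernel. The `gaussian_kde` helper, the fixed bandwidth of 3.0, and the list of bills are all made up for this example; Plotly chooses its own bandwidth, so this shows the idea rather than reproducing the curve on the figure.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each observation
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical bill amounts, not taken from the tips dataset
bills = [10, 12, 14, 15, 16, 18, 20, 22, 50]
density = gaussian_kde(bills, bandwidth=3.0)  # bandwidth is an assumption

# The estimate is high where observations cluster and low in the gaps
print(density(15) > density(35))  # True

# Sanity check: a density should integrate to roughly 1
step = 0.5
grid = [-20 + i * step for i in range(201)]  # covers -20 to 80
area = sum(density(x) for x in grid) * step
```

The width of each violin at a given value corresponds to a curve like `density`; a larger bandwidth gives a smoother, flatter shape.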


In [12]:
# Creating a simple violin plot for 'total_bill' grouped by 'day'
px.violin(data_frame=tips,x="total_bill",y="day")

In [13]:
# Violin plot for 'total_bill' with 'day' on the x-axis (flipping x and y axes)
px.violin(data_frame=tips,y="total_bill",x="day")


In [14]:
# Adding a 'color' dimension for smoker status in the violin plot
px.violin(data_frame=tips,y="total_bill",x="day",
          color="smoker")


In [15]:
# Customizing the color of the violin plot based on smoker status
px.violin(data_frame=tips,y="total_bill",x="day",
          color="smoker",
          color_discrete_map={"Yes":"red" , "No":"green"})

#### Adding Box Plot Inside the Violin Plot

In [16]:
# Adding a box plot inside the violin plot for 'total_bill' grouped by 'day'
px.violin(data_frame=tips,y="total_bill",x="day",
          box=True,
          color="smoker",
          color_discrete_map={"Yes":"red" , "No":"green"})

In [17]:
# Adding a box plot inside the violin plot for 'total_bill' grouped by 'time' (Lunch or Dinner)
px.violin(data_frame=tips,y="total_bill",x="time",
          box=True,
          )

## 2) Categorical Scatter Plots
Next, we explore scatter plots for categorical data.

### 1. Strip Plot
A strip plot draws a scatter plot in which one variable is categorical, showing each individual observation along the categorical axis. It can be drawn on its own, but it is also a good complement to a box or violin plot when you want to show all observations along with some representation of the underlying distribution.

In [18]:
# Imports and loading data
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [19]:
# Load the 'tips' dataset from Plotly
tips=px.data.tips()

In [20]:
# Create a strip plot for 'total_bill' based on 'day'
px.strip(data_frame=tips,
         x="total_bill",
         y="day")

In [21]:
# Swap axes for the strip plot (flipping x and y axes)
px.strip(data_frame=tips,
         y="total_bill",
        x="day")

In [22]:
# Add color to differentiate based on smoker status
px.strip(data_frame=tips,
         y="total_bill",
        x="day",
        color="smoker"
        )

In [23]:
# Add color to differentiate based on gender
px.strip(data_frame=tips,
         y="total_bill",
        x="day",
        color="sex"
        )

In [24]:
# Add color based on time (Lunch or Dinner)
px.strip(data_frame=tips,
         y="total_bill",
        x="day",
        color="time"
        )

### Combining Strip Plot with Box and Violin Plots
Strip plots can be combined with box or violin plots for better visual representation.


In [None]:
# Combining box and violin plots

fig1=px.violin(data_frame=tips,
    x="day",
    y="total_bill",
    title="Violin Plot with Box Plot Overlay",
)
fig2=px.box(data_frame=tips,
    x="day",
    y="total_bill")

# Merging both figures for a combined plot

# Add box plot traces to the violin plot
for trace in fig2.data:
    fig1.add_trace(trace)

# Show the combined figure
fig1.show()


In [56]:
# Combining strip plot and violin plot
fig1=px.violin(data_frame=tips,
    x="day",
    y="total_bill",
    title="Violin Plot with Strip Points",
)

fig2=px.strip(data_frame=tips,
    x="day",
    y="total_bill")

# Merging both figures for a combined plot
for trace in fig2.data:
    fig1.add_trace(trace)

# Show the combined figure
fig1.show()


In [59]:
# Box plot with all individual points displayed
fig = px.box(
    tips, 
    x="day", 
    y="total_bill", 
    title="Box Plot with Individual Points", 
    points="all"  # Show all individual points
)
fig.show()

## 3) Categorical Estimate Plots

* 1. bar
* 2. countplot (histogram)

These two closely related plots show aggregate values for a categorical feature in your data.

### 1. Bar Plot
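A bar plot draws rectangular bars for a categorical feature. Note that `px.bar` does not aggregate by itself: it draws one rectangle per row and, by default, stacks the rectangles that share a category, so the height of each stacked bar is the sum of `total_bill` for that category rather than the mean: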

In [81]:
# Creating a bar plot for 'total_bill' grouped by gender
px.bar(data_frame=tips,
       x="sex",
       y="total_bill",
       color="sex", 
       title="Total Bill Grouped by Gender",  )

In [None]:
# Bar plot for 'total_bill' grouped by time and differentiated by smoker status (relative bar mode)
px.bar(data_frame=tips,
       x="time",
       y="total_bill",
       barmode="relative", 
       color="smoker",
       color_discrete_map={"Yes":"red" , "No":"green"},
       title="Total Bill Grouped by time and smoker",  )

In [82]:
# Bar plot with bars grouped by smoker status
px.bar(data_frame=tips,
       x="sex",
       y="total_bill",
       barmode="group", 
       color="smoker",
       color_discrete_map={"Yes":"red" , "No":"green"},
       title="Total Bill Grouped by gender",  )

In [85]:
# Overlaying bars for smoker status
px.bar(data_frame=tips,
       x="sex",
       y="total_bill",
       barmode="overlay", 
       color="smoker",
       color_discrete_map={"Yes":"red" , "No":"green"},
       title="Total Bill by Gender and Smoker Status (Overlay)",  )

### 2. Countplot (histogram)

This is essentially the same as a bar plot, except that it explicitly counts the number of occurrences of each category, which is why we only pass the x value:

In [90]:
# Histogram for 'sex'
px.histogram(data_frame=tips,
       x="sex",
       color="sex")

In [91]:
# Histogram with 'sex' on the y-axis (flipping axes)
px.histogram(data_frame=tips,
       y="sex",
       color="sex")

In [92]:
# Histogram for 'smoker' status
px.histogram(data_frame=tips,
       x="smoker",
       color="smoker")

In [100]:
# Histogram for 'day' differentiated by smoker status
px.histogram(data_frame=tips,
       x="day",
       color="smoker",
       )

In [None]:
# Bar plot (similar to a histogram) with categorical 'day' data, colored by smoker status
px.bar(data_frame=tips,
       x="day",
       color="smoker",
       barmode="group"
       )

In [99]:
# Grouped histogram for 'day' by smoker status
px.histogram(data_frame=tips,
       x="day",
       color="smoker",
       barmode="group"
       )

In [101]:
# Normalized histogram (percent) by day and smoker status
px.histogram(data_frame=tips,
       x="day",
       color="smoker",
       barmode="group",
        histnorm="percent",
       )

In [104]:
# Normalized histogram (fraction) by day and smoker status
px.histogram(data_frame=tips,
       x="day",
       color="smoker",
       barmode="group",
        histnorm="probability",
       )
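
To see how the `histnorm` options relate to raw counts, here is a small standard-library sketch on a made-up list of days. It only mirrors the arithmetic: "probability" is count divided by total, and "percent" is that fraction times 100. When `color` is set, each color group is its own trace, and our understanding is that Plotly normalizes each trace separately, so treat the totals below as per-group totals in that case.

```python
from collections import Counter

# Hypothetical observations, not taken from the tips dataset
days = ["Thur", "Fri", "Sat", "Sat", "Sun", "Sun", "Sun", "Sat"]

counts = Counter(days)        # what a plain count histogram shows
total = sum(counts.values())

# histnorm="probability": each bar is a fraction of the total
probability = {day: n / total for day, n in counts.items()}

# histnorm="percent": the same fractions scaled to 0-100
percent = {day: 100 * p for day, p in probability.items()}

print(counts["Sat"], probability["Sat"], percent["Sat"])  # 3 0.375 37.5
```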

#### Using Different Colorscales
See https://plotly.com/python/builtin-colorscales/ for the available colorscales.

In [118]:
# Uncomment the lines below to list the named colorscales and inspect one of them.
# The histogram itself still uses the default colors.
# named_colorscales = px.colors.named_colorscales()
# print("\n".join(wrap("".join('{:<12}'.format(c) for c in named_colorscales), 96)))
# print(px.colors.sequential.Plasma)
px.histogram(data_frame=tips,
       x="day",
       color="smoker",
       barmode="group",
        histnorm="probability",
       )

In [119]:
# Creating a grouped and normalized histogram of 'time', colored by smoker status
px.histogram(data_frame=tips,
       x="time",
       color="smoker",
       barmode="group",
        histnorm="probability",
       )

## Subplots
Let's create a subplot to compare different categorical plots in one figure.

In [155]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Subplot titles follow the order in which the traces are added below
fig = make_subplots(
    rows=2, cols=2, 
    subplot_titles=("Box Plot", "Violin Plot", "Strip Plot", "Line Plot"),
    shared_xaxes=True, shared_yaxes=True
)
# Adding a box plot to the first subplot
box_plot = go.Box(
    y=tips["total_bill"], 
    name="Box Plot"
)
fig.add_trace(box_plot, row=1, col=1)
          
# Adding a violin plot to the second subplot

violin_plot=go.Violin(
        x=tips["day"], 
        y=tips["total_bill"], 
        name="violin plot"
)

fig.add_trace(violin_plot, row=1, col=2)

# Adding a strip plot to the third subplot

strip_plot=go.Scatter(
        x=tips["day"], 
        y=tips["total_bill"], 
            mode="markers", 
        name="strip plot"
)

fig.add_trace(strip_plot, row=2, col=1)

# Adding a line plot to the fourth subplot
scatter_plot=go.Scatter(
        x=tips["day"], 
        y=tips["total_bill"], 
        mode="lines", 
        name="line plot"
)

fig.add_trace(scatter_plot, row=2, col=2)

# Updating layout and displaying the figure

fig.update_layout(
    title="4 Different Types of Plots in a Subplot",
    showlegend=True,  # Show the legend so each trace can be identified
)

fig.show()


# Great Work!