### In this notebook, I demonstrate step by step how to build a raincloud plot! This plot combines box, scatter, and KDE plots in one cohesive and interesting figure.

First, import the necessary Python packages.

In [None]:
import pandas as pd
import plotly.graph_objects as go
import numpy as np

Next, import the data. I am using the Palmer Penguins dataset, which is publicly available.

In [7]:
df = pd.read_csv("penguins.csv")  # named df because the later cells refer to df
df = df.dropna()  # drop rows with missing measurements
df.head()

Unnamed: 0,species,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g
0,Chinstrap,46.7,17.9,195.0,3300.0
1,Adelie,35.9,16.6,190.0,3050.0
2,Adelie,36.6,17.8,185.0,3700.0
3,Adelie,37.8,18.3,174.0,3400.0
4,Adelie,35.5,17.5,190.0,3700.0
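
The `Unnamed: 0` column in the output above is what pandas typically shows when the first column of a CSV has no header, for example when a row index was saved along with the data. As a hedged aside, the sketch below shows one way to read such a file so that the index does not come back as a data column. It uses a small in-memory CSV rather than `penguins.csv`: the first two rows are copied from the output above, and the third row, with missing values, is made up purely to show what `dropna()` removes.

```python
import io
import pandas as pd

# In-memory stand-in for penguins.csv (assumption: the real file starts with an unnamed index column).
# The last row is hypothetical and only illustrates a record with missing values.
csv_text = """,species,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g
0,Chinstrap,46.7,17.9,195.0,3300.0
1,Adelie,35.9,16.6,190.0,3050.0
2,Adelie,,,,
"""

# index_col=0 tells pandas to use the first column as the row index
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

# dropna() drops every row that contains at least one missing value
sample_clean = sample.dropna()

print(list(sample_clean.columns))
print(len(sample), "rows before dropna,", len(sample_clean), "rows after")
```

Whether this applies to your copy of the dataset depends on how the CSV was written, so treat it as optional cleanup.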


Now, define the penguin species list, initialize the figure, and choose your colors! It is recommended to use colors whose hues differ from one another so that the plot is colorblind-friendly and avoids confusion (Few et al.). Keep in mind that color is usually not necessary. In this case, however, I wanted to tell a small story with the color, where blue represents the Antarctic, where these penguins are found (Data Feminism, D'Ignazio and Klein).
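
The palette in this notebook uses three blues that share roughly the same hue, so what separates them is mostly lightness. As a rough check (not part of the original workflow), the sketch below computes a WCAG-style relative luminance for each hex code. The helper name `relative_luminance` is hypothetical, and the `colors` dictionary simply repeats the hex codes used in this notebook. Clearly different luminance values suggest the colors stay distinguishable even when hue is hard to perceive.

```python
def relative_luminance(hex_color):
    """Approximate relative luminance (0 = black, 1 = white) of an sRGB hex color."""
    hex_color = hex_color.lstrip("#")
    # Split "RRGGBB" into three channels scaled to the range 0..1
    channels = [int(hex_color[k:k + 2], 16) / 255 for k in (0, 2, 4)]
    # Undo the sRGB gamma curve to get linear-light values
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    # Weighted sum reflecting how strongly each channel contributes to perceived brightness
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

# Same hex codes as the palette used in this notebook
colors = {
    "Adelie": "#08306B",
    "Gentoo": "#2171B5",
    "Chinstrap": "#6BAED6"}

luminance = {species: relative_luminance(code) for species, code in colors.items()}
for species, value in luminance.items():
    print(f"{species}: {value:.3f}")
```

This is only a lightness comparison, not a full colorblindness simulation.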

In [13]:
species_list = df['species'].unique()
raincloud_fig = go.Figure()

colors = {
    "Adelie": "#08306B", # all different shades of blue
    "Gentoo": "#2171B5",
    "Chinstrap": "#6BAED6"}

Next, begin adding traces to your plot. The code loops through the species list defined previously and, for each species, adds a half violin plot, a box plot, and a jittered scatter plot. The "jitter" in the code creates the rain effect in the plot: it spreads the scatter points horizontally at random so that overlapping observations remain visible.

In [14]:
for i, species in enumerate(species_list):
    subset = df[df['species'] == species]

    # Half violin (distribution)
    raincloud_fig.add_trace(go.Violin(
        x=[i]*len(subset),
        y=subset['flipper_length_mm'],
        name=species,
        side='positive',
        line_color='black',
        fillcolor=colors[species],
        width=0.6,
        points=False,
        spanmode='hard',
        showlegend=False))

    # Box plot
    raincloud_fig.add_trace(go.Box(
        x=[i]*len(subset),
        y=subset['flipper_length_mm'],
        name=species,
        marker_color='black',
        width=0.2,
        boxpoints=False,
        showlegend=False))
    
    # Jittered scatter plot
    jitter_strength = 0.15
    x_jittered = np.random.uniform(-jitter_strength, jitter_strength, size=len(subset)) + i #adding jitter to points

    raincloud_fig.add_trace(go.Scatter(
        x=x_jittered,
        y=subset['flipper_length_mm'],
        mode='markers',
        marker=dict(color=colors[species], opacity=0.4, size=5), #scatter point aesthetics
        name=species,
        showlegend=False))
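
Because the jitter is random, the points land in slightly different places every time the cell runs. If you want the figure to be reproducible, one option is a seeded NumPy generator. This is a minimal sketch rather than the code used above: the helper `jittered_positions` and the seed value 42 are illustrative assumptions.

```python
import numpy as np

def jittered_positions(center, n_points, jitter_strength=0.15, seed=42):
    """Return n_points x-positions spread uniformly within jitter_strength of center."""
    rng = np.random.default_rng(seed)  # seeded generator: same seed, same offsets
    offsets = rng.uniform(-jitter_strength, jitter_strength, size=n_points)
    return center + offsets

# Calling the helper twice with the same seed gives identical positions
x_first = jittered_positions(center=0, n_points=5)
x_again = jittered_positions(center=0, n_points=5)
print(np.array_equal(x_first, x_again))
```

In the loop above, this could stand in for the `np.random.uniform(...)` line, for example `x_jittered = jittered_positions(i, len(subset), seed=i)`, so that each species gets its own repeatable pattern.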

Last but not least, there is a lot to improve **aesthetically**. Default Plotly graphs look okay, but they can be improved in many ways: for example, by setting the background color to white, removing the zero axis lines, changing the x- and y-axis titles, adding a grid, and adjusting the font weight (Tufte chapters 5, 6). Together, these changes make for a simpler, more cohesive raincloud plot!

In [17]:
species_list = df['species'].unique()
raincloud_fig = go.Figure()

colors = {
    "Adelie": "#08306B", # all different shades of blue
    "Gentoo": "#2171B5",
    "Chinstrap": "#6BAED6"}

for i, species in enumerate(species_list):
    subset = df[df['species'] == species]

    # Half violin (distribution)
    raincloud_fig.add_trace(go.Violin(
        x=[i]*len(subset),
        y=subset['flipper_length_mm'],
        name=species,
        side='positive',
        line_color='black',
        fillcolor=colors[species],
        width=0.6,
        points=False,
        spanmode='hard',
        showlegend=False))

    # Box plot
    raincloud_fig.add_trace(go.Box(
        x=[i]*len(subset),
        y=subset['flipper_length_mm'],
        name=species,
        marker_color='black',
        width=0.2,
        boxpoints=False,
        showlegend=False))

    # Jittered scatter plot
    jitter_strength = 0.15 
    x_jittered = np.random.uniform(-jitter_strength, jitter_strength, size=len(subset)) + i #adding jitter to points

    raincloud_fig.add_trace(go.Scatter(
        x=x_jittered,
        y=subset['flipper_length_mm'],
        mode='markers',
        marker=dict(color=colors[species], opacity=0.4, size=5), #scatter point aesthetics
        name=species,
        showlegend=False))

# Update layout
raincloud_fig.update_layout(
    title="<b>Raincloud Plot of Penguin Flipper Lengths</b>", # plot title, <b> bolds text
    xaxis=dict(
        title="<b>Species</b>",
        tickmode='array',
        tickvals=list(range(len(species_list))), # one tick per species position
        ticktext=species_list), # label each tick with its species name
    yaxis=dict(
        title="<b>Flipper Length (mm)</b>", # the y-axis title is set once, here
        showgrid=True,
        gridcolor='lightgray', # adding horizontal grid
        zeroline=False),
    violingap=0.3,
    violingroupgap=0.4,
    violinmode='overlay',
    template='simple_white', # white background
    font=dict(size=14))

raincloud_fig.show()


*Thank you for reading!*