---
layout: post
title: Blog Post 0
---

Prompt: Write a tutorial explaining how to construct an interesting data visualization of the Palmer Penguins data set.

## Data preparation
First, let's retrieve and clean up the data a little.

In [1]:
import pandas as pd

# read the data from url
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)

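# Drop rows with missing body mass or sex, shorten each species name
# to its first word, and remove rows where sex is recorded as "."
penguins = penguins.dropna(subset = ["Body Mass (g)", "Sex"])
penguins["Species"] = penguins["Species"].str.split().str.get(0)
penguins = penguins[penguins["Sex"] != "."]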

# Drop columns that we are not using
cols = ["Species", "Island", "Sex", "Flipper Length (mm)", "Body Mass (g)", "Delta 15 N (o/oo)", "Delta 13 C (o/oo)"]
penguins = penguins[cols]

# Drop NaN values
penguins = penguins.dropna()
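
To see what the cleaning steps do without downloading anything, here is a minimal sketch that applies the same three operations to a tiny made-up data frame. The rows in `toy` are hypothetical and only mimic the shape of the raw columns; they are not real measurements.

```python
import pandas as pd

# Hypothetical toy rows (not real measurements) that mimic the raw columns
toy = pd.DataFrame({
    "Species": [
        "Adelie Penguin (Pygoscelis adeliae)",
        "Gentoo penguin (Pygoscelis papua)",
        "Chinstrap penguin (Pygoscelis antarctica)",
        "Gentoo penguin (Pygoscelis papua)",
    ],
    "Sex": ["MALE", ".", "FEMALE", None],
    "Body Mass (g)": [3750.0, 4875.0, 3500.0, 5000.0],
})

# Same steps as in the cell above
toy = toy.dropna(subset = ["Body Mass (g)", "Sex"])       # drops the row with missing sex
toy["Species"] = toy["Species"].str.split().str.get(0)    # keeps only the first word
toy = toy[toy["Sex"] != "."]                              # drops the row with sex "."

print(toy)
```

Only the Adelie and Chinstrap rows survive, and their species names are shortened to a single word.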

Then, let's take a look at the simplified data set.

In [2]:
penguins.head()

| | Species | Island | Sex | Flipper Length (mm) | Body Mass (g) | Delta 15 N (o/oo) | Delta 13 C (o/oo) |
|---|---|---|---|---|---|---|---|
| 1 | Adelie | Torgersen | FEMALE | 186.0 | 3800.0 | 8.94956 | -24.69454 |
| 2 | Adelie | Torgersen | FEMALE | 195.0 | 3250.0 | 8.36821 | -25.33302 |
| 4 | Adelie | Torgersen | FEMALE | 193.0 | 3450.0 | 8.76651 | -25.32426 |
| 5 | Adelie | Torgersen | MALE | 190.0 | 3650.0 | 8.66496 | -25.29805 |
| 6 | Adelie | Torgersen | FEMALE | 181.0 | 3625.0 | 9.18718 | -25.21799 |


In this data set, each row corresponds to an individual penguin. The penguin's species, island of encounter, and sex are recorded as qualitative variables. The remaining columns are quantitative: the penguin's flipper length, its body mass, and two isotope measurements from its blood (Delta 15 N and Delta 13 C). The original data also include measurements of the penguin's culmen (bill), which we dropped above.

## Creating plots using Plotly

Let's create an interactive data graphic with Plotly. We are only going to use the Plotly Express module, which allows us to create several of the most important kinds of plots using convenient, high-level functions. We will also import `plotly.io` to control the plot appearance through themes.

In [3]:
from plotly import express as px
import plotly.io as pio

First, we make a basic scatter plot of "Delta 13 C (o/oo)" against "Delta 15 N (o/oo)", with the dots colored by penguin species.

In [None]:
fig = px.scatter(data_frame = penguins,  # data set
                 x = "Delta 15 N (o/oo)", # column for x axis
                 y = "Delta 13 C (o/oo)", # column for y axis
                 color = "Species", # column for dot color
                 width = 500, # width of figure
                 height = 300)  # height of figure

# reduce whitespace
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# show the plot
fig.show()

We can make overlapping dots easier to tell apart by setting the `opacity` argument, and scale each dot by the penguin's body mass using `size` and `size_max`. Let's also split the figure into two subplots stacked in rows, one for each penguin sex, using the `facet_row` argument. We also change the theme to `plotly_white` to get a white background. So that the user can reveal more information about a data point by hovering the mouse cursor over it, we use `hover_name` and `hover_data` to specify what the hover label shows. Finally, let's add some marginal box plots for the statistically inclined.

In [None]:
# Change the theme
pio.templates.default = "plotly_white"


fig = px.scatter(data_frame = penguins, # data set
                 x = "Delta 15 N (o/oo)",  # column for x axis
                 y = "Delta 13 C (o/oo)", # column for y axis
                 color = "Species", # column for dot color
                 hover_name = "Species", # Name of the hover label
                 hover_data = ["Island", "Sex"],  # extra columns contained in hover label
                 size = "Body Mass (g)", # column for dot size
                 size_max = 8, # max dot size
                 width = 600, # width of figure
                 height = 400, # height of figure
                 opacity = 0.5, # adjust opacity
                 facet_row = "Sex", # column for row facets
                 marginal_y = "box") # marginal plot type

# reduce whitespace
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# show the plot
fig.show()
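
The marginal box plots summarize the distribution of the y variable within each facet row and species, with the center line of each box marking the median. As a rough numerical companion, the sketch below computes those medians with pandas on a small made-up data frame. The values in `toy` are hypothetical, not real measurements.

```python
import pandas as pd

# Hypothetical toy rows (not real measurements)
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "Sex": ["FEMALE", "FEMALE", "FEMALE", "MALE", "MALE", "MALE"],
    "Delta 13 C (o/oo)": [-25.0, -26.0, -24.0, -27.0, -26.5, -26.0],
})

# Median of the y variable for each (sex, species) group,
# matching the grouping used by facet_row and color
medians = toy.groupby(["Sex", "Species"])["Delta 13 C (o/oo)"].median()
print(medians)
```

Running the same `groupby` on the cleaned `penguins` data frame would give the medians to compare against the hover labels on the box plots.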

Congratulations! You have now successfully constructed an interactive data visualization using Plotly.