# Introduction to Palmer Penguins
## A Vignette About the Data

The dataset, commonly called "Palmer penguins", comes from a paper on Antarctic penguins (Gorman et al. 2014).
The data were later bundled into an R package for use as a teaching example (Horst et al. 2020)
and then ported to Python (Nakhaee 2021).

Here we use pandas to analyze the data and plotnine to visualize it.

# Setup

In [None]:
#| label: setup

import pandas as pd
import plotnine as p9

from plotnine import (aes, coord_flip, facet_wrap, geom_bar, geom_histogram,
                      geom_jitter, geom_point, ggplot, theme_minimal)

# Load Data

In [None]:
penguins = pd.read_csv("data/penguins_clean.csv")
penguins = penguins.dropna()
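
Because `dropna()` silently discards every row with at least one missing value, it can help to check how much data is lost. The following is a minimal sketch on a toy table; the `toy` frame and its values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the penguins table (hypothetical values, not real measurements)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, np.nan, 46.1, 46.5],
    "sex": ["male", None, "female", "female"],
})

# Count missing values per column before dropping anything
missing_per_column = toy.isna().sum()

# dropna() removes every row that has at least one missing value
toy_complete = toy.dropna()
rows_dropped = len(toy) - len(toy_complete)

print(missing_per_column)
print(f"dropped {rows_dropped} of {len(toy)} rows")
```

The same two checks can be run on `penguins` before the `dropna()` call to see which columns drive the row loss.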

In [None]:
penguins.head()

In [None]:
penguins.info()

In [None]:
penguins.select_dtypes('object')

# Summarize Data

In [None]:
(
  penguins
  .groupby(['species', 'island'])
  ['species']
  .agg('count')
)

In [None]:
pd.crosstab(
  penguins['species'],
  penguins['island']
)
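
The two summaries above hold the same counts in different shapes: the grouped count is long (one row per species and island pair that occurs), while the crosstab is wide and fills absent pairs with zero. The sketch below illustrates this on a toy table; the `toy` frame is hypothetical, and `margins=True` is shown only as one optional way to add totals.

```python
import pandas as pd

# Toy stand-in with only the two categorical columns (hypothetical rows)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream", "Biscoe", "Dream"],
})

# Long form: one count per (species, island) pair present in the data
counts_long = toy.groupby(["species", "island"]).size()

# Wide form: pivot island into columns, filling missing pairs with 0
counts_wide = counts_long.unstack(fill_value=0)

# crosstab builds the same wide table directly
table = pd.crosstab(toy["species"], toy["island"])
same = bool((counts_wide.values == table.values).all())

# Optional: append row and column totals under the label "All"
table_with_totals = pd.crosstab(toy["species"], toy["island"], margins=True)

print(table_with_totals)
```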

# Data Visualization

In [None]:
(
  ggplot(penguins, aes(x='island', fill='species')) +
  geom_bar(alpha = 0.8) +
  theme_minimal() +
  facet_wrap('species', ncol = 1) +
  coord_flip()
)

In [None]:
(
  ggplot(penguins, aes(x='sex', fill='species')) +
  geom_bar(alpha = 0.8) +
  theme_minimal() +
  facet_wrap('species', ncol = 1) +
  coord_flip()
)

In [None]:
penguins.filter(regex='_mm', axis='columns')

In [None]:
(
  ggplot(penguins, aes(x="flipper_length_mm", y="body_mass_g")) +
  geom_point(aes(color="species", shape="species"), size=2) 
)

In [None]:
(
  ggplot(penguins, aes(x="flipper_length_mm", y="bill_depth_mm")) +
  geom_point(aes(color="species", shape="species"), size=2)
)

In [None]:
(
  ggplot(penguins, aes(x = "flipper_length_mm", y = "body_mass_g")) +
  geom_point(aes(color = "sex"), size = 2) +
  facet_wrap("species")
)

In [None]:
(
  ggplot(penguins, aes(x="species", y="bill_length_mm")) +
  geom_jitter(aes(color="species"), width=0.1, alpha=0.7)
)

In [None]:
(
  ggplot(penguins, aes(x="flipper_length_mm")) +
  geom_histogram(aes(fill="species"), bins=30, alpha=0.5, position="identity")
)

# Correlations

In [None]:
corr = penguins.select_dtypes('number').corr()

corr.style.background_gradient(cmap='coolwarm').format(precision=2)

# References

* Gorman, Kristen B., Tony D. Williams, and William R. Fraser. 2014. “Ecological Sexual Dimorphism and Environmental Variability Within a Community of Antarctic Penguins (Genus Pygoscelis).” PLOS ONE 9 (3): 1–14. https://doi.org/10.1371/journal.pone.0090081.

* Horst, Allison Marie, Alison Presmanes Hill, and Kristen B. Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.

* Nakhaee, Muhammad Chenariyan. 2021. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://github.com/mcnakhaee/palmerpenguins.