## Penguins: Joins and Aggregations

In [1]:
import pandas as pd

In [2]:
penguins_adelie = pd.read_csv("../../data/penguins_adelie.csv")
penguins_chinstrap = pd.read_csv("../../data/penguins_chinstrap.csv")
penguins_gentoo = pd.read_csv("../../data/penguins_gentoo.csv")

In [3]:
# Print the number of rows and columns for each DataFrame before concatenation
print("Adelie Penguins: ", penguins_adelie.shape)
print("Chinstrap Penguins: ", penguins_chinstrap.shape)
print("Gentoo Penguins: ", penguins_gentoo.shape)


Adelie Penguins:  (152, 17)
Chinstrap Penguins:  (68, 17)
Gentoo Penguins:  (124, 17)
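The shapes show 17 columns in every file, but equal counts do not guarantee identical names. `pd.concat` aligns frames on column labels, so a single differently spelled header would silently become an extra, mostly-NaN column. Below is a minimal sketch of a pre-concat check on small hypothetical frames. The column values and the misspelled Gentoo header are made up for illustration and are not taken from the CSVs.

```python
import pandas as pd

# Hypothetical stand-ins for the three species files (illustrative values only).
adelie = pd.DataFrame({"Species": ["Adelie"], "Body Mass (g)": [3750.0]})
chinstrap = pd.DataFrame({"Species": ["Chinstrap"], "Body Mass (g)": [3500.0]})
# Deliberate mismatch: lowercase "mass" in the Gentoo header.
gentoo = pd.DataFrame({"Species": ["Gentoo"], "Body mass (g)": [4500.0]})

frames = {"Adelie": adelie, "Chinstrap": chinstrap, "Gentoo": gentoo}

# Compare each frame's columns with the first frame's and list the differing names.
reference = list(adelie.columns)
mismatches = {
    name: sorted(set(df.columns) ^ set(reference))
    for name, df in frames.items()
    if list(df.columns) != reference
}
print(mismatches)  # {'Gentoo': ['Body Mass (g)', 'Body mass (g)']}

# Concatenating anyway: concat aligns on column labels, so the mismatched
# name becomes an extra column that is NaN for the other species.
combined = pd.concat(list(frames.values()), ignore_index=True)
print(combined.shape)  # (3, 3)
```

On the real files an empty dict would mean the three headers match exactly.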


| **Column Name**         | **Description**                                                                          |
|-------------------------|------------------------------------------------------------------------------------------|
| **studyName**           | Sampling expedition from which the data were collected (e.g., PAL0708)                  |
| **Sample Number**       | Continuous numbering sequence for each sample                                            |
| **Species**             | Species of the organism, common and scientific name                                      |
| **Region**              | Nominal region of the Palmer LTER sampling grid                                          |
| **Island**              | Island near Palmer Station where samples were collected                                  |
| **Stage**               | Reproductive stage at sampling                                                           |
| **Individual ID**       | A unique ID for each individual in the dataset                                           |
| **Clutch Completion**   | Whether the study nest was observed with a full clutch (i.e., 2 eggs)                    |
| **Date Egg**            | Date the study nest was observed with 1 egg (sampled)                                    |
| **Culmen Length (mm)**  | Length of the dorsal ridge of the bird's bill, in millimeters                            |
| **Culmen Depth (mm)**   | Depth of the dorsal ridge of the bird's bill, in millimeters                             |
| **Flipper Length (mm)** | Length of the right flipper, in millimeters                                              |
| **Body Mass (g)**       | Body mass, in grams                                                                      |
| **Sex**                 | Code for the sex of the animal (e.g., MALE, FEMALE)                                      |
| **Delta 15 N (o/oo)**   | A measure of the ratio of stable isotopes 15N:14N, in parts per thousand (‰)             |
| **Delta 13 C (o/oo)**   | A measure of the ratio of stable isotopes 13C:12C, in parts per thousand (‰)             |
| **Comments**            | Free-text field for additional relevant information about the sample                     |
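The two isotope columns use delta notation, which the table describes only loosely. Each value compares the sample's heavy-to-light isotope ratio $R$ with the ratio of a reference standard and is expressed in parts per thousand (‰, written `o/oo` in the column names). The dataset description does not name the standards. Atmospheric N₂ for nitrogen and Vienna Pee Dee Belemnite (VPDB) for carbon are the conventional choices and are assumed here.

$$
\delta^{15}\mathrm{N} = \left( \frac{R_{\text{sample}}}{R_{\text{standard}}} - 1 \right) \times 1000,
\qquad R = \frac{{}^{15}\mathrm{N}}{{}^{14}\mathrm{N}}
$$

$\delta^{13}\mathrm{C}$ is defined the same way with $R = {}^{13}\mathrm{C}/{}^{12}\mathrm{C}$. A negative value, such as the roughly −24 to −27 ‰ δ13C values in this dataset, means the sample's ratio is below the standard's.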



> ### Methods and protocols used in the collection of this data package
>
> Each season, study nests, where pairs of adults were present, were individually marked and chosen before the onset of egg-laying, and consistently monitored. When study nests were found at the one-egg stage, both adults were captured to obtain blood samples used for molecular sexing and stable isotope analyses, and measurements of structural size and body mass. 
>
> At the time of capture, each adult penguin was quickly blood sampled (~1 ml) from the brachial vein using a sterile 3 ml syringe and heparinized infusion needle. Collected blood was stored in 1.5 ml micro-centrifuge tubes that were kept cool. In the field, a small amount of whole blood was smeared on clean filter paper stored in a 1.5 ml micro-centrifuge tube for molecular sexing. 
>
> Measurements of culmen length and depth (using dial calipers ± 0.1 mm), right flipper (using a ruler ± 1 mm), and body mass (using 5 kg ± 25 g or 10 kg ± 50 g Pesola spring scales and a weigh bag) were obtained to quantify body size variation. After handling, individuals at study nests were further monitored to ensure the pair reached clutch completion, i.e., two eggs. 
>
> Molecular analyses were conducted at Simon Fraser University following standard PCR protocols, and stable isotope analyses were conducted at the Stable Isotope Facility at the University of California, Davis using an elemental analyzer interfaced with an isotope ratio mass spectrometer.
>

---
## Concatenate all penguin species

In [4]:
# Concatenate the DataFrames along rows
penguins_all = pd.concat(
    [penguins_adelie, penguins_chinstrap, penguins_gentoo])

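# Reset the index so rows are numbered 0 to n-1; without this step each species
# keeps its own 0-based index, leaving duplicate row labels in the combined frame
penguins_all.reset_index(drop=True, inplace=True)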

# Print the number of rows and columns for the concatenated DataFrame
print("All Penguins (after concat): ", penguins_all.shape)

All Penguins (after concat):  (344, 17)
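The row counts are consistent: 152 + 68 + 124 = 344, so no rows were lost. `pd.concat` can also renumber rows itself through `ignore_index=True`, which gives the same result as the separate `reset_index(drop=True)` step. Alternatively, `keys=` records which source frame each row came from. The sketch below uses toy frames whose sizes are arbitrary.

```python
import pandas as pd

# Toy frames; each starts with its own 0-based index, so labels collide after concat.
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4, 5]})

stacked = pd.concat([a, b])
print(stacked.index.tolist())     # [0, 1, 0, 1, 2]  (duplicate labels)

# Equivalent to concat followed by reset_index(drop=True).
renumbered = pd.concat([a, b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]

# keys= adds an outer index level naming the source of each row.
labeled = pd.concat([a, b], keys=["a", "b"])
print(labeled.loc["b"])
```

With the penguin files, `keys=["Adelie", "Chinstrap", "Gentoo"]` would tag each row by file, although the `Species` column already carries that information.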


---
## Join location details

In [5]:
# Create a dictionary with the island names and their coordinates
island_data = {
    'Island': ['Torgersen', 'Biscoe', 'Dream'],
    'Latitude': [-64.7667, -66.0000, -64.7333],
    'Longitude': [-64.0667, -66.0000, -64.2333],
    'Location': [
        'Torgersen Island, near Palmer Station on the Antarctic Peninsula, close to Anvers Island',
        'Biscoe Islands, an archipelago off the west coast of the Antarctic Peninsula, south of the Palmer Archipelago',
        'Dream Island, part of the Argentine Islands, located near the western side of the Antarctic Peninsula'
    ]
}

# Convert the dictionary into a pandas DataFrame
penguin_islands_df = pd.DataFrame(island_data)
penguin_islands_df

|   | Island    | Latitude | Longitude | Location                                           |
|---|-----------|----------|-----------|----------------------------------------------------|
| 0 | Torgersen | -64.7667 | -64.0667  | Torgersen Island, near Palmer Station on the A...  |
| 1 | Biscoe    | -66.0    | -66.0     | Biscoe Islands, an archipelago off the west co...  |
| 2 | Dream     | -64.7333 | -64.2333  | Dream Island, part of the Argentine Islands, l...  |


In [6]:
# Merge the DataFrames on the 'Island' column
penguins_with_location = pd.merge(
    penguins_all, penguin_islands_df, on='Island', how='left')

penguins_with_location.head()

|   | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 2007-11-11 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE | NaN | NaN | Not enough blood for isotopes. | -64.7667 | -64.0667 | Torgersen Island, near Palmer Station on the A... |
| 1 | PAL0708 | 2 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 2007-11-11 | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE | 8.94956 | -24.69454 | NaN | -64.7667 | -64.0667 | Torgersen Island, near Palmer Station on the A... |
| 2 | PAL0708 | 3 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 2007-11-16 | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE | 8.36821 | -25.33302 | NaN | -64.7667 | -64.0667 | Torgersen Island, near Palmer Station on the A... |
| 3 | PAL0708 | 4 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A2 | Yes | 2007-11-16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Adult not sampled. | -64.7667 | -64.0667 | Torgersen Island, near Palmer Station on the A... |
| 4 | PAL0708 | 5 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 2007-11-16 | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE | 8.76651 | -25.32426 | NaN | -64.7667 | -64.0667 | Torgersen Island, near Palmer Station on the A... |
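A left join keeps every penguin row. However, it fills in NaN coordinates for any island missing from the lookup table, and it would duplicate penguin rows if an island appeared twice there. `pd.merge` can check for both problems. `validate="many_to_one"` raises a `MergeError` if the right-hand keys are not unique, and `indicator=True` adds a `_merge` column that records whether each row found a match. The sketch below uses toy data, and the misspelled island is deliberate.

```python
import pandas as pd

# Toy penguin rows; "Torgerson" is a deliberate typo that will not match the lookup.
penguins = pd.DataFrame({"Island": ["Torgersen", "Dream", "Torgerson"],
                         "Body Mass (g)": [3750.0, 3450.0, 3800.0]})
islands = pd.DataFrame({"Island": ["Torgersen", "Biscoe", "Dream"],
                        "Latitude": [-64.7667, -66.0, -64.7333]})

merged = pd.merge(penguins, islands, on="Island", how="left",
                  validate="many_to_one",  # raises MergeError if an island is listed twice
                  indicator=True)          # adds a categorical '_merge' column

# Rows whose island had no match in the lookup table ('left_only').
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Island", "Latitude"]])
```

On the real data, an empty `unmatched` frame would confirm that all three island names in the CSVs match the lookup table exactly.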


---
## Aggregate

In [7]:
aggregation_by_island_sex = penguins_with_location.groupby(['Island', 'Sex']).agg({
    'Body Mass (g)': ['mean', 'max'],
    'Flipper Length (mm)': ['mean', 'max']
})

aggregation_by_island_sex

| Island    | Sex    | Body Mass (g) mean | Body Mass (g) max | Flipper Length (mm) mean | Flipper Length (mm) max |
|-----------|--------|--------------------|-------------------|--------------------------|-------------------------|
| Biscoe    | .      | 4875.0             | 4875.0            | 217.0                    | 217.0                   |
| Biscoe    | FEMALE | 4319.375           | 5200.0            | 205.6875                 | 222.0                   |
| Biscoe    | MALE   | 5104.518072        | 6300.0            | 213.289157               | 231.0                   |
| Dream     | FEMALE | 3446.311475        | 4150.0            | 190.016393               | 202.0                   |
| Dream     | MALE   | 3987.096774        | 4800.0            | 196.306452               | 212.0                   |
| Torgersen | FEMALE | 3395.833333        | 3800.0            | 188.291667               | 196.0                   |
| Torgersen | MALE   | 4034.782609        | 4700.0            | 194.913043               | 210.0                   |
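The `Biscoe` / `.` group is neither MALE nor FEMALE, and `.` looks like a placeholder for a sex that was not determined. If that reading is right, one option is to set any non-standard code to NaN before grouping. Because `groupby` drops NaN keys by default, the placeholder group then disappears. The list of valid codes is an assumption, and the toy values below are illustrative.

```python
import pandas as pd

# Toy rows mimicking the data: one Sex value is the '.' placeholder.
df = pd.DataFrame({"Island": ["Biscoe", "Biscoe", "Biscoe", "Dream"],
                   "Sex": ["FEMALE", "MALE", ".", "FEMALE"],
                   "Body Mass (g)": [4300.0, 5100.0, 4875.0, 3450.0]})

# Assumption: only MALE and FEMALE are valid codes; anything else (such as '.')
# is treated as unknown and replaced with NaN by Series.where.
df["Sex"] = df["Sex"].where(df["Sex"].isin(["MALE", "FEMALE"]))

# groupby drops NaN keys by default, so the placeholder no longer forms a group.
summary = df.groupby(["Island", "Sex"])["Body Mass (g)"].agg(["mean", "max"])
print(summary)
```

Whether to drop, recode, or keep such rows is a modeling choice. Keeping them as their own group, as the notebook does, at least makes the anomaly visible.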


In [8]:
aggregation_by_island_sex.loc[('Dream', 'FEMALE'), ('Body Mass (g)', 'max')]

4150.0
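The aggregated result has a two-level row index (Island, Sex) and two-level columns (variable, statistic). A single cell can therefore be read in one `.loc` call with a tuple for each axis, which avoids chained indexing, and `xs` takes a cross-section from one index level. The sketch below uses a toy aggregate built the same way, with illustrative values.

```python
import pandas as pd

# Toy data (illustrative values only), aggregated like the cell above.
df = pd.DataFrame({"Island": ["Dream", "Dream", "Biscoe", "Biscoe"],
                   "Sex": ["FEMALE", "MALE", "FEMALE", "MALE"],
                   "Body Mass (g)": [3400.0, 4000.0, 4300.0, 5100.0]})
toy_agg = df.groupby(["Island", "Sex"]).agg({"Body Mass (g)": ["mean", "max"]})

# One .loc call: (row tuple, column tuple).
print(toy_agg.loc[("Dream", "FEMALE"), ("Body Mass (g)", "max")])  # 3400.0

# Females on every island: cross-section on the 'Sex' level of the row index.
females = toy_agg.xs("FEMALE", level="Sex")
print(females[("Body Mass (g)", "max")])
```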

In [9]:
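penguins_aggregated_by_island_sex = penguins_with_location.groupby(
    ['Island', 'Sex']).agg({
        # Number of samples in each group and the range of their sample numbers
        'Sample Number': ['count', 'min', 'max'],
        # Mean and range of each body-size measurement
        'Culmen Length (mm)': ['mean', 'min', 'max'],
        'Culmen Depth (mm)': ['mean', 'min', 'max'],
        'Flipper Length (mm)': ['mean', 'min', 'max'],
        'Body Mass (g)': ['mean', 'min', 'max'],
        # Mean and range of the stable isotope ratios
        'Delta 15 N (o/oo)': ['mean', 'min', 'max'],
        'Delta 13 C (o/oo)': ['mean', 'min', 'max'],
        # Sex is a grouping key, so nunique is 1 for every group
        'Sex': ['count', 'nunique'],
        # Coordinates are constant within an island, so the mean just carries them through
        'Latitude': ['mean'],
        'Longitude': ['mean']
    })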

penguins_aggregated_by_island_sex

| Island | Sex | Sample Number count | Sample Number min | Sample Number max | Culmen Length (mm) mean | Culmen Length (mm) min | Culmen Length (mm) max | Culmen Depth (mm) mean | Culmen Depth (mm) min | Culmen Depth (mm) max | Flipper Length (mm) mean | ... | Delta 15 N (o/oo) mean | Delta 15 N (o/oo) min | Delta 15 N (o/oo) max | Delta 13 C (o/oo) mean | Delta 13 C (o/oo) min | Delta 13 C (o/oo) max | Sex count | Sex nunique | Latitude mean | Longitude mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Biscoe | . | 1 | 117 | 117 | 44.5 | 44.5 | 44.5 | 15.7 | 15.7 | 15.7 | 217.0 | ... | 8.04111 | 8.04111 | 8.04111 | -26.18444 | -26.18444 | -26.18444 | 1 | 1 | -66.0 | -66.0 |
| Biscoe | FEMALE | 80 | 1 | 123 | 43.3075 | 34.5 | 50.5 | 15.19125 | 13.1 | 20.7 | 205.6875 | ... | 8.353135 | 7.6322 | 9.79532 | -26.121022 | -27.01854 | -24.3613 | 80 | 1 | -66.0 | -66.0 |
| Biscoe | MALE | 83 | 2 | 124 | 47.119277 | 37.6 | 59.6 | 16.59759 | 14.1 | 21.1 | 213.289157 | ... | 8.456226 | 7.76843 | 9.63954 | -26.102628 | -26.86127 | -25.00169 | 83 | 1 | -66.0 | -66.0 |
| Dream | FEMALE | 61 | 1 | 151 | 42.296721 | 32.1 | 58.0 | 17.601639 | 15.5 | 19.4 | 190.016393 | ... | 9.10217 | 8.01485 | 9.80589 | -25.083819 | -26.69543 | -23.89017 | 61 | 1 | -64.7333 | -64.2333 |
| Dream | MALE | 62 | 2 | 152 | 46.116129 | 36.3 | 55.8 | 19.066129 | 17.0 | 21.2 | 196.306452 | ... | 9.257592 | 8.39459 | 10.02544 | -25.049476 | -26.57941 | -23.78767 | 62 | 1 | -64.7333 | -64.2333 |
| Torgersen | FEMALE | 24 | 2 | 131 | 37.554167 | 33.5 | 41.1 | 17.55 | 15.9 | 19.3 | 188.291667 | ... | 8.66316 | 7.69778 | 9.30722 | -25.738735 | -26.63085 | -23.90309 | 24 | 1 | -64.7667 | -64.0667 |
| Torgersen | MALE | 23 | 1 | 132 | 40.586957 | 34.6 | 46.0 | 19.391304 | 17.6 | 21.5 | 194.913043 | ... | 8.919919 | 8.18658 | 9.59462 | -25.835347 | -26.46254 | -24.77227 | 23 | 1 | -64.7667 | -64.0667 |
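The group counts add up to 1 + 80 + 83 + 61 + 62 + 24 + 23 = 334, ten fewer than the 344 concatenated rows. The most likely explanation is that `groupby` drops rows whose grouping keys are NaN (the default `dropna=True`). Row 3 of the merged preview ("Adult not sampled.") is one such row with no recorded sex. Passing `dropna=False` keeps those rows as their own group. The sketch below uses toy data.

```python
import numpy as np
import pandas as pd

# Toy rows: one penguin has no recorded sex.
df = pd.DataFrame({"Island": ["Dream", "Dream", "Dream"],
                   "Sex": ["FEMALE", "MALE", np.nan],
                   "Sample Number": [1, 2, 3]})

# Default dropna=True: the NaN-sex row is silently excluded from every group.
default_counts = df.groupby(["Island", "Sex"])["Sample Number"].count()
print(default_counts.sum())  # 2

# dropna=False keeps it as an explicit (Dream, NaN) group.
all_counts = df.groupby(["Island", "Sex"], dropna=False)["Sample Number"].count()
print(all_counts.sum())      # 3
```

On the full data, `penguins_with_location['Sex'].isna().sum()` would show directly how many rows the default setting leaves out.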
