This notebook asks you to do some simple data manipulation with pandas on an example dataset.

The dataset is provided in the first cell and is a subset of the [Palmer penguins dataset](https://github.com/allisonhorst/palmerpenguins), which contains data collected on penguins found on a number of islands in Antarctica. The dataframe is accessible via a variable called `df`, and its index labels identify each penguin observed.

In [1]:
import numpy as np
import pandas as pd
data = {'species': {4: 'Adelie',
  118: 'Adelie',
  160: 'Chinstrap',
  163: 'Chinstrap',
  177: 'Chinstrap',
  37: 'Adelie',
  291: 'Gentoo',
  256: 'Gentoo',
  147: 'Adelie',
  179: 'Chinstrap',
  38: 'Adelie',
  207: 'Chinstrap',
  257: 'Gentoo',
  3: 'Adelie',
  159: 'Chinstrap',
  115: 'Adelie',
  24: 'Adelie',
  171: 'Chinstrap',
  7: 'Adelie',
  28: 'Adelie'},
 'island': {4: 'Torgersen',
  118: 'Torgersen',
  160: 'Dream',
  163: 'Dream',
  177: 'Dream',
  37: 'Dream',
  291: 'Biscoe',
  256: 'Biscoe',
  147: 'Dream',
  179: 'Dream',
  38: 'Dream',
  207: 'Dream',
  257: 'Biscoe',
  3: 'Torgersen',
  159: 'Dream',
  115: 'Biscoe',
  24: 'Biscoe',
  171: 'Dream',
  7: 'Torgersen',
  28: 'Biscoe'},
 'bill_length_mm': {4: 36.7,
  118: 35.7,
  160: 46.0,
  163: 51.7,
  177: 52.0,
  37: 42.2,
  291: 46.4,
  256: 42.6,
  147: 36.6,
  179: 49.5,
  38: 37.6,
  207: 52.2,
  257: 44.4,
  3: np.nan,
  159: 51.3,
  115: 42.7,
  24: 38.8,
  171: 49.2,
  7: 39.2,
  28: 37.9},
 'bill_depth_mm': {4: 19.3,
  118: 17.0,
  160: 18.9,
  163: 20.3,
  177: 19.0,
  37: 18.5,
  291: 15.6,
  256: 13.7,
  147: 18.4,
  179: 19.0,
  38: 19.3,
  207: 18.8,
  257: 17.3,
  3: np.nan,
  159: 18.2,
  115: 18.3,
  24: 17.2,
  171: 18.2,
  7: 19.6,
  28: 18.6},
 'flipper_length_mm': {4: 193.0,
  118: 189.0,
  160: 195.0,
  163: 194.0,
  177: 197.0,
  37: 180.0,
  291: 221.0,
  256: 213.0,
  147: 184.0,
  179: 200.0,
  38: 181.0,
  207: 197.0,
  257: 219.0,
  3: np.nan,
  159: 197.0,
  115: 196.0,
  24: 180.0,
  171: 195.0,
  7: 195.0,
  28: 172.0},
 'body_mass_g': {4: 3450.0,
  118: 3350.0,
  160: 4150.0,
  163: 3775.0,
  177: 4150.0,
  37: 3550.0,
  291: 5000.0,
  256: 4950.0,
  147: 3475.0,
  179: 3800.0,
  38: 3300.0,
  207: 3450.0,
  257: 5250.0,
  3: np.nan,
  159: 3750.0,
  115: 4075.0,
  24: 3800.0,
  171: 4400.0,
  7: 4675.0,
  28: 3150.0},
 'sex': {4: 'Female',
  118: 'Female',
  160: 'Female',
  163: 'Male',
  177: 'Male',
  37: 'Female',
  291: 'Male',
  256: 'Female',
  147: 'Female',
  179: 'Male',
  38: 'Female',
  207: 'Male',
  257: 'Male',
  3: np.nan,
  159: 'Male',
  115: 'Male',
  24: 'Male',
  171: 'Male',
  7: 'Male',
  28: 'Female'}}
df = pd.DataFrame(data)

Answer the questions below about the sample dataset, then see if you can do the same for the full dataset, which you'll have to retrieve from the link above:

* How many penguins are there in total of each species in the dataframe?

* What row(s) in the dataframe contain the penguin with the longest bill?

* What are all of the penguins whose body mass is less than 4000 grams?

* What are all of the penguins whose body mass is in the top quartile?

* What is the minimum flipper length for all penguins living on Torgersen island?

* Do penguins of the Adelie species have a mean flipper length that is longer or shorter than ones of the Chinstrap species?

In [12]:
df.head()

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female
118,Adelie,Torgersen,35.7,17.0,189.0,3350.0,Female
160,Chinstrap,Dream,46.0,18.9,195.0,4150.0,Female
163,Chinstrap,Dream,51.7,20.3,194.0,3775.0,Male
177,Chinstrap,Dream,52.0,19.0,197.0,4150.0,Male


In [14]:
# How many penguins are there in total of each species in the dataframe?

# size() counts rows per group, so a missing value in another column cannot lower the total
df.groupby(by='species').size()

species
Adelie       10
Chinstrap     7
Gentoo        3
dtype: int64
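
When counting rows per group, it helps to know that `count()` and `size()` answer slightly different questions. The sketch below uses a small hypothetical frame (`toy` is not part of the exercise data) in which one island is missing, to show where the two diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame; the second row has a missing island.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "island": ["Torgersen", np.nan, "Biscoe", "Biscoe"],
})

# count() tallies non-null cells per column, so the missing island is skipped.
non_null_islands = toy.groupby("species").count().iloc[:, 0]

# size() and value_counts() tally rows, whatever is missing in other columns.
rows_per_species = toy.groupby("species").size()
species_counts = toy["species"].value_counts()

print(non_null_islands.to_dict())  # {'Adelie': 1, 'Gentoo': 2}
print(rows_per_species.to_dict())  # {'Adelie': 2, 'Gentoo': 2}
```

On the sample data the two approaches agree because no species row has a missing `island`, but that is a property of this particular data rather than a guarantee.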

In [19]:
# What row(s) in the dataframe contain the penguin with the longest bill?

df.loc[df['bill_length_mm'] == df['bill_length_mm'].max()]

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
207,Chinstrap,Dream,52.2,18.8,197.0,3450.0,Male
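
Comparing against `max()` returns every row that ties for the longest bill, which is why the question says "row(s)". As a rough comparison, the sketch below contrasts that with `idxmax()`, which returns a single label. It uses four bill lengths copied from the sample data; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Four bill lengths from the sample data; label 3 has no measurement.
bills = pd.Series({4: 36.7, 177: 52.0, 207: 52.2, 3: np.nan}, name="bill_length_mm")

# idxmax() skips NaN and returns the label of the first maximum only.
longest_label = bills.idxmax()

# A boolean mask keeps every entry that ties for the maximum.
longest_rows = bills[bills == bills.max()]

print(longest_label)           # 207
print(longest_rows.to_dict())  # {207: 52.2}
```

With no ties the two give the same penguin; with ties, only the mask returns all of them.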


In [20]:
# What are all of the penguins whose body mass is less than 4000 grams?

df[df['body_mass_g'] < 4000]

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female
118,Adelie,Torgersen,35.7,17.0,189.0,3350.0,Female
163,Chinstrap,Dream,51.7,20.3,194.0,3775.0,Male
37,Adelie,Dream,42.2,18.5,180.0,3550.0,Female
147,Adelie,Dream,36.6,18.4,184.0,3475.0,Female
179,Chinstrap,Dream,49.5,19.0,200.0,3800.0,Male
38,Adelie,Dream,37.6,19.3,181.0,3300.0,Female
207,Chinstrap,Dream,52.2,18.8,197.0,3450.0,Male
159,Chinstrap,Dream,51.3,18.2,197.0,3750.0,Male
24,Adelie,Biscoe,38.8,17.2,180.0,3800.0,Male


In [23]:
# What are all of the penguins whose body mass is in the top quartile?

df[df['body_mass_g'] >= df['body_mass_g'].quantile(q=0.75)]

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
291,Gentoo,Biscoe,46.4,15.6,221.0,5000.0,Male
256,Gentoo,Biscoe,42.6,13.7,213.0,4950.0,Female
257,Gentoo,Biscoe,44.4,17.3,219.0,5250.0,Male
171,Chinstrap,Dream,49.2,18.2,195.0,4400.0,Male
7,Adelie,Torgersen,39.2,19.6,195.0,4675.0,Male
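
To see where the top-quartile cutoff comes from, the following sketch recomputes it from the 20 body masses in the sample data. It assumes the pandas default of linear interpolation in `quantile`; the names `masses`, `threshold` and `top` are illustrative.

```python
import numpy as np
import pandas as pd

# Body masses (g) from the sample data, in the order they appear; one is missing.
masses = pd.Series([
    3450.0, 3350.0, 4150.0, 3775.0, 4150.0, 3550.0, 5000.0, 4950.0, 3475.0, 3800.0,
    3300.0, 3450.0, 5250.0, np.nan, 3750.0, 4075.0, 3800.0, 4400.0, 4675.0, 3150.0,
])

# quantile() ignores NaN. With 19 valid values, the 75th percentile falls halfway
# between the 14th and 15th sorted values (4150 and 4400).
threshold = masses.quantile(0.75)

# NaN compares as False, so the penguin without a measurement is excluded.
top = masses[masses >= threshold]

print(threshold)  # 4275.0
print(len(top))   # 5
```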


In [25]:
# What is the minimum flipper length for all penguins living on Torgersen island?

df.loc[df['island'] == 'Torgersen', 'flipper_length_mm'].min()

189.0
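
The same question can be asked of every island at once with `groupby`. The sketch below is one possible way to do that, using only the Torgersen and Biscoe flipper lengths from the sample data; the frame name `flippers` is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Torgersen and Biscoe rows from the sample data (flipper length only).
flippers = pd.DataFrame({
    "island": ["Torgersen"] * 4 + ["Biscoe"] * 6,
    "flipper_length_mm": [193.0, 189.0, np.nan, 195.0,
                          221.0, 213.0, 219.0, 196.0, 180.0, 172.0],
})

# Select rows and column in a single .loc call, then reduce; min() skips NaN.
torgersen_min = flippers.loc[flippers["island"] == "Torgersen", "flipper_length_mm"].min()

# One minimum per island.
min_by_island = flippers.groupby("island")["flipper_length_mm"].min()

print(torgersen_min)            # 189.0
print(min_by_island.to_dict())  # {'Biscoe': 172.0, 'Torgersen': 189.0}
```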

In [26]:
# Do penguins of the Adelie species have a mean flipper length that is longer or shorter than ones of the Chinstrap species?

# True if Adelie penguins have the longer mean flipper length, False otherwise
df.loc[df['species'] == 'Adelie', 'flipper_length_mm'].mean() > df.loc[df['species'] == 'Chinstrap', 'flipper_length_mm'].mean()

False
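
A bare `False` says that the Adelie mean is not longer, but not by how much. As a sketch, the means themselves can be computed per species with `groupby`. The flipper lengths below are the Adelie and Chinstrap values from the sample data, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Adelie and Chinstrap flipper lengths (mm) from the sample data.
flippers = pd.DataFrame({
    "species": ["Adelie"] * 10 + ["Chinstrap"] * 7,
    "flipper_length_mm": [
        193.0, 189.0, 180.0, 184.0, 181.0, np.nan, 196.0, 180.0, 195.0, 172.0,
        195.0, 194.0, 197.0, 200.0, 197.0, 197.0, 195.0,
    ],
})

# mean() skips the missing Adelie measurement.
means = flippers.groupby("species")["flipper_length_mm"].mean()
adelie_is_longer = means["Adelie"] > means["Chinstrap"]

print(means.round(1).to_dict())  # {'Adelie': 185.6, 'Chinstrap': 196.4}
print(adelie_is_longer)          # False
```

In this sample, Adelie flippers are roughly 11 mm shorter on average than Chinstrap flippers.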

#### Now Using the Full Dataset

In [35]:
from palmerpenguins import load_penguins

In [36]:
penguins = load_penguins()
penguins.head()

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
3,Adelie,Torgersen,,,,,,2007
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
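
The preview shows that row 3 has no measurements at all, so it is worth checking for missing values before filtering or aggregating. The sketch below rebuilds a few columns of those five rows by hand (the frame name `head` is an assumption) and shows some common checks.

```python
import numpy as np
import pandas as pd

# A few columns of the five previewed rows; row 3 has no measurements.
head = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["male", "female", "female", np.nan, "female"],
})

# Number of missing values in each column.
missing_per_column = head.isna().sum()

# Rows with at least one missing value.
incomplete_rows = head[head.isna().any(axis=1)]

# Drop rows that lack the measurements needed for the questions.
complete = head.dropna(subset=["bill_length_mm", "body_mass_g"])

print(missing_per_column.to_dict())
print(list(incomplete_rows.index))  # [3]
print(len(complete))                # 4
```

Note also that `sex` is lowercase in the full dataset (`male`, `female`) but capitalized in the sample (`Male`, `Female`), so a filter written for one will not match the other without normalizing the case.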


Q. How many penguins are there in total of each species in the dataframe?

In [37]:
penguins.groupby(by='species').size()

species
Adelie       152
Chinstrap     68
Gentoo       124
dtype: int64

Q. What row(s) in the dataframe contain the penguin with the longest bill?

In [40]:
penguins.loc[penguins['bill_length_mm'] == penguins['bill_length_mm'].max()]

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
185,Gentoo,Biscoe,59.6,17.0,230.0,6050.0,male,2007


Q. What are all of the penguins whose body mass is less than 4000 grams?

In [41]:
penguins[penguins['body_mass_g'] < 4000]

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
5,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007
...,...,...,...,...,...,...,...,...
337,Chinstrap,Dream,46.8,16.5,189.0,3650.0,female,2009
338,Chinstrap,Dream,45.7,17.0,195.0,3650.0,female,2009
340,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009
341,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009


Q. What are all of the penguins whose body mass is in the top quartile?

In [43]:
penguins[penguins['body_mass_g'] >= penguins['body_mass_g'].quantile(q=0.75)]

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
109,Adelie,Biscoe,43.2,19.0,197.0,4775.0,male,2009
153,Gentoo,Biscoe,50.0,16.3,230.0,5700.0,male,2007
155,Gentoo,Biscoe,50.0,15.2,218.0,5700.0,male,2007
156,Gentoo,Biscoe,47.6,14.5,215.0,5400.0,male,2007
158,Gentoo,Biscoe,45.4,14.6,211.0,4800.0,female,2007
...,...,...,...,...,...,...,...,...
272,Gentoo,Biscoe,46.8,14.3,215.0,4850.0,female,2009
273,Gentoo,Biscoe,50.4,15.7,222.0,5750.0,male,2009
274,Gentoo,Biscoe,45.2,14.8,212.0,5200.0,female,2009
275,Gentoo,Biscoe,49.9,16.1,213.0,5400.0,male,2009


Q. What is the minimum flipper length for all penguins living on Torgersen island?

In [44]:
penguins.loc[penguins['island'] == 'Torgersen', 'flipper_length_mm'].min()

176.0

Q. Do penguins of the Adelie species have a mean flipper length that is longer or shorter than ones of the Chinstrap species?

In [46]:
# True if Adelie penguins have the longer mean flipper length, False otherwise
penguins.loc[penguins['species'] == 'Adelie', 'flipper_length_mm'].mean() > penguins.loc[penguins['species'] == 'Chinstrap', 'flipper_length_mm'].mean()

False