# Palmer Penguins

This notebook contains my analysis of the famous [Palmer penguins data set](https://allisonhorst.github.io/palmerpenguins/).

Data were collected between 2007 and 2009, and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and [the Palmer Station, Antarctica LTER](https://en.wikipedia.org/wiki/Palmer_Station), a member of [the Long Term Ecological Research Network](https://lternet.edu/).

![Palmer_Station](https://upload.wikimedia.org/wikipedia/commons/thumb/3/36/PalmerFromGlaciar.JPG/320px-PalmerFromGlaciar.JPG)

***
#### What do the penguins look like?

##### Description: 

- [**Gentoo penguin**](https://en.wikipedia.org/wiki/Gentoo_penguin)

![image_gentoo](https://upload.wikimedia.org/wikipedia/commons/thumb/0/00/Brown_Bluff-2016-Tabarin_Peninsula%E2%80%93Gentoo_penguin_%28Pygoscelis_papua%29_03.jpg/320px-Brown_Bluff-2016-Tabarin_Peninsula%E2%80%93Gentoo_penguin_%28Pygoscelis_papua%29_03.jpg)

- [**Adélie penguin**](https://en.wikipedia.org/wiki/Ad%C3%A9lie_penguin)

![image_Adélie](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e3/Hope_Bay-2016-Trinity_Peninsula%E2%80%93Ad%C3%A9lie_penguin_%28Pygoscelis_adeliae%29_04.jpg/346px-Hope_Bay-2016-Trinity_Peninsula%E2%80%93Ad%C3%A9lie_penguin_%28Pygoscelis_adeliae%29_04.jpg)

- [**Chinstrap penguin**](https://en.wikipedia.org/wiki/Chinstrap_penguin)

![image_Chinstrap](https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/South_Shetland-2016-Deception_Island%E2%80%93Chinstrap_penguin_%28Pygoscelis_antarctica%29_04.jpg/320px-South_Shetland-2016-Deception_Island%E2%80%93Chinstrap_penguin_%28Pygoscelis_antarctica%29_04.jpg)

***

### What is the Palmer Penguins data set?

The data set contains measurements of 344 individual penguins from three species, living on three islands, that were observed and measured from 2007 to 2009.

#### Variables in the data set:
- species (Adelie, Chinstrap, Gentoo)
- island (Torgersen, Biscoe, Dream)
- bill length in mm
- bill depth in mm
- flipper length in mm
- body mass in g
- sex

***

### Python libraries required for the analysis:
 - pandas - to read and manipulate the data, calculate summaries and review data types
 - NumPy - for numerical calculations
 - Matplotlib (`matplotlib.pyplot`) - to create plots
 - seaborn - to visualise the results of the analysis

In [33]:

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 


*** 
### Load the penguins data set and view the first and last five rows

In [34]:
# Load the penguins data set.
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv") 

In [35]:
# Let's have a look. 
df

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | NaN |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | FEMALE |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | MALE |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | FEMALE |
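
The output above is the default truncated display of the whole DataFrame. To request the first and last five rows explicitly, `head()` and `tail()` can be used. The following is a minimal sketch: to stay self-contained it does not download the data, but uses a small DataFrame called `sample` (a name introduced here for illustration only) built by hand from the rows shown above.

```python
import numpy as np
import pandas as pd

# Hand-built sample copied from the rows displayed above.
# This is NOT the full data set, only an illustration.
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 5 + ["Gentoo"] * 4,
        "island": ["Torgersen"] * 5 + ["Biscoe"] * 4,
        "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7, np.nan, 46.8, 50.4, 45.2],
        "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3, np.nan, 14.3, 15.7, 14.8],
        "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0, np.nan, 215.0, 222.0, 212.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0, np.nan, 4850.0, 5750.0, 5200.0],
        "sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE", np.nan, "FEMALE", "MALE", "FEMALE"],
    },
    index=[0, 1, 2, 3, 4, 339, 340, 341, 342],
)

# head() and tail() return five rows by default.
first_five = sample.head()
last_five = sample.tail()

print(first_five)
print(last_five)
```

With the full data set loaded as `df`, the equivalent calls would be `df.head()` and `df.tail()`.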


In [36]:
# Look at the first row. 
df.iloc[0]

species                 Adelie
island               Torgersen
bill_length_mm            39.1
bill_depth_mm             18.7
flipper_length_mm        181.0
body_mass_g             3750.0
sex                       MALE
Name: 0, dtype: object

In [37]:
# Sex of penguins. 
df["sex"]

0        MALE
1      FEMALE
2      FEMALE
3         NaN
4      FEMALE
        ...  
339       NaN
340    FEMALE
341      MALE
342    FEMALE
343      MALE
Name: sex, Length: 344, dtype: object

In [38]:
# Count the number of penguins of each sex. 
df["sex"].value_counts()

sex
MALE      168
FEMALE    165
Name: count, dtype: int64
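
The two counts add up to 168 + 165 = 333, which is 11 fewer than the 344 rows in the column. This is because `value_counts()` leaves out missing values by default. The sketch below shows one way to make the missing values visible; it uses a short hand-built Series called `sex` (an illustrative name, filled with the ten values displayed in the earlier output) instead of the full column.

```python
import numpy as np
import pandas as pd

# The ten values of the "sex" column shown in the earlier output
# (an illustrative sample, not the full column).
sex = pd.Series(
    ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE",
     np.nan, "FEMALE", "MALE", "FEMALE", "MALE"],
    index=[0, 1, 2, 3, 4, 339, 340, 341, 342, 343],
    name="sex",
)

# Default behaviour: NaN entries are excluded from the counts.
counts = sex.value_counts()

# dropna=False adds a separate entry for the missing values.
counts_with_missing = sex.value_counts(dropna=False)

# Number of missing values, counted directly.
n_missing = int(sex.isna().sum())

print(counts)
print(counts_with_missing)
print("missing:", n_missing)
```

On the full data set the same idea would be `df["sex"].value_counts(dropna=False)`.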

In [39]:
# Describe the data set.
df.describe()

| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| count | 342.0 | 342.0 | 342.0 | 342.0 |
| mean | 43.92193 | 17.15117 | 200.915205 | 4201.754386 |
| std | 5.459584 | 1.974793 | 14.061714 | 801.954536 |
| min | 32.1 | 13.1 | 172.0 | 2700.0 |
| 25% | 39.225 | 15.6 | 190.0 | 3550.0 |
| 50% | 44.45 | 17.3 | 197.0 | 4050.0 |
| 75% | 48.5 | 18.7 | 213.0 | 4750.0 |
| max | 59.6 | 21.5 | 231.0 | 6300.0 |


***
### Overview of the variable types in the data set

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB


##### **What `info()` tells us about the data set:**

1. The first line of the output confirms that the object is a pandas `DataFrame`.
2. There are 344 entries, i.e. 344 rows.
3. The rows are labelled by a `RangeIndex` with values from 0 to 343.
4. The table has 7 columns. The columns `species` and `island` have a value in every row (344 non-null values). The columns `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm` and `body_mass_g` each have 342 non-null values (2 missing), and `sex` has 333 non-null values (11 missing).
5. The columns `species`, `island` and `sex` contain strings, stored with the `object` dtype. The other four columns contain floating-point numbers (`float64`).
6. The `dtypes` line gives the number of columns of each data type: `float64(4), object(3)`.
7. The last line gives the approximate amount of memory used to hold the DataFrame. The `+` indicates that the real figure may be higher, because the memory taken by the values in the `object` columns is not fully counted.
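
The points above can also be checked programmatically. The following is a minimal sketch on a hand-built five-row DataFrame called `sample` (an illustrative name; the values are the first five rows shown earlier in this notebook), so the numbers it prints apply to the sample only, not to the full data set.

```python
import numpy as np
import pandas as pd

# First five rows of the data set, copied by hand for illustration.
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "island": ["Torgersen"] * 5,
        "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
        "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
        "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
        "sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
    }
)

# Missing values per column (the complement of the "Non-Null Count").
missing = sample.isna().sum()

# Number of float64 columns, as summarised in the "dtypes" line.
n_float = sample.select_dtypes(include="float64").shape[1]

# Memory in bytes: the default estimate, and a "deep" one that also
# inspects the values stored in non-numeric columns.
shallow_bytes = int(sample.memory_usage().sum())
deep_bytes = int(sample.memory_usage(deep=True).sum())

print(missing)
print("float64 columns:", n_float)
print("memory (default / deep):", shallow_bytes, deep_bytes)
```

On the full data set, `df.isna().sum()` would be expected to report 2 missing values for each measurement column and 11 for `sex`, in line with the non-null counts listed by `info()`.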



***

### END 