# Palmer Penguins 

![Penguins](https://raw.githubusercontent.com/UWASIKLK/pda-mywork/main/picture%20of%20penguins.png)

The data set is available [on GitHub](https://allisonhorst.github.io/palmerpenguins/).

---

## Introduction
***

The Palmer Penguins dataset contains various measurements of three penguin species (Adelie, Gentoo and Chinstrap) and is commonly used for data analysis as an alternative to the Iris dataset.

The data were collected between 2007 and 2009 by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) 
with the [Palmer Station](https://pallter.marine.rutgers.edu/) Long Term Ecological Research program, a member of the [Long Term Ecological Research Network](https://lternet.edu/), on three islands in the Palmer Archipelago, Antarctica.

## Investigation
***


The dataset has details of 344 penguins from three species of penguin: _Adélie (152 penguins)_, _Chinstrap (68 penguins)_ and _Gentoo (124 penguins)_.

The dataset contains the following variables:
 - **species:**&nbsp; penguin species (_Adélie_, _Chinstrap_ or _Gentoo_)
 - **island:**&nbsp; island name (_Dream_, _Torgersen_ or _Biscoe_)
 - **bill_length_mm:**&nbsp; bill (culmen) length in millimetres
 - **bill_depth_mm:**&nbsp; bill (culmen) depth in millimetres
 - **flipper_length_mm:**&nbsp; flipper length in millimetres
 - **body_mass_g:**&nbsp; body mass in grams
 - **sex:**&nbsp; sex of the penguin


### Import libraries

***

To analyse this dataset, I recommend installing the following Python libraries:
- NumPy – used for working with arrays and performing mathematical operations.
- pandas – used to store and manipulate structured data.
- matplotlib.pyplot – used to create and customise plots.
- seaborn – another tool for creating attractive data visualisations, with pleasant default styles and colour palettes.


In [2]:
# Importing libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data download

***

The Palmer Penguins dataset was downloaded from the seaborn-data repository on [GitHub](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv).

In [11]:
penguins = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")

The data is now loaded.
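
As a rough, offline-friendly sketch of what `pd.read_csv` does with this file, the snippet below parses five rows (copied from the preview shown later in this notebook) from an in-memory string instead of the URL. The names `SAMPLE_CSV` and `sample` are illustrative only and are not part of the original analysis.

```python
import io

import pandas as pd

# Five rows copied from the dataset preview in this notebook. SAMPLE_CSV is an
# illustrative name; the real analysis reads the full file from the URL.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
Adelie,Torgersen,,,,,
Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE
"""

# read_csv accepts any file-like object, so a StringIO stands in for the URL.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Empty fields are parsed as NaN, which is why row 3 shows missing values.
print(sample.shape)
print(sample.isna().sum())
```

Reading from a string keeps the example independent of a network connection; the column names and values are the same ones the full file uses.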

## Data overview

***

First, I check the data to get an overview of what's in the dataset.

In [12]:
# View the data set.
# This displays only the first 5 and last 5 rows and confirms the total number of rows and columns.
penguins

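|     | species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
|----:|---------|-----------|---------------:|--------------:|------------------:|------------:|--------|
|   0 | Adelie  | Torgersen |           39.1 |          18.7 |             181.0 |      3750.0 | MALE   |
|   1 | Adelie  | Torgersen |           39.5 |          17.4 |             186.0 |      3800.0 | FEMALE |
|   2 | Adelie  | Torgersen |           40.3 |          18.0 |             195.0 |      3250.0 | FEMALE |
|   3 | Adelie  | Torgersen |            NaN |           NaN |               NaN |         NaN | NaN    |
|   4 | Adelie  | Torgersen |           36.7 |          19.3 |             193.0 |      3450.0 | FEMALE |
| ... | ...     | ...       |            ... |           ... |               ... |         ... | ...    |
| 339 | Gentoo  | Biscoe    |            NaN |           NaN |               NaN |         NaN | NaN    |
| 340 | Gentoo  | Biscoe    |           46.8 |          14.3 |             215.0 |      4850.0 | FEMALE |
| 341 | Gentoo  | Biscoe    |           50.4 |          15.7 |             222.0 |      5750.0 | MALE   |
| 342 | Gentoo  | Biscoe    |           45.2 |          14.8 |             212.0 |      5200.0 | FEMALE |
| 343 | Gentoo  | Biscoe    |           49.9 |          16.1 |             213.0 |      5400.0 | MALE   |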


In [13]:
# penguins.head(10) shows the first 10 rows of the dataset.

penguins.head(10)

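|     | species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
|----:|---------|-----------|---------------:|--------------:|------------------:|------------:|--------|
|   0 | Adelie  | Torgersen |           39.1 |          18.7 |             181.0 |      3750.0 | MALE   |
|   1 | Adelie  | Torgersen |           39.5 |          17.4 |             186.0 |      3800.0 | FEMALE |
|   2 | Adelie  | Torgersen |           40.3 |          18.0 |             195.0 |      3250.0 | FEMALE |
|   3 | Adelie  | Torgersen |            NaN |           NaN |               NaN |         NaN | NaN    |
|   4 | Adelie  | Torgersen |           36.7 |          19.3 |             193.0 |      3450.0 | FEMALE |
|   5 | Adelie  | Torgersen |           39.3 |          20.6 |             190.0 |      3650.0 | MALE   |
|   6 | Adelie  | Torgersen |           38.9 |          17.8 |             181.0 |      3625.0 | FEMALE |
|   7 | Adelie  | Torgersen |           39.2 |          19.6 |             195.0 |      4675.0 | MALE   |
|   8 | Adelie  | Torgersen |           34.1 |          18.1 |             193.0 |      3475.0 | NaN    |
|   9 | Adelie  | Torgersen |           42.0 |          20.2 |             190.0 |      4250.0 | NaN    |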


In [14]:
# Similarly, penguins.tail(10) shows the last 10 rows.
penguins.tail(10)

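|     | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
|----:|---------|--------|---------------:|--------------:|------------------:|------------:|--------|
| 334 | Gentoo  | Biscoe |           46.2 |          14.1 |             217.0 |      4375.0 | FEMALE |
| 335 | Gentoo  | Biscoe |           55.1 |          16.0 |             230.0 |      5850.0 | MALE   |
| 336 | Gentoo  | Biscoe |           44.5 |          15.7 |             217.0 |      4875.0 | NaN    |
| 337 | Gentoo  | Biscoe |           48.8 |          16.2 |             222.0 |      6000.0 | MALE   |
| 338 | Gentoo  | Biscoe |           47.2 |          13.7 |             214.0 |      4925.0 | FEMALE |
| 339 | Gentoo  | Biscoe |            NaN |           NaN |               NaN |         NaN | NaN    |
| 340 | Gentoo  | Biscoe |           46.8 |          14.3 |             215.0 |      4850.0 | FEMALE |
| 341 | Gentoo  | Biscoe |           50.4 |          15.7 |             222.0 |      5750.0 | MALE   |
| 342 | Gentoo  | Biscoe |           45.2 |          14.8 |             212.0 |      5200.0 | FEMALE |
| 343 | Gentoo  | Biscoe |           49.9 |          16.1 |             213.0 |      5400.0 | MALE   |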


In [15]:
# penguins.info() lists each column's name, non-null count and data type.

penguins.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB


In [16]:
# This gives a statistical summary of the numeric columns.

# count = number of non-missing values
# mean = arithmetic mean of the values
penguins.describe()

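|       | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|-------|---------------:|--------------:|------------------:|------------:|
| count |          342.0 |         342.0 |             342.0 |       342.0 |
| mean  |       43.92193 |      17.15117 |        200.915205 | 4201.754386 |
| std   |       5.459584 |      1.974793 |         14.061714 |  801.954536 |
| min   |           32.1 |          13.1 |             172.0 |      2700.0 |
| 25%   |         39.225 |          15.6 |             190.0 |      3550.0 |
| 50%   |          44.45 |          17.3 |             197.0 |      4050.0 |
| 75%   |           48.5 |          18.7 |             213.0 |      4750.0 |
| max   |           59.6 |          21.5 |             231.0 |      6300.0 |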
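
To make the rows of the summary concrete, here is a minimal sketch that recomputes each statistic by hand for one column. It uses only the body mass of the first ten penguins shown earlier, so the numbers differ from the full-dataset summary; `body_mass` and `manual` are illustrative names.

```python
import numpy as np
import pandas as pd

# Body mass (g) of the first ten penguins shown earlier; row 3 is missing.
body_mass = pd.Series(
    [3750.0, 3800.0, 3250.0, np.nan, 3450.0, 3650.0, 3625.0, 4675.0, 3475.0, 4250.0],
    name="body_mass_g",
)

summary = body_mass.describe()

# describe() skips NaN, so count is the number of non-missing values.
manual = {
    "count": float(body_mass.notna().sum()),
    "mean": body_mass.mean(),
    "std": body_mass.std(),            # sample standard deviation (ddof=1)
    "min": body_mass.min(),
    "25%": body_mass.quantile(0.25),
    "50%": body_mass.median(),
    "75%": body_mass.quantile(0.75),
    "max": body_mass.max(),
}

# Print each hand-computed value next to the one describe() reports.
for name, value in manual.items():
    print(f"{name:>5}: {value:.2f}  (describe: {summary[name]:.2f})")
```

Note that `count` is 9 rather than 10 here, for the same reason the full summary reports 342 rather than 344: missing values are ignored.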


In [20]:
# To find the number of rows and columns, use penguins.shape.
penguins.shape

(344, 7)

In [24]:
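# Row 339 is one of the rows with missing measurements.
# penguins.isnull() would flag every missing value in the DataFrame.
penguins.iloc[339]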

species              Gentoo
island               Biscoe
bill_length_mm          NaN
bill_depth_mm           NaN
flipper_length_mm       NaN
body_mass_g             NaN
sex                     NaN
Name: 339, dtype: object
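
Rather than looking up incomplete rows one index at a time, the missing values can be counted and located directly. The following is a small sketch on three rows copied from the tail of the dataset (index 338 to 340); `sample`, `rows_with_missing` and `complete` are illustrative names, and whether to drop incomplete rows is a choice the analysis itself has not made.

```python
import numpy as np
import pandas as pd

# Three rows taken from the tail of the dataset shown earlier (index 338 to 340).
sample = pd.DataFrame(
    {
        "species": ["Gentoo", "Gentoo", "Gentoo"],
        "island": ["Biscoe", "Biscoe", "Biscoe"],
        "bill_length_mm": [47.2, np.nan, 46.8],
        "bill_depth_mm": [13.7, np.nan, 14.3],
        "flipper_length_mm": [214.0, np.nan, 215.0],
        "body_mass_g": [4925.0, np.nan, 4850.0],
        "sex": ["FEMALE", np.nan, "FEMALE"],
    },
    index=[338, 339, 340],
)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = sample[sample.isna().any(axis=1)]

# A copy with incomplete rows removed; the original frame is left unchanged.
complete = sample.dropna()

print(missing_per_column)
print(rows_with_missing.index.tolist())
print(len(complete))
```

Applied to the full `penguins` DataFrame, the same three calls should agree with the non-null counts reported by `penguins.info()`.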

## Inspect Data

***


In [7]:
# Look at the first row (df is an alias for the penguins DataFrame loaded above).
df = penguins
df.iloc[0]

species                 Adelie
island               Torgersen
bill_length_mm            39.1
bill_depth_mm             18.7
flipper_length_mm        181.0
body_mass_g             3750.0
sex                       MALE
Name: 0, dtype: object

In [9]:
# Sex of the penguins
df['sex']

0        MALE
1      FEMALE
2      FEMALE
3         NaN
4      FEMALE
        ...  
339       NaN
340    FEMALE
341      MALE
342    FEMALE
343      MALE
Name: sex, Length: 344, dtype: object

In [10]:
# Count the number of each sex.
df['sex'].value_counts()

sex
MALE      168
FEMALE    165
Name: count, dtype: int64
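
The two counts add up to 333, not 344, because `value_counts()` leaves out missing values by default. A short sketch of the options, using the `sex` values of the first ten rows shown earlier (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Sex of the first ten penguins shown earlier; three values are missing.
sex = pd.Series(
    ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE", "MALE", "FEMALE", "MALE", np.nan, np.nan],
    name="sex",
)

counts = sex.value_counts()                       # NaN excluded by default
counts_with_nan = sex.value_counts(dropna=False)  # NaN gets its own row
shares = sex.value_counts(normalize=True)         # proportions of non-missing values

print(counts)
print(counts_with_nan)
print(shares)
```

Passing `dropna=False` is a quick way to see how many penguins have no recorded sex alongside the two categories.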

In [11]:
# Describe the data set.
df.describe()

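|       | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|-------|---------------:|--------------:|------------------:|------------:|
| count |          342.0 |         342.0 |             342.0 |       342.0 |
| mean  |       43.92193 |      17.15117 |        200.915205 | 4201.754386 |
| std   |       5.459584 |      1.974793 |         14.061714 |  801.954536 |
| min   |           32.1 |          13.1 |             172.0 |      2700.0 |
| 25%   |         39.225 |          15.6 |             190.0 |      3550.0 |
| 50%   |          44.45 |          17.3 |             197.0 |      4050.0 |
| 75%   |           48.5 |          18.7 |             213.0 |      4750.0 |
| max   |           59.6 |          21.5 |             231.0 |      6300.0 |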


# This will be big

You *might* have a **paragraph**. Paragraphs have sentences.

If you want a second paragraph, leave a blank line!

## This will be slightly smaller

 - Bullet
     - Second
     - Level
     - Bullets
 - Point
 - Lists

### This will be slightly smaller again 

1. Numbered bullets.
1. Just use numbers.
1. How complex!

> The definition of stupidity is doing the same thing twice and expecting different results! (Albert Einstein, not)




## Tables

***

| Species   | Mean Bill Length (mm)| Mean Body Mass (g)|
|-----------|---------------------:|------------------:|
|Adelie     |                  38.8|               3701|
|Chinstrap  |                  48.8|               3733|
|Gentoo     |                  47.5|               5076|
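
A table like the one above can be produced with a `groupby`, assuming its figures are per-species means. The sketch below runs on six rows copied from the previews earlier in this notebook, so its output will not match the table; `sample` and `summary` are illustrative names.

```python
import pandas as pd

# Three Adelie rows (index 0 to 2) and three Gentoo rows (index 340 to 342)
# copied from the previews earlier in this notebook.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
        "bill_length_mm": [39.1, 39.5, 40.3, 46.8, 50.4, 45.2],
        "body_mass_g": [3750.0, 3800.0, 3250.0, 4850.0, 5750.0, 5200.0],
    }
)

# Mean of each measurement per species, rounded as in the table:
# one decimal place for bill length, whole grams for body mass.
summary = (
    sample.groupby("species")[["bill_length_mm", "body_mass_g"]]
    .mean()
    .round({"bill_length_mm": 1, "body_mass_g": 0})
)

print(summary)
```

Running the same expression on the full `penguins` DataFrame would give one row per species, including Chinstrap.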

## Math

***

$f(x) = x^2$

$\sum_{i=0}^{n-1} i$

$\bar {x} = \frac {\sum_{i=0}^{n-1} x_i} {n}$
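
The last two formulas translate almost directly into Python. This is a minimal sketch with illustrative function names (`sum_of_indices`, `mean`), applied to the flipper lengths of the first four penguins that have measurements (rows 0, 1, 2 and 4).

```python
import numpy as np


def sum_of_indices(n):
    """Return the sum of i for i = 0 .. n-1, by direct summation."""
    return sum(range(n))


def mean(values):
    """Return the arithmetic mean: the sum of x_i for i = 0 .. n-1, divided by n."""
    n = len(values)
    return sum(values[i] for i in range(n)) / n


# Flipper lengths (mm) from rows 0, 1, 2 and 4 of the dataset.
flipper_lengths = [181.0, 186.0, 195.0, 193.0]

print(sum_of_indices(5))         # 0 + 1 + 2 + 3 + 4
print(mean(flipper_lengths))     # hand-written mean
print(np.mean(flipper_lengths))  # NumPy's mean, for comparison
```

Both means should agree; in practice `np.mean` or the pandas `.mean()` method is the more convenient choice.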

***

### End