# Palmer Penguins

![Palmer](https://upload.wikimedia.org/wikipedia/commons/thumb/2/24/Antarctica_relief_location_map.jpg/480px-Antarctica_relief_location_map.jpg)

This notebook contains my analysis of the famous Palmer Penguins dataset.

The data set is available [on GitHub](https://allisonhorst.github.io/palmerpenguins/).

In [16]:
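# Libraries required.
# Data frames.
import pandas as pd
# Numerical arrays.
import numpy as np
# Plotting (the plt alias is used in the plotting cells below).
import matplotlib.pyplot as plt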






In [17]:
# Load the Penguins Data Set
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")

In [18]:
# Let's have a look.
df

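|     | species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
| --- | ------- | --------- | -------------- | ------------- | ----------------- | ----------- | ------ |
| 0   | Adelie  | Torgersen |           39.1 |          18.7 |             181.0 |      3750.0 | MALE   |
| 1   | Adelie  | Torgersen |           39.5 |          17.4 |             186.0 |      3800.0 | FEMALE |
| 2   | Adelie  | Torgersen |           40.3 |          18.0 |             195.0 |      3250.0 | FEMALE |
| 3   | Adelie  | Torgersen |            NaN |           NaN |               NaN |         NaN | NaN    |
| 4   | Adelie  | Torgersen |           36.7 |          19.3 |             193.0 |      3450.0 | FEMALE |
| ... | ...     | ...       |            ... |           ... |               ... |         ... | ...    |
| 339 | Gentoo  | Biscoe    |            NaN |           NaN |               NaN |         NaN | NaN    |
| 340 | Gentoo  | Biscoe    |           46.8 |          14.3 |             215.0 |      4850.0 | FEMALE |
| 341 | Gentoo  | Biscoe    |           50.4 |          15.7 |             222.0 |      5750.0 | MALE   |
| 342 | Gentoo  | Biscoe    |           45.2 |          14.8 |             212.0 |      5200.0 | FEMALE |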


In [19]:
# Let's look at the first row.
df.iloc[0]

species                 Adelie
island               Torgersen
bill_length_mm            39.1
bill_depth_mm             18.7
flipper_length_mm        181.0
body_mass_g             3750.0
sex                       MALE
Name: 0, dtype: object

The first row shows the seven variables recorded for each penguin: species, island, bill length, bill depth, flipper length, body mass and sex.

In [20]:
# First 5 rows (head is a method, so it needs parentheses).
df.head()

|     | species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
| --- | ------- | --------- | -------------- | ------------- | ----------------- | ----------- | ------ |
| 0   | Adelie  | Torgersen |           39.1 |          18.7 |             181.0 |      3750.0 | MALE   |
| 1   | Adelie  | Torgersen |           39.5 |          17.4 |             186.0 |      3800.0 | FEMALE |
| 2   | Adelie  | Torgersen |           40.3 |          18.0 |             195.0 |      3250.0 | FEMALE |
| 3   | Adelie  | Torgersen |            NaN |           NaN |               NaN |         NaN | NaN    |
| 4   | Adelie  | Torgersen |           36.7 |          19.3 |             193.0 |      3450.0 | FEMALE |

In [21]:
# Last 5 rows (tail is a method, so it needs parentheses).
df.tail()

|     | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
| --- | ------- | ------ | -------------- | ------------- | ----------------- | ----------- | ------ |
| 339 | Gentoo  | Biscoe |            NaN |           NaN |               NaN |         NaN | NaN    |
| 340 | Gentoo  | Biscoe |           46.8 |          14.3 |             215.0 |      4850.0 | FEMALE |
| 341 | Gentoo  | Biscoe |           50.4 |          15.7 |             222.0 |      5750.0 | MALE   |
| 342 | Gentoo  | Biscoe |           45.2 |          14.8 |             212.0 |      5200.0 | FEMALE |
| 343 | Gentoo  | Biscoe |           49.9 |          16.1 |             213.0 |      5400.0 | MALE   |

In [22]:
# Inspect types
df.dtypes

species               object
island                object
bill_length_mm       float64
bill_depth_mm        float64
flipper_length_mm    float64
body_mass_g          float64
sex                   object
dtype: object

The four measurement columns (bill length, bill depth, flipper length and body mass) are stored as `float64`, i.e. decimal numbers, while species, island and sex are stored as `object`, i.e. text.
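The split between numeric and text columns can also be made programmatically. The following is a minimal sketch, assuming a small hand-typed frame (`sample`, a hypothetical stand-in built from the first three rows shown earlier) in place of the full data set:

```python
import pandas as pd

# Hypothetical stand-in for df: the first three rows shown earlier.
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie"],
    "island": ["Torgersen", "Torgersen", "Torgersen"],
    "bill_length_mm": [39.1, 39.5, 40.3],
    "bill_depth_mm": [18.7, 17.4, 18.0],
    "flipper_length_mm": [181.0, 186.0, 195.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0],
    "sex": ["MALE", "FEMALE", "FEMALE"],
})

# Columns that hold numbers.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Columns that hold anything else (here, text).
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```

The same two calls should work unchanged on `df` once it has been loaded.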

In [23]:
# Sex of the Penguins
df["sex"]

0        MALE
1      FEMALE
2      FEMALE
3         NaN
4      FEMALE
        ...  
339       NaN
340    FEMALE
341      MALE
342    FEMALE
343      MALE
Name: sex, Length: 344, dtype: object

In [24]:
# Count the number of penguins of each sex.
df["sex"].value_counts()

sex
MALE      168
FEMALE    165
Name: count, dtype: int64
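
The two counts add up to 333 rather than 344 because `value_counts()` leaves out missing values by default. The following is a minimal sketch on a hand-typed series (`sex_sample`, a hypothetical name holding the first five values of the column above) showing how `dropna=False` brings them back:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first five values of the sex column.
sex_sample = pd.Series(["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"], name="sex")

# Default behaviour: missing values are not counted.
counts = sex_sample.value_counts()

# With dropna=False the missing values get their own row.
counts_with_nan = sex_sample.value_counts(dropna=False)

print(counts)
print(counts_with_nan)
```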

In [25]:
# Describe the Data Set
df.describe()

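|       | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
| ----- | -------------- | ------------- | ----------------- | ----------- |
| count |          342.0 |         342.0 |             342.0 |       342.0 |
| mean  |       43.92193 |      17.15117 |        200.915205 | 4201.754386 |
| std   |       5.459584 |      1.974793 |         14.061714 |  801.954536 |
| min   |           32.1 |          13.1 |             172.0 |      2700.0 |
| 25%   |         39.225 |          15.6 |             190.0 |      3550.0 |
| 50%   |          44.45 |          17.3 |             197.0 |      4050.0 |
| 75%   |           48.5 |          18.7 |             213.0 |      4750.0 |
| max   |           59.6 |          21.5 |             231.0 |      6300.0 |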


The count row shows that 342 of the 344 penguins have a recorded value in each measurement column. The other two rows, at index 3 and 339, have missing measurements.
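
`describe()` counts only non-missing values, so a per-column tally of missing entries is a useful companion. The following is a minimal sketch, assuming a hand-typed frame (`sample`, a hypothetical stand-in made from three columns of rows 0 to 4 shown earlier):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: rows 0 to 4, three columns only.
sample = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Rows that have a value in every column.
complete_rows = sample.dropna()

print(missing_per_column)
print(len(sample), len(complete_rows))
```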

## Tables

***

| Species     | Mean Bill Length (mm) | Mean Body Mass (g) |
| ------------| --------------------- | ------------------ |
| Adelie      |                   38.8|                3701|
| Chinstrap   |                   48.8|                3733|
| Gentoo      |                   47.5|                5076|
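
A summary like the table above can be produced with `groupby`. The following is a minimal sketch, assuming the table values are per-species means and using a hand-typed frame (`sample`, a hypothetical stand-in with six complete rows shown earlier) rather than the full data set, so the numbers it prints will not match the table:

```python
import pandas as pd

# Hypothetical stand-in for df: six complete rows shown earlier.
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 39.5, 40.3, 46.8, 50.4, 45.2],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 4850.0, 5750.0, 5200.0],
})

# Mean of each measurement per species, rounded to one decimal place.
summary = (
    sample.groupby("species")[["bill_length_mm", "body_mass_g"]]
    .mean()
    .round(1)
)

print(summary)
```

Running the same expression on `df` instead of `sample` should give one row per species, including Chinstrap.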

## Two Variable Plots
***

In [26]:
# Get just the bill length.
bill_len = df["bill_length_mm"]

# Show
print(bill_len)

# Type
print(type(bill_len))

0      39.1
1      39.5
2      40.3
3       NaN
4      36.7
       ... 
339     NaN
340    46.8
341    50.4
342    45.2
343    49.9
Name: bill_length_mm, Length: 344, dtype: float64
<class 'pandas.core.series.Series'>


In [27]:
# Just get the numpy array
bill_len = bill_len.to_numpy()

# Show
bill_len

array([39.1, 39.5, 40.3,  nan, 36.7, 39.3, 38.9, 39.2, 34.1, 42. , 37.8,
       37.8, 41.1, 38.6, 34.6, 36.6, 38.7, 42.5, 34.4, 46. , 37.8, 37.7,
       35.9, 38.2, 38.8, 35.3, 40.6, 40.5, 37.9, 40.5, 39.5, 37.2, 39.5,
       40.9, 36.4, 39.2, 38.8, 42.2, 37.6, 39.8, 36.5, 40.8, 36. , 44.1,
       37. , 39.6, 41.1, 37.5, 36. , 42.3, 39.6, 40.1, 35. , 42. , 34.5,
       41.4, 39. , 40.6, 36.5, 37.6, 35.7, 41.3, 37.6, 41.1, 36.4, 41.6,
       35.5, 41.1, 35.9, 41.8, 33.5, 39.7, 39.6, 45.8, 35.5, 42.8, 40.9,
       37.2, 36.2, 42.1, 34.6, 42.9, 36.7, 35.1, 37.3, 41.3, 36.3, 36.9,
       38.3, 38.9, 35.7, 41.1, 34. , 39.6, 36.2, 40.8, 38.1, 40.3, 33.1,
       43.2, 35. , 41. , 37.7, 37.8, 37.9, 39.7, 38.6, 38.2, 38.1, 43.2,
       38.1, 45.6, 39.7, 42.2, 39.6, 42.7, 38.6, 37.3, 35.7, 41.1, 36.2,
       37.7, 40.2, 41.4, 35.2, 40.6, 38.8, 41.5, 39. , 44.1, 38.5, 43.1,
       36.8, 37.5, 38.1, 41.1, 35.6, 40.2, 37. , 39.7, 40.2, 40.6, 32.1,
       40.7, 37.3, 39. , 39.2, 36.6, 36. , 37.8, 36

In [28]:
# Bill Depth.
b_depth = df["bill_depth_mm"].to_numpy()

# show.
b_depth

array([18.7, 17.4, 18. ,  nan, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 17.1,
       17.3, 17.6, 21.2, 21.1, 17.8, 19. , 20.7, 18.4, 21.5, 18.3, 18.7,
       19.2, 18.1, 17.2, 18.9, 18.6, 17.9, 18.6, 18.9, 16.7, 18.1, 17.8,
       18.9, 17. , 21.1, 20. , 18.5, 19.3, 19.1, 18. , 18.4, 18.5, 19.7,
       16.9, 18.8, 19. , 18.9, 17.9, 21.2, 17.7, 18.9, 17.9, 19.5, 18.1,
       18.6, 17.5, 18.8, 16.6, 19.1, 16.9, 21.1, 17. , 18.2, 17.1, 18. ,
       16.2, 19.1, 16.6, 19.4, 19. , 18.4, 17.2, 18.9, 17.5, 18.5, 16.8,
       19.4, 16.1, 19.1, 17.2, 17.6, 18.8, 19.4, 17.8, 20.3, 19.5, 18.6,
       19.2, 18.8, 18. , 18.1, 17.1, 18.1, 17.3, 18.9, 18.6, 18.5, 16.1,
       18.5, 17.9, 20. , 16. , 20. , 18.6, 18.9, 17.2, 20. , 17. , 19. ,
       16.5, 20.3, 17.7, 19.5, 20.7, 18.3, 17. , 20.5, 17. , 18.6, 17.2,
       19.8, 17. , 18.5, 15.9, 19. , 17.6, 18.3, 17.1, 18. , 17.9, 19.2,
       18.5, 18.5, 17.6, 17.5, 17.5, 20.1, 16.5, 17.9, 17.1, 17.2, 15.5,
       17. , 16.8, 18.7, 18.6, 18.4, 17.8, 18.1, 17

In [29]:
# Simple plot.
plt.plot(bill_len, b_depth, "x")

# Axis labels.
plt.xlabel("Bill Length (mm)")
plt.ylabel("Bill Depth (mm)")

# Title.
plt.title("Palmer Penguin Data Set")

# X limits (bill lengths range from 32.1 to 59.6 mm).
plt.xlim(30, 62)

# Y limits (bill depths range from 13.1 to 21.5 mm).
plt.ylim(12, 23)

In [None]:
# Create new figure and set of axes.
fig, ax = plt.subplots()

# Simple plot.
ax.plot(bill_len, b_depth, "x")

# Axis labels.
ax.set_xlabel("Bill Length (mm)")
ax.set_ylabel("Bill Depth (mm)")

# Title.
ax.set_title("Palmer Penguin Data Set")

# X limits.
ax.set_xlim(30, 62)

# Y limits.
ax.set_ylim(12, 23)


#### Add Best Fit Line
***

$ y = mx + c = p_1 x^1 + p_0 = p_1 x + p_0 $

In [None]:
# Keep only the penguins with both measurements; np.polyfit cannot handle NaN.
mask = ~np.isnan(bill_len) & ~np.isnan(b_depth)

# Fit a straight line between x and y.
m, c = np.polyfit(bill_len[mask], b_depth[mask], 1)

# Show m and c.
print(m, c)

In [None]:
# Create new figure and set of axes.
fig, ax = plt.subplots()

# Simple plot.
ax.plot(bill_len, b_depth, "x")

# Best fit line evaluated at the data points.
ax.plot(bill_len, m * bill_len + c, "r-")

# Axis labels.
ax.set_xlabel("Bill Length (mm)")
ax.set_ylabel("Bill Depth (mm)")

# Title.
ax.set_title("Palmer Penguin Data Set")

# X limits.
ax.set_xlim(30, 62)

# Y limits.
ax.set_ylim(12, 23)

In [None]:
# x values for best fit line, spanning the plotted range.
bf_x = np.linspace(30.0, 62.0, 100)

# y values for best fit line.
bf_y = m * bf_x + c

In [None]:
# Create new figure and set of axes.
fig, ax = plt.subplots()

# Simple plot.
ax.plot(bill_len, b_depth, "x")

# Show the best fit as a red line.
ax.plot(bf_x, bf_y, "r-")

In [None]:
# Create new figure and set of axes.
fig, ax = plt.subplots()

# Simple plot.
ax.plot(bill_len, b_depth, "x")

# Show the best fit as a red line.
ax.plot(bf_x, bf_y, "r-")

# Axis labels.
ax.set_xlabel("Bill Length (mm)")
ax.set_ylabel("Bill Depth (mm)")

# Title.
ax.set_title("Palmer Penguin Data Set")

# X limits.
ax.set_xlim(30, 62)

# Y limits.
ax.set_ylim(12, 23)

In [None]:
# Measure correlation, again leaving out the rows with missing values.
np.corrcoef(bill_len[mask], b_depth[mask])
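
The fit and the correlation can be checked on a small scale. The following is a minimal sketch, assuming only the first eight bill measurements shown in the arrays above (`x`, `y` and the `_check` names are hypothetical). Rows with missing values are dropped first, because neither `np.polyfit` nor `np.corrcoef` gives a usable result when NaN is present. The slope and intercept are then compared with the closed-form least-squares formulas.

```python
import numpy as np

# First eight bill lengths and depths from the arrays shown above.
x = np.array([39.1, 39.5, 40.3, np.nan, 36.7, 39.3, 38.9, 39.2])
y = np.array([18.7, 17.4, 18.0, np.nan, 19.3, 20.6, 17.8, 19.6])

# Keep only the positions where both values are present.
mask = ~np.isnan(x) & ~np.isnan(y)
x_ok, y_ok = x[mask], y[mask]

# Degree-1 polynomial fit: y = m * x + c.
m, c = np.polyfit(x_ok, y_ok, 1)

# Closed-form least-squares slope and intercept for comparison.
dx = x_ok - x_ok.mean()
dy = y_ok - y_ok.mean()
m_check = np.sum(dx * dy) / np.sum(dx ** 2)
c_check = y_ok.mean() - m_check * x_ok.mean()

# Pearson correlation coefficient (off-diagonal entry of the 2x2 matrix).
r = np.corrcoef(x_ok, y_ok)[0, 1]

print(m, c, r)
```

With so few points the numbers say little about the full data set; the sketch only shows the mechanics.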

## Math

***
$f(x) = x^2$

$\sum_{i=0}^{n-1} i$

$\bar{x} = \frac{\sum_{i=0}^{n-1} x_i}{n}$
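
These formulas translate directly into code. The following is a minimal sketch, assuming `n = 5` and a hypothetical list `xs` holding the first four complete bill lengths shown earlier:

```python
def f(x):
    """Square of x, i.e. f(x) = x^2."""
    return x ** 2


# Sum of the integers 0, 1, ..., n - 1.
n = 5
total = sum(range(n))

# The same sum from the closed form n(n - 1) / 2.
closed_form = n * (n - 1) // 2

# Mean: the sum of the values divided by how many there are.
xs = [39.1, 39.5, 40.3, 36.7]
x_bar = sum(xs) / len(xs)

print(f(3), total, closed_form, x_bar)
```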

***
### End