**Penguin Species**

There are 17 penguin species on the planet, but the eight most iconic reside in Antarctica, its nearby islands, and the sub-Antarctic archipelagos of South Georgia and the Falklands.

For now, we focus on three species: Adelie, Gentoo, and Chinstrap penguins.

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRuDOFJ-SBxZVuVbHsa6hyYrSygNg9nTHysnw&usqp=CAU)

Load required Libraries

In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

In [None]:
penguin = pd.read_csv('../input/palmer-archipelago-antarctica-penguin-data/penguins_size.csv')

Checking the first 5 and last 5 records of the dataset

In [None]:
penguin.head(5)

In [None]:
penguin.tail(5)

Finding the shape of the data: the total number of rows and columns.

In [None]:
penguin.shape

In [None]:
penguin.columns

**The dataset consists of 7 columns.**

* species: penguin species (Chinstrap, Adélie, or Gentoo)
* island: island name (Dream, Torgersen, or Biscoe) in the Palmer Archipelago (Antarctica)
* culmen_length_mm: culmen length (mm)
* culmen_depth_mm: culmen depth (mm)
* flipper_length_mm: flipper length (mm)
* body_mass_g: body mass (g)
* sex: penguin sex

Finding information about the data: data types, columns, and null values

In [None]:
penguin.info()

The penguin dataset has 7 columns and 344 rows. A few null values are present in the culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g, and sex columns.

Let us work on missing values

In [None]:
penguin.isna().sum()

In [None]:
penguin["culmen_length_mm"] = penguin["culmen_length_mm"].fillna(value = penguin["culmen_length_mm"].mean())
penguin["culmen_depth_mm"] = penguin["culmen_depth_mm"].fillna(value = penguin["culmen_depth_mm"].mean())
penguin["flipper_length_mm"] = penguin["flipper_length_mm"].fillna(value = penguin["flipper_length_mm"].mean())
penguin["body_mass_g"] = penguin["body_mass_g"].fillna(value = penguin["body_mass_g"].mean())


In [None]:
penguin['sex'] = penguin['sex'].fillna('MALE')

One record in the sex column holds an invalid placeholder value ('.') rather than MALE or FEMALE. Let's fix that too.

In [None]:
penguin[penguin['sex']=='.']

In [None]:
# Select the row by its invalid label instead of a hard-coded index
penguin.loc[penguin['sex'] == '.', 'sex'] = 'MALE'
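
The check for unexpected labels can be sketched more generally on a toy frame. This is only an illustration: the `toy` data and the `valid_sex` set are assumptions made up for the example, not part of the dataset. Note that `isin` also flags missing values as invalid, so it is best run after the nulls are filled.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "sex": ["MALE", "FEMALE", ".", "FEMALE"],
})

# The labels we expect to see in the column (an assumption for this sketch).
valid_sex = {"MALE", "FEMALE"}

# Boolean mask of rows whose label falls outside the expected set.
invalid_mask = ~toy["sex"].isin(valid_sex)
invalid_rows = toy[invalid_mask]
print(invalid_rows)

# Overwrite the invalid labels, mirroring the fix applied in this notebook.
toy.loc[invalid_mask, "sex"] = "MALE"
```

Selecting by mask rather than by index keeps the fix working even if the row order changes.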

In [None]:
penguin.isna().sum()

In [None]:
penguin.info()

There are no null values in the dataset now, so we can go ahead with the EDA.
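
As a recap, the imputation steps above can be sketched compactly on a toy frame. The values below are invented for illustration, and filling all numeric columns in one pass is simply an alternative way of writing the per-column `fillna` calls used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (values are invented, not taken from the real dataset).
toy = pd.DataFrame({
    "culmen_length_mm": [39.1, np.nan, 46.5],
    "body_mass_g": [3750.0, 5000.0, np.nan],
    "sex": ["MALE", None, "FEMALE"],
})

# Fill each numeric column with its own mean in one pass.
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())

# Fill the categorical column with a fixed label, as the notebook does.
toy["sex"] = toy["sex"].fillna("MALE")

# Total number of missing cells left in the frame.
remaining = int(toy.isna().sum().sum())
print(remaining)
```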

Checking the dataset stats

In [None]:
penguin.describe()

Let's check count for each species

In [None]:
penguin['species'].value_counts()

The penguin dataset consists of 344 data instances. There are 3 classes(species) - Adelie, Gentoo and Chinstrap

In [None]:
# Pass the column as x=...; recent seaborn versions no longer accept it positionally
sns.countplot(x='species', data=penguin, palette=('DarkOrange', 'MediumOrchid', 'Teal'))
plt.show()

The dataset has a different number of samples for each species. Adelie has the most samples, followed by Gentoo and then Chinstrap.
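
To put a number on the class imbalance, the counts can also be expressed as shares of the total. The sketch below uses an invented label column, so the proportions it prints are not those of the real dataset.

```python
import pandas as pd

# Toy label column (counts are invented for illustration).
species = pd.Series(
    ["Adelie"] * 5 + ["Gentoo"] * 3 + ["Chinstrap"] * 2, name="species"
)

# Absolute counts, sorted from most to least frequent.
counts = species.value_counts()

# Relative frequencies: each class's share of all rows.
shares = species.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```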

In [None]:
penguin['island'].value_counts()

In [None]:
sns.countplot(x = "island", data = penguin)

Most of the penguins are from Biscoe island, and the fewest are from Torgersen.

In [None]:
sns.FacetGrid(penguin, hue="species", height=5, palette=('DarkOrange', 'MediumOrchid', 'Teal')).map(sns.distplot,"culmen_length_mm").add_legend();

From the above plot, we see that on the basis of culmen length, Adelie is separable while the other two species overlap.

In [None]:
sns.FacetGrid(penguin, hue="species", height=5, palette=('DarkOrange', 'MediumOrchid', 'Teal')).map(sns.distplot,"culmen_depth_mm").add_legend();

From the above plot, we see that on the basis of culmen depth, Gentoo is separable while the other two species overlap.

In [None]:
sns.FacetGrid(penguin, hue="species", height=5, palette=('DarkOrange', 'MediumOrchid', 'Teal')).map(sns.distplot,"flipper_length_mm").add_legend();

From the above plot, we see that on the basis of flipper length, Gentoo is separable while the other two species overlap.

In [None]:
sns.FacetGrid(penguin, hue="species", height=5, palette=('DarkOrange', 'MediumOrchid', 'Teal')).map(sns.distplot,"body_mass_g").add_legend();

From the above plot, we see that on the basis of body mass, Gentoo is separable while the other two species overlap.

In [None]:
plt.figure(figsize=(7,7))
sns.set_style('whitegrid')
sns.pairplot(data=penguin, hue='species', palette=('DarkOrange', 'MediumOrchid', 'Teal'))

From the above plot we can see that:

* In the case of culmen length, Adelie is easily separable/distinguishable.
* In the case of culmen depth, flipper length and body mass, Gentoo is easily separable/distinguishable.
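
The visual impression of separability can be cross-checked with a crude numeric test: compare the per-species minimum and maximum of a feature and see whether the intervals intersect. The sketch below runs on invented values, and `ranges_overlap` is a hypothetical helper written for this example. A range check ignores how dense the tails are, so it is only a rough companion to the plots.

```python
import pandas as pd

# Toy measurements (invented for illustration).
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "flipper_length_mm": [185.0, 195.0, 190.0, 200.0, 210.0, 225.0],
})

# Minimum and maximum of the feature within each species.
ranges = toy.groupby("species")["flipper_length_mm"].agg(["min", "max"])


def ranges_overlap(a, b):
    """Return True if the [min, max] intervals of species a and b intersect."""
    return bool(
        ranges.loc[a, "min"] <= ranges.loc[b, "max"]
        and ranges.loc[b, "min"] <= ranges.loc[a, "max"]
    )


print(ranges)
print(ranges_overlap("Adelie", "Chinstrap"))
print(ranges_overlap("Adelie", "Gentoo"))
```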

In [None]:
sns.catplot(x="sex", y="culmen_length_mm", hue="species", data=penguin, kind="bar", palette=('DarkOrange', 'MediumOrchid', 'Teal'))

Chinstrap penguins have the highest culmen length in both males and females, followed by Gentoo and Adelie.

In [None]:
# Plot flipper length so that the chart matches the observation below
sns.catplot(x="sex", y="flipper_length_mm", hue="species", data=penguin, kind="bar", palette=('DarkOrange', 'MediumOrchid', 'Teal'))

Gentoo penguins have the highest flipper length in both males and females.

In [None]:
sns.catplot(x="sex", y="body_mass_g", hue="species", data=penguin, kind="bar", palette=('DarkOrange', 'MediumOrchid', 'Teal'))

Gentoo penguins have the highest body mass in both males and females.
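
By default, the height of each bar in these plots is the group mean. The same numbers can be read off as a table with a `groupby`, sketched here on invented values rather than the real measurements.

```python
import pandas as pd

# Toy measurements (invented for illustration).
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Adelie", "Gentoo"],
    "sex": ["MALE", "FEMALE", "MALE", "FEMALE", "MALE", "FEMALE"],
    "body_mass_g": [4000.0, 3400.0, 5500.0, 4700.0, 4200.0, 4500.0],
})

# Mean body mass per (sex, species) pair: the quantity a bar plot shows by default.
group_means = toy.groupby(["sex", "species"])["body_mass_g"].mean()

# Pivot species into columns so that each row is one sex.
table = group_means.unstack("species")
print(table)
```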

Analyzing the relationship between culmen_length and culmen_depth

In [None]:
sns.scatterplot(x = penguin.culmen_length_mm, y = penguin.culmen_depth_mm, hue = penguin.species, palette=('DarkOrange', 'MediumOrchid', 'Teal'))

From the above plot:

* Three groups of species can be clearly identified based on the culmen length and culmen depth features.
* For each species, culmen_length and culmen_depth fall within a certain range.

Analyzing body mass with respect to species and sex

In [None]:
sns.boxplot(x = penguin.sex, y = penguin.body_mass_g, hue = penguin.species, palette=('DarkOrange', 'MediumOrchid', 'Teal'))

From above plot:

* Male penguins of all species are heavier than female penguins.
* Gentoo penguins are heavier than Adelie and Chinstrap penguins.
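
The boxes in a box plot are drawn from per-group quartiles, which can also be computed directly. The sketch below uses invented masses and groups by sex only to keep the example short.

```python
import pandas as pd

# Toy measurements (invented for illustration).
toy = pd.DataFrame({
    "sex": ["MALE"] * 5 + ["FEMALE"] * 5,
    "body_mass_g": [3800.0, 4000.0, 4200.0, 4400.0, 4600.0,
                    3200.0, 3400.0, 3600.0, 3800.0, 4000.0],
})

# First quartile, median, and third quartile per group: the three lines of each box.
quartiles = (
    toy.groupby("sex")["body_mass_g"].quantile([0.25, 0.5, 0.75]).unstack()
)

# Interquartile range, i.e. the height of each box.
iqr = quartiles[0.75] - quartiles[0.25]

print(quartiles)
print(iqr)
```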

In [None]:
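# Restrict to numeric columns; pandas 2.0+ raises an error on string columns otherwise
corr_mat = penguin.corr(numeric_only=True)
corr_mat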

In [None]:
plt.figure(figsize=(8,8))
sns.heatmap(corr_mat, annot=True)

Flipper length and body mass are strongly correlated, with a correlation value of 0.87. In other words, penguins with longer flippers generally weigh more.
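
For reference, the correlation value in the heatmap is the Pearson coefficient, the pandas default. A minimal sketch of what that number means, computed by hand on invented values and compared against pandas:

```python
import numpy as np
import pandas as pd

# Toy measurements (invented for illustration, not the real dataset).
toy = pd.DataFrame({
    "flipper_length_mm": [180.0, 190.0, 200.0, 210.0, 220.0],
    "body_mass_g": [3500.0, 3700.0, 4300.0, 4800.0, 5200.0],
})

x = toy["flipper_length_mm"].to_numpy()
y = toy["body_mass_g"].to_numpy()

# Pearson r = covariance(x, y) / (std(x) * std(y)), written with centered values.
x_centered = x - x.mean()
y_centered = y - y.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

# pandas computes the same quantity with its default method="pearson".
r_pandas = toy["flipper_length_mm"].corr(toy["body_mass_g"])

print(round(float(r_manual), 4), round(float(r_pandas), 4))
```

A value close to 1 means the two measurements rise together almost linearly. It does not by itself say that one causes the other.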