![header](https://storage.googleapis.com/kaggle-competitions/kaggle/22962/logos/header.png?t=2021-03-17-22-44-09)

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;"> Introduction </span>
</p>
</div>

Data in this competition contains images of over 15,000 unique individual marine mammals from 30 different species collected from 28 different research organizations. Individuals have been manually identified and given an individual_id by marine researchers.
<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;">Objective</span>
</p>
</div>

Your task is to correctly identify these individuals in images. It's a challenging task that has the potential to drive significant advancements in understanding and protecting marine mammals across the globe.
<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;"> Methodology </span>
</p>
</div>

Introducing Species

Read Data & View Images

Data Cleaning

Exploratory Data Analysis

Image Transformation

Image Augmentation

***In Progress...***

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;"> Import Libraries </span>
</p>
</div>

In [None]:
import numpy as np
import cv2
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import os
from tqdm import tqdm


<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;"> Used Functions </span>
</p>
</div>

In [None]:
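# Function adapted from https://www.kaggle.com/heye0507/happy-whale-eda-baseline
PATH = '../input/happy-whale-and-dolphin'

def plot_n_samples(species, row=5, path=PATH, folder='train_images'):
    """Plot a row x row grid of randomly chosen training images of one species.

    Uses the global DataFrame `df` (columns: image, species, individual_id).
    """
    # Honour the `path` argument instead of silently overwriting it with PATH
    img_dir = os.path.join(path, folder)
    n = row * row
    df1 = df[df['species'] == species]

    # squeeze=False keeps `ax` 2-D, so indexing also works when row == 1
    _, ax = plt.subplots(row, row, figsize=(15, 10), squeeze=False)

    for i in range(n):
        idx = np.random.randint(len(df1))
        fname = df1.iloc[idx]['image']
        label = df1.iloc[idx]['individual_id']
        img = cv2.imread(os.path.join(img_dir, fname))
        if img is None:  # cv2.imread returns None for missing or unreadable files
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR, matplotlib expects RGB
        ax[i // row][i % row].imshow(img)
        ax[i // row][i % row].set_title(label + '-' + species)
        ax[i // row][i % row].axis('off')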

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;"> Read Data </span>
</p>
</div>

In [None]:
df = pd.read_csv("../input/happy-whale-and-dolphin/train.csv")
df

In [None]:
print("Number of Unique Individual IDs: ", df.individual_id.nunique())
print("Number of Unique Species: ", df.species.nunique())

In [None]:
df.individual_id.value_counts()

In [None]:
df.species.value_counts()

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<h1 style="padding: 1px;
              color:#2596be;">
<span style="font-size:30px;"> Getting To Know Every Species!  </span>
</h1>
</div>

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:#F17D64;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:white;">
<span style="font-size:20px;"> 
We need to pay particular attention to dorsal fins and lateral body views (lateral: at the side of the body, away from its midline)  </span>
</p>
</div>

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;"> 🐬 Bottlenose Dolphin </span>
</p>
</div>

![bottlenose](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-dolphin_bottlenose_nb_w.png?itok=e6t8QRC9)

▪️ They are generally gray in color. They can range from light gray to almost black on top near their dorsal fin and light gray to almost white on their belly.

▪️ They are easy to view in the wild because they live close to shore and are distributed throughout coastal and estuarine waters.

https://www.fisheries.noaa.gov/species/common-bottlenose-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;"> 🐳 Beluga </span>
</p>
</div>

![beluga](https://media.istockphoto.com/illustrations/watercolor-beluga-whale-illustration-isolated-on-white-background-illustration-id1227138495?k=20&m=1227138495&s=170667a&w=0&h=SlYhWGMejmAixWtRzo68wtU5EGuAe9Mnqa0ysIBIVzo=)

▪️ The beluga, or white whale, is one of the smallest species of whale. Their distinctive color and prominent foreheads make them easily identifiable.

▪️ The beluga has a very flexible neck that enables it to nod and turn its head in all directions.


https://kids.nationalgeographic.com/animals/mammals/facts/beluga-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;"> 🐬Dusky Dolphin </span>
</p>
</div>

![dusky](https://ars.els-cdn.com/content/image/3-s2.0-B9780128043271001114-f04-24-9780128043271.jpg)

▪️ Dusky dolphins often leap together in the same patch of ocean while facing in different directions.

▪️ Dusky dolphins look like they are wearing eye masks: they have distinctive dark lips, a dark snout tip, and a dark patch around each eye, which stand out against the lighter face. 

▪️ The back is dark grey to bluish-black, the belly is white, and the sides are grey. 

https://uk.whales.org/whales-dolphins/species-guide/dusky-dolphin/

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Spinner Dolphin </span>
</p>
</div>
    
![spinner](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-spinner-dolphin.png?itok=vN0dNT-x)

▪️ Spinner dolphins received their common name because they are often seen leaping and spinning out of the water.

▪️ Spinner dolphins are relatively small compared with other species of oceanic dolphins. They are slender, with thin, recurved flippers, and dorsal fins that usually range from slightly curved to erect and triangular.

https://www.fisheries.noaa.gov/species/spinner-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Spotted Dolphin </span>
</p>
</div>

![spotted](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-atlantic-spotted-dolphin.png?itok=I6ISA-rw)

▪️ There are two species of spotted dolphin: the pantropical spotted dolphin and the Atlantic spotted dolphin. Since the pantropical spotted dolphin already has its own category in this dataset, this category most likely refers to the Atlantic spotted dolphin.

▪️ Young Atlantic spotted dolphins do not have spots. As a result, they can look like slender bottlenose dolphins.

▪️ They have a robust body with a tall, curved dorsal fin located midway down their back.


https://uk.whales.org/whales-dolphins/species-guide/pantropical-spotted-dolphin/

https://www.fisheries.noaa.gov/species/atlantic-spotted-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Common Dolphin </span>
</p>
</div>
    
![common](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-short-beaked-common-dolphin.png?itok=aZiaoEmW)

▪️ Short-beaked common dolphins are one of the most abundant and familiar dolphins in the world. 

▪️ Their body is sleek with a relatively tall, triangular dorsal fin in the middle of their back.



https://www.fisheries.noaa.gov/species/short-beaked-common-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬White Sided Dolphin </span>
</p>
</div>
    
***1. Atlantic White Sided Dolphin***
![atlantic](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-atlantic-white-sided-dolphin.jpg?itok=2t_0VVSy)

▪️ Atlantic white-sided dolphins are relatively small delphinids.

▪️ They have a distinctive, prominent, relatively large, tall, falcate dorsal fin located midway down the back.

***2. Pacific White Sided Dolphin***
![pacific](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-pacific-white-sided-dolphin.png?itok=-RaS_tlw)

▪️ These dolphins have a robust body, short rostrum (snout), and large dorsal fin compared to their overall body size. 

▪️ Pacific white-sided dolphins are most likely to be confused with common dolphins.

https://www.fisheries.noaa.gov/species/atlantic-white-sided-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Pantropical Spotted Dolphin </span>
</p>
</div>
    
![pantropic](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-pantropical-spotted-dolphin.png?itok=5XTCLTnu)

▪️ Adults are spottier than younger dolphins, as more and more spots appear as they get older.

▪️ The dorsal fin is strongly curved, tall and narrow. 
 
▪️ They are dark grey on top with light spots and lighter grey on the belly and sides with dark spots. 

https://www.fisheries.noaa.gov/species/pantropical-spotted-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Commerson's Dolphin </span>
</p>
</div>
    
![commersons](http://whaleopedia.org/animalfund/wp-content/uploads/2013/10/Commersons-Dolphin1.png)

▪️ Commerson’s dolphins are small, chubby dolphins with cone-shaped heads and no beak. 

▪️ The dorsal fin is large and rounded; it looks like one of Mickey Mouse’s ears.


https://uk.whales.org/whales-dolphins/species-guide/commersons-dolphin/

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Rough Toothed Dolphin </span>
</p>
</div>

![rough](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-rough-toothed-dolphin.png?itok=WvxCx5oj)

▪️ Rough-toothed dolphins are relatively small compared to other dolphins. 

▪️ Their dorsal fin and pectoral fins or flippers are distinctively large, which is characteristic of this species. They have a “reptilian” appearance that is also distinct and unique among dolphins.

https://www.fisheries.noaa.gov/species/rough-toothed-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Fraser's Dolphin </span>
</p>
</div>
    
![fraser](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-frasers-dolphin.png?itok=mAHHyXOJ)

▪️ Their dorsal fin—located midway down their back—is small and triangular.

▪️ All Fraser's dolphins have a dark stripe that extends down their side from eye to pectoral flipper.


https://www.fisheries.noaa.gov/species/frasers-dolphin

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Globis/Pilot Whale/Short Finned Pilot Whale </span>
</p>
</div>

![pilot](https://media.fisheries.noaa.gov/styles/original/s3/2021-08/640x427-Whale_Short-Finned_Pilot.png?itok=jmCRoGqj)

▪️ They are one of two species of pilot whale, along with the long-finned pilot whale. The two species differ slightly in size, features, coloration, and pattern. In the field and at sea, it is very difficult to tell the difference between the two species.

▪️ The short-finned pilot whale has a bulbous melon head with no obvious rostrum. Its dorsal fin is far forward on its body and has a relatively long base. The body is black or dark brown, with a large gray saddle behind the dorsal fin.


https://www.fisheries.noaa.gov/species/short-finned-pilot-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Humpback Whale </span>
</p>
</div>

![humpback](https://media.fisheries.noaa.gov/styles/original/s3/2021-08/640x427-Whale-Humpback-watermark.jpg?itok=3RBJ1uQq)

▪️ The humpback whale gets its common name from the distinctive hump on its back.

https://www.fisheries.noaa.gov/species/humpback-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Blue Whale </span>
</p>
</div>
   
   
![blue](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-blue-whale.jpg?itok=Ffb4BA78)

▪️ Blue whales are the largest animals ever to live on our planet. Today, blue whales are listed as endangered.

▪️ Its dorsal (top) fin is small and triangular or falcate (curved) in shape, and is located three-fourths of the way back on the body. 

https://www.fisheries.noaa.gov/species/blue-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬False Killer Whale </span>
</p>
</div>
   
![falsekiller](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-false-killer-whale.png?itok=v9Lw8QPd)

▪️ The false killer whale’s entire body is black or dark gray, although lighter areas may occur ventrally (on its underside) between the flippers or on the sides of the head.

▪️ The dorsal fin is located in the middle of the back and generally curves backward. In Hawaiian waters, dorsal fin shapes show a lot of variability, often caused by injury from fishery interactions!

https://www.fisheries.noaa.gov/species/false-killer-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Melon Headed Whale </span>
</p>
</div>
    
![melon](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-melon-headed-whale.png?itok=Cg3fSKwZ)

▪️ Melon-headed whales have a small head with a rounded melon and no discernible beak.

▪️ Their dorsal fin is relatively large and they have pointed, tapering flippers (pectoral fins).

https://www.fisheries.noaa.gov/species/melon-headed-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Minke Whale </span>
</p>
</div>
    
    
![MINKE](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-minke-whale.png?itok=lb2-uZjw)

▪️ Minke whales have a fairly tall, sickle-shaped dorsal fin located about two-thirds down their back. 

▪️ Their body is black to dark grayish/brownish, with a pale chevron on the back behind the head and above the flippers, as well as a white underside. 

https://www.fisheries.noaa.gov/species/minke-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Killer Whale </span>
</p>
</div>
  
![killer](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-killer-whale.png?itok=mpHhEa6Y)


▪️ It is the largest member of the Delphinidae (dolphin) family.

▪️ Killer whales are mostly black on top with white undersides and white patches near the eyes.

▪️ They also have a gray or white saddle patch behind the dorsal fin. 

▪️ Adult males develop disproportionately larger pectoral flippers, dorsal fins, tail flukes, and girths than females.

https://www.fisheries.noaa.gov/species/killer-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Fin Whale </span>
</p>
</div>
    
![fin](https://media.fisheries.noaa.gov/styles/original/s3/2020-09/640x4270-fin-whale-v2.jpg?itok=q6QDTYa_)

▪️ The fin whale is the second-largest whale species on Earth, after the blue whale. 

▪️ It is found throughout the world’s oceans and gets its name from the easy-to-spot fin on its back, near its tail: a tall, hooked dorsal fin about two-thirds of the way back on the body that rises at a shallow angle from the back. 

https://www.fisheries.noaa.gov/species/fin-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Gray Whale </span>
</p>
</div>
    
![gray](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-gray-whale.png?itok=r5uMVbmg)

▪️ Gray whales have a mottled gray body with small eyes located just above the corners of the mouth. Their pectoral flippers are broad, paddle-shaped, and pointed at the tips. 

▪️ They lack a dorsal fin! Instead, they have a dorsal hump about two-thirds of the way back, followed by a series of small bumps ("knuckles") along the dorsal ridge. 

https://www.fisheries.noaa.gov/species/gray-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Southern Right Whale </span>
</p>
</div>
    
![southern](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-southern-right-whale.jpg?itok=Xb4SDpxg)

▪️ Southern right whales have a stocky, black body often with white belly and chin patches and a large head covered in callosities.

▪️ They lack a dorsal fin and have wide, paddle-shaped flippers!

https://www.fisheries.noaa.gov/species/southern-right-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Sei Whale </span>
</p>
</div>
    
    
![sei](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-sei-whale.png?itok=aV7zzhmQ)

▪️ Sei whales have a long, sleek body that is dark bluish-gray to black in color and white or cream-colored on the underside. The body is often covered in oval-shaped scars.

▪️ Sei whales have a tall, hooked dorsal fin located about two-thirds down their back.

▪️ At the water's surface, sei whales can be recognized by a columnar or bushy blow that is about 10 to 13 feet high. The dorsal fin usually appears at the same time as the blowhole when the animal surfaces to breathe.

https://www.fisheries.noaa.gov/species/sei-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Cuvier's Beaked Whale </span>
</p>
</div>


![](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-cuviers-beaked-whale.png?itok=YoK45CGr)

▪️ A Cuvier’s beaked whale’s body has variable coloration that ranges from dark gray to a reddish-brown, with a paler counter-shaded underside. 

▪️ Cuvier's beaked whale is medium-sized with a round and robust body and a triangular falcate dorsal fin located far down the animal’s back. 

https://www.fisheries.noaa.gov/species/cuviers-beaked-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐬Long Finned Pilot Whale </span>
</p>
</div>
    
    
![](https://media.fisheries.noaa.gov/styles/original/s3/dam-migration/640x427-long-finned-pilot-whale.png?itok=pFxt6472)

▪️ Long-finned pilot whales are one of two species of pilot whale, along with short-finned pilot whales. In the field and at sea, it is very difficult to distinguish the two species, which differ only slightly in physical size, features, coloration, and pattern.

▪️ Their thick dorsal fin is located about a third of the body length behind the head. As they mature, their dorsal fin becomes broader and rounder.

▪️ This species gets its common name from the pair of long, tapered, sickle-shaped flippers on either side of its body.

https://www.fisheries.noaa.gov/species/long-finned-pilot-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 3px;
              color:#2596be;">
<span style="font-size:18px;">🐳Bryde's Whale </span>
</p>
</div>

![](https://media.fisheries.noaa.gov/styles/original/s3/2021-07/640x427-Whale_Bryde%27s.jpg?itok=wxoXJVQy)
    
    
▪️ They are considered one of the "great whales," or rorquals, which is a group that also includes blue whales and humpback whales. 

▪️ The head of a Bryde's whale makes up about one quarter of its entire body length. The whales have a broad fluke, or tail, and a pointed and strongly hooked dorsal fin located about two-thirds back on the body. 

https://www.fisheries.noaa.gov/species/brydes-whale

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<h1 style="padding: 10px;
              color:#2596be;">
<span style="font-size:25px;"> Data Cleaning </span>
</h1>
</div>

<div style="color:#2596be;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Replace Misspellings </span>
</p>
</div>

In [None]:
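# Fix misspelled species labels. Assigning the result back (instead of inplace=True on
# the df.species attribute) also works under pandas Copy-on-Write.
df['species'] = df['species'].replace({'kiler_whale': 'killer_whale',
                                       'bottlenose_dolpin': 'bottlenose_dolphin'})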

***globis*** and ***beluga*** are both types of whale, so I will add "whale" to their labels. This matters because I will create a new column called class by parsing the species name.


In [None]:
df['species'] = df['species'].replace({'globis': 'globis_whale',
                                       'beluga': 'beluga_whale'})
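To see why the renaming matters, the sketch below applies the same substring rule used later for the class column ("whale" in the name means whale) to a few hypothetical labels, before and after the rename. The `parse_class` helper is hypothetical and only mirrors that rule.

```python
import pandas as pd

# Hypothetical sample of raw labels (not the competition data)
species = pd.Series(['beluga', 'globis', 'humpback_whale', 'bottlenose_dolphin'])

def parse_class(s: pd.Series) -> pd.Series:
    # Hypothetical helper: same rule as the class column, substring match on 'whale'
    return s.map(lambda x: 'whale' if 'whale' in x else 'dolphin')

before = parse_class(species)
after = parse_class(species.replace({'beluga': 'beluga_whale',
                                     'globis': 'globis_whale'}))
print(pd.DataFrame({'label': species, 'before': before, 'after': after}))
```

Without the rename, beluga and globis would silently be parsed as dolphins.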

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Merge Columns </span>
</p>
</div>

According to the competition host in [this discussion](https://www.kaggle.com/c/happy-whale-and-dolphin/discussion/305468), ***pilot_whale*** and ***globis_whale*** both refer to the short-finned pilot whale, so these two labels can be merged into ***short_finned_pilot_whale***.

In [None]:
df['species'] = df['species'].replace({'pilot_whale': 'short_finned_pilot_whale',
                                       'globis_whale': 'short_finned_pilot_whale'})
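A quick sanity check for the merge is to confirm that neither legacy label survives. This is a sketch on hypothetical toy labels; on the notebook data, the same `isin` check can be run on `df['species']`. Note that `Series.replace` with a dict matches whole values by default, so 'short_finned_pilot_whale' is not touched even though it contains 'pilot_whale'.

```python
import pandas as pd

# Hypothetical toy labels standing in for df['species']
species = pd.Series(['pilot_whale', 'globis_whale',
                     'short_finned_pilot_whale', 'blue_whale'])

merged = species.replace({'pilot_whale': 'short_finned_pilot_whale',
                          'globis_whale': 'short_finned_pilot_whale'})

# After merging, neither legacy label should remain
leftover = merged.isin(['pilot_whale', 'globis_whale']).any()
print("Legacy labels left:", leftover)
print(merged.value_counts())
```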

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Whale or Dolphin? </span>
</p>
</div>

As pointed out in this [notebook](https://www.kaggle.com/kwentar/not-every-whale-is-a-whale), some species are named whales but are technically dolphins. This intrigued me enough to research it further, since it changes the class distribution considerably. 

🔴 ***False Killer Whale***. "False killer whales are large members of the dolphin family". https://oceana.org/marine-life/false-killer-whale/

🔴 ***Killer Whale***. "Orcas, or killer whales, are the largest of the dolphins". https://www.nationalgeographic.com/animals/mammals/facts/orca

🔴 ***Pygmy Killer Whale***. "pygmy killer whales don’t actually look anything like killer whales. In fact, like killer whales, they’re not even whales: they’re dolphins." https://uk.whales.org/whales-dolphins/species-guide/pygmy-killer-whale/

🔴 ***Pilot Whales***. "The two species of pilot whales are actually dolphins, not whales" https://oceana.org/marine-life/long-finned-pilot-whale/

🔴 ***Melon Headed Whale***. "Not actually a whale, and with no actual melons. Well, not of the fruit variety anyway" https://uk.whales.org/whales-dolphins/species-guide/melon-headed-whale/


Accordingly, I will change their class from *whale* to *dolphin*.

In [None]:
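# Create a new column for class: 'whale' if the species name contains "whale", else 'dolphin'
df['class'] = df['species'].map(lambda x: 'whale' if 'whale' in x else 'dolphin')

# Species named "whale" that are actually dolphins (see the sources above).
# A boolean mask with .loc replaces the row loop with chained assignment
# (df['class'][i] = ...), which triggers SettingWithCopyWarning and has no
# effect under pandas Copy-on-Write.
dolphins_named_whale = ['false_killer_whale', 'killer_whale',
                        'pygmy_killer_whale', 'long_finned_pilot_whale',
                        'short_finned_pilot_whale', 'melon_headed_whale']
df.loc[df['species'].isin(dolphins_named_whale), 'class'] = 'dolphin'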
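One way to verify the reclassification is a species by class cross-tabulation: every species row should have exactly one non-zero cell. The sketch below runs on a hypothetical toy frame; on the notebook data, `df_toy` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical toy frame standing in for df[['species', 'class']] after cleaning
df_toy = pd.DataFrame({
    'species': ['killer_whale', 'blue_whale', 'melon_headed_whale',
                'bottlenose_dolphin', 'blue_whale'],
    'class':   ['dolphin', 'whale', 'dolphin', 'dolphin', 'whale'],
})

# Rows: species, columns: class, cells: number of images
table = pd.crosstab(df_toy['species'], df_toy['class'])
print(table)

# Every species should fall into exactly one class (one non-zero cell per row)
consistent = bool(table.gt(0).sum(axis=1).eq(1).all())
print("One class per species:", consistent)

# Species that remain in the 'whale' class after reclassification
whale_species = table.index[table['whale'].to_numpy() > 0].tolist()
print("Whale species:", whale_species)
```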

In [None]:
print("Total number of species after cleaning: ", df.species.nunique())

In [None]:
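n_nulls = df.isnull().sum().sum()
n_dupes = df.duplicated().sum()
print("Missing values:", n_nulls)
print("Duplicate rows:", n_dupes)
# Only report the data as clean if both checks actually pass
if n_nulls == 0 and n_dupes == 0:
    print("There are no nulls or duplicates in the data")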

In [None]:
df

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<h1 style="padding: 10px;
              color:#2596be;">
<span style="font-size:25px;"> Exploratory Data Analysis </span>
</h1>
</div>

In [None]:
plot = sns.countplot(x = df['class'], color = '#2596be')
sns.despine()
plot.set_title('Class Distribution\n', font = 'serif', x = 0.1, y=1, fontsize = 16);
plot.set_ylabel("Count", x = 0.02, font = 'serif', fontsize = 12)
plot.set_xlabel("Class", fontsize = 12, font = 'serif')

for p in plot.patches:
    plot.annotate(format(p.get_height(), '.0f'), (p.get_x() + p.get_width() / 2, p.get_height()), 
       ha = 'center', va = 'center', xytext = (0, -20),font = 'serif', textcoords = 'offset points', size = 15)

The plot shows that the two classes are roughly balanced.
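To put a number on "roughly balanced", class shares can be computed with `value_counts(normalize=True)` and summarised as a majority-to-minority ratio. This is a sketch on a hypothetical 48/52 split, not the competition's actual counts; on the notebook data, `df['class']` would be passed instead.

```python
import pandas as pd

# Hypothetical class labels standing in for df['class']
classes = pd.Series(['whale'] * 48 + ['dolphin'] * 52)

shares = classes.value_counts(normalize=True)
print(shares.round(3))

# Majority share divided by minority share; 1.0 means perfectly balanced
imbalance_ratio = shares.max() / shares.min()
print(f"Imbalance ratio: {imbalance_ratio:.2f}")
```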

In [None]:
plt.figure(figsize=(15,7))

# define plot
plot = sns.countplot(y = df.species,
              color = '#2596be',
              order = df['species'].value_counts().index)

# Add titles
plot.set_title('Species Distributions\n\n', font = 'serif', x = 0.1, y=1, fontsize = 16);
plot.set_xlabel("Count", x = 0.02, font = 'serif', fontsize = 12)
plot.set_ylabel("Species", fontsize = 12, font = 'serif')
# Remove unnecessary spines (keep the top one, since the x axis is moved there)
sns.despine(top = False, bottom = True)
# Move X Axis to top
plot.xaxis.set_ticks_position("top")
plot.xaxis.set_label_position('top')


Within the whale and dolphin classes, however, the species distribution is uneven, so I may consider data augmentation for the least frequent species.
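Before augmenting, it helps to know which species fall below some minimum and how many extra images each would need. This is a minimal sketch under stated assumptions: the species names and counts are made up, and `MIN_IMAGES` is a hypothetical target, not a value from this notebook. On the real data, `counts` would be `df['species'].value_counts()`.

```python
import pandas as pd

# Made-up per-species image counts (hypothetical, not the competition data)
counts = pd.Series({'species_a': 900, 'species_b': 700,
                    'species_c': 40, 'species_d': 15})

MIN_IMAGES = 100  # hypothetical target minimum per species; a tuning choice

# Species below the target
rare = counts[counts < MIN_IMAGES]

# Extra augmented images needed to lift each rare species to the target
needed = (MIN_IMAGES - rare).sort_values(ascending=False)
print(needed)
```

A fixed minimum is only one option; another is to weight the loss or the sampler by inverse class frequency instead of generating extra images.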

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Most Frequent Whales </span>
</p>
</div>

In [None]:
whales = df.loc[df['class'] == 'whale', 'species']
plt.figure(figsize=(15,7))

plot = sns.countplot(y = whales,
                   color = 'grey',
                   order = whales.value_counts().index)

sns.despine(bottom = True, left = True, top = False)

plot.set_title('Most Frequent Whales\n\n', size = 20,x = 0.13, y=1, font = 'serif');
plt.xlabel("Count", font = 'serif', size = 12, x = 0.018)
plot.xaxis.set_ticks_position("top")
plot.xaxis.set_label_position('top')
plt.ylabel("Whale Species", font = 'serif', size = 12)

# Bars follow the descending order above, so the first three are the most frequent
for i, bar in enumerate(plot.patches):
    bar.set_color('#2596be' if i < 3 else 'grey')

plt.show()
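
The highlighting above picks out the three most (or least) frequent species. The same selection can be made directly from the counts with pandas' `Series.nlargest` / `Series.nsmallest`. This is a sketch on invented labels, where `keep='all'` makes the treatment of ties at the cut-off explicit: every species tied with the third value is kept rather than one being dropped arbitrarily.

```python
import pandas as pd

# Invented species labels, one entry per image
labels = pd.Series(['a'] * 5 + ['b'] * 4 + ['c'] * 3 + ['d'] * 3 + ['e'] * 1)
counts = labels.value_counts()

# keep='all' returns every species tied with the 3rd-largest / 3rd-smallest count
top = counts.nlargest(3, keep='all')
bottom = counts.nsmallest(3, keep='all')
print(top.index.tolist())
print(bottom.index.tolist())
```

Here `c` and `d` share the third-largest count, so `top` contains four species instead of three.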

In [None]:
plot_n_samples('beluga_whale', 2)

In [None]:
plot_n_samples('humpback_whale', 2)

In [None]:
plot_n_samples('blue_whale', 2)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Least Frequent Whales </span>
</p>
</div>

In [None]:
plt.figure(figsize=(15,7))

plot = sns.countplot(y = whales,
                   color = 'grey',
                   order = whales.value_counts(ascending = True).index)

sns.despine(bottom = True, left = True, top = False)

plot.set_title('Least Frequent Whales\n\n',  x = 0.13, y = 1, size = 20, font = 'serif');
plt.xlabel("Count", font = 'serif', x = 0.018, size = 12)
plot.xaxis.set_ticks_position("top")
plot.xaxis.set_label_position('top')
plt.ylabel("Whale Species", font = 'serif', size = 12)

# Bars follow the ascending order above, so the first three are the rarest
for i, bar in enumerate(plot.patches):
    bar.set_color('#2596be' if i < 3 else 'grey')

plt.show()

In [None]:
plot_n_samples('pygmy_killer_whale', 2)

In [None]:
plot_n_samples('cuviers_beaked_whale', 2)

In [None]:
plot_n_samples('sei_whale', 2)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Most Frequent Dolphins </span>
</p>
</div>

In [None]:
dolphins = df.loc[df['class'] == 'dolphin', 'species']
plt.figure(figsize=(15,7))

plot = sns.countplot(y = dolphins,
                   color = 'grey',
                   order = dolphins.value_counts().index)

sns.despine(bottom = True, left = True, top = False)

plot.set_title('Most Frequent Dolphins\n\n', x = 0.14, y = 1, size = 20, font = 'serif');
plt.xlabel("Count", font = 'serif', x = 0.018, size = 12)
plot.xaxis.set_ticks_position("top")
plot.xaxis.set_label_position('top')
plt.ylabel("Dolphin Species", font = 'serif', size = 12)

# Bars follow the descending order above, so the first three are the most frequent
for i, bar in enumerate(plot.patches):
    bar.set_color('#2596be' if i < 3 else 'grey')

plt.show()

In [None]:
plot_n_samples('bottlenose_dolphin', 2)

In [None]:
plot_n_samples('false_killer_whale', 2)

In [None]:
plot_n_samples('dusky_dolphin', 2)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Least Frequent Dolphins </span>
</p>
</div>

In [None]:
plt.figure(figsize=(15,7))

plot = sns.countplot(y = dolphins,
                   color = 'grey',
                   order = dolphins.value_counts(ascending = True).index)

sns.despine(bottom = True, left = True, top = False)

plot.set_title('Least Frequent Dolphins\n\n', x = 0.14, y =1, size = 20, font = 'serif');
plt.xlabel("Count", font = 'serif', x = 0.013, size = 12)
plot.xaxis.set_ticks_position("top")
plot.xaxis.set_label_position('top')
plt.ylabel("Dolphin Species", font = 'serif', size = 12)

# Bars follow the ascending order above, so the first three are the rarest
for i, bar in enumerate(plot.patches):
    bar.set_color('#2596be' if i < 3 else 'grey')

plt.show()

In [None]:
plot_n_samples('frasiers_dolphin', 2)

In [None]:
plot_n_samples('rough_toothed_dolphin', 2)

In [None]:
plot_n_samples('pygmy_killer_whale', 2)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Most Frequent Whale ID </span>
</p>
</div>

In [None]:
# Function from https://www.kaggle.com/heye0507/happy-whale-eda-baseline
PATH = '../input/happy-whale-and-dolphin'
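def plot_id_whale_samples(row=5, path=PATH, folder='train_images'):
    """Show row x row random images of the whale individual with the most images."""
    path = os.path.join(path, folder)
    n = row * row
    df1 = df.loc[df['class'] == 'whale']
    # Keep only the images of the most frequently photographed individual
    df1 = df1.loc[df1['individual_id'] == df1['individual_id'].value_counts().index[0]]
    # squeeze=False keeps ax two-dimensional, so row=1 also works
    _, ax = plt.subplots(row, row, figsize=(15,10), squeeze=False)
    
    for i in range(n):
        # Sampling is with replacement, so the same image can appear twice
        idx = np.random.randint(len(df1))
        fname = df1.iloc[idx]['image']
        label = df1.iloc[idx]['individual_id']
        # OpenCV reads images as BGR; matplotlib expects RGB
        img = cv2.imread(os.path.join(path, fname))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        ax[i//row][i%row].imshow(img)
        ax[i//row][i%row].set_title(label)
        ax[i//row][i%row].axis('off')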

In [None]:
plot_id_whale_samples(2)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:#2596be;">
<span style="font-size:20px;"> Most Frequent Dolphin ID </span>
</p>
</div>

In [None]:
# Function from https://www.kaggle.com/heye0507/happy-whale-eda-baseline
PATH = '../input/happy-whale-and-dolphin'
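def plot_id_dolphin_samples(row=5, path=PATH, folder='train_images'):
    """Show row x row random images of the dolphin individual with the most images."""
    path = os.path.join(path, folder)
    n = row * row
    df1 = df.loc[df['class'] == 'dolphin']
    # Keep only the images of the most frequently photographed individual
    df1 = df1.loc[df1['individual_id'] == df1['individual_id'].value_counts().index[0]]
    # squeeze=False keeps ax two-dimensional, so row=1 also works
    _, ax = plt.subplots(row, row, figsize=(15,10), squeeze=False)
    
    for i in range(n):
        # Sampling is with replacement, so the same image can appear twice
        idx = np.random.randint(len(df1))
        fname = df1.iloc[idx]['image']
        label = df1.iloc[idx]['individual_id']
        # OpenCV reads images as BGR; matplotlib expects RGB
        img = cv2.imread(os.path.join(path, fname))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        ax[i//row][i%row].imshow(img)
        ax[i//row][i%row].set_title(label)
        ax[i//row][i%row].axis('off')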

In [None]:
plot_id_dolphin_samples(2)
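
`plot_id_whale_samples` and `plot_id_dolphin_samples` differ only in the class they filter on. The sketch below is a class-agnostic helper (the name `most_frequent_individual` is hypothetical) that finds the most-photographed individual for every class in one pass. It is shown on an invented toy frame and is only a possible way to remove the duplicated filtering step, not part of the referenced notebook.

```python
import pandas as pd

def most_frequent_individual(frame, class_col='class', id_col='individual_id'):
    """For each class, return the individual with the most images and that image count."""
    # Number of images for every (class, individual) pair
    sizes = (frame.groupby([class_col, id_col]).size()
                  .rename('n_images')
                  .reset_index())
    # Most-photographed individual first within each class
    sizes = sizes.sort_values([class_col, 'n_images'], ascending=[True, False])
    # Keep only the first row per class
    return sizes.drop_duplicates(subset=class_col).set_index(class_col)

toy = pd.DataFrame({
    'class': ['whale', 'whale', 'whale', 'whale', 'dolphin', 'dolphin', 'dolphin'],
    'individual_id': ['w1', 'w1', 'w1', 'w2', 'd1', 'd2', 'd2'],
})
res = most_frequent_individual(toy)
print(res)
```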

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<h1 style="padding: 10px;
              color:#2596be;">
<span style="font-size:30px;"> Image Transformation </span>

    In Progress...
</h1>
</div>

In [None]:
# Function from https://www.kaggle.com/gpreda/resize-images-of-happy-whales-and-dolphins/notebook
# Relies on the globals NEW_SIZE (height, width) and INTERPOLATION being defined before the call
def create_resized_images(image_path, image_output_path):
    image_list = os.listdir(image_path)
    for image in tqdm(image_list):
        img = cv2.imread(os.path.join(image_path, image))
        # cv2.resize expects dsize as (width, height), so reverse the (height, width) tuple
        img = cv2.resize(img, NEW_SIZE[::-1], interpolation=INTERPOLATION)
        cv2.imwrite(os.path.join(image_output_path, image), img)
        
def plot_image_samples_test(images_folder):
    """Plot the first 16 images of a folder under /kaggle/working/ with their shapes."""
    root_path = "/kaggle/working/"
    folder_path = os.path.join(root_path, images_folder)

    f, ax = plt.subplots(4, 4, figsize=(16,16))
    file_list = list(os.listdir(folder_path))
    for i in range(16):
        file = file_list[i]
        img = cv2.imread(os.path.join(folder_path, file))
        ax[i//4, i%4].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        ax[i//4, i%4].set_title("Image: " + file + f"\nsize: {img.shape}")
        ax[i//4, i%4].axis('off')
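
NumPy arrays store images as (height, width, channels). This is why `NEW_SIZE` is written as (height, width) and reversed before it is passed to `cv2.resize`, whose `dsize` argument is (width, height). The sketch below is a NumPy-only nearest-neighbour resize, a simpler stand-in for the cubic interpolation used above, that takes the target size in (height, width) order. The function name `resize_nearest` is hypothetical. A non-square input also shows that a fixed 128x128 target changes the image's aspect ratio.

```python
import numpy as np

def resize_nearest(img, new_size):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to new_size = (height, width)."""
    new_h, new_w = new_size
    h, w = img.shape[:2]
    # Map each output row/column back to the source pixel it samples (integer arithmetic)
    rows = (np.arange(new_h) * h) // new_h
    cols = (np.arange(new_w) * w) // new_w
    # Broadcast row indices against column indices; any channel axis is kept as-is
    return img[rows[:, None], cols]

img = np.zeros((300, 600, 3), dtype=np.uint8)  # 300 pixels high, 600 wide
small = resize_nearest(img, (128, 128))
print(small.shape)  # (128, 128, 3)
```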

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#2596be;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:white;">
<span style="font-size:15px;"> Resize Images 128x128 </span>
</p>
</div>


In [None]:
# Code from https://www.kaggle.com/gpreda/resize-images-of-happy-whales-and-dolphins/notebook

# os.makedirs('/kaggle/working/train_images_128')
# # os.makedirs('/kaggle/working/test_images_128')

# NEW_SIZE = (128, 128) # (height, width)
# INTERPOLATION = cv2.INTER_CUBIC

# train_image_path = "../input/happy-whale-and-dolphin/train_images"
# train_image_output_path = "/kaggle/working/train_images_128"
# create_resized_images(train_image_path, train_image_output_path)

In [None]:
# plot_image_samples_test('train_images_128/')

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#2596be;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 10px;
              color:white;">
<span style="font-size:25px;"> Data Augmentation </span>
In Progress...
</p>
</div>

Code taken from https://www.kaggle.com/kohjiahng/whaleeda-aug

In [None]:
# def to_tensor(arr):
#     return torch.tensor(arr).permute(2,0,1)
# def to_array(tensor):
#     return np.array(tensor).transpose((1, 2, 0))
# def plot_augmentations(paths, aug_transform, figsize, n_augs=4):
#     aug_transform = transforms.Compose([
#         transforms.RandomResizedCrop((256,256)),
#         aug_transform
#     ])
#     fig, axes = plt.subplots(len(paths), n_augs + 1, figsize=figsize)
#     plt.tight_layout(rect=[0, 0.03, 1, 0.94])
    
#     axes[0][0].set_title('Original', fontsize=30)
#     for i in range(n_augs):
#         axes[0][i+1].set_title(f"Augmentation {i+1}", fontsize=30)
    
#     for row, path in enumerate(paths):
#         img = read_image(path)
#         axes[row][0].imshow(img)
#         axes[row][0].axis('off')

#         for aug in range(1, n_augs+1):
#             img_t = to_tensor(img)
#             axes[row][aug].imshow(to_array(aug_transform(img_t)))
#             axes[row][aug].axis('off')
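
`to_tensor` and `to_array` above only move the channel axis. torchvision transforms work on (channels, height, width) tensors, while `imshow` expects (height, width, channels). This NumPy-only sketch checks that the two permutations, `(2, 0, 1)` and `(1, 2, 0)`, undo each other. The helper names are hypothetical.

```python
import numpy as np

def hwc_to_chw(arr):
    """(height, width, channels) -> (channels, height, width), the layout torchvision uses."""
    return np.transpose(arr, (2, 0, 1))

def chw_to_hwc(arr):
    """(channels, height, width) -> (height, width, channels), the layout imshow expects."""
    return np.transpose(arr, (1, 2, 0))

img = np.arange(4 * 6 * 3).reshape(4, 6, 3)  # toy 4x6 RGB "image"
chw = hwc_to_chw(img)
print(chw.shape)                             # (3, 4, 6)
print(np.array_equal(chw_to_hwc(chw), img))  # True
```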

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#2596be;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:white;">
<span style="font-size:15px;"> Flipping Images </span>
</p>
</div>

In [None]:
# random_paths = df['path'].sample(4, random_state = 0)

# hflip_transform = transforms.RandomHorizontalFlip(p=0.5)
# plot_augmentations(random_paths, hflip_transform, (40,20))

# plt.suptitle('Horizontal Flip with probability 0.5', fontsize=35)
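
`transforms.RandomHorizontalFlip(p=0.5)` mirrors an image left to right with probability `p`. The sketch below reproduces that behaviour on (height, width, channels) NumPy arrays. It is an illustration only, not torchvision's implementation, and the name `random_hflip` is hypothetical.

```python
import numpy as np

def random_hflip(img, p=0.5, rng=None):
    """Flip an (H, W) or (H, W, C) image left-right with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        return img[:, ::-1]  # reverse the width axis
    return img

rng = np.random.default_rng(0)
img = np.arange(6).reshape(1, 3, 2)
flips = sum(random_hflip(img, p=0.5, rng=rng) is not img for _ in range(1000))
print(flips)  # roughly half of the 1000 draws
```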

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#2596be;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 5px;
              color:white;">
<span style="font-size:15px;"> Changing Brightness</span>
</p>
</div>

In [None]:
# brightness_transform = transforms.ColorJitter(brightness=0.5)
# plot_augmentations(random_paths, brightness_transform, (40,20))

# plt.suptitle('Random brightness change with brightness=0.5', fontsize=35)
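
With `transforms.ColorJitter(brightness=0.5)`, torchvision draws a brightness factor uniformly from `[max(0, 1 - brightness), 1 + brightness]` and scales the pixel intensities by it. The sketch below does the same on `uint8` NumPy images, clipping to `[0, 255]`. It is an approximation for illustration, and the function names are hypothetical.

```python
import numpy as np

def adjust_brightness(img, factor):
    """Scale the intensities of a uint8 image by factor, clipping to [0, 255]."""
    out = img.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)

def random_brightness(img, brightness=0.5, rng=None):
    """Draw factor uniformly from [max(0, 1 - brightness), 1 + brightness] and apply it."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(max(0.0, 1 - brightness), 1 + brightness)
    return adjust_brightness(img, factor)

pixels = np.array([[100, 200]], dtype=np.uint8)
print(adjust_brightness(pixels, 1.5))  # [[150 255]] (200 * 1.5 is clipped to 255)
```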

In [None]:
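# NOTE: pov_transform and rot_transform are not defined in this notebook;
# define them (or remove them from the list) before uncommenting this cell.
# all_aug = transforms.Compose([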
#     hflip_transform,
#     brightness_transform,
#     pov_transform,
#     rot_transform
# ])
# plot_augmentations(random_paths, all_aug, (40,20))
# plt.suptitle('All Augmentations combined', fontsize=35)
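
`transforms.Compose` applies its transforms in list order, and each one receives the previous transform's output, so the order of the augmentations is part of the pipeline. This minimal pure-Python sketch shows that chaining. The name `compose` is hypothetical and it is not torchvision's class.

```python
def compose(*transforms):
    """Chain callables left to right, each receiving the previous one's output."""
    def apply(x):
        for t in transforms:
            x = t(x)
        return x
    return apply

add_one = lambda x: x + 1
double = lambda x: x * 2

print(compose(add_one, double)(3))  # (3 + 1) * 2 = 8
print(compose(double, add_one)(3))  # 3 * 2 + 1 = 7
```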

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#2596be;
           font-size:110%;
           font-family:Serif;
           letter-spacing:0.5px">

<p style="padding: 10px;
              color:white;">
<span style="font-size:25px;"> Modeling </span>
In Progress...
</p>
</div>