# BirdVision

## Dataset outline

Bird species dataset from Kaggle: https://www.kaggle.com/datasets/gpiosenka/100-bird-species/data?select=birds.csv

Key points about the dataset:

- Images are 224 x 224 x 3 (RGB) in JPG format.
- The dataset includes a train set, a validation set, and a test set.
- Each set contains 525 subdirectories, one for each bird species.
- The `filepaths` column contains the relative path to an image file.
- The `labels` column contains the bird species class name associated with the image file.
- The `scientific name` column contains the Latin scientific name for the species.
- The `data set` column denotes which set (`train`, `test`, or `valid`) the file path belongs to.
- The `class id` column contains the class index associated with the image file's class.
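The directory layout described above can be checked with a short script. The following is a minimal sketch, assuming the archive has been extracted into a folder that contains `train`, `valid`, and `test` subfolders; `count_class_dirs` and its `root` argument are hypothetical names introduced here for illustration.

```python
from pathlib import Path


def count_class_dirs(root):
    """Return {split: number of class subdirectories} for each split folder found under root."""
    root = Path(root)
    counts = {}
    # Split folder names are assumed to match the values in the `data set` column
    for split in ("train", "valid", "test"):
        split_dir = root / split
        if split_dir.is_dir():
            # Count only directories; stray files in the split folder are ignored
            counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts
```

If the layout matches the description, each split should report 525 subdirectories.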

## `birds.csv` exploration

The dataset also includes a file named `birds.csv`, which contains five columns: `class id`, `filepaths`, `labels`, `data set`, and `scientific name`.

1. Inspect the dataset 
2. Create a new dataframe with the distinct bird species

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv('birds.csv')
df.head(5)

| | class id | filepaths | labels | data set | scientific name |
|---|---|---|---|---|---|
| 0 | 0.0 | train/ABBOTTS BABBLER/001.jpg | ABBOTTS BABBLER | train | MALACOCINCLA ABBOTTI |
| 1 | 0.0 | train/ABBOTTS BABBLER/007.jpg | ABBOTTS BABBLER | train | MALACOCINCLA ABBOTTI |
| 2 | 0.0 | train/ABBOTTS BABBLER/008.jpg | ABBOTTS BABBLER | train | MALACOCINCLA ABBOTTI |
| 3 | 0.0 | train/ABBOTTS BABBLER/009.jpg | ABBOTTS BABBLER | train | MALACOCINCLA ABBOTTI |
| 4 | 0.0 | train/ABBOTTS BABBLER/002.jpg | ABBOTTS BABBLER | train | MALACOCINCLA ABBOTTI |


In [3]:
df['data set'].value_counts()

train    84635
test      2625
valid     2625
Name: data set, dtype: int64
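The counts above imply 2625 / 525 = 5 images per class in `test` and `valid`, provided those splits are balanced. A minimal sketch of how that could be verified follows; `images_per_class` is a hypothetical helper, and a toy frame with made-up values stands in for `birds.csv`.

```python
import pandas as pd


def images_per_class(frame):
    """Min and max number of rows per label, for each value of `data set`."""
    counts = frame.groupby(["data set", "labels"]).size()
    return counts.groupby(level="data set").agg(["min", "max"])


# Toy stand-in for birds.csv (illustrative values only)
toy = pd.DataFrame({
    "data set": ["train", "train", "train", "test", "test"],
    "labels": ["A", "A", "B", "A", "B"],
})
summary = images_per_class(toy)
print(summary)
```

On the real dataframe, `images_per_class(df)` would show `min` and `max` both equal to 5 for `test` and `valid` if every class has the same number of images in those splits.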

In [4]:
# split the dataframe by the `data set` column
train = df[df['data set'] == 'train']
val = df[df['data set'] == 'valid']
test = df[df['data set'] == 'test']

# unique species in train set
print("Unique species in train set:", len(train['scientific name'].unique()))

# unique species in validate set
print("Unique species in validate set:", len(val['scientific name'].unique()))

# unique species in test set
print("Unique species in test set:", len(test['scientific name'].unique()))

Unique species in train set: 522
Unique species in validate set: 522
Unique species in test set: 522
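Each split reports 522 unique scientific names, while the dataset is described as having 525 species subdirectories. One possible explanation, which is an assumption and not confirmed here, is that some scientific names are shared by more than one `labels` value or are missing. A minimal sketch for checking the first case follows; `shared_scientific_names` is a hypothetical helper and the toy frame is illustrative only.

```python
import pandas as pd


def shared_scientific_names(frame):
    """Scientific names that map to more than one distinct label."""
    labels_per_name = (
        frame.groupby("scientific name", dropna=False)["labels"].nunique()
    )
    return labels_per_name[labels_per_name > 1]


# Toy stand-in for birds.csv: labels A and B share the scientific name X
toy = pd.DataFrame({
    "labels": ["A", "B", "C", "C"],
    "scientific name": ["X", "X", "Y", "Y"],
})
shared = shared_scientific_names(toy)
print(shared)
```

On the real data, comparing `df['labels'].nunique()` with `df['scientific name'].nunique()` would indicate whether repeated names account for the gap.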


In [5]:
species_array = test['scientific name'].unique()
species_array

array(['MALACOCINCLA ABBOTTI', 'PAPASULA ABBOTTI', 'BUCORVUS ABYSSINICUS',
       'BALEARICA REGULORUM', 'CHRYSOCOCCYX CUPREUS',
       'LAGONOSTICTA RUBRICATA', 'HAEMATOPUS MOQUINI', 'TOCKUS FASCIATUS',
       'NETTAPUS AURITUS', 'DIOMEDEIDAE', 'PIPILO ABERTI',
       'PSITTACULA EUPATRIA', 'PYRRHOCORAX GRACULUS',
       'GEOTHLYPIS FLAVOVELATA', 'BOTAURUS LENTIGINOSUS',
       'FULICA AMERICANA', 'PHOENICOPTERUS RUBER', 'SPINUS TRISTIS',
       'FALCO SPARVERIUS', 'ANTHUS RUBESCENS', 'SETOPHAGA RUTICILLA',
       'TURDUS MIGRATORIUS', 'MARECA AMERICANA', 'CALLIPHLOX AMETHYSTINA',
       'CHLOEPHAGA MELANOPTERA', 'VANELLUS RESPLENDENS',
       'SPINUS SPINESCENS', 'ANHINGA ANHINGA', 'MAGUMMA PARVA',
       'CALYPTE ANNA', 'THAMNOPHILIDAE', 'EUPHONIA MUSICA',
       'HIMATIONE SANGUINEA', 'STRUTHIDEA CINEREA',
       'ANTILOPHIA BOKERMANNI', 'OCEANODROMA HOMOCHROA',
       'GEOKICHLA CINEREA', 'NIPPONIA NIPPON', 'EURYSTOMUS ORIENTALIS',
       'MEROPS ORIENTALIS', 'ANASTOMUS OSCITANS',

In [6]:
species = pd.DataFrame(species_array, columns=['species_name'])
species.tail()

| | species_name |
|---|---|
| 517 | CALIDRIS ALPINA |
| 518 | CHIONIS ALBUS |
| 519 | SARKIDIORNIS MELANOTOS |
| 520 | ORTALIS CINEREICEPS |
| 521 | NOTHARCHUS PECTORALIS |


In [7]:
# export the `species` dataframe to csv
species.to_csv('species_list.csv', index=False)

In [8]:
# open `species_list.csv` to check
bird_list = pd.read_csv('species_list.csv')
bird_list['species_name'].duplicated().sum() # no duplicates

0

### Species in the dataframe that are frequently seen in, migrate through, or are occasionally spotted in parts of Asia

- Merops orientalis (Little Green Bee-eater)
- Hirundo rustica (Barn Swallow)
- Sturnus vulgaris (Common Starling)
- Pica pica (Eurasian Magpie)
- Passer domesticus (House Sparrow)
- Columba livia (Rock Dove)
- Anas platyrhynchos (Mallard)
- Acridotheres tristis (Common Myna)
- Parus major (Great Tit)
- Aquila chrysaetos (Golden Eagle)
- Anas crecca (Common Teal)
- Falco peregrinus (Peregrine Falcon)
- Calidris alpina (Dunlin)

13 species in total
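The names in this list are written in mixed case, whereas `species_name` in `species_list.csv` is upper case, so a lookup needs to normalise case first. A minimal sketch follows, assuming a dataframe shaped like `bird_list`; `asia_species` and `find_listed_species` are hypothetical names, and toy data stands in for the CSV.

```python
import pandas as pd

# A few names from the list above, used for illustration
asia_species = ["Merops orientalis", "Calidris alpina", "Pica pica"]


def find_listed_species(bird_list, names):
    """Rows of bird_list whose species_name matches any of names, ignoring case."""
    wanted = {name.strip().upper() for name in names}
    mask = bird_list["species_name"].str.strip().str.upper().isin(wanted)
    return bird_list[mask]


# Toy stand-in for species_list.csv
toy_bird_list = pd.DataFrame(
    {"species_name": ["MEROPS ORIENTALIS", "CALIDRIS ALPINA", "CHIONIS ALBUS"]}
)
matches = find_listed_species(toy_bird_list, asia_species)
print(matches["species_name"].tolist())
```

Any name in the list that returns no row would point to a spelling difference or a species that is absent from the dataframe.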

> TODO: get the geographic distribution of each bird.

## Calling the eBird API

In [9]:
# Practise calling a public API for information on the species
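
As a starting point, the sketch below builds a request for recent observations in a region using only the standard library. It assumes the eBird API 2.0 `data/obs/{regionCode}/recent` endpoint and the `X-eBirdApiToken` header; the helper names, the `max_results` default, and the `"SG"` region code in the usage note are illustrative choices and not part of the original notebook. A personal API key from eBird is required, and the official API documentation should be checked before relying on any of this.

```python
import json
import urllib.parse
import urllib.request

EBIRD_BASE_URL = "https://api.ebird.org/v2"


def build_recent_obs_request(region_code, api_key, max_results=10):
    """Build (but do not send) a request for recent observations in a region."""
    query = urllib.parse.urlencode({"maxResults": max_results})
    url = f"{EBIRD_BASE_URL}/data/obs/{region_code}/recent?{query}"
    # The API key is passed as a request header (assumed header name)
    return urllib.request.Request(url, headers={"X-eBirdApiToken": api_key})


def fetch_recent_observations(region_code, api_key, max_results=10):
    """Send the request and return the decoded JSON response."""
    request = build_recent_obs_request(region_code, api_key, max_results)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With a valid key, `fetch_recent_observations("SG", api_key)` would request recent sightings for the given region. Keeping request construction separate from the network call makes the URL and headers easy to inspect without contacting the service.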