# Data Analytics Capstone Project
### Beth Harvey
### October 22, 2023

# Introduction

As climate change, habitat loss, and disease continue to threaten wildlife throughout the world, detailed research is becoming increasingly important. In 2023 alone, the U.S. Fish and Wildlife Service removed 21 species from the Endangered Species Act list because they are now considered extinct [1]. Ten of those 21 species are birds. More research and resources are needed to develop effective conservation efforts.

Two key components of bird conservation research are accurately classifying individual specimens and identifying potential connections between different species. One of the most valuable resources for an effort like this is an extensive collection of detailed measurements for a wide variety of species.

## Project Goals

The end goal of this project is to build a model that predicts the family and/or species of a bird from several physical measurements. This could help researchers classify individuals more accurately when two or more species are possible. The model will be trained and tested using the AVONET dataset [3], a collective effort by researchers and volunteers to provide measurements of over 90,000 specimens of over 11,000 bird species.

## Data

The data used for this project come from the AVONET dataset, a collection of physical measurements, habitat information, and geographic information for over 90,000 specimens representing over 11,000 species of birds [3]. The majority of individuals measured were museum specimens, though many live birds were included as well. The raw data can be downloaded as Microsoft Excel files from https://figshare.com/s/b990722d72a26b5bfead [2]. The physical measurements are described in the Metadata sheet of Supplementary dataset 1 from the link above, and they are also shown in the figure below [3].

![Illustration of bird showing measurements used](measurements-fig.jpg)

The measurements are:

   1. Beak length from tip to base of skull
   2. Beak length from front edge of nostril to tip
   3. Beak width from front edges of nostrils
   4. Beak depth from front edges of nostrils
   5. Lower leg (tarsus) length
   6. Wing length from bend to end of longest feather 
   7. Wing length from bend to end of outer secondary (shorter) feather
   8. Length from tip of outer secondary (shorter) feather to tip of longest feather
   9. Tail length [3]


The remaining measurements are the mass of the bird and the "hand-wing index," which is calculated from measurements 6 and 8. Information about each bird's habitat, migration pattern(s), eating habits, conservation status, and geographic location is also included [2]. The physical measurements, age, sex, and country all come from the AVONET Raw Data sheet of Supplementary dataset 1, and the geographic and behavior information comes from the AVONET3 BirdTree sheet of the same file [2]. The physical measurements shown in the figure above, along with mass and hand-wing index, are the key features used for this model.
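As a minimal sketch of how the hand-wing index relates to measurements 6 and 8, the snippet below uses the commonly stated definition, 100 × Kipp's distance / wing length. It assumes that measurement 6 corresponds to the `Wing.Length` column, measurement 7 to `Secondary1`, and measurement 8 to `Kipps.Distance`; the helper name `hand_wing_index` and the numbers are made up for illustration and are not taken from AVONET.

```python
import pandas as pd


def hand_wing_index(wing_length, kipps_distance):
    """Return 100 * Kipp's distance / wing length (hypothetical helper)."""
    return 100 * kipps_distance / wing_length


# Made-up measurements in millimetres, for illustration only
demo = pd.DataFrame({
    "Wing.Length": [50.0, 80.0, 120.0],  # assumed to be measurement 6
    "Secondary1": [40.0, 60.0, 60.0],    # assumed to be measurement 7
})

# Measurement 8: distance from the tip of the first secondary to the wing tip
demo["Kipps.Distance"] = demo["Wing.Length"] - demo["Secondary1"]

demo["Hand-wing.Index"] = hand_wing_index(demo["Wing.Length"], demo["Kipps.Distance"])
print(demo)
```

A larger index means a longer, more pointed wing relative to its overall length.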

In [1]:
# Import modules
import pandas as pd

In [16]:
# Import data
avonet_raw = pd.read_csv('avonet_raw.csv')
birdtree = pd.read_csv('avonet3_birdtree.csv')

# Data Inspection and Cleaning

### Merging Datasets

In [5]:
# Get desired additional features from BirdTree file
birdtree_features = birdtree[['Species3', 'Family3', 'Order3', 'Habitat', 'Habitat.Density', 'Migration', 
                              'Trophic.Level', 'Trophic.Niche','Primary.Lifestyle']]

# Rename columns: the species column must match the avonet_raw merge key,
# and the family and order columns get shorter names
birdtree_features = birdtree_features.rename(columns={'Species3': 'Species3_BirdTree', 'Family3': 'Family',
                                                      'Order3': 'Order'})

In [12]:
# Merge dataframes on the BirdTree species name (left join keeps every specimen row)
avonet = pd.merge(avonet_raw, birdtree_features, how='left', on='Species3_BirdTree')
avonet.head()

Unnamed: 0,Avibase.ID,Species1_BirdLife,Species2_eBird,eBird.species.group,Species3_BirdTree,Data.type,Source,Specimen.number,Sex,Age,...,Protocol,Publication,Family,Order,Habitat,Habitat.Density,Migration,Trophic.Level,Trophic.Niche,Primary.Lifestyle
0,AVIBASE-B3F5E5E2,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,1.0,NHMUK,11.4.1890,F,0.0,...,1.0,,Trochilidae,Apodiformes,Forest,2.0,1.0,Herbivore,Nectarivore,Aerial
1,AVIBASE-B3F5E5E2,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,1.0,NHMUK,12.23.1890,F,0.0,...,1.0,,Trochilidae,Apodiformes,Forest,2.0,1.0,Herbivore,Nectarivore,Aerial
2,AVIBASE-B3F5E5E2,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,1.0,NHMUK,1913.3.20.972,M,0.0,...,1.0,,Trochilidae,Apodiformes,Forest,2.0,1.0,Herbivore,Nectarivore,Aerial
3,AVIBASE-B3F5E5E2,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,1.0,NHMUK,87.3.22.1574,M,0.0,...,1.0,,Trochilidae,Apodiformes,Forest,2.0,1.0,Herbivore,Nectarivore,Aerial
4,AVIBASE-B3F5E5E2,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,Abeillia abeillei,1.0,NHMUK,,M,0.0,...,0.0,,Trochilidae,Apodiformes,Forest,2.0,1.0,Herbivore,Nectarivore,Aerial
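A left merge like the one above silently leaves the BirdTree columns empty for any specimen whose species name has no match. The sketch below shows one way to check for that, using the `validate` and `indicator` parameters of `pd.merge`. The two small dataframes are illustrative stand-ins for `avonet_raw` and `birdtree_features`, and the measurement values and the unmatched species name are made up.

```python
import pandas as pd

# Tiny stand-ins for avonet_raw and birdtree_features (values are illustrative)
raw_demo = pd.DataFrame({
    "Species3_BirdTree": ["Abeillia abeillei", "Abeillia abeillei",
                          "Hypothetical unmatched species", None],
    "Wing.Length": [50.1, 49.8, 72.0, 65.3],
})
features_demo = pd.DataFrame({
    "Species3_BirdTree": ["Abeillia abeillei", "Zosterops chloris"],
    "Family": ["Trochilidae", "Zosteropidae"],
})

merged_demo = pd.merge(
    raw_demo,
    features_demo,
    how="left",
    on="Species3_BirdTree",
    validate="many_to_one",  # raises MergeError if a species repeats on the right
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Rows that found no match in the feature table
unmatched = merged_demo[merged_demo["_merge"] == "left_only"]
print(len(merged_demo), "rows,", len(unmatched), "without BirdTree features")
```

Applied to the real data, the same check would report how many specimens end up without family and order labels, which matters if those columns become the prediction target.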


In [13]:
# View the shape and column names of the dataset
# (in a notebook cell only the last expression, avonet.columns, is displayed)
avonet.shape
avonet.columns

Index(['Avibase.ID', 'Species1_BirdLife', 'Species2_eBird',
       'eBird.species.group', 'Species3_BirdTree', 'Data.type', 'Source',
       'Specimen.number', 'Sex', 'Age', 'Locality', 'Country_WRI', 'Country',
       'Beak.Length_Culmen', 'Beak.Length_Nares', 'Beak.Width', 'Beak.Depth',
       'Tarsus.Length', 'Wing.Length', 'Kipps.Distance', 'Secondary1',
       'Hand-wing.Index', 'Tail.Length', 'Measurer', 'Protocol', 'Publication',
       'Family', 'Order', 'Habitat', 'Habitat.Density', 'Migration',
       'Trophic.Level', 'Trophic.Niche', 'Primary.Lifestyle'],
      dtype='object')

In [14]:
avonet.describe(include='all')

Unnamed: 0,Avibase.ID,Species1_BirdLife,Species2_eBird,eBird.species.group,Species3_BirdTree,Data.type,Source,Specimen.number,Sex,Age,...,Protocol,Publication,Family,Order,Habitat,Habitat.Density,Migration,Trophic.Level,Trophic.Niche,Primary.Lifestyle
count,90370,90033,90005,90189,90303,90020.0,90013,73793,90020,90020.0,...,90020.0,3309,90303,90303,90140,90266.0,90260.0,90303,90298,90303
unique,11237,11020,10522,11228,9988,,80,71860,3,,...,,3,194,40,12,,,4,10,5
top,AVIBASE-F9D288FC,Zosterops chloris,Zosterops flavissimus,Zosterops flavissimus,Zosterops chloris,,NHMUK,NOLABEL,M,,...,,Cardona-Salazar et al. 2020 https://esajournal...,Tyrannidae,Passeriformes,Forest,,,Carnivore,Invertivore,Insessorial
freq,565,784,565,565,784,,46060,102,45523,,...,,3305,4760,63634,54298,,,49180,46260,57801
mean,,,,,,1.157487,,,,0.014097,...,0.570473,,,,,1.690637,1.290106,,,
std,,,,,,0.364262,,,,0.117891,...,0.495011,,,,,0.750236,0.632359,,,
min,,,,,,1.0,,,,0.0,...,0.0,,,,,1.0,1.0,,,
25%,,,,,,1.0,,,,0.0,...,0.0,,,,,1.0,1.0,,,
50%,,,,,,1.0,,,,0.0,...,1.0,,,,,2.0,1.0,,,
75%,,,,,,1.0,,,,0.0,...,1.0,,,,,2.0,1.0,,,


In [15]:
avonet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90371 entries, 0 to 90370
Data columns (total 34 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Avibase.ID           90370 non-null  object 
 1   Species1_BirdLife    90033 non-null  object 
 2   Species2_eBird       90005 non-null  object 
 3   eBird.species.group  90189 non-null  object 
 4   Species3_BirdTree    90303 non-null  object 
 5   Data.type            90020 non-null  float64
 6   Source               90013 non-null  object 
 7   Specimen.number      73793 non-null  object 
 8   Sex                  90020 non-null  object 
 9   Age                  90020 non-null  float64
 10  Locality             0 non-null      float64
 11  Country_WRI          77245 non-null  object 
 12  Country              0 non-null      float64
 13  Beak.Length_Culmen   74222 non-null  float64
 14  Beak.Length_Nares    55738 non-null  float64
 15  Beak.Width           70324 non-null 
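
The `info()` output above shows that `Locality` and `Country` contain no values at all and that several measurement columns are only partly filled. A minimal sketch of how this could be summarized and the empty columns dropped is given below. The dataframe is a small made-up stand-in for `avonet`, and whether to drop or impute the remaining gaps is left open.

```python
import numpy as np
import pandas as pd

# Small made-up stand-in for the merged avonet dataframe
demo = pd.DataFrame({
    "Locality": [np.nan, np.nan, np.nan, np.nan],
    "Beak.Length_Culmen": [10.2, np.nan, 11.0, 9.8],
    "Beak.Length_Nares": [np.nan, np.nan, 6.1, 5.9],
    "Wing.Length": [50.1, 49.8, 51.0, 50.5],
})

# Drop columns that contain no values at all
demo_clean = demo.dropna(axis="columns", how="all")

# Share of missing values per remaining column, highest first
missing_share = demo_clean.isna().mean().sort_values(ascending=False)
print(missing_share)
```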

### References

1. Schuldheisz, C.: Fish and wildlife service delists 21 species from the endangered species act due to extinction, https://www.fws.gov/press-release/2023-10/21-species-delisted-endangered-species-act-due-extinction
2. Tobias: "AVONET: morphological, ecological and geographical data for all birds". https://figshare.com/s/b990722d72a26b5bfead (2022), accessed on October 18, 2023
3. Tobias, J.A., Sheard, C., Pigot, A.L., Devenish, A.J.M., Yang, J., Sayol, F., Neate-Clegg, M.H.C., Alioravainen, N., Weeks, T.L., Barber, R.A., Walkden, P.A., MacGregor, H.E.A., Jones, S.E.I., Vincent, C., Phillips, A.G., Marples, N.M., Montaño-Centellas, F.A., Leandro-Silva, V., Claramunt, S., Darski, B., Freeman, B.G., Bregman, T.P., Cooney, C.R., Hughes, E.C., Capp, E.J.R., Varley, Z.K., Friedman, N.R., Korntheuer, H., Corrales-Vargas, A., Trisos, C.H., Weeks, B.C., Hanz, D.M., Töpfer, T., Bravo, G.A., Remeš, V., Nowak, L., Carneiro, L.S., Moncada R., A.J., Matysioková, B., Baldassarre, D.T., Martínez-Salinas, A., Wolfe, J.D., Chapman, P.M., Daly, B.G., Sorensen, M.C., Neu, A., Ford, M.A., Mayhew, R.J., Fabio Silveira, L., Kelly, D.J., Annorbah, N.N.D., Pollock, H.S., Grabowska-Zhang, A.M., McEntee, J.P., Carlos T. Gonzalez, J., Meneses, C.G., Muñoz, M.C., Powell, L.L., Jamie, G.A., Matthews, T.J., Johnson, O., Brito, G.R.R., Zyskowski, K., Crates, R., Harvey, M.G., Jurado Zevallos, M., Hosner, P.A., Bradfer-Lawrence, T., Maley, J.M., Stiles, F.G., Lima, H.S., Provost, K.L., Chibesa, M., Mashao, M., Howard, J.T., Mlamba, E., Chua, M.A.H., Li, B., Gómez, M.I., García, N.C., Päckert, M., Fuchs, J., Ali, J.R., Derryberry, E.P., Carlson, M.L., Urriza, R.C., Brzeski, K.E., Prawiradilaga, D.M., Rayner, M.J., Miller, E.T., Bowie, R.C.K., Lafontaine, R.M., Scofield, R.P., Lou, Y., Somarathna, L., Lepage, D., Illif, M., Neuschulz, E.L., Templin, M., Dehling, D.M., Cooper, J.C., Pauwels, O.S.G., Analuddin, K., Fjeldså, J., Seddon, N., Sweet, P.R., DeClerck, F.A.J., Naka, L.N., Brawn, J.D., Aleixo, A., Böhning-Gaese, K., Rahbek, C., Fritz, S.A., Thomas, G.H., Schleuning, M.: Avonet: morphological, ecological and geographical data for all birds. Ecology Letters 25(3), 581–597 (2022). https://doi.org/10.1111/ele.13898, https://onlinelibrary.wiley.com/doi/abs/10.1111/ele.13898

In [None]:
# Display system data
from watermark import watermark
print(watermark())