Introduction:
Birdwatching, and associated activities such as identifying birds, has been a common hobby for centuries. In February of this year alone, the world record for the most bird species seen by one person was broken, with a total of 10 000 species. A new bird species, Cinnyris infrenatus, was described as recently as 2022, and there is no guarantee that humans have identified every bird species. If non-experts come across a bird that happens to be an unknown species, the only evidence may be a picture. From such a picture we would likely be able to estimate dimensions such as the bird's wingspan, beak size and more.
Thus, our question is:
Based on an unknown bird's dimensions, which family does the bird belong to?
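One possible way to frame this question is as a classification problem, for example with k-nearest neighbours (k-NN). The proposal does not yet commit to a method, so this is only an assumption. The sketch below implements k-NN in base R. It uses invented training data, hypothetical family labels ("FamilyA" and so on), and assumed column names (`Beak.Length`, `Wing.Length`, `Mass`). The helper `predict_family_knn` is also hypothetical. The sketch standardizes each measurement and then lets the k closest species vote on the family.

```r
# Hypothetical training data: one row per species with averaged
# measurements. Family labels and numbers are invented for illustration;
# column names are assumptions, not AVONET's exact headers.
train <- data.frame(
  Family1     = c("FamilyA", "FamilyA", "FamilyB", "FamilyB", "FamilyC", "FamilyC"),
  Beak.Length = c(10, 12, 40, 44, 25, 27),
  Wing.Length = c(60, 65, 300, 320, 150, 160),
  Mass        = c(20, 25, 900, 1000, 200, 220)
)

# Hypothetical helper: predict the family of one new bird by majority vote
# among its k nearest training species (Euclidean distance on
# standardized predictors).
predict_family_knn <- function(train, new_bird, predictors, k = 3) {
  # Standardize with the training means and standard deviations so that
  # large-valued measurements (e.g. mass) do not dominate the distance
  x <- train[, predictors, drop = FALSE]
  centers <- colMeans(x)
  scales  <- apply(x, 2, sd)
  train_scaled <- scale(x, center = centers, scale = scales)
  new_scaled   <- scale(new_bird[, predictors, drop = FALSE],
                        center = centers, scale = scales)

  # Euclidean distance from the new bird to every training species
  dists <- sqrt(rowSums(sweep(train_scaled, 2, new_scaled[1, ])^2))

  # Majority vote among the k closest species (ties go to the
  # alphabetically first family, because table() sorts its names)
  nearest <- train$Family1[order(dists)[seq_len(k)]]
  votes <- table(nearest)
  names(votes)[which.max(votes)]
}

new_bird <- data.frame(Beak.Length = 11, Wing.Length = 62, Mass = 22)
predicted <- predict_family_knn(train, new_bird,
                                c("Beak.Length", "Wing.Length", "Mass"), k = 3)
print(predicted)
```

The notebook already loads tidymodels, where the same idea would typically be expressed with `nearest_neighbor()` (engine `"kknn"`) and a recipe that centers and scales the predictors. The base-R version simply makes each step visible.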

We will be using the AVONET database, compiled by Tobias et al. (2022) and distributed as an Excel workbook. We will primarily focus on the second sheet of the workbook, "AVONET1_BirdLife". This dataset describes more than 90 000 individual birds from over 11 000 species. The individuals are grouped into species, so each observation (row) is one species. Each row includes the species' order and family, as well as the average beak length, width and depth, wing length, tarsus length, Kipp's distance, tail length and mass across all recorded individuals of that species. There is also location information, in the form of the average coordinates at which the birds were found, as well as the habitat they occupy.
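AVONET already supplies the species-level averages, so we will not need to compute them ourselves. The sketch below is only a hedged illustration of what one species-level row represents. It averages invented individual measurements within each species using base R's `aggregate()`. The column names only mimic the dataset's style and are assumptions.

```r
# Invented individual-level measurements (not real AVONET data);
# column names are assumptions
individuals <- data.frame(
  Species1    = c("Species X", "Species X", "Species X", "Species Y", "Species Y"),
  Family1     = c("FamilyA", "FamilyA", "FamilyA", "FamilyB", "FamilyB"),
  Beak.Length = c(10, 11, 12, 30, 34),
  Mass        = c(20, 22, 24, 300, 340)
)

# Average each measurement within a species, keeping its family label,
# so that each species collapses to a single row
species_means <- aggregate(cbind(Beak.Length, Mass) ~ Species1 + Family1,
                           data = individuals, FUN = mean)
print(species_means)
```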

Tobias, J. A., Sheard, C., Pigot, A. L., Devenish, A. J., Yang, J., Sayol, F., Neate‐Clegg, M. H., Alioravainen, N., Weeks, T. L., Barber, R. A., Walkden, P. A., MacGregor, H. E., Jones, S. E., Vincent, C., Phillips, A. G., Marples, N. M., Montaño‐Centellas, F. A., Leandro‐Silva, V., Claramunt, S., … Schleuning, M. (2022). AVONET: Morphological, ecological and geographical data for all birds. Ecology Letters, 25(3), 571–707. https://doi.org/10.1111/ele.13898

Methods:

Expected outcomes and significance:
We expect to find a strong relationship between bird dimensions and family, even across different species. If the raw dimensions are not strongly associated with family, we expect that ratios of these dimensions (for example, beak length relative to wing length) will show a stronger relationship among species of the same family. We would then be able to predict the family of an unknown bird based only on its dimensions.
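To illustrate why ratios might help, the sketch below uses invented data for two hypothetical families. Within each family, species differ in overall size but share the same beak-to-wing proportion. The raw beak lengths of the two families overlap, while the ratio separates them cleanly. The column names and the helper `families_overlap` are assumptions made for this example, and the helper is written for exactly two families.

```r
# Invented data: within each hypothetical family, species differ in size
# but keep the same beak-to-wing proportion. Column names are assumptions.
birds <- data.frame(
  Family1     = rep(c("FamilyA", "FamilyB"), each = 3),
  Beak.Length = c(10, 20, 30, 25, 50, 75),
  Wing.Length = c(50, 100, 150, 50, 100, 150)
)
birds$beak_wing_ratio <- birds$Beak.Length / birds$Wing.Length

# Hypothetical helper: TRUE if the value ranges of the two families overlap,
# i.e. the variable alone cannot cleanly separate them
families_overlap <- function(x, group) {
  lo <- tapply(x, group, min)
  hi <- tapply(x, group, max)
  # Two intervals overlap when the smaller maximum is at least the larger minimum
  min(hi) >= max(lo)
}

print(families_overlap(birds$Beak.Length, birds$Family1))      # TRUE
print(families_overlap(birds$beak_wing_ratio, birds$Family1))  # FALSE
```

With real data the ratios will not be perfectly constant within a family. The example only shows how a size-independent ratio can separate groups that raw measurements cannot.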

If non-experts come across a new bird species, it could be vital to identify the bird's close biological relatives (i.e., the bird's family on the tree of life). Scientists could then infer important biological traits without being present to study the bird. As mentioned above, the only evidence is likely to be an image, so more specific behavioural data may not be available.

If our hypothesis is supported, it could raise questions about how or why certain species develop similar features despite living on opposite sides of the world.

In [None]:
library(repr)
library(tidyverse)
library(tidymodels)
library(readxl)
library(openxlsx)
options(repr.matrix.max.rows = 10)

In [None]:
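# Download the workbook to a temporary file, then read it locally.
# A GitHub "blob" URL returns an HTML page; the "raw" URL returns the file itself.
url1 <- 'https://github.com/danizenarosa/dsci-100-grp43/raw/main/AVONET%20Supplementary%20dataset%201.xlsx'
p1f <- tempfile(fileext = ".xlsx")
download.file(url1, p1f, mode = "wb")
p1 <- read_excel(path = p1f, sheet = 2)  # sheet 2 is "AVONET1_BirdLife"
p1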

In [None]:
#url <- "https://github.com/danizenarosa/dsci-100-grp43/blob/main/AVONET%20Supplementary%20dataset%201.xlsx"
#read_excel(url, sheet = "AVONET1_BirdLife")

#bird_data <- read_excel("data/birds.xlsx", sheet = "AVONET1_BirdLife")
#bird_data

In [None]:
library(httr)
# Download the workbook to a temporary .xlsx file, then read its second sheet
url <- "https://github.com/danizenarosa/dsci-100-grp43/raw/main/data/birds.xlsx"
tf <- tempfile(fileext = ".xlsx")
GET(url, write_disk(tf, overwrite = TRUE))
df <- read_excel(tf, sheet = 2)  # "AVONET1_BirdLife"
df

In [None]:
names(df)

In [None]:
# Ten families with the most species (each row of the sheet is one species)
bird_count <- group_by(df, Family1) |>
    summarize(count = n()) |>
    arrange(desc(count)) |>
    slice(1:10)
bird_count