# Official Documentation

From the official xeno-canto documentation:


> Use the grp tag to narrow down your search to a specific group. This tag is particularly useful in combination with one of the other tags. Valid group values are birds, grasshoppers and bats. You can also use their respective ids (1 to 3), so grp:2 will restrict your search to grasshoppers. Soundscapes are a special case, as these recordings may include multiple groups. Use grp:soundscape or grp:0 to search these.
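
As a rough illustration of the `grp` tag, the sketch below builds tag strings from a group name or id. `grp_tag` and `build_query` are hypothetical helpers, not part of any xeno-canto client. The quote only states explicitly that `grp:2` means grasshoppers and `grp:0` means soundscapes; mapping birds to 1 and bats to 3 is an assumption based on the order in which the groups are listed.

```python
# Group names and ids as described in the quoted documentation.
# Assumption: ids follow the listed order (birds=1, grasshoppers=2, bats=3);
# only grp:2 -> grasshoppers and grp:0 -> soundscape are stated explicitly.
GROUP_IDS = {'soundscape': 0, 'birds': 1, 'grasshoppers': 2, 'bats': 3}


def grp_tag(group):
    """Return a grp tag for a group given by name (str) or numeric id (int)."""
    if isinstance(group, int):
        if group not in GROUP_IDS.values():
            raise ValueError(f'unknown group id: {group}')
        return f'grp:{group}'
    name = group.lower()
    if name not in GROUP_IDS:
        raise ValueError(f'unknown group name: {group}')
    return f'grp:{name}'


def build_query(*tags):
    """Join search tags with '+', the separator used in this notebook's URLs."""
    return '+'.join(tags)


print(build_query(grp_tag('birds'), 'gen:zonotrichia'))  # grp:birds+gen:zonotrichia
print(grp_tag(2))  # grp:2
```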

> Genus is part of a species' scientific name, so it is searched by default when performing a basic search (as mentioned above). But you can use the gen tag to limit your search query only to the genus field. So gen:zonotrichia will find all recordings of sparrows in the genus Zonotrichia. Similarly, ssp can be used to search for subspecies. These fields use a 'starts with' rather than 'contains' query and accept a 'matches' operator.
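
To make the "starts with" versus "contains" distinction concrete, here is a local sketch that applies starts-with matching to recordings already in memory. It does not call the API. `starts_with_match` is a hypothetical helper, the sample records are made up, and treating the match as case-insensitive is an assumption drawn from the quote's own example (`gen:zonotrichia` finding *Zonotrichia*).

```python
def starts_with_match(recordings, field, prefix):
    """Keep recordings whose `field` starts with `prefix`, ignoring case.

    Mimics, locally, the 'starts with' semantics the documentation describes
    for the gen and ssp tags; it does not query xeno-canto.
    """
    prefix = prefix.lower()
    return [r for r in recordings if str(r.get(field, '')).lower().startswith(prefix)]


# Made-up sample records with 'gen' and 'ssp' fields
sample = [
    {'id': '1', 'gen': 'Zonotrichia', 'ssp': 'pugetensis'},
    {'id': '2', 'gen': 'Zonotrichia', 'ssp': ''},
    {'id': '3', 'gen': 'Melospiza', 'ssp': ''},
]
print([r['id'] for r in starts_with_match(sample, 'gen', 'zono')])   # ['1', '2']
print([r['id'] for r in starts_with_match(sample, 'ssp', 'puget')])  # ['1']
# 'spiza' occurs inside 'Melospiza', but a starts-with match does not find it
print(starts_with_match(sample, 'gen', 'spiza'))                     # []
```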

## Query API by Bird Genus

```python
import os
import requests
import pandas as pd
```

```python
# Base URL of the xeno-canto v2 recordings endpoint
API_URL = 'https://xeno-canto.org/api/2/recordings'


def fetch_sounds(genus):
    """Return every bird recording in the given genus, across all result pages."""
    query = f'grp:birds+gen:{genus}'

    # The first page also reports how many pages the result set spans
    response = requests.get(f'{API_URL}?query={query}', timeout=30)
    response.raise_for_status()
    data = response.json()
    all_recordings = list(data['recordings'])

    # Fetch the remaining pages (page numbering starts at 1, already fetched)
    for page in range(2, int(data['numPages']) + 1):
        response = requests.get(f'{API_URL}?query={query}&page={page}', timeout=30)
        response.raise_for_status()
        all_recordings.extend(response.json()['recordings'])

    return all_recordings


# Fetch recordings for each genus
hawk_sounds = fetch_sounds('buteo')
heron_sounds = fetch_sounds('ardea')
goose_sounds = fetch_sounds('branta')

# Create data frames
goose_sounds_df = pd.DataFrame(goose_sounds)
heron_sounds_df = pd.DataFrame(heron_sounds)
hawk_sounds_df = pd.DataFrame(hawk_sounds)

# Apply labels
hawk_sounds_df['label'] = 'hawk'
goose_sounds_df['label'] = 'goose'
heron_sounds_df['label'] = 'heron'

# Combine all labelled recordings into one frame
recordings_df = pd.concat([goose_sounds_df, heron_sounds_df, hawk_sounds_df])
```
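
The paging logic can be separated from the HTTP call so that it can be checked offline. In the sketch below, `collect_recordings` is a hypothetical helper that takes any function returning the parsed JSON for a 1-based page number. It assumes only the `numPages` and `recordings` keys that the notebook already reads. The fake pages are made-up data.

```python
def collect_recordings(get_page):
    """Gather 'recordings' from every page of a paginated xeno-canto-style response.

    `get_page(n)` must return the parsed JSON for page n (1-based) with at least
    the keys 'numPages' and 'recordings'. Each page is requested exactly once.
    """
    first = get_page(1)
    recordings = list(first['recordings'])
    for page in range(2, int(first['numPages']) + 1):
        recordings.extend(get_page(page)['recordings'])
    return recordings


# Fake three-page response standing in for the API (made-up data)
fake_pages = {
    1: {'numPages': 3, 'recordings': [{'id': '1'}, {'id': '2'}]},
    2: {'numPages': 3, 'recordings': [{'id': '3'}]},
    3: {'numPages': 3, 'recordings': [{'id': '4'}]},
}
calls = []


def fake_get_page(n):
    calls.append(n)  # record which pages were requested
    return fake_pages[n]


ids = [r['id'] for r in collect_recordings(fake_get_page)]
print(ids)    # ['1', '2', '3', '4']
print(calls)  # [1, 2, 3]
```

Against the real endpoint, `get_page` could be a small function that calls `requests.get(...)` with the notebook's query URL plus `&page={n}` and returns `.json()`.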

```python
# Save recordings to file (create the data directory first if it is missing)
os.makedirs('data', exist_ok=True)
recordings_df.to_csv('data/recordings.csv', index=False)
```

```python
recordings_df[['id', 'gen', 'en', 'cnt', 'label']]
```

|      | id     | gen    | en             | cnt            | label |
|------|--------|--------|----------------|----------------|-------|
| 0    | 936105 | Branta | Brant Goose    | United Kingdom | goose |
| 1    | 934302 | Branta | Brant Goose    | Sweden         | goose |
| 2    | 906056 | Branta | Brant Goose    | Finland        | goose |
| 3    | 898133 | Branta | Brant Goose    | Ireland        | goose |
| 4    | 882013 | Branta | Brant Goose    | France         | goose |
| ...  | ...    | ...    | ...            | ...            | ...   |
| 2578 | 296984 | Buteo  | Jackal Buzzard | South Africa   | hawk  |
| 2579 | 279894 | Buteo  | Jackal Buzzard | South Africa   | hawk  |
| 2580 | 62422  | Buteo  | Jackal Buzzard | South Africa   | hawk  |
| 2581 | 397385 | Buteo  | Jackal Buzzard | South Africa   | hawk  |
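
The row numbers in the table above appear to be each genus frame's own index. By default, `pd.concat` keeps the original indexes, so index labels repeat across genera. The small sketch below uses made-up frames to show this, and shows that `ignore_index=True` renumbers the combined rows.

```python
import pandas as pd

# Two tiny made-up frames standing in for the per-genus data frames
goose = pd.DataFrame({'id': ['10', '11'], 'label': 'goose'})
hawk = pd.DataFrame({'id': ['20', '21', '22'], 'label': 'hawk'})

kept = pd.concat([goose, hawk])
print(list(kept.index))        # [0, 1, 0, 1, 2]
print(kept.index.is_unique)    # False

renumbered = pd.concat([goose, hawk], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3, 4]
```

Duplicate labels do not matter for `to_csv(..., index=False)`, but they can be surprising with label-based lookups such as `.loc[0]`, which would return one row per genus.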


```python
# Number of recordings per label
recordings_df[['label']].value_counts()
```
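
For a quick check of class balance across the labels, `value_counts` on the `label` column gives absolute counts, and `normalize=True` gives proportions. The labels below are made up and stand in for `recordings_df['label']`.

```python
import pandas as pd

# Made-up labels standing in for recordings_df['label']
df = pd.DataFrame({'label': ['goose', 'hawk', 'hawk', 'heron', 'hawk', 'goose']})

counts = df['label'].value_counts()  # sorted from most to least frequent
print(counts.to_dict())  # {'hawk': 3, 'goose': 2, 'heron': 1}

# Share of rows per label, rounded to two decimals
shares = df['label'].value_counts(normalize=True).round(2)
print(shares.to_dict())  # {'hawk': 0.5, 'goose': 0.33, 'heron': 0.17}
```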

## Fetch MP3 Data and Save Locally

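```python
# Download each recording's MP3 and save it as data/<label>_<id>.mp3
sounds = {'goose': goose_sounds, 'heron': heron_sounds, 'hawk': hawk_sounds}

os.makedirs('data', exist_ok=True)
for label, recordings in sounds.items():
    for recording in recordings:
        file_url = recording['file']
        # Some recordings have no downloadable file; skip them
        if not file_url:
            continue
        response = requests.get(file_url, timeout=60)
        # Skip failed requests instead of saving an error page as an MP3
        if not response.ok:
            continue
        path = f"data/{label}_{recording['id']}.mp3"
        with open(path, 'wb') as f:
            f.write(response.content)
```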
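
Downloading thousands of MP3s can be interrupted partway through. One way to make a rerun resume instead of starting over is to plan the downloads first and skip files that already exist on disk. `plan_downloads` is a hypothetical helper that assumes the `data/<label>_<id>.mp3` naming used above. The example URLs are placeholders.

```python
import os
import tempfile


def plan_downloads(sounds, data_dir='data'):
    """Return (url, path) pairs that still need downloading.

    `sounds` maps a label to a list of recording dicts with 'id' and 'file' keys.
    Recordings without a file URL, and files already saved, are skipped.
    """
    plan = []
    for label, recordings in sounds.items():
        for recording in recordings:
            url = recording.get('file', '')
            if not url:
                continue  # nothing to download
            path = os.path.join(data_dir, f"{label}_{recording['id']}.mp3")
            if os.path.exists(path):
                continue  # saved on an earlier run
            plan.append((url, path))
    return plan


with tempfile.TemporaryDirectory() as tmp:
    # Pretend recording 1 was downloaded on an earlier run
    open(os.path.join(tmp, 'heron_1.mp3'), 'wb').close()
    sounds = {'heron': [
        {'id': '1', 'file': 'https://example.org/1'},
        {'id': '2', 'file': 'https://example.org/2'},
        {'id': '3', 'file': ''},
    ]}
    todo = plan_downloads(sounds, tmp)
    print([os.path.basename(p) for _, p in todo])  # ['heron_2.mp3']
```

Each remaining `(url, path)` pair can then be fetched and written the same way as in the download loop above.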

```python
# Count saved MP3 files per label, using the file-name prefix
data_files = os.listdir('data')
num_geese = len([f for f in data_files if f.startswith('goose')])
num_herons = len([f for f in data_files if f.startswith('heron')])
num_hawks = len([f for f in data_files if f.startswith('hawk')])
print(f'Num Goose Sounds: {num_geese}')
print(f'Num Hawk Sounds: {num_hawks}')
print(f'Num Heron Sounds: {num_herons}')
```

```
Num Goose Sounds: 1635
Num Hawk Sounds: 2583
Num Heron Sounds: 1982
```
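
The per-label counts can also be derived from the file names without hard-coding each label. In the sketch below, `count_by_label` is a hypothetical helper that assumes the `<label>_<id>.mp3` naming used when saving and ignores other files such as `recordings.csv`.

```python
import os
import tempfile
from collections import Counter


def count_by_label(data_dir, suffix='.mp3'):
    """Count files named <label>_<id><suffix> per label."""
    counts = Counter()
    for name in os.listdir(data_dir):
        if name.endswith(suffix) and '_' in name:
            counts[name.split('_', 1)[0]] += 1  # text before the first '_' is the label
    return counts


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file names; recordings.csv is ignored because it is not an MP3
    for name in ['goose_1.mp3', 'goose_2.mp3', 'hawk_7.mp3', 'recordings.csv']:
        open(os.path.join(tmp, name), 'wb').close()
    counts = count_by_label(tmp)

print(sorted(counts.items()))  # [('goose', 2), ('hawk', 1)]
```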
