# Official Documentation

From the official xeno-canto documentation on the `grp`, `gen`, and `ssp` search tags:


> Use the grp tag to narrow down your search to a specific group. This tag is particularly useful in combination with one of the other tags. Valid group values are birds, grasshoppers and bats. You can also use their respective ids (1 to 3), so grp:2 will restrict your search to grasshoppers. Soundscapes are a special case, as these recordings may include multiple groups. Use grp:soundscape or grp:0 to search these.

> Genus is part of a species' scientific name, so it is searched by default when performing a basic search (as mentioned above). But you can use the gen tag to limit your search query only to the genus field. So gen:zonotrichia will find all recordings of sparrows in the genus Zonotrichia. Similarly, ssp can be used to search for subspecies. These fields use a 'starts with' rather than 'contains' query and accept a 'matches' operator.
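
The tags quoted above can be combined into one query string, joined with `+` as in the requests made later in this notebook. The snippet below is only a minimal sketch: `build_query_url` is a hypothetical helper written for illustration, not part of any xeno-canto client, and the base URL is the API v2 endpoint this notebook uses.

```python
# Endpoint used by the queries in this notebook (API v2).
BASE_URL = 'https://xeno-canto.org/api/2/recordings'


def build_query_url(**tags):
    """Hypothetical helper: join tag:value pairs with '+' into a query URL."""
    query = '+'.join(f'{tag}:{value}' for tag, value in tags.items())
    return f'{BASE_URL}?query={query}'


# Restrict the search to birds in the genus Buteo.
print(build_query_url(grp='birds', gen='buteo'))
# prints https://xeno-canto.org/api/2/recordings?query=grp:birds+gen:buteo
```

Keyword arguments keep their order, so the tags appear in the URL in the order they are passed.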

## Query API by Bird Genus

In [1]:
import numpy as np
import requests
import json
import pandas as pd

In [2]:
# Fetch recording metadata (not the audio itself) from xeno-canto, one query per genus
hawk_sounds = requests.get('https://xeno-canto.org/api/2/recordings?query=grp:birds+gen:buteo')
hawk_sounds = hawk_sounds.json()

heron_sounds = requests.get('https://xeno-canto.org/api/2/recordings?query=grp:birds+gen:ardea')
heron_sounds = heron_sounds.json()

goose_sounds = requests.get('https://xeno-canto.org/api/2/recordings?query=grp:birds+gen:branta')
goose_sounds = goose_sounds.json()

# Create data frames
goose_sounds_df = pd.DataFrame(goose_sounds['recordings'])
heron_sounds_df = pd.DataFrame(heron_sounds['recordings'])
hawk_sounds_df = pd.DataFrame(hawk_sounds['recordings'])

# Apply labels
hawk_sounds_df['label'] = 'hawk'
goose_sounds_df['label'] = 'goose'
heron_sounds_df['label'] = 'heron'

# ignore_index=True gives the combined frame one continuous row index
recordings_df = pd.concat([goose_sounds_df, heron_sounds_df, hawk_sounds_df], ignore_index=True)

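# 'numRecordings' is the total number of matches reported by the API,
# which can be larger than the number of rows returned in a single response.
print("Num hawk sounds: ", hawk_sounds['numRecordings'])
print("Num heron sounds: ", heron_sounds['numRecordings'])
print("Num goose sounds: ", goose_sounds['numRecordings'])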

Num hawk sounds:  2583
Num heron sounds:  2788
Num goose sounds:  1635
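
The totals printed above can exceed the number of rows in each data frame, because the API returns results one page at a time. The sketch below shows one way to collect every page. It assumes the response carries a `numPages` field and that a `page` query parameter selects the page. `fetch_all_recordings` and `fetch_page` are hypothetical names introduced here, and the fetch function is passed in so the loop can be tried without network access.

```python
def fetch_all_recordings(fetch_page, max_pages=None):
    """Collect the 'recordings' lists from every page.

    fetch_page: callable taking a 1-based page number and returning the
    decoded JSON dict for that page (assumed to contain 'numPages').
    max_pages: optional cap, useful to limit the number of requests.
    """
    first = fetch_page(1)
    recordings = list(first['recordings'])
    num_pages = int(first.get('numPages', 1))
    if max_pages is not None:
        num_pages = min(num_pages, max_pages)
    for page in range(2, num_pages + 1):
        recordings.extend(fetch_page(page)['recordings'])
    return recordings


# Possible use with requests (assumes the 'page' parameter; not run here):
# def fetch_page(page):
#     url = f'https://xeno-canto.org/api/2/recordings?query=grp:birds+gen:buteo&page={page}'
#     response = requests.get(url)
#     response.raise_for_status()
#     return response.json()
#
# hawk_recordings = fetch_all_recordings(fetch_page)
```

Capping `max_pages` while experimenting keeps the number of requests to the server small.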


In [3]:
# Save recordings to file (create the output directory first if it is missing)
import os
os.makedirs('data', exist_ok=True)
recordings_df.to_csv('data/recordings.csv', index=False)

## Fetch MP3 Data and Save Locally

In [2]:
labels = ['hawk', 'heron', 'goose']
sounds = [hawk_sounds, heron_sounds, goose_sounds]

batch_size = 20
batch = []

for label, sound in zip(labels, sounds):
    # Number files from 1, giving names such as data/hawk_1.mp3
    for j, recording in enumerate(sound['recordings'], start=1):
        bird_sound = requests.get(recording['file'])
        batch.append((f'data/{label}_{j}.mp3', bird_sound.content))

        # Save batches of downloaded files
        if len(batch) >= batch_size:
            for file_name, content in batch:
                with open(file_name, 'wb') as f:
                    f.write(content)
            batch.clear()

# Save any remaining files in the last batch
if batch:
    for file_name, content in batch:
        with open(file_name, 'wb') as f:
            f.write(content)
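
The loop above writes whatever each request returns, so a failed download would still be saved under an `.mp3` name. As an alternative sketch (not the approach used above), the function below writes each file as soon as it arrives, skips files that already exist, and reports failures instead of saving them. `save_recordings` and `download` are hypothetical names introduced for illustration; `download` is any callable that takes a URL and returns bytes or raises an exception.

```python
from pathlib import Path


def save_recordings(recordings, label, out_dir, download, limit=None):
    """Write each recording's 'file' URL to <out_dir>/<label>_<n>.mp3.

    Returns the list of paths written during this call.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for n, recording in enumerate(recordings[:limit], start=1):
        target = out_path / f'{label}_{n}.mp3'
        if target.exists():
            continue  # already saved on an earlier run
        try:
            content = download(recording['file'])
        except Exception as exc:
            print(f'Skipping {target.name}: {exc}')
            continue
        target.write_bytes(content)
        written.append(target)
    return written


# Possible use with requests (not run here):
# def download(url):
#     response = requests.get(url, timeout=30)
#     response.raise_for_status()
#     return response.content
#
# save_recordings(hawk_sounds['recordings'], 'hawk', 'data', download, limit=20)
```

Skipping existing files means an interrupted run can be restarted without downloading everything again.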