
On Real Song Downloading. #7

@HildaNya

Description

Hi SONICS team,
Thank you for the inspiring work and the detailed write-up. We would like to use the SONICS dataset in our research on AI-generated music. However, downloading the YouTube videos for the “real songs” portion has become a significant bottleneck.
We understand that you cannot distribute the real audio due to copyright constraints. That said, could you point us to more efficient retrieval strategies (e.g., recommended tools or configurations, batching practices, or any workflow you found reliable)?
We read the youtube_id column of real_songs.csv and run yt_dlp on each ID. We are running into two main issues:

1. A non-trivial fraction of IDs return “unavailable” or restricted audio.
2. For the IDs that are available, downloads become extremely slow at scale.
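For reference, here is a minimal sketch of the per-ID workflow described above, with a few common batching practices added: deduplicating IDs, skipping IDs already on disk so a rerun resumes instead of restarting, running a small thread pool, and logging failed IDs rather than stopping. Only real_songs.csv and its youtube_id column come from the text above. The helper names (load_ids, pending_ids, download_one, download_all), the output layout <out_dir>/<youtube_id>.<ext>, the failed_ids.tsv log, and the worker count are hypothetical. It uses the yt_dlp Python API (YoutubeDL with the format, outtmpl, quiet, noprogress, and retries options). It does not work around restrictions. Unavailable or restricted IDs are simply recorded so they can be reported or excluded.

```python
"""Hypothetical batch downloader for the real-song IDs in real_songs.csv.

Assumptions: the CSV has a `youtube_id` column; audio is saved as
<out_dir>/<youtube_id>.<ext>; yt-dlp is installed (pip install yt-dlp).
"""
from __future__ import annotations

import csv
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def load_ids(csv_path: str, column: str = "youtube_id") -> list[str]:
    """Read IDs from the CSV, dropping blanks and duplicates while keeping order."""
    seen: set[str] = set()
    ids: list[str] = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            vid = (row.get(column) or "").strip()
            if vid and vid not in seen:
                seen.add(vid)
                ids.append(vid)
    return ids


def pending_ids(ids: list[str], out_dir: str) -> list[str]:
    """Drop IDs that already have a finished file in out_dir.

    A partial download such as "abc.webm.part" has stem "abc.webm",
    so it does not count as finished and the ID is retried.
    """
    done = {p.stem for p in Path(out_dir).glob("*") if p.is_file()}
    return [vid for vid in ids if vid not in done]


def download_one(vid: str, out_dir: str) -> tuple[str, str | None]:
    """Download one ID's best audio stream; return (id, error message or None)."""
    import yt_dlp  # imported lazily so the helpers above work without yt-dlp

    opts = {
        "format": "bestaudio/best",
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "quiet": True,
        "noprogress": True,
        "retries": 3,  # retry transient network errors per ID
    }
    try:
        with yt_dlp.YoutubeDL(opts) as ydl:
            retcode = ydl.download([f"https://www.youtube.com/watch?v={vid}"])
        return vid, None if retcode == 0 else f"yt-dlp return code {retcode}"
    except yt_dlp.utils.DownloadError as exc:
        # Unavailable, private, or restricted videos typically end up here.
        return vid, str(exc)


def download_all(
    csv_path: str,
    out_dir: str,
    workers: int = 4,
    failure_log: str = "failed_ids.tsv",
) -> list[tuple[str, str]]:
    """Download every pending ID with a small thread pool and log failures."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    todo = pending_ids(load_ids(csv_path), out_dir)
    failures: list[tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(download_one, vid, out_dir) for vid in todo]
        for fut in as_completed(futures):
            vid, err = fut.result()
            if err is not None:
                failures.append((vid, err))
    # One line per failed ID: "<id>\t<first line of the error message>".
    with open(failure_log, "w", encoding="utf-8") as f:
        for vid, err in failures:
            first_line = err.splitlines()[0] if err else ""
            f.write(f"{vid}\t{first_line}\n")
    return failures
```

For example, `download_all("real_songs.csv", "real_audio", workers=4)` would skip whatever is already in real_audio/ and write the IDs that failed to failed_ids.tsv. The low default for `workers` is a deliberate, untested guess. Heavier parallelism from one IP may be throttled by YouTube, so the value is something to tune rather than a recommendation.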
We would greatly appreciate any advice. Thank you again for making the SONICS dataset available to the community!
