Hi SONICS team,
Thank you for the inspiring work and the detailed write-up. We would like to use the SONICS dataset in our research on AI-generated music. However, downloading the YouTube videos for the “real songs” portion has become a significant bottleneck.
We understand that you cannot distribute the real audio due to copyright constraints. That said, could you kindly point us to any more efficient strategies for retrieval (e.g., recommended tools/configs, batching practices, or any workflow you found reliable)?
We are using the youtube_id column in real_songs.csv and running yt_dlp to download each ID. We are encountering two main issues:

1. A non-trivial fraction of IDs return “unavailable” or restricted audio.
2. For available IDs, downloads become extremely slow when performed at scale.
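
For concreteness, here is a minimal sketch of the kind of per-ID workflow described above. It is illustrative only: the helper names (`read_ids`, `download_audio`, `download_all`), the output directory, the worker count, and the specific yt_dlp options are assumptions, not a recommended or official configuration.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_ids(csv_path, column="youtube_id"):
    """Return the unique, non-empty IDs from the given CSV column, in file order."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        ids = [row[column].strip() for row in csv.DictReader(f) if row.get(column)]
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(i for i in ids if i))


def download_audio(video_id, out_dir="real_songs"):
    """Download the best available audio stream for one ID (assumed options)."""
    import yt_dlp  # imported lazily so the helpers above work without it installed

    opts = {
        "format": "bestaudio/best",
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "quiet": True,
        "no_warnings": True,
        "retries": 3,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([f"https://www.youtube.com/watch?v={video_id}"])


def download_all(ids, fetch=download_audio, workers=4):
    """Call fetch on every ID in a small thread pool; return (ok, failed)."""
    def attempt(video_id):
        try:
            fetch(video_id)
            return video_id, None
        except Exception as exc:  # unavailable or restricted IDs end up here
            return video_id, str(exc)

    ok, failed = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for video_id, error in pool.map(attempt, ids):
            if error is None:
                ok.append(video_id)
            else:
                failed[video_id] = error
    return ok, failed
```

Calling `download_all(read_ids("real_songs.csv"))` returns the list of IDs that downloaded and a dict mapping each failed ID to its error message, which makes it easy to report exactly which IDs come back as unavailable or restricted.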
We greatly appreciate any advice. Thank you again for making the SONICS dataset available to the community!