|
55 | 55 | "2. The mixture annotation in JAMS format (detailed) and in a simplified tabular format (python list or csv).\n",
|
56 | 56 | "3. The audio of each processed stem (or sound event) used to create the mixture.\n",
|
57 | 57 | "\n",
|
58 |
| - "<img src=\"https://www.justinsalamon.com/uploads/4/3/9/4/4394963/scaper-diagram_orig.png\">" |
| 58 | + "```{figure} ../images/data/scaper_diagram.png\n", |
| 59 | + "---\n", |
| 60 | + "height: 400px\n", |
| 61 | + "name: fig-scaper\n", |
| 62 | + "---\n", |
| 63 | + "Block diagram of automatic mixing via Scaper.\n", |
| 64 | + "```" |
59 | 65 | ]
|
60 | 66 | },
|
61 | 67 | {
|
|
64 | 70 | "source": [
|
65 | 71 | "### Read more\n",
|
66 | 72 | "\n",
|
67 |
| - "You can learn more about Scaper by reading the scaper-paper:\n", |
| 73 | + "You can learn more about Scaper by reading the scaper-paper: [Scaper: A library for soundscape synthesis and augmentation](http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_scaper_waspaa_2017.pdf)\n", |
68 | 74 | "\n",
|
69 |
| - "\n", |
70 |
| - "[Scaper: A library for soundscape synthesis and augmentation](http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_scaper_waspaa_2017.pdf)<br/>\n", |
71 |
| - "J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello<br/>\n", |
72 |
| - "In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.\n", |
| 75 | + "```\n", |
| 76 | + "@inproceedings{Salamon:Scaper:WASPAA:17,\n", |
| 77 | + " author = {Salamon, J. and MacConnell, D. and Cartwright, M. and Li, P. and Bello, J.~P.},\n", |
| 78 | + " title = {Scaper: A Library for Soundscape Synthesis and Augmentation},\n", |
| 79 | + " booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},\n", |
| 80 | + " month = {Oct.},\n", |
| 81 | + " year = {2017},\n", |
| 82 | + " pages = {344--348},\n", |
| 83 | + " doi = {10.5281/zenodo.1117372},\n", |
| 84 | + " url = {https://doi.org/10.5281/zenodo.1117372}\n", |
| 85 | + "}\n", |
| 86 | + "```\n", |
73 | 87 | "\n",
|
74 | 88 | "Please cite this paper if you use Scaper in your work. You do not need to read the paper to complete this tutorial."
|
75 | 89 | ]
|
|
89 | 103 | },
|
90 | 104 | {
|
91 | 105 | "cell_type": "code",
|
92 |
| - "execution_count": 1, |
| 106 | + "execution_count": null, |
93 | 107 | "metadata": {},
|
94 | 108 | "outputs": [],
|
95 | 109 | "source": [
|
|
99 | 113 | "!pip install git+https://github.com/source-separation/tutorial"
|
100 | 114 | ]
|
101 | 115 | },
|
| 116 | + { |
| 117 | + "cell_type": "markdown", |
| 118 | + "metadata": {}, |
| 119 | + "source": [ |
| 120 | + "To keep the tutorial page clean, we'll hide Python warnings:" |
| 121 | + ] |
| 122 | + }, |
| 123 | + { |
| 124 | + "cell_type": "code", |
| 125 | + "execution_count": 1, |
| 126 | + "metadata": {}, |
| 127 | + "outputs": [], |
| 128 | + "source": [ |
| 129 | + "# To keep things clean we'll hide all warnings\n", |
| 130 | + "import warnings\n", |
| 131 | + "warnings.filterwarnings('ignore')" |
| 132 | + ] |
| 133 | + }, |
102 | 134 | {
|
103 | 135 | "cell_type": "markdown",
|
104 | 136 | "metadata": {},
|
|
184 | 216 | "metadata": {},
|
185 | 217 | "outputs": [],
|
186 | 218 | "source": [
|
187 |
| - "from pathlib import Path # utility for folder management\n", |
| 219 | + "from pathlib import Path\n", |
188 | 220 | "\n",
|
189 | 221 | "# create foreground folder\n",
|
190 | 222 | "fg_folder = Path('~/.nussl/ismir2020-tutorial/foreground').expanduser() \n",
|
191 | 223 | "fg_folder.mkdir(parents=True, exist_ok=True) \n",
|
192 | 224 | "\n",
|
193 | 225 | "# create background folder - we need to provide one even if we don't use it\n",
|
194 | 226 | "bg_folder = Path('~/.nussl/ismir2020-tutorial/background').expanduser()\n",
|
195 |
| - "bg_folder.mkdir(parents=True, exist_ok=True)\n", |
196 |
| - "\n", |
| 227 | + "bg_folder.mkdir(parents=True, exist_ok=True)" |
| 228 | + ] |
| 229 | + }, |
| 230 | + { |
| 231 | + "cell_type": "code", |
| 232 | + "execution_count": 8, |
| 233 | + "metadata": {}, |
| 234 | + "outputs": [], |
| 235 | + "source": [ |
197 | 236 | "# For each item (track) in the train set, iterate over its sources (stems),\n",
|
198 | 237 | "# create a folder for the stem if it doesn't exist already (drums, bass, vocals, other) \n",
|
199 | 238 | "# and place the stem audio file in this folder, using the song name as the filename\n",
|
|
210 | 249 | "cell_type": "markdown",
|
211 | 250 | "metadata": {},
|
212 | 251 | "source": [
|
213 |
| - "Now we have a folder called `foreground`, inside of which there are four stem folders: `bass`, `drums`, `vocals`, `other`, and inside each of these folders we have the audio files for all matching stems. I.e., in the `bass` folder we will have the bass stems from all the songs in the dataset, in the `drums` folder we'll have the drum stems from all songs, etc. Let's verify this:" |
| 252 | + "Now we have a folder called `foreground`, inside of which there are four stem folders: `bass`, `drums`, `vocals`, `other`, and inside each of these folders we have the audio files for all matching stems. I.e., in the `bass` folder we will have the bass stems from all the songs in the dataset, in the `drums` folder we'll have the drum stems from all songs, etc. We've renamed each stem file to the name of the song it belongs to. Let's verify this:" |
214 | 253 | ]
|
215 | 254 | },
|
216 | 255 | {
|
217 | 256 | "cell_type": "code",
|
218 |
| - "execution_count": 4, |
| 257 | + "execution_count": 23, |
219 | 258 | "metadata": {},
|
220 | 259 | "outputs": [
|
221 | 260 | {
|
222 | 261 | "name": "stdout",
|
223 | 262 | "output_type": "stream",
|
224 | 263 | "text": [
|
225 |
| - "drums\tfolder contains 94 audio files\n", |
226 |
| - "vocals\tfolder contains 94 audio files\n", |
227 |
| - "other\tfolder contains 94 audio files\n", |
228 |
| - "bass\tfolder contains 94 audio files\n" |
| 264 | + "\n", |
| 265 | + "drums\tfolder contains 94 audio files:\n", |
| 266 | + "\n", |
| 267 | + "\t\tA Classic Education - NightOwl.wav\n", |
| 268 | + "\t\tANiMAL - Clinic A.wav\n", |
| 269 | + "\t\tANiMAL - Easy Tiger.wav\n", |
| 270 | + "\t\tANiMAL - Rockshow.wav\n", |
| 271 | + "\t\tActions - Devil's Words.wav\n", |
| 272 | + "\t\t...\n", |
| 273 | + "\n", |
| 274 | + "vocals\tfolder contains 94 audio files:\n", |
| 275 | + "\n", |
| 276 | + "\t\tA Classic Education - NightOwl.wav\n", |
| 277 | + "\t\tANiMAL - Clinic A.wav\n", |
| 278 | + "\t\tANiMAL - Easy Tiger.wav\n", |
| 279 | + "\t\tANiMAL - Rockshow.wav\n", |
| 280 | + "\t\tActions - Devil's Words.wav\n", |
| 281 | + "\t\t...\n", |
| 282 | + "\n", |
| 283 | + "other\tfolder contains 94 audio files:\n", |
| 284 | + "\n", |
| 285 | + "\t\tA Classic Education - NightOwl.wav\n", |
| 286 | + "\t\tANiMAL - Clinic A.wav\n", |
| 287 | + "\t\tANiMAL - Easy Tiger.wav\n", |
| 288 | + "\t\tANiMAL - Rockshow.wav\n", |
| 289 | + "\t\tActions - Devil's Words.wav\n", |
| 290 | + "\t\t...\n", |
| 291 | + "\n", |
| 292 | + "bass\tfolder contains 94 audio files:\n", |
| 293 | + "\n", |
| 294 | + "\t\tA Classic Education - NightOwl.wav\n", |
| 295 | + "\t\tANiMAL - Clinic A.wav\n", |
| 296 | + "\t\tANiMAL - Easy Tiger.wav\n", |
| 297 | + "\t\tANiMAL - Rockshow.wav\n", |
| 298 | + "\t\tActions - Devil's Words.wav\n", |
| 299 | + "\t\t...\n" |
229 | 300 | ]
|
230 | 301 | }
|
231 | 302 | ],
|
232 | 303 | "source": [
|
233 | 304 | "import os\n",
|
| 305 | + "import glob\n", |
234 | 306 | "\n",
|
235 | 307 | "for folder in os.listdir(fg_folder):\n",
|
236 | 308 | " if folder[0] != '.': # ignore system folders\n",
|
237 | 309 | " stem_files = os.listdir(os.path.join(fg_folder, folder))\n",
|
238 |
| - " print(f\"{folder}\\tfolder contains {len(stem_files)} audio files\")" |
| 310 | + " print(f\"\\n{folder}\\tfolder contains {len(stem_files)} audio files:\\n\")\n", |
| 311 | + " for sf in sorted(stem_files)[:5]:\n", |
| 312 | + " print(f\"\\t\\t{sf}\")\n", |
| 313 | + " print(\"\\t\\t...\")" |
239 | 314 | ]
|
240 | 315 | },
|
241 | 316 | {
|
|
334 | 409 | "Next we need to add stems to our mixture. In Scaper we do this by adding \"events\", using the `add_event` function.\n",
|
335 | 410 | "\n",
|
336 | 411 | "For each event that we add we specify the following:\n",
|
337 |
| - "* `label`: the type of event (in our case drums, bass, vocals or other)\n", |
| 412 | + "* `label`: the type of event (in our case `drums`, `bass`, `vocals` or `other`)\n", |
338 | 413 | "* `source_file`: which audio file to use from all files matching the provided label\n",
|
339 | 414 | "* `source_time`: time offset for sampling the stem audio file, i.e., where to start in the source audio\n",
|
340 | 415 | "* `event_time`: offset for the start time of the event in the generated mixture\n",
|
|
398 | 473 | "Now that we have added events to our Scaper object, we can call `sc.generate()`: this will \"instantiate\" (sample concrete values from) the specification and use them to generate a mixture. Each call to `sc.generate()` will create a different instantiation of the events' parameters and thus generate a different mixture."
|
399 | 474 | ]
|
400 | 475 | },
|
401 |
| - { |
402 |
| - "cell_type": "code", |
403 |
| - "execution_count": 8, |
404 |
| - "metadata": {}, |
405 |
| - "outputs": [], |
406 |
| - "source": [ |
407 |
| - "# To keep things clean we'll hide all warnings\n", |
408 |
| - "import warnings\n", |
409 |
| - "warnings.filterwarnings('ignore')" |
410 |
| - ] |
411 |
| - }, |
412 | 476 | {
|
413 | 477 | "cell_type": "code",
|
414 | 478 | "execution_count": 9,
|
|
1455 | 1519 | "\n",
|
1456 | 1520 | "### When do we want incoherent mixing?\n",
|
1457 | 1521 | "\n",
|
1458 |
| - "Incoherent mixing is clearly not representative of real-world recorded music. So why do we want to mix together stems in this way?\n", |
| 1522 | + "Incoherent mixing is clearly not representative of real-world recorded music. So why do we want to mix together stems in this way? It turns out, incoherent mixing is an important **data augmentation** technique when we train a one-vs-all (OVA) source separation network. \n", |
1459 | 1523 | "\n",
|
1460 |
| - "It turns out, incoherent mixing is an important **data augmentation** technique when we train a one-vs-all (OVA) source separation network. An OVA source separation model is trained to _only_ separate one source from the mixture and ignore all other sources. In contrast to OVA systems, multisource networks are trained to separate and output multiple sources simultaneously.\n", |
| 1524 | + "**A one-vs-all (OVA) source separation model** is trained to _only_ separate one source from the mixture and ignore all other sources. In contrast to OVA systems, multisource networks are trained to separate and output multiple sources simultaneously.\n", |
1461 | 1525 | "\n",
|
1462 | 1526 | "When we train an OVA network with incoherent mixing, we're teaching the network to **\"ignore everything you hear except my source\"**.\n",
|
1463 | 1527 | "\n",
|
|
1491 | 1555 | "2. Create a Scaper object\n",
|
1492 | 1556 | "3. Set the sample rate, reference dB, and channels (mono)\n",
|
1493 | 1557 | "4. Define a template of probabilistic event parameters\n",
|
1494 |
| - "5. Instatiate the template to randomly choose a song, a start time for the sources, a pitch shift and a time stretch\n", |
1495 |
| - "6. Reset the event specficiation, removing the added event\n", |
| 1558 | + "5. Add the template event and *instantiate* it (=sample concrete values) to randomly choose a song `source_file`, a start time for the sources `source_time`, a pitch shift value and a time stretch value.\n", |
| 1559 | + "6. Reset the Scaper object's event specification\n", |
1496 | 1560 | "7. Replace the distributions for source time, pitch shift and time stretch in the template with the constant values we just sampled\n",
|
1497 |
| - "8. Iterate over the four stems (vocals, drums, bass, other) and add COHERENT events\n", |
| 1561 | + "8. Iterate over the four stems (vocals, drums, bass, other) and add COHERENT stems.\n", |
| 1562 | + " * **By keeping the `source_file` path fixed except for changing the parent folder (`vocals`, `drums`, `bass`, or `other`) we ensure all stems in the mixture come from the same song. This is critical for achieving a COHERENT mix.**\n", |
| 1563 | + "\n", |
1498 | 1564 | "\n",
|
1499 | 1565 | "```{note}\n",
|
1500 |
| - "To ensure coherent source files (all from the same song) in step 8, we leverage the fact that all the stems from the same song have the same filename. All we have to do is replace the sampled source file's parent folder name to the label of the stem being added in each iteration of the loop, which will give the correct path to the stem source file for that label.\n", |
| 1566 | + "To ensure coherent source files (all from the same song) in step 8, we leverage the fact that all the stems from the same song have the same filename. All we have to do is replace the sampled source file's parent folder name with the label of the stem being added in each iteration of the loop (vocals, drums, bass, or other), which will give the correct path to the stem source file for that label.\n", |
1501 | 1567 | "```"
|
1502 | 1568 | ]
|
1503 | 1569 | },
|
|
1550 | 1616 | " \n",
|
1551 | 1617 | "# 7. Replace the distributions for source time, pitch shift and\n",
|
1552 | 1618 | "# time stretch with the constant values we just sampled, to \n",
|
1553 |
| - "# ensure our added events (stems) are coherent. \n", |
| 1619 | + "# ensure our added events (stems) are coherent. \n", |
| 1620 | + "# NOTE: the source_file has also been sampled, and we'll keep\n", |
| 1621 | + "# the sampled file to denote which song we'll be mixing.\n", |
1554 | 1622 | "event_parameters['source_time'] = ('const', event.source_time)\n",
|
1555 | 1623 | "event_parameters['pitch_shift'] = ('const', event.pitch_shift)\n",
|
1556 | 1624 | "event_parameters['time_stretch'] = ('const', event.time_stretch)\n",
|
|
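The parent-folder swap described in step 8 of the notebook can be sketched with `pathlib` alone. This is a minimal illustration, not the notebook's own code: the helper name `coherent_stem_paths` and the example path are hypothetical, and it assumes the layout shown in the diff, where each stem folder (`vocals`, `drums`, `bass`, `other`) holds files named after the song.

```python
from pathlib import Path

# Stem folder names, as listed in the notebook's foreground folder output.
STEM_LABELS = ["vocals", "drums", "bass", "other"]


def coherent_stem_paths(sampled_source_file, labels=STEM_LABELS):
    """Hypothetical helper: map each stem label to the file that shares the
    sampled file's name but lives in that label's folder."""
    sampled = Path(sampled_source_file)
    # The foreground folder is two levels up: <foreground>/<label>/<song>.wav
    fg_folder = sampled.parent.parent
    return {label: fg_folder / label / sampled.name for label in labels}


# Example with an illustrative relative path (no files are read or created).
paths = coherent_stem_paths("foreground/drums/ANiMAL - Rockshow.wav")
for label, path in paths.items():
    print(f"{label}\t{path}")
```

Because only the parent folder changes, every returned path refers to the same song, which is what makes the resulting mixture coherent. In the notebook, each such path would be passed as a constant `source_file` when adding the corresponding event.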