
Commit c93e70b

Small fixes

Author: Ethan Manilow
1 parent: 3942d90


10 files changed (+87 -46 lines changed)


book/approaches/deep/building_blocks.md

Lines changed: 7 additions & 7 deletions
@@ -638,11 +638,11 @@ When computing losses with spectrograms, we compare the spectrogram
 of the true source to the input spectrogram with the network's mask
 applied. Given some ground truth STFT for source $i$
 $S_i \in \mathbb{C}^{F\times T}$, an input
-mixture $X \in \mathbb{C}^{F\times T}$, and a net's estimated
+mixture $Y \in \mathbb{C}^{F\times T}$, and a net's estimated
 mask $\hat{M}_i \in \mathbb{R}^{F\times T}$ we compute the loss like
 
 $$
-\mathcal{L}_{\text{spec}} = \Big\| S_i - \hat{M}_i \odot |X| \Big\|_p,
+\mathcal{L}_{\text{spec}} = \Big\| |S_i| - \hat{M}_i \odot |Y| \Big\|_p,
 $$
 
 where$\odot$ denotes element-wise product adn $p$ is the _norm_ of
@@ -661,7 +661,7 @@ the _Magnitude Spectrum Approximation_ or MSA {cite}`weninger2014discriminativel
 This is just the same equation as above unmodified:
 
 $$
-\text{MSA} = |S_i| - \hat{M}_i \odot |X|
+\text{MSA} = |S_i| - \hat{M}_i \odot |Y|
 $$
 
 
@@ -671,16 +671,16 @@ the phase data by including it in our target calculation like so
 
 
 $$
-\text{tPSA} = \hat{M}_{i} \odot |X| - \operatorname{T}_{0}^{|X|}\left(|S_i| \odot \cos(\angle S_i - \angle X)\right)
+\text{tPSA} = \hat{M}_{i} \odot |Y| - \operatorname{T}_{0}^{|Y|}\left(|S_i| \odot \cos(\angle S_i - \angle Y)\right)
 $$
 
 
 where $\angle S_i$ is the true
-phase of Source i, $\angle X$ is the mixture phase, and
-$\operatorname{T}_{0}^{|X|}(x)= \min(\max(x,0),|X|)$ is a truncation
+phase of Source i, $\angle Y$ is the mixture phase, and
+$\operatorname{T}_{0}^{|Y|}(x)= \min(\max(x,0),|Y|)$ is a truncation
 function ensuring the target can be reached with a sigmoid activation function.
 Specifically, we incorporate constructive and destructive interference
-of the source and mixture into the target with the term $\cos(\angle S_i - \angle X)$.
+of the source and mixture into the target with the term $\cos(\angle S_i - \angle Y)$.
 
 
 ```{tip}

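The updated equations in this file compare magnitudes and use $Y$ for the mixture. Below is a minimal NumPy sketch of the spectrogram loss (identical to MSA) and the truncated phase-sensitive target, assuming complex STFT arrays `S` (source) and `Y` (mixture) of shape F×T and a real-valued mask `M_hat`. The function names are hypothetical, and taking the $p$-norm entrywise over all time-frequency bins is an assumption about how $\|\cdot\|_p$ is meant:

```python
import numpy as np

def spec_loss(S, Y, M_hat, p=1):
    """|| |S_i| - M_hat_i * |Y| ||_p, taken entrywise over all TF bins (same as MSA)."""
    residual = np.abs(S) - M_hat * np.abs(Y)
    return np.linalg.norm(residual.ravel(), ord=p)

def tpsa_target(S, Y):
    """T_0^{|Y|}(|S_i| * cos(angle(S_i) - angle(Y))): phase-sensitive target clipped to [0, |Y|]."""
    target = np.abs(S) * np.cos(np.angle(S) - np.angle(Y))
    # np.clip accepts an array upper bound, so this is min(max(x, 0), |Y|) per bin
    return np.clip(target, 0.0, np.abs(Y))

def tpsa_loss(S, Y, M_hat, p=1):
    """|| M_hat_i * |Y| - T_0^{|Y|}(...) ||_p, using the same entrywise norm."""
    residual = M_hat * np.abs(Y) - tpsa_target(S, Y)
    return np.linalg.norm(residual.ravel(), ord=p)

# Toy example: random complex "STFTs" of shape (F, T) = (4, 5)
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
N = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
Y = S + N
M_hat = np.clip(np.abs(S) / (np.abs(Y) + 1e-8), 0.0, 1.0)  # stand-in mask for illustration
print(spec_loss(S, Y, M_hat), tpsa_loss(S, Y, M_hat))
```

When the source is exactly out of phase with the mixture, the cosine term is negative and the truncation pushes the target to 0, which is the destructive-interference case the text describes.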
book/basics/evaluation.ipynb

Lines changed: 3 additions & 3 deletions
@@ -66,7 +66,7 @@
 "\\text{SAR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} + e_{\\text{interf}} + e_{\\text{noise}} \\|^2}{ \\| e_{\\text{artif}} \\|^2} \\right)\n",
 "$$\n",
 "\n",
-"This is usually interpreted as the amount of unwanted \\text{artif}acts a source \n",
+"This is usually interpreted as the amount of unwanted artifacts a source \n",
 "estimate has with relation to the true source.\n",
 "\n",
 "\n",
@@ -81,7 +81,7 @@
 "[\"bleed\", or \"leakage\"](https://en.wikipedia.org/wiki/Spill_(audio)). \n",
 "\n",
 "\n",
-"**Source-to-Interference Ratio (SIR)**\n",
+"**Source-to-Distortion Ratio (SDR)**\n",
 "\n",
 "$$\n",
 "\\text{SDR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} \\|^2}{ \\| e_{\\text{interf}} + e_{\\text{noise}} + e_{\\text{artif}} \\|^2} \\right)\n",
@@ -94,7 +94,7 @@
 "\n",
 "```{note}\n",
 "As of this writing (October 2020), the best reported SDR for singing\n",
-"voice separation is $7.24 dB$. {cite}`takahashi2020d3net`\n",
+"voice separation on MUSDB18 is $7.24 dB$. {cite}`takahashi2020d3net`\n",
 "```\n",
 "\n",
 "\n",

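The SAR and SDR definitions in this notebook can be computed directly once an estimate has been split into $s_{\text{target}}$, $e_{\text{interf}}$, $e_{\text{noise}}$, and $e_{\text{artif}}$. This sketch assumes those four components are already available as equal-length NumPy arrays; the decomposition itself (what BSS Eval does) is not shown, and the function names are hypothetical:

```python
import numpy as np

def _energy(x):
    """Squared Euclidean norm ||x||^2."""
    return float(np.sum(np.square(x)))

def sdr(s_target, e_interf, e_noise, e_artif):
    """SDR = 10 log10(||s_target||^2 / ||e_interf + e_noise + e_artif||^2)."""
    return 10 * np.log10(_energy(s_target) / _energy(e_interf + e_noise + e_artif))

def sar(s_target, e_interf, e_noise, e_artif):
    """SAR = 10 log10(||s_target + e_interf + e_noise||^2 / ||e_artif||^2)."""
    return 10 * np.log10(_energy(s_target + e_interf + e_noise) / _energy(e_artif))

# Toy components (assumed already decomposed): orthogonal for easy hand-checking
s_target = np.array([1.0, 0.0, 0.0, 0.0])
e_interf = np.array([0.0, 0.1, 0.0, 0.0])
e_noise = np.zeros(4)
e_artif = np.array([0.0, 0.0, 0.1, 0.0])
print(sdr(s_target, e_interf, e_noise, e_artif), sar(s_target, e_interf, e_noise, e_artif))
```

Both ratios are in dB, so doubling the error energy lowers the score by about 3 dB.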
book/basics/phase.md

Lines changed: 10 additions & 8 deletions
@@ -18,7 +18,9 @@ the source estimation.
 alt: Phase is an important component of sound.
 name: circle_phase
 ---
-An audio signal's phase is fundamental to representing the signal.
+Phase is the instantaneous amplitude of an audio signal. Phase is a fundamental part of representing
+the signal.
+Adapted from [Wikimedia](https://commons.wikimedia.org/wiki/File:Phase_shifter_using_IQ_modulator.gif).
 ```
 
 An audio signal, $y(t)$, composed of exactly one sine wave,
@@ -92,19 +94,19 @@ than at the lower frequencies.
 alt: Phase is sensitive to frequency and its initial starting point.
 name: phase_sensitivity
 ---
-Getting a snapshot of the phase (the black dotted vertical line) is very
+Getting a snapshot of the phase (the black dotted vertical lines) is very
 sensitive to the frequencies and initial phases of the sine waves. This
 is similar to what happens when take an STFT: many snapshots of sine waves
 with many frequencies and initial phase offsets.
 ```
 
 
-The gif above shows two sine waves. They both start at A440, or 440 Hz. But then the bottom one
-gradually changes frequency up an octave higher (880 Hz). The dotted black
-line shows a shapshot of the phase as the frequency changes. The initial phase also changes
-in the interval $[0.0, 2\pi]$. Notice how sensitive the snapshot is to changes
-in the frequency and initial phase.
-
+The gif above shows a sine wave with varying frequency and initial phase.
+The frequency starts at A440, or 440 Hz and gradually changes frequency up an octave higher (880
+Hz). The initial phase also changes in the interval $[0.0, 2\pi]$.
+The dotted black lines show two shapshots of the value of the sine wave as the frequency and
+initial phase both change.
+Notice how sensitive the snapshot are to changes in the frequency and initial phase.
 
 Another big difficulty when dealing with phase is that humans do not always
 perceive phase differences, _i.e._,

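Each snapshot in the updated figure is just a sample of $\sin(2\pi f t + \phi)$ at a fixed time. A minimal sketch, assuming the same 10 ms window and snapshot times ($t_{\max}/3$ and $2t_{\max}/3$) that `make_frame` in `common/image_maker.py` uses; `snapshot_values` is a hypothetical helper, not part of the book's code:

```python
import numpy as np

def snapshot_values(freq_hz, phase, max_t=0.01):
    """Sample sin(2*pi*f*t + phi) at the two snapshot times, max_t/3 and 2*max_t/3."""
    times = np.array([max_t / 3, 2 * max_t / 3])
    return np.sin(2 * np.pi * freq_hz * times + phase)

# Small changes in frequency or initial phase move both snapshot values
for freq_hz, phase in [(440.0, 0.0), (445.0, 0.0), (440.0, np.pi / 8)]:
    print(f"{freq_hz:6.1f} Hz, phase {phase / np.pi:.3f}pi:",
          np.round(snapshot_values(freq_hz, phase), 2))
```

Shifting the frequency by only 5 Hz, or the initial phase by $\pi/8$, already changes both printed values, which is the sensitivity the gif animates.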
book/basics/representations.md

Lines changed: 3 additions & 1 deletion
@@ -313,7 +313,8 @@ STFT, $\log{|X|^2} \in \mathbb{R}^{T \times F}$.
 
 ```{tip}
 Even though it is hard to visualize the detail in a magnitude or power spectrogram,
-most source separation algorithms work completely fine on these representations.
+some source separation algorithms work completely fine on these representations, while
+some need log spectrograms. Make sure to set your spectrograms correctly!
 ```
 
 
@@ -333,6 +334,7 @@ is being discussed when possible.
 ---
 alt: A visual comparison of linear-scaled vs mel-spaced y axies.
 name: mel_spectrograms
+scale: 35%
 ---
 A visual comparison of linear-scaled vs mel-spaced y axies.
 Lower frequencies have a larger representation in a mel-spaced spectrogram.

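The tip distinguishes magnitude, power, and log spectrograms. Here is a small NumPy-only sketch of all three in the $T \times F$ layout used in this file; the hand-rolled Hann-windowed `stft` helper, the 512-sample frame, and the 128-sample hop are illustrative assumptions, not what the book's libraries use:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Minimal STFT: Hann-windowed frames -> rfft. Returns a complex (T, F) array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape (n_frames, n_fft // 2 + 1)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)  # one second of A440

X = stft(x)
magnitude = np.abs(X)              # |X|
power = magnitude ** 2             # |X|^2
log_power = np.log(power + 1e-10)  # log|X|^2; the small constant avoids log(0)
print(X.shape, magnitude.max(), log_power.min())
```

The constant added before the log is a common guard for silent bins; its exact value here is arbitrary.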
book/data/musdb18.ipynb

Lines changed: 1 addition & 1 deletion
@@ -520,4 +520,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}

book/first_steps/byo_hpss.ipynb

Lines changed: 33 additions & 1 deletion
@@ -238,7 +238,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"And, as always, we can make an interactive version of this. Try whistling and clapping\n",
+"And, as always, we can make an interactive version of this. Try recording yourself whistling and clapping\n",
 "at the same time and see how the results sound!"
 ]
 },
@@ -272,6 +272,38 @@
 "my_hpss.interact(share=True, source='microphone')"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"If you want to upload a song, you can also remove `source='microphone'` in the `interact()` call:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 1,
+"metadata": {},
+"outputs": [
+{
+"ename": "NameError",
+"evalue": "name 'my_hpss' is not defined",
+"output_type": "error",
+"traceback": [
+"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+"\u001b[0;32m<ipython-input-1-af6b4bc55694>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# interactively in Colab or Jupyter Notebook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mmy_hpss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minteract\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshare\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+"\u001b[0;31mNameError\u001b[0m: name 'my_hpss' is not defined"
+]
+}
+],
+"source": [
+"%%capture\n",
+"# Comment out the line above to run this cell\n",
+"# interactively in Colab or Jupyter Notebook\n",
+"\n",
+"my_hpss.interact(share=True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

book/first_steps/repetition.ipynb

Lines changed: 3 additions & 0 deletions
@@ -419,9 +419,12 @@
 "outputs": [],
 "source": [
 "# Will make AudioSignal objects after we're run the algorithm\n",
+"repet = nussl.separation.primitive.Repet(mix)\n",
+"repet.run()\n",
 "repet_bg, repet_fg = repet.make_audio_signals()\n",
 "\n",
 "# Will run the algorithm and return AudioSignals in one step\n",
+"repet = nussl.separation.primitive.Repet(mix)\n",
 "repet_bg, repet_fg = repet()"
 ]
 },

book/images/basics/circle_phase.gif

-43.5 KB
-2.47 MB

common/image_maker.py

Lines changed: 27 additions & 25 deletions
@@ -284,6 +284,7 @@ def plt_set(t):
 
 # Setting the y axis ticks at (0, 180, 360, 540, 720) degree phase
 ax2.set_xticks([0, 180, 360, 540, 720])
+# ax2.set_xlim(0, 720)
 
 # Setting the position of the x and y axis
 ax2.spines['left'].set_position(('axes', 0.045))
@@ -307,8 +308,8 @@ def plt_set(t):
 0.0174533 * t)
 
 # plotting I+Q arrow that moves along to show the current phase
-ax2.arrow(t, 0, 0, c1, length_includes_head='True', head_width=10, head_length=0.07,
-color='g')
+# ax2.arrow(t, 0, 0, c1, length_includes_head='True', head_width=10, head_length=0.07,
+# color='g')
 
 # plotting I and Q amplitude arrows at position 180° and 90° respectively
 # ax2.arrow(180, 0, 0, 1 * np.cos(0.0174533 * t) * np.cos(0.0174533 * 180),
@@ -340,54 +341,55 @@ def phase_intersect():
 def make_frame(f2, phi_):
 plt.style.use('seaborn')
 
-fig = plt.figure(figsize=(10, 5))
+fig = plt.figure(figsize=(9, 3))
 
 max_t = 0.01
 time = np.linspace(0.0, max_t, 2000)
-intersect = max_t / 2
-f1 = 440.0 # A440
-sin1 = np.sin(2 * np.pi * f1 * time)
+intersect1 = max_t / 3
+intersect2 = intersect1 * 2
 
 sin2 = np.sin(2 * np.pi * f2 * time + phi_)
 
-plt.subplot(211)
-plt.plot(time, sin1)
-plt.axvline(x=intersect, ls='--', color='black', lw=1.0)
-sin1_val = np.sin(2 * np.pi * f1 * intersect)
-plt.text(intersect + 0.0001, 0, f'{sin1_val:+0.2f}')
-plt.title(f'Frequency {f1:0.2f} Hz, Initial Phase 0.00' + r'$\pi$')
-plt.ylabel('Amplitude')
-plt.ylim([-1.1, 1.1])
-plt.xlim([-0.00025, 0.01025])
+props = dict(boxstyle='round', facecolor='wheat', alpha=1.0)
 
-plt.subplot(212)
 plt.plot(time, sin2, 'g')
-plt.axvline(x=intersect, ls='--', color='black', lw=1.0)
-sin2_val = np.sin(2 * np.pi * f2 * intersect + phi_)
-plt.text(intersect + 0.0001, 0, f'{sin2_val:+0.2f}')
+plt.axvline(x=intersect1, ls='--', color='black', lw=1.0)
+sin2_val1 = np.sin(2 * np.pi * f2 * intersect1 + phi_)
+plt.text(intersect1 - 0.00045, -1.275, 'Snapshot 1', bbox=props)
+plt.gcf().text(0.85, 0.65, 'Value at:')
+plt.gcf().text(0.85, 0.55, f'Snapshot 1 = {sin2_val1:+0.2f}')
+
+plt.axvline(x=intersect2, ls='--', color='black', lw=1.0)
+sin2_val2 = np.sin(2 * np.pi * f2 * intersect2 + phi_)
+plt.text(intersect2 - 0.00045, -1.275, 'Snapshot 2', bbox=props)
+plt.gcf().text(0.85, 0.45, f'Snapshot 2 = {sin2_val2:+0.2f}')
+
 plt.title(f'Frequency {f2:0.2f} Hz, Initial Phase {phi_ / np.pi:0.2f}' + r'$\pi$')
 plt.ylabel('Amplitude')
 plt.xlabel('Time (s)')
-plt.ylim([-1.1, 1.1])
-plt.xlim([-0.00025, 0.01025])
+plt.ylim([-1.4, 1.1])
+plt.xlim([0.0, 0.01])
+plt.subplots_adjust(right=0.85)
 
 for ax in fig.axes:
 ax.label_outer()
+plt.tight_layout(rect=[0, 0, .85, 1.0])
 # plt.show()
 
 # f2 = 523.25 # C above A440
 # make_frame(f2, 0.0)
 # return
 f2_min = 440.0
-f2_max = 659.25
 f2_max = 880.0
 f2_steps = np.hstack([np.linspace(f2_min, f2_max, 20),
+np.ones(20) * f2_max,
 np.linspace(f2_max, f2_min, 20),
-np.ones(50) * f2_min])
-phi_steps = np.hstack([np.zeros(40),
+np.ones(20) * f2_min])
+phi_steps = np.hstack([np.zeros(20),
 np.linspace(0.0, 2 * np.pi, 20),
+np.ones(20) * 2 * np.pi,
 np.linspace(2 * np.pi, 0.0, 20),
-np.zeros(10)])
+np.zeros(20)])
 
 frames = [make_frame(f, p) for f, p in zip(f2_steps, phi_steps)]
 gif.save(frames, 'book/images/basics/phase_sensitivity.gif', duration=3.0)

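To see which frequency and initial phase each frame of the sensitivity gif receives, the two schedules from the updated `phase_intersect()` can be rebuilt without any plotting. This sketch copies only the array construction from the diff; the `pairs` name is illustrative:

```python
import numpy as np

f2_min, f2_max = 440.0, 880.0

# Frequency schedule: sweep up, hold at 880 Hz, sweep down, hold at 440 Hz
f2_steps = np.hstack([np.linspace(f2_min, f2_max, 20),
                      np.ones(20) * f2_max,
                      np.linspace(f2_max, f2_min, 20),
                      np.ones(20) * f2_min])

# Initial-phase schedule: hold at 0, sweep to 2*pi, hold, sweep back, hold at 0
phi_steps = np.hstack([np.zeros(20),
                       np.linspace(0.0, 2 * np.pi, 20),
                       np.ones(20) * 2 * np.pi,
                       np.linspace(2 * np.pi, 0.0, 20),
                       np.zeros(20)])

# zip() stops at the shorter input, so this is how many frames get rendered
pairs = list(zip(f2_steps, phi_steps))
print(len(f2_steps), len(phi_steps), len(pairs))  # prints: 80 100 80
```

With these lengths, `zip` yields 80 frames: the phase sweeps up while the frequency holds at 880 Hz and sweeps back down while it holds at 440 Hz. The trailing `np.zeros(20)` in `phi_steps` is never reached.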