Small fixes

Ethan Manilow · Ethan Manilow · commit c93e70b2746a · 2020-10-09T23:56:53.000-05:00
diff --git a/book/approaches/deep/building_blocks.md b/book/approaches/deep/building_blocks.md
@@ -638,11 +638,11 @@ When computing losses with spectrograms, we compare the spectrogram
 of the true source to the input spectrogram with the network's mask
 applied. Given some ground truth STFT for source $i$
 $S_i \in \mathbb{C}^{F\times T}$, an input
-mixture $X \in \mathbb{C}^{F\times T}$, and a net's estimated
+mixture $Y \in \mathbb{C}^{F\times T}$, and a net's estimated
 mask $\hat{M}_i \in \mathbb{R}^{F\times T}$ we compute the loss like
 
 $$
-\mathcal{L}_{\text{spec}} = \Big\| S_i - \hat{M}_i \odot |X| \Big\|_p,
+\mathcal{L}_{\text{spec}} = \Big\| |S_i| - \hat{M}_i \odot |Y| \Big\|_p,
 $$
 
 where $\odot$ denotes element-wise product and $p$ is the _norm_ of
@@ -661,7 +661,7 @@ the _Magnitude Spectrum Approximation_ or MSA {cite}`weninger2014discriminativel
 This is just the same equation as above unmodified:
 
 $$
-\text{MSA} =  |S_i| - \hat{M}_i \odot |X|
+\text{MSA} =  |S_i| - \hat{M}_i \odot |Y|
 $$
 
 
@@ -671,16 +671,16 @@ the phase data by including it in our target calculation like so
 
 
 $$
-\text{tPSA} = \hat{M}_{i} \odot |X|  - \operatorname{T}_{0}^{|X|}\left(|S_i| \odot \cos(\angle S_i - \angle X)\right)
+\text{tPSA} = \hat{M}_{i} \odot |Y|  - \operatorname{T}_{0}^{|Y|}\left(|S_i| \odot \cos(\angle S_i - \angle Y)\right)
 $$
 
 
 where $\angle S_i$ is the true
-phase of Source i, $\angle X$ is the mixture phase, and
-$\operatorname{T}_{0}^{|X|}(x)= \min(\max(x,0),|X|)$ is a truncation
+phase of Source i, $\angle Y$ is the mixture phase, and
+$\operatorname{T}_{0}^{|Y|}(x)= \min(\max(x,0),|Y|)$ is a truncation
 function ensuring the target can be reached with a sigmoid activation function.
 Specifically, we incorporate constructive and destructive interference 
-of the source and mixture into the target with the term $\cos(\angle S_i - \angle X)$.
+of the source and mixture into the target with the term $\cos(\angle S_i - \angle Y)$.
 
 
 ```{tip}
diff --git a/book/basics/evaluation.ipynb b/book/basics/evaluation.ipynb
@@ -66,7 +66,7 @@
     "\\text{SAR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} + e_{\\text{interf}} + e_{\\text{noise}} \\|^2}{ \\| e_{\\text{artif}} \\|^2} \\right)\n",
     "$$\n",
     "\n",
-    "This is usually interpreted as the amount of unwanted \\text{artif}acts a source \n",
+    "This is usually interpreted as the amount of unwanted artifacts a source \n",
     "estimate has with relation to the true source.\n",
     "\n",
     "\n",
@@ -81,7 +81,7 @@
     "[\"bleed\", or \"leakage\"](https://en.wikipedia.org/wiki/Spill_(audio)). \n",
     "\n",
     "\n",
-    "**Source-to-Interference Ratio (SIR)**\n",
+    "**Source-to-Distortion Ratio (SDR)**\n",
     "\n",
     "$$\n",
     "\\text{SDR} := 10 \\log_{10} \\left( \\frac{\\| s_{\\text{target}} \\|^2}{ \\| e_{\\text{interf}} + e_{\\text{noise}} + e_{\\text{artif}} \\|^2} \\right)\n",
@@ -94,7 +94,7 @@
     "\n",
     "```{note}\n",
     "As of this writing (October 2020), the best reported SDR for singing\n",
-    "voice separation is $7.24 dB$. {cite}`takahashi2020d3net`\n",
+    "voice separation on MUSDB18 is $7.24 dB$. {cite}`takahashi2020d3net`\n",
     "```\n",
     "\n",
     "\n",
diff --git a/book/basics/phase.md b/book/basics/phase.md
@@ -18,7 +18,9 @@ the source estimation.
 alt: Phase is an important component of sound.
 name: circle_phase
 ---
-An audio signal's phase is fundamental to representing the signal.
+Phase determines the instantaneous value of a sinusoid within its cycle. Phase is a fundamental part of representing
+the signal.
+Adapted from [Wikimedia](https://commons.wikimedia.org/wiki/File:Phase_shifter_using_IQ_modulator.gif).
 ```
 
 An audio signal, $y(t)$, composed of exactly one sine wave,
@@ -92,19 +94,19 @@ than at the lower frequencies.
 alt: Phase is sensitive to frequency and its initial starting point.
 name: phase_sensitivity
 ---
-Getting a snapshot of the phase (the black dotted vertical line) is very
+Getting a snapshot of the phase (the black dotted vertical lines) is very
 sensitive to the frequencies and initial phases of the sine waves. This
 is similar to what happens when we take an STFT: many snapshots of sine waves
 with many frequencies and initial phase offsets.
 ```
 
 
-The gif above shows two sine waves. They both start at A440, or 440 Hz. But then the bottom one
-gradually changes frequency up an octave higher (880 Hz). The dotted black
-line shows a shapshot of the phase as the frequency changes. The initial phase also changes
-in the interval $[0.0, 2\pi]$. Notice how sensitive the snapshot is to changes
-in the frequency and initial phase.
-
+The gif above shows a sine wave with varying frequency and initial phase.
+The frequency starts at A440 (440 Hz) and gradually rises one octave to 880 Hz.
+The initial phase also changes in the interval $[0.0, 2\pi]$.
+The dotted black lines show two snapshots of the value of the sine wave as the frequency and
+initial phase both change.
+Notice how sensitive the snapshots are to changes in the frequency and initial phase.
 
 Another big difficulty when dealing with phase is that humans do not always
 perceive phase differences, _i.e._,
diff --git a/book/basics/representations.md b/book/basics/representations.md
@@ -313,7 +313,8 @@ STFT, $\log{|X|^2} \in \mathbb{R}^{T \times F}$.
 
 ```{tip}
 Even though it is hard to visualize the detail in a magnitude or power spectrogram,
-most source separation algorithms work completely fine on these representations.
+some source separation algorithms work completely fine on these representations, while
+others need log spectrograms. Make sure to use the representation your algorithm expects!
 ```
 
 
@@ -333,6 +334,7 @@ is being discussed when possible.
 ---
 alt: A visual comparison of linear-scaled vs mel-spaced y axes.
 name: mel_spectrograms
+scale: 35%
 ---
 A visual comparison of linear-scaled vs mel-spaced y axes.
 Lower frequencies have a larger representation in a mel-spaced spectrogram.
diff --git a/book/data/musdb18.ipynb b/book/data/musdb18.ipynb
@@ -520,4 +520,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
diff --git a/book/first_steps/byo_hpss.ipynb b/book/first_steps/byo_hpss.ipynb
@@ -238,7 +238,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "And, as always, we can make an interactive version of this. Try whistling and clapping\n",
+    "And, as always, we can make an interactive version of this. Try recording yourself whistling and clapping\n",
     "at the same time and see how the results sound!"
    ]
   },
@@ -272,6 +272,38 @@
     "my_hpss.interact(share=True, source='microphone')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you want to upload a song, you can also remove `source='microphone'` in the `interact()` call:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "NameError",
+     "evalue": "name 'my_hpss' is not defined",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-1-af6b4bc55694>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0;31m# interactively in Colab or Jupyter Notebook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mmy_hpss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minteract\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshare\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;31mNameError\u001b[0m: name 'my_hpss' is not defined"
+     ]
+    }
+   ],
+   "source": [
+    "%%capture\n",
+    "# Comment out the line above to run this cell\n",
+    "# interactively in Colab or Jupyter Notebook\n",
+    "\n",
+    "my_hpss.interact(share=True)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/book/first_steps/repetition.ipynb b/book/first_steps/repetition.ipynb
@@ -419,9 +419,12 @@
    "outputs": [],
    "source": [
     "# Will make AudioSignal objects after we've run the algorithm\n",
+    "repet = nussl.separation.primitive.Repet(mix)\n",
+    "repet.run()\n",
     "repet_bg, repet_fg = repet.make_audio_signals()\n",
     "\n",
     "# Will run the algorithm and return AudioSignals in one step\n",
+    "repet = nussl.separation.primitive.Repet(mix)\n",
     "repet_bg, repet_fg = repet()"
    ]
   },
diff --git a/book/images/basics/circle_phase.gif b/book/images/basics/circle_phase.gif
diff --git a/book/images/basics/phase_sensitivity.gif b/book/images/basics/phase_sensitivity.gif
diff --git a/common/image_maker.py b/common/image_maker.py
@@ -284,6 +284,7 @@ def plt_set(t):
 
         # Setting the x axis ticks at (0, 180, 360, 540, 720) degree phase
         ax2.set_xticks([0, 180, 360, 540, 720])
+        # ax2.set_xlim(0, 720)
 
         # Setting the position of the x and y axis
         ax2.spines['left'].set_position(('axes', 0.045))
@@ -307,8 +308,8 @@ def plt_set(t):
             0.0174533 * t)
 
         # plotting I+Q arrow that moves along to show the current phase
-        ax2.arrow(t, 0, 0, c1, length_includes_head='True', head_width=10, head_length=0.07,
-                  color='g')
+        # ax2.arrow(t, 0, 0, c1, length_includes_head='True', head_width=10, head_length=0.07,
+        #           color='g')
 
         # plotting I and Q amplitude arrows at position 180° and 90° respectively
         # ax2.arrow(180, 0, 0, 1 * np.cos(0.0174533 * t) * np.cos(0.0174533 * 180),
@@ -340,54 +341,55 @@ def phase_intersect():
     def make_frame(f2, phi_):
         plt.style.use('seaborn')
 
-        fig = plt.figure(figsize=(10, 5))
+        fig = plt.figure(figsize=(9, 3))
 
         max_t = 0.01
         time = np.linspace(0.0, max_t, 2000)
-        intersect = max_t / 2
-        f1 = 440.0  # A440
-        sin1 = np.sin(2 * np.pi * f1 * time)
+        intersect1 = max_t / 3
+        intersect2 = intersect1 * 2
 
         sin2 = np.sin(2 * np.pi * f2 * time + phi_)
 
-        plt.subplot(211)
-        plt.plot(time, sin1)
-        plt.axvline(x=intersect, ls='--', color='black', lw=1.0)
-        sin1_val = np.sin(2 * np.pi * f1 * intersect)
-        plt.text(intersect + 0.0001, 0, f'{sin1_val:+0.2f}')
-        plt.title(f'Frequency {f1:0.2f} Hz, Initial Phase 0.00' + r'$\pi$')
-        plt.ylabel('Amplitude')
-        plt.ylim([-1.1, 1.1])
-        plt.xlim([-0.00025, 0.01025])
+        props = dict(boxstyle='round', facecolor='wheat', alpha=1.0)
 
-        plt.subplot(212)
         plt.plot(time, sin2, 'g')
-        plt.axvline(x=intersect, ls='--', color='black', lw=1.0)
-        sin2_val = np.sin(2 * np.pi * f2 * intersect + phi_)
-        plt.text(intersect + 0.0001, 0, f'{sin2_val:+0.2f}')
+        plt.axvline(x=intersect1, ls='--', color='black', lw=1.0)
+        sin2_val1 = np.sin(2 * np.pi * f2 * intersect1 + phi_)
+        plt.text(intersect1 - 0.00045, -1.275, 'Snapshot 1', bbox=props)
+        plt.gcf().text(0.85, 0.65, 'Value at:')
+        plt.gcf().text(0.85, 0.55, f'Snapshot 1 = {sin2_val1:+0.2f}')
+
+        plt.axvline(x=intersect2, ls='--', color='black', lw=1.0)
+        sin2_val2 = np.sin(2 * np.pi * f2 * intersect2 + phi_)
+        plt.text(intersect2 - 0.00045, -1.275, 'Snapshot 2', bbox=props)
+        plt.gcf().text(0.85, 0.45, f'Snapshot 2 = {sin2_val2:+0.2f}')
+
         plt.title(f'Frequency {f2:0.2f} Hz, Initial Phase {phi_ / np.pi:0.2f}' + r'$\pi$')
         plt.ylabel('Amplitude')
         plt.xlabel('Time (s)')
-        plt.ylim([-1.1, 1.1])
-        plt.xlim([-0.00025, 0.01025])
+        plt.ylim([-1.4, 1.1])
+        plt.xlim([0.0, 0.01])
+        plt.subplots_adjust(right=0.85)
 
         for ax in fig.axes:
             ax.label_outer()
+        plt.tight_layout(rect=[0, 0, .85, 1.0])
         # plt.show()
 
     # f2 = 523.25  # C above A440
     # make_frame(f2, 0.0)
     # return
     f2_min = 440.0
-    f2_max = 659.25
     f2_max = 880.0
     f2_steps = np.hstack([np.linspace(f2_min, f2_max, 20),
+                          np.ones(20) * f2_max,
                          np.linspace(f2_max, f2_min, 20),
-                         np.ones(50) * f2_min])
-    phi_steps = np.hstack([np.zeros(40),
+                         np.ones(20) * f2_min])
+    phi_steps = np.hstack([np.zeros(20),
                           np.linspace(0.0, 2 * np.pi, 20),
+                           np.ones(20) * 2 * np.pi,
                           np.linspace(2 * np.pi, 0.0, 20),
-                          np.zeros(10)])
+                          np.zeros(20)])
 
     frames = [make_frame(f, p) for f, p in zip(f2_steps, phi_steps)]
     gif.save(frames, 'book/images/basics/phase_sensitivity.gif', duration=3.0)
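
The loss definitions touched in `book/approaches/deep/building_blocks.md` above can be sketched in NumPy as follows. This is only an illustrative sketch, not code from the book or from nussl: the function names (`truncate`, `msa_loss`, `tpsa_loss`) are hypothetical, the norm is assumed to be an entry-wise $p$-norm over the flattened spectrogram, and the toy mixture is random data.

```python
import numpy as np


def truncate(x, upper):
    """Element-wise T_0^{upper}(x) = min(max(x, 0), upper)."""
    return np.minimum(np.maximum(x, 0.0), upper)


def msa_loss(S, Y, mask, p=1):
    """|| |S| - mask * |Y| ||_p, taken entry-wise (an assumption)."""
    residual = np.abs(S) - mask * np.abs(Y)
    return np.linalg.norm(residual.ravel(), ord=p)


def tpsa_loss(S, Y, mask, p=1):
    """|| mask * |Y| - T_0^{|Y|}(|S| * cos(angle(S) - angle(Y))) ||_p."""
    phase_term = np.cos(np.angle(S) - np.angle(Y))
    target = truncate(np.abs(S) * phase_term, np.abs(Y))
    residual = mask * np.abs(Y) - target
    return np.linalg.norm(residual.ravel(), ord=p)


# Toy complex "STFTs" with F frequency bins and T time frames (random data).
rng = np.random.default_rng(0)
F, T = 4, 5
S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
other = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
Y = S + other  # the mixture is the sum of the sources

# The mask that exactly reproduces the truncated phase-sensitive target.
ideal_mask = truncate(
    np.abs(S) * np.cos(np.angle(S) - np.angle(Y)), np.abs(Y)
) / np.abs(Y)

print("MSA loss with ideal tPSA mask: ", msa_loss(S, Y, ideal_mask))
print("tPSA loss with ideal tPSA mask:", tpsa_loss(S, Y, ideal_mask))
```

Because of the truncation, the ideal mask always lies in $[0, 1]$, which is why it can be reached with a sigmoid output; the tPSA loss is zero for that mask, while the MSA loss generally is not.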