
Commit

Update Jupyter Notebooks introduction part and add README
xiaoyongzhumsft committed Jan 31, 2018
1 parent 4ae0c5b commit c9826e7
Showing 11 changed files with 4,537 additions and 148 deletions.

Large diffs are not rendered by default.


@@ -6,7 +6,7 @@
"collapsed": true
},
"source": [
"# Pre-process UrbanSound8K audio features"
"# Deep Learning for Audio Part 2a - Pre-process UrbanSound Dataset"
]
},
{
@@ -20,18 +20,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the Jupyter Notebook for the blog post **Hearing AI: Getting Started with Deep Learning for Audio on Azure**. In this jupyter notebook, we will process the audio files and extract the useful features that will be fed into a Convolutional Neural Network. \n",
"In this Jupyter Notebook, we will process the audio files and extract the features that will be fed into a Convolutional Neural Network. \n",
"\n",
"\n",
"\n",
"In this Jupyter Notebook, we will train and predict on [UrbanSound8K](https://serv.cusp.nyu.edu/projects/urbansounddataset/download-urbansound8k.html) dataset. There are a few published benchmarks, notebly mentioned in the papers below:\n",
"We will train and predict on the [UrbanSound8K](https://serv.cusp.nyu.edu/projects/urbansounddataset/download-urbansound8k.html) dataset. There are a few published benchmarks, reported in the papers below:\n",
"\n",
"- [Environmental sound classification with convolutional neural networks](http://karol.piczak.com/papers/Piczak2015-ESC-ConvNet.pdf) by Karol J Piczak.\n",
"- [Deep convolutional neural networks and data augmentation for environmental sound classification](https://arxiv.org/abs/1608.04363) by Justin Salamon and Juan Pablo Bello\n",
"- [Learning from Between-class Examples for Deep Sound Recognition](https://arxiv.org/abs/1711.10282) by Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada\n",
"\n",
"\n",
"Amonog all of them, one of the state-of-art result is from the last paper by Tokozume et al., where the best error rate they get is 21.7%. In this tutorial we will show you how to build a neural network that can achieve the state-of-art performance using Azure.\n",
"The state-of-the-art result is from the last paper, by Tokozume et al., where the best error rate achieved is 21.7%. In this tutorial we will show you how to build a neural network that can achieve state-of-the-art performance using Azure.\n",
"\n",
"\n",
"This Jupyter Notebook borrows some of the pre-processing code from the GitHub repo here: http://aqibsaeed.github.io/2016-09-24-urban-sound-classification-part-2/, but with a lot of modifications. It is tested with **Python 3.5**, **Keras 2.1.2** and **Tensorflow 1.4.0**."
@@ -48,7 +48,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use librosa as our audio processing library. For more details on librosa, please refer to the librosa documenent [here](https://librosa.github.io/librosa/tutorial.html). We also need to install a bunch of libraries. Most of them are python packages, but you still need to install a few audio processing library in apt-get fashion, which librosa depends on.\n",
"We will use librosa as our audio processing library. For more details on librosa, please refer to the librosa documentation [here](https://librosa.github.io/librosa/tutorial.html). We also need to install a number of libraries. Most of them are Python packages, but you may still need to install a few audio processing libraries that librosa depends on using apt-get:\n",
"\n",
"`sudo apt-get install -y --no-install-recommends \\\n",
" openmpi-bin \\\n",
@@ -59,11 +59,9 @@
" pkg-config`\n",
" \n",
" \n",
"We also need to install librosa in pip:\n",
"We also need to install librosa and a few other deep learning libraries with pip:\n",
"\n",
"`pip install librosa pydot graphviz keras tensorflow-gpu`\n",
"\n",
"If you are running DSVM, most of them should already be covered."
"`pip install librosa pydot graphviz keras tensorflow-gpu`\n"
]
},
{
@@ -89,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 1,
"metadata": {
"collapsed": true,
"scrolled": true
@@ -112,9 +110,8 @@
"parent_dir = \"/mnt/UrbanSound8K/audio\"\n",
"\n",
"# specify bands that you want to use. This is also the \"height\" of the spectrogram image\n",
"n_bands = 140\n",
"n_bands = 150\n",
"# specify frames that you want to use. This is also the \"width\" of the spectrogram image\n",
"# TODO: we should really specify window_size and infer number of frames as output, but anyway...\n",
"n_frames = 150\n",
"\n",
"\n",
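The feature shapes printed later in the notebook output (e.g. `(823, 150, 150, 3)`) show each clip becoming an `n_bands` x `n_frames` "image" with 3 channels. How the 3 channels are built is not visible in this diff; one common convention, sketched here purely as an assumption (the function name `to_three_channel` is hypothetical), is to stack the log-mel map with its first and second time-differences:

```python
import numpy as np

def to_three_channel(logmel):
    # Hypothetical 3-channel stack: log-mel plus first/second differences
    # along the time axis. The notebook's actual channel construction is
    # not shown in this diff; this is one common choice, not the author's code.
    delta = np.diff(logmel, axis=1, prepend=logmel[:, :1])
    delta2 = np.diff(delta, axis=1, prepend=delta[:, :1])
    return np.stack([logmel, delta, delta2], axis=-1)

x = np.zeros((150, 150))          # n_bands x n_frames
print(to_three_channel(x).shape)  # (150, 150, 3)
```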
@@ -140,7 +137,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 2,
"metadata": {
"collapsed": true
},
@@ -158,7 +155,6 @@
" fs = target_fs\n",
" return audio, fs\n",
"\n",
"# TODO: chage function name - this is the transpose of the original\n",
"def pad_trunc_seq_rewrite(x, max_len):\n",
" \"\"\"Pad or truncate a sequence data to a fixed length.\n",
"\n",
@@ -172,7 +168,6 @@
"\n",
" if x.shape[1] < max_len:\n",
" pad_shape = (x.shape[0], max_len - x.shape[1])\n",
" # TODO: move this to config file\n",
" pad = np.ones(pad_shape) * np.log(1e-8)\n",
" #x_new = np.concatenate((x, pad), axis=1)\n",
" x_new = np.hstack((x, pad))\n",
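Pieced together from the visible lines, `pad_trunc_seq_rewrite` pads short spectrograms with `log(1e-8)` (a log-domain near-silence filler, matching the `np.log(x + 1e-8)` compression used later) and truncates long ones. The truncation branch is collapsed out of this diff, so its exact form below is an assumption:

```python
import numpy as np

def pad_trunc_seq_rewrite(x, max_len):
    """Pad or truncate a (bands, frames) array along its time axis to max_len."""
    if x.shape[1] < max_len:
        pad_shape = (x.shape[0], max_len - x.shape[1])
        # log-domain near-silence filler, consistent with np.log(x + 1e-8)
        pad = np.ones(pad_shape) * np.log(1e-8)
        x_new = np.hstack((x, pad))
    else:
        # assumed truncation branch (not visible in this diff)
        x_new = x[:, :max_len]
    return x_new

print(pad_trunc_seq_rewrite(np.zeros((150, 98)), 150).shape)  # (150, 150)
```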
@@ -228,7 +223,6 @@
" return_onesided=True,\n",
" mode='magnitude')\n",
" x = np.dot(x.T, melW.T)\n",
" # TODO: place 1e-8 into config file\n",
" x = np.log(x + 1e-8)\n",
" x = x.astype(np.float32).T\n",
" x = pad_trunc_seq_rewrite(x, frames)\n",
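The visible cell computes a one-sided magnitude spectrogram, projects it onto a mel filterbank `melW`, applies log compression, and pads/truncates to a fixed width. A self-contained sketch of that pipeline follows; the random `melW` is only a stand-in for the notebook's real mel filterbank (built elsewhere, likely with librosa), and the spectrogram parameters are scipy defaults, not necessarily the notebook's:

```python
import numpy as np
from scipy import signal

def extract_logmel(audio, fs, n_bands=150, n_frames=150, melW=None):
    # One-sided magnitude spectrogram, as in the notebook cell
    f, t, x = signal.spectrogram(audio, fs, return_onesided=True,
                                 mode='magnitude')
    if melW is None:
        # Stand-in filterbank; the notebook builds a real mel filterbank
        melW = np.random.rand(n_bands, x.shape[0])
    x = np.dot(x.T, melW.T)       # project onto mel bands
    x = np.log(x + 1e-8)          # log compression; 1e-8 avoids log(0)
    x = x.astype(np.float32).T    # (bands, frames)
    # pad/truncate to fixed width, as pad_trunc_seq_rewrite does
    if x.shape[1] < n_frames:
        pad = np.ones((x.shape[0], n_frames - x.shape[1])) * np.log(1e-8)
        x = np.hstack((x, pad))
    else:
        x = x[:, :n_frames]
    return x

out = extract_logmel(np.random.randn(22050), 22050)
print(out.shape)  # (150, 150)
```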
@@ -283,67 +277,65 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Saving fold2\n",
"Saving fold1\n",
"Saving fold7\n",
"Saving fold5\n",
"Saving fold3\n",
"Saving fold4\n",
"Saving fold7\n",
"Saving fold6\n",
"Saving fold2\n",
"Saving fold9\n",
"Saving fold10\n",
"Saving fold8\n",
"Saving fold6\n",
"File /mnt/UrbanSound8K/audio/fold1/87275-1-1-0.wav is shorter than window size - DISCARDING - look into making the window larger.\n",
"Saving fold5\n",
"Saving fold10\n",
"Saving fold4\n",
"File /mnt/UrbanSound8K/audio/fold1/87275-1-2-0.wav is shorter than window size - DISCARDING - look into making the window larger.\n",
"File /mnt/UrbanSound8K/audio/fold1/87275-1-1-0.wav is shorter than window size - DISCARDING - look into making the window larger.\n",
"Features of fold6 = (823, 150, 150, 3)\n",
"Labels of fold6 = (823, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold6_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold6_y.npy\n",
"Features of fold3 = (925, 140, 150, 3)\n",
"Features of fold7 = (838, 150, 150, 3)\n",
"Labels of fold7 = (838, 10)\n",
"Features of fold3 = (925, 150, 150, 3)\n",
"Labels of fold3 = (925, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_y.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold3_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold3_y.npy\n",
"Features of fold8 = (806, 140, 150, 3)\n",
"Features of fold8 = (806, 150, 150, 3)\n",
"Labels of fold8 = (806, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold8_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold8_y.npy\n",
"Features of fold7 = (838, 150, 150, 3)\n",
"Labels of fold7 = (838, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_y.npy\n",
"Features of fold5 = (936, 150, 150, 3)\n",
"Labels of fold5 = (936, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_y.npy\n",
"Features of fold9 = (816, 150, 150, 3)\n",
"Labels of fold9 = (816, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_y.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_y.npy\n",
"Features of fold10 = (837, 150, 150, 3)\n",
"Labels of fold10 = (837, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_y.npy\n",
"Features of fold1 = (871, 150, 150, 3)\n",
"Labels of fold1 = (871, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_y.npy\n",
"Features of fold9 = (816, 150, 150, 3)\n",
"Labels of fold9 = (816, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_y.npy\n",
"Features of fold5 = (936, 150, 150, 3)\n",
"Labels of fold5 = (936, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_y.npy\n",
"Features of fold2 = (888, 150, 150, 3)\n",
"Labels of fold2 = (888, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold2_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold2_y.npy\n",
"Features of fold1 = (871, 150, 150, 3)\n",
"Labels of fold1 = (871, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_y.npy\n",
"Features of fold4 = (990, 150, 150, 3)\n",
"Labels of fold4 = (990, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold4_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold4_y.npy\n",
"CPU times: user 765 ms, sys: 283 ms, total: 1.05 s\n",
"Wall time: 11min 19s\n"
"CPU times: user 823 ms, sys: 208 ms, total: 1.03 s\n",
"Wall time: 20min 42s\n"
]
}
],
"source": [
"% % time\n",
"%%time\n",
"# use this to process the audio files into numpy arrays\n",
"def save_folds(data_dir, k, bands, frames):\n",
" fold_name = 'fold' + str(k)\n",
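The body of `save_folds` is mostly collapsed in this diff; judging from the `Saved .../fold1_x.npy` lines in the output, it dumps each fold's features and one-hot labels as `fold<k>_x.npy` / `fold<k>_y.npy`. A minimal sketch of just the saving step (the function name `save_fold_arrays` and its argument list are assumptions, not the notebook's actual signature):

```python
import os
import numpy as np

def save_fold_arrays(out_dir, k, features, labels):
    # Writes per-fold .npy files matching the paths reported in the
    # notebook's output, e.g. ".../fold1_x.npy" and ".../fold1_y.npy".
    os.makedirs(out_dir, exist_ok=True)
    x_path = os.path.join(out_dir, 'fold%d_x.npy' % k)
    y_path = os.path.join(out_dir, 'fold%d_y.npy' % k)
    np.save(x_path, features)
    np.save(y_path, labels)
    print('Saved', x_path)
    print('Saved', y_path)
    return x_path, y_path
```

Called once per fold with arrays shaped like `(n_clips, bands, frames, 3)` and `(n_clips, 10)`, this reproduces the save pattern seen in the logs.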
@@ -391,7 +383,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
"version": "3.5.2"
},
"toc": {
"nav_menu": {
