
Commit

Update Jupyter Notebooks introduction part and add README
xiaoyongzhumsft committed Jan 31, 2018
1 parent 4ae0c5b commit c9826e7
Showing 11 changed files with 4,537 additions and 148 deletions.

Large diffs are not rendered by default.


@@ -6,7 +6,7 @@
"collapsed": true
},
"source": [
"# Pre-process UrbanSound8K audio features"
"# Deep Learning for Audio Part 2a - Pre-process UrbanSound Dataset"
]
},
{
@@ -20,18 +20,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the Jupyter Notebook for the blog post **Hearing AI: Getting Started with Deep Learning for Audio on Azure**. In this jupyter notebook, we will process the audio files and extract the useful features that will be fed into a Convolutional Neural Network. \n",
"In this Jupyter Notebook, we will process the audio files and extract the features that will be fed into a Convolutional Neural Network. \n",
"\n",
"\n",
"\n",
"In this Jupyter Notebook, we will train and predict on [UrbanSound8K](https://serv.cusp.nyu.edu/projects/urbansounddataset/download-urbansound8k.html) dataset. There are a few published benchmarks, notebly mentioned in the papers below:\n",
"We will train and predict on the [UrbanSound8K](https://serv.cusp.nyu.edu/projects/urbansounddataset/download-urbansound8k.html) dataset. There are a few published benchmarks, reported in the papers below:\n",
"\n",
"- [Environmental sound classification with convolutional neural networks](http://karol.piczak.com/papers/Piczak2015-ESC-ConvNet.pdf) by Karol J Piczak.\n",
"- [Deep convolutional neural networks and data augmentation for environmental sound classification](https://arxiv.org/abs/1608.04363) by Justin Salamon and Juan Pablo Bello\n",
"- [Learning from Between-class Examples for Deep Sound Recognition](https://arxiv.org/abs/1711.10282) by Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada\n",
"\n",
"\n",
"Amonog all of them, one of the state-of-art result is from the last paper by Tokozume et al., where the best error rate they get is 21.7%. In this tutorial we will show you how to build a neural network that can achieve the state-of-art performance using Azure.\n",
"The state-of-the-art result is from the last paper, by Tokozume et al., where the best error rate achieved is 21.7%. In this tutorial we will show you how to build a neural network that can achieve state-of-the-art performance using Azure.\n",
"\n",
"\n",
"This Jupyter Notebook borrows some of the pre-processing code from the GitHub repo here: http://aqibsaeed.github.io/2016-09-24-urban-sound-classification-part-2/, but with a lot of modifications. It is tested with **Python 3.5**, **Keras 2.1.2** and **Tensorflow 1.4.0**."
@@ -48,7 +48,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use librosa as our audio processing library. For more details on librosa, please refer to the librosa documenent [here](https://librosa.github.io/librosa/tutorial.html). We also need to install a bunch of libraries. Most of them are python packages, but you still need to install a few audio processing library in apt-get fashion, which librosa depends on.\n",
"We will use librosa as our audio processing library. For more details on librosa, please refer to the librosa documentation [here](https://librosa.github.io/librosa/tutorial.html). We also need to install a number of libraries. Most of them are Python packages, but you may still need to install a few audio processing libraries that librosa depends on using apt-get:\n",
"\n",
"`sudo apt-get install -y --no-install-recommends \\\n",
" openmpi-bin \\\n",
@@ -59,11 +59,9 @@
" pkg-config`\n",
" \n",
" \n",
"We also need to install librosa in pip:\n",
"We also need to install librosa and a few other deep learning libraries with pip:\n",
"\n",
"`pip install librosa pydot graphviz keras tensorflow-gpu`\n",
"\n",
"If you are running DSVM, most of them should already be covered."
"`pip install librosa pydot graphviz keras tensorflow-gpu`\n"
]
},
{
@@ -89,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 1,
"metadata": {
"collapsed": true,
"scrolled": true
@@ -112,9 +110,8 @@
"parent_dir = \"/mnt/UrbanSound8K/audio\"\n",
"\n",
"# specify bands that you want to use. This is also the \"height\" of the spectrogram image\n",
"n_bands = 140\n",
"n_bands = 150\n",
"# specify frames that you want to use. This is also the \"width\" of the spectrogram image\n",
"# TODO: we should really specify window_size and infer number of frames as output, but anyway...\n",
"n_frames = 150\n",
"\n",
"\n",
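The feature shapes printed later in the notebook output (e.g. `(823, 150, 150, 3)`) show each clip becoming an `n_bands` x `n_frames` "image" with 3 channels. How the 3 channels are built is not visible in this diff; one common convention, sketched here purely as an assumption (the function name `to_three_channel` is hypothetical), is to stack the log-mel map with its first and second time-differences:

```python
import numpy as np

def to_three_channel(logmel):
    # Hypothetical 3-channel stack: log-mel plus first/second differences
    # along the time axis. The notebook's actual channel construction is
    # not shown in this diff; this is one common choice, not the author's code.
    delta = np.diff(logmel, axis=1, prepend=logmel[:, :1])
    delta2 = np.diff(delta, axis=1, prepend=delta[:, :1])
    return np.stack([logmel, delta, delta2], axis=-1)

x = np.zeros((150, 150))          # n_bands x n_frames
print(to_three_channel(x).shape)  # (150, 150, 3)
```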
@@ -140,7 +137,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 2,
"metadata": {
"collapsed": true
},
@@ -158,7 +155,6 @@
" fs = target_fs\n",
" return audio, fs\n",
"\n",
"# TODO: chage function name - this is the transpose of the original\n",
"def pad_trunc_seq_rewrite(x, max_len):\n",
" \"\"\"Pad or truncate a sequence data to a fixed length.\n",
"\n",
@@ -172,7 +168,6 @@
"\n",
" if x.shape[1] < max_len:\n",
" pad_shape = (x.shape[0], max_len - x.shape[1])\n",
" # TODO: move this to config file\n",
" pad = np.ones(pad_shape) * np.log(1e-8)\n",
" #x_new = np.concatenate((x, pad), axis=1)\n",
" x_new = np.hstack((x, pad))\n",
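Pieced together from the visible lines, `pad_trunc_seq_rewrite` pads short spectrograms with `log(1e-8)` (a log-domain near-silence filler, matching the `np.log(x + 1e-8)` compression used later) and truncates long ones. The truncation branch is collapsed out of this diff, so its exact form below is an assumption:

```python
import numpy as np

def pad_trunc_seq_rewrite(x, max_len):
    """Pad or truncate a (bands, frames) array along its time axis to max_len."""
    if x.shape[1] < max_len:
        pad_shape = (x.shape[0], max_len - x.shape[1])
        # log-domain near-silence filler, consistent with np.log(x + 1e-8)
        pad = np.ones(pad_shape) * np.log(1e-8)
        x_new = np.hstack((x, pad))
    else:
        # assumed truncation branch (not visible in this diff)
        x_new = x[:, :max_len]
    return x_new

print(pad_trunc_seq_rewrite(np.zeros((150, 98)), 150).shape)  # (150, 150)
```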
@@ -228,7 +223,6 @@
" return_onesided=True,\n",
" mode='magnitude')\n",
" x = np.dot(x.T, melW.T)\n",
" # TODO: place 1e-8 into config file\n",
" x = np.log(x + 1e-8)\n",
" x = x.astype(np.float32).T\n",
" x = pad_trunc_seq_rewrite(x, frames)\n",
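The visible cell computes a one-sided magnitude spectrogram, projects it onto a mel filterbank `melW`, applies log compression, and pads/truncates to a fixed width. A self-contained sketch of that pipeline follows; the random `melW` is only a stand-in for the notebook's real mel filterbank (built elsewhere, likely with librosa), and the spectrogram parameters are scipy defaults, not necessarily the notebook's:

```python
import numpy as np
from scipy import signal

def extract_logmel(audio, fs, n_bands=150, n_frames=150, melW=None):
    # One-sided magnitude spectrogram, as in the notebook cell
    f, t, x = signal.spectrogram(audio, fs, return_onesided=True,
                                 mode='magnitude')
    if melW is None:
        # Stand-in filterbank; the notebook builds a real mel filterbank
        melW = np.random.rand(n_bands, x.shape[0])
    x = np.dot(x.T, melW.T)       # project onto mel bands
    x = np.log(x + 1e-8)          # log compression; 1e-8 avoids log(0)
    x = x.astype(np.float32).T    # (bands, frames)
    # pad/truncate to fixed width, as pad_trunc_seq_rewrite does
    if x.shape[1] < n_frames:
        pad = np.ones((x.shape[0], n_frames - x.shape[1])) * np.log(1e-8)
        x = np.hstack((x, pad))
    else:
        x = x[:, :n_frames]
    return x

out = extract_logmel(np.random.randn(22050), 22050)
print(out.shape)  # (150, 150)
```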
@@ -283,67 +277,65 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Saving fold2\n",
"Saving fold1\n",
"Saving fold7\n",
"Saving fold5\n",
"Saving fold3\n",
"Saving fold4\n",
"Saving fold7\n",
"Saving fold6\n",
"Saving fold2\n",
"Saving fold9\n",
"Saving fold10\n",
"Saving fold8\n",
"Saving fold6\n",
"File /mnt/UrbanSound8K/audio/fold1/87275-1-1-0.wav is shorter than window size - DISCARDING - look into making the window larger.\n",
"Saving fold5\n",
"Saving fold10\n",
"Saving fold4\n",
"File /mnt/UrbanSound8K/audio/fold1/87275-1-2-0.wav is shorter than window size - DISCARDING - look into making the window larger.\n",
"File /mnt/UrbanSound8K/audio/fold1/87275-1-1-0.wav is shorter than window size - DISCARDING - look into making the window larger.\n",
"Features of fold6 = (823, 150, 150, 3)\n",
"Labels of fold6 = (823, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold6_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold6_y.npy\n",
"Features of fold3 = (925, 140, 150, 3)\n",
"Features of fold7 = (838, 150, 150, 3)\n",
"Labels of fold7 = (838, 10)\n",
"Features of fold3 = (925, 150, 150, 3)\n",
"Labels of fold3 = (925, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_y.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold3_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold3_y.npy\n",
"Features of fold8 = (806, 140, 150, 3)\n",
"Features of fold8 = (806, 150, 150, 3)\n",
"Labels of fold8 = (806, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold8_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold8_y.npy\n",
"Features of fold7 = (838, 150, 150, 3)\n",
"Labels of fold7 = (838, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold7_y.npy\n",
"Features of fold5 = (936, 150, 150, 3)\n",
"Labels of fold5 = (936, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_y.npy\n",
"Features of fold9 = (816, 150, 150, 3)\n",
"Labels of fold9 = (816, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_y.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_y.npy\n",
"Features of fold10 = (837, 150, 150, 3)\n",
"Labels of fold10 = (837, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold10_y.npy\n",
"Features of fold1 = (871, 150, 150, 3)\n",
"Labels of fold1 = (871, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_y.npy\n",
"Features of fold9 = (816, 150, 150, 3)\n",
"Labels of fold9 = (816, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold9_y.npy\n",
"Features of fold5 = (936, 150, 150, 3)\n",
"Labels of fold5 = (936, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold5_y.npy\n",
"Features of fold2 = (888, 150, 150, 3)\n",
"Labels of fold2 = (888, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold2_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold2_y.npy\n",
"Features of fold1 = (871, 150, 150, 3)\n",
"Labels of fold1 = (871, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold1_y.npy\n",
"Features of fold4 = (990, 150, 150, 3)\n",
"Labels of fold4 = (990, 10)\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold4_x.npy\n",
"Saved /mnt/us8k-150bands-150frames-3channel/fold4_y.npy\n",
"CPU times: user 765 ms, sys: 283 ms, total: 1.05 s\n",
"Wall time: 11min 19s\n"
"CPU times: user 823 ms, sys: 208 ms, total: 1.03 s\n",
"Wall time: 20min 42s\n"
]
}
],
"source": [
"% % time\n",
"%%time\n",
"# use this to process the audio files into numpy arrays\n",
"def save_folds(data_dir, k, bands, frames):\n",
" fold_name = 'fold' + str(k)\n",
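The body of `save_folds` is mostly collapsed in this diff; judging from the `Saved .../fold1_x.npy` lines in the output, it dumps each fold's features and one-hot labels as `fold<k>_x.npy` / `fold<k>_y.npy`. A minimal sketch of just the saving step (the function name `save_fold_arrays` and its argument list are assumptions, not the notebook's actual signature):

```python
import os
import numpy as np

def save_fold_arrays(out_dir, k, features, labels):
    # Writes per-fold .npy files matching the paths reported in the
    # notebook's output, e.g. ".../fold1_x.npy" and ".../fold1_y.npy".
    os.makedirs(out_dir, exist_ok=True)
    x_path = os.path.join(out_dir, 'fold%d_x.npy' % k)
    y_path = os.path.join(out_dir, 'fold%d_y.npy' % k)
    np.save(x_path, features)
    np.save(y_path, labels)
    print('Saved', x_path)
    print('Saved', y_path)
    return x_path, y_path
```

Called once per fold with arrays shaped like `(n_clips, bands, frames, 3)` and `(n_clips, 10)`, this reproduces the save pattern seen in the logs.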
@@ -391,7 +383,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
"version": "3.5.2"
},
"toc": {
"nav_menu": {
