
# Librosa tutorial

- Version: 0.6.3
- Tutorial home: https://github.com/librosa/tutorial
- Librosa home: http://librosa.github.io/
- User forum: https://groups.google.com/forum/#!forum/librosa

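## Environment

This tutorial assumes that [Anaconda](https://anaconda.org/) is already installed.

If you do not have an environment yet, create one with the following command: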

```bash
conda create --name YOURNAME scipy jupyter ipython
```
(Replace `YOURNAME` with the name of the new environment.)

Then activate the new environment with:
```bash
source activate YOURNAME
```


## Installing librosa
Install librosa with the following command:

```bash
conda install -c conda-forge librosa
```

Note: On Windows, an audio decoding library must be installed separately; [ffmpeg](http://ffmpeg.org/) is recommended.

## Testing

Start Jupyter:
```bash
jupyter notebook
```
Then open a new notebook and run the following:

In [4]:
import librosa
print(librosa.__version__)

0.6.3


In [5]:
y, sr = librosa.load(librosa.util.example_audio_file())
print(len(y), sr)

(1355168, 22050)


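### Note: `librosa.load` resamples to `sr=22050` by default; pass `sr=None` to keep the file's native sampling rate

# Documentation!


Librosa has extensive documentation with many examples: http://librosa.github.io/librosa/

# Conventions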

- All data are basic `numpy` types
- **Audio buffers** are called `y`
- **Sampling rate** is called `sr`
- The last axis is time-like:
        y[1000] is the 1001st sample
        S[:, 100] is the 101st frame of S
- **Defaults**: `sr=22050`, `hop_length=512`
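
To make the indexing conventions above concrete, here is a minimal sketch in plain Python that maps a frame index to a sample index and to a time in seconds. It assumes the defaults listed above (`sr=22050`, `hop_length=512`); the helper names are hypothetical and are not librosa functions.

```python
# Hypothetical helpers illustrating the conventions; not librosa functions.
SR = 22050         # default sampling rate (samples per second)
HOP_LENGTH = 512   # default number of samples between successive frames


def frame_to_sample(frame, hop_length=HOP_LENGTH):
    """Index of the sample at which a frame is positioned."""
    return frame * hop_length


def frame_to_time(frame, sr=SR, hop_length=HOP_LENGTH):
    """Position of a frame in seconds."""
    return frame_to_sample(frame, hop_length) / sr


# S[:, 100] is the 101st frame; with the defaults it sits at sample 51200.
sample_index = frame_to_sample(100)
time_seconds = frame_to_time(100)
print(sample_index, round(time_seconds, 3))
```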

# Today's topics

- `librosa.core`
- `librosa.feature`
- `librosa.display`
- `librosa.beat`
- `librosa.segment`
- `librosa.decompose`

# `librosa.core`

- Low-level audio processes
- Unit conversion
- Time-frequency representations


To load the audio file at its native sampling rate, use `sr=None`

In [6]:
y_orig, sr_orig = librosa.load(librosa.util.example_audio_file(),
                     sr=None)
print(len(y_orig), sr_orig)

(2710336, 44100)


Resampling is easy

In [7]:
sr = 22050

y = librosa.resample(y_orig, sr_orig, sr)

print(len(y), sr)

(1355168, 22050)


But what's that in seconds?

In [8]:
print(librosa.samples_to_time(len(y), sr))

61.4588662132
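
The conversion is simply the sample count divided by the sampling rate. The sketch below reproduces that arithmetic in plain Python, using the sample counts printed earlier in this section; it also checks that halving the sampling rate halves the number of samples.

```python
n_samples_orig, sr_orig = 2710336, 44100   # values printed for the native-rate load
n_samples, sr = 1355168, 22050             # values printed after resampling

# Duration in seconds = number of samples / samples per second.
duration = n_samples / sr
duration_orig = n_samples_orig / sr_orig

# Resampling changes the sample count by the ratio of the two rates.
expected_len = n_samples_orig * sr // sr_orig

print(round(duration, 4), round(duration_orig, 4), expected_len)
```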


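## Spectral representations

The short-time Fourier transform (STFT) is a foundation of signal processing.

`librosa.stft` returns a complex matrix `D`.

`D[f, t]` is the FFT value at frequency `f` and time (frame) `t`.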

In [9]:
D = librosa.stft(y)
print(D.shape, D.dtype)

((1025, 2647), dtype('complex64'))
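
The shape printed above can be worked out by hand. The sketch below assumes librosa's default STFT parameters (`n_fft=2048`, `hop_length=512`, and centered frames); the FFT size and centering are not stated in this tutorial, so treat them as assumptions.

```python
n_samples = 1355168   # len(y) after resampling to 22050 Hz
n_fft = 2048          # assumed default FFT window size
hop_length = 512      # default hop length

# A real-valued signal yields n_fft // 2 + 1 non-redundant frequency bins.
n_bins = 1 + n_fft // 2

# With centered (padded) frames, there is one frame every hop_length samples.
n_frames = 1 + n_samples // hop_length

print((n_bins, n_frames))
```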


Often, we only care about the magnitude.

`D` contains both *magnitude* `S` and *phase* $\phi$.

$$
D_{ft} = S_{ft} \exp\left(j \phi_{ft}\right)
$$

In [10]:
import numpy as np

In [11]:
S, phase = librosa.magphase(D)
print(S.dtype, phase.dtype, np.allclose(D, S * phase))

(dtype('float32'), dtype('complex64'), True)
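
For a single STFT coefficient, this decomposition is just the polar form of a complex number. The sketch below uses Python's `cmath` module on a made-up value; `d` is a hypothetical coefficient, not one taken from the example audio. As in the cell above, the phase term is the unit-magnitude complex exponential rather than the angle itself.

```python
import cmath

d = complex(3.0, -4.0)        # hypothetical STFT coefficient D[f, t]

S = abs(d)                    # magnitude
phi = cmath.phase(d)          # phase angle in radians
phase = cmath.exp(1j * phi)   # unit-magnitude complex number exp(j * phi)

# Multiplying magnitude and phase recovers the original coefficient.
reconstructed = S * phase
print(S, abs(phase), cmath.isclose(d, reconstructed))
```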


## Constant-Q transforms

The CQT gives a logarithmically spaced frequency basis.

This representation is more natural for many analysis tasks.

In [None]:
C = librosa.cqt(y, sr=sr)

print(C.shape, C.dtype)
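
"Logarithmically spaced" means that each bin's center frequency is a fixed ratio above the previous one. The sketch below lists such frequencies in plain Python, assuming 12 bins per octave, 84 bins, and a minimum frequency of about 32.7 Hz (the note C1). These values are assumed to match the defaults of `librosa.cqt` and should be checked against the documentation for your version.

```python
fmin = 32.703          # assumed minimum frequency in Hz (approximately C1)
bins_per_octave = 12   # assumed: one bin per semitone
n_bins = 84            # assumed: seven octaves

# Each bin is 2 ** (1 / bins_per_octave) times the previous one.
freqs = [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

# Moving up by bins_per_octave bins doubles the frequency.
octave_ratio = freqs[bins_per_octave] / freqs[0]
print(round(freqs[0], 1), round(freqs[-1], 1), octave_ratio)
```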

## Exercise 0

- Load a different audio file
- Compute its STFT with a different hop length

In [None]:
# Exercise 0 solution

y2, sr2 = librosa.load(   )

D = librosa.stft(y2, hop_length=   )

# `librosa.feature`

- Standard features:
    - `librosa.feature.melspectrogram`
    - `librosa.feature.mfcc`
    - `librosa.feature.chroma_stft`
    - Lots more...
- Feature manipulation:
    - `librosa.feature.stack_memory`
    - `librosa.feature.delta`

Most features work either with audio or STFT input

In [None]:
melspec = librosa.feature.melspectrogram(y=y, sr=sr)

# melspectrogram expects a power spectrogram, so square the magnitude
melspec_stft = librosa.feature.melspectrogram(S=S**2, sr=sr)

print(np.allclose(melspec, melspec_stft))

# `librosa.display`

- Plotting routines for spectra and waveforms

- **Note**: the display module was substantially reworked in version 0.5

In [None]:
# The display submodule must be imported explicitly
import librosa.display

# Displays are built with matplotlib
import matplotlib.pyplot as plt

# Let's make plots pretty
import matplotlib.style as ms
ms.use('seaborn-muted')

# Render figures interactively in the notebook
%matplotlib nbagg

# IPython gives us an audio widget for playback
from IPython.display import Audio

## Waveform display

In [None]:
plt.figure()
librosa.display.waveplot(y=y, sr=sr)

## A basic spectrogram display

In [None]:
plt.figure()
librosa.display.specshow(melspec, y_axis='mel', x_axis='time')
plt.colorbar()

## Exercise 1

* Pick a feature extractor from the `librosa.feature` submodule and plot the output with `librosa.display.specshow`


* **Bonus**: Customize the plot using either `specshow` arguments or `pyplot` functions

In [None]:
# Exercise 1 solution

X = librosa.feature.XX()

plt.figure()

librosa.display.specshow(    )

# `librosa.beat`

- Beat tracking and tempo estimation

The beat tracker returns the estimated tempo and beat positions (measured in frames)

In [None]:
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print(tempo)
print(beats)
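
Because beat positions are reported in frames, they usually need converting to seconds before use. The sketch below does that conversion by hand and derives a rough tempo from the spacing between beats. The beat frames are made up for illustration, the defaults `sr=22050` and `hop_length=512` are assumed, and the tempo calculation is a simple estimate rather than the method used by `beat_track`. In practice, `librosa.frames_to_time` performs the frame-to-time conversion.

```python
import statistics

sr = 22050
hop_length = 512
beat_frames = [5, 48, 91, 134, 177]   # hypothetical beat positions in frames

# Frame index -> seconds.
beat_times = [f * hop_length / sr for f in beat_frames]

# Tempo in beats per minute from the median inter-beat interval.
intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
tempo_bpm = 60.0 / statistics.median(intervals)

print([round(t, 2) for t in beat_times], round(tempo_bpm, 1))
```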

Let's sonify it!

In [None]:
clicks = librosa.clicks(frames=beats, sr=sr, length=len(y))

Audio(data=y + clicks, rate=sr)

Beats can be used to downsample features

In [None]:
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
chroma_sync = librosa.util.sync(chroma, beats)
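
Beat-synchronous aggregation replaces the frames between consecutive beats with a single summary column. The sketch below averages a tiny made-up feature matrix between beat boundaries in plain Python. `sync_mean` is a hypothetical helper, and adding frame 0 and the final frame as boundaries is an assumption about the library's padding behaviour.

```python
def sync_mean(features, beat_frames):
    """Average each row of `features` between consecutive beat frames.

    `features` is a list of rows (one row per feature dimension, one
    column per frame). Frame 0 and the final frame are added as
    boundaries so that no frames are dropped.
    """
    n_frames = len(features[0])
    bounds = sorted(set([0] + list(beat_frames) + [n_frames]))
    segments = list(zip(bounds[:-1], bounds[1:]))
    return [
        [sum(row[a:b]) / (b - a) for a, b in segments]
        for row in features
    ]


# Two feature dimensions, six frames, beats at frames 2 and 4.
features = [
    [1.0, 1.0, 3.0, 3.0, 5.0, 5.0],
    [0.0, 2.0, 0.0, 2.0, 0.0, 2.0],
]
synced = sync_mean(features, beat_frames=[2, 4])
print(synced)  # one column per beat-delimited segment
```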

In [None]:
plt.figure(figsize=(6, 3))
plt.subplot(2, 1, 1)
librosa.display.specshow(chroma, y_axis='chroma')
plt.ylabel('Full resolution')
plt.subplot(2, 1, 2)
librosa.display.specshow(chroma_sync, y_axis='chroma')
plt.ylabel('Beat sync')

# `librosa.segment`

- Self-similarity / recurrence
- Segmentation

Recurrence matrices encode self-similarity

    R[i, j] = similarity between frames (i, j)
    
Librosa computes recurrence between `k`-nearest neighbors.

In [None]:
R = librosa.segment.recurrence_matrix(chroma_sync)
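
A k-nearest-neighbor recurrence matrix can be sketched in a few lines: compute distances between all pairs of frames, then link each frame to its `k` closest frames. The version below is a plain-Python illustration with made-up data and an explicit `k`. It is not librosa's implementation, which works on feature columns, chooses `k` automatically, and offers further options.

```python
import math


def knn_recurrence(frames, k):
    """Binary matrix with R[i][j] = 1 if frame j is among the k nearest
    neighbors of frame i (Euclidean distance, excluding i itself)."""
    n = len(frames)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other frames by their distance to frame i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(frames[i], frames[j]),
        )
        for j in others[:k]:
            R[i][j] = 1
    return R


# Five hypothetical 2-D feature frames: frames 0, 2, 4 are alike, as are 1, 3.
frames = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.0, 5.2), (0.0, 0.3)]
R = knn_recurrence(frames, k=1)
for row in R:
    print(row)
```

Note that the result is not symmetric in general: frame 4 links to frame 0, but frame 0's nearest neighbor is frame 2.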

In [None]:
plt.figure(figsize=(4, 4))
librosa.display.specshow(R)

We can include affinity weights for each link as well.

In [None]:
R2 = librosa.segment.recurrence_matrix(chroma_sync,
                                       mode='affinity',
                                       sym=True)

In [None]:
plt.figure(figsize=(5, 4))
librosa.display.specshow(R2)
plt.colorbar()

## Exercise 2

* Plot a recurrence matrix using different features
* **Bonus**: Use a custom distance metric

In [None]:
# Exercise 2 solution

# `librosa.decompose`

- `hpss`: Harmonic-percussive source separation
- `nn_filter`: Nearest-neighbor filtering, non-local means, Repet-SIM
- `decompose`: NMF, PCA and friends

Separating harmonics from percussives is easy

In [None]:
D_harm, D_perc = librosa.decompose.hpss(D)

y_harm = librosa.istft(D_harm)

y_perc = librosa.istft(D_perc)
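
One common way to perform this kind of separation is median filtering: smoothing the magnitude spectrogram along time keeps sustained (harmonic) energy, while smoothing along frequency keeps broadband (percussive) energy. The sketch below applies that idea to a tiny made-up spectrogram in plain Python with hard binary masks. It illustrates the principle only and is not claimed to reproduce `librosa.decompose.hpss`, whose filter sizes and masking differ.

```python
import statistics


def median_filter_1d(values, size):
    """Sliding median with edge values repeated at the boundaries."""
    half = size // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [statistics.median(padded[i:i + size]) for i in range(len(values))]


def hpss_sketch(S, size=3):
    """Split magnitude spectrogram S (rows: frequency, columns: time)."""
    n_freq, n_time = len(S), len(S[0])
    # Filtering along time (within each row) emphasises harmonic energy.
    harm = [median_filter_1d(row, size) for row in S]
    # Filtering along frequency (within each column) emphasises percussive energy.
    cols = [median_filter_1d([S[f][t] for f in range(n_freq)], size)
            for t in range(n_time)]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]
    # Hard masks: each bin goes to whichever estimate is larger (ties: harmonic).
    S_harm = [[S[f][t] if harm[f][t] >= perc[f][t] else 0.0
               for t in range(n_time)] for f in range(n_freq)]
    S_perc = [[S[f][t] - S_harm[f][t] for t in range(n_time)]
              for f in range(n_freq)]
    return S_harm, S_perc


# Made-up spectrogram: a sustained tone in row 2 and a click in column 3.
S = [[0.0] * 7 for _ in range(5)]
for t in range(7):
    S[2][t] = 1.0
for f in range(5):
    S[f][3] = 1.0

S_harm, S_perc = hpss_sketch(S)
print(S_harm[2])                    # the sustained tone
print([row[3] for row in S_perc])   # the click (the shared bin went to the harmonic part)
```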

In [None]:
Audio(data=y_harm, rate=sr)

In [None]:
Audio(data=y_perc, rate=sr)

NMF is pretty easy also!

In [None]:
# Fit the model
W, H = librosa.decompose.decompose(S, n_components=16, sort=True)

In [None]:
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1), plt.title('W')
librosa.display.specshow(librosa.power_to_db(W**2), y_axis='log')
plt.subplot(1, 2, 2), plt.title('H')
librosa.display.specshow(H, x_axis='time')
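
Log-scaling a power spectrum for display amounts to a decibel conversion. The sketch below shows the basic formula in plain Python, with a small floor to avoid taking the logarithm of zero. `power_to_db_sketch` and its `amin` default are illustrative assumptions; librosa's own conversion offers further options, such as limiting the dynamic range.

```python
import math


def power_to_db_sketch(power, ref=1.0, amin=1e-10):
    """Convert a power value to decibels relative to `ref`."""
    return 10.0 * math.log10(max(amin, power)) - 10.0 * math.log10(max(amin, ref))


# A tenfold increase in power corresponds to +10 dB; zero power hits the floor.
levels = [power_to_db_sketch(p) for p in (1.0, 10.0, 100.0, 0.0)]
print(levels)
```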

In [None]:
# Reconstruct the signal using only the first component
S_rec = W[:, :1].dot(H[:1, :])

y_rec = librosa.istft(S_rec * phase)
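
Keeping only the first component means approximating the spectrogram by the outer product of one column of `W` and one row of `H`. The sketch below spells out that product in plain Python on small made-up factors; the numbers are hypothetical and stand in for the matrices returned by the decomposition.

```python
# Hypothetical factors: W has one column per component, H one row per component.
W = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0]]            # 3 frequency bins x 2 components
H = [[1.0, 0.5, 0.0, 2.0],
     [0.0, 1.0, 1.0, 0.0]]  # 2 components x 4 frames


def reconstruct(W, H, components):
    """Sum of outer products W[:, k] * H[k, :] over the chosen components."""
    return [
        [sum(W[f][k] * H[k][t] for k in components) for t in range(len(H[0]))]
        for f in range(len(W))
    ]


S_first = reconstruct(W, H, components=[0])     # first component only
S_full = reconstruct(W, H, components=[0, 1])   # all components

print(S_first)
print(S_full)
```

Frequency bin 2 is silent in `S_first` because only the second component has energy there.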

In [None]:
Audio(data=y_rec, rate=sr)

## Exercise 3

- Compute a chromagram using only the harmonic component
- **Bonus**: run the beat tracker using only the percussive component

# Wrapping up

- This was just a brief intro, but there's lots more!

- Read the docs: http://librosa.github.io/librosa/
- And the example gallery: http://librosa.github.io/librosa_gallery/
- We'll be sprinting all day.  Get involved! https://github.com/librosa/librosa/issues/395