nips2015: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
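The paper's goal is to predict future precipitation from historical data (precipitation nowcasting).
The authors cast this as a spatiotemporal sequence forecasting problem in which both the input and the output are spatiotemporal sequences.
By adding convolutional structure to both the input-to-state and state-to-state transitions of FC-LSTM, the authors propose the convolutional LSTM (convLSTM).
The authors compare their results against FC-LSTM and the state-of-the-art ROVER algorithm; convLSTM outperforms both.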
Divide the spatial region into an $M\times N$ grid with $P$ measurements in each cell, so that a single observation can be written as a tensor $\mathcal{X}\in\mathbf{R}^{P\times M \times N}$. Given the previous $J$ observations, the problem of predicting the next $K$ observations can be formalized as:

$$
\large
\tilde{\mathcal{X}}_{t+1},\cdots,\tilde{\mathcal{X}}_{t+K} = \text{arg max}_{\mathcal{X}_{t+1},\cdots,\mathcal{X}_{t+K}}\quad p(\mathcal{X}_{t+1}, \cdots, \mathcal{X}_{t+K} \mid \hat{\mathcal{X}}_{t-J+1}, \hat{\mathcal{X}}_{t-J+2},\cdots,\hat{\mathcal{X}}_{t})
$$
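To make the indexing in the formulation concrete, here is a minimal sketch of how a recorded sequence could be split into the $J$ observed frames and the $K$ frames to predict. The helper name `split_window`, the 0-based time index, and all sizes are illustrative assumptions, not values from the paper.

```python
import numpy as np

def split_window(seq, t, J, K):
    """Return (inputs, targets) around time index t (0-based, inclusive).

    inputs  = frames t-J+1 .. t   -> shape (J, P, M, N)
    targets = frames t+1 .. t+K   -> shape (K, P, M, N)
    """
    if t - J + 1 < 0 or t + K >= len(seq):
        raise ValueError("window does not fit inside the sequence")
    return seq[t - J + 1 : t + 1], seq[t + 1 : t + K + 1]

# Hypothetical sizes: 20 frames, P=4 measurements on an 8x8 grid.
T, P, M, N = 20, 4, 8, 8
seq = np.random.default_rng(0).random((T, P, M, N))
past, future = split_window(seq, t=9, J=5, K=10)
print(past.shape, future.shape)  # (5, 4, 8, 8) (10, 4, 8, 8)
```

A forecasting model then maps `past` to an estimate of `future`; the arg max in the formulation is over all possible values of `future`.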
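In this problem, the observation at each time step is a two-dimensional radar echo map. The paper divides the map into a grid of non-overlapping patches and treats the pixels inside each patch as that cell's measurements (view the pixels inside a patch as its measurements), which turns the task into a spatiotemporal sequence forecasting problem.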
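The patch view described above can be sketched with a NumPy reshape. This is only an illustration under stated assumptions: the function name `to_patches`, the patch size, and the toy 4x4 frame are hypothetical, and the ordering of pixels along the measurement axis is one possible choice.

```python
import numpy as np

def to_patches(frame, patch):
    """Reshape an (H, W) frame into a (P, M, N) tensor with P = patch * patch.

    Each non-overlapping patch x patch block becomes one grid cell, and the
    pixels inside it are stacked along the first (measurement) axis.
    """
    H, W = frame.shape
    assert H % patch == 0 and W % patch == 0, "frame must tile evenly"
    M, N = H // patch, W // patch
    # Split rows and columns into (cell index, offset inside the cell).
    blocks = frame.reshape(M, patch, N, patch)
    # Move the two in-cell offsets to the front, then merge them into P.
    return blocks.transpose(1, 3, 0, 2).reshape(patch * patch, M, N)

frame = np.arange(16).reshape(4, 4)   # toy 4x4 "radar map"
x = to_patches(frame, patch=2)
print(x.shape)      # (4, 2, 2)
print(x[:, 0, 0])   # pixels of the top-left 2x2 patch: [0 1 4 5]
```

With `patch=2`, a frame of size $H\times W$ becomes a tensor with $P=4$, $M=H/2$, $N=W/2$, which matches the $P\times M\times N$ layout used in the formulation.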
Problem complexity and tractability:
Although the number of free variables in a length-$K$ sequence can be up to $O(M^K N^K P^K)$, in practice we may exploit the structure of the space of possible predictions to reduce the dimensionality and hence make the problem tractable.
Skipping the introduction to LSTM, here is a brief description of FC-LSTM, where $\circ$ denotes the Hadamard (elementwise) product:

$$
\begin{aligned}
i_t =& \sigma(W_{xi}x_t+W_{hi}h_{t-1}+\textcolor{red}{W_{ci}\circ{c}_{t-1}}+b_i)\\
f_t =& \sigma(W_{xf}x_t+W_{hf}h_{t-1}+\textcolor{red}{W_{cf}\circ{c}_{t-1}}+b_f)\\
c_t =& f_t \circ c_{t-1} + i_t \circ \tanh (W_{xc}x_t+W_{hc}h_{t-1}+b_c)\\
o_t =& \sigma(W_{xo}x_t+W_{ho}h_{t-1}+\textcolor{red}{W_{co}\circ{c}_{t}}+b_o)
\end{aligned}
$$

In other words, three peephole connections (the terms in red) are added inside the LSTM unit, as shown in the figure below:
Image from link.
Here, the input, cell output and states are all 1D vectors.
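The four FC-LSTM equations can be sketched as a single NumPy step function. This is a minimal sketch, not the authors' implementation: the function names, the parameter dictionary, the initialization scale and the toy sizes are assumptions. The hidden-state update $h_t = o_t \circ \tanh(c_t)$ used at the end is the standard LSTM output equation, which the four equations above do not list.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_fc_params(d_in, d_h, rng):
    """Random parameters; the keys mirror the subscripts in the equations."""
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = 0.1 * rng.standard_normal((d_h, d_in))
        p[f"W_h{g}"] = 0.1 * rng.standard_normal((d_h, d_h))
        p[f"b_{g}"] = np.zeros(d_h)
    for g in "ifo":  # peephole weights act elementwise, so they are vectors
        p[f"W_c{g}"] = 0.1 * rng.standard_normal(d_h)
    return p

def fc_lstm_step(x, h_prev, c_prev, p):
    """One FC-LSTM step with peepholes. x: (d_in,), h_prev and c_prev: (d_h,)."""
    i = sigmoid(p["W_xi"] @ x + p["W_hi"] @ h_prev + p["W_ci"] * c_prev + p["b_i"])
    f = sigmoid(p["W_xf"] @ x + p["W_hf"] @ h_prev + p["W_cf"] * c_prev + p["b_f"])
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x + p["W_hc"] @ h_prev + p["b_c"])
    # The output gate peeks at the updated cell state c, not at c_prev.
    o = sigmoid(p["W_xo"] @ x + p["W_ho"] @ h_prev + p["W_co"] * c + p["b_o"])
    h = o * np.tanh(c)  # standard LSTM hidden-state update
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 5                            # hypothetical sizes
params = init_fc_params(d_in, d_h, rng)
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((4, d_in)):    # a toy sequence of 4 inputs
    h, c = fc_lstm_step(x, h, c, params)
print(h.shape, c.shape)  # (5,) (5,)
```

Because every quantity is a flat vector, a spatial input would have to be flattened before entering this cell, and the dense matrices `W_x*` and `W_h*` then connect every location to every other location.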
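Drawbacks of FC-LSTM:
- Although it can capture temporal correlations, it contains too much redundancy for spatial data.
- The full connections used in the input-to-state and state-to-state transitions do not encode any spatial information.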
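A rough weight count gives a feel for the redundancy mentioned above. The sketch compares one dense input-to-state map on flattened tensors with one convolutional map. All numbers are illustrative, and the assumption that the hidden state keeps the full $M\times N$ spatial size with `C` channels is made here only for the comparison; it is not a configuration taken from the paper.

```python
def fc_weights(P, M, N, C):
    """Weights in one dense input-to-state map on flattened tensors."""
    return (P * M * N) * (C * M * N)

def conv_weights(P, C, k):
    """Weights in one k x k convolutional input-to-state map."""
    return C * P * k * k

P, M, N = 4, 50, 50      # hypothetical input tensor size
C, k = 64, 3             # hypothetical hidden channels and kernel size
print(fc_weights(P, M, N, C))   # 1600000000
print(conv_weights(P, C, k))    # 2304
```

The convolutional count does not depend on $M$ or $N$ at all, because the same small kernel is shared across every grid location.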
Properties of convLSTM: all inputs $\mathcal{X}_1, \cdots,\mathcal{X}_t$, cell outputs $\mathcal{C}_1,\cdots,\mathcal{C}_t$, hidden states $\mathcal{H}_1,\cdots,\mathcal{H}_t$, and gates $i_t, f_t, o_t$ are 3D tensors whose last two dimensions are spatial. In the equations below, $*$ denotes convolution and $\circ$ the Hadamard product:

$$
\begin{aligned}
i_{t} &=\sigma\left(W_{x i} * \mathcal{X}_{t}+W_{h i} * \mathcal{H}_{t-1}+W_{c i} \circ \mathcal{C}_{t-1}+b_{i}\right) \\
f_{t} &=\sigma\left(W_{x f} * \mathcal{X}_{t}+W_{h f} * \mathcal{H}_{t-1}+W_{c f} \circ \mathcal{C}_{t-1}+b_{f}\right) \\
\mathcal{C}_{t} &=f_{t} \circ \mathcal{C}_{t-1}+i_{t} \circ \tanh \left(W_{x c} * \mathcal{X}_{t}+W_{h c} * \mathcal{H}_{t-1}+b_{c}\right) \\
o_{t} &=\sigma\left(W_{x o} * \mathcal{X}_{t}+W_{h o} * \mathcal{H}_{t-1}+W_{c o} \circ \mathcal{C}_{t}+b_{o}\right) \\
\mathcal{H}_{t} &=o_{t} \circ \tanh \left(\mathcal{C}_{t}\right)
\end{aligned}
$$
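The convLSTM equations can be sketched in plain NumPy as follows. This is a minimal illustration under several assumptions that do not come from the notes: zero padding so that the state keeps its $M\times N$ size, the cross-correlation convention common in deep-learning libraries in place of a flipped-kernel convolution, an odd kernel size, per-channel biases, and hypothetical names and sizes throughout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, M, N), w: (C_out, C_in, k, k) with k odd -> (C_out, M, N).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, M, N = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, M, N))
    for a in range(k):
        for b in range(k):
            # Contribution of kernel tap (a, b) to every output location.
            out += np.einsum("oc,cmn->omn", w[:, :, a, b], xp[:, a:a + M, b:b + N])
    return out

def init_conv_params(c_in, c_h, M, N, k, rng):
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = 0.1 * rng.standard_normal((c_h, c_in, k, k))
        p[f"W_h{g}"] = 0.1 * rng.standard_normal((c_h, c_h, k, k))
        p[f"b_{g}"] = np.zeros((c_h, 1, 1))   # one bias per channel (assumption)
    for g in "ifo":  # peephole weights are Hadamard factors, same shape as C
        p[f"W_c{g}"] = 0.1 * rng.standard_normal((c_h, M, N))
    return p

def convlstm_step(X, H_prev, C_prev, p):
    """One convLSTM step. X: (c_in, M, N); H_prev, C_prev: (c_h, M, N)."""
    i = sigmoid(conv2d_same(X, p["W_xi"]) + conv2d_same(H_prev, p["W_hi"])
                + p["W_ci"] * C_prev + p["b_i"])
    f = sigmoid(conv2d_same(X, p["W_xf"]) + conv2d_same(H_prev, p["W_hf"])
                + p["W_cf"] * C_prev + p["b_f"])
    C = f * C_prev + i * np.tanh(conv2d_same(X, p["W_xc"])
                                 + conv2d_same(H_prev, p["W_hc"]) + p["b_c"])
    o = sigmoid(conv2d_same(X, p["W_xo"]) + conv2d_same(H_prev, p["W_ho"])
                + p["W_co"] * C + p["b_o"])
    H = o * np.tanh(C)
    return H, C

rng = np.random.default_rng(0)
c_in, c_h, M, N, k = 4, 8, 6, 6, 3            # hypothetical sizes
params = init_conv_params(c_in, c_h, M, N, k, rng)
H, C = np.zeros((c_h, M, N)), np.zeros((c_h, M, N))
for X in rng.standard_normal((5, c_in, M, N)):  # toy sequence of 5 frames
    H, C = convlstm_step(X, H, C, params)
print(H.shape, C.shape)  # (8, 6, 6) (8, 6, 6)
```

Compared with the FC-LSTM cell, only the matrix products are replaced by convolutions; the gating structure and the peephole terms are unchanged, and every state keeps its spatial layout.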