keras.layers.TimeDistributed(layer)

This wrapper applies a layer to every temporal slice of an input.
The input should be at least 3D, and the dimension at index one is treated as the temporal dimension.
Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions.
The batch input shape of the layer is then (32, 10, 16), and the input_shape, not including the
samples dimension, is (10, 16).
You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently.

As the first layer in a model:

    model = Sequential()
    model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
    # now model.output_shape == (None, 10, 8)

The output will then have shape (32, 10, 8).
In subsequent layers, there is no need for the input_shape:

    model.add(TimeDistributed(Dense(32)))
    # now model.output_shape == (None, 10, 32)
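
To make "apply a Dense layer to each of the 10 timesteps" concrete, here is a minimal NumPy sketch of the idea. It is not Keras's implementation: `kernel`, `bias`, and `dense` are hypothetical stand-ins for a linear Dense layer's parameters and forward pass, and the data is random.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, timesteps, features, units = 32, 10, 16, 8
x = rng.normal(size=(batch, timesteps, features))

# One shared set of parameters (hypothetical stand-ins for Dense(8)).
kernel = rng.normal(size=(features, units))
bias = np.zeros(units)

def dense(inputs_2d):
    """Linear Dense forward pass on a (batch, features) array."""
    return inputs_2d @ kernel + bias

# Apply the same layer to each temporal slice x[:, t, :], then restack on axis 1.
outputs = np.stack([dense(x[:, t, :]) for t in range(timesteps)], axis=1)
print(outputs.shape)  # (32, 10, 8)
```

Because the same `kernel` and `bias` are reused for every timestep, the wrapper adds no parameters of its own in this sketch; only the output shape gains the time axis.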


In [1]:
from numpy import array
length = 5
seq = array([i/float(length) for i in range(length)])
print(seq)

[0.  0.2 0.4 0.6 0.8]


In [7]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM

# LSTM returns the full sequence, so TimeDistributed(Dense) sees one vector per timestep
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
print(model.output_shape)

(None, 10, 1)


In [8]:
# Same model, but with a plain Dense layer instead of TimeDistributed(Dense)
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
model.add(Dense(1))
print(model.output_shape)

(None, 10, 1)


In Keras, when building a sequential model, the second dimension (the one after the sample
dimension) is usually treated as the time dimension. This means that if, for example, your data
is 5D with shape (sample, time, width, length, channel), you could wrap a convolutional layer
(which expects 4D input with shape (sample, width, length, channel)) in TimeDistributed to apply
the same layer to each time slice and obtain a 5D output.
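
One way to picture this is to fold the time axis into the sample axis, run the 4D layer once, and unfold the result. The NumPy sketch below illustrates that idea only; it is not Keras's implementation. `time_distributed` and `per_frame_layer` are hypothetical helpers, and a 1x1 channel-mixing convolution stands in for a real convolutional layer.

```python
import numpy as np

rng = np.random.default_rng(0)

samples, time, width, length, channels, filters = 2, 4, 6, 6, 3, 5
video = rng.normal(size=(samples, time, width, length, channels))

# Hypothetical stand-in for a 4D image layer: a 1x1 convolution that mixes channels.
weights = rng.normal(size=(channels, filters))

def per_frame_layer(images):
    """Maps (sample, width, length, channel) to (sample, width, length, filters)."""
    return np.einsum("swlc,cf->swlf", images, weights)

def time_distributed(layer_fn, inputs):
    """Apply layer_fn to every time slice by folding time into the sample axis."""
    s, t = inputs.shape[:2]
    folded = inputs.reshape((s * t,) + inputs.shape[2:])
    out = layer_fn(folded)
    return out.reshape((s, t) + out.shape[1:])

result = time_distributed(per_frame_layer, video)
print(result.shape)  # (2, 4, 6, 6, 5)
```

Each slice `result[:, t]` equals `per_frame_layer(video[:, t])`, which is what "applying the same layer to each time slice" means here.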

Dense is a special case: since Keras 2.0, Dense is by default applied only to the last
dimension (e.g. if you apply Dense(10) to an input with shape (n, m, o, p), you get an output
with shape (n, m, o, 10)), so in your case Dense and TimeDistributed(Dense) are equivalent.
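
The equivalence can be checked numerically with a small NumPy sketch. It assumes a linear Dense layer with hypothetical `kernel` and `bias` arrays and random data; it shows the arithmetic and does not call Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, o, p, units = 2, 3, 4, 5, 10
x = rng.normal(size=(n, m, o, p))
kernel = rng.normal(size=(p, units))  # hypothetical Dense(10) kernel
bias = rng.normal(size=units)         # hypothetical Dense(10) bias

# Dense on the last axis: matmul broadcasts over all leading axes.
dense_out = x @ kernel + bias
print(dense_out.shape)  # (2, 3, 4, 10)

# The same layer applied one index-1 slice at a time, then restacked on axis 1.
sliced_out = np.stack([x[:, t] @ kernel + bias for t in range(m)], axis=1)
print(np.allclose(dense_out, sliced_out))  # True
```

Since the layer only ever touches the last axis, slicing along index one before applying it changes nothing, which is why the two models above report the same output shape.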
