Hello Aymeric.
Thank you for your great examples. I visualized the weights of the first convolutional layer in your MNIST example and noticed that they change only slightly during training. In the original TF MNIST convolutional example, on the other hand, they change drastically, and clear "feature patterns" are visible by the end. I noticed that the original example uses MomentumOptimizer, while you are using AdamOptimizer. Could this be the reason for the different behavior? Still, it seems strange to me that the weights barely need to change during training.
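For reference, this is roughly how I'm tiling the first-layer weights into one image for comparison (a minimal NumPy sketch of my own; `tile_filters` is my helper, not part of your examples, and the shape `(5, 5, 1, 32)` assumes a first conv layer like the one in the MNIST example):

```python
import numpy as np

def tile_filters(weights, grid_cols=8):
    """Tile conv filters of shape (h, w, in_ch, out_ch) into one 2-D grid image."""
    h, w, _, out_ch = weights.shape
    grid_rows = int(np.ceil(out_ch / grid_cols))
    grid = np.zeros((grid_rows * h, grid_cols * w))
    for i in range(out_ch):
        r, c = divmod(i, grid_cols)
        f = weights[:, :, 0, i]
        # Normalize each filter to [0, 1] so patterns are visually comparable.
        f = (f - f.min()) / (np.ptp(f) + 1e-8)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = f
    return grid

# Example with random weights shaped like a 5x5x1x32 first conv layer;
# in practice I fetch the real values with sess.run on the weight variable.
w = np.random.randn(5, 5, 1, 32).astype(np.float32)
img = tile_filters(w)
print(img.shape)  # (20, 40): 4 rows x 8 cols of 5x5 filters
```

I then just display `img` with `matplotlib.pyplot.imshow` before and after training to compare the filters.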
What do you think about this? Maybe you could point me to some papers or documents that would help me find an explanation for this behavior?