Maybe I do not understand this paper thoroughly, but can someone explain this?
The posterior over z is modelled as a diagonal Gaussian, and the Zero initialization part ensures that the posterior distribution is a simple normal distribution.
If it is such a simple distribution, why is a complex prior flow needed to learn it?