Disable Flash Attention for inference #162

@rrutmann

Description

Flash Attention supports only fp16 and bf16, not fp32. We should therefore make Flash Attention optional in our codebase, so that it can be disabled during inference in exchange for higher precision.
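A minimal sketch of what such a switch could look like, using NumPy as a stand-in for the real kernels. The names here (`attention`, `use_flash_attention`, `flash_attention_fp16`) are hypothetical and not the actual codebase API; the point is just that the flash path forces a half-precision cast, while the fallback path stays in fp32:

```python
import numpy as np

def standard_attention_fp32(q, k, v):
    """Plain scaled dot-product attention, kept in full fp32 precision."""
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def flash_attention_fp16(q, k, v):
    """Stand-in for a Flash Attention kernel: rejects anything but fp16,
    mirroring the dtype restriction described in this issue."""
    if q.dtype != np.float16:
        raise TypeError("Flash Attention only supports fp16/bf16 inputs")
    out = standard_attention_fp32(q.astype(np.float32),
                                  k.astype(np.float32),
                                  v.astype(np.float32))
    return out.astype(np.float16)

def attention(q, k, v, use_flash_attention=True):
    """Dispatch on a config flag, as proposed here: disabling the flag
    during inference keeps the whole computation in fp32."""
    if use_flash_attention:
        return flash_attention_fp16(*(t.astype(np.float16) for t in (q, k, v)))
    return standard_attention_fp32(*(t.astype(np.float32) for t in (q, k, v)))
```

With the flag off, inputs are never rounded down to half precision, which is exactly the precision/speed trade-off the issue asks to expose.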
