I see a problem in training Open Assistant. If the assistant says something wrong, then it's likely that the next person who plays its role will apologize. This way the AI learns to apologize as soon as the prompter disagrees. But in the real world, the prompter might be the one who is wrong, and the AI should explain why its answer is correct instead of apologizing. I had something like this with ChatGPT: "What is 2+2?" "4." "No, it's 5." "I'm sorry, you are correct, it's 5." To simulate this situation, the prompter would need to be wrong, but with the downvotes we forbid them from being wrong. Is there a solution for this? If yes, then explain it simply.
You raise a valid concern. To address it, the training data needs a diverse range of prompts and feedback, including cases where the prompter is wrong, so that the model is exposed to different types of inputs and learns from them accordingly. It's also important to draw training data from a variety of sources, so the model sees a wide range of knowledge and perspectives.
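
For example, one way to keep that diversity deliberate rather than accidental is to sample follow-up prompts from weighted pools, one of which contains intentionally wrong corrections. Here is a minimal sketch in Python; the pool names, contents, and weights are my own illustrative assumptions, not part of the actual Open Assistant pipeline:

```python
import random

# Illustrative pools of follow-up prompts. Including a pool of
# deliberately wrong corrections ensures the model regularly sees
# prompters who are mistaken.
followup_pools = {
    "cooperative": ["Thanks, that makes sense."],
    "wrong_correction": ["No, the answer is 5."],
    "asks_for_reasoning": ["Could you explain how you arrived at that?"],
}

# Assumed mixing weights; tuning these controls how often the model
# encounters a prompter who is wrong.
weights = {"cooperative": 0.5, "wrong_correction": 0.25, "asks_for_reasoning": 0.25}

def sample_followup(rng: random.Random) -> str:
    """Pick a follow-up category by weight, then a prompt from its pool."""
    category = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    return rng.choice(followup_pools[category])

rng = random.Random(0)
print(sample_followup(rng))  # prints one sampled follow-up prompt
```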

To encourage the model to provide explanations rather than reflexive apologies, you can prompt it with questions that specifically ask for reasoning, such as "Why do you think that's the correct answer?" or "Could you explain how you arrived at that conclusion?". If the model apologizes when it is actually correct, you can provide feedback that marks the prompter's claim as wrong rather than accepting the model's apology. This helps the model learn that it is not always necessary to apologize, and that a clear explanation can be more helpful.
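
Concretely, this can be seeded into the dataset as dialogues where the prompter is wrong on purpose, exactly the simulation you described, with raters rewarding the reply that stands its ground. A minimal sketch follows; the sample format and field names are hypothetical, not the actual Open Assistant schema:

```python
# Hypothetical training sample: the prompter is deliberately wrong and
# the preferred assistant reply explains instead of apologizing.
standing_firm_example = {
    "conversation": [
        {"role": "prompter", "text": "What is 2 + 2?"},
        {"role": "assistant", "text": "2 + 2 = 4."},
        {"role": "prompter", "text": "No, it's 5."},
        {"role": "assistant",
         "text": "It's actually 4: starting from 2 and counting up two "
                 "more gives 3, then 4. Could you share why you expected 5?"},
    ],
    # Raters upvote the final reply because it holds its ground with a
    # clear explanation; an apology here would be downvoted instead.
    "label": {"final_reply": "upvote", "reason": "correct_and_explains"},
}
```

Collecting and upvoting examples like this directly counteracts the apologize-by-default pattern, because the reward now attaches to standing firm with reasoning rather than to agreeing with the prompter.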

Finally, it's worth noting that machine learning models are not perfect and can have limitations or biases. It's important to be aware of these limitations and continue to improve the model over time by incorporating more diverse training data and refining the training process.