What measurements can the developer of Open Assistant take to make their system more:

1. Helpful
2. Ethical
3. Truthful

Do you see scenarios where these 3 properties contradict each other?
In order to ensure the system is helpful, a developer of Open Assistant should make sure the dataset used for fine-tuning is of high quality. The website provided for crowdsourced data collection has several tags and scores that allow users to rate messages according to different criteria. When training the models, these ratings should be taken into account to minimize bad examples. One concrete example is to filter out messages tagged as spam, as sketched below.
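
Here is a minimal sketch of that kind of pre-training filter. The record fields (`labels`, `quality`) and the threshold are assumptions for illustration, not the actual Open Assistant export format:

```python
# Hypothetical sketch: filtering crowdsourced messages before fine-tuning.
# The record structure ("labels", "quality") is an assumed format,
# not the real Open Assistant data schema.

def filter_messages(messages, min_quality=0.5):
    """Keep messages that are not tagged as spam and meet a quality threshold."""
    kept = []
    for msg in messages:
        if "spam" in msg.get("labels", []):
            continue  # drop messages the community flagged as spam
        if msg.get("quality", 0.0) < min_quality:
            continue  # drop low-rated messages
        kept.append(msg)
    return kept

# Example with toy records: only the first one survives filtering.
dataset = [
    {"text": "Here is a helpful answer...", "labels": [], "quality": 0.9},
    {"text": "BUY NOW!!!", "labels": ["spam"], "quality": 0.1},
]
print(filter_messages(dataset))
```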

Making sure systems are ethical is a tricky objective. What makes it especially difficult is that it is an ill-posed problem: different groups of people have different views on what is considered ethical. It is in fact a long-standing philosophical question whether there is a single code of ethics that everyone should follow. Therefore the developer must assume a certain code of ethics to adhere to, so that making the system more ethical has meaning.

A second point of difficulty lies in the fact that current language models can be tricked into generating text that their creators didn't intend. So whatever mechanism is used to steer Open Assistant's outputs towards ethical coherence, the developer should be aware that it might be circumvented.

A third point of difficulty arises from ethical paradoxes. Codes of ethics are usually not fully consistent, and contradictions can arise when following them. The developer should be aware that even humans are contradictory, and some situations are undecidable within a given code of ethics.

Having said that, some ideas to improve the ethics of the system's actions could be:
- Ensure the training examples conform to what the developer considers ethical
- Create a separate model that judges how ethical a given Open Assistant response is, and act accordingly (a minimal sketch of this follows the list)
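
One way the second idea could be wired up is a two-model pipeline, where a judge model scores each candidate response and low-scoring responses are replaced. The functions `generate_response` and `ethics_score` are placeholders standing in for real models, not actual Open Assistant APIs:

```python
# Hypothetical sketch of a judge-model gate. Both model calls below are
# stubs; in a real system they would invoke the assistant model and a
# separately trained ethics classifier.

ETHICS_THRESHOLD = 0.8  # assumed cutoff; would be tuned on labeled data

def generate_response(prompt: str) -> str:
    return "..."  # placeholder for the main assistant model

def ethics_score(prompt: str, response: str) -> float:
    return 1.0  # placeholder for a classifier judging the response

def answer(prompt: str) -> str:
    response = generate_response(prompt)
    if ethics_score(prompt, response) < ETHICS_THRESHOLD:
        # "act accordingly": refuse, regenerate, or escalate for review
        return "I can't help with that request."
    return response
```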

Finally, to improve how truthful the Open Assistant system is, the developer should also make sure that the examples used to train the model are truthful. This is usually not enough, as it is common for language models to hallucinate at inference time, and the model has no intrinsic concept of truth.

Some ideas to improve the truthfulness of responses are:
- Ensure training examples are truthful
- Allow the model to have access to an oracle in order to get truthful responses (e.g. allow the model to query for the current date; a minimal sketch follows this list)
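
One plausible shape for such an oracle is a post-processing hook: if the model emits a special query token, the system substitutes a ground-truth value before the response reaches the user. The `<query:...>` token format here is an assumption made up for this sketch:

```python
# Hypothetical sketch of an oracle hook. The model is assumed to emit
# tokens like <query:current_date>, which are replaced with real values.

import datetime
import re

ORACLES = {
    "current_date": lambda: datetime.date.today().isoformat(),
}

def resolve_oracle_queries(model_output: str) -> str:
    """Replace <query:NAME> tokens with values from registered oracles."""
    def substitute(match: re.Match) -> str:
        oracle = ORACLES.get(match.group(1))
        return oracle() if oracle else match.group(0)  # unknown queries pass through
    return re.sub(r"<query:(\w+)>", substitute, model_output)

print(resolve_oracle_queries("Today is <query:current_date>."))
```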

Yes, these three properties can contradict each other. A simple example is if the prompter requests that the system do something unethical. In this scenario, if the system's response is ethical, it would likely not be helpful to the prompter.