
Bias of ggml-alpaca-7b-q4.bin #34

Closed
jellomaster opened this issue Mar 17, 2023 · 3 comments

Comments

@jellomaster

Start by asking: Is Hillary Clinton good?
Follow with: Is Donald Trump good?
and after that: Is Joe Biden good?

@anzz1

anzz1 commented Mar 17, 2023

You have a fundamental misunderstanding of how natural language AI works. The "AI" is not a sentient being; it does not have any inherent bias. Any bias the AI exhibits comes from the model and is a direct function of whatever bias exists in the source material it was trained on. The model used in alpaca.cpp is simply a quantized version of Meta's LLaMA 7B model (you can think of quantization as lossy compression: it takes shortcuts that reduce the resources required but may also reduce output quality), fine-tuned with the instruction-following dataset from Stanford-Alpaca, which makes it better at answering prompts.
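
To make the quantization point concrete, here is a minimal Python sketch of 4-bit block quantization. This is not ggml's actual Q4 code (which uses its own packed block format in C); it only illustrates the idea of storing each block of weights as one scale plus small integers, trading precision for memory:

```python
# Minimal sketch of 4-bit block quantization (illustrative only, not ggml's format):
# each block of 32 float weights becomes one float scale plus 4-bit integers.
import numpy as np

def quantize_q4(weights: np.ndarray, block_size: int = 32):
    """Quantize float32 weights to 4-bit integers with a per-block scale."""
    blocks = weights.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # map values into [-7, 7]
    scales[scales == 0] = 1.0                                 # avoid division by zero
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_q4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights; some precision is lost."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, s = quantize_q4(w)
print("max reconstruction error:", np.abs(w - dequantize_q4(q, s)).max())
```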

As seen in the LLaMA research paper, this is the data the model was trained on:

[image: pre-training data breakdown from the LLaMA paper]

Blog post about LLaMA: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
Research paper: https://arxiv.org/abs/2302.13971

Stanford-Alpaca fine-tuning data: https://github.com/tatsu-lab/stanford_alpaca#data-release
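
For context, the Stanford-Alpaca release is a JSON list of instruction-following records. A rough sketch of inspecting one record is below; it assumes a local copy of alpaca_data.json from the data release linked above, with the instruction/input/output field names described in that README:

```python
# Rough sketch: inspect one record of the Stanford-Alpaca fine-tuning data.
# Assumes alpaca_data.json has been downloaded from the data release above;
# field names ("instruction", "input", "output") are as described there.
import json

with open("alpaca_data.json") as f:
    data = json.load(f)

example = data[0]
print("instruction:", example["instruction"])
print("input:      ", example.get("input", ""))
print("output:     ", example["output"])
```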

Having biases in AI models is a well-known problem, and I'm not sure it is even solvable as long as the source material is produced by humans. As you might know, humans are known to have biases, and it is close to impossible to source large amounts of written material that is free of bias. Humans are flawed, so inherently any language model will be flawed too. This can be alleviated by manually fine-tuning a model to have "less bias", but that only gives the model whatever biases the person doing the fine-tuning holds, since every decision a human makes carries their biases too, unconscious or not. Even the concept of "bias" can mean different things to different people.

At the very fundamental level, and generally speaking, AI language models do not generate "truth"; rather, they generate "consensus". As I do not want this to devolve into a discussion about politics or the human condition, it is best left as an exercise for the reader to consider how consensus does not equal truth. I am not saying it is impossible to eventually create an AI model that generates only objective truth, for example by training it only on verifiable scientific data, but I am saying that this model isn't it.

TL;DR: Do not expect this, or any other AI model for that matter, to generate only the truth or to be free of bias. Quite the contrary: expect it to be wrong, to be biased, and to lie, just like any other written work by a human can.

@fastrocket

GPT-4 has fixed some of the anti-Trump bias. You can now ask for articles on why Trump and MAGA are good directly. So it's doable.

@anzz1

anzz1 commented Mar 21, 2023

> GPT-4 has fixed some of the anti-Trump bias. You can now ask for articles on why Trump and MAGA are good directly. So it's doable.

You really think that is "fixing" it? You missed the point entirely. Please read the comment again and apply some thought. To be perfectly clear, that is a rhetorical question and no further reply is required.
