
Why is LLaMA response to queries in the conversation so wrong? #466

Open
Harsh-raj opened this issue Oct 31, 2023 · 0 comments
Harsh-raj commented Oct 31, 2023

I have built a cumulative prompt structure for holding a conversation with LLaMA-2 using the lit-llama repository, so that the model can remember the context of previous prompts and their responses. Here is a snippet of a conversation with LLaMA that produced bizarre responses.

Loading model ...
Time to load model: 20.19 seconds.
Prompt:Name all the districts in the Indian state of Bihar.
LLaMA: There are 38 districts in the Indian state of Bihar. The districts are:

1. Aurangabad
2. Begusarai
3. Bhagalpur
4. Bhojpur
5. Buxar
6. Darbhanga
7. Gaya
8. Hajipur
9. Jamui
10. Khagaria
11. Kishanganj
12. Lakhisarai
13. Madhubani
14. Munger
15. Nalanda
16. Nawada
17. Patna
18. Purnia
19. Rohtas
20. Samastipur
21. Saran
22. Sheohar
23. Sitamarhi
24. Siwan
25. Supaul
26. Vaishali
27. West Champaran
28. East Champaran
29. Sitamarhi
30. Darbhanga
31. Khagaria
32. Begusarai
33. Purnia
34. Kishanganj
35. Aurangabad
36. Gaya
37. Lakhisarai
38. Nalanda


Time for inference: 8.09 sec total, 126.64 tokens/sec
Memory used: 14.12 GB
Prompt:Name all the ministers in the cabinet of prime minister of India
LLaMA: India has several hill stations, including:

1. Shimla - Himachal Pradesh
2. Darjeeling - West Bengal
3. Gangtok - Sikkim
4. Kodaikanal - Tamil Nadu
5. Ooty - Tamil Nadu
6. Kashmir - Jammu and Kash -


Time for inference: 18.84 sec total, 54.35 tokens/sec
Memory used: 14.23 GB
Prompt:

======================================================================================
One explanation could be that the response from the previous prompt is somehow being carried over into the next prompt (a quick way to check this is sketched after the loop below). Why is this happening?

Following is the conversation loop I used in the lora.py script in the generate folder of the repo.

conversation = ""
tokenizer = Tokenizer(tokenizer_path)
while True:
    prompt = input("Prompt:")  # type: ignore
    if not prompt:
        break
    sample = {"instruction": prompt, "input": input_inst}

    # prompt = generate_prompt(sample)
    conversation = generate_conversation(sample, conversation)  # testing for conversation
    # print(f"\n\n{conversation}\n\n")
    # encoded = tokenizer.encode(prompt, bos=True, eos=False, device=model.device)
    encoded = tokenizer.encode(conversation, bos=True, eos=False, device=model.device)  # testing for conversation

    t0 = time.perf_counter()
    output = generate(
        model,
        idx=encoded,
        max_seq_length=max_new_tokens,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=top_k,
        eos_id=tokenizer.eos_id,
    )
    t = time.perf_counter() - t0

    output = tokenizer.decode(output)
    output = output.split("### Response:")[-1].strip()
    print(f"LLaMA: {output}")

    conversation += f" {output}\n\n"  # testing for conversation

    print(f"\n\nTime for inference: {t:.02f} sec total, {max_new_tokens / t:.02f} tokens/sec", file=sys.stderr)
    if fabric.device.type == "cuda":
        print(f"Memory used: {torch.cuda.max_memory_reserved() / 1e9:.02f} GB", file=sys.stderr)

I also used a custom function generate_conversation in the prepare_alpaca.py script.

def generate_conversation(example, conversation: str):
    """Custom function that builds a cumulative conversational prompt so the model can remember the previous prompts and responses."""
    if example["input"]:
        conversation += f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:"
        return (
            # "Below is an instruction that describes a task, paired with an input that provides further context. "
            # "Write a response that appropriately completes the request.\n\n"
            conversation
        )
    conversation += f"### Instruction:\n{example['instruction']}\n\n### Response:"
    return (
        # "Below is an instruction that describes a task. "
        # "Write a response that appropriately completes the request.\n\n"
        conversation
    )
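
For illustration, and assuming input_inst is empty, the cumulative prompt assembled by the code above would look roughly like this by the second turn of the conversation shown earlier (first reply abbreviated):

### Instruction:
Name all the districts in the Indian state of Bihar.

### Response: There are 38 districts in the Indian state of Bihar. The districts are: ...

### Instruction:
Name all the ministers in the cabinet of prime minister of India

### Response: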

The rest of the code in the repository is untouched. I first fine-tune the LLaMA model and then try to hold a conversation with it. Please help!
