llama3 does not return pure json #521

Closed
barsuna opened this issue May 21, 2024 · 5 comments

barsuna commented May 21, 2024

Testing gpt-researcher with llama3, I found that about 3 times out of 4, llama3 responds to generate_search_queries_prompt with JSON plus some extra verbiage.

I'm not sure it is worth changing the prompt for the sake of llama3 alone, but for documentation purposes, here is the updated prompt that seems to work every time.

Before:

f'You must respond with a list of strings in the following format: ["query 1", "query 2", "query 3"].'

After:

f'Your response must include list of the query strings in json format and nothing else. For example: ["query 1", "query 2", "query 3"]'
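
In case it helps others, here is a minimal sketch of a defensive parser for this situation; the helper name and the regex fallback are my own illustration, not part of gpt-researcher:

```python
import json
import re

def extract_query_list(response: str) -> list[str]:
    """Best-effort extraction of a JSON list of strings from an LLM reply
    that may wrap the JSON in extra verbiage (hypothetical helper, not
    part of gpt-researcher)."""
    # Happy path: the whole reply is already valid JSON.
    try:
        parsed = json.loads(response)
        if isinstance(parsed, list):
            return [str(q) for q in parsed]
    except json.JSONDecodeError:
        pass
    # Fallback: take the first [...] span and parse just that.
    match = re.search(r"\[.*?\]", response, re.DOTALL)
    if match:
        return [str(q) for q in json.loads(match.group(0))]
    raise ValueError("no JSON list found in response")

# Example with verbiage around the JSON, as llama3 often produces:
print(extract_query_list('Sure! Here you go: ["query 1", "query 2", "query 3"]. Enjoy!'))
```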

Dilip-17 commented May 22, 2024

Hey @barsuna. I was searching for how to use llama with gpt-researcher and stumbled upon this post. If possible, could you tell me how to get gpt-researcher to work with llama3?


barsuna commented May 23, 2024

@Dilip-17 there was the same question on another issue; I added some pointers there:

#520

The challenge is mostly not how to run it, but having the GPU memory necessary to run llama3: even the borderline-usable (IMO; opinions are divided on this) 4-bit quantized 70B model takes about 43GB. I'd recommend Q6, which is close to 60GB.
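
For context, a quick back-of-the-envelope for the weight memory alone; the bits-per-weight figures are approximate averages I'm assuming for the GGUF quant types, and real usage adds KV cache and runtime overhead on top:

```python
# Rough weight-memory estimate for a 70B-parameter model.
# Effective bits/weight are assumed approximate averages for GGUF quants.
PARAMS = 70e9

for quant, bits in [("Q4_K_M", 4.8), ("Q6_K", 6.6)]:
    gb = PARAMS * bits / 8 / 1e9  # decimal gigabytes
    print(f"{quant}: ~{gb:.0f} GB of weights")
# Q4_K_M: ~42 GB, Q6_K: ~58 GB -- in line with the ~43GB / ~60GB figures above.
```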

assafelovic (Owner) commented:

Hey, working with different LLMs (other than the default OpenAI) requires extra manual tweaking. Would love to learn from your experience if you find ways to make the code more generic!


barsuna commented May 28, 2024

To its credit, llama3 worked pretty much out of the box with gpt-researcher (the only tweak needed was the prompt change above). It also seems possible to stretch the context window to 16k without fine-tuning, though I've done very limited testing of that.

So far, progress with llama3 has been difficult for anything requiring function calling and in-prompt memory, i.e. autonomous agents; with single-prompt or one-by-one prompting agents, things seem better.

Of course, the main challenge remains the quality of the reports. I'm currently comparing llama3 vs. gpt4; both seem somewhat challenged, and my belief is that the likely way forward is to rebalance automation and augmentation, i.e. let users do more if they wish.

I haven't measured the quality of the embeddings and its impact on report quality much either.

assafelovic (Owner) commented:

Great, thank you for the feedback @barsuna! Closing for now, but feel free to open new threads if needed.
