
document the meaning/impact of configuration options on report quality #522

Open

barsuna opened this issue May 21, 2024 · 1 comment


barsuna commented May 21, 2024

Having done some testing, I wonder how one influences the quality of the report via configuration, and what the best practices are, if any.

That is, what is the impact of the various knobs available in the config (a rough override sketch follows this list):

  • What has the larger impact on the substance of the report: a smarter model or a better search engine (i.e. choosing Google vs. DDG)? And, of course, the keywords in the topic. Looking at the keywords generated, some of them seem decent even with llama3 (testing now with GPT-3/Claude)... so I tend to think the better search engine has more impact.

  • FAST_TOKEN_LIMIT / SMART_TOKEN_LIMIT: the bigger the better? With local models tokens are cheap, so perhaps we could foresee a 'high token count' mode? With llama3 I used 6k and 8k for the two, but I'm not sure I can see a perceivable difference in the report. If one used 32k tokens or more (the latest mixtral8x22 has a 64k-token window, IIRC), one could presumably put more material into the window to generate from.

  • SUMMARY_TOKEN_LIMIT: the bigger the better? Is there any relation to FAST_TOKEN_LIMIT that needs to be respected?

  • TOTAL_WORDS: a higher value means less content is thrown away, possibly making for more colorful reports? Should this be a fraction of SMART_TOKEN_LIMIT?

  • TEMPERATURE: 0 is more pedantic vs. a larger number more colorful? Having glanced at some queries, I seem to recall this was always 0; perhaps it is not used currently?
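For concreteness, here is roughly how I have been overriding these knobs - a minimal sketch that assumes the config still reads these names from environment variables and that the GPTResearcher API matches the README; the values are just what I experimented with, not recommendations:

```python
# Minimal sketch, assuming gpt-researcher picks these knobs up from environment
# variables and exposes the GPTResearcher class from the README.
import asyncio
import os

os.environ["RETRIEVER"] = "google"          # search engine knob (Google needs its own API credentials); vs. "duckduckgo"
os.environ["FAST_TOKEN_LIMIT"] = "6000"     # limit for the cheaper/faster LLM calls
os.environ["SMART_TOKEN_LIMIT"] = "8000"    # limit for the report-writing LLM calls
os.environ["SUMMARY_TOKEN_LIMIT"] = "1400"  # per-source summary length
os.environ["TOTAL_WORDS"] = "1500"          # target report length in words
os.environ["TEMPERATURE"] = "0.4"           # 0 = pedantic, higher = more colorful?

from gpt_researcher import GPTResearcher    # imported after the overrides are set

async def main():
    researcher = GPTResearcher(query="...", report_type="research_report")
    await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
```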

One of the highly desirable outcomes I'm looking for is a way to control the level of the conversation in the reports - i.e. writing for a beginner in the given field vs. for an expert. Even though the system prompt always says the model is an expert etc., a lot of general fluff is scooped up by search. I wonder if there is a good way or best practice to influence that: perhaps adding some qualifications to the topic to avoid trivialities, or having different sets of prompts for different target audiences, or adding an extra LLM call to classify retrieved content as intro/intermediate/expert level (a rough sketch of this idea follows below), or perhaps letting the user curate subtopics / queries / search results before accepting them into 'processing'.
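To make the 'extra LLM call' idea concrete, this is the kind of pre-filter I have in mind. It is only a sketch: classify_level() and keep_for_expert_report() are hypothetical names, nothing like this exists in gpt-researcher today, and the OpenAI client is just a stand-in for whatever model backend is configured.

```python
# Rough sketch of the proposed pre-filter: classify each scraped chunk by
# audience level with one extra LLM call, and keep only the expert-level ones
# before they reach the report-writing step.
from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible local endpoint

def classify_level(chunk: str) -> str:
    """Return 'intro', 'intermediate' or 'expert' for a piece of scraped text."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Classify the technical depth of the text as exactly one of: "
                        "intro, intermediate, expert. Reply with the single word only."},
            {"role": "user", "content": chunk[:4000]},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

def keep_for_expert_report(chunks: list[str]) -> list[str]:
    """Drop the introductory fluff before it is summarized into the report."""
    return [c for c in chunks if classify_level(c) == "expert"]
```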

Thank you for the awesome project!


ElishaKay commented May 22, 2024

@barsuna great energies and direction

Try LangSmith:

https://smith.langchain.com/

I know that, at least for the multi_agents feature, as long as you add a LangChain API key to your .env file, you'll get a rich log of the input and output for every step of the backend server process (pretty awesome).
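Something like the following (or the equivalent lines in the .env file) should be enough to turn the tracing on - the project name is optional and just groups the runs in the LangSmith UI:

```python
# Standard LangSmith tracing environment variables; the API key value is a placeholder.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your LangSmith API key>"
os.environ["LANGCHAIN_PROJECT"] = "gpt-researcher"  # optional: groups runs in the LangSmith UI
```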

@hwchase17 are those LangSmith logs available out of the box for every gptResearcher report type, or just for specific ones?
