Merged
66 changes: 66 additions & 0 deletions README.md
@@ -38,6 +38,7 @@ question answering, summarization, and contradiction detection.
- [Reusing Index](#reusing-index)
- [Running on LitQA v2](#running-on-litqa-v2)
- [Using Clients Directly](#using-clients-directly)
- [Settings Cheatsheet](#settings-cheatsheet)
- [Where do I get papers?](#where-do-i-get-papers)
- [Zotero](#zotero)
- [Paper Scraper](#paper-scraper)
@@ -788,6 +789,71 @@ details = await client.query(

will return much faster than the first query and we'll be certain the authors match.

## Settings Cheatsheet

| Setting | Default | Description |
| -------------------------------------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| `llm`                                        | `"gpt-4o-2024-08-06"`                  | Default LLM for most tasks, including answers. Should be the 'best' available LLM.                      |
| `llm_config` | `None` | Optional configuration for `llm`. |
| `summary_llm` | `"gpt-4o-2024-08-06"` | Default LLM for summaries and parsing citations. |
| `summary_llm_config` | `None` | Optional configuration for `summary_llm`. |
| `embedding` | `"text-embedding-3-small"` | Default embedding model for texts. |
| `embedding_config` | `None` | Optional configuration for `embedding`. |
| `temperature` | `0.0` | Temperature for LLMs. |
| `batch_size` | `1` | Batch size for calling LLMs. |
| `texts_index_mmr_lambda` | `1.0` | Lambda for MMR in text index. |
| `verbosity` | `0` | Integer verbosity level for logging (0-3). 3 = all LLM/Embeddings calls logged. |
| `answer.evidence_k` | `10` | Number of evidence pieces to retrieve. |
| `answer.evidence_detailed_citations` | `True` | Include detailed citations in summaries. |
| `answer.evidence_retrieval` | `True` | Use retrieval vs processing all docs. |
| `answer.evidence_summary_length` | `"about 100 words"` | Length of evidence summary. |
| `answer.evidence_skip_summary` | `False` | Whether to skip summarization. |
| `answer.answer_max_sources` | `5` | Max number of sources for an answer. |
| `answer.max_answer_attempts` | `None` | Max attempts to generate an answer. |
| `answer.answer_length` | `"about 200 words, but can be longer"` | Length of final answer. |
| `answer.max_concurrent_requests` | `4` | Max concurrent requests to LLMs. |
| `answer.answer_filter_extra_background` | `False` | Whether to cite background info from model. |
| `answer.get_evidence_if_no_contexts` | `True` | Allow lazy evidence gathering. |
| `parsing.chunk_size` | `5000` | Characters per chunk (0 for no chunking). |
| `parsing.page_size_limit` | `1,280,000` | Character limit per page. |
| `parsing.use_doc_details` | `True` | Whether to get metadata details for docs. |
| `parsing.overlap` | `250` | Characters to overlap chunks. |
| `parsing.defer_embedding` | `False` | Whether to defer embedding until summarization. |
| `parsing.chunking_algorithm` | `ChunkingOptions.SIMPLE_OVERLAP` | Algorithm for chunking. |
| `parsing.doc_filters` | `None` | Optional filters for allowed documents. |
| `parsing.use_human_readable_clinical_trials` | `False` | Parse clinical trial JSONs into readable text. |
| `prompt.summary` | `summary_prompt` | Template for summarizing text, must contain variables matching `summary_prompt`. |
| `prompt.qa` | `qa_prompt` | Template for QA, must contain variables matching `qa_prompt`. |
| `prompt.select` | `select_paper_prompt` | Template for selecting papers, must contain variables matching `select_paper_prompt`. |
| `prompt.pre`                                 | `None`                                 | Optional pre-prompt, templated with just the original question, to add information before a QA prompt.  |
| `prompt.post` | `None` | Optional post-processing prompt that can access PQASession fields. |
| `prompt.system` | `default_system_prompt` | System prompt for the model. |
| `prompt.use_json` | `True` | Whether to use JSON formatting. |
| `prompt.summary_json` | `summary_json_prompt` | JSON-specific summary prompt. |
| `prompt.summary_json_system` | `summary_json_system_prompt` | System prompt for JSON summaries. |
| `prompt.context_outer` | `CONTEXT_OUTER_PROMPT` | Prompt for how to format all contexts in generate answer. |
| `prompt.context_inner` | `CONTEXT_INNER_PROMPT` | Prompt for how to format a single context in generate answer. Must contain 'name' and 'text' variables. |
| `agent.agent_llm` | `"gpt-4o-2024-08-06"` | Model to use for agent. |
| `agent.agent_llm_config` | `None` | Optional configuration for `agent_llm`. |
| `agent.agent_type` | `"ToolSelector"` | Type of agent to use. |
| `agent.agent_config`                         | `None`                                 | Optional kwargs for the agent constructor.                                                              |
| `agent.agent_system_prompt` | `env_system_prompt` | Optional system prompt message. |
| `agent.agent_prompt` | `env_reset_prompt` | Agent prompt. |
| `agent.return_paper_metadata` | `False` | Whether to include paper title/year in search tool results. |
| `agent.search_count` | `8` | Search count. |
| `agent.timeout` | `500.0` | Timeout on agent execution (seconds). |
| `agent.should_pre_search` | `False` | Whether to run search tool before invoking agent. |
| `agent.tool_names` | `None` | Optional override on tools to provide the agent. |
| `agent.max_timesteps` | `None` | Optional upper limit on environment steps. |
| `agent.index.name` | `None` | Optional name of the index. |
| `agent.index.paper_directory` | `Current working directory` | Directory containing papers to be indexed. |
| `agent.index.manifest_file` | `None` | Path to manifest CSV with document attributes. |
| `agent.index.index_directory` | `pqa_directory("indexes")` | Directory to store PQA indexes. |
| `agent.index.use_absolute_paper_directory` | `False` | Whether to use absolute paper directory path. |
| `agent.index.recurse_subdirectories` | `True` | Whether to recurse into subdirectories when indexing. |
| `agent.index.concurrency` | `5` | Number of concurrent filesystem reads. |
| `agent.index.sync_with_paper_directory` | `True` | Whether to sync index with paper directory on load. |
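
The dotted names in the table (for example `answer.evidence_k` or `agent.index.concurrency`) read as paths into nested settings groups. As a minimal sketch of that reading, the snippet below turns a flat mapping of cheatsheet names into a nested dictionary of overrides. `nest_overrides` is a hypothetical helper written for this illustration, not part of the library, and the override values are arbitrary examples.

```python
from typing import Any


def nest_overrides(flat: dict[str, Any]) -> dict[str, Any]:
    """Turn dotted cheatsheet names into a nested dict of overrides.

    Hypothetical helper for illustration only; not part of the library.
    """
    nested: dict[str, Any] = {}
    for dotted_name, value in flat.items():
        # "agent.index.concurrency" -> parents ["agent", "index"], leaf "concurrency"
        *parents, leaf = dotted_name.split(".")
        node = nested
        for key in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


# Example values are arbitrary, chosen only to show each nesting depth.
overrides = nest_overrides(
    {
        "temperature": 0.0,
        "answer.evidence_k": 15,
        "parsing.chunk_size": 3000,
        "agent.index.concurrency": 2,
    }
)
print(overrides)

# Assumption: a nested dict of this shape could then be passed to the
# settings object (e.g. `Settings(**overrides)`), if it accepts nested
# dicts for its sub-groups. Check this against your installed version.
```

Keeping overrides in the flat, dotted form makes them easy to compare against the table row by row; only the settings you list are changed, and everything else keeps the defaults shown above.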

## Where do I get papers?

Well that's a really good question! It's probably best to just download PDFs of papers you think will help answer your question and start from there.