Future-House · jamesbraza · Jan 8, 2025
diff --git a/README.md b/README.md
@@ -38,6 +38,7 @@ question answering, summarization, and contradiction detection.
   - [Reusing Index](#reusing-index)
   - [Running on LitQA v2](#running-on-litqa-v2)
   - [Using Clients Directly](#using-clients-directly)
+- [Settings Cheatsheet](#settings-cheatsheet)
 - [Where do I get papers?](#where-do-i-get-papers)
   - [Zotero](#zotero)
   - [Paper Scraper](#paper-scraper)
@@ -788,6 +789,71 @@ details = await client.query(
 
 will return much faster than the first query and we'll be certain the authors match.
 
+## Settings Cheatsheet
+
+| Setting                                      | Default                                | Description                                                                                             |
+| -------------------------------------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------------- |
+| `llm`                                        | `"gpt-4o-2024-08-06"`                  | Default LLM for most things, including answers. Should be the 'best' LLM.                               |
+| `llm_config`                                 | `None`                                 | Optional configuration for `llm`.                                                                       |
+| `summary_llm`                                | `"gpt-4o-2024-08-06"`                  | Default LLM for summaries and parsing citations.                                                        |
+| `summary_llm_config`                         | `None`                                 | Optional configuration for `summary_llm`.                                                               |
+| `embedding`                                  | `"text-embedding-3-small"`             | Default embedding model for texts.                                                                      |
+| `embedding_config`                           | `None`                                 | Optional configuration for `embedding`.                                                                 |
+| `temperature`                                | `0.0`                                  | Temperature for LLMs.                                                                                   |
+| `batch_size`                                 | `1`                                    | Batch size for calling LLMs.                                                                            |
+| `texts_index_mmr_lambda`                     | `1.0`                                  | Lambda for maximal marginal relevance (MMR) in the text index.                                          |
+| `verbosity`                                  | `0`                                    | Integer verbosity level for logging (0-3); 3 logs all LLM and embedding calls.                          |
+| `answer.evidence_k`                          | `10`                                   | Number of evidence pieces to retrieve.                                                                  |
+| `answer.evidence_detailed_citations`         | `True`                                 | Include detailed citations in summaries.                                                                |
+| `answer.evidence_retrieval`                  | `True`                                 | Whether to use retrieval instead of processing all docs.                                                |
+| `answer.evidence_summary_length`             | `"about 100 words"`                    | Length of evidence summary.                                                                             |
+| `answer.evidence_skip_summary`               | `False`                                | Whether to skip summarization.                                                                          |
+| `answer.answer_max_sources`                  | `5`                                    | Max number of sources for an answer.                                                                    |
+| `answer.max_answer_attempts`                 | `None`                                 | Max attempts to generate an answer.                                                                     |
+| `answer.answer_length`                       | `"about 200 words, but can be longer"` | Length of final answer.                                                                                 |
+| `answer.max_concurrent_requests`             | `4`                                    | Max concurrent requests to LLMs.                                                                        |
+| `answer.answer_filter_extra_background`      | `False`                                | Whether to cite background information from the model.                                                  |
+| `answer.get_evidence_if_no_contexts`         | `True`                                 | Allow lazy evidence gathering.                                                                          |
+| `parsing.chunk_size`                         | `5000`                                 | Characters per chunk (0 for no chunking).                                                               |
+| `parsing.page_size_limit`                    | `1_280_000`                            | Character limit per page.                                                                               |
+| `parsing.use_doc_details`                    | `True`                                 | Whether to get metadata details for docs.                                                               |
+| `parsing.overlap`                            | `250`                                  | Number of characters to overlap between chunks.                                                         |
+| `parsing.defer_embedding`                    | `False`                                | Whether to defer embedding until summarization.                                                         |
+| `parsing.chunking_algorithm`                 | `ChunkingOptions.SIMPLE_OVERLAP`       | Algorithm for chunking.                                                                                 |
+| `parsing.doc_filters`                        | `None`                                 | Optional filters for allowed documents.                                                                 |
+| `parsing.use_human_readable_clinical_trials` | `False`                                | Parse clinical trial JSONs into readable text.                                                          |
+| `prompt.summary`                             | `summary_prompt`                       | Template for summarizing text, must contain variables matching `summary_prompt`.                        |
+| `prompt.qa`                                  | `qa_prompt`                            | Template for QA, must contain variables matching `qa_prompt`.                                           |
+| `prompt.select`                              | `select_paper_prompt`                  | Template for selecting papers, must contain variables matching `select_paper_prompt`.                   |
+| `prompt.pre`                                 | `None`                                 | Optional pre-prompt templated with just the original question to append information before a QA prompt. |
+| `prompt.post`                                | `None`                                 | Optional post-processing prompt that can access PQASession fields.                                      |
+| `prompt.system`                              | `default_system_prompt`                | System prompt for the model.                                                                            |
+| `prompt.use_json`                            | `True`                                 | Whether to use JSON formatting.                                                                         |
+| `prompt.summary_json`                        | `summary_json_prompt`                  | JSON-specific summary prompt.                                                                           |
+| `prompt.summary_json_system`                 | `summary_json_system_prompt`           | System prompt for JSON summaries.                                                                       |
+| `prompt.context_outer`                       | `CONTEXT_OUTER_PROMPT`                 | Prompt for formatting all contexts when generating an answer.                                           |
+| `prompt.context_inner`                       | `CONTEXT_INNER_PROMPT`                 | Prompt for formatting one context when generating an answer. Must contain 'name' and 'text' variables.  |
+| `agent.agent_llm`                            | `"gpt-4o-2024-08-06"`                  | Model to use for agent.                                                                                 |
+| `agent.agent_llm_config`                     | `None`                                 | Optional configuration for `agent_llm`.                                                                 |
+| `agent.agent_type`                           | `"ToolSelector"`                       | Type of agent to use.                                                                                   |
+| `agent.agent_config`                         | `None`                                 | Optional keyword arguments for the agent constructor.                                                   |
+| `agent.agent_system_prompt`                  | `env_system_prompt`                    | Optional system prompt message.                                                                         |
+| `agent.agent_prompt`                         | `env_reset_prompt`                     | Agent prompt.                                                                                           |
+| `agent.return_paper_metadata`                | `False`                                | Whether to include paper title/year in search tool results.                                             |
+| `agent.search_count`                         | `8`                                    | Number of results to return per search.                                                                 |
+| `agent.timeout`                              | `500.0`                                | Timeout on agent execution (seconds).                                                                   |
+| `agent.should_pre_search`                    | `False`                                | Whether to run the search tool before invoking the agent.                                               |
+| `agent.tool_names`                           | `None`                                 | Optional override on tools to provide the agent.                                                        |
+| `agent.max_timesteps`                        | `None`                                 | Optional upper limit on environment steps.                                                              |
+| `agent.index.name`                           | `None`                                 | Optional name of the index.                                                                             |
+| `agent.index.paper_directory`                | Current working directory              | Directory containing papers to be indexed.                                                              |
+| `agent.index.manifest_file`                  | `None`                                 | Path to manifest CSV with document attributes.                                                          |
+| `agent.index.index_directory`                | `pqa_directory("indexes")`             | Directory to store PQA indexes.                                                                         |
+| `agent.index.use_absolute_paper_directory`   | `False`                                | Whether to use the absolute paper directory path.                                                       |
+| `agent.index.recurse_subdirectories`         | `True`                                 | Whether to recurse into subdirectories when indexing.                                                   |
+| `agent.index.concurrency`                    | `5`                                    | Number of concurrent filesystem reads.                                                                  |
+| `agent.index.sync_with_paper_directory`      | `True`                                 | Whether to sync index with paper directory on load.                                                     |
+
 ## Where do I get papers?
 
 Well that's a really good question! It's probably best to just download PDFs of papers you think will help answer your question and start from there.
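The dotted names in the cheatsheet (for example `answer.evidence_k` or `agent.index.paper_directory`) refer to nested settings groups. As a minimal sketch, the hypothetical helper `nest_dotted` (not part of paper-qa) expands a flat mapping of such dotted keys into nested dicts:

```python
from typing import Any


def nest_dotted(flat: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted keys such as "agent.index.name" into nested dicts.

    Hypothetical helper for illustration only; it is not part of paper-qa.
    """
    nested: dict[str, Any] = {}
    for dotted_key, value in flat.items():
        # "answer.evidence_k" -> parents=["answer"], leaf="evidence_k"
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create (or reuse) the dict for each intermediate settings group
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{dotted_key!r} conflicts with non-group key {part!r}")
            node = child
        node[leaf] = value
    return nested


# Example overrides using names from the cheatsheet (values are arbitrary)
overrides = {
    "temperature": 0.5,
    "answer.evidence_k": 5,
    "parsing.chunk_size": 3000,
    "agent.index.paper_directory": "my_papers",
}
print(nest_dotted(overrides))
```

Assuming paper-qa's `Settings` class accepts plain dicts for its nested groups (standard pydantic behavior), the result could then be passed as `Settings(**nest_dotted(overrides))`.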