# Advanced LLaMA Index

## Retrievers

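#### Index types
- List Index: stores nodes in a sequence (node1 -> node2 -> node3) and, at query time, passes all of them to response synthesis. No embeddings are generated, so indexing is fast.
- Vector Store Index: generates an embedding for each node during indexing and stores it in a vector store; a query retrieves the top-k nodes most similar to it.
- Tree Index: embeddings are generated at query time, and a query traverses the tree from the root down to the leaf nodes. Pay attention to `child_branch_factor=...`, which sets how many child nodes are followed at each level.
- Keyword Table Index: builds a mapping from keywords to the nodes that contain them.
- Summary Index: well suited to building QA systems.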
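The top-k lookup described for the Vector Store Index can be illustrated without the library. This is a minimal pure-Python sketch, not LlamaIndex's implementation: `cosine`, `top_k_nodes`, and the toy 2-dimensional vectors are hypothetical names and values chosen for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_nodes(query_vec: List[float], node_vecs: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k nodes whose embeddings are most similar to the query."""
    ranked = sorted(node_vecs, key=lambda name: cosine(query_vec, node_vecs[name]), reverse=True)
    return ranked[:k]


# Toy embeddings; a real index would store model-generated vectors.
node_vecs = {"node1": [1.0, 0.0], "node2": [0.8, 0.6], "node3": [0.0, 1.0]}
best = top_k_nodes([1.0, 0.1], node_vecs, k=2)
print(best)
```

A List Index skips this scoring step entirely and simply hands every node to response synthesis, which is why it needs no embeddings.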


## Routers

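- Decide which one or more of several candidate options best fits a query
- Come in the form of selector modules, query engines, or retrievers
- Choose between a summary index query engine (text summarization) and a vector index query engine (semantic search)
- Choose whether to combine results together or to try out a bunch of choices at once
  - MRR (Mean Reciprocal Rank) is sometimes used to judge how highly the important chunks are ranked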
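MRR averages, over a set of queries, the reciprocal of the rank at which the first relevant chunk appears. The sketch below computes it in plain Python; the function name and the chunk ids are hypothetical and only serve the example.

```python
from typing import List, Set


def mean_reciprocal_rank(rankings: List[List[str]], relevant: List[Set[str]]) -> float:
    """Average of 1/rank of the first relevant item per query (0 when none is found)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, relevant_ids in zip(rankings, relevant):
        for position, chunk_id in enumerate(ranking, start=1):
            if chunk_id in relevant_ids:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)


# Three queries: relevant chunk ranked 1st, ranked 2nd, and not retrieved at all.
rankings = [["c1", "c2", "c3"], ["c4", "c5", "c6"], ["c7", "c8", "c9"]]
relevant = [{"c1"}, {"c5"}, {"c0"}]
mrr = mean_reciprocal_rank(rankings, relevant)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```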

#### Selector

In [3]:
!pip install llama-index



In [4]:
import nest_asyncio
nest_asyncio.apply()
from llama_index import Document, VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.tools import RetrieverTool, ToolMetadata
from llama_index.tools.query_engine import QueryEngineTool
from llama_index.query_engine.router_query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.selectors.pydantic_selectors import PydanticSingleSelector, PydanticMultiSelector

In [15]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2023-11-14 05:29:07--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2023-11-14 05:29:07 (2.78 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [17]:
from llama_index.llms import OpenAI
llm=OpenAI(model="gpt-3.5-turbo", api_key='  ', api_base='  ')

In [18]:
from llama_index import download_loader, ServiceContext
service_context=ServiceContext.from_defaults(llm=llm)

ValueError: ignored

### Router Query Engine
- Selects one query engine out of several candidates
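The routing idea can be shown with a toy example that needs no API key. In this sketch a simple word-overlap score stands in for the LLM-based selector, so it is only an analogy for how a single-choice router behaves; `Tool`, `select_single`, and the descriptions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def select_single(query: str, tools: List[Tool]) -> Tool:
    """Pick the tool whose description shares the most words with the query.

    Assumption: word overlap replaces the LLM call a real selector would make.
    """
    query_words = set(query.lower().split())
    return max(tools, key=lambda t: len(query_words & set(t.description.lower().split())))


tools = [
    Tool("summary", "useful for summarization questions about the whole essay",
         lambda q: f"[summary engine] {q}"),
    Tool("vector", "useful for retrieving specific facts from the essay",
         lambda q: f"[vector engine] {q}"),
]

chosen = select_single("give me a summarization of the essay", tools)
print(chosen.name, "->", chosen.run("give me a summarization of the essay"))
```

A multi-selector variant would return every tool above some score instead of only the best one, and the router would then combine their results.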

## Node PostProcessors

- After retrieval, apply transformations, filtering, and re-ranking to the retrieved nodes
- Most often used together with a query engine

In [7]:
from llama_index.indices.postprocessor import SimilarityPostprocessor, CohereRerank, TimeWeightedPostprocessor
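# SimilarityPostprocessor: removes nodes whose similarity score falls below the configured cutoff
# KeywordNodePostprocessor: ensures that specific keywords are present in, or excluded from, the returned nodes
# CohereRerank: re-ranks the nodes and returns the top N
# TimeWeightedPostprocessor: re-ranks nodes taking their recorded access time into account and returns the top K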

In [8]:
Similaritypp=SimilarityPostprocessor(similarity_cutoff=0.9)

In [9]:
from llama_index.indices.postprocessor.node import KeywordNodePostprocessor
# The exclusion parameter is `exclude_keywords` (plural)
KeywordNP=KeywordNodePostprocessor(required_keywords=['big'], exclude_keywords=['small'])
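What the two filters above do can be sketched in plain Python. This is an illustration under simplifying assumptions (case-insensitive substring matching, a hypothetical `ScoredChunk` type), not the library's code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScoredChunk:
    text: str
    score: float


def filter_by_similarity(chunks: List[ScoredChunk], cutoff: float) -> List[ScoredChunk]:
    """Keep only chunks whose score reaches the cutoff."""
    return [c for c in chunks if c.score >= cutoff]


def filter_by_keywords(chunks: List[ScoredChunk],
                       required: Sequence[str] = (),
                       excluded: Sequence[str] = ()) -> List[ScoredChunk]:
    """Keep chunks containing every required keyword and none of the excluded ones."""
    kept = []
    for chunk in chunks:
        lowered = chunk.text.lower()
        has_all_required = all(k.lower() in lowered for k in required)
        has_any_excluded = any(k.lower() in lowered for k in excluded)
        if has_all_required and not has_any_excluded:
            kept.append(chunk)
    return kept


chunks = [
    ScoredChunk("A big dog", 0.95),
    ScoredChunk("A big and small cat", 0.93),
    ScoredChunk("A bird", 0.40),
]
similar = filter_by_similarity(chunks, cutoff=0.9)
final = filter_by_keywords(similar, required=["big"], excluded=["small"])
print([c.text for c in final])
```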

In [10]:
!pip install cohere
CR=CohereRerank(top_n=2,
        model="rerank-english-v2.0",
        api_key="YOUR_COHERE_API_KEY")  # placeholder: supply your own key, never commit a real one

In [11]:
TWP=TimeWeightedPostprocessor(time_decay=0.5, top_k=3, time_access_refresh=False)
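One way to picture time-weighted re-ranking is to add a recency bonus that decays with the hours since a node was last accessed. The sketch below assumes a bonus of the form `(1 - time_decay) ** hours`; that formula, `TimedChunk`, and `time_weighted_top_k` are assumptions made for illustration rather than a statement of what the library computes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedChunk:
    text: str
    similarity: float
    hours_since_access: float


def time_weighted_top_k(chunks: List[TimedChunk], time_decay: float, top_k: int) -> List[TimedChunk]:
    """Rank by similarity plus an exponentially decaying recency bonus; keep the top_k."""
    def combined(chunk: TimedChunk) -> float:
        return chunk.similarity + (1.0 - time_decay) ** chunk.hours_since_access

    return sorted(chunks, key=combined, reverse=True)[:top_k]


chunks = [
    TimedChunk("old but relevant", similarity=0.8, hours_since_access=10.0),
    TimedChunk("fresh", similarity=0.6, hours_since_access=0.0),
    TimedChunk("recent", similarity=0.7, hours_since_access=1.0),
]
ranked = time_weighted_top_k(chunks, time_decay=0.5, top_k=2)
print([c.text for c in ranked])
```

With a decay of 0.5 the bonus halves every hour, so a freshly accessed chunk can outrank a more similar but stale one.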

In [None]:
from llama_index import ServiceContext
from llama_index.indices.postprocessor import EmbeddingRecencyPostprocessor
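# EmbeddingRecencyPostprocessor: sorts nodes by date, removes older nodes that are too similar to newer ones (measured by embedding similarity), then returns the top K nodes
# FixedRecencyPostprocessor: returns the top K nodes sorted by date
# date_key names the metadata field that holds each node's date (assumed here to be "date"), not a date value
ERP=EmbeddingRecencyPostprocessor(service_context=service_context, date_key='date', top_k=2)
# service context: a bundle of resources commonly used during the indexing and querying stages; it can also be set as a global configuration
  # llm: defaults to gpt-3.5-turbo
  # embed_model: the embedding model; BAAI/bge-small-en is used when set to "local"
  # node_parser: converts documents into nodes
  # prompt_helper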
from llama_index import set_global_service_context
set_global_service_context(service_context)

In [12]:
from llama_index.indices.postprocessor import LongContextReorder
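# Places the most valuable nodes at the beginning or end of the input, which helps when the context grows long and content in the middle tends to be overlooked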
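The reordering idea can be sketched as follows: sort by score, then alternately place items at the front and the back so the strongest ones end up at the edges. This is a minimal illustration with a hypothetical function name and `(text, score)` tuples, not necessarily how `LongContextReorder` is written.

```python
from typing import List, Tuple


def reorder_for_long_context(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Arrange items so the highest scores sit at both ends and the lowest in the middle."""
    ascending = sorted(scored, key=lambda pair: pair[1])
    reordered: List[Tuple[str, float]] = []
    for i, item in enumerate(ascending):
        if i % 2 == 0:
            reordered.insert(0, item)  # even positions go to the front
        else:
            reordered.append(item)     # odd positions go to the back
    return reordered


scored = [("a", 0.1), ("b", 0.2), ("c", 0.3), ("d", 0.4), ("e", 0.5)]
reordered = reorder_for_long_context(scored)
print([text for text, _ in reordered])  # ['e', 'c', 'a', 'b', 'd']
```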

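## Response Synthesizer
- Takes a query and the corresponding set of retrieved text chunks
- How it works:
  - Iterates over the chunks, or combines them using a tree structure
  - Runs after the retriever or the node postprocessors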

In [14]:
from llama_index.schema import Node, NodeWithScore
from llama_index import get_response_synthesizer
RS=get_response_synthesizer(structured_answer_filtering=True, response_mode="refine")
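# Refine: walks through the text node by node, refining the answer one step at a time. If a chunk is too large, it is split with TokenTextSplitter
# Compact: packs the text chunks together before refining, which reduces the number of LLM calls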
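The difference between the two modes comes down to how many times the LLM is called. The sketch below uses a fake LLM that only counts calls; `compact`, `refine`, `count_llm_calls`, and the character budget are hypothetical stand-ins (a real implementation would budget tokens, not characters).

```python
from typing import Callable, List, Optional


def compact(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily pack consecutive chunks into as few prompts as fit within max_chars."""
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + "\n\n" + chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def refine(query: str, chunks: List[str],
           llm: Callable[[str, str, Optional[str]], str]) -> Optional[str]:
    """Feed chunks to the LLM one by one, passing the previous answer along to be refined."""
    answer: Optional[str] = None
    for chunk in chunks:
        answer = llm(query, chunk, answer)
    return answer


def count_llm_calls(chunks: List[str]) -> int:
    """Run refine with a fake LLM and report how many calls were made."""
    calls = 0

    def fake_llm(query: str, context: str, previous: Optional[str]) -> str:
        nonlocal calls
        calls += 1
        return f"answer v{calls}"

    refine("Who is Paul Graham?", chunks, fake_llm)
    return calls


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]
refine_calls = count_llm_calls(chunks)
compact_calls = count_llm_calls(compact(chunks, max_chars=100))
print(refine_calls, compact_calls)  # 4 calls versus 2
```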

In [None]:
from llama_index.response_synthesizers import TreeSummarize, Refine
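# Tree Summarize: combines pairs of smaller answers bottom-up, ending with one final summary
summarizer=Refine(service_context=service_context, verbose=True)
# Load the essay downloaded earlier as the text to summarize
with open('data/paul_graham/paul_graham_essay.txt') as f:
    text=f.read()
response=summarizer.get_response("Who is Paul Graham?",[text])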
print(response)
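The bottom-up combination behind Tree Summarize can be sketched with a placeholder summarizer. Here `tree_summarize` and the bracket-joining `fake_summarize` are hypothetical; in practice the summarize step would be an LLM call.

```python
from typing import Callable, List


def tree_summarize(chunks: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2) -> str:
    """Repeatedly merge groups of `fan_in` texts until a single summary remains."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = list(chunks)
    while len(level) > 1:
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_summarize(group: List[str]) -> str:
    """Stand-in for an LLM call: makes the merge order visible with brackets."""
    return "(" + "+".join(group) + ")"


summary = tree_summarize(["a", "b", "c", "d"], fake_summarize)
print(summary)  # ((a+b)+(c+d))
```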

In [None]:
from llama_index.node_parser import SimpleNodeParser
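# To embed at sentence-level granularity, use SentenceWindowNodeParser instead: it splits the document into individual sentences and captures a window of surrounding sentences for each one
# Converts documents into nodes while preserving each node's relationships to the other nodes
from llama_index import SimpleDirectoryReader
documents=SimpleDirectoryReader('data/paul_graham/').load_data()
parser=SimpleNodeParser.from_defaults()
nodes=parser.get_nodes_from_documents(documents)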
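The sentence-window idea mentioned in the comment above can be sketched in plain Python: each sentence becomes its own unit and carries the neighbouring sentences as context. The naive regex splitter, the dictionary layout, and the function name are assumptions for illustration only.

```python
import re
from typing import Dict, List


def sentence_windows(text: str, window_size: int = 1) -> List[Dict[str, str]]:
    """Split text into sentences and attach up to `window_size` neighbours on each side."""
    # Naive split on whitespace that follows ., ! or ? (good enough for a demo)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({"sentence": sentence, "window": " ".join(sentences[start:end])})
    return nodes


window_nodes = sentence_windows("I wrote essays. I built Viaweb. I started YC. I painted.")
for node in window_nodes:
    print(node["sentence"], "|", node["window"])
```

The single sentence is what would get embedded, while the wider window is what would be handed to the LLM at synthesis time.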

## Query Engine

In [None]:
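# `index` is assumed to be an index built earlier, e.g. VectorStoreIndex.from_documents(documents)
query_engine=index.as_query_engine(streaming=True)
response=query_engine.query("Your Question")
response.print_response_stream()  # returns None, so call it separately instead of assigning its result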

## Chat Engine

In [None]:
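chat_engine=index.as_chat_engine(chat_mode=...)
# chat_mode can be one of: react, openai, best, context, condense_question (condenses the chat history and the new message into a standalone question for the query engine), simple (asks the LLM directly, without involving a query engine)
response=chat_engine.chat("Your Question")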