<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/query_engine/citation_query_engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# CitationQueryEngine

This notebook walks through how to use the CitationQueryEngine.

The CitationQueryEngine can be used with any existing index.


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.


In [None]:
%pip install llama-index-embeddings-openai
%pip install llama-index-llms-openai

In [None]:
!pip install llama-index

## Setup


In [None]:
import os
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)



In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

## Download Data


In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [None]:
if not os.path.exists("./citation"):
    documents = SimpleDirectoryReader("./data/paul_graham").load_data()
    index = VectorStoreIndex.from_documents(
        documents,
    )
    index.storage_context.persist(persist_dir="./citation")
else:
    index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="./citation"),
    )

## Create the CitationQueryEngine with Default Arguments


In [None]:
query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    # here we can control how granular citation sources are; the default is 512
    citation_chunk_size=512,
)

In [None]:
response = query_engine.query("What did the author do growing up?")

In [None]:
print(response)

Before college, the author worked on writing short stories and programming on an IBM 1401 using an early version of Fortran [1]. They later got a TRS-80 computer and wrote simple games, a program to predict rocket heights, and a word processor [2].


In [None]:
# there are 6 source nodes, because the original 1024-sized nodes were split into more granular nodes
print(len(response.source_nodes))

6


### Inspecting the Actual Source
Sources start counting at 1, but Python arrays start counting at zero!

Let's confirm the source makes sense.


In [None]:
print(response.source_nodes[0].node.get_text())

Source 1:
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack t

In [None]:
print(response.source_nodes[1].node.get_text())

Source 2:
[1]

The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.

Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.

Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the s
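
The off-by-one between citation markers and list indices noted above is easy to get wrong, so here is a minimal sketch of a helper that maps the `[n]` markers in an answer to zero-based positions in `response.source_nodes`. The function name `cited_source_indices` and the regular expression are illustrative assumptions, not part of LlamaIndex; the sample answer is shortened from the response printed earlier.

```python
import re


def cited_source_indices(answer: str) -> list[int]:
    """Return zero-based indices for every [n] citation marker in the answer.

    Hypothetical helper: it only parses the answer text and does not call LlamaIndex.
    """
    indices: list[int] = []
    for marker in re.findall(r"\[(\d+)\]", answer):
        idx = int(marker) - 1  # citations are 1-based, Python lists are 0-based
        if idx not in indices:  # keep first-seen order, drop repeats
            indices.append(idx)
    return indices


answer = (
    "Before college, the author worked on writing short stories and programming "
    "on an IBM 1401 using an early version of Fortran [1]. They later got a "
    "TRS-80 computer and wrote simple games [2]."
)
print(cited_source_indices(answer))  # [0, 1]
```

With a real response, something like `cited_source_indices(str(response))` should give the positions to look up in `response.source_nodes`, assuming the answer text keeps the bracketed numbering shown above.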

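## Adjusting Settings

Note that setting the chunk size larger than the original chunk size of the nodes will have no effect.

The default node chunk size is 1024, so here, we are not making our citation nodes any more granular.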


In [None]:
query_engine = CitationQueryEngine.from_args(
    index,
    # increase the citation chunk size!
    citation_chunk_size=1024,
    similarity_top_k=3,
)

In [None]:
response = query_engine.query("What did the author do growing up?")

In [None]:
print(response)

Before college, the author worked on writing short stories and programming on an IBM 1401 using an early version of Fortran [1].


In [None]:
# there should be fewer source nodes now!
print(len(response.source_nodes))

3
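
The two source-node counts seen so far (6 with `citation_chunk_size=512`, 3 with `citation_chunk_size=1024`) can be reproduced with a rough model of the splitting step. This is a simplified sketch, not the engine's actual splitter: it assumes every retrieved node is filled to exactly 1024 tokens and cuts on fixed token counts with no overlap or sentence awareness. The helper names `split_node` and `count_citation_chunks` are hypothetical.

```python
def split_node(tokens: list[str], citation_chunk_size: int) -> list[list[str]]:
    """Cut one node's tokens into consecutive chunks of at most citation_chunk_size."""
    return [
        tokens[i : i + citation_chunk_size]
        for i in range(0, len(tokens), citation_chunk_size)
    ]


def count_citation_chunks(nodes: list[list[str]], citation_chunk_size: int) -> int:
    """Total number of citation chunks produced across all retrieved nodes."""
    return sum(len(split_node(node, citation_chunk_size)) for node in nodes)


# Assumption: similarity_top_k=3 returns three nodes of exactly 1024 tokens each.
retrieved = [["tok"] * 1024 for _ in range(3)]

print(count_citation_chunks(retrieved, 512))   # 6: each node splits in two
print(count_citation_chunks(retrieved, 1024))  # 3: each node stays whole
print(count_citation_chunks(retrieved, 2048))  # 3: a larger size has no effect
```

In this model a chunk can never be larger than the node it came from, which is why raising `citation_chunk_size` past the node size leaves the count at `similarity_top_k`.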


### Inspecting the Actual Source
Sources start counting at 1, but Python arrays start counting at zero!

Let's confirm the source makes sense.


In [None]:
print(response.source_nodes[0].node.get_text())

Source 1:
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack t