# Retrieve

In [1]:

import { load } from 'dotenv'
const env = await load()

const process = {
    env
}

In [5]:
import { TextLoader } from 'langchain/document_loaders/fs/text'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'

const loader = new TextLoader('data/kong.txt')
const docs = await loader.load()

const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 100,
    chunkOverlap: 20
})

const splitDocs = await splitter.splitDocuments(docs)

console.log(splitDocs)

[
  Document {
    pageContent: "鲁镇的酒店的格局，是和别处不同的：都是当街一个曲尺形的大柜台，柜里面预备着热水，可以随时温酒。做工的人，傍午傍晚散了工，每每花四文铜钱，买一碗酒，——这是二十多年前的事，现在每碗要涨到十文，——靠柜外",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 1, to: 1 } } }
  },
  Document {
    pageContent: "年前的事，现在每碗要涨到十文，——靠柜外站着，热热的喝了休息；倘肯多花一文，便可以买一碟盐煮笋，或者茴香豆，做下酒物了，如果出到十几文，那就能买一样荤菜，但这些顾客，多是短衣帮，大抵没有这样阔绰。只有",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 1, to: 1 } } }
  },
  Document {
    pageContent: "顾客，多是短衣帮，大抵没有这样阔绰。只有穿长衫的，才踱进店面隔壁的房子里，要酒要菜，慢慢地坐喝。",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 1, to: 1 } } }
  },
  Document {
    pageContent: "我从十二岁起，便在镇口的咸亨酒店里当伙计，掌柜说，我样子太傻，怕侍候不了长衫主顾，就在外面做点事罢。外面的短衣主顾，虽然容易说话，但唠唠叨叨缠夹不清的也很不少。他们往往要亲眼看着黄酒从坛子里舀出，看",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 3, to: 3 } } }
  },
  Document {
    pageContent: "。他们往往要亲眼看着黄酒从坛子里舀出，看过壶子底里有水没有，又亲看将壶子放在热水里，然后放心：在这严重监督下，羼水也很为难。所以过了几天，掌柜又说我干不了这事。幸亏荐头的情面大，辞退不得，便改为专管温",
    metadata: { source: "data/kong.txt", loc: { l

In [7]:
import { OpenAIEmbeddings } from '@langchain/openai'
const embeddings = new OpenAIEmbeddings()

In [9]:
console.log(splitDocs[0])

Document {
  pageContent: "鲁镇的酒店的格局，是和别处不同的：都是当街一个曲尺形的大柜台，柜里面预备着热水，可以随时温酒。做工的人，傍午傍晚散了工，每每花四文铜钱，买一碗酒，——这是二十多年前的事，现在每碗要涨到十文，——靠柜外",
  metadata: { source: "data/kong.txt", loc: { lines: { from: 1, to: 1 } } }
}


In [10]:
const res = await embeddings.embedQuery(splitDocs[0].pageContent)
console.log(res)

[
    0.041053332,  -0.026278032,   -0.01042348,   0.010131038, -0.013737827,
    -0.03409042,  -0.012728204, -0.0056643295, -0.0068689133, -0.029912673,
   -0.009908224,    0.07703766,  -0.041443255,  -0.017086986,  0.023938494,
   -0.030274743,  0.0010836032,  -0.018618828,  -0.026723659,   0.04255732,
    0.024704413,   0.013278274,  -0.018799864,  0.0060089934,  0.024356268,
   0.0037669356,  -0.024300564,  -0.019663265,  -0.009580968,  0.021167254,
  0.00012805231,   0.016432473,   0.012004061,   -0.00592892,  -0.02691862,
   -0.012714278,    0.03322702,   0.080992594,    0.05522982,  0.012484502,
    -0.00571307,  0.0062596584,  -0.023604274,  -0.038379572,  -0.00490189,
    -0.06745669, -0.0021463179, -0.0004356172,  0.0040176003, 0.0033126057,
   -0.014315748,  -0.018493496,   0.017435133,   -0.03863024,  0.023075093,
   -0.005218703,  -0.015053817,    0.05194333,   0.004981964, -0.020596296,
    -0.00523611,  -0.035037376,   -0.02761491,  -0.022838352,  -0.02084696,
    0.0153

## Memory Vector Store

A vector store holds embedding vectors together with the documents they were computed from, and supports similarity search: given a query vector, it returns the stored documents whose vectors are closest to it.

LangChain provides `MemoryVectorStore`, an in-memory vector store. It embeds documents with the embeddings model passed to it, keeps the vectors in memory, and retrieves the documents most similar to a given query.
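
To make the idea concrete, here is a minimal sketch of similarity search over vectors held in memory. It is not LangChain's implementation: `TinyVectorStore`, `cosineSimilarity`, and the 2-dimensional toy vectors are all assumptions made up for illustration, and cosine similarity is just one common choice of distance.

```typescript
// Minimal sketch of an in-memory vector store (hypothetical, not LangChain's code).
type Entry = { content: string; embedding: number[] }

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0
    let normA = 0
    let normB = 0
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

class TinyVectorStore {
    private entries: Entry[] = []

    add(content: string, embedding: number[]): void {
        this.entries.push({ content, embedding })
    }

    // Score every stored entry against the query and return the k best.
    search(query: number[], k: number): string[] {
        return this.entries
            .map((e) => ({ content: e.content, score: cosineSimilarity(query, e.embedding) }))
            .sort((x, y) => y.score - x.score)
            .slice(0, k)
            .map((e) => e.content)
    }
}

// Toy 2-dimensional vectors stand in for real embeddings.
const tinyStore = new TinyVectorStore()
tinyStore.add('wine', [1, 0])
tinyStore.add('beans', [0, 1])
tinyStore.add('wine and beans', [1, 1])

const top2 = tinyStore.search([1, 0.1], 2)
console.log(top2)
```

The query vector points almost entirely along the first axis, so the entries with a large first component rank highest. A real store does the same thing with vectors of a thousand or more dimensions produced by the embeddings model.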

In [14]:
import { MemoryVectorStore } from 'langchain/vectorstores/memory'
const vectorStore = new MemoryVectorStore(embeddings)

await vectorStore.addDocuments(splitDocs)

In [15]:
// Use retrieval to find similar documents

const retriever = vectorStore.asRetriever(2) // return top 2 results

// Query: "What are the fennel beans for?"
const res = await retriever.invoke('茴香豆的作用是什么？')

console.log(res)

[
  Document {
    pageContent: "有几回，邻居孩子听得笑声，也赶热闹，围住了孔乙己。他便给他们一人一颗。孩子吃完豆，仍然不散，眼睛都望着碟子。孔乙己着了慌，伸开五指将碟子罩住，弯腰下去说道，“不多了，我已经不多了。”直起身又看一看豆",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 15, to: 15 } } }
  },
  Document {
    pageContent: "不多了，我已经不多了。”直起身又看一看豆，自己摇头说，“不多不多！多乎哉？不多也。”于是这一群孩子都在笑声里走散了。",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 15, to: 15 } } }
  }
]


In [17]:
// Query: "Who is Kong Yiji?"
const res = await retriever.invoke('孔乙己是谁？')

console.log(res)

[
  Document {
    pageContent: "孔乙己是站着喝酒而穿长衫的唯一的人。他身材很高大；青白脸色，皱纹间时常夹些伤痕；一部乱蓬蓬的花白的胡子。穿的虽然是长衫，可是又脏又破，似乎十多年没有补，也没有洗。他对人说话，总是满口之乎者也，叫人半",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 7, to: 7 } } }
  },
  Document {
    pageContent: "孔乙己是这样的使人快活，可是没有他，别人也便这么过。",
    metadata: { source: "data/kong.txt", loc: { lines: { from: 17, to: 17 } } }
  }
]


## Multi Query Retriever

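The multi-query retriever uses an LLM to rewrite the user's question into several differently phrased queries, runs each of them against an underlying retriever, and merges the deduplicated results. Here the underlying retriever is backed by a memory vector store.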

> The way to work around an LLM's shortcomings is to bring in more LLMs.
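
The merge step can be sketched without calling a model at all. In the sketch below everything is a hypothetical stand-in: `generateQueries` returns fixed rephrasings where a real multi-query retriever would ask an LLM, `toyRetrieve` does a substring match where a real one would query the vector store, and the English `corpus` is invented for the example.

```typescript
// Sketch of the multi-query idea; the rewrites and the toy retriever are stand-ins.
type Doc = { pageContent: string }

const corpus: Doc[] = [
    { pageContent: 'Kong Yiji wears a long gown' },
    { pageContent: 'Kong Yiji drinks standing up' },
    { pageContent: 'The tavern sells fennel beans' },
]

// Hypothetical stand-in for the LLM: returns fixed rephrasings of the question.
function generateQueries(question: string): string[] {
    return [question, 'gown', 'drinks']
}

// Toy retriever: returns documents containing the query text, capped at k.
function toyRetrieve(query: string, k: number): Doc[] {
    return corpus.filter((d) => d.pageContent.includes(query)).slice(0, k)
}

// Run every query variant and keep each document once, in first-seen order.
function multiQueryRetrieve(question: string, k: number): Doc[] {
    const seen = new Set<string>()
    const merged: Doc[] = []
    for (const q of generateQueries(question)) {
        for (const doc of toyRetrieve(q, k)) {
            if (!seen.has(doc.pageContent)) {
                seen.add(doc.pageContent)
                merged.push(doc)
            }
        }
    }
    return merged
}

const mergedDocs = multiQueryRetrieve('Kong Yiji', 1)
console.log(mergedDocs.map((d) => d.pageContent))
```

With `k` set to 1, the original question alone would return a single document. The extra phrasings surface a second relevant document, and the duplicate hit from the `'gown'` variant is dropped.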

In [None]:
const directory = 'db/kongyiji'
const embeddings = new OpenAIEmbeddings()
const vectorStore = new MemoryVectorStore(embeddings)