In [1]:
with open("real_estate_sales_data.txt") as f:
    real_estate_sales = f.read()

### Splitting text with CharacterTextSplitter

- Splits the text on a single separator string (`separator`)
- Measures chunk length in characters (`chunk_size`)

Reference example:

```python
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)
```


In [2]:
from langchain.text_splitter import CharacterTextSplitter

In [3]:
text_splitter = CharacterTextSplitter(
    separator=r'\d+\.',
    chunk_size=100,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=True,
)
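
With `is_separator_regex=True`, the separator is treated as a regular expression, so `\d+\.` matches numbered-list markers such as `1.` or `12.`. The sketch below uses only the standard `re` module to show the effect of that pattern on a made-up sample string (the sample text is an assumption, not taken from `real_estate_sales_data.txt`). It is an illustration of the pattern, not LangChain's internal code.

```python
import re

# Hypothetical sample text; the real data file is not reproduced here.
sample = "Intro line. 1. First item 2. Second item 3. Third item"

# Split on "one or more digits followed by a literal dot",
# then drop empty pieces and surrounding whitespace.
pieces = [p.strip() for p in re.split(r"\d+\.", sample) if p.strip()]

for piece in pieces:
    print(piece)
```

Note that the same pattern would also match the `3.` in a decimal such as `3.5`, so it works best on text where digits followed by a dot appear only as list markers.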

In [4]:
docs = text_splitter.create_documents([real_estate_sales])

Created a chunk of size 102, which is longer than the specified 100
Created a chunk of size 106, which is longer than the specified 100
Created a chunk of size 103, which is longer than the specified 100
Created a chunk of size 105, which is longer than the specified 100
Created a chunk of size 110, which is longer than the specified 100
Created a chunk of size 106, which is longer than the specified 100
Created a chunk of size 103, which is longer than the specified 100
Created a chunk of size 109, which is longer than the specified 100
Created a chunk of size 101, which is longer than the specified 100
Created a chunk of size 104, which is longer than the specified 100
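
These warnings appear because `CharacterTextSplitter` cuts only at the separator: pieces are merged until adding another would exceed `chunk_size`, and a single piece that is already longer than `chunk_size` is kept whole. The following is a simplified sketch of that greedy merge, written in plain Python under the assumption that pieces are joined with a space. The function name `merge_pieces` and the toy data are hypothetical, and the real implementation differs in detail (for example, it also handles `chunk_overlap`).

```python
def merge_pieces(pieces, chunk_size, joiner=" "):
    """Greedily pack pieces into chunks of at most chunk_size characters.

    A piece that is longer than chunk_size on its own is never cut,
    so it ends up as an oversized chunk.
    """
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + joiner + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size by itself
    if current:
        chunks.append(current)
    return chunks


# Toy pieces of 40, 50, 120 and 10 characters.
pieces = ["a" * 40, "b" * 50, "c" * 120, "d" * 10]
chunks = merge_pieces(pieces, chunk_size=100)
oversized = [len(c) for c in chunks if len(c) > 100]
print([len(c) for c in chunks], oversized)
```

If oversized chunks are a problem, the usual options are a larger `chunk_size` or a splitter that falls back to finer separators.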


In [5]:
docs[0]

Document(page_content='当然，我很乐意帮助你！这里有一些实用的销售话术，希望能对你的工作有所帮助：')

In [6]:
len(docs)

66

### Using FAISS as the vector database to persist real-estate sales question-answer pairs (QA pairs)

In [7]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, OpenAIEmbeddings())



In [8]:
query = "这款车的油耗如何？"  # "What is this car's fuel consumption like?"

In [9]:
answer_list = db.similarity_search(query)

In [10]:
for ans in answer_list:
    print(ans.page_content + "\n")

[客户问题] "这款车的油耗如何？"
   [销售回答] "这款车型的油耗在城市道路上大约是XX升/百公里，高速路上大约是XX升/百公里，相对来说非常节能高效。"

[客户问题] "这款车的引擎性能如何？"
    [销售回答] "这款车配备了高效可靠的引擎，提供充足的动力输出和顺畅的驾驶体验，无论在城市道路还是高速公路上都表现出色。"

[客户问题] "这款车的加速性能如何？"
   [销售回答] "这款车配备了高效动力系统，加速响应迅速，无论是从静止到起步还是在超车时，都能表现出色。"

[客户问题] "我可以了解一下这款车的燃料类型和排放标准吗？"
   [销售回答] "这款车可以选择多种燃料类型，如汽油、柴油或者混合动力，同时符合最新的排放标准，保护环境并提升燃油经济性。"
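
Four results come back because `similarity_search` returns the `k=4` nearest documents by default. Conceptually, the query is embedded into a vector and the stored vectors closest to it are returned. The sketch below shows that idea in plain Python with made-up two-dimensional vectors and Euclidean distance, which is, as far as I know, the default metric of the LangChain FAISS wrapper. The helper `top_k` and the toy vectors are hypothetical, and a real FAISS index is far more efficient than this brute-force loop.

```python
import math


def top_k(query_vec, vectors, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean)."""
    scored = sorted(
        (math.dist(query_vec, vec), idx) for idx, vec in enumerate(vectors)
    )
    return [idx for _, idx in scored[:k]]


# Toy "embeddings"; real OpenAI embeddings have many more dimensions.
toy_vectors = [
    (0.0, 0.0),
    (1.0, 1.0),
    (0.9, 1.1),
    (5.0, 5.0),
    (1.2, 0.8),
    (0.5, 0.5),
]
nearest = top_k((1.0, 1.0), toy_vectors, k=4)
print(nearest)
```

Passing a different `k` to `db.similarity_search(query, k=...)` changes how many documents are returned.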


In [11]:
db.save_local("real_estates_sale")