# Chroma
An AI-native, open-source vector database focused on developer productivity

In [1]:
# Ignore warning messages
import warnings

warnings.filterwarnings("ignore")

In [105]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings


# Split the text into chunks of about 600 characters
text_splitter = CharacterTextSplitter(chunk_size=600, chunk_overlap=0)
# Load the text file with TextLoader and split it
split_docs = TextLoader("data/langchainPaper.txt").load_and_split(text_splitter)

#################
# embedding model
#################
model_name = 'sentence-transformers/all-mpnet-base-v2' #'BM-K/KoSimCSE-roberta'
model_kwargs = {'device': 'mps'}  # 'mps' targets Apple Silicon GPUs; use 'cpu' or 'cuda' elsewhere
encode_kwargs = {'normalize_embeddings': True}

embeddings_model = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

# Create the vector store with Chroma
chroma_db = Chroma.from_documents(split_docs, embeddings_model)


In [29]:
# Query (Korean): "Tell me how to use LangChain well."
similar_docs = chroma_db.similarity_search("Langchain 잘 사용하는 방법 알려줘.")
print(similar_docs[0].page_content)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the user expe

In [30]:
similar_docs

[Document(page_content='Abstract— Mental health challenges are on the rise in our \nmodern society, and the imperative to address mental disorders, \nespecially regarding anxiety, depression, and suicidal thoughts, \nunderscores the need for effective interventions. This paper \ndelves into the application of recent advancements in pretrained \ncontextualized language models to introduce MindGuide, an \ninnovative chatbot serving as a mental health assistant for \nindividuals seeking guidance and support in these critical areas. \nMindGuide leverages the capabilities of LangChain and its \nChatModels, specifically ChatOpenAI, as the bedrock of its \nreasoning engine. The system incorporates key features such as \nLangChain\'s ChatPrompt Template, HumanMessage Prompt \nTemplate, \nConversationBufferMemory, \nand \nLLMChain, \ncreating an advanced solution for early detection and \ncomprehensive support within the field of mental health. \nAdditionally, the paper discusses the implementa

In [31]:
# Create a retriever
retriever = chroma_db.as_retriever() # search_kwargs={"k": 1})

# Retrieve the documents most similar to the query
# (k defaults to 4; pass search_kwargs={"k": 1} to get a single document)
# Query (Korean): "Tell me about LangChain"
relevant_docs = retriever.invoke("langchain 에 대하여 알려줘")

print(f"Number of documents: {len(relevant_docs)}")
print("[Search results]\n")
print(relevant_docs[0].page_content)


Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


Number of documents: 3
[Search results]

Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to en

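`Maximal Marginal Relevance (MMR)` search

MMR selects the items most relevant to the query while keeping the selected results diverse, so near-duplicate documents are less likely to crowd out the rest.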
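The trade-off MMR makes between relevance and diversity can be sketched in a few lines of plain Python. This is a minimal illustration of the commonly used greedy formulation, not the code that Chroma or LangChain actually runs. The helper names (`cosine`, `mmr_select`) and the toy 2-D vectors are made up for this example; `lambda_mult` is used here as the weight between relevance (1.0) and diversity (0.0).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy vectors (hypothetical): documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]

print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # balances relevance and diversity
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
```

On this toy data the balanced setting skips the near-duplicate and returns documents 0 and 2, while the relevance-only setting returns documents 0 and 1.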

In [36]:
retriever = chroma_db.as_retriever(search_type="mmr", search_kwargs={"k": 2})
# Query (Korean): "Tell me about LangChain"
relevant_docs = retriever.invoke("langchain 에 대하여 알려줘")
print(f"Number of documents: {len(relevant_docs)}")
print("[Search results]\n")
for doc in relevant_docs:
    print(doc.page_content)
    print("===" * 20)


Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3


Number of documents: 2
[Search results]

Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to en

# FAISS
Facebook AI Similarity Search  
A library for efficient similarity search and clustering of dense vectors

In [75]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings


# Split the text into chunks of about 300 characters with a 50-character overlap
text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
# Load the text file with TextLoader and split it
split_docs = TextLoader("data/langchainPaper.txt").load_and_split(text_splitter)

model_name = 'sentence-transformers/all-mpnet-base-v2'
# Alternatives: 'BAAI/bge-large-en-v1.5' #'jinaai/jina-embeddings-v2-base-en' #'sentence-transformers/all-mpnet-base-v2' #'BM-K/KoSimCSE-roberta'
model_kwargs = {'device': 'mps'}
encode_kwargs = {'normalize_embeddings': False}

embeddings_model = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
# Create the vector store with FAISS
faiss_db = FAISS.from_documents(split_docs, embeddings_model)
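To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of overlapping character windows. It is an assumption-laden illustration, not `CharacterTextSplitter`'s actual algorithm (which first splits on a separator and then merges pieces), and `sliding_chunks` is a hypothetical helper name.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is not just a repeat of the previous overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each printed window starts with the last three characters of the previous one, which is the purpose of the overlap: text that straddles a chunk boundary still appears intact in at least one chunk.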


In [76]:
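# Similarity search (query)
# Query (Korean): "Tell me about LangChain"
similar_docs = faiss_db.similarity_search("Langchain 에 대한 내용을 알려줘")

print(f"Number of documents: {len(similar_docs)}")
print("[Search results]\n")
print(similar_docs[0].page_content)


Number of documents: 1
[Search results]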

Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to en

In [41]:
faiss_retriever = faiss_db.as_retriever()
faiss_docs = faiss_retriever.invoke("What is Langchain?")
print(faiss_docs[0].page_content)

Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the user expe

In [88]:
#######
# score
#######
# Search for documents similar to the query and return them together with their scores.
# The returned score is an L2 distance, so a lower score means a better match.

docs_and_scores = faiss_db.similarity_search_with_score("What is Langchain?")
content, score = docs_and_scores[0]  # Take the first (document, score) pair from the list
print("[Content]")
print(content.page_content)  # Print the page_content of the selected document
print("\n[Score]")
print(score)  # Print the score of the selected document


[Content]
Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the
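The comment in the cell above notes that the score is an L2 distance, so lower is better. The following sketch, which assumes unit-length (normalized) embeddings and uses made-up 3-D vectors, shows why ranking by L2 distance then agrees with ranking by cosine similarity: for unit vectors, the squared L2 distance equals `2 - 2 * cos`. The helper names are hypothetical, and the identity does not hold for unnormalized embeddings such as those produced with `normalize_embeddings=False`.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values), normalized to unit length.
query = l2_normalize([0.2, 0.9, 0.4])
near = l2_normalize([0.25, 0.85, 0.45])
far = l2_normalize([0.9, 0.1, -0.3])

for name, vec in [("near", near), ("far", far)]:
    d2 = squared_l2(query, vec)
    cos = cosine_similarity(query, vec)
    # For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    print(f"{name}: squared L2 = {d2:.4f}, 2 - 2*cos = {2 - 2 * cos:.4f}")
```

The document with the smaller distance is also the one with the larger cosine similarity, which is why a lower distance score indicates a closer match.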

In [94]:
# Create the vector store instance
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings_model = HuggingFaceEmbeddings(
    model_name='jhgan/ko-sbert-nli',
    model_kwargs={'device': 'mps'},
    encode_kwargs={'normalize_embeddings': True},
)

vectorstore = FAISS.from_documents(
    split_docs,
    embedding=embeddings_model,
    distance_strategy=DistanceStrategy.COSINE,
)
vectorstore


<langchain_community.vectorstores.faiss.FAISS at 0x394941900>

In [95]:
# Query (Korean): "Tell me how to use LangChain well."
docs = vectorstore.similarity_search("Langchain 잘 사용하는 방법 알려줘.")
print(len(docs))
print(docs[0].page_content)

1
Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the user ex

In [85]:
print(docs[0])

page_content='Abstract— Mental health challenges are on the rise in our \nmodern society, and the imperative to address mental disorders, \nespecially regarding anxiety, depression, and suicidal thoughts, \nunderscores the need for effective interventions. This paper \ndelves into the application of recent advancements in pretrained \ncontextualized language models to introduce MindGuide, an \ninnovative chatbot serving as a mental health assistant for \nindividuals seeking guidance and support in these critical areas. \nMindGuide leverages the capabilities of LangChain and its \nChatModels, specifically ChatOpenAI, as the bedrock of its \nreasoning engine. The system incorporates key features such as \nLangChain\'s ChatPrompt Template, HumanMessage Prompt \nTemplate, \nConversationBufferMemory, \nand \nLLMChain, \ncreating an advanced solution for early detection and \ncomprehensive support within the field of mental health. \nAdditionally, the paper discusses the implementation of \n

In [96]:
faiss_retriever = vectorstore.as_retriever()
faiss_docs = faiss_retriever.invoke("What is Langchain?")
print(faiss_docs[0].page_content)

Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the user expe

In [104]:
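#######
# score
#######
# Search for documents similar to the query and return them together with their scores.
# The returned score is an L2 distance, so a lower score means a better match.
# Query (Korean): "Tell me how to use LangChain well."
content, score = vectorstore.similarity_search_with_score(query="Langchain 잘 사용하는 방법 알려줘.")[0]
print("[Content]")
print(content.page_content)  # Print the page_content of the selected document
print("\n[Score]")
print(score)  # Print the score of the selected document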


[Content]
Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the

### FAISS.save_local

In [43]:
# Save the database locally under the name "MY_FIRST_FAISS_DB_INDEX".
DB_INDEX = "MY_FIRST_FAISS_DB_INDEX"
faiss_db.save_local(DB_INDEX)

### FAISS.load_local

In [44]:
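# Load the locally saved database and assign it to new_db.
# allow_dangerous_deserialization=True permits loading the pickled docstore;
# enable it only for index files you created yourself or otherwise trust.
new_db = FAISS.load_local(DB_INDEX, embeddings_model,
                          allow_dangerous_deserialization=True)

# Query (Korean): "What is an embedding?"
query = "임베딩(Embedding)이란 무엇인가요?"

# Search new_db for documents similar to the query and assign them to docs.
docs = new_db.similarity_search(query)

# Take the first document in the list.
docs[0]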


Document(page_content='Abstract— Mental health challenges are on the rise in our \nmodern society, and the imperative to address mental disorders, \nespecially regarding anxiety, depression, and suicidal thoughts, \nunderscores the need for effective interventions. This paper \ndelves into the application of recent advancements in pretrained \ncontextualized language models to introduce MindGuide, an \ninnovative chatbot serving as a mental health assistant for \nindividuals seeking guidance and support in these critical areas. \nMindGuide leverages the capabilities of LangChain and its \nChatModels, specifically ChatOpenAI, as the bedrock of its \nreasoning engine. The system incorporates key features such as \nLangChain\'s ChatPrompt Template, HumanMessage Prompt \nTemplate, \nConversationBufferMemory, \nand \nLLMChain, \ncreating an advanced solution for early detection and \ncomprehensive support within the field of mental health. \nAdditionally, the paper discusses the implementat

In [46]:
print(docs[0].page_content)

Abstract— Mental health challenges are on the rise in our 
modern society, and the imperative to address mental disorders, 
especially regarding anxiety, depression, and suicidal thoughts, 
underscores the need for effective interventions. This paper 
delves into the application of recent advancements in pretrained 
contextualized language models to introduce MindGuide, an 
innovative chatbot serving as a mental health assistant for 
individuals seeking guidance and support in these critical areas. 
MindGuide leverages the capabilities of LangChain and its 
ChatModels, specifically ChatOpenAI, as the bedrock of its 
reasoning engine. The system incorporates key features such as 
LangChain's ChatPrompt Template, HumanMessage Prompt 
Template, 
ConversationBufferMemory, 
and 
LLMChain, 
creating an advanced solution for early detection and 
comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the user expe

### Merging FAISS vector stores

In [47]:
# Create db1
db1 = FAISS.from_texts(["LangChain!!"], embeddings_model)
# Create db2
db2 = FAISS.from_texts(["좋아요^^"], embeddings_model)

In [48]:
db1.docstore.__dict__

{'_dict': {'a3a97328-7c25-460a-a572-8cd2ba71cc51': Document(page_content='LangChain!!')}}

In [49]:
db2.docstore.__dict__

{'_dict': {'e48af268-823c-4f03-8018-f3cc6fc66d40': Document(page_content='좋아요^^')}}

In [50]:
db1.merge_from(db2)  # Merge db2 into db1.

In [52]:
db1.docstore.__dict__

{'_dict': {'a3a97328-7c25-460a-a572-8cd2ba71cc51': Document(page_content='LangChain!!'),
  'e48af268-823c-4f03-8018-f3cc6fc66d40': Document(page_content='좋아요^^')}}
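The docstore output above shows that after `merge_from`, `db1` holds the entries of both stores under their original ids. As a conceptual toy model only (plain dictionaries, no vectors, and not LangChain's implementation), the id-preserving merge can be pictured like this. `merge_stores` and the ids `id-a` and `id-b` are hypothetical.

```python
def merge_stores(target, source):
    """Copy every id -> text entry from source into target, refusing id clashes."""
    clashes = target.keys() & source.keys()
    if clashes:
        raise ValueError(f"duplicate ids: {sorted(clashes)}")
    target.update(source)
    return target


store1 = {"id-a": "LangChain!!"}
store2 = {"id-b": "좋아요^^"}

merge_stores(store1, store2)
print(store1)  # store1 now holds both entries
print(store2)  # store2 is left unchanged
```

In this toy version the target is modified in place and the source is left untouched. A real vector store merge must also combine the underlying index vectors, which this sketch leaves out.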