### **Importing the libraries**

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

from langchain_cohere import CohereEmbeddings
from langchain_community.document_loaders import TextLoader  # Loads plain-text files
from langchain_text_splitters import CharacterTextSplitter  # Splits the text into smaller chunks

from langchain_pinecone import PineconeVectorStore  # Stores the vectors in Pinecone
from langchain.memory import ConversationBufferMemory

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

from langchain_groq import ChatGroq
from langchain import hub

### **Loading the environment variables**

In [2]:
load_dotenv(find_dotenv())

COHERE_API_KEY = os.getenv("COHERE_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
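
`os.getenv` returns `None` when a variable is missing, and that usually only surfaces later as an authentication error. A minimal sketch of an early check follows; the helper name `missing_env_vars` and the `REQUIRED_KEYS` tuple are hypothetical and not part of the original notebook.

```python
import os

# Names of the keys this notebook reads (taken from the cell above)
REQUIRED_KEYS = ("COHERE_API_KEY", "PINECONE_API_KEY", "GROQ_API_KEY")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    # Only report here; raising an exception instead is a design choice
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv` makes a misconfigured `.env` file visible before any API client is created.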

### **INSTANTIATE THE PINECONE DATABASE**

Note: the index must first be created in Pinecone cloud, with a dimension that matches the embedding model (1024 for `embed-english-v3.0`).
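
The index can be created in the Pinecone console or, alternatively, from code. The sketch below assumes the `pinecone` Python client (v3 or later), where `list_indexes().names()` and `create_index(...)` are available. The helper name `ensure_index` is hypothetical, and the cloud and region in the usage comment are placeholder assumptions, not values from this notebook.

```python
def ensure_index(pc, name, dimension, spec, metric="cosine"):
    """Create the index only if it does not exist yet.

    `pc` is expected to be a `pinecone.Pinecone` client.
    Returns True when a new index was created, False when it already existed.
    """
    if name in pc.list_indexes().names():
        return False
    pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True


# Usage sketch (needs a valid API key and network access, so it is left commented out):
#
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key=PINECONE_API_KEY)
# ensure_index(
#     pc,
#     name="rag-demo",
#     dimension=1024,  # output size of embed-english-v3.0
#     spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed values
# )
```

Checking for the index first keeps the cell safe to re-run.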

In [3]:
index_name = "rag-demo"
embeddings = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key=COHERE_API_KEY
)

### **Creating the memory**

In [4]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)


### **DOCUMENT LOADING/INGESTION**

In [None]:
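print("🔥 Loading the documents...\n\n")

# Raw string so the backslash in the Windows-style path is not treated as an escape sequence
PATH_FILE = r"data\mediumblog1.txt"
# Read the file as UTF-8 so curly quotes and dashes are decoded correctly
loader = TextLoader(PATH_FILE, encoding="utf-8")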
document = loader.load()

In [8]:
document

[Document(metadata={'source': 'data\\mediumblog1.txt'}, page_content='Title: Vector Database: What is it and Why You Should Know It?\n\nAuthor: Ejiro Onose\nDate: December 22, 2023\n\nIf 2021 was the year of graph databases, 2023 is the year of vector databases — Chip Huen.\n\nGenerative AI and Large Language Models (LLMs) have become popular, and a vector database is one of the best tools to handle LLM data. Vector databases provide the ideal infrastructure for managing the complex, high-dimensional data that LLMs produce and rely upon.\n\nIn this article, I’ll explain what vector databases are, how they work, and introduce some top vector database tools.\n\n What is a Vector?\nIn machine learning (ML), a vector is a collection of numerical values that represents the features of multi-dimensional objects, such as words or images. For example, a vector representing an image might contain values related to pixel intensities and color channels.\n\n What are Embeddings?\nAn embedding 

### **DOCUMENT SPLITTING/CHUNKING**

In [None]:
print("🔪 Splitting the document into smaller chunks...\n\n")

text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0
)
docs = text_splitter.split_documents(document)
print(f"Total chunks: {len(docs)}\n\n")
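
To make the effect of `chunk_size` concrete, here is a simplified pure-Python sketch of separator-based chunking. It is not the library's implementation: it only assumes that the text is cut on a separator (`"\n\n"`, the default for `CharacterTextSplitter`) and that pieces are merged greedily up to `chunk_size` characters, with no overlap. The function name `split_by_separator` is hypothetical.

```python
def split_by_separator(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # Start a new chunk, or keep growing the current one while it fits
            current = candidate
        else:
            # The current chunk is full: store it and start over with this piece
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
print(split_by_separator(sample, chunk_size=10))
```

With `chunk_size=10`, the first two paragraphs fit together (10 characters including the separator) and the third starts a new chunk.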

### **Instantiate the database**

After this command runs, the data will be available in the Pinecone cloud database.

In [22]:
vectorstore_from_docs = PineconeVectorStore.from_documents(
    docs,
    index_name=index_name,
    embedding=embeddings
)

### **Querying the vector database**

In [25]:
vectorstore = PineconeVectorStore(
    index_name=index_name,
    embedding=embeddings
)

In [None]:
query = "What is Vector Store?"
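
Conceptually, a query against a vector store embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that idea in plain Python with cosine similarity over toy two-dimensional vectors. The names `cosine_similarity` and `top_k` and the toy data are assumptions for illustration; Pinecone performs this search server-side, and with LangChain it is typically reached through `vectorstore.similarity_search(query, k=...)`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the texts of the k stored (text, vector) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "index": in the notebook these vectors would come from the Cohere embedding model
stored_chunks = [
    ("chunk about vectors", [1.0, 0.0]),
    ("chunk about cooking", [0.0, 1.0]),
    ("chunk about embeddings", [1.0, 1.0]),
]
print(top_k([1.0, 0.0], stored_chunks, k=2))
```

The retrieved chunks are what a retrieval chain would then pass to the LLM as context.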