# Haystack Quick Start

## 1. Getting Started

Haystack is an open-source Python framework that helps developers build custom applications powered by LLMs. Haystack 2.0, a significant update, was released in March 2024. For more information about Haystack 2.0, read the [announcement post](https://haystack.deepset.ai/blog/haystack-2-release).

## 2. Installation

Use pip to install Haystack:

```bash
pip install haystack-ai
```

For more details, see the [installation documentation](https://docs.haystack.deepset.ai/docs/installation?utm_campaign=developer-relations&utm_source=haystack&utm_medium=website).
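To check the installation from Python, here is a small sketch that relies only on the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical and used only for illustration; the distribution name `haystack-ai` is the one from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "haystack-ai" is the distribution name used with pip install.
found = installed_version("haystack-ai")
print(f"haystack-ai {found}" if found else "haystack-ai is not installed")
```

Looking up the distribution's metadata avoids importing the package itself, so the check stays cheap even in a fresh environment.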

## 3. Ask Questions About a Web Page

This is a very simple pipeline that answers questions about the content of a web page. It uses GPT-3.5-Turbo through the OpenAIGenerator component.
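To make the data flow concrete, here is a stdlib-only sketch of the two middle steps: reducing the fetched HTML to plain text (the role HTMLToDocument plays) and placing that text into a prompt (the role PromptBuilder plays). This is an illustration under assumptions, not Haystack's code. The class `TextExtractor` and the functions `html_to_text` and `build_prompt` are hypothetical. The extraction simply drops tags and `<script>`/`<style>` content, and a plain f-string stands in for the Jinja2 template that PromptBuilder actually renders.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical extractor: collects visible text, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(documents: list, query: str) -> str:
    # Mirrors the wording of the prompt template used with PromptBuilder in this section.
    body = "\n".join(documents)
    return (
        "According to the contents of this website:\n"
        f"{body}\n"
        f"Answer the given question: {query}\n"
        "Answer:"
    )


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Install</h1><p>Run pip install haystack-ai.</p></body></html>"
)
prompt = build_prompt([html_to_text(page)], "How can I install Haystack?")
print(prompt)
```

In the real pipeline, the resulting prompt string is what gets sent to the LLM.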

First, install Haystack:

```bash
pip install haystack-ai
```


```python
import os

from haystack import Pipeline, PredefinedPipeline

# Replace the placeholder with your own OpenAI API key.
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"

# Build the predefined "chat with website" pipeline from its template.
pipeline = Pipeline.from_template(PredefinedPipeline.CHAT_WITH_WEBSITE)
result = pipeline.run({
    "fetcher": {"urls": ["https://haystack.deepset.ai/overview/quick-start"]},
    "prompt": {"query": "How can I install Haystack?"},
})
print(result["llm"]["replies"][0])
```

You can install Haystack using the command `pip install haystack-ai`.


The same kind of pipeline can also be built explicitly, component by component:

```python
import os

from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Replace the placeholder with your own OpenAI API key.
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"

fetcher = LinkContentFetcher()  # downloads the content at the given URLs
converter = HTMLToDocument()    # turns the fetched HTML into Documents
prompt_template = """
According to the contents of this website:
{% for document in documents %}
  {{document.content}}
{% endfor %}
Answer the given question: {{query}}
Answer:
"""
prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator()         # generates the answer with an OpenAI model

pipeline = Pipeline()
pipeline.add_component("fetcher", fetcher)
pipeline.add_component("converter", converter)
pipeline.add_component("prompt", prompt_builder)
pipeline.add_component("llm", llm)

# Wire outputs to inputs using "component.socket" names.
pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "prompt.documents")
pipeline.connect("prompt.prompt", "llm.prompt")

result = pipeline.run({
    "fetcher": {"urls": ["https://haystack.deepset.ai/overview/quick-start"]},
    "prompt": {"query": "How can I build my first RAG pipeline?"},
})
print(result["llm"]["replies"][0])
```


To build your first RAG pipeline with Haystack, follow these steps:

1. First, install Haystack and the Chroma integration (which will be used as the document store). You can do this by running:
   `pip install haystack-ai chroma-haystack`

2. Next, create an indexing pipeline. You can do this with the following code:
   ```python
   import os
   from haystack import Pipeline, PredefinedPipeline
   import urllib.request

   os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"
   urllib.request.urlretrieve("https://www.gutenberg.org/cache/epub/7785/pg7785.txt", "davinci.txt")

   indexing_pipeline = Pipeline.from_template(PredefinedPipeline.INDEXING)
   indexing_pipeline.run(data={"sources": ["davinci.txt"]})
   ```

3. Finally, build the RAG pipeline with Haystack and ask a question about the indexed documents. You can do this with the following code:
   ```python
   rag_
   ```

<p align="center">
  <img src="../data/research-data/faca-perguntas-a-uma-pagina-da-web.png" alt="Ask questions about a web page">
</p>

## 4. Build Your First RAG Pipeline

To build modern search pipelines with LLMs, you need two things: powerful components and an easy way to put them together. The Haystack pipeline is built for exactly this purpose and lets you design and scale your interactions with LLMs. Learn how to create pipelines in the Haystack documentation.
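To illustrate the idea of components wired together by named inputs and outputs, here is a deliberately tiny pure-Python sketch. It is not Haystack's `Pipeline`, which also validates types and supports more complex graphs. The class `ToyPipeline` and its methods are hypothetical. Only the `"component.socket"` connection strings and the per-component input dictionary passed to `run` imitate how the Haystack examples in this section are written.

```python
from typing import Any, Callable, Dict, List, Tuple

# A component is any callable that takes keyword inputs and returns a dict of named outputs.
Component = Callable[..., Dict[str, Any]]


class ToyPipeline:
    """Hypothetical, minimal pipeline: illustrates wiring only, not Haystack's implementation."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        # Each edge is (sender, output name, receiver, input name).
        self.edges: List[Tuple[str, str, str, str]] = []

    def add_component(self, name: str, component: Component) -> None:
        self.components[name] = component

    def connect(self, sender: str, receiver: str) -> None:
        # Both ends use the "component.socket" notation, e.g. "prompt.prompt".
        src, out_name = sender.split(".")
        dst, in_name = receiver.split(".")
        self.edges.append((src, out_name, dst, in_name))

    def run(self, data: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
        # Start from the user-supplied inputs, keyed by component name.
        inputs = {name: dict(data.get(name, {})) for name in self.components}
        results: Dict[str, Dict[str, Any]] = {}
        pending = list(self.components)
        while pending:
            # Pick a component whose upstream senders have all produced their outputs.
            ready = [
                name for name in pending
                if {src for src, _, dst, _ in self.edges if dst == name}.issubset(results)
            ]
            if not ready:
                raise ValueError("no runnable component: a cycle or an unknown sender")
            name = ready[0]
            outputs = self.components[name](**inputs[name])
            results[name] = outputs
            pending.remove(name)
            # Forward each connected output to the receiving component's input.
            for src, out_name, dst, in_name in self.edges:
                if src == name:
                    inputs[dst][in_name] = outputs[out_name]
        # Unlike Haystack, this toy returns the outputs of every component.
        return results


pipe = ToyPipeline()
# Components are added in reverse order on purpose: execution order follows the connections.
pipe.add_component("exclaim", lambda text: {"text": text + "!"})
pipe.add_component("upper", lambda text: {"text": text.upper()})
pipe.connect("upper.text", "exclaim.text")
result = pipe.run({"upper": {"text": "hello haystack"}})
print(result["exclaim"]["text"])  # HELLO HAYSTACK!
```

Because each connection names both a component and a socket, components stay independent and the pipeline alone decides the order in which they run.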

By connecting three components, a Retriever, a PromptBuilder, and a Generator, you can build your first Retrieval Augmented Generation (RAG) pipeline with Haystack.
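To show what each of the three roles contributes, here is a self-contained sketch in plain Python with several stated assumptions. The "embeddings" are simple word-count vectors compared with cosine similarity, whereas the Haystack example in this section uses OpenAI embedding models and Chroma. The generator is a stub that never calls an LLM, so no API key is needed. All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    # Retriever: rank stored documents by similarity to the query vector.
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(documents: List[str], query: str) -> str:
    # PromptBuilder role: combine the retrieved documents and the question.
    context = "\n".join(documents)
    return f"Given these documents, answer the question.\nDocuments:\n{context}\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> Dict[str, List[str]]:
    # Generator stub: a real pipeline would send the prompt to an LLM here.
    return {"replies": [f"[stub reply to a {len(prompt)}-character prompt]"]}


docs = [
    "Haystack pipelines connect components.",
    "Chroma stores documents and embeddings.",
    "pip installs Python packages.",
]
query = "What does Chroma store?"
retrieved = retrieve(query, docs)
prompt = build_prompt(retrieved, query)
print(retrieved)
print(generate(prompt)["replies"][0])
```

Only the retrieved document reaches the prompt, which is what lets the generator answer from your data instead of from its general knowledge alone.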

See how Haystack answers questions about the provided documents using the RAG approach 👇

First, install Haystack and the Chroma integration (we will use it as our document store):

```bash
pip install haystack-ai chroma-haystack
```

```python
import os
import urllib.request

from haystack import Pipeline, PredefinedPipeline

# Replace the placeholder with your own OpenAI API key.
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"
# Download a Project Gutenberg e-book and save it as davinci.txt.
urllib.request.urlretrieve("https://www.gutenberg.org/cache/epub/7785/pg7785.txt", "davinci.txt")

# Index the file with the predefined indexing pipeline.
indexing_pipeline = Pipeline.from_template(PredefinedPipeline.INDEXING)
indexing_pipeline.run(data={"sources": ["davinci.txt"]})

rag_pipeline = Pipeline.from_template(PredefinedPipeline.RAG)

# The query goes both to the text embedder (for retrieval) and to the prompt builder.
query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query": query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])
```



Leonardo da Vinci was 67 years old when he died.


Both pipelines can also be assembled explicitly from their components, using Chroma as the document store:

```python
import os
import urllib.request

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.converters import TextFileToDocument
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

# Replace the placeholder with your own OpenAI API key.
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"
urllib.request.urlretrieve("https://www.gutenberg.org/cache/epub/7785/pg7785.txt", "davinci.txt")

# Chroma document store, persisted in the current directory.
document_store = ChromaDocumentStore(persist_path=".")

# Indexing: text file -> Documents -> cleaned -> split -> embedded -> written to Chroma.
text_file_converter = TextFileToDocument()
cleaner = DocumentCleaner()
splitter = DocumentSplitter()
embedder = OpenAIDocumentEmbedder()
writer = DocumentWriter(document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", text_file_converter)
indexing_pipeline.add_component("cleaner", cleaner)
indexing_pipeline.add_component("splitter", splitter)
indexing_pipeline.add_component("embedder", embedder)
indexing_pipeline.add_component("writer", writer)

indexing_pipeline.connect("converter.documents", "cleaner.documents")
indexing_pipeline.connect("cleaner.documents", "splitter.documents")
indexing_pipeline.connect("splitter.documents", "embedder.documents")
indexing_pipeline.connect("embedder.documents", "writer.documents")
indexing_pipeline.run(data={"sources": ["davinci.txt"]})

# Querying: embed the question, retrieve matching chunks, build a prompt, call the LLM.
text_embedder = OpenAITextEmbedder()
retriever = ChromaEmbeddingRetriever(document_store)
template = """Given these documents, answer the question.
              Documents:
              {% for doc in documents %}
                  {{ doc.content }}
              {% endfor %}
              Question: {{query}}
              Answer:"""
prompt_builder = PromptBuilder(template=template)
llm = OpenAIGenerator()

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", text_embedder)
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")

query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query": query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])
```

Leonardo da Vinci was 67 years old when he died.


<p align="center">
  <img src="../data/research-data/construa-seu-primeiro-pipeline-rag-indexacao.png" alt="Build your first RAG pipeline - Indexing pipeline">
  <img src="../data/research-data/construa-seu-primeiro-pipeline-rag-rag.png" alt="Build your first RAG pipeline - RAG pipeline">
</p>
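The explicit indexing pipeline runs a DocumentCleaner and then a DocumentSplitter before embedding. As a rough illustration of those two steps, here is a stdlib sketch. Dropping blank lines and collapsing extra spaces stands in for the cleaner, and word-based chunks with an optional overlap stand in for the splitter. The functions `clean` and `split_words` and their parameters are hypothetical. Haystack's components have their own options and defaults, which are described in their documentation.

```python
import re
from typing import List


def clean(text: str) -> str:
    # Collapse runs of spaces/tabs inside each line, then drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def split_words(text: str, length: int, overlap: int = 0) -> List[str]:
    # Cut the text into chunks of `length` words; consecutive chunks share `overlap` words.
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    words = text.split()
    step = length - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + length]))
        if start + length >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


raw = "Leonardo   da Vinci\n\n\nwas a painter,   engineer\tand scientist."
cleaned = clean(raw)
print(cleaned)
for chunk in split_words(cleaned, length=4, overlap=1):
    print(chunk)
```

Smaller chunks give the retriever more precise matches, and an overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks.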