<a href="https://colab.research.google.com/github/Ashish-Soni08/Playground/blob/main/haystack/Advent_of_Haystack_Find_the_hidden_answer(Ashish_Soni).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Haystack - Day 2

In this challenge, you will complete a scavenger hunt prepared by the Haystack elves. Find out what their favorite animal is! These components will help:

- [`LinkContentFetcher`](https://docs.haystack.deepset.ai/v2.0/docs/linkcontentfetcher): This will allow you to fetch the contents of https://haystack.deepset.ai/advent-of-haystack/day-1#challenge
- [`HTMLToDocument`](https://docs.haystack.deepset.ai/v2.0/docs/htmltodocument): Once you've fetched the contents, this component will allow you to convert it to a Document
- [`DocumentSplitter`](https://docs.haystack.deepset.ai/v2.0/docs/documentsplitter) (Optional): This is useful if you want to split your Document into chunks
- [`TransformersSimilarityRanker`](https://docs.haystack.deepset.ai/v2.0/docs/transformerssimilarityranker) (Optional): This is useful if you want to rank your documents (chunked with the splitter above) so that the most relevant is at the top.
- [`PromptBuilder`](https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder): This is used to define how you want to prompt an LLM so that it generates an accurate response for you. We’ve included one for you in the starter Colab that will help you with this challenge.
- [`GPTGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/gptgenerator): This component is used to query GPT. You can change this to one of our other generators instead!

# Installation
**Note:** There is a known issue in Colab: a version conflict involving `llmx`, which comes preinstalled with Colab, may raise an `llmx` error during installation. You can safely ignore it, or run `pip uninstall -y llmx`.

In [4]:
%%capture
%%bash
pip install haystack-ai
pip install boilerpy3
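# Quote the requirement specifiers: unquoted, bash treats ">" as output redirection and "[...]" as a glob pattern
pip install "transformers[torch,sentencepiece]==4.32.1" "sentence-transformers>=2.2.0"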
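
Because the install cell runs under `%%capture`, its output is hidden, so a failed or skipped install is easy to miss. A minimal sketch for checking what actually got installed, using only the standard library (Python 3.8+); the helper name `installed_version` is made up for this example:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names taken from the install cell, plus llmx from the note above.
for name in ("haystack-ai", "boilerpy3", "transformers", "sentence-transformers", "llmx"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `llmx` shows up here and its error bothers you, the `pip uninstall -y llmx` command from the note removes it.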

Run this code and you’ll be prompted to enter your OpenAI API key. If you don’t have a key, [follow these instructions](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key).

In [5]:
from getpass import getpass

openai_api_key = getpass("Enter OpenAI API key: ")

Enter OpenAI API key: ··········


## Create a Pipeline to fetch the contents of a webpage

Complete the code cell below to create a pipeline that fetches the contents of https://haystack.deepset.ai/advent-of-haystack/day-1#challenge

In [7]:
from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.generators import GPTGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack import Pipeline

# Components
fetcher = LinkContentFetcher()

converter = HTMLToDocument()

splitter = DocumentSplitter(split_length=100, split_overlap=5)

ranker = TransformersSimilarityRanker()

# Prompt for the LLM
prompt_template = """
According to these documents:

{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Answer the given question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=prompt_template)

# LLM
llm = GPTGenerator(api_key=openai_api_key, model_name="gpt-3.5-turbo-1106")

# Creating the Pipeline

pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)
pipeline.add_component(name="splitter", instance=splitter)
pipeline.add_component(name="ranker", instance=ranker)
pipeline.add_component(name="prompt_builder", instance=prompt_builder)
pipeline.add_component(name="llm", instance=llm)

# Connect the components in the pipeline
pipeline.connect("fetcher", "converter")
pipeline.connect("converter", "splitter")
pipeline.connect("splitter", "ranker")
pipeline.connect("ranker", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")
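
To get a feel for what `split_length=100, split_overlap=5` and the ranking step do to the page, here is a rough pure-Python sketch. It is not Haystack's implementation: it assumes the splitter counts words (which I believe is the default), and it swaps the transformer-based ranker for a toy word-overlap score. The function names are made up for this example.

```python
from typing import List


def split_by_words(text: str, split_length: int = 100, split_overlap: int = 5) -> List[str]:
    """Split text into chunks of up to split_length words.

    Consecutive chunks share split_overlap words.
    """
    if split_length <= split_overlap:
        raise ValueError("split_length must be greater than split_overlap")
    words = text.split()
    step = split_length - split_overlap  # each new chunk starts this many words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score: number of distinct query words found in the chunk."""
    query_words = {w.strip("?.,!").lower() for w in query.split()}
    chunk_words = {w.strip("?.,!").lower() for w in chunk.split()}
    return len(query_words & chunk_words)


def rank_chunks(query: str, chunks: List[str], top_k: int = 3) -> List[str]:
    """Return the top_k chunks, highest toy score first."""
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:top_k]


# 250 placeholder words give 3 chunks: words 0-99, 95-194 and 190-249.
text = " ".join(f"w{i}" for i in range(250))
chunks = split_by_words(text, split_length=100, split_overlap=5)
print(len(chunks))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk if it is short enough, which helps when the answer is hidden in a single sentence.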

## Try to find the answer to the question



In [9]:
# run the Pipeline

question = "What is our favorite animal?"
result = pipeline.run({
    "fetcher": {"urls": ["https://haystack.deepset.ai/advent-of-haystack/day-1#challenge"]},
    "ranker": {"query": question},
    "prompt_builder": {"question": question},
})

print(result['llm']['replies'][0])

The favorite animal of the Haystack elves is a capybara.
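
The lookup `result['llm']['replies'][0]` assumes the pipeline returned at least one reply. A slightly more defensive variant might look like the sketch below; `first_reply` is a made-up helper, and the example dictionary is hypothetical, only shaped like the one indexed above.

```python
from typing import Any, Dict, Optional


def first_reply(result: Dict[str, Any], component: str = "llm") -> Optional[str]:
    """Return the first reply of the given component, or None if there is none."""
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Hypothetical dictionary with the same nesting as the pipeline result; not real output.
example = {"llm": {"replies": ["example reply"]}}
print(first_reply(example))
print(first_reply({}))
```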


# The favorite animal of the Haystack elves is a `capybara`

In [14]:
pipeline.draw("/content/pipeline_day_2.png")