<a href="https://colab.research.google.com/github/baldpanda/advent-of-haystack-2023/blob/main/day_10/data_serialisation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Haystack: Day 10

_Make a copy of this Colab to start_

For the last challenge of Advent of Haystack, we are asking you to create your own _marshaller_ so you can load and run a pipeline that was serialized with MessagePack (`msgpack`).

Your task is to complete **Step 2 onwards**.

## 1) Install Dependencies

In [1]:
!pip install haystack-ai msgpack

Collecting haystack-ai
  Downloading haystack_ai-2.0.0b3-py3-none-any.whl (189 kB)
Collecting boilerpy3 (from haystack-ai)
  Downloading boilerpy3-1.0.7-py3-none-any.whl (22 kB)
Collecting lazy-imports (from haystack-ai)
  Downloading lazy_imports-0.3.1-py3-none-any.whl (12 kB)
Collecting openai<1.0.0 (from haystack-ai)
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Collecting posthog (from haystack-ai)
  Downloading posthog-3.1.0-py2.py3-none-any.whl (37 kB)
Collecting rank-bm25 (from haystack-ai)
  Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Collecting monotonic>=1.5 (from posthog->haystack-ai)
  Downloading monotonic-1.6-py2.py3-none-any.whl (8.2 kB)
Collecting backoff>=1.10.0 (from posthog->haystack-ai)
  Downloading backoff-2.2

### Enabling Telemetry

Knowing you’re running this challenge helps us know whether Advent of Haystack is helping people learn about Haystack 2.0-Beta. You can always opt out by commenting out the following line.

In [2]:
from haystack.telemetry import tutorial_running

tutorial_running("challenge_10")

When we deserialize a pipeline, all of its components are instantiated automatically,
so we can't pass the API key to the `GPTGenerator` constructor as usual (the key is
also not stored in the serialized pipeline). The best practice is to let the
`GPTGenerator` instance read the special environment variable `OPENAI_API_KEY`,
which we set here.
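The fallback described above is a common pattern: a component accepts an optional `api_key` and, when none is given, reads an environment variable instead. The sketch below illustrates that pattern only; `resolve_api_key` is a hypothetical helper, not a Haystack function.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None, env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: prefer an explicit key, otherwise fall back to the environment."""
    if api_key is not None:
        return api_key
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise ValueError(f"No API key given and {env_var} is not set.")
    return key
```

Because the key does not appear in the `llm` component's `init_parameters`, it never ends up in the serialized pipeline, which is why it has to come from the environment at load time.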

In [3]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI Api key: ")

Enter OpenAI Api key: ··········


## 2) Create a custom Marshaller


See the documentation page on [serialization](https://docs.haystack.deepset.ai/v2.0/docs/serialization).

We will write a custom `MsgpackMarshaller` so that we can load the pipeline defined in `ancient_instructions`.

In [4]:
# MessagePack is an efficient binary serialization format. It's not supposed to be human-readable, as you can see:
ancient_instructions = b'\x84\xa8metadata\x80\xb1max_loops_allowedd\xaacomponents\x86\xa9converter\x82\xa4type\xd92haystack.components.converters.html.HTMLToDocument\xafinit_parameters\x80\xa7fetcher\x82\xa4type\xd9<haystack.components.fetchers.link_content.LinkContentFetcher\xafinit_parameters\x84\xb0raise_on_failure\xc3\xabuser_agents\x91\xd9#haystack/LinkContentFetcher/0.152.0\xaeretry_attempts\x02\xa7timeout\x03\xa3llm\x82\xa4type\xd92haystack.components.generators.openai.GPTGenerator\xafinit_parameters\x85\xaamodel_name\xadgpt-3.5-turbo\xb2streaming_callback\xc0\xacapi_base_url\xb9https://api.openai.com/v1\xb1generation_kwargs\x80\xadsystem_prompt\xc0\xaeprompt_builder\x82\xa4type\xd99haystack.components.builders.prompt_builder.PromptBuilder\xafinit_parameters\x81\xa8template\xd9\x8a Acc  ding to these docu  nts:\n{% for  oc in documents %}  {{ doc.con     }} {% endfor %}\nAnswer the given qu  tion: {{question}} Answer: \xa6ranker\x82\xa4type\xd9Phaystack.components.rankers.transformers_similarity.TransformersSimilarityRanker\xafinit_parameters\x84\xa6device\xa3cpu\xb2model_name_or_path\xd9$cross-encoder/ms-marco-MiniLM-L-6-v2\xa5token\xc0\xa5top_k\x03\xa8splitter\x82\xa4type\xd9Dhaystack.components.preprocessors.document_splitter.DocumentSplitter\xafinit_parameters\x83\xa8split_by\xa4word\xacsplit_length2\xadsplit_overlap\x00\xabconnections\x95\x82\xa6sender\xb3converter.documents\xa8receiver\xb2splitter.documents\x82\xa6sender\xaffetcher.streams\xa8receiver\xb1converter.sources\x82\xa6sender\xb5prompt_builder.prompt\xa8receiver\xaallm.prompt\x82\xa6sender\xb0ranker.documents\xa8receiver\xb8prompt_builder.documents\x82\xa6sender\xb2splitter.documents\xa8receiver\xb0ranker.documents'
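To see what the binary blob actually contains, here is a minimal decoder for just the handful of MessagePack type bytes that occur in `ancient_instructions` (fixmap, fixarray, fixstr, str 8, positive fixint, nil, true/false), written against the MessagePack specification. It is an illustration only and deliberately rejects every other type; in practice, use `msgpack.unpackb`.

```python
from typing import Any, Tuple


def decode(data: bytes) -> Any:
    """Decode the small subset of MessagePack used by ancient_instructions."""
    value, offset = _decode_at(data, 0)
    if offset != len(data):
        raise ValueError(f"{len(data) - offset} trailing bytes")
    return value


def _decode_str(data: bytes, start: int, length: int) -> Tuple[str, int]:
    end = start + length
    return data[start:end].decode("utf-8"), end


def _decode_at(data: bytes, pos: int) -> Tuple[Any, int]:
    tag = data[pos]
    pos += 1
    if tag <= 0x7F:            # positive fixint: the byte itself is the value
        return tag, pos
    if 0x80 <= tag <= 0x8F:    # fixmap: low 4 bits = number of key/value pairs
        result = {}
        for _ in range(tag & 0x0F):
            key, pos = _decode_at(data, pos)
            value, pos = _decode_at(data, pos)
            result[key] = value
        return result, pos
    if 0x90 <= tag <= 0x9F:    # fixarray: low 4 bits = number of items
        items = []
        for _ in range(tag & 0x0F):
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= tag <= 0xBF:    # fixstr: low 5 bits = length in bytes
        return _decode_str(data, pos, tag & 0x1F)
    if tag == 0xD9:            # str 8: the next byte holds the length
        return _decode_str(data, pos + 1, data[pos])
    if tag == 0xC0:            # nil
        return None, pos
    if tag == 0xC2:            # false
        return False, pos
    if tag == 0xC3:            # true
        return True, pos
    raise ValueError(f"unsupported MessagePack type byte 0x{tag:02x}")
```

For example, the opening bytes `\x84\xa8metadata\x80` read as "a map with 4 entries, whose first key is the 8-byte string `metadata`, mapped to an empty map". Running `decode(ancient_instructions)` should give the same dictionary as `msgpack.unpackb(ancient_instructions)`.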

In [6]:
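import msgpack
from typing import Any, Dict, Union


class MsgpackMarshaller:
    """
    Custom MessagePack marshaller implementing
    the Marshaller protocol in Haystack.
    """

    def marshal(self, dict_: Dict[str, Any]) -> bytes:
        # Serialize the pipeline dictionary into compact MessagePack bytes
        return msgpack.packb(dict_)

    def unmarshal(self, data_: Union[bytes, bytearray]) -> Dict[str, Any]:
        # Decode MessagePack bytes back into the pipeline dictionary;
        # msgpack >= 1.0 returns str (not bytes) for string values by default
        return msgpack.unpackb(data_)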
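A marshaller does not need to inherit from anything: as we understand the serialization docs, Haystack's `Marshaller` is a `typing.Protocol`, so any object with matching `marshal` and `unmarshal` methods works. The sketch below restates that contract as our own `MarshallerLike` protocol (an assumption, not imported from Haystack) and satisfies it with a hypothetical `JsonMarshaller` built on the standard library, which is handy for testing the idea without `msgpack` installed.

```python
import json
from typing import Any, Dict, Protocol, Union, runtime_checkable


@runtime_checkable
class MarshallerLike(Protocol):
    """Our own restatement of the marshaller contract (assumed, not Haystack's class)."""

    def marshal(self, dict_: Dict[str, Any]) -> Union[str, bytes]: ...

    def unmarshal(self, data_: Union[str, bytes, bytearray]) -> Dict[str, Any]: ...


class JsonMarshaller:
    """Hypothetical JSON marshaller: same two methods, different wire format."""

    def marshal(self, dict_: Dict[str, Any]) -> str:
        return json.dumps(dict_, indent=2)

    def unmarshal(self, data_: Union[str, bytes, bytearray]) -> Dict[str, Any]:
        # json.loads accepts str, bytes and bytearray
        return json.loads(data_)
```

Under that assumption, `Pipeline.loads(data, marshaller=JsonMarshaller())` would work the same way as with `MsgpackMarshaller`; only the byte format changes.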


## 3) Convert MessagePack to YAML

Read the ancient instructions into a `Pipeline` object using the custom marshaller, then use the default marshaller to serialize the pipeline again, this time to YAML.
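Conceptually, `Pipeline.loads` unmarshals text or bytes into a dictionary and builds the pipeline from it, while `Pipeline.dumps` turns the pipeline back into a dictionary and marshals it. That is why converting between formats only takes a second marshaller. The toy sketch below shows that flow under this assumption; `ToyPipeline`, `CompactJson` and `PrettyJson` are hypothetical, and the toy `from_dict` skips everything the real one does (importing component classes, validating connections).

```python
import json
from typing import Any, Dict, Union


class CompactJson:
    """Hypothetical marshaller #1: single-line JSON."""

    def marshal(self, dict_: Dict[str, Any]) -> str:
        return json.dumps(dict_, separators=(",", ":"), sort_keys=True)

    def unmarshal(self, data_: Union[str, bytes, bytearray]) -> Dict[str, Any]:
        return json.loads(data_)


class PrettyJson(CompactJson):
    """Hypothetical marshaller #2: indented JSON, easier to read and edit."""

    def marshal(self, dict_: Dict[str, Any]) -> str:
        return json.dumps(dict_, indent=2, sort_keys=True)


class ToyPipeline:
    """Toy stand-in for haystack.Pipeline; it only keeps the raw dictionary."""

    def __init__(self, definition: Dict[str, Any]):
        self.definition = definition

    def to_dict(self) -> Dict[str, Any]:
        return self.definition

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyPipeline":
        return cls(data)

    @classmethod
    def loads(cls, data: Union[str, bytes, bytearray], marshaller: Any) -> "ToyPipeline":
        # text/bytes -> dict -> pipeline
        return cls.from_dict(marshaller.unmarshal(data))

    def dumps(self, marshaller: Any) -> Union[str, bytes]:
        # pipeline -> dict -> text/bytes
        return marshaller.marshal(self.to_dict())
```

In the notebook, the same two-step conversion is `Pipeline.loads(..., marshaller=MsgpackMarshaller())` followed by `pipe.dumps()`.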

In [7]:
from haystack import Pipeline

pipe = Pipeline.loads(data=ancient_instructions, marshaller=MsgpackMarshaller())
print(pipe.dumps())

components:
  converter:
    init_parameters: {}
    type: haystack.components.converters.html.HTMLToDocument
  fetcher:
    init_parameters:
      raise_on_failure: true
      retry_attempts: 2
      timeout: 3
      user_agents:
      - haystack/LinkContentFetcher/0.152.0
    type: haystack.components.fetchers.link_content.LinkContentFetcher
  llm:
    init_parameters:
      api_base_url: https://api.openai.com/v1
      generation_kwargs: {}
      model_name: gpt-3.5-turbo
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.GPTGenerator
  prompt_builder:
    init_parameters:
      template: ' Acc  ding to these docu  nts:

        {% for  oc in documents %}  {{ doc.con     }} {% endfor %}

        Answer the given qu  tion: {{question}} Answer: '
    type: haystack.components.builders.prompt_builder.PromptBuilder
  ranker:
    init_parameters:
      device: cpu
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
      to

## 4) Edit the YAML representation of the pipeline to fix the errors

Unlike MessagePack, YAML is meant to be human-readable, and Haystack encourages you to work with pipelines in this format instead of Python whenever it makes sense.

Can you spot the errors in the YAML printed in the previous cell?

Copy the YAML output from the section above into the cell below and try to spot the errors.
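If the damage is hard to see by eye, a quick heuristic helps: in this puzzle the blanked-out letters show up as runs of two or more spaces directly after a word character, which never happens in the intact template (its only double space follows `%}`). The helper below, `find_blanked_words`, is our own name and the heuristic is specific to this pipeline. It only locates missing letters; it cannot tell whether a restored word, such as the attribute read from `doc`, is the right one.

```python
import re
from typing import List

# Template text as printed by pipe.dumps() in Step 3 (YAML blank lines shown as \n)
broken_template = (
    " Acc  ding to these docu  nts:\n"
    "{% for  oc in documents %}  {{ doc.con     }} {% endfor %}\n"
    "Answer the given qu  tion: {{question}} Answer: "
)


def find_blanked_words(text: str) -> List[str]:
    """Return each word followed by 2+ spaces, together with whatever comes after the gap."""
    return re.findall(r"\w+ {2,}\S*", text)


for hit in find_blanked_words(broken_template):
    print(repr(hit))
```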

In [8]:
broken_pipeline_definition = """
components:
  converter:
    init_parameters: {}
    type: haystack.components.converters.html.HTMLToDocument
  fetcher:
    init_parameters:
      raise_on_failure: true
      retry_attempts: 2
      timeout: 3
      user_agents:
      - haystack/LinkContentFetcher/0.152.0
    type: haystack.components.fetchers.link_content.LinkContentFetcher
  llm:
    init_parameters:
      api_base_url: https://api.openai.com/v1
      generation_kwargs: {}
      model_name: gpt-3.5-turbo
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.GPTGenerator
  prompt_builder:
    init_parameters:
      template: ' According to these documents:

        {% for doc in documents %}  {{ doc.contents }} {% endfor %}

        Answer the given question: {{question}} Answer: '
    type: haystack.components.builders.prompt_builder.PromptBuilder
  ranker:
    init_parameters:
      device: cpu
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
      token: null
      top_k: 3
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
  splitter:
    init_parameters:
      split_by: word
      split_length: 50
      split_overlap: 0
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
connections:
- receiver: splitter.documents
  sender: converter.documents
- receiver: converter.sources
  sender: fetcher.streams
- receiver: llm.prompt
  sender: prompt_builder.prompt
- receiver: prompt_builder.documents
  sender: ranker.documents
- receiver: ranker.documents
  sender: splitter.documents
max_loops_allowed: 100
metadata: {}
"""


## 5) Run the fixed pipeline

In [9]:
fixed_pipeline_definition = """
components:
  converter:
    init_parameters: {}
    type: haystack.components.converters.html.HTMLToDocument
  fetcher:
    init_parameters:
      raise_on_failure: true
      retry_attempts: 2
      timeout: 3
      user_agents:
      - haystack/LinkContentFetcher/0.152.0
    type: haystack.components.fetchers.link_content.LinkContentFetcher
  llm:
    init_parameters:
      api_base_url: https://api.openai.com/v1
      generation_kwargs: {}
      model_name: gpt-3.5-turbo
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.GPTGenerator
  prompt_builder:
    init_parameters:
      template: ' According to these documents:

        {% for doc in documents %}  {{ doc.content }} {% endfor %}

        Answer the given question: {{question}} Answer: '
    type: haystack.components.builders.prompt_builder.PromptBuilder
  ranker:
    init_parameters:
      device: cpu
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
      token: null
      top_k: 3
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
  splitter:
    init_parameters:
      split_by: word
      split_length: 50
      split_overlap: 0
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
connections:
- receiver: splitter.documents
  sender: converter.documents
- receiver: converter.sources
  sender: fetcher.streams
- receiver: llm.prompt
  sender: prompt_builder.prompt
- receiver: prompt_builder.documents
  sender: ranker.documents
- receiver: ranker.documents
  sender: splitter.documents
max_loops_allowed: 100
metadata: {}
"""

working_pipeline = Pipeline.loads(fixed_pipeline_definition)
result = working_pipeline.run({
    "prompt_builder": {"question": "how do I start a lathe?"},
    "ranker": {"query": "how do I start a lathe?"},
    "fetcher": {"urls": ["https://en.wikipedia.org/wiki/Lathe"]}
})
print(result["llm"]["replies"][0])


To start a lathe, follow these steps:

1. Ensure that you have all the necessary safety equipment like safety glasses, gloves, and ear protection.
2. Make sure the lathe is properly set up and secured to the workbench or stand, with a stable base.
3. Check that the lathe's power switch is in the "off" position before plugging it into a grounded electrical outlet.
4. Adjust the speed settings on the lathe according to the type of material you'll be working with and the desired outcome.
5. Confirm that the lathe's tool rest is in the correct position and securely tightened.
6. Carefully mount the workpiece onto the lathe's drive spindle, ensuring it is centered and properly secured using any necessary clamps or chucks.
7. Double-check that the tool rest is parallel to the workpiece and at the appropriate distance.
8. Turn on the lathe by flipping the power switch to the "on" position, being cautious while doing so.
9. Gradually increase the lathe's speed to the desired level, starting wi