## Streaming model explorer for Haystack 2.0

*Problem*: there are so many LLMs these days! Which model is the best for my use case?

This notebook uses [Haystack 2.0](https://docs.haystack.deepset.ai/v2.0/docs/intro) to compare the results of sending the same prompt to several different models.

This is a very basic demo where you can only compare a few models that support streaming responses. I'd like to support more models in the future, so watch this space for updates.

### Models

Haystack's [OpenAIGenerator](https://docs.haystack.deepset.ai/v2.0/docs/openaigenerator) and [CohereGenerator](https://docs.haystack.deepset.ai/v2.0/docs/coheregenerator) support streaming out of the box.
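
Streaming works through a callback. While the generator receives the response, it calls the function passed as `streaming_callback` once per chunk, and each chunk carries its text in a `content` attribute. The sketch below imitates that contract offline so it runs without API keys. `FakeStreamingGenerator` and `Chunk` are hypothetical stand-ins, not Haystack classes. The `{"replies": [...]}` return value only mirrors the shape that Haystack 2.0 generators return.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for a streamed chunk: the text lives in `content`.
    content: str


class FakeStreamingGenerator:
    """Hypothetical generator that 'streams' a canned reply token by token."""

    def __init__(self, reply_tokens, streaming_callback: Optional[Callable[[Chunk], None]] = None):
        self.reply_tokens = list(reply_tokens)
        self.streaming_callback = streaming_callback

    def run(self, prompt: str) -> dict:
        for token in self.reply_tokens:
            # Hand each chunk to the callback as soon as it "arrives".
            if self.streaming_callback is not None:
                self.streaming_callback(Chunk(content=token))
        # The full reply is still returned once the stream ends.
        return {"replies": ["".join(self.reply_tokens)]}


received = []
generator = FakeStreamingGenerator(["The", " tide", " confesses", "."], streaming_callback=received.append)
result = generator.run("Describe a sunset.")
print([c.content for c in received])  # ['The', ' tide', ' confesses', '.']
print(result["replies"][0])           # The tide confesses.
```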

The other three models (Mistral-7B, Falcon-7B-Instruct, and BLOOM) run through the [HuggingFaceTGIGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator) against Hugging Face's [hosted inference endpoints](https://huggingface.co/inference-api).

### Prerequisites

You need [Hugging Face](https://huggingface.co/docs/hub/security-tokens), [Cohere](https://docs.cohere.com/docs/connector-authentication), and [OpenAI](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key) API keys. Save them as Colab secrets named `OPENAI_API_KEY`, `COHERE_API_KEY`, and `HF_API_KEY`, which is what the code below reads. To add a secret, click the key icon in the left menu or [see detailed instructions here](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75).
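
If you also want to run the cells outside Colab, one option is a small helper that reads a Colab secret when one is available and falls back to an environment variable otherwise. `get_secret` is a hypothetical helper, not part of Colab or Haystack; it assumes the secret names used in this notebook.

```python
import os


def get_secret(name: str) -> str:
    """Hypothetical helper: read `name` from Colab secrets, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            return userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted: try the environment next.
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Set {name} as a Colab secret or an environment variable")
    return value
```

With this helper, a generator would be created as `OpenAIGenerator(api_key=get_secret('OPENAI_API_KEY'))`.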

In [1]:
# Pin the Haystack 2.0 beta versions this notebook was written against.
!pip install haystack-ai==2.0.0b5 cohere_haystack==0.2.0


In [10]:
from haystack.components.generators import HuggingFaceTGIGenerator, OpenAIGenerator
from cohere_haystack.generator import CohereGenerator
from google.colab import userdata

# API keys come from the Colab secrets configured above.
open_ai_generator = OpenAIGenerator(api_key=userdata.get('OPENAI_API_KEY'))

cohere_generator = CohereGenerator(api_key=userdata.get('COHERE_API_KEY'))

# Models served through Hugging Face's hosted inference endpoints.
# Call warm_up() on each one before running it.
hf_generator = HuggingFaceTGIGenerator(model="mistralai/Mistral-7B-v0.1", token=userdata.get('HF_API_KEY'))
hf_generator.warm_up()

hf_generator_2 = HuggingFaceTGIGenerator(model="tiiuae/falcon-7b-instruct", token=userdata.get('HF_API_KEY'))
hf_generator_2.warm_up()

hf_generator_3 = HuggingFaceTGIGenerator(model="bigscience/bloom", token=userdata.get('HF_API_KEY'))
hf_generator_3.warm_up()



In [140]:
# The generators come from different integrations, so this is a plain list
# rather than a list of any single generator class.
MODELS = [cohere_generator, open_ai_generator, hf_generator, hf_generator_2, hf_generator_3]

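The `AppendToken` dataclass is the streaming callback. It buffers the incoming chunks (roughly one token each) and writes them to the model's output pane five at a time. `multiprompt` creates one bordered pane per model, prints the model name at the top of each pane, and then runs the models one after another.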

In [176]:
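from dataclasses import dataclass, field

import ipywidgets as widgets
from IPython.display import display


@dataclass
class AppendToken:
    """Streaming callback that writes chunks to an Output widget in batches."""
    output: widgets.Output
    chunk_size: int = 5
    # default_factory gives every instance its own buffer; a bare class-level
    # list would be shared by all models.
    chunks: list = field(default_factory=list)

    def __call__(self, chunk):
        # Streamed chunks expose their text as `content` (or `text` as a fallback).
        text = getattr(chunk, 'content', '') or getattr(chunk, 'text', '') or ''
        self.chunks.append(text)
        if len(self.chunks) == self.chunk_size:
            self.flush()

    def flush(self):
        # Tokens already carry their own spaces, so join them without a separator.
        if self.chunks:
            self.output.append_display_data(''.join(self.chunks))
            self.chunks.clear()


def multiprompt(prompt, models=MODELS):
    # One bordered output pane per model, laid out side by side.
    outputs = [widgets.Output(layout={'border': '1px solid black'}) for _ in models]
    display(widgets.HBox(children=outputs))

    for model, out in zip(models, outputs):
        model_name = getattr(model, 'model', '') or getattr(model, 'model_name', '') or ''
        out.append_display_data(f'Model name: {model_name}')
        callback = AppendToken(out)
        model.streaming_callback = callback
        model.run(prompt)
        # Show whatever is left when the stream ends on a partial batch.
        callback.flush()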
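
The batching rule is easy to check away from the widgets. This sketch reproduces it with a plain list standing in for the `widgets.Output` pane. `BatchedSink` is a hypothetical name, and, like `AppendToken`, it assumes each streamed chunk exposes its text as `content`. It joins each batch with an empty string, assuming the streamed tokens carry their own leading spaces. The final `flush()` matters: without it, a reply whose chunk count is not a multiple of `chunk_size` would lose its last few tokens.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class BatchedSink:
    """Hypothetical offline version of AppendToken: batches chunks into a list."""
    lines: list = field(default_factory=list)
    chunk_size: int = 5
    buffer: list = field(default_factory=list)

    def __call__(self, chunk):
        # Same lookup as AppendToken: prefer `content`, fall back to `text`.
        text = getattr(chunk, "content", "") or getattr(chunk, "text", "") or ""
        self.buffer.append(text)
        if len(self.buffer) == self.chunk_size:
            self.flush()

    def flush(self):
        # Emit the buffered chunks as one line and start a new batch.
        if self.buffer:
            self.lines.append("".join(self.buffer))
            self.buffer.clear()


sink = BatchedSink(chunk_size=3)
for token in ["Elementary", ",", " my", " dear", " Watson", ".", " The", " tide"]:
    sink(SimpleNamespace(content=token))
sink.flush()  # the last two chunks form a partial batch
print(sink.lines)  # ['Elementary, my', ' dear Watson.', ' The tide']
```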


In [177]:
multiprompt("Give me a 50-word description of a beach sunset in the style of Sherlock Holmes.")


This was a very silly example prompt. If you found this demo useful, email me at tilde.thurium@deepset.ai and let me know which prompts you tested it with. Thanks for following along!