# Keyword Extraction with LLM Chat Generator
This notebook demonstrates how to extract keywords and key phrases from text using Haystack’s `ChatPromptBuilder` together with an LLM via `OpenAIChatGenerator`. We will:

- Define a prompt that instructs the model to identify single- and multi-word keywords.
- Capture each keyword’s character offsets.
- Assign a relevance score (0–1).
- Parse and display the results as JSON.

### Install Packages and Set Up the OpenAI API Key

In [None]:
!pip install haystack-ai

In [None]:
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

### Import Required Libraries



In [None]:
import json

from haystack import Document
from haystack.dataclasses import ChatMessage
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator


### Prepare Documents
Create a list of `Document` objects containing the texts you want to analyze.

In [None]:
documents = [
    Document(content="I'm Merlin, the happy pig!"),
    Document(content="My name is Clara and I live in Berkeley, California."),
]


### Build the Prompt
We construct a single-message template that instructs the model to extract keywords along with their positions and scores, and to return the output as a JSON object.


In [None]:
messages = [
    ChatMessage.from_user(
        '''
You are a keyword extractor. Extract the most relevant keywords and phrases from the following text. For each keyword:
1. Find single and multi-word keywords that capture important concepts
2. Include the starting position (index) where each keyword appears in the text
3. Assign a relevance score between 0 and 1 for each keyword
4. Focus on nouns, noun phrases, and important terms

Text to analyze: {{text}}

Return the results as a JSON object in this exact format:
{
  "keywords": [
    {
      "keyword": "example term",
      "positions": [5],
      "score": 0.95
    },
    {
      "keyword": "another keyword",
      "positions": [20],
      "score": 0.85
    }
  ]
}

Important:
- Each keyword must have its EXACT character position in the text (counting from 0)
- Scores should reflect the relevance (0–1)
- Include both single words and meaningful phrases
- List results from highest to lowest score
'''
    )
]

builder = ChatPromptBuilder(template=messages)
# Render the template with the text of the first prepared document
prompt = builder.run(text=documents[0].content)


### Initialize the Generator and Extract Keywords
We use `OpenAIChatGenerator` (here with the `gpt-4o-mini` model) to send the prompt, and request a JSON-formatted response through the `response_format` generation argument.

In [None]:
# Initialize the chat-based generator
extractor = OpenAIChatGenerator(model="gpt-4o-mini")

# Run the generator with our formatted prompt
results = extractor.run(
    messages=prompt["prompt"],
    generation_kwargs={"response_format": {"type": "json_object"}}
)

# Extract the raw text reply
output_str = results["replies"][0].text
print(output_str)


### Parse and Display Results
Finally, convert the returned JSON string into a Python object and iterate over the extracted keywords.

In [None]:
try:
    data = json.loads(output_str)
    for kw in data["keywords"]:
        print(f'Keyword: {kw["keyword"]}')
        print(f' Positions: {kw["positions"]}')
        print(f' Score: {kw["score"]}\n')
except json.JSONDecodeError:
    print("Failed to parse the output as JSON. Raw output:", output_str)
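
Character offsets reported by a language model are not guaranteed to match the text, so it can be worth recomputing them locally before relying on them. The cell below is a minimal sketch and not part of the original pipeline: `find_positions` and `correct_positions` are hypothetical helper names, and the sample keyword list is made up for illustration, with deliberately wrong offsets. It assumes each keyword entry follows the `keyword` / `positions` / `score` shape requested in the prompt.

```python
import re


def find_positions(text: str, keyword: str) -> list[int]:
    """Return the 0-based start offsets of non-overlapping, case-insensitive matches."""
    if not keyword:
        return []
    pattern = re.escape(keyword)  # treat the keyword as literal text, not a regex
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def correct_positions(text: str, keywords: list[dict]) -> list[dict]:
    """Replace reported positions with offsets recomputed from the text.

    Keywords that do not occur in the text at all are dropped.
    """
    corrected = []
    for kw in keywords:
        actual = find_positions(text, kw["keyword"])
        if actual:
            corrected.append({**kw, "positions": actual})
    return corrected


# Hypothetical model output for the sample sentence (offsets deliberately wrong).
sample_text = "I'm Merlin, the happy pig!"
sample_keywords = [
    {"keyword": "Merlin", "positions": [5], "score": 0.95},
    {"keyword": "happy pig", "positions": [15], "score": 0.85},
    {"keyword": "wizard", "positions": [0], "score": 0.4},  # not in the text
]

checked = correct_positions(sample_text, sample_keywords)
for kw in checked:
    print(kw["keyword"], kw["positions"], kw["score"])
```

In this notebook, the same helper could be applied to the parsed reply by passing the analyzed text and `data["keywords"]`.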
