## Model: distilgpt2
* Embedding model - all-MiniLM-L6-v2
* Vector store - FAISS
* LLM pipeline - distilgpt2

## Step 1: Import Libraries

In [1]:
import os
import logging
from sentence_transformers import SentenceTransformer
from langchain.prompts import PromptTemplate
from transformers import pipeline
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

## Step 2: Embedding & Vectorizer

In [3]:
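# The embedding model should match the one used when the index was built.
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# allow_dangerous_deserialization=True unpickles the stored index; only load files you trust.
faiss_index = FAISS.load_local("../faiss_store", embedding_model, allow_dangerous_deserialization=True)

def retrieve_chunks(query, top_k=5):
    """Return the text of the top_k chunks most similar to the query."""
    docs = faiss_index.similarity_search(query, k=top_k)
    return [doc.page_content for doc in docs]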
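
To check what retrieval returns before the text reaches the model, it can help to look at the similarity scores as well. The sketch below is an assumption layered on the notebook, not part of it: `retrieve_scored_chunks` and `max_distance` are hypothetical names. It relies on LangChain's `similarity_search_with_score`, which for a default FAISS index returns an L2 distance, so a lower score means a closer match.

```python
from typing import Any, List, Optional, Tuple


def retrieve_scored_chunks(
    index: Any,
    query: str,
    top_k: int = 5,
    max_distance: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Return (chunk_text, score) pairs, optionally dropping weak matches.

    `index` is any object exposing `similarity_search_with_score(query, k=...)`,
    such as the `faiss_index` loaded above. `max_distance` is a hypothetical
    cutoff; a sensible value depends on the data and has to be tuned.
    """
    results = index.similarity_search_with_score(query, k=top_k)
    scored = [(doc.page_content, float(score)) for doc, score in results]
    if max_distance is not None:
        # Lower distance means more similar, so keep only scores under the cutoff.
        scored = [(text, score) for text, score in scored if score <= max_distance]
    return scored
```

With the notebook's index this would be called as `retrieve_scored_chunks(faiss_index, "What is ITSM in ServiceNow?")`. Printing the pairs shows whether the retrieved chunks are relevant at all, which separates retrieval problems from generation problems.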

## Step 3: Model

In [4]:
def generate_response(query, chunks):
    """Generate response using retrieved chunks and a lightweight LLM."""
    # Combine chunks into context
    context = "\n".join(chunks) if chunks else "No relevant information found."
    
    # Define prompt template
    prompt_template = PromptTemplate(
        input_variables=["context", "query"],
        template="Based on the following context, answer the query concisely:\nContext: {context}\nQuery: {query}\nAnswer:"
    )
    
    # Format prompt
    prompt = prompt_template.format(context=context, query=query)
    
    # Generate with a local Hugging Face model (distilgpt2) on CPU (device=-1).
    # This can be swapped for a hosted LLM API call if one is available.
    try:
        llm = pipeline('text-generation', model='distilgpt2', device=-1)
        response = llm(prompt, max_length=150, truncation=True, do_sample=True, num_return_sequences=1)[0]['generated_text']
        answer = response.split('Answer:')[-1].strip() if 'Answer:' in response else response.strip()
    except Exception as e:
        logging.error(f'LLM request failed: {e}')
        answer = "Error generating response. Using context directly:\n" + context[:200]
    
    logging.info(f'Generated response for query: {query}')
    return answer

# Example usage
query = "What is an AI agent in ServiceNow?"
chunks = retrieve_chunks(query)
response = generate_response(query, chunks)
print(f"Query: {query}")
print(f"Response: {response}")

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Query: What is an AI agent in ServiceNow?
Response: be helpful to you. We'll see if you can reach out to us and you can talk to us on Slack or Telegram or email us at hello@youraccount.io or we can send you anything you want to add in the upcoming development and we'll be on the lookout for more details about our upcoming development.
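
The response above reads like unrelated web text, and the warning shows that `max_length=150` was overridden by `max_new_tokens=256`. One possible adjustment, sketched below under assumptions, is to cap the context by characters so the prompt stays roughly inside distilgpt2's 1024-token window, set `max_new_tokens` explicitly, and ask the pipeline for only the continuation with `return_full_text=False`. The names `build_prompt`, `generate_answer`, `max_context_chars`, and the specific values are hypothetical. distilgpt2 is a small base model without instruction tuning, so this may tidy the output without making the answers accurate.

```python
from typing import Any, Callable, Dict, List

# Hypothetical settings: max_new_tokens bounds only the generated part,
# do_sample=False uses greedy decoding, and return_full_text=False makes the
# text-generation pipeline return the continuation without the prompt.
GENERATION_KWARGS: Dict[str, Any] = {
    "max_new_tokens": 100,
    "do_sample": False,
    "return_full_text": False,
}


def build_prompt(query: str, chunks: List[str], max_context_chars: int = 1500) -> str:
    """Join whole chunks until the character budget is reached, then format the prompt."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop before the chunk that would exceed the budget
        kept.append(chunk)
        used += len(chunk)
    context = "\n".join(kept) if kept else "No relevant information found."
    return (
        "Based on the following context, answer the query concisely:\n"
        f"Context: {context}\nQuery: {query}\nAnswer:"
    )


def generate_answer(llm: Callable[..., List[Dict[str, str]]], prompt: str) -> str:
    """Call a text-generation pipeline (or a compatible callable) and return the answer text."""
    outputs = llm(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"].strip()
```

Here `llm` would be a pipeline created once, for example `pipeline('text-generation', model='distilgpt2', device=-1)`, and reused across queries rather than rebuilt inside every call.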


## Step 4: Test Queries

In [5]:
test_queries = [
    "What is ITSM in ServiceNow?"
]

for query in test_queries:
    chunks = retrieve_chunks(query)
    result = generate_response(query, chunks)
    print(f"Q: {query}\nA: {result}\n")

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Q: What is ITSM in ServiceNow?
A: or the platform ui.
Conclusion
As you can see, all this is the one thing that needs to be done. As long as you are using your own service you can do this and get the job done.
In the event of any problem and any future problems you may have, please take this step and let us know in the comments!
If you would like to hear more, or if you would like to learn more, read the article here.



In [6]:
test_queries = [
    "Explain CMDB relationships."
]

for query in test_queries:
    chunks = retrieve_chunks(query)
    result = generate_response(query, chunks)
    print(f"Q: {query}\nA: {result}\n")

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Q: Explain CMDB relationships.
A: it.
Now, let's have a look at the command prompt commands for the following command:
cmds.yml "C:\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86)\Program Files (x86



In [7]:
test_queries = [
    "How does Incident Management work?"
]

for query in test_queries:
    chunks = retrieve_chunks(query)
    result = generate_response(query, chunks)
    print(f"Q: {query}\nA: {result}\n")

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Q: How does Incident Management work?
A: .
But a more concrete and concrete example that this is not being addressed is the recent release of the Microsoft-funded product for mobile, which is very much a product of Microsoft, that was a big step towards Microsoft's leadership. A new service called Microsoft App Service was released today, and we will be on the topic for a longer time.
Note that Microsoft App Service is not only a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it is a product of Microsoft, it i
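
The last two outputs loop on a single phrase until the token limit cuts them off. A small check like the one below could flag such degenerate generations automatically while testing queries. It is a sketch, not part of the notebook: `repeated_ngram_ratio`, `looks_degenerate`, and the 0.3 threshold are hypothetical choices.

```python
from typing import List, Tuple


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of whitespace-token n-grams that duplicate an earlier n-gram."""
    tokens: List[str] = text.split()
    ngrams: List[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    if not ngrams:
        return 0.0  # text shorter than n tokens cannot repeat an n-gram
    return 1.0 - len(set(ngrams)) / len(ngrams)


def looks_degenerate(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose repeated-trigram ratio exceeds a hypothetical threshold."""
    return repeated_ngram_ratio(text, n=3) > threshold


# Generation-side options that may reduce looping (standard generate() arguments;
# the values are assumptions and were not tested in this notebook):
#   llm(prompt, max_new_tokens=100, no_repeat_ngram_size=3, repetition_penalty=1.2)
```

Running `looks_degenerate(result)` inside the test loop would mark answers like the two above, which makes it easier to compare generation settings across queries.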