## Model: Mistral-7B-Instruct-v0.2-GPTQ
* Embedding - BAAI/bge-small-en-v1.5 Model
* Vector store - FAISS
* LLM Model - TheBloke/Mistral-7B-Instruct-v0.2-GPTQ Model
* HuggingFace Pipeline
* Retrieval and Context Augmentation
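The components listed above form a retrieve-then-generate loop. The toy sketch below walks through that flow in plain Python under deliberately simplified assumptions: a hypothetical bag-of-words `embed` function stands in for the bge-small model, a linear scan stands in for FAISS, and the LLM call is left out, so the result is just the augmented prompt that would be sent to Mistral.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence-embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval: rank every chunk by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Context augmentation: place the retrieved chunks above the question."""
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


toy_chunks = [
    "a catalog item is created in the service catalog",
    "itsm covers incident and change management",
    "the cmdb stores configuration items",
]
toy_query = "what does the cmdb store"
toy_prompt = build_prompt(toy_query, retrieve(toy_query, toy_chunks, k=1))
print(toy_prompt)
```

In the real notebook, `embed` is replaced by the bge embedding model, `retrieve` by the FAISS retriever, and the prompt is passed to the Mistral pipeline.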

## Step1: Import Libraries

In [9]:
# FAISS is the vector store here, so the Pinecone packages (`pinecone-client`,
# `pinecone`, `langchain-pinecone`) and `bitsandbytes` are not needed.
# transformers' GPTQ loading relies on `optimum` plus a GPTQ kernel package (`auto-gptq` here).
!pip install --upgrade --no-cache-dir \
  transformers \
  accelerate \
  sentence-transformers \
  langchain \
  langchain-community \
  datasets \
  optimum \
  auto-gptq \
  faiss-cpu --quiet

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.8/10.8 MB[0m [31m137.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.5/10.5 MB[0m [31m85.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.3/31.3 MB[0m [31m30.3 MB/s[0m eta [36m0:00:00[0m
[?25h

In [10]:
import os
from langchain_community.vectorstores import FAISS
# `langchain.embeddings` and `langchain.schema` are deprecated import paths
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_core.output_parsers import StrOutputParser
import torch
import logging
import pandas as pd

from google.colab import userdata
HF_TOKEN=userdata.get('HF_TOKEN')

In [11]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [12]:
df = pd.read_csv("/content/drive/MyDrive/my_project/processed_cleaned_chunks.csv")
docs = [
    Document(
        page_content=row['text'],
        metadata={"chunk_id": row['chunk_id'], "video_id": row['video_id']}
    )
    for _, row in df.iterrows()
]
print(f"Loaded {len(docs)} documents for indexing/retrieval.")

Loaded 328 documents for indexing/retrieval.
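Each CSV row becomes one document: `text` is what gets embedded, while `chunk_id` and `video_id` travel along as metadata so an answer can be traced back to its source video. The stdlib-only sketch below mirrors that mapping with `csv.DictReader`; the inline sample rows and the plain `dict` standing in for LangChain's `Document` are assumptions for illustration. It also skips rows with empty text, a guard the original cell does not have (pandas reads an empty cell as `NaN`, a float that `Document` is likely to reject as `page_content`).

```python
import csv
import io

# Hypothetical sample with the same column layout as processed_cleaned_chunks.csv
SAMPLE_CSV = """chunk_id,video_id,text
0,vid_a,How to create a catalog item
1,vid_a,
2,vid_b,ITSM covers incident management
"""


def load_docs(fh) -> list[dict]:
    """Map each CSV row to a Document-like dict, skipping rows without text."""
    out = []
    for row in csv.DictReader(fh):
        text = (row.get("text") or "").strip()
        if not text:
            continue  # nothing to embed for this row
        out.append({
            "page_content": text,
            "metadata": {"chunk_id": row["chunk_id"], "video_id": row["video_id"]},
        })
    return out


sample_docs = load_docs(io.StringIO(SAMPLE_CSV))
print(f"Loaded {len(sample_docs)} documents for indexing/retrieval.")
```

With pandas, the equivalent guard would be dropping rows where `text` is missing before building the `Document` list.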


## Step2: Embedding & Vectorizer

In [13]:
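# Index with the same model that Step 3 uses to embed queries (BAAI/bge-small-en-v1.5).
# Both models produce 384-dim vectors, so mixing them does not raise an error,
# but query and document vectors from different models are not comparable.
embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")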

faiss_index = FAISS.from_documents(docs, embedding_model)
faiss_index.save_local("/content/drive/MyDrive/faiss_store")
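`FAISS.from_documents` embeds every chunk and stores the vectors in an index. As far as I know, LangChain's FAISS wrapper defaults to an exact (flat) index with Euclidean (L2) distance, so each search compares the query against every stored vector. The toy class below reproduces that exhaustive search in plain Python to show what "k nearest neighbours" means here; `FlatL2Index` is a hypothetical name, and real FAISS performs the same comparison much faster (it reports squared L2 distances, which rank identically).

```python
import heapq
import math


class FlatL2Index:
    """Hypothetical exact-search index: stores every vector and scans all of them."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.ids: list[str] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.vectors.append(vector)
        self.ids.append(doc_id)

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        """Return the k (id, L2 distance) pairs closest to the query."""
        scored = ((math.dist(query, v), doc_id) for doc_id, v in zip(self.ids, self.vectors))
        best = heapq.nsmallest(k, scored)  # smallest distance = most similar
        return [(doc_id, dist) for dist, doc_id in best]


toy_index = FlatL2Index()
toy_index.add("chunk-0", [0.0, 1.0])
toy_index.add("chunk-1", [1.0, 0.0])
toy_index.add("chunk-2", [0.9, 0.1])
print(toy_index.search([1.0, 0.0], k=2))
```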


In [14]:
from huggingface_hub import login

login(token=HF_TOKEN)

## Step3: Model

In [15]:
model_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)


`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
Some weights of the model checkpoint at TheBloke/Mistral-7B-Instruct-v0.2-GPTQ were not used when initializing MistralForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_pr
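A GPTQ checkpoint stores most weights as low-bit integers plus per-group scale and zero-point values, which cuts memory compared with 16-bit weights. The sketch below shows only that storage idea, using plain round-to-nearest 4-bit quantization of one small, made-up group of weights. It is explicitly not the GPTQ algorithm, which chooses the integers with an error-compensating procedure; the dequantization formula `w ≈ scale * (q - zero)` is the part the two share.

```python
def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, int]:
    """Asymmetric round-to-nearest quantization of one weight group (not GPTQ itself)."""
    qmax = 2 ** bits - 1                      # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0           # avoid division by zero for constant groups
    zero = round(-lo / scale)                 # integer code that represents 0.0
    q = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_group(q: list[int], scale: float, zero: int) -> list[float]:
    """Recover approximate weights: w ≈ scale * (q - zero)."""
    return [scale * (v - zero) for v in q]


group = [-0.30, -0.12, 0.0, 0.07, 0.25, 0.45]
q, scale, zero = quantize_group(group)
approx = dequantize_group(q, scale, zero)
print(q)
print([round(a, 3) for a in approx])
```

Each 4-bit code needs a quarter of the storage of a 16-bit float, at the cost of a rounding error of up to half a quantization step per weight.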

In [16]:
embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")

faiss_index = FAISS.load_local("/content/drive/MyDrive/faiss_store", embedding_model, allow_dangerous_deserialization=True)
retriever = faiss_index.as_retriever(search_type="similarity", search_kwargs={"k": 5})
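The retriever ranks chunks by FAISS distance and returns the top `k=5`. Whether that ranking matches cosine similarity depends on vector length: for unit-length vectors, squared L2 distance equals `2 - 2·cos`, so the two orderings agree. bge models are usually used with normalized embeddings, which `HuggingFaceEmbeddings` can be asked for via `encode_kwargs={"normalize_embeddings": True}`; the notebook does not do this, and changing it would mean rebuilding the index with the same setting. Treat this as a suggestion to verify rather than a required fix. The snippet below checks the identity numerically on made-up vectors.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity only when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


query_vec = normalize([0.2, 0.9, 0.1])
chunk_vecs = [normalize(v) for v in ([0.1, 1.0, 0.0], [1.0, 0.1, 0.3], [0.3, 0.5, 0.8])]

for v in chunk_vecs:
    # For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
    assert math.isclose(squared_l2(query_vec, v), 2 - 2 * dot(query_vec, v))

by_l2 = sorted(range(3), key=lambda i: squared_l2(query_vec, chunk_vecs[i]))
by_cos = sorted(range(3), key=lambda i: -dot(query_vec, chunk_vecs[i]))
print(by_l2, by_cos)  # identical orderings
```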


## Step4: HuggingFace Pipeline

In [17]:
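from langchain_community.llms import HuggingFacePipeline  # `langchain.llms` is deprecated
from transformers import pipeline

llm_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,       # upper bound on answer length; longer answers are cut off
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    return_full_text=False    # return only the generated answer, not the echoed prompt
)

llm = HuggingFacePipeline(pipeline=llm_pipeline)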

Device set to use cuda:0
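With `do_sample=True`, each new token is drawn at random instead of picked greedily: `temperature=0.7` sharpens the next-token distribution, and `top_p=0.9` restricts sampling to the smallest set of most likely tokens whose probabilities sum to at least 0.9 (nucleus sampling). The sketch below applies those two steps to a hypothetical four-token vocabulary. It illustrates the idea and is not the exact code path inside `transformers`, which also applies `repetition_penalty` and works on full vocabulary tensors.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide logits by the temperature, then normalize into probabilities."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    m = max(scaled.values())                      # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize over the nucleus


toy_logits = {"incident": 3.0, "problem": 2.0, "change": 1.5, "banana": -2.0}
probs = softmax_with_temperature(toy_logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.9)
rng = random.Random(0)
next_token = rng.choices(list(nucleus), weights=list(nucleus.values()))[0]
print(nucleus, next_token)
```

Lowering the temperature or `top_p` makes answers more deterministic; raising them adds variety at the risk of drifting from the retrieved context.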


## Step5: Retrieval & Context Augmentation

In [18]:
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
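`RetrievalQA.from_chain_type` defaults to the "stuff" chain type: the `k` retrieved chunks are concatenated into one context string and inserted into a single prompt, followed by the question. As far as I know, LangChain's default prompt for this chain ends with `Helpful Answer:`, which is why the later cells split the model output on `"Answer:"`. The sketch below imitates the stuffing step; `STUFF_TEMPLATE` is a paraphrased, hypothetical template rather than LangChain's exact text, and the chunks are placeholders.

```python
# Hypothetical template, paraphrasing the shape of LangChain's default "stuff" QA prompt
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate every retrieved chunk into one context block, then fill the template."""
    return STUFF_TEMPLATE.format(context=separator.join(chunks), question=question)


retrieved = ["<retrieved chunk 1>", "<retrieved chunk 2>"]
prompt_text = stuff_prompt(retrieved, "How do I create a catalog item in ServiceNow?")
print(prompt_text)
```

Because all five chunks go into one prompt, `k` and chunk size together must stay within the model's context window.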

In [19]:

import logging
logging.getLogger("langchain_community.vectorstores").setLevel(logging.ERROR)
logging.getLogger("transformers").setLevel(logging.ERROR)
query = "How do I create a catalog item in ServiceNow?"
response = qa_chain.invoke({"query": query})  # calling the chain directly is deprecated
raw_output = response["result"]
if "Answer:" in raw_output:
    cleaned = raw_output.split("Answer:")[-1].strip()
else:
    cleaned = raw_output.strip()
print(cleaned)



Creating a catalog item in ServiceNow involves several steps, including designing, building, launching, optimizing, and improving the service. As a manager, you would typically collaborate with other teams to define the requirements, create the item record, configure the pricing and availability, and set up the delivery process. You might also involve your ServiceNow representative or consultants for assistance if needed. Remember to follow the best practices and guidelines provided by ServiceNow to ensure successful implementation. For detailed instructions, refer to the ServiceNow Knowledge Base or contact your ServiceNow account team.
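The answer-cleaning logic is repeated verbatim in every query cell. One option is to wrap it in a small helper; `clean_answer` is a hypothetical name, and it keeps the original behavior of returning the text after the last `"Answer:"` marker, which matters when the pipeline echoes the prompt and is harmless otherwise.

```python
def clean_answer(raw_output: str, marker: str = "Answer:") -> str:
    """Return the text after the last `marker`, or the whole output if it is absent."""
    if marker in raw_output:
        return raw_output.split(marker)[-1].strip()
    return raw_output.strip()


# Example with an echoed prompt, as produced when the pipeline returns the full text
echoed = "Question: What is CMDB?\nHelpful Answer: The CMDB stores configuration items."
print(clean_answer(echoed))
```

In the notebook, `print(clean_answer(qa_chain.invoke({"query": query})["result"]))` would replace each `if`/`else` block. One caveat of this design: if a generated answer itself contains "Answer:", everything before its last occurrence is dropped.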


## Step6: Test Queries

In [20]:
logging.getLogger("langchain_community.vectorstores").setLevel(logging.ERROR)
logging.getLogger("transformers").setLevel(logging.ERROR)
query = "What is ITSM in ServiceNow"
response = qa_chain.invoke({"query": query})  # calling the chain directly is deprecated
raw_output = response["result"]

if "Answer:" in raw_output:
    cleaned = raw_output.split("Answer:")[-1].strip()
else:
    cleaned = raw_output.strip()
print(query)
print(cleaned)

What is ITSM in ServiceNow
In ServiceNow, ITSM stands for IT Service Management. It refers to the practice of using IT services to support and enable the operations of an organization, often delivered through the ServiceNow platform. This includes incident management, problem management, change management, and other related processes. The goal is to improve the quality and efficiency of IT services, reduce downtime, and enhance the user experience.


In [21]:
logging.getLogger("langchain_community.vectorstores").setLevel(logging.ERROR)
logging.getLogger("transformers").setLevel(logging.ERROR)
query = "What are different AI concepts used in ServiceNow?"
response = qa_chain.invoke({"query": query})  # calling the chain directly is deprecated
raw_output = response["result"]
if "Answer:" in raw_output:
    cleaned = raw_output.split("Answer:")[-1].strip()
else:
    cleaned = raw_output.strip()
print(query)
print(cleaned)

What are different AI concepts used in ServiceNow?
ServiceNow uses several AI concepts to enhance its capabilities. These include:
1. Agentic AI: This refers to the ability of AI agents to observe, receive issues, and take action on behalf of end users or employees. It helps businesses meet their KPIs and objectives, such as issue deflection or time to resolution, and improves overall productivity at scale.
2. Observability: In the context of AI, observability refers to the ability to collect, analyze, and act upon data from various sources in real-time. This is crucial for understanding system behavior and performance, which is essential for effective AI implementation and management.
3. Large Language Models: These models are integrated into ServiceNow to build new use cases and improve existing ones. They enable natural language processing and generation, which can be used for various applications such as chatbots, virtual agents, and automated responses.
4. Security and Privacy: Gu

In [None]:
logging.getLogger("langchain_community.vectorstores").setLevel(logging.ERROR)
logging.getLogger("transformers").setLevel(logging.ERROR)
query = "What is CMDB in ServiceNow?"
response = qa_chain.invoke({"query": query})  # calling the chain directly is deprecated
raw_output = response["result"]
if "Answer:" in raw_output:
    cleaned = raw_output.split("Answer:")[-1].strip()
else:
    cleaned = raw_output.strip()
print(query)
print(cleaned)
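The test cells differ only in the query string, so they can also be driven by a loop. The sketch below assumes a chain object exposing `.invoke({"query": ...})` that returns a dict with a `"result"` key and, because `return_source_documents=True`, a `"source_documents"` list whose items carry `.metadata`; that is how `RetrievalQA` behaves. To keep it runnable without a GPU, a hypothetical `EchoChain` stub stands in for `qa_chain`.

```python
from types import SimpleNamespace


class EchoChain:
    """Hypothetical stand-in for qa_chain: same input/output shape, canned answers."""

    def invoke(self, inputs: dict) -> dict:
        question = inputs["query"]
        return {
            "result": f"Helpful Answer: stub answer to '{question}'",
            "source_documents": [SimpleNamespace(metadata={"chunk_id": 7, "video_id": "vid_x"})],
        }


def run_queries(chain, queries: list[str], marker: str = "Answer:") -> dict[str, str]:
    """Ask each query, keep the text after the last marker, and list the source videos."""
    answers = {}
    for q in queries:
        response = chain.invoke({"query": q})
        raw = response["result"]
        answer = raw.split(marker)[-1].strip() if marker in raw else raw.strip()
        sources = sorted({d.metadata["video_id"] for d in response.get("source_documents", [])})
        print(q)
        print(answer)
        print("sources:", sources)
        answers[q] = answer
    return answers


results = run_queries(EchoChain(), ["What is CMDB in ServiceNow?", "What is ITSM in ServiceNow"])
```

Against the real model, the same function would be called as `run_queries(qa_chain, [...])`; printing the source `video_id`s makes it easier to check whether an answer is grounded in the retrieved transcripts.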