## Whisper Model without Agent - Audio Transcriptions
* Model - Whisper
* Downloading Audio File from Video File
* Transcribing Audio File to Text
* Creating and Cleaning Chunks
* Retriever and Context Augmentation

## Step1: Import Libraries & Datasets

In [3]:
%pip install -U openai-whisper --quiet
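#!apt-get install -y ffmpeg  # uncomment if ffmpeg is missing; Whisper and the yt-dlp MP3 conversion both need it
%pip install faiss-cpu sentence-transformers langchain yt-dlp python-dotenv --quiet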
%pip install -U langchain-community --quiet
%pip install -U langchain-openai --quiet

Note: you may need to restart the kernel to use updated packages.


In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import OpenAI
from langchain.chains import RetrievalQA
from pathlib import Path
from tqdm.notebook import tqdm
import whisper
import os
import yt_dlp

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
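# Folders for downloaded audio and generated transcripts (relative to the notebook).
# The transcript folder must match the path read in the chunking step.
audio_folder = "../audio/audio_files/"
output_folder = "../audio/ServiceNow_Audio_Transcripts/"
os.makedirs(audio_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

# Load the Whisper "base" checkpoint once and reuse it for every file
model = whisper.load_model("base")


def download_audio(url, video_id):
    """Download the best audio stream for `url` and convert it to MP3 (requires ffmpeg)."""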
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': f'{audio_folder}{video_id}.%(ext)s',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'quiet': True
    }
    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])
        return f'{audio_folder}{video_id}.mp3'
    except Exception as e:
        print(f"Error downloading {video_id}: {e}")
        return None
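
Several file names in the transcription output keep URL parameters such as `&list=...` or `&t=27s`, so the same video can be downloaded twice under different names. The notebook does not show how `video_id` is derived, so the following is only a minimal sketch, assuming standard YouTube watch or `youtu.be` URLs. The helper name `extract_video_id` is hypothetical and not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the bare YouTube video ID from a watch or short URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Short links look like https://youtu.be/<id>?t=27
    if parsed.netloc.endswith("youtu.be"):
        return parsed.path.lstrip("/")
    # Watch links carry the ID in the "v" query parameter
    query = parse_qs(parsed.query)
    if "v" in query:
        return query["v"][0]
    raise ValueError(f"No video ID found in URL: {url}")


print(extract_video_id("https://www.youtube.com/watch?v=it1hcs5S1ks&t=27s"))
# it1hcs5S1ks
```

Passing the result as `video_id` to `download_audio` would give one MP3 per video, regardless of playlist or timestamp parameters in the link.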

## Step2: Audio File Transcription

In [None]:
for filename in os.listdir(audio_folder):
    if filename.endswith((".mp3", ".wav", ".m4a")):
        base_name = os.path.splitext(filename)[0]
        transcript_path = os.path.join(output_folder, base_name + ".txt")

        if os.path.exists(transcript_path) and os.path.getsize(transcript_path) > 0:
            print(f"Transcript already exists for: {filename} — Skipping.")
            continue

        print(f"Transcribing: {filename}")
        audio_path = os.path.join(audio_folder, filename)
        result = model.transcribe(audio_path)

        with open(transcript_path, "w", encoding='utf-8') as f:
            f.write(result['text'])

Transcribing: 7WJ6lmxa1WQ.mp3
Transcribing: a0fllfx_fmg&list=PLkGSnjw5y2U407_1UQQaVVrD13-MFi5ia&index=22.mp3
Transcribing: E6m8UuVhIzw.mp3
Transcribing: eFMeZto6yMg.mp3
Transcribing: fqB-NcZmqXo.mp3
Transcribing: it1hcs5S1ks&t=27s.mp3
Transcribing: it1hcs5S1ks.mp3
Transcribing: j_PVU9hJTh8.mp3
Transcribing: K6z4c256gzI.mp3
Transcribing: kQV6g8Vbbfc.mp3
Transcribing: KSWNDuKn9t0.mp3
Transcribing: mSYdZW_D67o.mp3
Transcribing: Rx65d0ofz8I.mp3
Transcribing: ThW6lPyYgYk.mp3
Transcribing: tOaMRG8DX3U.mp3
Transcribing: vteLoWpNw8Q.mp3
Transcribing: WyQTP0AA1VU.mp3
Transcribing: x4ZvT7ZmxaI.mp3
Transcribing: XUKTyE2YtHc&list=PLh-u-epknspBswAAKG0EfPHyV6gcVVOhK.mp3
Transcribing: ZYJqkxGrNiI.mp3
Transcribing: _2IG1CX2y6g.mp3
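
The loop above saves only `result['text']`. Whisper's result dictionary also carries a `segments` list whose entries include `start`, `end`, and `text`, which can be handy if answers should later point back to a position in the video. The sketch below formats such segments as timestamped lines. The helper `format_segments` and the sample data are made up for illustration, and the notebook itself does not store timestamps.

```python
def format_segments(segments: list[dict]) -> str:
    """Render Whisper-style segments as '[MM:SS - MM:SS] text' lines."""

    def mmss(seconds: float) -> str:
        minutes, secs = divmod(int(seconds), 60)
        return f"{minutes:02d}:{secs:02d}"

    lines = []
    for seg in segments:
        lines.append(f"[{mmss(seg['start'])} - {mmss(seg['end'])}] {seg['text'].strip()}")
    return "\n".join(lines)


# Made-up segments in the shape of result["segments"]
sample = [
    {"start": 0.0, "end": 4.2, "text": " First sentence."},
    {"start": 4.2, "end": 65.0, "text": " Second sentence."},
]
print(format_segments(sample))
# [00:00 - 00:04] First sentence.
# [00:04 - 01:05] Second sentence.
```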


## Step3: Data Cleaning & Chunking

In [None]:
transcript_folder = Path("../audio/ServiceNow_Audio_Transcripts")
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_chunks: list[str] = []

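# Phrases that typically mark greetings or introductions rather than content.
# Whisper emits straight apostrophes, so "let's" is written with one here.
intro_phrases = {
    "welcome", "in this video", "today we're", "this tutorial",
    "thank you for joining", "let's get started", "hello", "hey", "hi",
    "good morning", "good afternoon", "good evening"
}

def is_intro(text: str) -> bool:
    # Caution: this is a plain substring test, so short phrases such as "hi"
    # and "hey" also match inside words like "this", "which", or "they".
    lower = text.lower()
    return any(phrase in lower for phrase in intro_phrases)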

for txt_path in tqdm(transcript_folder.glob("*.txt"), desc="Chunking transcripts"):
    raw_text = txt_path.read_text(encoding="utf-8")
    chunks = splitter.split_text(raw_text)
    filtered = [chunk for chunk in chunks if not is_intro(chunk)]
    all_chunks.extend(filtered)

print(f"Total chunks (excluding intros): {len(all_chunks)}")

Chunking transcripts: 0it [00:00, ?it/s]

Total chunks (excluding intros): 70
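
Because `is_intro` uses substring matching, a chunk containing "this" or "which" is treated as an intro, since both contain "hi". That may explain why relatively few chunks survive the filter. A stricter variant, sketched here with a shortened phrase list and the hypothetical name `is_intro_strict`, matches phrases only at word boundaries. It is an alternative to try, not the filter that produced the count above.

```python
import re

INTRO_PHRASES = ["welcome", "in this video", "hello", "hey", "hi"]

# \b anchors each phrase at word boundaries, so "hi" no longer matches "this"
INTRO_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in INTRO_PHRASES) + r")\b",
    re.IGNORECASE,
)


def is_intro_strict(text: str) -> bool:
    """Return True only if an intro phrase appears as a whole word or phrase."""
    return INTRO_PATTERN.search(text) is not None


print(is_intro_strict("This step configures the assignment rules."))  # False
print(is_intro_strict("Hi everyone, welcome back."))  # True
```

Swapping this in would change the number of retained chunks, so the FAISS index would need to be rebuilt afterwards.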


## Step4: Embedding & Vectorizer

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
faiss_store = FAISS.from_texts(all_chunks, embedding=embedding_model)
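
Conceptually, the vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings lie closest to the query embedding. The toy sketch below illustrates that idea in pure Python with Euclidean distance and hand-written 2-D vectors. It is not how FAISS is implemented, and the function name `top_k` and the data are assumptions made for illustration.

```python
import math


def top_k(query: list[float], vectors: list[list[float]], texts: list[str], k: int = 4) -> list[str]:
    """Return the k texts whose vectors are closest to the query (Euclidean distance)."""
    scored = sorted(
        zip(vectors, texts),
        key=lambda pair: math.dist(query, pair[0]),
    )
    return [text for _, text in scored[:k]]


# Toy 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions
texts = ["incident form", "SLA timers", "change request"]
vectors = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
print(top_k([1.0, 0.0], vectors, texts, k=2))
# ['incident form', 'SLA timers']
```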



## Step5: Context Retrieval

In [6]:
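from dotenv import load_dotenv

# Read OPENAI_API_KEY from a local .env file into the environment
load_dotenv()
# This completion-style class defaults to max_tokens=256, so long answers can be cut off
llm = OpenAI(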
    temperature=0,
    openai_api_key=os.getenv("OPENAI_API_KEY")
)
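
`os.getenv` returns `None` when the variable is missing, and the resulting error tends to surface only later, when the client is built or the first request is sent. A small guard, sketched here under the hypothetical name `require_env`, fails early with a clearer message. The `DEMO_KEY` variable exists only for this demonstration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file.")
    return value


# Demonstration with a throwaway variable instead of a real API key
os.environ["DEMO_KEY"] = "abc"
print(require_env("DEMO_KEY"))
# abc
```

In the cell above, `openai_api_key=require_env("OPENAI_API_KEY")` would be one way to use it.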

In [None]:
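# Wrap the FAISS index as a retriever and build a retrieval-augmented QA chain
retriever = faiss_store.as_retriever()
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

question = "What are the steps to configure incident management in ServiceNow?"
# invoke() replaces the deprecated run(); the answer is stored under "result"
answer = qa_chain.invoke({"query": question})["result"]
print("Answer:", answer)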



Answer: 
1. Set up the incident management module in ServiceNow: This involves creating a new incident table, configuring fields and forms, and setting up access controls.

2. Define incident categories and subcategories: These categories will help classify and prioritize incidents based on their impact and urgency.

3. Configure incident assignment rules: This will determine how incidents are assigned to specific teams or individuals based on the category, subcategory, and other criteria.

4. Set up incident notification rules: This will define who receives notifications for different types of incidents and how they are notified (e.g. email, SMS, etc.).

5. Create incident templates: These templates can be used to quickly create new incidents with pre-defined information and tasks.

6. Configure SLAs (Service Level Agreements): This will define the response and resolution times for different types of incidents.

7. Set up incident workflows: Workflows can be used to automate certain t
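
With its default chain type ("stuff"), `RetrievalQA` places all retrieved chunks into a single prompt together with the question. The sketch below shows the general shape of that step. The prompt wording and the function name `build_stuff_prompt` are illustrative assumptions, not LangChain's actual template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What are the steps to configure incident management in ServiceNow?",
    ["chunk one text", "chunk two text"],  # placeholders for retrieved transcript chunks
)
print(prompt)
```

Since everything goes into one prompt, the number and size of retrieved chunks directly affect how much room is left for the answer.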

## Step6: Prompt Testing

In [8]:
question = "How does ServiceNow use AI to improve incident management?"
answer = qa_chain.invoke({"query": question})["result"]
print("Question:", question)
print("Answer:", answer)

Question: How does ServiceNow use AI to improve incident management?
Answer:  ServiceNow uses AI, specifically agentic AI, to improve incident management by automating routine tasks and freeing up time for human agents to focus on more complex and critical issues. This includes automating the incident creation process and ensuring that incidents are categorized correctly. This technology is available in the Service Operations Workspace.


In [10]:
question = "What is predictive intelligence in ServiceNow?"
answer = qa_chain.invoke({"query": question})["result"]
print("Question:", question)
print("Answer:", answer)

Question: What is predictive intelligence in ServiceNow?
Answer:  Predictive intelligence in ServiceNow is a feature that suggests values or options to a service desk agent while they are creating an incident. It can be configured in the task intelligence admin console and is designed to save time and reduce errors in the incident creation process.


In [11]:
question = "Explain how machine learning models are trained in ServiceNow Predictive Intelligence."
answer = qa_chain.invoke({"query": question})["result"]
print("Question:", question)
print("Answer:", answer)

Question: Explain how machine learning models are trained in ServiceNow Predictive Intelligence.
Answer:  Machine learning models in ServiceNow Predictive Intelligence are trained using historical data from the ServiceNow platform. This data is used to identify patterns and make predictions about future incidents or requests. The models are continuously trained and improved over time as more data is collected and analyzed. Additionally, users can also provide feedback on the accuracy of the predictions, which helps to further refine the models.


In [12]:
question = "What AI features are available in the ServiceNow platform?"
answer = qa_chain.invoke({"query": question})["result"]
print("Question:", question)
print("Answer:", answer)

Question: What AI features are available in the ServiceNow platform?
Answer:  The ServiceNow platform offers AI features such as data richness, user data, CMDB, knowledge, external enterprise data, Integration Hub, workflow data fabric, and zero copy data. Additionally, the company has recently acquired another company called QIN to accelerate their agentic AI roadmap.
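
The test cells above repeat the same three lines for each question. One way to reduce the repetition is a small loop over a list of questions, sketched below. `run_questions` and `fake_ask` are hypothetical names, and `fake_ask` stands in for the real chain so that the sketch runs without an API key.

```python
from typing import Callable


def run_questions(ask: Callable[[str], str], questions: list[str]) -> dict[str, str]:
    """Ask each question once, print it with its answer, and collect the results."""
    answers = {}
    for question in questions:
        # strip() drops the leading whitespace that completion models often return
        answer = ask(question).strip()
        print("Question:", question)
        print("Answer:", answer)
        print()
        answers[question] = answer
    return answers


# Stand-in for the real chain so the sketch runs without an API key
def fake_ask(question: str) -> str:
    return f" (answer to: {question})"


results = run_questions(fake_ask, ["What is predictive intelligence in ServiceNow?"])
```

To use it with the chain from Step5, one option would be to pass `lambda q: qa_chain.invoke({"query": q})["result"]` as `ask`.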
