<a href="https://colab.research.google.com/github/Atharvaud29/MinuteMind/blob/main/Minute_Mind_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## WhisperX and Phi-3 Mini models

In [9]:
!pip install git+https://github.com/m-bain/whisperx.git
!pip install transformers accelerate torchaudio
!pip install pyannote-audio
!pip install dateparser

Collecting git+https://github.com/m-bain/whisperx.git
  Cloning https://github.com/m-bain/whisperx.git to /tmp/pip-req-build-nem6jni8
  Running command git clone --filter=blob:none --quiet https://github.com/m-bain/whisperx.git /tmp/pip-req-build-nem6jni8
  Resolved https://github.com/m-bain/whisperx.git to commit 2d9ce44329ae73af2520196d31cd14b6192ace44
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [10]:
!pip install --upgrade datasets fsspec

Collecting fsspec
  Using cached fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB)


In [11]:
!pip install datasets
from datasets import load_dataset

dataset = load_dataset("knkarthick/dialogsum", split="train")
print(dataset[0])


{'id': 'train_0', 'dialogue': "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.", 'summary': "Mr. Smith's 

In [12]:
from datasets import load_dataset

# Load DialogSum dataset
dataset = load_dataset("knkarthick/dialogsum", split="train")

# Print one sample entry
print("Sample Entry:\n")
print("Dialogue:\n", dataset[0]['dialogue'])
print("\nSummary:\n", dataset[0]['summary'])
print("\nTopic:\n", dataset[0]['topic'])


Sample Entry:

Dialogue:
 #Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?
#Person2#: I found it would be a good idea to get a check-up.
#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.
#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?
#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.
#Person2#: Ok.
#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?
#Person2#: Yes.
#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.
#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.
#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.
#Person2#: Ok, thanks doctor.

Summary:
 Mr. Smith's getting a check-up,
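
The dialogue field printed above marks each turn with a `#PersonN#:` tag. As a minimal sketch, assuming every turn starts on its own line with a tag of that shape, the text can be split into (speaker, utterance) pairs before it goes into a prompt. The names `TURN_PATTERN` and `split_turns` are hypothetical helpers introduced here, not part of the notebook or the `datasets` library.

```python
import re

# Matches a speaker tag such as "#Person1#:" at the start of a line.
TURN_PATTERN = re.compile(r"^#(?P<speaker>[^#]+)#:\s*(?P<text>.*)$")

def split_turns(dialogue: str) -> list[tuple[str, str]]:
    """Split a DialogSum-style dialogue into (speaker, utterance) pairs."""
    turns = []
    for line in dialogue.splitlines():
        stripped = line.strip()
        match = TURN_PATTERN.match(stripped)
        if match:
            turns.append((match.group("speaker"), match.group("text")))
        elif turns and stripped:
            # Untagged line: treat it as a continuation of the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {stripped}")
    return turns

example = (
    "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n"
    "#Person2#: I found it would be a good idea to get a check-up."
)
for speaker, text in split_turns(example):
    print(f"{speaker}: {text}")
```

The same pattern also matches other tag names of this form, such as `#Client#:`, since it only requires text between two `#` characters followed by a colon.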

In [13]:
sample_dialogue = dataset[0]['dialogue']

prompt = f"""You are an AI assistant. Extract actionable tasks or recommendations from the following conversation.
Include who is responsible for the task if mentioned.

Conversation:
{sample_dialogue}

Output format:
[
  {{ "task": "...", "owner": "..." }},
  ...
]
"""


In [16]:
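from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Phi-3 Mini (4k context, instruction-tuned); device_map="auto" relies on `accelerate`
model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in half precision to reduce GPU memory use
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)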


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

KeyboardInterrupt: 

In [15]:
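# Send the inputs to the device the model was placed on instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)

print("Extracted Tasks:\n", response)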


KeyboardInterrupt: 

In [None]:
instruction_2_prompt = """
Analyze the following dialogue between a customer service representative and a client.
Identify and extract actionable tasks or recommendations.

For each task, include:
- Task description
- Product or service involved
- Urgency level (High / Medium / Low)
- Impact on client's satisfaction (High / Medium / Low)
- Responsible party
- Follow-up required (Yes/No)
- Timeframe (e.g., Within 2 hours, By EOD)

Output format:
[
  {
    "task": "...",
    "product_or_service": "...",
    "urgency": "...",
    "impact": "...",
    "owner": "...",
    "follow_up": "...",
    "timeframe": "..."
  }
]

Dialogue:
#CustomerServiceRep#: Good afternoon, this is Jessica from TechGuru Solutions. How may I assist you today?
#Client#: Hi Jessica, I'm having trouble with my laptop. It's running very slow, and I have an important presentation tomorrow.
#CustomerServiceRep#: I'm sorry to hear that. We’ll help you fix this ASAP. What model is it?
#Client#: It's the TechGuru ProBook 14.
#CustomerServiceRep#: Thank you. Can you tell me when you first noticed the issue?
#Client#: Just this morning. It was working fine yesterday.
#CustomerServiceRep#: Alright. We’ll run a remote diagnostic now and escalate if needed. Is it okay to connect remotely in the next hour?
#Client#: Yes, please. I really need this fixed today.
#CustomerServiceRep#: Got it. I’ll assign this to a senior technician immediately.
"""



In [None]:
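# Send the inputs to the device the model was placed on instead of hard-coding "cuda"
inputs = tokenizer(instruction_2_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=400)
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)

print(response)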
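
Both prompts ask the model for a JSON array, but the decoded text may also contain extra prose, and if the prompt is echoed it will include the format template too. A possible way to recover structured data is to scan for the last parseable JSON array in the text. This is a sketch under that assumption; `extract_last_json_array` is a hypothetical helper, and the sample string below is hand-written for illustration, not real model output.

```python
import json

def extract_last_json_array(text: str) -> list:
    """Return the last parseable JSON array found in `text`, or [] if none."""
    decoder = json.JSONDecoder()
    found = []
    pos = text.find("[")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and
            # returns it with the index where that value ended.
            value, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON from this bracket; try the next one.
            pos = text.find("[", pos + 1)
            continue
        if isinstance(value, list):
            found = value
        pos = text.find("[", end)
    return found

# Hand-written sample: an invalid template followed by a valid array.
sample = (
    'Output format:\n[\n  { "task": "...", "owner": "..." },\n  ...\n]\n'
    'Answer:\n[{"task": "Quit smoking", "owner": "Mr. Smith"}]'
)
print(extract_last_json_array(sample))
```

Taking the last array rather than the first is a heuristic: a template whose placeholders are quoted strings is itself valid JSON, so the first match could be the template instead of the answer.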


## Support Both Audio & Text Inputs

- This hybrid approach lets users start from either kind of input:

  - 🎙️ Upload raw audio → system handles transcription + extraction

  - 📝 Paste or upload text transcript → skip directly to AI analysis
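
One way to route between the two paths above is to infer the input type from the file extension. This is only a sketch: the extension sets and the `detect_input_type` name are assumptions made for illustration, and a real system might prefer an explicit user choice or content sniffing.

```python
from pathlib import Path

# Assumed extension lists; adjust to whatever formats the system accepts.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}
TEXT_EXTENSIONS = {".txt", ".md"}

def detect_input_type(file_path: str) -> str:
    """Classify a file as "audio" or "text" based on its extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in TEXT_EXTENSIONS:
        return "text"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")

print(detect_input_type("meeting.MP3"))
print(detect_input_type("notes.txt"))
```

Raising on unknown extensions keeps an unsupported upload from silently falling through to the wrong branch.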

In [None]:

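# ## Do not run !!

# import whisperx
#
# def process_input(file_path, input_type="audio"):
#     """Return the transcript text for an audio file or a text transcript."""
#     if input_type == "audio":
#         # Transcribe audio with WhisperX
#         model = whisperx.load_model("small", device="cuda")
#         audio = whisperx.load_audio(file_path)
#         segments = model.transcribe(audio)["segments"]
#         # Join segment texts so both branches return a plain string
#         transcript = " ".join(seg["text"].strip() for seg in segments)
#     elif input_type == "text":
#         # Read the transcript directly
#         with open(file_path, "r", encoding="utf-8") as f:
#             transcript = f.read()
#     else:
#         raise ValueError(f"Unknown input_type: {input_type!r}")
#     return transcript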


In [None]:
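# ## Download outputs to use locally

# import json
# from google.colab import files
#
# # `extracted_data` is assumed to hold the parsed task list from an earlier cell
# with open("output.json", "w") as f:
#     json.dump(extracted_data, f, indent=2)
# files.download("output.json")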