In [20]:
import whisper

model = whisper.load_model("small")
result = model.transcribe(
    "/Users/rachitdas/Desktop/solvus_assignment/Play.ht - Good Morning, doctor. May I come in?.wav"
)
print(f' The text in video: \n {result["text"]}')

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


 The text in video: 
  Good morning, doctor. May I come in? Good morning. How are you? You do look quite pale this morning. Yes, doctor. I've not been feeling well for the past few days. I've been having a stomach ache for a few days and feeling a bit dizzy since yesterday. Okay, let me check. Apply pressure on the stomach and checks for pain. Does it hurt here? Yes, doctor. The pain there is the sharpest. Well, you are suffering from a stomach infection. That's the reason you are having a stomach ache and also getting dizzy. Did you change your diet recently or have something unhealthy? Actually, I went to a fair last week and ate food from the stalls there. Okay, so you are probably suffering from food poisoning. Since the food stalls and fares are quite unhygienic, there's a high chance those uncovered food might have caused food poisoning. I think I will never eat from any unhygienic place in the future. That's good. I'm prescribing some medicines, have them for one week and come b
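
The `huggingface/tokenizers` warning in the output above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs at the top of the notebook before any library that uses `tokenizers` is imported:

```python
import os

# Set this before importing libraries that rely on `tokenizers`.
# "false" disables tokenizer parallelism, which is the fallback the warning describes.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Choosing `"false"` only makes the existing fallback explicit; `"true"` is the other value the warning accepts.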

In [36]:
from pydub import AudioSegment
from pyannote.audio import Pipeline
from dotenv import load_dotenv
import os
import whisper
import numpy as np
import torch

# Load environment variables
load_dotenv("/Users/rachitdas/Desktop/solvus_assignment/solvus/backend/.env")
token = os.getenv("HUGGING_FACE_TOKEN")

# Check if the token is loaded correctly
if not token:
    raise ValueError("Hugging Face token not found in environment variables")

# Load the diarization pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=token,
)

# Load the Whisper model
model = whisper.load_model("small")

# Apply diarization to your audio file
audio_file_path = "/Users/rachitdas/Desktop/solvus_assignment/Play.ht - Good Morning, doctor. May I come in?.wav"
diarization = pipeline(audio_file_path)

# Load the audio file using pydub
audio = AudioSegment.from_wav(audio_file_path)

# Change the sampling rate of the audio to match Whisper's expected input (16000 Hz)
new_sampling_rate = 16000  # Desired sampling rate in Hz
audio = audio.set_frame_rate(new_sampling_rate)

# Initialize variables to store transcriptions
speaker_transcriptions = {"SPEAKER_00": [], "SPEAKER_01": []}


# Function to preprocess and transcribe audio using Whisper
def transcribe_audio(audio_segment):
    # Convert pydub audio segment to raw data in NumPy format
    samples = np.array(audio_segment.get_array_of_samples())

    # Normalize audio to float32 format (expected by Whisper)
    samples = samples.astype(np.float32) / np.iinfo(samples.dtype).max

    # Ensure it's a mono audio (required by Whisper)
    if audio_segment.channels > 1:
        samples = samples.reshape((-1, audio_segment.channels)).mean(axis=1)

    # Convert to a tensor and prepare for Whisper
    audio_tensor = torch.from_numpy(samples).to(torch.float32)

    # Pad or trim to Whisper's fixed 30-second window (longer turns are truncated),
    # then compute the log-Mel spectrogram
    audio_tensor = whisper.pad_or_trim(audio_tensor)
    mel = whisper.log_mel_spectrogram(audio_tensor).to(model.device)

    # Transcribe the preprocessed audio
    options = whisper.DecodingOptions(
        language="en", fp16=False
    )  # Customize options if needed
    result = whisper.decode(model, mel, options)

    # Extract text from the result
    transcription = result.text
    return transcription


# Process diarization and transcribe the speakers
for turn, _, speaker in diarization.itertracks(yield_label=True):
    # Extract the relevant audio chunk (pydub slices are indexed in milliseconds)
    samples = audio[int(turn.start * 1000) : int(turn.end * 1000)]

    # Transcribe the audio segment using Whisper
    transcription = transcribe_audio(samples)

    # Store the transcription under its speaker label; setdefault creates the
    # list for any label that was not pre-seeded in speaker_transcriptions
    speaker_transcriptions.setdefault(speaker, []).append(transcription)

# Output the transcriptions
print("Transcription for Speaker 1:", speaker_transcriptions.get("SPEAKER_00", []))
print("Transcription for Speaker 2:", speaker_transcriptions.get("SPEAKER_01", []))

Transcription for Speaker 1: ['Good morning.', 'Hello.', 'You do look quite pale this morning.', 'Okay, let me check.', 'Apply pressure on the stomach and check for pain. Does it hurt here?', "While we were suffering from a stomach infection, that's the reason we were having a stomach ache and also getting busy. Did you change your diet recently or have something unhealthy?", "Okay, so you're probably suffering from food poisoning. Since the food stalls and fares are quite unhygienic, there's a high chance those uncovered food might have caused food poisoning.", "That's good.", "I'm prescribing some medicines, have them for one week and come back for a checkup next week, and please try to avoid spicy and fried foods for now.", "Let's go!"]
Transcription for Speaker 2: ['Good morning, Doctor.', 'May I come in?', 'Yes, Doctor.', "I've not been feeling well for the past few days.", "I've been having a stomach ache for a few days and feeling a bit dizzy since yesterday.", 'Yes, Doctor. The
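
`whisper.pad_or_trim` keeps at most 30 seconds of audio, so any diarization turn longer than that would be cut off before decoding. The sketch below converts a turn's start and end times into integer millisecond bounds for pydub slicing and splits long turns into windows of at most 30 seconds. The helper name `turn_to_ms_windows` and the example times are hypothetical, not part of the notebook.

```python
MAX_WINDOW_S = 30.0  # whisper.pad_or_trim keeps at most 30 s of audio


def turn_to_ms_windows(start_s, end_s, max_window_s=MAX_WINDOW_S):
    """Split one turn into (start_ms, end_ms) windows of at most max_window_s seconds."""
    windows = []
    start_ms = int(round(start_s * 1000))
    end_ms = int(round(end_s * 1000))
    step_ms = int(max_window_s * 1000)
    cursor = start_ms
    while cursor < end_ms:
        windows.append((cursor, min(cursor + step_ms, end_ms)))
        cursor += step_ms
    return windows


# Hypothetical turn times, for illustration only
print(turn_to_ms_windows(1.25, 4.5))   # a short turn stays in one window
print(turn_to_ms_windows(10.0, 75.0))  # a 65 s turn is split into three windows
```

Each window could then be sliced as `audio[start_ms:end_ms]` and passed to `transcribe_audio`, joining the resulting texts. Fixed boundaries may cut a word in half, so this is a fallback for long turns rather than a replacement for proper segmentation.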

In [27]:
from transformers import pipeline
import torch

# Copy the lists so the pop(0) calls below do not empty speaker_transcriptions
speaker1_transcription_1 = list(speaker_transcriptions.get("SPEAKER_00", []))
speaker2_transcription_2 = list(speaker_transcriptions.get("SPEAKER_01", []))
summarizer = pipeline(
    "summarization",
    model="philschmid/bart-large-cnn-samsum",
    device=torch.device("mps"),
)

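conversation = ""

# Interleave the utterances, one per speaker per pass. This assumes the speakers
# strictly alternate; the actual turn order from diarization is not preserved.
while speaker1_transcription_1 or speaker2_transcription_2:
    if speaker1_transcription_1:
        conversation += "Speaker 1:" + speaker1_transcription_1.pop(0) + "\n"
    if speaker2_transcription_2:
        conversation += "Speaker 2:" + speaker2_transcription_2.pop(0) + "\n"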

summarizer(conversation)

[{'summary_text': 'Speaker 2 went to a fair last week and ate food from the snail cell. He had a stomach ache for a few days and has been feeling dizzy since yesterday. He will take medicine for a week and come back for a checkup next week.'}]
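
The interleaving loop in this cell reproduces the recording only if the speakers strictly alternate. In the diarization output, Speaker 2's 'Good morning, Doctor.' and 'May I come in?' are separate consecutive entries, so the rebuilt conversation can drift out of order. One possible alternative is to keep each turn's start time and sort on it. In the sketch below, the function name, the tuple layout, and the timestamps are assumptions for illustration; only the utterance texts come from the output above.

```python
def build_conversation(turns, names=None):
    """Format (start_s, speaker, text) turns as a dialogue in chronological order."""
    names = names or {}
    lines = []
    for _start, speaker, text in sorted(turns, key=lambda t: t[0]):
        text = text.strip()
        if not text:
            continue  # skip empty transcriptions
        lines.append(f"{names.get(speaker, speaker)}: {text}")
    return "\n".join(lines)


# Hypothetical turns, shaped like what the diarization loop could collect
turns = [
    (2.6, "SPEAKER_00", "Good morning."),
    (0.0, "SPEAKER_01", "Good morning, Doctor."),
    (1.4, "SPEAKER_01", "May I come in?"),
]
names = {"SPEAKER_00": "Speaker 1", "SPEAKER_01": "Speaker 2"}
print(build_conversation(turns, names))
```

In the diarization loop, the tuples could be collected as `(turn.start, speaker, transcription)` in place of the per-speaker lists.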

In [28]:
summarizer(result["text"])

[{'summary_text': 'Dr. Bhatnagar went to a fair last week and ate food from unhygienic stalls and fares. He has a stomach infection. He will take medicines for a week and come back for a checkup next week. He should avoid spicy and fried foods for now.'}]

In [34]:
from ollama import generate

prompt = f"""
You are a transcriber and need to summarise a conversation between a doctor and a patient.
{result["text"]} 
"""

# Generate a summary of the conversation
summary = generate(model="mistral:latest", prompt=prompt)
print(summary["response"])

 The conversation revolves around a patient who has been feeling unwell for the past few days with symptoms of a stomach ache and dizziness. The doctor suspects a stomach infection or food poisoning caused by eating food from an unhygienic fair last week. The patient agrees to follow a prescription for one week, avoid spicy and fried foods, and return for a checkup the following week. The patient expresses intention to avoid consuming food from unhygienic places in the future.


In [35]:
from ollama import generate

prompt = f"""
You are a transcriber and need to summarise a conversation between a doctor and a patient.
{conversation} 
"""

# Generate a summary of the conversation
summary = generate(model="mistral:latest", prompt=prompt)
print(summary["response"])

 In this conversation, the patient discusses their recent health issues (stomach pain and dizziness) with the doctor. The doctor suspects food poisoning due to the patient's recent consumption of unhygienic food from a fair. The doctor prescribes medication for a week and advises the patient to avoid spicy and fried foods, requesting a follow-up visit next week. The patient expresses gratitude and intends to be more cautious regarding their food choices in the future.
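
Both `ollama` cells wrap a transcript in the same instruction, once with `result["text"]` and once with `conversation`. A small helper avoids the duplication and makes the boundary between instruction and transcript explicit. The function name and the `<transcript>` delimiters are assumptions of this sketch, not something `mistral` or `ollama` requires.

```python
def build_summary_prompt(transcript):
    """Wrap a transcript in a summarisation instruction with explicit delimiters."""
    return (
        "Summarise the conversation between a doctor and a patient "
        "that appears between the <transcript> tags.\n"
        "<transcript>\n"
        f"{transcript.strip()}\n"
        "</transcript>"
    )


# Short example input; in the notebook this would be result["text"] or conversation
example = "Speaker 2: Good morning, Doctor.\nSpeaker 1: Good morning."
prompt = build_summary_prompt(example)
print(prompt)
```

The returned string could be passed to `generate(model="mistral:latest", prompt=prompt)` exactly as in the cells above.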


In [1]:
with open("/Users/rachitdas/Desktop/solvus_assignment/test.wav", "rb") as f:
    print(f.read())

b'RIFFr\x8e\t\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00\x10\x00LIST\x1a\x00\x00\x00INFOISFT\x0e\x00\x00\x00Lavf58.76.100\x00data,\x8e\t\x001\x00.\x00+\x00-\x003\x006\x006\x007\x008\x00:\x00<\x00?\x00B\x00D\x00E\x00C\x00B\x00A\x00B\x00B\x00A\x00@\x00B\x00D\x00C\x00A\x00B\x00C\x00B\x00@\x00B\x00H\x00J\x00H\x00H\x00J\x00J\x00I\x00I\x00K\x00L\x00I\x00H\x00I\x00J\x00J\x00M\x00R\x00S\x00O\x00N\x00R\x00U\x00Q\x00N\x00Q\x00V\x00U\x00O\x00P\x00W\x00Z\x00V\x00S\x00U\x00Z\x00Y\x00W\x00Y\x00^\x00_\x00\\\x00Z\x00]\x00`\x00a\x00a\x00c\x00f\x00g\x00f\x00f\x00g\x00i\x00j\x00k\x00k\x00k\x00m\x00q\x00r\x00p\x00o\x00q\x00u\x00w\x00v\x00u\x00v\x00x\x00|\x00}\x00|\x00z\x00z\x00}\x00~\x00~\x00~\x00\x80\x00\x82\x00\x81\x00\x7f\x00\x80\x00\x82\x00\x82\x00~\x00|\x00}\x00\x80\x00\x81\x00~\x00{\x00{\x00}\x00\x7f\x00\x82\x00\x85\x00\x86\x00\x84\x00\x80\x00~\x00~\x00~\x00|\x00z\x00z\x00z\x00z\x00x\x00x\x00y\x00y\x00x\x00x\x00z\x00|\x00{\x00x\x00w\x00w\x00w\x00v\x00v\x00x\x00x\x0
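
The raw bytes above are hard to read, but the first 36 of them are the RIFF/WAVE header and can be decoded with `struct`. The sketch below copies those bytes from the output and unpacks them using the canonical little-endian layout (RIFF chunk followed by a 16-byte `fmt ` chunk). The variable names are chosen here for illustration.

```python
import struct

# First 36 bytes of test.wav, copied from the output above
header = (
    b"RIFFr\x8e\t\x00WAVEfmt \x10\x00\x00\x00"
    b"\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00\x10\x00"
)

# Canonical layout: "RIFF", size, "WAVE", "fmt ", fmt size, then the fmt fields
(riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
 sample_rate, byte_rate, block_align, bits_per_sample) = struct.unpack(
    "<4sI4s4sIHHIIHH", header
)

print("format tag:", audio_format)      # 1 means uncompressed PCM
print("channels:", channels)
print("sample rate (Hz):", sample_rate)
print("bits per sample:", bits_per_sample)
print("byte rate:", byte_rate)
```

The header describes 16-bit mono PCM at 44100 Hz, so this file would need resampling to 16000 Hz before Whisper's helpers, as the diarization cell does with `set_frame_rate`. The standard-library `wave` module can read the same fields directly from the file.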