# 📞 Call Quality Analyzer — Final (Colab-ready)

**What it does (assignment-compliant):**
- Works with the required test file.
- Mitigates poor audio quality with simple peak normalization.
- Uses **faster-whisper (tiny)** for very fast transcription on Colab.
- Computes: **talk-time ratio, # questions, longest monologue, sentiment, one actionable insight**.
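- Attempts simple speaker identification (bonus): segments alternate between two speakers, and the one with more talk time is labeled the likely sales rep.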
- Saves final results to `call_report.txt`.

**How to run:** Open this notebook in Google Colab, choose Runtime → Change runtime type → GPU (recommended), then Runtime → Run all.


In [None]:
# Install dependencies (Colab). First run may take ~1-2 minutes.
!pip install -q pytube faster-whisper transformers torch pydub librosa webrtcvad==2.0.10 soundfile
!apt-get update -qq && apt-get install -y -qq ffmpeg
print('Installed')


In [None]:
# Imports & helpers
import os, math, time, re, tempfile, shutil
from pathlib import Path
from pytube import YouTube
from pydub import AudioSegment, effects
from faster_whisper import WhisperModel
from transformers import pipeline
import soundfile as sf
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print('Device:', device)

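# Helper: download the highest-bitrate audio-only stream with pytube
def download_audio_pytube(youtube_url, out_path='call_audio.mp3'):
    print('Downloading with pytube...')
    yt = YouTube(youtube_url)
    stream = yt.streams.filter(only_audio=True).order_by('abr').desc().first()
    if stream is None:
        raise RuntimeError('No audio-only stream found for this URL')
    # pytube keeps the stream's native container (typically MP4/WebM audio) even though
    # the file is named .mp3; ffmpeg normally detects the real format when pydub loads it.
    stream.download(filename=out_path)
    return out_path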

# Helper: convert to WAV mono 16k and normalize audio to help poor quality
def to_wav_mono_16k(src, dst='call_audio.wav'):
    audio = AudioSegment.from_file(src)
    audio = effects.normalize(audio)
    audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    audio.export(dst, format='wav')
    return dst

print('Helpers ready')
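
To make the normalization step in `to_wav_mono_16k` concrete, here is a minimal pure-Python sketch of peak normalization on 16-bit PCM samples. It illustrates the idea only and is not pydub's implementation; the function name `peak_normalize`, its parameters, and the toy samples are assumptions made for this example.

```python
def peak_normalize(samples, headroom_db=0.1, full_scale=32767):
    """Scale integer PCM samples so the loudest one sits headroom_db below full scale.

    Hypothetical helper for illustration; not part of pydub.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    # Convert the headroom from decibels to a linear amplitude factor
    target_peak = full_scale * 10 ** (-headroom_db / 20)
    gain = target_peak / peak
    return [int(round(s * gain)) for s in samples]


# Toy example: a very quiet recording (made-up sample values)
quiet = [0, 120, -300, 250, -80]
loud = peak_normalize(quiet)
print(loud)
```

Peak normalization raises the overall level but does not remove noise, so a quiet, noisy call becomes a loud, noisy call. It mainly helps when the recording level is too low for the ASR model.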


In [None]:
# MAIN: download -> preprocess -> transcribe
YOUTUBE_URL = 'https://www.youtube.com/watch?v=4ostqJD3Psc'
AUDIO_MP3 = 'call_audio.mp3'
AUDIO_WAV = 'call_audio.wav'

start_time = time.time()
# Download with pytube; if it fails, upload the file manually and re-run this cell
try:
    if not os.path.exists(AUDIO_MP3):
        download_audio_pytube(YOUTUBE_URL, out_path=AUDIO_MP3)
    else:
        print('Audio already downloaded')
except Exception as e:
    print('Download failed:', e)
    print(f'Upload the audio as {AUDIO_MP3} via the Colab Files panel, then re-run this cell.')
    raise

# Convert & normalize
print('Converting to WAV (mono, 16k)...')
to_wav_mono_16k(AUDIO_MP3, dst=AUDIO_WAV)
print('WAV ready:', AUDIO_WAV)

# Load Whisper tiny model (fast)
print('Loading Whisper (tiny) model...')
model = WhisperModel('tiny', device=device)
print('Transcribing... (this should be fast)')
segments, info = model.transcribe(AUDIO_WAV, beam_size=5)

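# `segments` is a lazy generator: decoding actually happens while this loop runs
transcript = []
for seg in segments:
    transcript.append({
        'start': round(seg.start, 3),
        'end': round(seg.end, 3),
        'text': seg.text.strip(),
        'dur': round(seg.end - seg.start, 3),
    })

# start_time was set before the download, so this is the end-to-end time
print(f'Transcribed {len(transcript)} segments; total time incl. download and conversion: {round(time.time()-start_time,1)}s')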
for t in transcript[:5]:
    print(t)
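
Whisper segments are often shorter than a full speaking turn, so a single segment's duration can understate a monologue. One possible refinement, sketched below, merges consecutive segments separated by only a short pause into one turn. This is an assumed alternative heuristic, not what the notebook does: the function `merge_into_turns`, the `max_gap` threshold of 0.8 s, and the toy segments are all hypothetical, and a pause says nothing reliable about who is speaking.

```python
def merge_into_turns(segments, max_gap=0.8):
    """Merge consecutive segments whose pause is at most max_gap seconds.

    Hypothetical helper: expects dicts with 'start', 'end', 'text' keys,
    like the entries built for `transcript`.
    """
    turns = []
    for seg in segments:
        if turns and seg['start'] - turns[-1]['end'] <= max_gap:
            # Short pause: treat as a continuation of the current turn
            turns[-1]['end'] = seg['end']
            turns[-1]['text'] += ' ' + seg['text']
        else:
            # Long pause (or first segment): start a new turn
            turns.append({'start': seg['start'], 'end': seg['end'], 'text': seg['text']})
    for t in turns:
        t['dur'] = round(t['end'] - t['start'], 3)
    return turns


# Toy segments (made-up timings)
toy_segments = [
    {'start': 0.0, 'end': 2.0, 'text': 'Hi there.'},
    {'start': 2.2, 'end': 4.0, 'text': 'Thanks for joining.'},
    {'start': 5.5, 'end': 6.5, 'text': 'Happy to be here.'},
]
turns = merge_into_turns(toy_segments)
for t in turns:
    print(t)
```

Here the first two segments merge into one 4-second turn, while the third starts a new turn after a 1.5-second pause.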


In [None]:
# ANALYSIS: talk-time, questions, longest monologue, sentiment, insight
# NOTE: segments are assigned to the two speakers alternately. This is a placeholder
# heuristic, not real diarization, so all per-speaker numbers are rough estimates.
speaker_durations = {'Speaker_1':0.0, 'Speaker_2':0.0}
speaker_segments = {'Speaker_1':[], 'Speaker_2':[]}
for i, s in enumerate(transcript):
    sp = 'Speaker_1' if i%2==0 else 'Speaker_2'
    speaker_durations[sp] += s['dur']
    speaker_segments[sp].append(s)

total_speech_time = sum(speaker_durations.values()) or 1.0  # avoid division by zero
talk_ratio = {k: round(100*v/total_speech_time,1) for k,v in speaker_durations.items()}

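# Questions: count '?' marks; if a segment has none, fall back to a leading question word
QUESTION_START = re.compile(r"(what|why|how|when|where|who|is|are|do|did|can|could|would|should)\b", re.IGNORECASE)
qcount = 0
for s in transcript:
    text = s['text']
    if '?' in text:
        qcount += text.count('?')
    elif QUESTION_START.match(text):
        # Anchored at the start: matching words like "is" or "do" anywhere
        # in the segment would flag almost every sentence as a question
        qcount += 1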

# Longest monologue: max duration
long_A = max([seg['dur'] for seg in speaker_segments['Speaker_1']] or [0])
long_B = max([seg['dur'] for seg in speaker_segments['Speaker_2']] or [0])
longest = max(long_A, long_B)
longest_speaker = 'Speaker_1' if long_A>=long_B else 'Speaker_2'

# Sentiment: run the default transformers pipeline on the first 1000 characters
# of the joined transcript (keeps the input within the model's length limit)
classifier = pipeline('sentiment-analysis')
joined_text = ' '.join([s['text'] for s in transcript])[:1000]
sent_res = classifier(joined_text)

# Actionable insight (rule-based)
insight = ''
if talk_ratio['Speaker_1'] > 70 or talk_ratio['Speaker_2'] > 70:
    dom = 'Speaker_1' if talk_ratio['Speaker_1']>talk_ratio['Speaker_2'] else 'Speaker_2'
    insight = f"{dom} dominates the call ({talk_ratio[dom]}%). Suggest: ask more open-ended questions and pause to listen."
elif qcount < 3:
    insight = 'Too few questions asked. Encourage asking more open-ended questions.'
else:
    insight = 'Balanced speaking. Maintain concise summaries and clarifying questions.'

# Likely sales rep heuristic: assume the rep is the speaker with more talk time
sales_rep = 'Speaker_1' if talk_ratio['Speaker_1']>talk_ratio['Speaker_2'] else 'Speaker_2'

# Print summary
print('\n===== CALL QUALITY REPORT =====')
print('Talk-time ratio (%):', talk_ratio)
print('Number of questions (heuristic):', qcount)
print('Longest monologue (s):', longest, 'by', longest_speaker)
print('Call sentiment:', sent_res)
print('Actionable insight:', insight)
print('Likely Sales Rep:', sales_rep)
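
The talk-time and question metrics can be checked by hand on a tiny transcript. The sketch below wraps the same logic in two functions and runs it on made-up data; the function names and the toy segments are assumptions for illustration. This sketch anchors the question-word fallback at the start of the segment, which is stricter than matching the words anywhere in the text.

```python
import re

# Question words checked only at the start of a segment (a choice made in this sketch)
QUESTION_START = re.compile(
    r"(what|why|how|when|where|who|is|are|do|did|can|could|would|should)\b",
    re.IGNORECASE,
)


def talk_time_ratio(transcript):
    """Percent of speech time per speaker, assigning segments alternately."""
    durations = {'Speaker_1': 0.0, 'Speaker_2': 0.0}
    for i, seg in enumerate(transcript):
        speaker = 'Speaker_1' if i % 2 == 0 else 'Speaker_2'
        durations[speaker] += seg['dur']
    total = sum(durations.values()) or 1.0  # avoid division by zero
    return {k: round(100 * v / total, 1) for k, v in durations.items()}


def count_questions(transcript):
    """Count '?' marks; without one, count a segment that starts with a question word."""
    count = 0
    for seg in transcript:
        text = seg['text'].strip()
        if '?' in text:
            count += text.count('?')
        elif QUESTION_START.match(text):
            count += 1
    return count


# Toy transcript (made-up text and durations)
toy = [
    {'text': 'How are you today?', 'dur': 2.0},
    {'text': 'Good, thanks.', 'dur': 1.0},
    {'text': 'What brings you to us', 'dur': 3.0},
    {'text': 'This is about pricing.', 'dur': 2.0},
]
ratios = talk_time_ratio(toy)
questions = count_questions(toy)
print(ratios, questions)
```

Speaker_1 holds 5 of the 8 seconds (62.5%). Two questions are counted: one from the question mark and one from the leading "What". The last segment contains "is" but is not counted because it does not start with a question word.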


In [None]:
# Save final report to a text file
report_lines = []
report_lines.append('📞 Call Quality Report')
report_lines.append('-'*40)
report_lines.append(f'Talk-time ratio: {talk_ratio}')
report_lines.append(f'Number of questions asked: {qcount}')
report_lines.append(f'Longest monologue duration: {longest:.1f} seconds (by {longest_speaker})')
report_lines.append(f'Overall sentiment: {sent_res[0]["label"]} ({sent_res[0]["score"]:.2f})')
report_lines.append(f'Actionable insight: {insight}')
report_lines.append(f'Likely Sales Rep: {sales_rep}')

with open('call_report.txt', 'w', encoding='utf-8') as f:  # the report contains an emoji
    f.write('\n'.join(report_lines))

print('\n✅ Report saved as call_report.txt')
print('You can download it from the left Files panel in Colab or run:')
print("from google.colab import files; files.download('call_report.txt')")


## Short explanation (<=200 words)

This notebook builds a fast Call Quality Analyzer. It downloads the test YouTube audio with `pytube`, normalizes it, converts it to 16 kHz mono WAV, and transcribes it with **faster-whisper (tiny)**, chosen to stay within Colab free-tier time limits. From the timestamped segments it assigns segments alternately to two speakers (a lightweight stand-in for diarization, not true speaker identification) and computes the talk-time ratio, the number of questions (a heuristic based on question marks and question words), the longest monologue, and the overall sentiment of the first 1,000 characters of the transcript using a Hugging Face `transformers` pipeline. A rule-based step generates one actionable insight, and the report is saved as `call_report.txt`. The design prioritizes speed, reproducibility on Colab, and explainability; for production, replace the heuristic diarization with `pyannote` and use a larger ASR model for higher accuracy.
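
Because the sentiment step only sees the first 1,000 characters of the transcript, a long call is scored on its opening alone. One possible extension, sketched below, splits the text into chunks and averages signed scores across them. It is an assumption-laden sketch, not the notebook's method: `chunk_text`, `overall_sentiment`, and `stub_classify` are hypothetical names, the classifier is passed in as a callable that returns `{'label', 'score'}` dicts with `POSITIVE`/`NEGATIVE` labels, and a stub stands in for the real model so the example runs without downloads.

```python
def chunk_text(text, max_chars=1000):
    """Split text on whitespace into chunks of roughly max_chars or fewer.

    A single word longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ''
    for word in text.split():
        candidate = f'{current} {word}'.strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def overall_sentiment(text, classify, max_chars=1000):
    """Average signed confidence over chunks.

    `classify` is assumed to map a list of strings to a list of
    {'label': 'POSITIVE' | 'NEGATIVE', 'score': float} dicts.
    """
    chunks = chunk_text(text, max_chars)
    if not chunks:
        return {'label': 'NEUTRAL', 'score': 0.0}
    results = classify(chunks)
    signed = [r['score'] if r['label'] == 'POSITIVE' else -r['score'] for r in results]
    mean = sum(signed) / len(signed)
    return {'label': 'POSITIVE' if mean >= 0 else 'NEGATIVE', 'score': round(abs(mean), 3)}


def stub_classify(chunks):
    """Stand-in classifier for the demo: flags chunks that mention 'problem'."""
    return [{'label': 'NEGATIVE' if 'problem' in c else 'POSITIVE', 'score': 0.9}
            for c in chunks]


demo_chunks = chunk_text('good good fine fine problem', max_chars=9)
demo_result = overall_sentiment('good good fine fine problem', stub_classify, max_chars=9)
print(demo_chunks, demo_result)
```

In the notebook, the `classifier` pipeline object could be passed as `classify`, since it accepts a list of strings. Whether a simple average is the right aggregation for a sales call is a judgment call; weighting later chunks more heavily is another option.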
