# Smart Call Insights Analyzer

This notebook demonstrates an end-to-end workflow to **transcribe customer support calls**, extract structured insights using **OpenAI's GPT API**, and generate **visual reports and explanations**. The use case is focused on customer queries related to **CCTV camera setup and configuration**.

---

## Features Covered

- Transcribe audio files using **Whisper ASR**
- Extract insights like **summary, topics, action items, sentiment**
- Visualize trends using **pandas + matplotlib**
- Explain predictions using **SHAP**
- Suitable for AWS-based automation pipelines


## Step 1: Setup Configuration

Define the input/output directories, load the Whisper ASR model, and configure OpenAI API access. The API key is read from the environment rather than written into the notebook.


In [None]:
import os
import json
import glob
import whisper
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
from openai import OpenAI

AUDIO_DIR    = "calls_audio_2/"
OUTPUT_DIR   = "calls_output/"
WHISPER_MODEL = "medium"
MAX_WORKERS  = 4

# Read the API key from the environment; never hardcode a real key in the notebook.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load the model once so every transcription reuses it.
whisper_model = whisper.load_model(WHISPER_MODEL)

def transcribe_file(audio_path: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    result = whisper_model.transcribe(audio_path)
    return result["text"]





## Step 2: Transcription with Whisper

Use OpenAI's Whisper ASR (speech-to-text) to convert `.mp3` or `.wav` audio files into transcripts. The `transcribe_file` helper defined in Step 1 handles one file at a time.
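
To transcribe a whole folder, the files first have to be found and filtered by extension. The sketch below is one possible way to do that; `find_audio_files` and `transcribe_all` are hypothetical helper names, and the transcription function is passed in as an argument so that `transcribe_file` from Step 1 can be plugged in.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

AUDIO_EXTENSIONS = (".mp3", ".wav")  # the formats named in this step


def find_audio_files(audio_dir: str,
                     extensions: Iterable[str] = AUDIO_EXTENSIONS) -> List[Path]:
    """Return audio files directly inside audio_dir, sorted for a stable order."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(audio_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def transcribe_all(audio_dir: str,
                   transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Map each call ID (the file stem) to its transcript text."""
    return {p.stem: transcribe(str(p)) for p in find_audio_files(audio_dir)}
```

With the Step 1 cell executed, `transcripts = transcribe_all(AUDIO_DIR, transcribe_file)` would return one transcript per call. Matching on the lower-cased suffix means files such as `CALL01.MP3` are picked up as well.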



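## Step 3: Insight Extraction with GPT

Send each transcript to the Chat Completions API with a prompt that requests a summary, topics, action items, and sentiment as JSON.


In [1]: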
INSIGHT_PROMPT = """
You are an AI assistant that reads a call transcript and extracts:
1. A concise summary (2–3 sentences).
2. Key topics covered.
3. Action items (who needs to do what by when, if mentioned).
4. Overall sentiment/tone of the call.
Provide the output as a JSON object with keys: summary, topics, action_items, sentiment.
Transcript:
\"\"\"
{transcript}
\"\"\"
"""

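def extract_insights(transcript: str) -> dict:
    """Ask the chat model for structured insights and return them as a dict."""
    messages = [
        {"role": "system", "content": "You extract structured call insights as JSON."},
        {"role": "user", "content": INSIGHT_PROMPT.format(transcript=transcript)}
    ]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.0,  # keep the output as repeatable as possible
        max_tokens=500,
        # JSON mode: ask the API for a single JSON object instead of free text.
        response_format={"type": "json_object"},
    )

    # content can be None (for example on a refusal), so guard before stripping.
    content = (resp.choices[0].message.content or "").strip()
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # Keep the raw reply so failed calls can be inspected later.
        return {"error": "Could not parse JSON", "raw": content}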
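
Even with a JSON-only prompt, a reply can arrive wrapped in a Markdown code fence or without one of the four requested keys. The following is a minimal sketch of a more tolerant parser that could stand in for the bare `json.loads` call; `parse_insights`, `strip_code_fence`, and `REQUIRED_KEYS` are hypothetical names, not part of the notebook.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker
REQUIRED_KEYS = ("summary", "topics", "action_items", "sentiment")


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if the text has one."""
    text = text.strip()
    if not (text.startswith(FENCE) and text.endswith(FENCE)
            and len(text) >= 2 * len(FENCE)):
        return text
    body = text[len(FENCE):-len(FENCE)]
    first_line, newline, rest = body.partition("\n")
    # Drop an optional language tag such as "json" on the opening line.
    if newline and (first_line.strip() == "" or first_line.strip().isalpha()):
        body = rest
    return body.strip()


def parse_insights(content: str) -> dict:
    """Parse a model reply into a dict, recording what went wrong if anything."""
    try:
        data = json.loads(strip_code_fence(content))
    except json.JSONDecodeError:
        return {"error": "Could not parse JSON", "raw": content}
    if not isinstance(data, dict):
        return {"error": "Expected a JSON object", "raw": content}
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        data["missing_keys"] = missing  # flag incomplete replies for later review
    return data
```

Recording `missing_keys` instead of raising keeps a batch run going while still marking incomplete replies for the reporting steps.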

## Step 4: Full Pipeline Execution

This function processes all calls in the folder by:
1. Transcribing the call
2. Extracting insights
3. Saving results as `.txt` and `.json` files


In [None]:
# This cell relies on AUDIO_DIR, OUTPUT_DIR, MAX_WORKERS, transcribe_file, and
# extract_insights from the earlier cells; run those first.
import os
import json
import glob
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_call(audio_path: str):
    """Transcribe one call, extract insights, and save both to OUTPUT_DIR/<call>/."""
    fname = Path(audio_path).stem
    out_folder = Path(OUTPUT_DIR) / fname
    out_folder.mkdir(parents=True, exist_ok=True)

    transcript = transcribe_file(audio_path)
    txt_path = out_folder / f"{fname}.txt"
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(transcript)

    insights = extract_insights(transcript)
    json_path = out_folder / f"{fname}_insights.json"
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(insights, f, ensure_ascii=False, indent=2)

    print(f"[Done] {fname} → transcript + insights")
    return fname

def main():
    # Match only the audio formats this notebook handles.
    audio_files = sorted(
        path
        for pattern in ("*.mp3", "*.wav")
        for path in glob.glob(os.path.join(AUDIO_DIR, pattern))
    )
    print(f"Found {len(audio_files)} audio files in {AUDIO_DIR!r}")

    # Process calls concurrently; one failed call should not stop the batch.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = {executor.submit(process_call, path): path for path in audio_files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
            except Exception as exc:
                print(f"[Error] {path}: {exc!r}")

    # Collect the transcripts that were written successfully.
    dataset_all = []
    for path in audio_files:
        fname = Path(path).stem
        txt_path = Path(OUTPUT_DIR) / fname / f"{fname}.txt"
        if not txt_path.exists():
            continue  # this call failed above
        with open(txt_path, encoding="utf-8") as f:
            dataset_all.append({
                "call_id": fname,
                "transcript": f.read()
            })
    return dataset_all

if __name__ == "__main__":
    dataset = main()
    # Overwrite rather than append so the file stays valid JSON across reruns.
    with open("dataset_all.json", "w", encoding="utf-8") as file:
        json.dump(dataset, file, ensure_ascii=False, indent=4)


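## Step 5: Flatten & Copy Audio Files

Copy all `.mp3` audio files from nested folders under `calls_audio/` into the flat `calls_audio_2/` directory that `AUDIO_DIR` points to. Run this cell before Step 4 so the pipeline has files to process.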


In [None]:
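import shutil
from pathlib import Path

SRC_DIR = Path("calls_audio")
DEST_DIR = Path("calls_audio_2")
DEST_DIR.mkdir(parents=True, exist_ok=True)

# rglob walks every subfolder. Files that share a name would overwrite each
# other in the flat folder, so an existing destination is skipped instead.
for mp3_path in SRC_DIR.rglob("*.mp3"):
    dest_path = DEST_DIR / mp3_path.name
    if dest_path.exists():
        print(f"Skipped (name already exists): {mp3_path}")
        continue
    shutil.copy2(mp3_path, dest_path)
    print(f"Copied: {mp3_path} → {dest_path}")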

## Step 6: Data Visualization

Visualize:
- Sentiment distribution
- Action item frequency
- Top 10 topics from the call data

Useful for stakeholder reporting and call center analysis.
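
The notebook does not include the visualization cell, so the following is only a sketch of how the three charts could be produced from the `*_insights.json` files written in Step 4. It assumes `topics` and `action_items` are lists and `sentiment` is a short string, which depends on how the model formats its reply. Counting uses `collections.Counter` to keep the sketch light (pandas `value_counts` would work equally well), and all function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_insights(output_dir: str) -> list:
    """Read every <call_id>/<call_id>_insights.json under output_dir."""
    records = []
    for path in sorted(Path(output_dir).glob("*/*_insights.json")):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        record["call_id"] = path.parent.name
        records.append(record)
    return records


def sentiment_counts(records: list) -> Counter:
    """Count calls per sentiment label; unparsed calls count as 'unknown'."""
    return Counter(
        str(r.get("sentiment") or "unknown").strip().lower() for r in records
    )


def topic_counts(records: list) -> Counter:
    """Count how many times each topic appears across all calls."""
    counts = Counter()
    for r in records:
        topics = r.get("topics") or []
        if isinstance(topics, str):  # tolerate a single topic given as a string
            topics = [topics]
        counts.update(str(t).strip().lower() for t in topics)
    return counts


def action_item_counts(records: list) -> Counter:
    """Count calls by their number of action items (0, 1, 2, ...)."""
    return Counter(
        len(r["action_items"]) if isinstance(r.get("action_items"), list) else 0
        for r in records
    )


def plot_counts(counts: Counter, title: str, top_n: int = 10, out_path=None):
    """Draw a bar chart of the top_n most common entries."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    items = counts.most_common(top_n)
    labels = [str(label) for label, _ in items]
    values = [value for _, value in items]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("Calls")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    if out_path is not None:
        fig.savefig(out_path)
    return fig
```

A typical call would be `plot_counts(topic_counts(load_insights(OUTPUT_DIR)), "Top 10 topics")`. Lower-casing labels before counting merges variants such as "Positive" and "positive" into one bar.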


## Step 7: Summary Reporting

Generate structured tables:
- Sentiment Summary
- Action Items Summary
- Topic Frequency
- Overall Call Health (missing insights)

Exports as CSVs for integration with dashboards.
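
As a rough illustration of two of these tables, the sketch below builds the sentiment summary and the call-health table from a list of insight dicts and writes them with the standard-library `csv` module. The column names, the `call_id` field, and the function names are assumptions made for this example; the same tables could be built as pandas DataFrames and exported with `to_csv`.

```python
import csv
from collections import Counter

EXPECTED_KEYS = ("summary", "topics", "action_items", "sentiment")


def sentiment_summary(records: list) -> list:
    """One row per sentiment label with its call count and percentage share."""
    counts = Counter(
        str(r.get("sentiment") or "unknown").strip().lower() for r in records
    )
    total = sum(counts.values())
    return [
        {"sentiment": label, "calls": n, "share_pct": round(100 * n / total, 1)}
        for label, n in counts.most_common()
    ]


def call_health(records: list) -> list:
    """One row per call, flagging parse errors and absent insight fields."""
    rows = []
    for r in records:
        # An empty list is a valid answer, so only a missing/None value counts.
        missing = [key for key in EXPECTED_KEYS if r.get(key) is None]
        rows.append({
            "call_id": r.get("call_id", ""),
            "parse_error": "error" in r,
            "missing_fields": ";".join(missing),
        })
    return rows


def write_csv(rows: list, path: str) -> None:
    """Write a list of same-shaped dicts to a CSV file with a header row."""
    if not rows:
        return  # nothing to export
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_csv(sentiment_summary(records), "sentiment_summary.csv")` produces a file a dashboard can ingest directly.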


## Step 8: Explainability with SHAP

Train a surrogate classifier (logistic regression on TF-IDF features) on the transcripts and their extracted sentiment labels to approximate how sentiment is predicted. Then use SHAP to identify the **important keywords** influencing each prediction.
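
For a linear model such as logistic regression, the SHAP value of a feature (in log-odds space, treating features as independent) reduces to the weight times the feature's deviation from its background mean. The sketch below computes that directly in plain Python to show what the keyword attributions mean. The weights and TF-IDF values are hypothetical numbers for illustration only, not results from this notebook; in practice the weights would come from the fitted classifier and the `shap` library would do the computation and plotting.

```python
from typing import Dict, List, Tuple


def linear_shap_values(weights: Dict[str, float],
                       x: Dict[str, float],
                       background_mean: Dict[str, float]) -> Dict[str, float]:
    """phi_i = w_i * (x_i - mean_i) for every term the model knows."""
    return {
        term: w * (x.get(term, 0.0) - background_mean.get(term, 0.0))
        for term, w in weights.items()
    }


def log_odds(weights: Dict[str, float], intercept: float,
             x: Dict[str, float]) -> float:
    """Linear model output before the sigmoid."""
    return intercept + sum(w * x.get(term, 0.0) for term, w in weights.items())


def top_keywords(phi: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Terms with the largest absolute contribution, strongest first."""
    return sorted(phi.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


# Hypothetical surrogate model: positive weights push toward positive sentiment.
weights = {"offline": -1.2, "thanks": 0.9, "reset": -0.4}
intercept = 0.1
# Hypothetical mean TF-IDF value of each term over all calls.
background_mean = {"offline": 0.10, "thanks": 0.20, "reset": 0.05}
# Hypothetical TF-IDF vector of one call; absent terms have value 0.
call = {"offline": 0.50, "reset": 0.30}

phi = linear_shap_values(weights, call, background_mean)
base_value = log_odds(weights, intercept, background_mean)

# Additivity: base value plus all contributions equals the model output.
assert abs(base_value + sum(phi.values()) - log_odds(weights, intercept, call)) < 1e-9

for term, contribution in top_keywords(phi, k=3):
    print(f"{term}: {contribution:+.3f}")
```

A term can contribute even when it is absent from a call: here "thanks" lowers the score because its value is below the background mean.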


---

## You're Ready to Analyze Support Calls!

This notebook can be deployed locally or on cloud environments (e.g., SageMaker, EC2, Lambda + Glue).

Make sure to:
- Supply your own OpenAI API key, ideally through the `OPENAI_API_KEY` environment variable rather than in the notebook source
- Provide `.mp3` audio files under `calls_audio/` (Step 5 flattens them into `calls_audio_2/`)

For production-scale use, this can be automated via AWS Glue and orchestrated using CloudFormation or Airflow.

---