# Conversation Tree with Audio Generation

This notebook processes educational conversations from CSV files and generates interactive conversation trees with AI-generated audio using the Kokoro text-to-speech model.

## Overview

The notebook performs the following steps:
1. **Data Processing**: Reads conversation CSV files and creates a structured conversation tree
2. **Audio Generation**: Uses Kokoro TTS to generate audio files for each dialogue turn
3. **Link Integration**: Adds audio download links to the conversation tree structure

## Prerequisites & Installation

### Required Dependencies
- Python packages: `kokoro`, `soundfile`, `numpy`, `spacy`
- spaCy English model: `en_core_web_sm`

### espeak-ng Installation (Brief Guide)

**Windows:**
```bash
# Using Chocolatey (recommended)
choco install espeak

# Or download installer from: https://github.com/espeak-ng/espeak-ng/releases
```

**Linux (Ubuntu/Debian):**
```bash
sudo apt-get update
sudo apt-get install espeak-ng
```

**macOS:**
```bash
# Using Homebrew
brew install espeak-ng
```

**Verify Installation:**
```bash
espeak-ng --version
```

> **Note**: espeak-ng provides phonetic processing capabilities that may be used by some TTS engines, though Kokoro primarily uses its own neural processing pipeline.

## Input Data Structure

The notebook expects:
- `simplified/questions-3-items.json`: Root questions file
- `simplified/mini-conversations/`: Folder containing conversation CSV files
- CSV format with columns: `who`, `content`, `original_question`

In [None]:
import os
import csv
import json
import re
from collections import defaultdict

# Clean up text from tags and excess whitespace
def strip_tags(text):
    text = re.sub(r"</?[^>]+>", "", text)
    return re.sub(r"\s+", " ", text).strip()

# Load root questions
with open("simplified/questions-3-items.json", "r", encoding="utf-8") as f:
    data = json.load(f)

root_questions = []
for assessment in data["assessments"]:
    for question in assessment["questions"]:
        root_questions.append(strip_tags(question["text"]))

# Process CSVs into organized threads
conversation_folder = "simplified/mini-conversations"
conversations_by_question = defaultdict(lambda: defaultdict(list))  # {question: {teacher_id: [dialogue]}}

for filename in sorted(os.listdir(conversation_folder)):  # Alphabetical sorting
    if not filename.endswith(".csv"):
        continue

    parts = filename.replace(".csv", "").split("_")
    teacher_num = parts[2]
    student_num = parts[4]
    teacher_id = f"teacher_{teacher_num}"
    student_id = f"student_{student_num}"

    filepath = os.path.join(conversation_folder, filename)
    with open(filepath, "r", encoding="utf-8") as f:
        lines = f.readlines()[4:]  # skip metadata
        reader = csv.DictReader(lines)

        current_question = None
        dialogue = []

        for row in reader:
            who = row["who"].strip().lower()
            content = strip_tags(row["content"].strip())
            orig_question = strip_tags(row.get("original_question", "").strip())

            if orig_question:
                if current_question and dialogue:
                    conversations_by_question[current_question][teacher_id].append({
                        "student_id": student_id,
                        "dialogue": dialogue
                    })
                current_question = orig_question
                dialogue = []

            if current_question:
                dialogue.append({
                    "speaker": f"{who}_{student_num if who == 'student' else teacher_num}",
                    "role": who,
                    "message": content
                })

        # Save last conversation
        if current_question and dialogue:
            conversations_by_question[current_question][teacher_id].append({
                "student_id": student_id,
                "dialogue": dialogue
            })

# Build tree
tree = {"questions": []}
tag_counters = defaultdict(int)

for q_idx, root_q in enumerate(root_questions, 1):
    teacher_dict = conversations_by_question.get(root_q)
    if not teacher_dict:
        continue

    question_node = {
        "speaker": "question",
        "message": root_q,
        "children": []
    }

    for teacher_id in sorted(teacher_dict.keys()):
        student_conversations = teacher_dict[teacher_id]
        for convo in sorted(student_conversations, key=lambda c: c["student_id"]):
            student_id = convo["student_id"]
            dialogue = convo["dialogue"]

            first_msg = next((m for m in dialogue if m["role"] == "teacher"), None)
            if not first_msg:
                continue

            tag_base = f"{teacher_id}_{student_id}_question_{q_idx}"
            tag_counters[tag_base] += 1

            teacher_node = {
                "speaker": first_msg["speaker"],
                "message": first_msg["message"],
                "tag": f"{tag_base}_{tag_counters[tag_base]:04d}",
                "responses": []
            }

            i = dialogue.index(first_msg) + 1
            last_message = first_msg["message"]
            current_node = teacher_node

            while i < len(dialogue):
                msg = dialogue[i]
                speaker = msg["speaker"]
                role = msg["role"]
                message = msg["message"]

                if message == last_message:
                    i += 1
                    continue

                # Merge same-speaker consecutive messages
                while i + 1 < len(dialogue) and dialogue[i + 1]["speaker"] == speaker:
                    next_msg = dialogue[i + 1]["message"]
                    if next_msg != message:
                        message += " " + next_msg
                        i += 1
                    else:
                        i += 1

                tag_counters[tag_base] += 1
                new_node = {
                    "speaker": speaker,
                    "message": message,
                    "tag": f"{tag_base}_{tag_counters[tag_base]:04d}"
                }

                current_node.setdefault("responses", []).append(new_node)
                current_node = new_node
                last_message = message
                i += 1

            question_node["children"].append(teacher_node)

    tree["questions"].append(question_node)

# Save the final tree
output_path = "conversation-tree.json"
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(tree, f, indent=2)

print(f"Conversation tree saved to {output_path}")

Conversation tree saved to conversation-tree.json


## Step 2: Audio Generation with Kokoro TTS

This section generates individual audio files for each dialogue turn using the Kokoro text-to-speech model.

### What this code does:
1. **Initializes Kokoro TTS**: Sets up the neural text-to-speech pipeline
2. **Configures voice mapping**: Assigns different voices to teachers and students
   - Teachers: `af_heart` (female voice)
   - Students: `am_puck` (male pre-teen voice)
3. **Processes the conversation tree**: Recursively traverses all dialogue nodes
4. **Generates audio files**: Creates `.wav` files for each tagged dialogue turn
5. **Handles errors gracefully**: Continues processing even if individual files fail

### Voice Configuration:
- **Teacher voice**: `af_heart` - Professional female voice for educators
- **Student voice**: `am_puck` - Younger male voice for student responses

### Output:
- Audio files saved to `audio/` directory
- Files named using the unique tags from the conversation tree (e.g., `teacher_1_student_2_question_1_0001.wav`)

In [2]:
import os
import json
from kokoro import KPipeline
import soundfile as sf
import numpy as np
import traceback

try:
    # Check if spaCy model is available before initializing Kokoro
    import spacy
    try:
        nlp = spacy.load("en_core_web_sm")
        print("✓ spaCy model verified, initializing Kokoro...")
    except OSError:
        raise Exception("spaCy model 'en_core_web_sm' not found. Please run the previous cell to install it.")
    
    # Initialize Kokoro with explicit repo_id
    pipeline = KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M')
    print("✓ Kokoro pipeline initialized successfully!")

    # Speaker-voice mapping
    voices = {
        "teacher": "af_heart",   # Female teacher
        "student": "am_puck"     # Male pre-teen
    }

    # Paths
    json_path = "conversation-tree-with-audio.json"
    output_dir = "audio"
    os.makedirs(output_dir, exist_ok=True)

    # Load JSON
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Recursive function to synthesize audio
    def synthesize_from_json(node):
        if isinstance(node, dict):
            if "tag" in node and "message" in node and "speaker" in node:
                tag = node["tag"]
                message = node["message"]
                speaker_type = node["speaker"].split("_")[0]
                voice = voices.get(speaker_type)

                if voice:
                    output_file = os.path.join(output_dir, f"{tag}.wav")
                    if not os.path.exists(output_file):
                        print(f"[Creating] {output_file} with voice '{voice}' and message:\n→ {message}\n")
                        try:
                            audio_segments = []
                            for chunk in pipeline(message, voice=voice):
                                audio_segments.append(chunk.audio.squeeze().cpu().numpy())
                            if audio_segments:
                                full_audio = np.concatenate(audio_segments)
                                sf.write(output_file, full_audio, 24000)
                        except Exception as e:
                            print(f"Error processing {tag}: {str(e)}")
                            traceback.print_exc()  # Show the full traceback for individual file errors
                            return  # Return instead of continue since we're not in a loop
                    else:
                        print(f"[Skipping] {output_file} already exists.\n")

            # Recurse into children and responses
            for key in node:
                if isinstance(node[key], (dict, list)):
                    synthesize_from_json(node[key])

        elif isinstance(node, list):
            for item in node:
                synthesize_from_json(item)

    # Run
    print("Starting audio generation...\n")
    synthesize_from_json(data)
    print("Audio generation completed.")

except Exception as e:
    print("An exception has occurred:")
    print(f"Exception type: {type(e).__name__}")
    print(f"Exception message: {str(e)}")
    traceback.print_exc()



✓ spaCy model verified, initializing Kokoro...
✓ Kokoro pipeline initialized successfully!
Starting audio generation...

[Creating] audio\teacher_1_question_1_0001.wav with voice 'af_heart' and message:
→ Good morning! Let's explore this together. In the context of projectile motion, how would you define a projectile?

[Creating] audio\student_2_question_1_0001.wav with voice 'am_puck' and message:
→ Okay, so a projectile is pretty much an object that's thrown or launched into the air, and then it just keeps moving, mostly because of gravity.

[Creating] audio\teacher_1_question_1_0002.wav with voice 'af_heart' and message:
→ You are correct that a projectile is an object that is thrown or launched into the air and that gravity affects its motion. Remember that in the ideal definition of projectile motion, all other forces apart from gravity are typically disregarded.

[Creating] audio\student_2_question_1_0002.wav with voice 'am_puck' and message:
→ Okay, so it's mainly about gravity 

## Step 3: Audio Link Integration

After generating the audio files, you need to upload them to a Hugging Face dataset repository to make them publicly accessible.

### Manual Upload Process:
1. **Upload audio files** to a Hugging Face dataset repository
2. **Preserve filenames** - ensure the uploaded files keep their original names
3. **Update the base URL** in the code below to match your repository
4. **Run the integration code** to add download links to the conversation tree

### What the integration code does:
- **Adds audio links**: Inserts download URLs for each tagged dialogue turn
- **Preserves structure**: Maintains the original conversation tree hierarchy
- **Creates enhanced JSON**: Outputs a new file with embedded audio links
- **Uses direct download URLs**: Links include `?download=true` for immediate access


In [9]:
# Get HF dataset URL from user input
dataset_url = input("Enter your Hugging Face dataset URL (e.g. https://huggingface.co/datasets/your-username/your-dataset-name): ")
HF_BASE_URL = f"{dataset_url}/resolve/main"

with open("conversation-tree.json", "r", encoding="utf-8") as f:
    data = json.load(f)

def add_audio_link_by_tag(node):
    if isinstance(node, dict):
        tag = node.get("tag", "").strip()
        if tag:
            audio_url = f"{HF_BASE_URL}/{tag}.wav?download=true"

            new_node = {}
            for k, v in node.items():
                new_node[k] = v
                if k == "tag":
                    new_node["audio link"] = audio_url
            node.clear()
            node.update(new_node)

        # Recurse through nested nodes
        for field in ("children", "responses"):
            if field in node:
                for child in node[field]:
                    add_audio_link_by_tag(child)

# Apply the function to each root question
for q in data.get("questions", []):
    add_audio_link_by_tag(q)

# Save the result
output_path = "conversation-tree-with-audio.json"
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

print(f"Audio links inserted after tag and saved to: {output_path}")

Audio links inserted after tag and saved to: conversation-tree-with-audio.json
