<a href="https://colab.research.google.com/github/SriKrishnaMishra/AI-Avatar-through-text-audio-and-video/blob/main/Video_AI_Lip_Sync.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 📋 AI Avatar Chatbot with Lip-Sync - Complete Project Documentation


**📖 Project Overview**

What is this project?

This project builds an AI avatar chatbot that holds a conversation and delivers each reply as a lip-synced video. It combines:

- Chatbot AI to generate the text response
- Text-to-Speech (TTS) to turn the response into natural-sounding audio
- Lip-sync AI (Wav2Lip) to match the avatar's mouth movements to the audio
- Video processing to produce the final output

---

### Technical Architecture

**User Input → Chatbot AI → Text Response → TTS → Audio File → Wav2Lip → Lip-Synced Video → Output**
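The flow above can be read as a chain of stages, each consuming the previous stage's output. The following is a minimal sketch of that chaining only; `chat`, `synthesize`, `lip_sync`, and `AvatarResponse` are hypothetical stand-ins, not the classes defined later in this notebook.

```python
from dataclasses import dataclass


@dataclass
class AvatarResponse:
    text: str
    audio_path: str
    video_path: str


def chat(user_input: str) -> str:
    # Stand-in for the chatbot stage (hypothetical).
    return f"You said: {user_input}"


def synthesize(text: str) -> str:
    # Stand-in for TTS: a real stage would write an audio file and return its path.
    return "/tmp/response.mp3"


def lip_sync(face_video: str, audio_path: str) -> str:
    # Stand-in for Wav2Lip: a real stage would render a video and return its path.
    return "/tmp/response.mp4"


def respond(user_input: str, face_video: str) -> AvatarResponse:
    # Each stage feeds the next, mirroring the arrows in the diagram.
    text = chat(user_input)
    audio_path = synthesize(text)
    video_path = lip_sync(face_video, audio_path)
    return AvatarResponse(text, audio_path, video_path)


result = respond("hello", "avatar.mp4")
print(result)
```

Keeping each stage behind a plain function makes it easy to swap one out, for example replacing the lip-sync stage with a simple audio merge when the model is unavailable.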

### **Part 1: Setup and Installation**

---

In [None]:
!nvidia-smi
from google.colab import drive
drive.mount('/content/drive')
workspace_dir = "/content/avatar_chatbot"
!mkdir -p {workspace_dir}
%cd {workspace_dir}
print("Setup complete!")
print(f"Workspace: {workspace_dir}")

🚀 Initializing Google Colab for AI Avatar Chatbot...
/bin/bash: line 1: nvidia-smi: command not found
Mounted at /content/drive
/content/avatar_chatbot
✅ Setup complete!
📁 Workspace: /content/avatar_chatbot


---

In [None]:

!apt-get install -y cuda-toolkit-11-8
import os
os.environ['CUDA_HOME'] = '/usr/local/cuda'
os.environ['PATH'] = f"{os.environ['CUDA_HOME']}/bin:{os.environ['PATH']}"
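# Python does not expand "$LD_LIBRARY_PATH" inside a string, so read the current value explicitly
os.environ['LD_LIBRARY_PATH'] = f"{os.environ['CUDA_HOME']}/lib64:{os.environ.get('LD_LIBRARY_PATH', '')}"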
import torch
print(f"\nPyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available. Using CPU (slower)")

🔧 Checking and fixing GPU setup...
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package cuda-toolkit-11-8

PyTorch version: 2.9.0+cpu
CUDA available: False
⚠️ CUDA not available. Using CPU (slower)


### 2: SIMPLEST WORKING VERSION

---

In [None]:

print("Installing dependencies with no conflicts...")

# Upgrade pip before installing the pinned packages
!pip install --upgrade -q pip

# Install everything in one go with compatible versions
!pip install -q \
    numpy==1.24.3 \
    torch==2.0.1 \
    torchvision==0.15.2 \
    torchaudio==2.0.2 \
    gradio==4.13.0 \
    edge-tts==6.1.9 \
    moviepy==1.0.3 \
    pydub==0.25.1 \
    ffmpeg-python==0.2.0 \
    openai==0.28.0 \
    requests==2.31.0 \
    pillow==10.1.0 \
    imageio==2.31.6 \
    imageio-ffmpeg==0.4.9 \
    opencv-python-headless==4.8.1.78 \
    scipy==1.11.4 \
    scikit-image==0.22.0 \
    librosa==0.10.1 \
    soundfile==0.12.1

print("\nAll packages installed successfully!")

# Quick test
import numpy as np
import torch
print(f"\n NumPy: {np.__version__}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

🚀 Installing dependencies with no conflicts...
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 32.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> No available output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build wheel ... error
ERROR: Failed to build 'numpy' when getting requirements to build wheel

✅ All packages installed successfully!

✅ NumPy: 2.0.2
✅ PyTorch: 2.9.0+cpu
✅ CUDA available: False
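
The log above prints the success message even though the NumPy build failed, and the reported versions (NumPy 2.0.2, PyTorch 2.9.0+cpu) are not the pinned ones. One way to catch this is to compare installed versions against the pins. This is a minimal sketch using the standard library's `importlib.metadata`; the `PINS` dictionary and the helper names are illustrative, not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins: dict) -> dict:
    """Map each package name to (expected, actual, matches)."""
    report = {}
    for package, expected in pins.items():
        actual = installed_version(package)
        report[package] = (expected, actual, actual == expected)
    return report


# A few of the pins from the install cell above
PINS = {"numpy": "1.24.3", "torch": "2.0.1", "edge-tts": "6.1.9"}
for package, (expected, actual, ok) in check_pins(PINS).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{status}: {package} expected {expected}, found {actual}")
```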


### VERIFICATION CELL

**Verifying all installations...**

---


In [None]:


print(" Verifying all installations...")

try:
    import numpy as np
    import torch
    import gradio as gr
    import edge_tts
    import moviepy.editor
    import pydub
    import requests
    import PIL
    import imageio
    import cv2
    import scipy
    import skimage
    import librosa
    import soundfile

    print("✅ All imports successful!")
    print(f"NumPy: {np.__version__}")
    print(f"PyTorch: {torch.__version__}")
    print(f"CUDA: {torch.cuda.is_available()}")
    print(f"Gradio: {gr.__version__}")
    print(f"OpenCV: {cv2.__version__}")

except Exception as e:
    print(f"❌ Import error: {e}")
    print("\n💡 If there are still issues, try:")
    print("1. Restart runtime: Runtime → Restart runtime")
    print("2. Run this cell again")
    print("3. Or use the simplest version above")

🔍 Verifying all installations...
❌ Import error: No module named 'edge_tts'

💡 If there are still issues, try:
1. Restart runtime: Runtime → Restart runtime
2. Run this cell again
3. Or use the simplest version above
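
Because all imports share one `try` block, the first failure (`edge_tts` here) hides the status of every module after it. A hedged sketch of checking each module separately with `importlib.import_module`; `check_imports` is a hypothetical helper and the module list is a shortened example.

```python
import importlib


def check_imports(module_names):
    """Import each module on its own; map name -> None on success or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # broad on purpose: a broken install can raise more than ImportError
            results[name] = str(exc)
    return results


modules = ["numpy", "edge_tts", "moviepy.editor"]
for name, error in check_imports(modules).items():
    print(f"{name}: {'ok' if error is None else 'FAILED - ' + error}")
```

The report then lists every missing package in a single run instead of one per restart.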


### CELL 3: Clone Wav2Lip

---

In [None]:

print("📥 Setting up Wav2Lip for lip-sync...")

# Clone repository
!git clone https://github.com/Rudrabha/Wav2Lip

# Install Wav2Lip requirements
%cd Wav2Lip
!pip install -q -r requirements.txt
%cd ..

print("✅ Wav2Lip cloned!")

📥 Setting up Wav2Lip for lip-sync...
Cloning into 'Wav2Lip'...
remote: Enumerating objects: 409, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 409 (delta 2), reused 0 (delta 0), pack-reused 405 (from 2)
Receiving objects: 100% (409/409), 549.28 KiB | 4.82 MiB/s, done.
Resolving deltas: 100% (227/227), done.
/content/avatar_chatbot/Wav2Lip
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> No available output.

  note: This error originates from a 

### CELL 4: Download Wav2Lip Models


In [None]:

print(" Downloading Wav2Lip models...")

%cd Wav2Lip

# Create necessary directories
!mkdir -p checkpoints
!mkdir -p face_detection/detection/sfd

# Download face detection model
!wget -q "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "face_detection/detection/sfd/s3fd.pth"

# Download Wav2Lip models
print("Downloading wav2lip_gan.pth...")
!gdown --id 1rQOqRrUODirZReKvTNHqQKfMm8aDzV3j -O checkpoints/wav2lip_gan.pth

print("Downloading wav2lip.pth...")
!gdown --id 1rbC7Q1F5VG3yXw8N8kRIK4rLLvBOJFHn -O checkpoints/wav2lip.pth

%cd ..

print(" Models downloaded!")

📥 Downloading Wav2Lip models...
/content/avatar_chatbot/Wav2Lip
Downloading wav2lip_gan.pth...
Failed to retrieve file url:

	Cannot retrieve the public link of the file. You may need to change
	the permission to 'Anyone with the link', or have had many accesses.
	Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.

You may still be able to access the file from the browser:

	https://drive.google.com/uc?id=1rQOqRrUODirZReKvTNHqQKfMm8aDzV3j

but Gdown can't. Please check connections and permissions.
Downloading wav2lip.pth...
Failed to retrieve file url:

	Cannot retrieve the public link of the file. You may need to change
	the permission to 'Anyone with the link', or have had many accesses.
	Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.

You may still be able to access the file from the browser:

	https://drive.google.com/uc?id=1rbC7Q1F5VG3yXw8N8kRIK4rLLvBOJFHn

but Gdown can't. Please check connections and permissions.
/content/avatar_chatbo
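
Both `gdown` calls failed, so the checkpoint files may be missing or may hold something other than model weights. A simple guard is to check that each file exists and is plausibly large before running inference. This is a minimal sketch; the helper name and the 1 MB default threshold are assumptions for illustration.

```python
import os
import tempfile


def checkpoint_looks_valid(path: str, min_bytes: int = 1_000_000) -> bool:
    """Pass only if the file exists and is at least min_bytes (threshold is an assumption)."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


# Demonstration with throwaway files instead of real checkpoints
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "wav2lip_gan.pth")   # never created
    tiny = os.path.join(tmp, "wav2lip.pth")          # created, but far too small
    with open(tiny, "wb") as f:
        f.write(b"<html>error</html>")
    missing_ok = checkpoint_looks_valid(missing)
    tiny_ok = checkpoint_looks_valid(tiny)
    tiny_ok_low_threshold = checkpoint_looks_valid(tiny, min_bytes=10)

print(missing_ok, tiny_ok, tiny_ok_low_threshold)
```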

### CELL 5: Upload Avatar Video

---


In [None]:

print(" Upload your avatar video...")

from google.colab import files
import os

# Create directories
!mkdir -p /content/avatar_chatbot/videos
!mkdir -p /content/avatar_chatbot/output

# Upload video
print("Please upload your avatar video (MP4 format recommended):")
uploaded = files.upload()

if uploaded:
    # Get the uploaded filename
    filename = list(uploaded.keys())[0]

    # Move to videos directory
    !mv "{filename}" "/content/avatar_chatbot/videos/avatar_video.mp4"

    AVATAR_PATH = "/content/avatar_chatbot/videos/avatar_video.mp4"
    print(f" Video uploaded: {AVATAR_PATH}")

    # Show video info
    print("\n Video information:")
    !ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 {AVATAR_PATH}
else:
    # Use sample if no upload
    print("Using sample video...")
    !wget -q "https://storage.googleapis.com/colab-ai-avatars/sample_talking.mp4" -O "/content/avatar_chatbot/videos/avatar_video.mp4"
    AVATAR_PATH = "/content/avatar_chatbot/videos/avatar_video.mp4"
    print(f" Sample video: {AVATAR_PATH}")

# Display the uploaded video
from IPython.display import Video, display
print("\n Your avatar video:")
display(Video(AVATAR_PATH, embed=True, width=400))

📤 Upload your avatar video...
Please upload your avatar video (MP4 format recommended):


Saving Lesson 7.mp4 to Lesson 7.mp4
✅ Video uploaded: /content/avatar_chatbot/videos/avatar_video.mp4

📊 Video information:
2667.104000

🎬 Your avatar video:
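
`ffprobe` reports the duration in seconds, so the value above is easier to sanity-check once it is converted to minutes and seconds. A small sketch; `format_duration` is a hypothetical helper.

```python
def format_duration(seconds: float) -> str:
    """Render seconds as M:SS, dropping the fractional part."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"


duration = float("2667.104000")  # value reported by ffprobe above
print(format_duration(duration))  # 44:27
```

The uploaded clip is therefore about 44 minutes long, far longer than a typical spoken reply.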


### CELL 6: Simple Chatbot Class

---

In [None]:

print(" Creating chatbot class...")

class SimpleChatBot:
    def __init__(self):
        self.conversation_history = []
        self.personality = "You are a helpful AI assistant with an animated avatar. Keep responses friendly and concise."

    def get_response(self, user_input):
        """Get response from chatbot"""

        # Add to history
        self.conversation_history.append(f"User: {user_input}")

        # Simple rule-based responses
        responses = {
            "hello": "Hello! I'm your AI avatar assistant. Nice to meet you! 👋",
            "hi": "Hi there! I'm excited to chat with you today!",
            "how are you": "I'm doing great! As an AI avatar, I'm always ready to help you.",
            "what is your name": "I'm your AI Avatar Assistant! You can call me Ava.",
            "tell me a joke": "Why don't scientists trust atoms? Because they make up everything! 😄",
            "thank you": "You're welcome! I'm happy to help.",
            "what can you do": "I can chat with you on any topic and respond through my animated avatar with perfect lip-sync!",
            "goodbye": "Goodbye! It was nice chatting with you. Come back anytime!",
            "who created you": "I was created by an AI developer using state-of-the-art lip-sync technology.",
        }

        user_lower = user_input.lower().strip()

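        # Match keywords on word boundaries so that "hi" does not fire inside "this"
        import re
        for key, response in responses.items():
            if re.search(rf"\b{re.escape(key)}\b", user_lower):
                self.conversation_history.append(f"AI: {response}")
                return response

        # No keyword matched: pick a generic template that echoes the user's input
        import random
        responses = [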
            f"That's an interesting question about '{user_input}'. Let me share my thoughts on that.",
            f"I appreciate you asking about '{user_input}'. From my perspective, it's quite fascinating.",
            f"Regarding '{user_input}', I think this is worth discussing. What are your thoughts?",
            f"Thanks for bringing up '{user_input}'. This is something I find really interesting to explore.",
            f"I understand you're asking about '{user_input}'. As an AI avatar, I'm here to help with your questions!"
        ]

        response = random.choice(responses)
        self.conversation_history.append(f"AI: {response}")
        return response

    def clear_history(self):
        """Clear conversation history"""
        self.conversation_history = []
        return "Conversation cleared!"

# Test the chatbot
chatbot = SimpleChatBot()
test_response = chatbot.get_response("Hello!")
print(f" Chatbot test: {test_response}")

🤖 Creating chatbot class...
✅ Chatbot test: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
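
Keyword lookup with short keys such as "hi" is sensitive to how the match is done: a plain substring test also fires inside longer words. The sketch below compares a substring test with a whole-word regular expression; both function names are illustrative.

```python
import re


def substring_match(key: str, text: str) -> bool:
    return key in text.lower()


def whole_word_match(key: str, text: str) -> bool:
    # \b anchors the key at word boundaries; re.escape guards any punctuation in the key
    return re.search(rf"\b{re.escape(key)}\b", text.lower()) is not None


message = "What is this thing?"
print(substring_match("hi", message))       # True: "hi" occurs inside "this" and "thing"
print(whole_word_match("hi", message))      # False
print(whole_word_match("hi", "Hi there!"))  # True
```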


### CELL 7: SIMPLEST TTS (No async)

---


In [None]:

print("🗣️ Creating SIMPLE Text-to-Speech system...")

import os
import time
import subprocess
from pydub import AudioSegment
from IPython.display import Audio, display

class SimpleTTS:
    def __init__(self):
        self.voices = {
            "jenny": "en-US-JennyNeural",
            "guy": "en-US-GuyNeural",
            "aria": "en-US-AriaNeural",
            "sonia": "en-GB-SoniaNeural",
            "natasha": "en-AU-NatashaNeural"
        }
        self.current_voice = "jenny"

        # Create directories
        os.makedirs("/content/avatar_chatbot/audio", exist_ok=True)

    def text_to_speech(self, text):
        """Convert text to speech using edge-tts CLI"""

        # Create output filename
        timestamp = int(time.time())
        output_file = f"/content/avatar_chatbot/audio/tts_{timestamp}.mp3"

        try:
            # Prepare the command
            voice = self.voices[self.current_voice]

            # Create a temporary text file
            temp_text_file = f"/tmp/text_{timestamp}.txt"
            with open(temp_text_file, "w") as f:
                f.write(text)

            # Run edge-tts command
            cmd = f'edge-tts --voice "{voice}" --file "{temp_text_file}" --write-media "{output_file}"'

            print(f"Running: {cmd}")
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

            # Clean up temp file
            os.remove(temp_text_file)

            if result.returncode == 0:
                print(f"✅ TTS successful: {output_file}")
                return output_file
            else:
                print(f"❌ TTS failed: {result.stderr}")
                return self._create_silent_audio(output_file)

        except Exception as e:
            print(f"❌ TTS error: {e}")
            return self._create_silent_audio(output_file)

    def _create_silent_audio(self, output_file, duration_ms=3000):
        """Create silent audio as fallback"""
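        try:
            silent = AudioSegment.silent(duration=duration_ms)
            silent.export(output_file, format="mp3")
            return output_file
        except Exception as e:
            # Report the failure instead of swallowing it silently
            print(f"❌ Fallback audio error: {e}")
            return None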

    def set_voice(self, voice_name):
        """Change TTS voice"""
        if voice_name in self.voices:
            self.current_voice = voice_name
            print(f"✅ Voice changed to: {voice_name}")
            return True
        else:
            print(f"❌ Voice '{voice_name}' not found. Available: {list(self.voices.keys())}")
            return False

    def play_audio(self, audio_file):
        """Play audio file"""
        if os.path.exists(audio_file):
            return Audio(audio_file, autoplay=False)
        else:
            print(f"❌ Audio file not found: {audio_file}")
            return None

# Create TTS instance
tts = SimpleTTS()

# Test
print("\n🧪 Testing TTS system...")
audio_file = tts.text_to_speech("Hello! This is a test of the text to speech system in Google Colab.")
print(f"Generated audio: {audio_file}")

# Play it
if audio_file:
    print("\n🎵 Playing audio...")
    display(tts.play_audio(audio_file))

# Test different voice
print("\n🎭 Testing different voice...")
tts.set_voice("guy")
audio_file2 = tts.text_to_speech("This is another test with a male voice.")
if audio_file2:
    display(tts.play_audio(audio_file2))

print("\n TTS system ready!")

🗣️ Creating SIMPLE Text-to-Speech system...

🧪 Testing TTS system...
Running: edge-tts --voice "en-US-JennyNeural" --file "/tmp/text_1766678937.txt" --write-media "/content/avatar_chatbot/audio/tts_1766678937.mp3"
✅ TTS successful: /content/avatar_chatbot/audio/tts_1766678937.mp3
Generated audio: /content/avatar_chatbot/audio/tts_1766678937.mp3

🎵 Playing audio...



🎭 Testing different voice...
✅ Voice changed to: guy
Running: edge-tts --voice "en-US-GuyNeural" --file "/tmp/text_1766678938.txt" --write-media "/content/avatar_chatbot/audio/tts_1766678938.mp3"
✅ TTS successful: /content/avatar_chatbot/audio/tts_1766678938.mp3



✅ TTS system ready!
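
The output filenames are built from `int(time.time())`, which has one-second resolution, so two calls within the same second would write to the same path. A hedged alternative is to use a random suffix. `unique_audio_path` below is a hypothetical helper, not part of the notebook.

```python
import os
import uuid


def unique_audio_path(audio_dir: str, prefix: str = "tts") -> str:
    """Build a collision-resistant output path using a random UUID suffix."""
    return os.path.join(audio_dir, f"{prefix}_{uuid.uuid4().hex}.mp3")


first = unique_audio_path("/content/avatar_chatbot/audio")
second = unique_audio_path("/content/avatar_chatbot/audio")
print(first)
print(second)
print(first != second)
```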


### CELL 8: Video Processing Helpers

---

In [None]:


print(" Creating video processing functions...")

import subprocess
import os
from moviepy.editor import VideoFileClip, AudioFileClip

def check_wav2lip():
    """Check if Wav2Lip is properly installed"""
    wav2lip_path = "/content/avatar_chatbot/Wav2Lip"
    checkpoint = os.path.join(wav2lip_path, "checkpoints", "wav2lip_gan.pth")

    if not os.path.exists(checkpoint):
        print(" Wav2Lip checkpoint not found!")
        return False

    print(" Wav2Lip is ready!")
    return True

def create_simple_video(face_video, audio_file, output_path):
    """Create a simple video by combining face with audio"""
    try:
        # Load video and audio
        video = VideoFileClip(face_video)
        audio = AudioFileClip(audio_file)

        # Match durations
        video_duration = min(video.duration, audio.duration)
        video_clip = video.subclip(0, video_duration)
        audio_clip = audio.subclip(0, video_duration)

        # Set audio
        final_clip = video_clip.set_audio(audio_clip)

        # Write output
        final_clip.write_videofile(
            output_path,
            codec='libx264',
            audio_codec='aac',
            fps=video.fps,
            verbose=False,
            logger=None
        )

        return output_path
    except Exception as e:
        print(f"❌ Simple video error: {e}")
        return face_video  # Return original as fallback

# Test video processing
print("Video functions ready!")

🎬 Creating video processing functions...





✅ Video functions ready!


### CELL 9: Wav2Lip Lip-Sync Function

---

In [None]:

def generate_lip_sync_video(face_video, audio_file, output_path):
    """
    Generate lip-synced video using Wav2Lip
    Returns: Path to generated video
    """
    print(f"🎬 Generating lip-sync video...")
    print(f"   Face: {face_video}")
    print(f"   Audio: {audio_file}")

    wav2lip_path = "/content/avatar_chatbot/Wav2Lip"
    checkpoint = os.path.join(wav2lip_path, "checkpoints", "wav2lip_gan.pth")

    # Check if Wav2Lip is available
    if not os.path.exists(checkpoint):
        print(" Wav2Lip not found. Using simple video merge.")
        return create_simple_video(face_video, audio_file, output_path)

    try:
        # Run Wav2Lip
        cmd = [
            "python", f"{wav2lip_path}/inference.py",
            "--checkpoint_path", checkpoint,
            "--face", face_video,
            "--audio", audio_file,
            "--outfile", output_path,
            "--pads", "0", "10", "0", "0",
            "--resize_factor", "1",
            "--face_det_batch_size", "4",
            "--wav2lip_batch_size", "32"
        ]

        print(f"   Running: {' '.join(cmd)}")

        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            cwd=wav2lip_path
        )

        if result.returncode == 0:
            print("✅ Lip-sync successful!")
            return output_path
        else:
            print(f"❌ Wav2Lip error: {result.stderr[:200]}")
            return create_simple_video(face_video, audio_file, output_path)

    except Exception as e:
        print(f"❌ Error: {e}")
        return create_simple_video(face_video, audio_file, output_path)

# Test the function
print(" Lip-sync function ready!")

✅ Lip-sync function ready!
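
The function above tries Wav2Lip first and falls back to a plain audio/video merge when the checkpoint is missing or the run fails. The pattern can be isolated as follows; this is only a sketch with simulated stages, and `with_fallback`, `fake_lip_sync`, and `fake_merge` are hypothetical names.

```python
def with_fallback(primary, fallback, *args):
    """Call primary(*args); on any exception, report it and call fallback(*args)."""
    try:
        return primary(*args)
    except Exception as exc:
        print(f"primary failed ({exc}); using fallback")
        return fallback(*args)


def fake_lip_sync(face, audio, out):
    # Simulates the missing Wav2Lip checkpoint
    raise FileNotFoundError("checkpoint missing")


def fake_merge(face, audio, out):
    # Pretends the merged video was written and returns its path
    return out


path = with_fallback(fake_lip_sync, fake_merge, "face.mp4", "speech.mp3", "out.mp4")
print(path)
```

Note that a fallback of this kind returns a video without lip-sync, so callers that need to know which path ran should return that information alongside the file path.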


### CELL 10: Complete Pipeline Class
---


In [None]:

print(" Creating complete pipeline...")

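import os
import time
import datetime

# Assumption: no cell shown in this notebook defines TTSSystem, so it is provided here as a
# thin wrapper over SimpleTTS that exposes the generate_speech() name the pipeline calls.
class TTSSystem(SimpleTTS):
    def generate_speech(self, text):
        return self.text_to_speech(text)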

class AIChatbotPipeline:
    def __init__(self, avatar_path):
        self.avatar_path = avatar_path
        self.chatbot = SimpleChatBot()
        self.tts = TTSSystem()

        # Create output directories
        self.output_dir = "/content/avatar_chatbot/output"
        self.audio_dir = "/content/avatar_chatbot/audio"
        os.makedirs(self.output_dir, exist_ok=True)
        os.makedirs(self.audio_dir, exist_ok=True)

        print(f"✅ Pipeline initialized with avatar: {avatar_path}")

    def process_message(self, user_message):
        """Process user message and generate avatar response"""
        start_time = time.time()

        print(f"\n{'='*60}")
        print(f"💬 Processing: {user_message}")
        print(f"{'='*60}")

        # Step 1: Get chatbot response
        print("1. 🤖 Getting chatbot response...")
        text_response = self.chatbot.get_response(user_message)
        print(f"   Response: {text_response}")

        # Step 2: Convert to speech
        print("2. 🗣️ Converting to speech...")
        audio_path = self.tts.generate_speech(text_response)
        print(f"   Audio saved: {audio_path}")

        # Step 3: Generate lip-sync video
        print("3. 🎬 Generating lip-sync video...")
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        video_output = os.path.join(self.output_dir, f"response_{timestamp}.mp4")

        video_path = generate_lip_sync_video(
            self.avatar_path,
            audio_path,
            video_output
        )

        print(f"   Video saved: {video_path}")

        # Step 4: Calculate processing time
        processing_time = time.time() - start_time
        print(f" Total processing time: {processing_time:.1f} seconds")

        return {
            "text": text_response,
            "audio": audio_path,
            "video": video_path,
            "processing_time": processing_time
        }

    def set_voice(self, voice_name):
        """Change TTS voice"""
        return self.tts.set_voice(voice_name)

    def clear_history(self):
        """Clear conversation history"""
        return self.chatbot.clear_history()

# Initialize pipeline
pipeline = AIChatbotPipeline(AVATAR_PATH)
print(" Pipeline ready to use!")

🔧 Creating complete pipeline...
✅ Pipeline initialized with avatar: /content/avatar_chatbot/videos/avatar_video.mp4
✅ Pipeline ready to use!


### CELL 11: Test the Complete System (FIXED)

---


In [None]:

print("🧪 Testing the complete pipeline...")

# Define a TTS class that calls the edge-tts CLI in a subprocess,
# then rebuild the pipeline around it.

# Going through the CLI avoids the async issues of the TTSSystem class
import os
import time
import subprocess
from pydub import AudioSegment

class FixedTTSSystem:
    """TTS system that avoids async issues"""

    def __init__(self):
        self.voices = {
            "jenny": "en-US-JennyNeural",
            "guy": "en-US-GuyNeural",
            "aria": "en-US-AriaNeural",
            "sonia": "en-GB-SoniaNeural",
            "natasha": "en-AU-NatashaNeural"
        }
        self.current_voice = "jenny"
        self.audio_dir = "/content/avatar_chatbot/audio"
        os.makedirs(self.audio_dir, exist_ok=True)

    def generate_speech(self, text):
        """Generate speech using edge-tts CLI (no async)"""
        output_path = os.path.join(self.audio_dir, f"response_{int(time.time())}.mp3")

        try:
            # Create a temporary text file
            temp_file = f"/tmp/tts_text_{int(time.time())}.txt"
            with open(temp_file, "w") as f:
                f.write(text)

            # Run edge-tts command
            voice = self.voices[self.current_voice]
            cmd = f'edge-tts --voice "{voice}" --file "{temp_file}" --write-media "{output_path}"'

            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

            # Clean up
            os.remove(temp_file)

            if result.returncode == 0 and os.path.exists(output_path):
                return output_path
            else:
                print(f"TTS CLI error: {result.stderr}")
                return self._create_fallback_audio(output_path)

        except Exception as e:
            print(f"TTS Error: {e}")
            return self._create_fallback_audio(output_path)

    def _create_fallback_audio(self, output_path, duration_ms=3000):
        """Create silent audio as fallback"""
        try:
            silent = AudioSegment.silent(duration=duration_ms)
            silent.export(output_path, format="mp3")
            return output_path
        except Exception as e:
            print(f"Fallback audio error: {e}")
            return output_path

    def set_voice(self, voice_name):
        """Change voice"""
        if voice_name in self.voices:
            self.current_voice = voice_name
            return f"Voice changed to {voice_name}"
        else:
            return f"Voice not found. Available: {', '.join(self.voices.keys())}"

# Now let's update the pipeline to use the fixed TTS
print(" Updating pipeline with fixed TTS...")

# Recreate the pipeline class with fixed TTS
class FixedAIChatbotPipeline:
    def __init__(self, avatar_path):
        self.avatar_path = avatar_path
        self.chatbot = SimpleChatBot()
        self.tts = FixedTTSSystem()  # Use fixed TTS

        # Create output directories
        self.output_dir = "/content/avatar_chatbot/output"
        self.audio_dir = "/content/avatar_chatbot/audio"
        os.makedirs(self.output_dir, exist_ok=True)
        os.makedirs(self.audio_dir, exist_ok=True)

        print(f"✅ Fixed pipeline initialized with avatar: {avatar_path}")

    def process_message(self, user_message):
        """Process user message and generate avatar response"""
        start_time = time.time()

        print(f"\n{'='*60}")
        print(f"💬 Processing: {user_message}")
        print(f"{'='*60}")

        # Step 1: Get chatbot response
        print("1. 🤖 Getting chatbot response...")
        text_response = self.chatbot.get_response(user_message)
        print(f"   Response: {text_response}")

        # Step 2: Convert to speech (using fixed TTS)
        print("2. 🗣️ Converting to speech...")
        audio_path = self.tts.generate_speech(text_response)
        print(f"   Audio saved: {audio_path}")

        # Step 3: Generate lip-sync video
        print("3.  Generating lip-sync video...")
        import datetime
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        video_output = os.path.join(self.output_dir, f"response_{timestamp}.mp4")

        # Use the lip-sync function
        video_path = generate_lip_sync_video(
            self.avatar_path,
            audio_path,
            video_output
        )

        print(f"   Video saved: {video_path}")

        # Step 4: Calculate processing time
        processing_time = time.time() - start_time
        print(f" Total processing time: {processing_time:.1f} seconds")

        return {
            "text": text_response,
            "audio": audio_path,
            "video": video_path,
            "processing_time": processing_time
        }

# Initialize the fixed pipeline
print("\n🚀 Initializing fixed pipeline...")
fixed_pipeline = FixedAIChatbotPipeline(AVATAR_PATH)

# Test with a simple message
print("\n🧪 Running test...")
test_message = "Hello, can you introduce yourself?"
result = fixed_pipeline.process_message(test_message)

print(f"\n📋 Test Results:")
print(f"📝 Text: {result['text']}")
print(f"🎵 Audio: {result['audio']}")
print(f"🎬 Video: {result['video']}")
print(f"⏱️ Time: {result['processing_time']:.1f}s")

# Display the result
print("\n Generated Video:")
from IPython.display import Video, Audio, display

# Check if video file exists
import os
if os.path.exists(result['video']):
    file_size = os.path.getsize(result['video'])
    print(f"Video file size: {file_size / 1024 / 1024:.2f} MB")

    if file_size > 10000:  # If file has reasonable size
        display(Video(result['video'], embed=True, width=500))
    else:
        print(" Video file is too small (may be empty or failed)")

        # Show original avatar as fallback
        print("Showing original avatar instead:")
        display(Video(AVATAR_PATH, embed=True, width=500))
else:
    print(f" Video file not found: {result['video']}")
    print("Showing original avatar instead:")
    display(Video(AVATAR_PATH, embed=True, width=500))

print("\n🎵 Generated Audio:")
if os.path.exists(result['audio']):
    audio_size = os.path.getsize(result['audio'])
    print(f"Audio file size: {audio_size / 1024:.2f} KB")

    if audio_size > 1000:
        display(Audio(result['audio'], autoplay=False))
    else:
        print(" Audio file is too small (may be empty)")
else:
    print(f" Audio file not found: {result['audio']}")

print("\n Test completed successfully!")

🧪 Testing the complete pipeline...
🔄 Updating pipeline with fixed TTS...

🚀 Initializing fixed pipeline...
✅ Fixed pipeline initialized with avatar: /content/avatar_chatbot/videos/avatar_video.mp4

🧪 Running test...

💬 Processing: Hello, can you introduce yourself?
1. 🤖 Getting chatbot response...
   Response: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
2. 🗣️ Converting to speech...
   Audio saved: /content/avatar_chatbot/audio/response_1766679543.mp3
3. 🎬 Generating lip-sync video...
🎬 Generating lip-sync video...
   Face: /content/avatar_chatbot/videos/avatar_video.mp4
   Audio: /content/avatar_chatbot/audio/response_1766679543.mp3
⚠️ Wav2Lip not found. Using simple video merge.
   Video saved: /content/avatar_chatbot/output/response_20251225_161905.mp4
✅ Total processing time: 16.0 seconds

📋 Test Results:
📝 Text: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
🎵 Audio: /content/avatar_chatbot/audio/response_1766679543.mp3
🎬 Video: /content/avatar_chatbot/outpu


🎵 Generated Audio:
Audio file size: 45.00 KB



✅ Test completed successfully!


### QUICK FIX: Update TTS in existing pipeline

---


In [None]:

print(" Applying quick fix to pipeline...")

# Update the pipeline's TTS to use CLI method
import subprocess
import tempfile

def fixed_generate_speech(text):
    """Fixed TTS function using CLI"""
    import time
    output_path = f"/content/avatar_chatbot/audio/fixed_response_{int(time.time())}.mp3"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)

    try:
        # Create temp file with text
        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
            f.write(text)
            temp_file = f.name

        # Use edge-tts CLI (the voice is hard-coded here, so set_voice() does not affect this function)
        cmd = f'edge-tts --voice "en-US-GuyNeural" --file "{temp_file}" --write-media "{output_path}"'
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

        # Clean up
        os.unlink(temp_file)

        if result.returncode == 0:
            return output_path
        else:
            print(f"TTS CLI error: {result.stderr}")
    except Exception as e:
        print(f"TTS error: {e}")

    # Fallback: create silent audio
    from pydub import AudioSegment
    silent = AudioSegment.silent(duration=3000)
    silent.export(output_path, format="mp3")
    return output_path

# Monkey-patch the pipeline's TTS method
if hasattr(pipeline, 'tts'):
    pipeline.tts.generate_speech = fixed_generate_speech
    print(" Patched pipeline TTS method")

# Now test again
print("\n Re-testing with fixed TTS...")
test_message = "Hello, can you introduce yourself?"
result = pipeline.process_message(test_message)

print(f"\n📋 Test Results:")
print(f"📝 Text: {result['text']}")
print(f"🎵 Audio: {result['audio']}")
print(f"🎬 Video: {result['video']}")
print(f"⏱️ Time: {result['processing_time']:.1f}s")

# Display
if os.path.exists(result['video']):
    display(Video(result['video'], embed=True, width=500))
if os.path.exists(result['audio']):
    display(Audio(result['audio'], autoplay=False))

🔧 Applying quick fix to pipeline...
✅ Patched pipeline TTS method

🧪 Re-testing with fixed TTS...

💬 Processing: Hello, can you introduce yourself?
1. 🤖 Getting chatbot response...
   Response: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
2. 🗣️ Converting to speech...
   Audio saved: /content/avatar_chatbot/audio/fixed_response_1766679981.mp3
3. 🎬 Generating lip-sync video...
🎬 Generating lip-sync video...
   Face: /content/avatar_chatbot/videos/avatar_video.mp4
   Audio: /content/avatar_chatbot/audio/fixed_response_1766679981.mp3
⚠️ Wav2Lip not found. Using simple video merge.
   Video saved: /content/avatar_chatbot/output/response_20251225_162623.mp4
✅ Total processing time: 17.2 seconds

📋 Test Results:
📝 Text: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
🎵 Audio: /content/avatar_chatbot/audio/fixed_response_1766679981.mp3
🎬 Video: /content/avatar_chatbot/output/response_20251225_162623.mp4
⏱️ Time: 17.2s
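
The patched TTS function hardcodes `en-US-GuyNeural`, while the Gradio interface in this notebook offers the short voice names `jenny`, `guy`, `aria`, `sonia`, and `natasha`. A minimal sketch of how the CLI call could honor that choice is shown here. `VOICE_IDS` and `build_edge_tts_command` are hypothetical names, and the voice IDs in the mapping are assumptions that should be checked against `edge-tts --list-voices`.

```python
# Hypothetical mapping from the notebook's short voice names to edge-tts voice IDs.
# The IDs are assumptions; confirm them with `edge-tts --list-voices`.
VOICE_IDS = {
    "jenny": "en-US-JennyNeural",
    "guy": "en-US-GuyNeural",
    "aria": "en-US-AriaNeural",
    "sonia": "en-GB-SoniaNeural",
    "natasha": "en-AU-NatashaNeural",
}


def build_edge_tts_command(text_file, output_path, voice="guy"):
    """Return the edge-tts argument list for the given short voice name."""
    # Unknown names fall back to the voice the quick fix already uses
    voice_id = VOICE_IDS.get(voice, VOICE_IDS["guy"])
    return [
        "edge-tts",
        "--voice", voice_id,
        "--file", text_file,
        "--write-media", output_path,
    ]


cmd = build_edge_tts_command("/tmp/in.txt", "/tmp/out.mp3", voice="jenny")
print(cmd)
```

The returned list can be passed straight to `subprocess.run` without `shell=True`, which avoids quoting problems in file paths.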


### CELL 11: Test the Complete System

---


In [None]:

print("🧪 Testing the complete pipeline...")

# Test with a simple message
test_message = "Hello, can you introduce yourself?"
result = pipeline.process_message(test_message)

print(f"\n📋 Test Results:")
print(f"📝 Text: {result['text']}")
print(f"🎵 Audio: {result['audio']}")
print(f"🎬 Video: {result['video']}")
print(f"⏱️ Time: {result['processing_time']:.1f}s")

# Display the result
from IPython.display import Video, Audio, display

print("\n🎬 Generated Video:")
display(Video(result['video'], embed=True, width=500))

print("\n🎵 Generated Audio:")
display(Audio(result['audio'], autoplay=False))

print("✅ Test completed successfully!")

🧪 Testing the complete pipeline...

💬 Processing: Hello, can you introduce yourself?
1. 🤖 Getting chatbot response...
   Response: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
2. 🗣️ Converting to speech...
   Audio saved: /content/avatar_chatbot/audio/fixed_response_1766680319.mp3
3. 🎬 Generating lip-sync video...
🎬 Generating lip-sync video...
   Face: /content/avatar_chatbot/videos/avatar_video.mp4
   Audio: /content/avatar_chatbot/audio/fixed_response_1766680319.mp3
⚠️ Wav2Lip not found. Using simple video merge.
   Video saved: /content/avatar_chatbot/output/response_20251225_163201.mp4
✅ Total processing time: 17.0 seconds

📋 Test Results:
📝 Text: Hello! I'm your AI avatar assistant. Nice to meet you! 👋
🎵 Audio: /content/avatar_chatbot/audio/fixed_response_1766680319.mp3
🎬 Video: /content/avatar_chatbot/output/response_20251225_163201.mp4
⏱️ Time: 17.0s

🎬 Generated Video:



🎵 Generated Audio:


✅ Test completed successfully!
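
The log above reports `Wav2Lip not found. Using simple video merge.`, so the output video here is the avatar clip with the new audio attached, not a lip-synced result. The merge code itself is not shown in this section. One common way to do such a merge is an ffmpeg mux, sketched below under that assumption; `build_merge_command` and `merge_audio_video` are hypothetical helpers, not the pipeline's actual method.

```python
import shutil
import subprocess


def build_merge_command(face_video, audio_file, output_video):
    """Build an ffmpeg command that replaces the video's audio track."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", face_video,        # input 0: avatar video
        "-i", audio_file,        # input 1: generated speech
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0",         # take the audio stream from input 1
        "-c:v", "copy",          # copy video without re-encoding
        "-c:a", "aac",           # encode audio as AAC for the MP4 container
        "-shortest",             # stop at the end of the shorter stream
        output_video,
    ]


def merge_audio_video(face_video, audio_file, output_video):
    """Run the merge; raise if ffmpeg is missing or reports an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    completed = subprocess.run(
        build_merge_command(face_video, audio_file, output_video),
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        raise RuntimeError(f"ffmpeg failed: {completed.stderr}")
    return output_video


merge_cmd = build_merge_command("avatar_video.mp4", "response.mp3", "response.mp4")
print(" ".join(merge_cmd))
```

A merge like this does not move the avatar's mouth, so getting real lip-sync still depends on Wav2Lip being installed and found by the pipeline.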


### CELL 12: Gradio Web Interface

---


In [None]:

print("🌐 Creating web interface...")

import gradio as gr

def create_simple_interface():
    """Create a simple Gradio interface"""

    with gr.Blocks(title="AI Avatar Chatbot", theme=gr.themes.Soft()) as demo:
        gr.Markdown("# 🤖 AI Avatar Chatbot with Lip-Sync")
        gr.Markdown("Chat with an AI that responds through an animated avatar with perfect lip-sync!")

        with gr.Row():
            # Left column - Chat
            with gr.Column():
                gr.Markdown("## 💬 Chat")

                # Chat history (tuple format: each entry is (user_message, bot_message);
                # None leaves the user side empty)
                chatbot = gr.Chatbot(
                    value=[(None, "Hello! I'm your AI avatar assistant. How can I help you?")],
                    height=300
                )

                # User input
                msg = gr.Textbox(
                    placeholder="Type your message here...",
                    label="Your Message",
                    scale=4
                )

                with gr.Row():
                    send_btn = gr.Button("Send", variant="primary")
                    clear_btn = gr.Button("Clear Chat")

                # Status
                status = gr.Textbox(
                    label="Status",
                    value="Ready",
                    interactive=False
                )

            # Right column - Avatar
            with gr.Column():
                gr.Markdown("## 🎭 Avatar")

                # Avatar video display
                avatar_video = gr.Video(
                    value=AVATAR_PATH,
                    label="Your Avatar",
                    interactive=False
                )

                # Response video
                response_video = gr.Video(
                    label="AI Response",
                    interactive=False
                )

                # Voice selection
                voice_dropdown = gr.Dropdown(
                    choices=["jenny", "guy", "aria", "sonia", "natasha"],
                    value="jenny",
                    label="Voice"
                )

                # Response text
                response_text = gr.Textbox(
                    label="AI Response Text",
                    lines=3,
                    interactive=False
                )

        # Store conversation state as (user_message, bot_message) pairs
        state = gr.State({
            "conversation": [(None, "Hello! I'm your AI avatar assistant. How can I help you?")]
        })

        def process_message(user_message, voice_choice, state_dict):
            """Process a user message and return the updated UI values"""
            if not user_message.strip():
                # Keep the current chat history instead of blanking the chatbot
                return "", state_dict, "Please enter a message", None, "", state_dict["conversation"]

            # Update voice
            pipeline.set_voice(voice_choice)

            # Process through pipeline
            result = pipeline.process_message(user_message)

            # Append one (user, bot) pair; build a new list so the stored state is not mutated in place
            conversation = state_dict["conversation"] + [(user_message, result["text"])]

            # Update status
            status_msg = f"✅ Generated in {result['processing_time']:.1f}s"

            return "", {"conversation": conversation}, status_msg, result["video"], result["text"], conversation

        def clear_chat(state_dict):
            """Clear chat history"""
            pipeline.clear_history()
            cleared = [(None, "Chat cleared! How can I help you?")]
            return {"conversation": cleared}, cleared

        # Connect events
        send_btn.click(
            fn=process_message,
            inputs=[msg, voice_dropdown, state],
            outputs=[msg, state, status, response_video, response_text, chatbot]
        )

        msg.submit(
            fn=process_message,
            inputs=[msg, voice_dropdown, state],
            outputs=[msg, state, status, response_video, response_text, chatbot]
        )

        clear_btn.click(
            fn=clear_chat,
            inputs=[state],
            outputs=[state, chatbot]
        )

        # Instructions
        gr.Markdown("""
        ## 📖 How to Use

        1. **Type your message** in the text box
        2. **Press Enter** or click **Send**
        3. **Watch** the AI avatar respond with lip-sync
        4. **Change voice** to hear different speaking styles

        ## ⚡ Tips

        - First response may take 30-60 seconds
        - Keep messages concise for faster responses
        - Ensure good lighting in your avatar video for best lip-sync
        """)

    return demo

# Create and launch interface
interface = create_simple_interface()
print("✅ Interface created! Launching...")
interface.launch(debug=True, share=True)


🌐 Creating web interface...
✅ Interface created! Launching...
Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://daacf69dd0aa789c16.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://daacf69dd0aa789c16.gradio.live
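
The chat-history update inside `process_message` can be pulled out into a plain function, which makes it possible to check the logic without launching Gradio. The following is a minimal sketch; `GREETING`, `new_history`, and `append_turn` are hypothetical names, and it assumes the tuple-style history where each entry is a `(user_message, bot_message)` pair.

```python
GREETING = "Hello! I'm your AI avatar assistant. How can I help you?"


def new_history():
    """Start a conversation with a bot-only greeting (no user message)."""
    return [(None, GREETING)]


def append_turn(history, user_message, bot_reply):
    """Return a new history with one (user, bot) pair appended.

    Blank user messages leave the history unchanged.
    """
    user_message = user_message.strip()
    if not user_message:
        return history
    # Build a new list so the caller's history is not mutated in place
    return history + [(user_message, bot_reply)]


history = new_history()
history = append_turn(history, "Hello, can you introduce yourself?", "Nice to meet you!")
print(history[-1])
```

Inside the Gradio handler, the value returned by `append_turn` could be used both as the new state and as the `chatbot` output.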


