# 🎤 Enhanced Voice Cloning with Zonos TTS - Google Colab\n\n

In [None]:
#@title 1. 📥 Setup and Clone Repository\nimport os\nimport subprocess\nimport sys\n\nprint(\)\nprint(\ * 40)\n\n# Check if we're in Colab\ntry:\n    import google.colab\n    IN_COLAB = True\n    print(\)\nexcept ImportError:\n    IN_COLAB = False\n    print(\)\n\n# Clone the repository if it doesn't exist\nif not os.path.exists('Zonos'):\n    print(\)\n    !git clone https://github.com/Wamp1re-Ai/Zonos.git\n    print(\)\nelse:\n    print(\)\n\n# Change to the Zonos directory\n%cd Zonos\n\n# Install system dependencies\nprint(\)\n!apt-get update -qq\n!apt-get install -y espeak-ng git-lfs -qq\n!git lfs install\nprint(\)\n\n# Check for enhanced files\nif os.path.exists('enhanced_voice_cloning.py'):\n    print(\)\n    print(\)\nelse:\n    print(\)\n\nprint(\)

In [None]:
#@title 2. ⚡ Install Dependencies with UV (Ultra-Fast Installation)\nimport subprocess\nimport sys\nimport os\nimport time\n\nprint(\)\nprint(\ * 50)\n\nstart_time = time.time()\n\n# Step 1: Install UV for ultra-fast package management\nprint(\)\ntry:\n    # Check if uv is already installed\n

In [None]:
#@title 3. 🤖 Load Enhanced Zonos Model\n# IMPORTANT: If you have modified the underlying Zonos Python files (e.g., zonos/model.py),\n# you MUST re-run this cell for those changes to take effect in the model.\nimport sys\nimport os\n\nprint(\)\nprint(\ * 40)\n\n# Make sure we can import zonos modules\ncurrent_dir = os.getcwd()\nif current_dir not in sys.path:\n    sys.path.insert(0, current_dir)\n\n# Check NumPy version (should be fixed by Cell 2)\nprint(\)\ntry:\n    import numpy as np\n    numpy_version = np.__version__\n

In [None]:
#@title 4. 📜 Helper Functions for Long Audio & Text Segmentation\n#@markdown This cell defines utility functions for segmenting long text and generating audio in chunks.\n#@markdown **Run this cell once after loading the model (Cell 3) and before generating speech (Cell 6).**\nimport nltk\nimport torch\nimport torchaudio\nimport time\n\ndef setup_nltk_punkt():\n    try:\n        nltk.data.find('tokenizers/punkt')\n    # nltk.data.find raises LookupError when the resource is missing\n    except LookupError:\n        nltk.download('punkt', quiet=True)\n    print(\)\n\n
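The rest of this cell is not shown, so the following is only a minimal sketch of the general idea behind segmenting long text: split it into sentences, then greedily pack whole sentences into chunks small enough to generate one at a time. It uses a simple regex as a stand-in for the NLTK punkt tokenizer so that it runs without downloads. The names `split_sentences`, `chunk_sentences`, and the `max_chars` limit are hypothetical and are not taken from the notebook.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.

    Stand-in for a real tokenizer such as NLTK punkt (assumption for this sketch).
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One. Two! Three? Four."
    for chunk in chunk_sentences(split_sentences(sample), max_chars=9):
        print(repr(chunk))
```

Packing whole sentences rather than cutting at a fixed character count keeps each chunk ending at a natural pause, which is the usual reason for tokenizing into sentences first.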

In [None]:
#@title 5. 🎤 Upload Voice Sample for Cloning\nfrom google.colab import files\nimport torchaudio\nimport torch\nimport IPython.display as ipd\n\nprint(\)\nprint(\)\nprint(\)\nprint(\)\n\n# Upload audio file\nuploaded = files.upload()\n\nif uploaded:\n    # Get the uploaded file\n

In [None]:
#@title 6. 🎤 Generate Speech with Enhanced Voice Cloning\nimport IPython.display as ipd\nimport torch\nimport time\n\n#@markdown ### Text and Settings\n#@markdown The system now automatically handles long input texts by breaking them into chunks for generation. This allows for much longer audio outputs.\ntext = \ #@param {type:\}\nlanguage = \ #@param [\, \, \, \, \, \, \, \

In [None]:
#@title 7. 📊 Run CFG Scale Benchmarks\n#@markdown This cell runs benchmarks with different CFG scales (1.0, 1.5, 2.2). Other generation parameters (pitch, rate, sampling) are based on the 'Balanced' preset to isolate the impact of CFG Scale.\n#@markdown - **CFG Scale 1.0**: Typically offers the fastest generation but may result in the least expressive or most robotic audio.\n#@markdown - **CFG Scale 1.5**: Used by the \ preset in Cell 6. Aims for a balance between speed and quality, though still less expressive than higher CFG scales.\n#@markdown - **CFG Scale 2.2**: Default for the \ preset in Cell 6, offering a good blend of quality and naturalness.\n#@markdown The results show the Real-Time Factor (RTF), audio duration, and generation time, and let you listen to each sample.\n#@markdown \n#@markdown **IMPORTANT:** If `zonos/model.py` (or other underlying model code) has been changed due to updates or local modifications, you **MUST re-run Cell 3 (Load Model)** to load the new model code *before* running these benchmarks or generating audio in Cell 6.\n\nimport time\nimport torchaudio\nimport IPython.display as ipd\nimport numpy as np\nimport os\n\nbenchmark_audio_dir = \\nif not os.path.exists(benchmark_audio_dir):\n    os.makedirs(benchmark_audio_dir)\n\ndef run_benchmark_trial(text_input, language_code, seed_value, cfg_scale_to_test, quality_preset_value, \n                        speaker_embedding_tensor, voice_quality_data, \n                        zonos_model, torch_device, \n                        run_warmup=False):\n    print(f\)\n    print(f\
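The benchmark cell reports a Real-Time Factor, but its computation is cut off above. As a rough sketch, assuming the common convention that RTF is generation time divided by audio duration (so values below 1.0 mean faster than real time), it could be computed as follows. The helper names `real_time_factor` and `time_call` are hypothetical, and the notebook may define RTF differently.

```python
import time


def real_time_factor(generation_seconds, num_samples, sample_rate):
    """Return generation time divided by audio duration.

    Assumed convention: RTF < 1.0 means audio is produced faster than it plays.
    """
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return generation_seconds / audio_seconds


def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # 4 seconds of audio at 44.1 kHz generated in 2 seconds of wall-clock time.
    print(real_time_factor(2.0, 44100 * 4, 44100))
```

On a GPU, a warm-up run before timing (as the `run_warmup` parameter suggests) avoids counting one-time initialization in the first measurement.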

---\n## 🎉 Enhanced Voice Cloning Complete!\n\nYou've successfully used the enhanced voice cloning system with Zonos TTS. This notebook provides a comprehensive suite for voice cloning, generation, and performance benchmarking.\n\n### 🚀 What's Enhanced & Key Features:\n- **Improved Speech Quality**: Significant reductions in gibberish, better timing consistency, and more natural speech flow.\n- **Advanced Audio Preprocessing**: Automatic silence removal and normalization for uploaded voice samples.\n- **Voice Quality Analysis**: SNR estimation and quality scoring for your voice samples to guide parameter choices.\n- **Flexible Quality Presets**: \n    - Choose from presets like \, \, \, \, and \.\n    - \ preset uses a lower CFG Scale (1.5) for quicker generation with a trade-off in expressiveness.\n    - \ and \ presets now incorporate specific emotion vectors for more vivid speech.\n- **Adaptive Settings**: Parameters automatically adjust based on your voice sample's quality and chosen preset.\n- **CFG Scale Control**: Support for `cfg_scale=1.0` (and other values) in `zonos/model.py` allows for fine-tuning the balance between speed and expressiveness. 
This is benchmarked in Cell 7.\n- **Long Text Handling**: Cell 6 now automatically segments very long texts and generates audio in chunks using audio prefixing for continuity.\n- **Reproducible Results**: Seed support for consistent audio generation.\n- **Google Colab Compatibility**: Streamlined setup and dependency management within the Colab environment.\n- **Benchmarking Tools**: Cell 7 allows for systematic testing of different CFG Scales to understand performance and quality trade-offs.\n\n### 💡 Tips for Best Results:\n- Use clean, high-quality audio (16kHz+ sample rate, minimal background noise/music) for voice cloning.\n- Provide 10-20 seconds of clear speech for optimal cloning.\n- Experiment with different Quality Presets in Cell 6 to find the best match for your needs.\n- If modifying underlying code (like `zonos/model.py`), always re-run Cell 3 (Load Model) and Cell 4 (Helper Functions) to apply changes.\n\n### 🔧 If You Encountered Issues:\n- **NumPy or other dependency errors**: Try `Runtime` > `Restart runtime` (or `Factory reset runtime`) then re-run cells from the beginning (Cell 1 onwards).\n- **`generate_long_audio_from_text` not defined**: Ensure Cell 4 (Helper Functions) has been run successfully.\n- **Model loading errors after code changes**: Ensure you've re-run Cell 3.\n- **Memory errors**: Try shorter text for generation or restart the runtime.\n- **Audio quality issues**: Use cleaner source audio for cloning. Experiment with different presets in Cell 6.\n\n---\n\n**🎤 Thank you for using Enhanced Voice Cloning with Zonos TTS!**\n\n
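The tips on sample rate and clip length can be checked before uploading. The sketch below is one possible way to do that for uncompressed WAV files using only Python's standard `wave` module. The function name `check_voice_sample` and its default thresholds (taken from the tips: 10 to 20 seconds, 16 kHz or higher) are assumptions for illustration and are not part of the notebook, which uses `torchaudio` for audio loading.

```python
import wave


def check_voice_sample(path, min_seconds=10.0, max_seconds=20.0, min_rate=16000):
    """Return (duration_seconds, sample_rate, problems) for a PCM WAV file.

    problems is a list of human-readable strings; it is empty when the
    clip meets the suggested duration and sample-rate guidelines.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    problems = []
    if sample_rate < min_rate:
        problems.append(f"sample rate {sample_rate} Hz is below {min_rate} Hz")
    if duration < min_seconds:
        problems.append(f"clip is {duration:.1f} s, shorter than {min_seconds:.0f} s")
    elif duration > max_seconds:
        problems.append(f"clip is {duration:.1f} s, longer than {max_seconds:.0f} s")
    return duration, sample_rate, problems
```

A clip that fails these checks may still work; the thresholds only mirror the guidelines listed under the tips.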