# 🎙️ Prepare Your Training Data for StarGANv2-VC

## This notebook helps you prepare your voice data for training!

### 📋 What You Need:
1. **Your voice recording** (WAV file)
   - Minimum: 30 minutes
   - Recommended: 1-2 hours
   - Any sample rate (will be converted to 24kHz)

### 🎯 What This Notebook Does:
1. Uploads your voice recording
2. Converts it to 24kHz mono and peak-normalizes it
3. Splits it into 5-second segments
4. Creates a 90/10 train/validation split
5. Generates `MyVoice.zip` for Colab training

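As a rough sanity check on the steps above, the segment and split counts follow from simple integer arithmetic. The sketch below is illustrative only: `segment_starts` and `split_counts` are hypothetical helpers, not part of the notebook, and they assume the 24kHz sample rate, 5-second segments, and 90/10 split described here.

```python
def segment_starts(n_samples, segment_length):
    """Start offsets of every full segment; a trailing partial segment is dropped."""
    return list(range(0, n_samples - segment_length + 1, segment_length))


def split_counts(n_segments, train_fraction=0.9):
    """Return (train, validation) counts, truncating the training share."""
    n_train = int(n_segments * train_fraction)
    return n_train, n_segments - n_train


sample_rate = 24000                 # target rate used by this notebook
segment_length = sample_rate * 5    # samples in one 5-second segment
one_hour = sample_rate * 3600       # samples in one hour of audio

starts = segment_starts(one_hour, segment_length)
print(len(starts))                  # 720
print(split_counts(len(starts)))    # (648, 72)
```

So one hour of audio should yield about 720 segments, of which roughly 648 go to training and 72 to validation.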
### ⏱️ Processing Time:
- ~5-10 minutes for 1 hour of audio

---

## 1️⃣ Install Dependencies

In [None]:
!pip install -q librosa soundfile tqdm
print("✅ Dependencies installed!")

## 2️⃣ Upload Your Voice Recording

In [None]:
from google.colab import files
import os

print("📤 Please upload your voice recording (WAV file)...")
print("   Minimum: 30 minutes; recommended: 1-2 hours of diverse speech\n")

uploaded = files.upload()

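# Get the uploaded filename (files.upload() returns a dict keyed by filename)
if not uploaded:
    raise RuntimeError("No file was uploaded; re-run this cell and choose a WAV file.")
audio_file = next(iter(uploaded))
print(f"\n✅ Uploaded: {audio_file}")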
print(f"   Size: {os.path.getsize(audio_file) / 1024**2:.2f} MB")

## 3️⃣ Analyze Your Audio

In [None]:
import librosa
import numpy as np

print("🔍 Analyzing audio file...\n")

# Load audio
audio, sr = librosa.load(audio_file, sr=None, mono=False)

# Get info
duration = librosa.get_duration(y=audio, sr=sr)
channels = 1 if audio.ndim == 1 else audio.shape[0]

print(f"📊 Audio Information:")
print(f"   Sample Rate: {sr} Hz")
print(f"   Channels: {channels} ({'Mono' if channels == 1 else 'Stereo'})")
print(f"   Duration: {duration/60:.2f} minutes ({duration:.2f} seconds)")
print(f"   Size: {os.path.getsize(audio_file) / 1024**2:.2f} MB")

# Calculate expected segments
segment_duration = 5.0  # seconds
expected_segments = int(duration / segment_duration)

print(f"\n✅ Expected output: ~{expected_segments} segments")

if duration < 1800:  # 30 minutes
    print("\n⚠️  Warning: Less than 30 minutes of audio")
    print("   Recommendation: Record more for better quality!")
elif duration > 7200:  # 2 hours
    print("\n✅ Excellent! More than 2 hours of data!")
else:
    print("\n✅ Good amount of data!")

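The analysis cell above decodes the whole recording just to report its sample rate, channel count, and duration. For plain PCM WAV files, the same facts can be read from the file header alone. The following is a minimal sketch using Python's standard-library `wave` module; `wav_info` is a hypothetical helper name, and it will not work for WAV variants that `wave` cannot open (for example floating-point or compressed files).

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        n_frames = wf.getnframes()
    # One frame holds one sample per channel, so frames / rate gives seconds.
    return sample_rate, channels, n_frames / sample_rate
```

Calling `wav_info(audio_file)` on the uploaded file should agree with the values printed by the cell above, without loading the samples into memory.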
## 4️⃣ Process Audio Data

In [None]:
import librosa
import soundfile as sf
import numpy as np
from tqdm import tqdm
import os

# Create output directory
output_dir = "Data/MyVoice"
os.makedirs(output_dir, exist_ok=True)

print("🎵 Processing audio...\n")

# Load and convert to 24kHz mono
print("1️⃣ Loading and converting to 24kHz mono...")
audio, _ = librosa.load(audio_file, sr=24000, mono=True)
print(f"   ✅ Loaded {len(audio)/24000:.2f} seconds of audio\n")

# Peak-normalize to 0.95 of full scale; skip scaling if the audio is all zeros
print("2️⃣ Normalizing audio...")
peak = np.max(np.abs(audio))
if peak > 0:
    audio = audio / peak * 0.95
print("   ✅ Normalized\n")

# Split into 5-second segments
print("3️⃣ Splitting into 5-second segments...")
segment_length = 24000 * 5  # 5 seconds at 24kHz
segments = []

# The "+ 1" keeps the final segment when the audio length is an exact multiple
# of segment_length; a trailing partial segment is still dropped.
for i in range(0, len(audio) - segment_length + 1, segment_length):
    segments.append(audio[i:i+segment_length])

print(f"   ✅ Created {len(segments)} segments\n")

# Save segments
print("4️⃣ Saving segments...")
for idx, segment in enumerate(tqdm(segments, desc="Saving")):
    filename = f"speaker0_seg{idx:04d}.wav"
    filepath = os.path.join(output_dir, filename)
    sf.write(filepath, segment, 24000)

print(f"\n✅ Saved {len(segments)} WAV files\n")

# Create train/val split (90/10)
print("5️⃣ Creating train/validation split...")
np.random.seed(42)
indices = np.random.permutation(len(segments))
split_point = int(len(segments) * 0.9)

train_indices = indices[:split_point]
val_indices = indices[split_point:]

# Write train list
with open(os.path.join(output_dir, 'train_list.txt'), 'w') as f:
    for idx in sorted(train_indices):
        f.write(f"Data/MyVoice/speaker0_seg{idx:04d}.wav|0\n")

# Write val list
with open(os.path.join(output_dir, 'val_list.txt'), 'w') as f:
    for idx in sorted(val_indices):
        f.write(f"Data/MyVoice/speaker0_seg{idx:04d}.wav|0\n")

print(f"   ✅ Training samples: {len(train_indices)}")
print(f"   ✅ Validation samples: {len(val_indices)}")
print(f"\n🎉 Data preparation complete!")

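The list files written above hold one `path|speaker_id` entry per line. Before zipping, it can be worth confirming that no segment appears in both lists. A minimal sketch, assuming only that line format; `parse_list` and `check_split` are hypothetical helpers and not part of StarGANv2-VC:

```python
def parse_list(lines):
    """Parse 'path|speaker_id' lines into (path, speaker_id) tuples."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last '|' so a path containing '|' would still parse.
        path, speaker = line.rsplit("|", 1)
        entries.append((path, int(speaker)))
    return entries


def check_split(train_lines, val_lines):
    """Return the set of paths present in both lists (expected to be empty)."""
    train_paths = {path for path, _ in parse_list(train_lines)}
    val_paths = {path for path, _ in parse_list(val_lines)}
    return train_paths & val_paths
```

Passing the lines of `train_list.txt` and `val_list.txt` to `check_split` should return an empty set; anything else means a segment leaked across the split.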
## 5️⃣ Create ZIP File for Upload

In [None]:
import zipfile
import os

print("📦 Creating ZIP file for Colab upload...\n")

zip_filename = 'MyVoice.zip'

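with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Add all files from Data/MyVoice.
    # "filenames" avoids shadowing the google.colab "files" module imported earlier.
    for root, _dirs, filenames in os.walk('Data/MyVoice'):
        for name in filenames:
            file_path = os.path.join(root, name)
            # Store paths relative to Data/, e.g. MyVoice/speaker0_seg0000.wav
            arcname = os.path.relpath(file_path, 'Data')
            zipf.write(file_path, arcname)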

zip_size = os.path.getsize(zip_filename) / 1024**2

print(f"✅ Created: {zip_filename}")
print(f"   Size: {zip_size:.2f} MB")
print(f"   Contains: {len(segments)} audio segments + metadata")

print("\n📊 Summary:")
print(f"   Total segments: {len(segments)}")
print(f"   Training: {len(train_indices)}")
print(f"   Validation: {len(val_indices)}")
print(f"   Total duration: ~{len(segments)*5/60:.2f} minutes")
print(f"   ZIP size: {zip_size:.2f} MB")

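Before downloading, the archive can be inspected to confirm it holds what the summary above reports. The sketch below uses the standard-library `zipfile` module; `summarize_zip` is a hypothetical helper, and the `_list.txt` suffix check assumes the list file names used in this notebook.

```python
import zipfile


def summarize_zip(path):
    """Count WAV entries and collect list files stored in a ZIP archive."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    wav_count = sum(1 for n in names if n.lower().endswith(".wav"))
    list_files = sorted(n for n in names if n.endswith("_list.txt"))
    return wav_count, list_files
```

For this notebook, `summarize_zip("MyVoice.zip")` should return the segment count printed above together with `MyVoice/train_list.txt` and `MyVoice/val_list.txt`.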
## 6️⃣ Download Your Prepared Data

In [None]:
from google.colab import files

print("⬇️  Downloading MyVoice.zip to your computer...\n")
files.download(zip_filename)

print("\n🎉 Success! Your data is ready for training!")
print("\n📝 Next Steps:")
print("   1. Save the downloaded MyVoice.zip")
print("   2. Open the training notebook: Train_StarGANv2_VC.ipynb")
print("   3. Upload MyVoice.zip when prompted")
print("   4. Start training with GPU!")
print("\n⏱️  Expected training time: ~20 hours on Colab T4 GPU")
print("🏆 Final quality depends on how much clean, varied speech you recorded")

---

## 🎊 Data Preparation Complete!

### What You Have Now:
✅ **MyVoice.zip** - Ready for Colab training  
✅ Properly formatted audio (24kHz mono)  
✅ Train/validation split (90/10)  
✅ All metadata files included  

### Next Steps:
1. Go to the main repository
2. Click "Open in Colab" on Train_StarGANv2_VC.ipynb
3. Upload MyVoice.zip
4. Train your model!

### Tips for Best Quality:
- 📊 More data = better quality (30+ minutes minimum)
- 🎭 Diverse speech = better generalization
- 🎤 Clean audio = better results
- 😊 Include different emotions and speaking styles

---

**🚀 Ready to train your voice model! 🎤✨**