# ✨ Generating Media with VideoDB: A Simple Guide



<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/guides/genai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This guide shows how to generate images, music, sound effects, voiceovers, and short video clips with the VideoDB Python SDK. It also covers searching YouTube, translating transcripts, and dubbing existing videos.

### 🎯 Objective
Learn the basic usage of VideoDB's generative functions to programmatically create various media assets from text prompts or existing videos.


#### 📦 Install VideoDB SDK
First, make sure the VideoDB library is installed.

In [None]:
!pip install videodb --upgrade

#### 🔑 Connect to VideoDB



Use your [API key](https://console.videodb.io) to connect and get your collection.


In [None]:
import videodb

# Replace with your actual API key
api_key = "YOUR_API_KEY"

# Connect and get the default collection
conn = videodb.connect(api_key=api_key)
coll = conn.get_collection()

print(f"Connected! Using collection ID: {coll.id}")
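
Hardcoding the key in a notebook makes it easy to leak. The sketch below reads it from an environment variable instead. The helper `resolve_api_key` is hypothetical, and the variable name `VIDEO_DB_API_KEY` is an assumption here; substitute whatever name your setup uses.

```python
import os


def resolve_api_key(env_var: str = "VIDEO_DB_API_KEY") -> str:
    """Return the API key stored in env_var (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (needs the videodb package and a real key):
# conn = videodb.connect(api_key=resolve_api_key())
```

The resolved key is then passed to `videodb.connect` exactly as in the cell above.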


## 🚀 Generating Media Assets

Use the **coll** object to generate media.


### 🖼️ Generate Image (`generate_image`)


Create images from text.

In [None]:
from IPython.display import Image

image_prompt = "Green neon sign jellyfish photography" # Your prompt here
print(f"Generating image for: '{image_prompt}'")

# Generate image (returns an Image object)
generated_image = coll.generate_image(prompt=image_prompt)

print(f"-> Image generated! Image ID: {generated_image.id}")

# Get the URL
image_url = generated_image.generate_url()
print(f"-> Image URL: {image_url}")


Image(url=image_url)

**⚡ Power Up generate_image**: Configuration Options Explained

*   `prompt` (str): **Required.** The text description for image generation.
*   `aspect_ratio` (Literal['1:1', '9:16', '16:9', '4:3', '3:4'] | None): *Optional.* The desired aspect ratio. Defaults to `'1:1'`.
*   `callback_url` (str | None): *Optional.* URL for completion notification. Defaults to `None`.

In [None]:
generated_image = coll.generate_image(
    prompt=image_prompt,
    aspect_ratio="9:16",  # Custom Aspect ratio
)

Image(url=generated_image.generate_url())
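
Since `aspect_ratio` only accepts the five values listed above, it can help to map a target size in pixels to the nearest supported ratio before calling `generate_image`. This is a minimal sketch; `closest_aspect_ratio` is a hypothetical helper, not part of the SDK.

```python
SUPPORTED_ASPECT_RATIOS = ("1:1", "9:16", "16:9", "4:3", "3:4")


def closest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio whose width/height value is nearest to the target."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def ratio_value(label: str) -> float:
        w, h = label.split(":")
        return int(w) / int(h)

    return min(SUPPORTED_ASPECT_RATIOS, key=lambda label: abs(ratio_value(label) - target))


# Usage (needs a connected collection):
# coll.generate_image(prompt=image_prompt, aspect_ratio=closest_aspect_ratio(1080, 1920))
```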

### 🎵 Generate Music (`generate_music`)

Create music from a text description.

In [None]:
from IPython.display import Audio

music_prompt = "Upbeat electronic background music"
print(f"Generating music for: '{music_prompt}'")

# Generate music (returns an Audio object)
generated_music = coll.generate_music(prompt=music_prompt)

# Get the URL 
music_url = generated_music.generate_url()
print(f"-> Music URL: {music_url}")

Audio(url=music_url, filename="audio.mp3")

**⚡ Power Up generate_music**: Configuration Options Explained

*   `prompt` (str): **Required.** The text description of the music.
*   `duration` (int): *Optional.* The desired duration of the music in seconds. Defaults to `5`.
*   `callback_url` (str | None): *Optional.* A URL endpoint that VideoDB will notify when generation is complete. Defaults to `None`.

In [None]:
generated_music = coll.generate_music(
    prompt=music_prompt,
    duration=10,  # Custom Duration
)

Audio(url=generated_music.generate_url(), filename="audio.mp3")

### 🔊 Generate Sound Effect (`generate_sound_effect`)


Create short sounds.


In [None]:
from IPython.display import Audio

sfx_prompt = "Generate a sound of footsteps on wet gravel, for a mystery film scene. The sound should be realistic, rhythmic, and slightly echoey, around 5 seconds long"  # Your prompt here
print(f"Generating sound effect for: '{sfx_prompt}'")

# Generate SFX (returns an Audio object)
generated_sfx = coll.generate_sound_effect(prompt=sfx_prompt, duration=5)

# Get the URL
sfx_url = generated_sfx.generate_url()
print(f"-> SFX URL: {sfx_url}")

Audio(url=sfx_url, filename="audio.mp3")

### 🗣️ Generate Voice (`generate_voice`)


Convert text to speech.


In [None]:
from IPython.display import Audio

text_to_speak = "This is an AI voice speaking. I was created using the generate_voice method in VideoDB!" # Your text here
print(f"Generating voice for: '{text_to_speak}'")

# Generate voice (returns an Audio object)
generated_voice = coll.generate_voice(text=text_to_speak)

print(f"-> Voice generated! Audio ID: {generated_voice.id}")

# Get the URL
voice_url = generated_voice.generate_url()
print(f"-> Voice URL: {voice_url}")

Audio(url=voice_url, filename="audio.mp3")

**⚡ Power Up generate_voice**: Configuration Options Explained

*   `text` (str): **Required.** The text to be converted to speech.
*   `voice_name` (str): *Optional.* Name of the voice to use (check VideoDB docs/console for options). Defaults to `'Default'`.
*   `config` (dict): *Optional.* Configuration dictionary for the voice generation (e.g., speed, pitch; depends on provider). Defaults to `{}`.
*   `callback_url` (str | None): *Optional.* URL for completion notification. Defaults to `None`.

**Voice Name**

| Name      | Voice Style     | Accent        | Gender         |
|-----------|------------------|---------------|----------------|
| Aria      | Expressive       | American      | Female         |
| Roger     | Confident        | American      | Male           |
| Sarah     | Soft             | American      | Young Female   |
| Laura     | Upbeat           | American      | Young Female   |
| Charlie   | Natural          | Australian    | Male           |
| George    | Warm             | British       | Middle-aged Male |
| Callum    | Intense          | Transatlantic | Male           |
| River     | Confident        | American      | Non-binary     |
| Liam      | Articulate       | American      | Young Male     |
| Charlotte | Seductive        | Swedish       | Young Female   |
| Alice     | Confident        | British       | Middle-aged Female |
| Matilda   | Friendly         | American      | Middle-aged Female |
| Will      | Friendly         | American      | Young Male     |
| Jessica   | Expressive       | American      | Young Female   |
| Eric      | Friendly         | American      | Middle-aged Male |
| Chris     | Casual           | American      | Middle-aged Male |
| Brian     | Deep             | American      | Middle-aged Male |
| Daniel    | Authoritative    | British       | Middle-aged Male |
| Lily      | Warm             | British       | Middle-aged Female |
| Bill      | Trustworthy      | American      | Old Male       |


In [None]:
generated_voice = coll.generate_voice(
    text=text_to_speak,
    voice_name="Charlotte", # Custom Voice Name
    config={
        # Stability: lower values give a broader emotional range and more
        # variation between generations; higher values can sound monotonous.
        "stability": 0.0,
        # Similarity boost: how closely the output adheres to the original voice.
        "similarity_boost": 1.0,
        # Style: exaggeration of the original speaker's style. Values above 0
        # use more compute and might increase latency.
        "style": 0.0,
    },
)


Audio(url=generated_voice.generate_url(), filename="audio.mp3")
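
To choose a `voice_name` programmatically, the voice table above can be copied into a small lookup. This is a sketch only: the `VOICES` list mirrors the table in this guide, and `find_voices` is a hypothetical helper, not an SDK function.

```python
# (name, style, accent, gender), copied from the voice table in this guide
VOICES = [
    ("Aria", "Expressive", "American", "Female"),
    ("Roger", "Confident", "American", "Male"),
    ("Sarah", "Soft", "American", "Young Female"),
    ("Laura", "Upbeat", "American", "Young Female"),
    ("Charlie", "Natural", "Australian", "Male"),
    ("George", "Warm", "British", "Middle-aged Male"),
    ("Callum", "Intense", "Transatlantic", "Male"),
    ("River", "Confident", "American", "Non-binary"),
    ("Liam", "Articulate", "American", "Young Male"),
    ("Charlotte", "Seductive", "Swedish", "Young Female"),
    ("Alice", "Confident", "British", "Middle-aged Female"),
    ("Matilda", "Friendly", "American", "Middle-aged Female"),
    ("Will", "Friendly", "American", "Young Male"),
    ("Jessica", "Expressive", "American", "Young Female"),
    ("Eric", "Friendly", "American", "Middle-aged Male"),
    ("Chris", "Casual", "American", "Middle-aged Male"),
    ("Brian", "Deep", "American", "Middle-aged Male"),
    ("Daniel", "Authoritative", "British", "Middle-aged Male"),
    ("Lily", "Warm", "British", "Middle-aged Female"),
    ("Bill", "Trustworthy", "American", "Old Male"),
]


def find_voices(accent=None, gender=None):
    """Return voice names matching the given accent and/or gender, in table order."""
    return [
        name
        for name, _style, voice_accent, voice_gender in VOICES
        if (accent is None or voice_accent == accent)
        and (gender is None or voice_gender == gender)
    ]


# Usage (needs a connected collection):
# coll.generate_voice(text=text_to_speak, voice_name=find_voices(accent="Australian")[0])
```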

### 🎬 Generate Video (`generate_video`)

Create short video clips (5-8 seconds).

In [None]:
video_prompt = "Cinematic close-up of a majestic lion slowly rolling its head, its golden mane catching the soft afternoon sunlight on the savanna." # Your prompt here
print(f"Generating video for: '{video_prompt}'")

# Generate video (returns a Video object)
# Duration must be 5-8 seconds if specified (e.g., duration=7)
generated_video = coll.generate_video(prompt=video_prompt)


# Play the video
generated_video.play()

**⚡ Power Up generate_video**: Configuration Options Explained

*   `prompt` (str): **Required.** Text prompt for video generation.
*   `duration` (float): *Optional.* Duration in seconds. **Must be a whole number between 5 and 8 (inclusive)**, for example `7` or `7.0`. Defaults to `5`. Raises `ValueError` if invalid.
*   `callback_url` (str | None): *Optional.* URL for completion notification. Defaults to `None`.
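
The `duration` rule above can be checked locally before a request is sent. A minimal sketch follows, assuming the rule is exactly as documented here (a whole number from 5 to 8); `check_video_duration` is a hypothetical helper, not part of the SDK.

```python
def check_video_duration(duration: float) -> int:
    """Return duration as an int if it is a whole number from 5 to 8, else raise ValueError."""
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        raise ValueError(f"duration must be a number, got {duration!r}")
    # Range check first, so NaN and infinity are rejected before int() is called
    if not 5 <= duration <= 8 or duration != int(duration):
        raise ValueError(f"duration must be a whole number from 5 to 8, got {duration!r}")
    return int(duration)


# Usage (needs a connected collection):
# coll.generate_video(prompt=video_prompt, duration=check_video_duration(7))
```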

In [None]:
generated_video = coll.generate_video(
    prompt=video_prompt,
    duration=7,  # Custom Duration
)
generated_video.play()


### 🌐 Search YouTube Videos (`youtube_search`)

Find relevant YouTube videos using the main **conn** object.

In [None]:
search_query = "learn python programming"
print(f"\nSearching YouTube for: '{search_query}'")

youtube_results = conn.youtube_search(query=search_query)

print(f"-> Found {len(youtube_results)} YouTube results:")
for i, result in enumerate(youtube_results):
    print(f"  {i+1}. {result.get('title', 'N/A')} ({result.get('link', 'N/A')})")

**⚡ Power Up youtube_search**: Configuration Options Explained

*   `query` (str): **Required.** Query string to search for on YouTube.
*   `result_threshold` (int | None): *Optional.* Maximum number of results to return. Defaults to `10`.
*   `duration` (str): *Optional.* Filter by video duration (e.g., 'short', 'medium', 'long'). Defaults to `'medium'`.

In [None]:
youtube_results = conn.youtube_search(
    query=search_query,
    result_threshold=3,  # Get top 3
    duration="long"
)

print(f"-> Found {len(youtube_results)} YouTube results:")
for i, result in enumerate(youtube_results):
    print(f"  {i+1}. {result.get('title', 'N/A')} ({result.get('link', 'N/A')})")
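
The result-printing loop is repeated in both cells above, so it can be factored into a small formatter. This sketch assumes, as the cells do, that each result is a dict with `title` and `link` keys; `format_results` and the sample data are hypothetical.

```python
def format_results(results, limit=None):
    """Return one numbered 'title (link)' line per result, up to limit entries."""
    lines = []
    for i, result in enumerate(results[:limit], start=1):
        title = result.get("title", "N/A")
        link = result.get("link", "N/A")
        lines.append(f"{i}. {title} ({link})")
    return lines


# Hypothetical sample data, shaped like the results used in the cells above
sample_results = [
    {"title": "Python for Beginners", "link": "https://example.com/video-1"},
    {"title": "Untitled upload"},  # missing link falls back to "N/A"
]
print("\n".join(format_results(sample_results)))
```

With a live connection, `format_results(conn.youtube_search(query=search_query))` would take the place of the sample list.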

### 📝 Translate Video Transcripts (`translate_transcript`)

Get a translated text version of a video's spoken content.

In [None]:
# 1. Upload a video first (if you haven't already)
upload_url = "https://www.youtube.com/watch?v=a9__D53WsUs" # Example video

print(f"\nUploading video from URL: {upload_url}")
video = coll.upload(url=upload_url)
video.play()


# 2. Choose the target language for the translation
target_language_code = "fr" # Example: French
print(f"\nTranslating transcript for video '{video.id}' into: '{target_language_code}'")

# Note: the video's spoken words must be indexed (transcribed) before translation.
video.index_spoken_words()

translated_transcript = video.translate_transcript(language=target_language_code)
print("-> Transcript translation process completed.")
print(translated_transcript) # View the translated transcript


**⚡ Power Up translate_transcript:** Configuration Options Explained

*   `language` (str): **Required.** Target language for the transcript translation.
*   `additional_notes` (str): *Optional.* Additional notes or context for the translation model regarding style or tone. Defaults to `''`.
*   `callback_url` (str | None): *Optional.* URL for completion notification. Defaults to `None`.

In [None]:
translated_transcript = video.translate_transcript(
    language="en",
    additional_notes="Translate the language, and give it a modern Gen Z touch",  # Additional notes
)

print("-> Transcript translation process completed.")
print(translated_transcript) # View the translated transcript

### 🎤 Dub Existing Videos (`dub_video`)
Dub the spoken audio of a video you've uploaded into another language.


In [None]:
# 1. Upload a video first (if you haven't already)
upload_url = "https://www.youtube.com/watch?v=FgrO9ADPZSA" # Example video

print(f"\nUploading video from URL for modification: {upload_url}")
video = coll.upload(url=upload_url)
video.play()

# 2. Dub the uploaded video
target_language_code = "hi" # Example: Hindi

print(f"\nDubbing video '{video.id}' into language: '{target_language_code}'")
dubbed_video = coll.dub_video(video_id=video.id, language_code=target_language_code)

print(f"-> Dubbing Done! New Video ID: {dubbed_video.id}")
dubbed_video.play()


**⚡ Power Up dub_video:** Configuration Options Explained

*   `video_id` (str): **Required.** The ID of the video in your collection to dub.
*   `language_code` (str): **Required.** Target language code (e.g., "es", "fr", "ja"). Check VideoDB documentation for supported codes.
*   `callback_url` (str | None): *Optional.* URL for completion notification. Defaults to `None`.
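
To dub one video into several languages, the call above can be wrapped in a loop. This is a minimal sketch: `dub_into_languages` is a hypothetical helper that only uses the `dub_video(video_id=..., language_code=...)` call shown in this guide, and it runs the jobs one after another.

```python
def dub_into_languages(coll, video_id, language_codes):
    """Dub video_id once per language code and return {code: dubbed_video}."""
    dubbed = {}
    for code in language_codes:
        # Same call as in the cell above, repeated for each target language
        dubbed[code] = coll.dub_video(video_id=video_id, language_code=code)
    return dubbed


# Usage (needs a connected collection and an uploaded video):
# dubs = dub_into_languages(coll, video.id, ["es", "fr", "ja"])
# dubs["es"].play()
```

Each dubbing call may take a while, so for long lists passing a `callback_url` instead of waiting on each call may be the better fit.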