# Generate Assets



```markdown
# BigQuery & Gemini: Generating and Analyzing Multimodal Data

This notebook demonstrates how to use Vertex AI's generative models to create a multimodal dataset, store it in Google Cloud Storage, and then analyze it using BigQuery and Gemini.
```


```markdown
## 1. Setup and Installation

First, let's install the necessary Python libraries for interacting with Google Cloud services.
```

In [None]:
%pip install --upgrade --user google-cloud-aiplatform google-cloud-storage google-cloud-bigquery google-cloud-texttospeech



In [None]:
%pip install --upgrade google-cloud-aiplatform[generative_models]




```markdown
Next, please fill in your Google Cloud project details and other configuration values below.
```

In [104]:
# Quick smoke test (to be deleted later)
import os

# Your Google Cloud project ID
PROJECT_ID = "geminienterprise-485114"
# The region for your resources
LOCATION = "us-central1"
# Your Google Cloud Storage bucket name
GCS_BUCKET = "meetupmarch"
# Your BigQuery dataset name
# BIGQUERY_DATASET = "your_bigquery_dataset"

# Authenticate with Google Cloud
if "google.colab" in str(get_ipython()):
    from google.colab import auth
    auth.authenticate_user()

# Initialize Vertex AI
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)
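
A stray space or a `gs://` prefix in the bucket name is easy to miss when filling in the configuration by hand. The following is a minimal sketch of a check that could run before any client is created. The helper name `normalize_bucket_name` is hypothetical, and the pattern is a simplified subset of the Cloud Storage naming rules (dotted names and reserved prefixes are not covered).

```python
import re

# Simplified pattern (assumption): 3-63 characters, lowercase letters, digits,
# hyphens, and underscores, starting and ending with a letter or digit.
# Dotted names and reserved prefixes are deliberately not handled here.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def normalize_bucket_name(raw: str) -> str:
    """Return a cleaned bucket name, or raise ValueError if it looks invalid."""
    name = raw.strip()                 # drop accidental surrounding whitespace
    if name.startswith("gs://"):       # the storage client expects the bare name
        name = name[len("gs://"):]
    name = name.rstrip("/")            # drop a trailing slash left from a URI
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"Invalid bucket name: {raw!r}")
    return name


print(normalize_bucket_name(" meetupmarch"))
print(normalize_bucket_name("gs://meetupmarch/"))
```

Failing early with a `ValueError` keeps a malformed name from surfacing later as a harder-to-read API error.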

In [105]:
# --- AUTHENTICATE ---
# Authenticate with Google Cloud. This is required when running in Colab,
# where it opens a pop-up asking for your credentials and permissions.
import sys
if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

In [106]:
# --- CONFIGURATION AND IMPORTS ---
import os
import json
import vertexai

# Import each client library on its own line.
from google.cloud import bigquery
from google.cloud import storage
from google.cloud import texttospeech
from vertexai.preview.vision_models import ImageGenerationModel
from vertexai.preview.generative_models import GenerativeModel


In [107]:
import base64

from IPython.display import Audio
import google.auth
import google.auth.transport.requests
import requests

In [108]:
# Your Google Cloud project ID
PROJECT_ID = "geminienterprise-485114"

# The region for your resources
LOCATION = "us-central1"

# Your Google Cloud Storage bucket name (no 'gs://' prefix)
GCS_BUCKET_NAME = "meetupmarch"

# Derived names for our BigQuery resources
DATASET_ID = "generative_assets_dataset"
TABLE_ID = "assets_metadata"

# --- INITIALIZE CLIENTS ---
# Initialize Vertex AI SDK and other clients with your project details.
# After authentication, these clients will have the necessary permissions.
vertexai.init(project=PROJECT_ID, location=LOCATION)

storage_client = storage.Client(project=PROJECT_ID)
bq_client = bigquery.Client(project=PROJECT_ID)
tts_client = texttospeech.TextToSpeechClient()

print(f"Project: {PROJECT_ID}, Location: {LOCATION}")
print("Vertex AI and other Google Cloud clients initialized successfully.")



Project: geminienterprise-485114, Location: us-central1
Vertex AI and other Google Cloud clients initialized successfully.


In [60]:
# --- PREPARE GCS BUCKET AND BIGQUERY DATASET ---
# This code will create the resources if they don't already exist.

# GCS Bucket
bucket = storage_client.bucket(GCS_BUCKET_NAME)
if not bucket.exists():
    bucket.create(location=LOCATION)
    print(f"Bucket '{GCS_BUCKET_NAME}' created.")
else:
    print(f"Bucket '{GCS_BUCKET_NAME}' already exists.")

Bucket 'meetupmarch' already exists.


In [82]:
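# BigQuery Dataset
from google.api_core.exceptions import NotFound

dataset_ref = bigquery.DatasetReference(PROJECT_ID, DATASET_ID)
try:
    bq_client.get_dataset(dataset_ref)
    print(f"Dataset '{DATASET_ID}' already exists.")
except NotFound:
    # Create the dataset only when the lookup reports it as missing;
    # other errors (permissions, network) should surface instead of being masked.
    bq_client.create_dataset(bigquery.Dataset(dataset_ref))
    print(f"Dataset '{DATASET_ID}' created.")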


Dataset 'generative_assets_dataset' already exists.


In [83]:
# A list to hold metadata for all generated assets
all_metadata = []
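
Every entry appended to `all_metadata` in this notebook has the same five keys. As a sketch only, a small helper can build those entries so the key names and the `gs://` URI layout stay consistent across images, music, and speech. `AssetRecord` and `make_record` are hypothetical names, not part of any Google Cloud library.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AssetRecord:
    """One metadata row; field order matches the keys used in this notebook."""
    asset_id: str
    asset_type: str
    prompt: str
    gcs_uri: str
    model_used: str


def make_record(asset_type, index, prompt, bucket_name, folder, extension, model_used):
    """Build a plain dict that can be appended to all_metadata."""
    asset_id = f"{asset_type}_{index}"
    gcs_uri = f"gs://{bucket_name}/{folder}/{asset_id}.{extension}"
    return asdict(AssetRecord(asset_id, asset_type, prompt, gcs_uri, model_used))


record = make_record(
    "image", 0, "A smart warehouse", "meetupmarch", "images", "png",
    "imagen-3.0-generate-002",
)
print(record["gcs_uri"])
```

The folder is passed separately because the notebook stores assets under `images/`, `music/`, and `speech/`, which do not follow a single pattern derived from the asset type.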

```markdown
## 2. Data Generation with Vertex AI

Now, let's generate some multimodal data using different Vertex AI models.
```

```markdown
### 2.1 Generate an Image with Imagen (single image)
```

In [84]:
import vertexai
from vertexai.vision_models import ImageGenerationModel

# TODO: Specify your project ID and location
# vertexai.init(project="your-project-id", location="your-location")

image_model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")

image_prompt = "a futuristic banana-shaped spaceship flying through a nebula"

response = image_model.generate_images(prompt=image_prompt)

# The response is a list of Image objects.
# Access the first image directly and save it.
response[0].save("generated_image.png")

print("Image generated and saved as generated_image.png")



Image generated and saved as generated_image.png


```markdown
#### Generate a Batch of 10 Images with Imagen
```

In [85]:
print("--- Starting Image Generation (Imagen 3) ---")
image_model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
local_image_dir = "generated_images"
os.makedirs(local_image_dir, exist_ok=True)

image_prompts = [
    "A state-of-the-art chemical manufacturing plant at sunset, with clean energy sources visible.",
    "Macro shot of a new, sustainable consumer goods product made from plant-based materials.",
    "A team of engineers in a modern factory reviewing data on a holographic display.",
    "Futuristic robotic arms assembling a complex piece of machinery with precision.",
    "A digital twin of a manufacturing facility, showing real-time operational data streams.",
    "An aerial view of a smart warehouse with autonomous forklifts and delivery drones.",
    "A scientist in a lab coat examining a beaker with a glowing liquid.",
    "High-end cosmetic products arranged in a minimalist, elegant composition.",
    "A cross-section of an advanced engine, showing intricate inner workings.",
    "A beautiful landscape shot of a factory that blends seamlessly with nature.",
]

for i, prompt in enumerate(image_prompts):
    local_filename = f"{local_image_dir}/image_{i}.png"
    gcs_blob_name = f"images/image_{i}.png"

    print(f"Generating image {i+1}/{len(image_prompts)} with prompt: '{prompt[:50]}...'")
    response = image_model.generate_images(prompt=prompt)
    response[0].save(local_filename)

    # Upload to GCS
    blob = bucket.blob(gcs_blob_name)
    blob.upload_from_filename(local_filename)
    gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_blob_name}"

    # Store metadata
    all_metadata.append({
        "asset_id": f"image_{i}",
        "asset_type": "image",
        "prompt": prompt,
        "gcs_uri": gcs_uri,
        "model_used": "imagen-3.0-generate-002"
    })
    print(f"Image {i+1} saved and uploaded to {gcs_uri}")

print("--- Image Generation Complete ---")


--- Starting Image Generation (Imagen 3) ---
Generating image 1/10 with prompt: 'A state-of-the-art chemical manufacturing plant at...'
Image 1 saved and uploaded to gs://meetupmarch/images/image_0.png
Generating image 2/10 with prompt: 'Macro shot of a new, sustainable consumer goods pr...'
Image 2 saved and uploaded to gs://meetupmarch/images/image_1.png
Generating image 3/10 with prompt: 'A team of engineers in a modern factory reviewing ...'
Image 3 saved and uploaded to gs://meetupmarch/images/image_2.png
Generating image 4/10 with prompt: 'Futuristic robotic arms assembling a complex piece...'
Image 4 saved and uploaded to gs://meetupmarch/images/image_3.png
Generating image 5/10 with prompt: 'A digital twin of a manufacturing facility, showin...'
Image 5 saved and uploaded to gs://meetupmarch/images/image_4.png
Generating image 6/10 with prompt: 'An aerial view of a smart warehouse with autonomou...'
Image 6 saved and uploaded to gs://meetupmarch/images/image_5.png
Generating im

```markdown
### 2.2 Generate Music with Lyria
```

In [86]:
# We need these libraries for making direct HTTP requests and handling authentication.
import requests
import google.auth
import google.auth.transport.requests
import base64
import json
import os

# --- Helper functions ---

def send_request_to_google_api(api_endpoint, data=None):
    """
    Sends an HTTP request to a Google API endpoint.
    """
    creds, project = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    creds.refresh(auth_req)
    access_token = creds.token
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    response = requests.post(api_endpoint, headers=headers, json=data)
    response.raise_for_status()
    return response.json()

def generate_music(api_endpoint, request: dict):
    """
    Wraps the request and calls the API.
    """
    req = {"instances": [request], "parameters": {}}
    resp = send_request_to_google_api(api_endpoint, req)
    return resp["predictions"]

# --- Define the Model URL ---
# Build the endpoint from LOCATION so it stays in sync with the configuration above.
music_model = f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/lyria-002:predict"
local_music_dir = "generated_music"
os.makedirs(local_music_dir, exist_ok=True)

print("Setup complete. You can now run the cells below to generate each song individually.")


Setup complete. You can now run the cells below to generate each song individually.
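
The helpers above wrap a request in an `instances`/`parameters` envelope and return a list of predictions, each carrying base64 audio under `bytesBase64Encoded`. The sketch below shows the decoding step in isolation, with one file per prediction so that multiple predictions do not overwrite each other. `save_predictions` is a hypothetical helper, and the fake prediction stands in for a real API response.

```python
import base64
import os
import tempfile


def wrap_request(instance: dict) -> dict:
    """Same envelope that generate_music() builds before calling the endpoint."""
    return {"instances": [instance], "parameters": {}}


def save_predictions(predictions, directory, stem):
    """Decode each prediction and write it to <stem>_<n>.wav; return the paths."""
    paths = []
    for n, pred in enumerate(predictions):
        audio = base64.b64decode(pred["bytesBase64Encoded"])
        path = os.path.join(directory, f"{stem}_{n}.wav")
        with open(path, "wb") as f:
            f.write(audio)
        paths.append(path)
    return paths


# Stand-in for a real response (assumption: a single prediction with fake bytes).
fake_predictions = [
    {"bytesBase64Encoded": base64.b64encode(b"RIFF-demo").decode("ascii")}
]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_predictions(fake_predictions, tmp, "music_1")
    print([os.path.basename(p) for p in saved])
```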


In [87]:
# Simple version (to be deleted)
try:
    print("Generating Song 1...")
    # New, more descriptive prompt to avoid safety filters.
    prompt = "An uplifting and motivational corporate anthem for a quarterly results presentation, featuring bright piano, a steady electronic beat, and subtle synth pads."

    predictions = generate_music(music_model, {"prompt": prompt, "duration_secs": 20})

    for pred in predictions:
        bytes_b64 = dict(pred)["bytesBase64Encoded"]
        audio_data = base64.b64decode(bytes_b64)

        with open(f"{local_music_dir}/music_1.wav", "wb") as f:
            f.write(audio_data)
        print("SUCCESS: Song 1 saved as music_1.wav")
except Exception as e:
    print(f"ERROR generating Song 1: {e}")


Generating Song 1...
SUCCESS: Song 1 saved as music_1.wav


In [88]:
try:
    # --- Define Song Details ---
    song_number = 1
    prompt = "An uplifting and motivational corporate anthem for a quarterly results presentation, featuring bright piano, a steady electronic beat, and subtle synth pads."
    local_filename = f"{local_music_dir}/music_{song_number}.wav"
    gcs_blob_name = f"music/music_{song_number}.wav"

    print(f"Generating Song {song_number} using the correct method...")

    # --- Generate the music via the Lyria predict endpoint ---
    predictions = generate_music(
        music_model,
        {"prompt": prompt, "duration_secs": 20}
    )

    for pred in predictions:
        bytes_b64 = dict(pred)["bytesBase64Encoded"]
        audio_data = base64.b64decode(bytes_b64)

        # --- Save Locally (as in your working code) ---
        with open(local_filename, "wb") as f:
            f.write(audio_data)
        print(f"SUCCESS: Song {song_number} saved locally.")

        # --- ADDED: Upload to Google Cloud Storage ---
        bucket = storage_client.bucket(GCS_BUCKET_NAME)
        blob = bucket.blob(gcs_blob_name)
        blob.upload_from_filename(local_filename)
        print(f"SUCCESS: Song {song_number} uploaded to GCS.")

        # --- ADDED: Store metadata for BigQuery ---
        gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_blob_name}"
        all_metadata.append({
            "asset_id": f"music_{song_number}",
            "asset_type": "music",
            "prompt": prompt,
            "gcs_uri": gcs_uri,
            "model_used": "lyria-002-direct-api"
        })
        print(f"SUCCESS: Metadata for Song {song_number} stored.")

except Exception as e:
    print(f"ERROR generating Song {song_number}: {e}")



Generating Song 1 using the correct method...
ERROR generating Song 1: 400 Client Error: Bad Request for url: https://us-central1-aiplatform.googleapis.com/v1/projects/geminienterprise-485114/locations/us-central1/publishers/google/models/lyria-002:predict


In [89]:
try:
    # --- Define Song Details ---
    song_number = 2
    prompt = "A minimal, ambient electronic track for a technology product showcase, calm and focused."
    local_filename = f"{local_music_dir}/music_{song_number}.wav"
    gcs_blob_name = f"music/music_{song_number}.wav"

    print(f"Generating Song {song_number} using the correct method...")

    # --- Generate the music via the Lyria predict endpoint ---
    predictions = generate_music(
        music_model,
        {"prompt": prompt, "duration_secs": 20}
    )

    for pred in predictions:
        bytes_b64 = dict(pred)["bytesBase64Encoded"]
        audio_data = base64.b64decode(bytes_b64)

        # --- Save Locally (as in your working code) ---
        with open(local_filename, "wb") as f:
            f.write(audio_data)
        print(f"SUCCESS: Song {song_number} saved locally.")

        # --- ADDED: Upload to Google Cloud Storage ---
        bucket = storage_client.bucket(GCS_BUCKET_NAME)
        blob = bucket.blob(gcs_blob_name)
        blob.upload_from_filename(local_filename)
        print(f"SUCCESS: Song {song_number} uploaded to GCS.")

        # --- ADDED: Store metadata for BigQuery ---
        gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_blob_name}"
        all_metadata.append({
            "asset_id": f"music_{song_number}",
            "asset_type": "music",
            "prompt": prompt,
            "gcs_uri": gcs_uri,
            "model_used": "lyria-002-direct-api"
        })
        print(f"SUCCESS: Metadata for Song {song_number} stored.")

except Exception as e:
    print(f"ERROR generating Song {song_number}: {e}")



Generating Song 2 using the correct method...
SUCCESS: Song 2 saved locally.
SUCCESS: Song 2 uploaded to GCS.
SUCCESS: Metadata for Song 2 stored.


In [90]:
try:
    # --- Define Song Details ---
    song_number = 3
    prompt = "A powerful, driving industrial beat with synth elements, suggesting innovation and power."
    local_filename = f"{local_music_dir}/music_{song_number}.wav"
    gcs_blob_name = f"music/music_{song_number}.wav"

    print(f"Generating Song {song_number} using the correct method...")

    # --- Generate the music via the Lyria predict endpoint ---
    predictions = generate_music(
        music_model,
        {"prompt": prompt, "duration_secs": 20}
    )

    for pred in predictions:
        bytes_b64 = dict(pred)["bytesBase64Encoded"]
        audio_data = base64.b64decode(bytes_b64)

        # --- Save Locally (as in your working code) ---
        with open(local_filename, "wb") as f:
            f.write(audio_data)
        print(f"SUCCESS: Song {song_number} saved locally.")

        # --- ADDED: Upload to Google Cloud Storage ---
        bucket = storage_client.bucket(GCS_BUCKET_NAME)
        blob = bucket.blob(gcs_blob_name)
        blob.upload_from_filename(local_filename)
        print(f"SUCCESS: Song {song_number} uploaded to GCS.")

        # --- ADDED: Store metadata for BigQuery ---
        gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_blob_name}"
        all_metadata.append({
            "asset_id": f"music_{song_number}",
            "asset_type": "music",
            "prompt": prompt,
            "gcs_uri": gcs_uri,
            "model_used": "lyria-002-direct-api"
        })
        print(f"SUCCESS: Metadata for Song {song_number} stored.")

except Exception as e:
    print(f"ERROR generating Song {song_number}: {e}")



Generating Song 3 using the correct method...
SUCCESS: Song 3 saved locally.
SUCCESS: Song 3 uploaded to GCS.
SUCCESS: Metadata for Song 3 stored.


In [91]:
try:
    # --- Define Song Details ---
    song_number = 4
    prompt = "An atmospheric and thoughtful soundscape for a documentary about sustainable manufacturing."
    local_filename = f"{local_music_dir}/music_{song_number}.wav"
    gcs_blob_name = f"music/music_{song_number}.wav"

    print(f"Generating Song {song_number} using the correct method...")

    # --- Generate the music via the Lyria predict endpoint ---
    predictions = generate_music(
        music_model,
        {"prompt": prompt, "duration_secs": 20}
    )

    for pred in predictions:
        bytes_b64 = dict(pred)["bytesBase64Encoded"]
        audio_data = base64.b64decode(bytes_b64)

        # --- Save Locally (as in your working code) ---
        with open(local_filename, "wb") as f:
            f.write(audio_data)
        print(f"SUCCESS: Song {song_number} saved locally.")

        # --- ADDED: Upload to Google Cloud Storage ---
        bucket = storage_client.bucket(GCS_BUCKET_NAME)
        blob = bucket.blob(gcs_blob_name)
        blob.upload_from_filename(local_filename)
        print(f"SUCCESS: Song {song_number} uploaded to GCS.")

        # --- ADDED: Store metadata for BigQuery ---
        gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_blob_name}"
        all_metadata.append({
            "asset_id": f"music_{song_number}",
            "asset_type": "music",
            "prompt": prompt,
            "gcs_uri": gcs_uri,
            "model_used": "lyria-002-direct-api"
        })
        print(f"SUCCESS: Metadata for Song {song_number} stored.")

except Exception as e:
    print(f"ERROR generating Song {song_number}: {e}")



Generating Song 4 using the correct method...
SUCCESS: Song 4 saved locally.
SUCCESS: Song 4 uploaded to GCS.
SUCCESS: Metadata for Song 4 stored.
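
The four song cells differ only in `song_number` and `prompt`. As a sketch, the same flow can be written as one loop that records failures per song instead of stopping, which matters here because one request returned a 400 while the others succeeded. `generate_batch` is a hypothetical helper, and `fake_generate` is a stand-in for the real call to `generate_music`.

```python
from typing import Callable, Dict, Tuple


def generate_batch(
    prompts: Dict[int, str],
    generate_fn: Callable[[str], bytes],
) -> Tuple[Dict[int, bytes], Dict[int, str]]:
    """Call generate_fn for every prompt; collect results and errors separately."""
    succeeded: Dict[int, bytes] = {}
    failed: Dict[int, str] = {}
    for song_number, prompt in prompts.items():
        try:
            succeeded[song_number] = generate_fn(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            failed[song_number] = str(exc)
    return succeeded, failed


def fake_generate(prompt: str) -> bytes:
    """Stand-in generator: rejects one prompt to mimic a failed request."""
    if "anthem" in prompt:
        raise RuntimeError("400 Client Error: Bad Request")
    return b"audio-bytes"


songs = {
    1: "An uplifting corporate anthem",
    2: "A minimal ambient electronic track",
}
ok, errors = generate_batch(songs, fake_generate)
print(sorted(ok), errors)
```

Passing the generator in as a function keeps the loop testable without network access; in the notebook it would wrap the call to the Lyria endpoint plus the save and upload steps.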


```markdown
### 2.3 Generate Speech with the Cloud Text-to-Speech API
```

In [None]:
# Test: generate a single file only
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

synthesis_input = texttospeech.SynthesisInput(text="Hello, this is a test of the Gemini Text-to-Speech API.")
voice = texttospeech.VoiceSelectionParams(language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

response = client.synthesize_speech(input=synthesis_input, voice=voice, audio_config=audio_config)

with open("generated_speech.mp3", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "generated_speech.mp3"')

In [92]:
import google.auth
from google.colab import auth

# --- Re-authenticate against the target project ---
# This triggers a new login pop-up and passes PROJECT_ID
# to the authentication helper so the session uses your project.
print(f"Forcing new user authentication for project '{PROJECT_ID}'...")
auth.authenticate_user(project_id=PROJECT_ID)
print("\nSUCCESS: Authentication complete. The session is now correctly configured for your project.")
print("You can now run the Text-to-Speech cell below.")


Forcing new user authentication for project 'geminienterprise-485114'...

SUCCESS: Authentication complete. The session is now correctly configured for your project.
You can now run the Text-to-Speech cell below.


In [93]:
import os

# --- Get the access token directly from the gcloud CLI ---

# 1. Set the active gcloud project explicitly.
print(f"Forcing the active project to '{PROJECT_ID}'...")
!gcloud config set project {PROJECT_ID}

# 2. Ask gcloud to print the authentication token for this project.
# We capture the output of this command.
print("\nGetting auth token directly from gcloud...")
token_output = !gcloud auth print-access-token
AUTH_TOKEN = token_output[0]

print("SUCCESS: Manually retrieved authentication token.")
print("This token is guaranteed to be for the correct project.")
print("You can now run the TTS cell below.")


Forcing the active project to 'geminienterprise-485114'...
Updated property [core/project].

Getting auth token directly from gcloud...
SUCCESS: Manually retrieved authentication token.
This token is guaranteed to be for the correct project.
You can now run the TTS cell below.


In [70]:
import requests
import google.auth
import google.auth.transport.requests
import base64
import json
import os

print("\n--- Starting Speech Generation (Final Attempt with Quota Project Header) ---")

try:
    # Get the auth token manually, which we know works.
    token_output = !gcloud auth print-access-token
    AUTH_TOKEN = token_output[0]

    # --- Add the X-Goog-User-Project header ---
    # This explicitly tells the API which project to bill for this request.
    HEADERS = {
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "Content-Type": "application/json; charset=utf-8",
        "X-Goog-User-Project": PROJECT_ID
    }
    TTS_ENDPOINT_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"

    local_speech_dir = "generated_speech"
    os.makedirs(local_speech_dir, exist_ok=True)

    speech_texts = ["Quarterly production targets have been exceeded by fifteen percent."] # Just one for the final test

    for i, text in enumerate(speech_texts):
        print(f"\nGenerating one final test speech...")
        request_body = {
            "input": {"text": text},
            "voice": {"languageCode": "en-US", "ssmlGender": "NEUTRAL"},
            "audioConfig": {"audioEncoding": "MP3"}
        }

        response = requests.post(url=TTS_ENDPOINT_URL, headers=HEADERS, json=request_body)
        response.raise_for_status()

        print("SUCCESS! The API call succeeded with the X-Goog-User-Project header.")
        # ... (the rest of the code to save and upload would go here) ...

except Exception as e:
    print(f"\n--- ERROR ---")
    print(f"Error details: {e}")
    if 'response' in locals() and hasattr(response, 'text'):
        print(f"Server response: {response.text}")
    print("\n--- CONCLUSION ---")
    print("If this still fails, check that the Text-to-Speech API is enabled and that your account can access the project before resetting the runtime.")

print("\n--- Final Test Complete ---")



--- Starting Speech Generation (Final Attempt with Quota Project Header) ---

Generating one final test speech...
SUCCESS! The API call succeeded with the X-Goog-User-Project header.

--- Final Test Complete ---
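
The test cell above stops after the request succeeds. The remaining pure-Python steps are building the headers and body and decoding the `audioContent` field of the response. The sketch below isolates them so they can be checked without calling the API. `build_tts_request` and `decode_audio` are hypothetical helpers, and the token, project ID, and response are placeholders.

```python
import base64


def build_tts_request(text: str, token: str, project_id: str):
    """Return (headers, body) for a POST to the text:synthesize endpoint."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json; charset=utf-8",
        "X-Goog-User-Project": project_id,  # quota/billing project for the call
    }
    body = {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "ssmlGender": "NEUTRAL"},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return headers, body


def decode_audio(response_json: dict) -> bytes:
    """The API returns the audio as base64 text under 'audioContent'."""
    return base64.b64decode(response_json["audioContent"])


headers, body = build_tts_request("Hello", "fake-token", "my-project")
# Stand-in for response.json() (assumption: fake bytes, not real MP3 data).
fake_response = {"audioContent": base64.b64encode(b"ID3-demo").decode("ascii")}
print(body["input"]["text"], len(decode_audio(fake_response)))
```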


In [94]:
# We need these libraries for the direct API call.
import requests
import base64
import json
import os

print("\n--- Starting Speech Generation (Final Working Code) ---")

try:
    # --- Step 1: Get the Authentication Token ---
    # Ask the gcloud CLI for an access token for the active account.
    print("Getting auth token directly from gcloud...")
    token_output = !gcloud auth print-access-token
    AUTH_TOKEN = token_output[0]
    print("SUCCESS: Token retrieved.")

    # --- Step 2: Set up the Headers, including the quota project ---
    HEADERS = {
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "Content-Type": "application/json; charset=utf-8",
        "X-Goog-User-Project": PROJECT_ID  # This header forces the correct project.
    }
    TTS_ENDPOINT_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"

    local_speech_dir = "generated_speech"
    os.makedirs(local_speech_dir, exist_ok=True)

    # --- Step 3: Loop Through All 10 Speech Texts ---
    speech_texts = [
        "Quarterly production targets have been exceeded by fifteen percent.",
        "Safety protocol update: All personnel must attend the mandatory briefing on Friday.",
        "The new supply chain optimization model is now live across all regions.",
        "Alert: Unscheduled maintenance is required for assembly line three.",
        "Our commitment to sustainable manufacturing has reduced our carbon footprint by 20% year-over-year.",
        "The next shareholder meeting will be held on July 25th to discuss Q2 earnings.",
        "Innovation in materials science is key to developing our next generation of products.",
        "Customer feedback indicates a 95% satisfaction rate with our new service portal.",
        "We are projecting a 10% growth in the consumer goods sector for the upcoming fiscal year.",
        "Emergency shutdown procedures for the chemical processing unit have been initiated. This is a drill.",
    ]

    for i, text in enumerate(speech_texts):
        local_filename = f"{local_speech_dir}/speech_{i}.mp3"
        gcs_blob_name = f"speech/speech_{i}.mp3"

        print(f"\nGenerating speech {i+1}/{len(speech_texts)}...")

        request_body = {
            "input": {"text": text},
            "voice": {"languageCode": "en-US", "ssmlGender": "NEUTRAL"},
            "audioConfig": {"audioEncoding": "MP3"}
        }

        # Send the synthesis request.
        response = requests.post(url=TTS_ENDPOINT_URL, headers=HEADERS, json=request_body)
        response.raise_for_status()

        response_json = response.json()
        audio_data_base64 = response_json["audioContent"]
        audio_data = base64.b64decode(audio_data_base64)

        # Save locally
        with open(local_filename, "wb") as out:
            out.write(audio_data)
        print(f"SUCCESS: Speech {i+1} saved locally.")

        # Upload to GCS
        bucket = storage_client.bucket(GCS_BUCKET_NAME)
        blob = bucket.blob(gcs_blob_name)
        blob.upload_from_filename(local_filename)
        gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_blob_name}"
        print(f"SUCCESS: Speech {i+1} uploaded to GCS.")

        # Store metadata
        all_metadata.append({
            "asset_id": f"speech_{i}",
            "asset_type": "speech",
            "prompt": text,
            "gcs_uri": gcs_uri,
            "model_used": "google-text-to-speech-direct-api"
        })
        print(f"SUCCESS: Metadata for speech {i+1} stored.")

except Exception as e:
    print(f"\n--- An unexpected error occurred ---")
    print(f"Error details: {e}")
    if 'response' in locals() and hasattr(response, 'text'):
        print(f"Server response: {response.text}")

print("\n--- Speech Generation Complete ---")



--- Starting Speech Generation (Final Working Code) ---
Getting auth token directly from gcloud...
SUCCESS: Token retrieved.

Generating speech 1/10...
SUCCESS: Speech 1 saved locally.
SUCCESS: Speech 1 uploaded to GCS.
SUCCESS: Metadata for speech 1 stored.

Generating speech 2/10...
SUCCESS: Speech 2 saved locally.
SUCCESS: Speech 2 uploaded to GCS.
SUCCESS: Metadata for speech 2 stored.

Generating speech 3/10...
SUCCESS: Speech 3 saved locally.
SUCCESS: Speech 3 uploaded to GCS.
SUCCESS: Metadata for speech 3 stored.

Generating speech 4/10...
SUCCESS: Speech 4 saved locally.
SUCCESS: Speech 4 uploaded to GCS.
SUCCESS: Metadata for speech 4 stored.

Generating speech 5/10...
SUCCESS: Speech 5 saved locally.
SUCCESS: Speech 5 uploaded to GCS.
SUCCESS: Metadata for speech 5 stored.

Generating speech 6/10...
SUCCESS: Speech 6 saved locally.
SUCCESS: Speech 6 uploaded to GCS.
SUCCESS: Metadata for speech 6 stored.

Generating speech 7/10...
SUCCESS: Speech 7 saved locally.
SUCCESS: S

```markdown
## 3. Upload to Google Cloud Storage
```

### 3.1 BigQuery Metadata Ingestion

In [96]:
print("\n--- Creating and Uploading Metadata File ---")
local_metadata_filename = "metadata.jsonl"
gcs_metadata_blob_name = "metadata/assets.jsonl"

with open(local_metadata_filename, "w") as f:
    for item in all_metadata:
        f.write(json.dumps(item) + "\n")

blob = bucket.blob(gcs_metadata_blob_name)
blob.upload_from_filename(local_metadata_filename)
metadata_gcs_uri = f"gs://{GCS_BUCKET_NAME}/{gcs_metadata_blob_name}"

print(f"Metadata file uploaded to {metadata_gcs_uri}")



--- Creating and Uploading Metadata File ---
Metadata file uploaded to gs://meetupmarch/metadata/assets.jsonl
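
The metadata file is newline-delimited JSON: one object per line, with no enclosing array. A round-trip check before uploading is a cheap way to confirm the file parses line by line. The following is a sketch using only the standard library; `write_jsonl` and `read_jsonl` are hypothetical helper names.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse the file line by line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


sample = [
    {"asset_id": "image_0", "asset_type": "image", "prompt": "A smart warehouse",
     "gcs_uri": "gs://meetupmarch/images/image_0.png",
     "model_used": "imagen-3.0-generate-002"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.jsonl")
    write_jsonl(sample, path)
    print(read_jsonl(path) == sample)
```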


## 4. Create BigQuery External Table

In [97]:
print("\n--- Creating BigQuery External Table ---")

table_ref = bigquery.DatasetReference(PROJECT_ID, DATASET_ID).table(TABLE_ID)

# Define the schema for the external table
schema = [
    bigquery.SchemaField("asset_id", "STRING"),
    bigquery.SchemaField("asset_type", "STRING"),
    bigquery.SchemaField("prompt", "STRING"),
    bigquery.SchemaField("gcs_uri", "STRING"),
    bigquery.SchemaField("model_used", "STRING"),
]

# The metadata file is newline-delimited JSON (one object per line),
# so the source format must be NEWLINE_DELIMITED_JSON.
external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")

external_config.source_uris = [metadata_gcs_uri]
external_config.schema = schema

# Create the table
try:
    bq_client.delete_table(table_ref, not_found_ok=True) # Delete if it exists to ensure a fresh start
    print(f"Existing table '{TABLE_ID}' deleted.")
except Exception as e:
    print(e)

table = bigquery.Table(table_ref)
table.external_data_configuration = external_config

# Create the external table.
table = bq_client.create_table(table)

print(f"\nSUCCESS: External table '{table.project}.{table.dataset_id}.{table.table_id}' created successfully.")



--- Creating BigQuery External Table ---
Existing table 'assets_metadata' deleted.

SUCCESS: External table 'geminienterprise-485114.generative_assets_dataset.assets_metadata' created successfully.
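
An external table reads the file in Cloud Storage when it is queried, so a record that does not match the declared schema is likely to show up as a query error rather than at table creation. As a sketch, the records can be checked locally against the five `STRING` fields declared above. `find_schema_problems` is a hypothetical helper and only covers missing keys, unexpected keys, and non-string values.

```python
EXPECTED_FIELDS = ("asset_id", "asset_type", "prompt", "gcs_uri", "model_used")


def find_schema_problems(records):
    """Return (index, missing, extra, non_string) for every record that deviates."""
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in EXPECTED_FIELDS if f not in rec]
        extra = [k for k in rec if k not in EXPECTED_FIELDS]
        non_string = [
            f for f in EXPECTED_FIELDS
            if f in rec and not isinstance(rec[f], str)
        ]
        if missing or extra or non_string:
            problems.append((i, missing, extra, non_string))
    return problems


good = {"asset_id": "music_2", "asset_type": "music", "prompt": "ambient",
        "gcs_uri": "gs://meetupmarch/music/music_2.wav",
        "model_used": "lyria-002-direct-api"}
bad = {"asset_id": 3, "asset_type": "music", "duration": "20"}

print(find_schema_problems([good, bad]))
```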


## 5. Analyze Data with BigQuery and Gemini

Before running the SQL, you need a remote model in BigQuery that references a Gemini endpoint. If you don't have one, you can create it with a `CREATE MODEL ... REMOTE WITH CONNECTION` DDL statement in BigQuery.
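
A minimal sketch of how such a statement could be assembled from Python follows. Everything passed in the example call is an assumption for illustration: the model name `gemini_model`, the connection ID `my-project.us.my_connection`, and the `gemini-2.5-pro` endpoint value are placeholders, and the connection must already exist in the same location as the dataset. The function only builds the string; running it would be a separate `bq_client.query(ddl).result()` call.

```python
def build_remote_model_ddl(project_id, dataset_id, model_name, connection_id, endpoint):
    """Assemble a CREATE MODEL statement for a BigQuery ML remote model."""
    return (
        f"CREATE OR REPLACE MODEL `{project_id}.{dataset_id}.{model_name}`\n"
        f"REMOTE WITH CONNECTION `{connection_id}`\n"
        f"OPTIONS (ENDPOINT = '{endpoint}')"
    )


ddl = build_remote_model_ddl(
    project_id="my-project",                      # placeholder project
    dataset_id="generative_assets_dataset",
    model_name="gemini_model",                    # hypothetical model name
    connection_id="my-project.us.my_connection",  # hypothetical connection
    endpoint="gemini-2.5-pro",                    # assumed endpoint value
)
print(ddl)
```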

In [111]:
from google.colab import auth
from google.cloud import bigquery

# --- Step 1: Force a new login pop-up ---
print("--- Forcing New Authentication ---")
auth.authenticate_user(project_id=PROJECT_ID)
print("SUCCESS: Authentication complete.")

# --- Step 2: Immediately create a new client ---
# This ensures it uses the fresh token you just received.
print("\n--- Creating a new, fresh BigQuery Client ---")
bq_client = bigquery.Client(project=PROJECT_ID)
print("SUCCESS: BigQuery Client is now ready with a valid login.")



--- Forcing New Authentication ---
SUCCESS: Authentication complete.

--- Creating a new, fresh BigQuery Client ---
SUCCESS: BigQuery Client is now ready with a valid login.


In [118]:
import pandas as pd
from vertexai.preview.generative_models import GenerativeModel

print("--- Starting Final Analysis Step (Bypassing BigQuery ML) ---")

try:
    # --- Step 1: Load the Gemini model using the Vertex AI SDK ---
    print("\nLoading Gemini Pro model via Vertex AI SDK...")
    text_model = GenerativeModel("gemini-2.5-pro")
    print("SUCCESS: Model loaded.")

    # --- Step 2: Read the metadata from the BigQuery table we created ---
    # A plain SELECT query is enough here; it does not require a BigQuery ML remote model.
    print("\nReading data from the BigQuery External Table...")
    full_table_id = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
    sql_read_data = f"SELECT prompt, gcs_uri, asset_type FROM `{full_table_id}`"
    df = bq_client.query(sql_read_data).to_dataframe()
    print("SUCCESS: Metadata loaded into a pandas DataFrame.")

    # --- Step 3: Loop through the data and analyze the TEXT PROMPTS in Python ---
    print("\nAnalyzing text prompts using the Vertex AI SDK...")
    analysis_results = []
    # We will only analyze the first 5 to save time
    for index, row in df.head(5).iterrows():
        prompt_to_analyze = row['prompt']
        asset_type = row['asset_type']

        # Create a new prompt for the analysis
        analysis_prompt = f"Analyze the sentiment of the following {asset_type} prompt and classify it as positive, neutral, or negative. Prompt: '{prompt_to_analyze}'"

        # Call the Gemini model directly
        response = text_model.generate_content(analysis_prompt)

        analysis_results.append({
            "Original Prompt": prompt_to_analyze,
            "Asset Type": asset_type,
            "Sentiment Analysis": response.text.strip()
        })
        print(f"Analyzed prompt {index+1}/{len(df.head(5))}...")

    # --- Step 4: Display the Final Results ---
    results_df = pd.DataFrame(analysis_results)
    print("\n--- Gemini Text Analysis Results ---")
    print(results_df.to_markdown(index=False))

except Exception as e:
    print(f"\n--- An Unexpected Error Occurred ---")
    print(f"Error details: {e}")

print("\n--- Workshop Analysis Complete ---")


--- Starting Final Analysis Step (Bypassing BigQuery ML) ---

Loading Gemini Pro model via Vertex AI SDK...
SUCCESS: Model loaded.

Reading data from the BigQuery External Table...
SUCCESS: Metadata loaded into a pandas DataFrame.

Analyzing text prompts using the Vertex AI SDK...
Analyzed prompt 1/5...
Analyzed prompt 2/5...
Analyzed prompt 3/5...
Analyzed prompt 4/5...
Analyzed prompt 5/5...

--- Gemini Text Analysis Results ---
| Original Prompt                                                                               | Asset Type   | Sentiment Analysis                                                                                                                                                                                                                                               |
|:----------------------------------------------------------------------------------------------|:-------------|:--------------------------------------------------------------------------------
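
The "Sentiment Analysis" column above stores the model's reply as free text, which is hard to group or count. A minimal sketch of one way to reduce each reply to a single label is shown below. The `normalize_sentiment` helper is hypothetical (it is not part of this notebook or of any SDK), and it relies on a simple heuristic: it returns the first of the three labels mentioned in the reply, so a reply such as "not negative" would be misclassified.

```python
import re

# Hypothetical helper, not part of the notebook: maps a free-form model reply
# to one of the three labels requested in the analysis prompt.
LABEL_PATTERN = re.compile(r"\b(positive|neutral|negative)\b")


def normalize_sentiment(raw_response: str) -> str:
    """Return the first label mentioned in the reply, or 'unknown' if none is found."""
    match = LABEL_PATTERN.search(raw_response.lower())
    return match.group(1) if match else "unknown"


# Example with made-up replies (illustrative strings, not real model output)
print(normalize_sentiment("**Classification:** Positive. The prompt evokes calm."))
print(normalize_sentiment("I cannot determine the tone."))
```

Assuming `results_df` from the cell above, the helper could be applied with `results_df["Sentiment Analysis"].map(normalize_sentiment)` to get a column suitable for `value_counts()`.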

In [120]:
import pandas as pd
from vertexai.preview.generative_models import GenerativeModel

print("--- Starting Final Analysis Step (Analyzing All Asset Types) ---")

try:
    # --- Step 1: Load the Gemini model ---
    model_name = "gemini-2.5-pro"
    print(f"\nLoading the correct model: '{model_name}'...")
    text_model = GenerativeModel(model_name)
    print("SUCCESS: Model loaded.")

    # --- Step 2: Read the metadata from the BigQuery table ---
    print("\nReading data from the BigQuery External Table...")
    full_table_id = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
    sql_read_data = f"SELECT prompt, gcs_uri, asset_type FROM `{full_table_id}`"
    df = bq_client.query(sql_read_data).to_dataframe()
    print("SUCCESS: Metadata for all assets loaded.")

    # --- Step 3: Loop through ALL data and analyze the TEXT PROMPTS ---
    print("\nAnalyzing text prompts for all asset types...")
    analysis_results = []

    # Loop through the entire DataFrame (no .head(5) limit this time)
    for position, (_, row) in enumerate(df.iterrows(), start=1):
        prompt_to_analyze = row['prompt']
        asset_type = row['asset_type']

        # Build the analysis prompt for this asset
        analysis_prompt = (
            f"Analyze the sentiment of the following {asset_type} prompt and "
            "classify it as positive, neutral, or negative. "
            f"Prompt: '{prompt_to_analyze}'"
        )

        # Call the Gemini model
        response = text_model.generate_content(analysis_prompt)

        analysis_results.append({
            "Original Prompt": prompt_to_analyze,
            "Asset Type": asset_type,
            "Sentiment Analysis": response.text.strip()
        })
        # Count by loop position rather than by DataFrame index label
        print(f"Analyzed prompt {position}/{len(df)}...")

    # --- Step 4: Display the Final Results ---
    results_df = pd.DataFrame(analysis_results)
    print("\n--- Gemini Text Analysis Results (All Assets) ---")
    print(results_df.to_markdown(index=False))

except Exception as e:
    print("\n--- An Unexpected Error Occurred ---")
    print(f"Error details: {e}")

print("\n--- Workshop Analysis Complete ---")


--- Starting Final Analysis Step (Analyzing All Asset Types) ---

Loading the correct model: 'gemini-2.5-pro'...
SUCCESS: Model loaded.

Reading data from the BigQuery External Table...
SUCCESS: Metadata for all assets loaded.

Analyzing text prompts for all asset types...
Analyzed prompt 1/23...
Analyzed prompt 2/23...
Analyzed prompt 3/23...
Analyzed prompt 4/23...
Analyzed prompt 5/23...
Analyzed prompt 6/23...
Analyzed prompt 7/23...
Analyzed prompt 8/23...
Analyzed prompt 9/23...
Analyzed prompt 10/23...
Analyzed prompt 11/23...
Analyzed prompt 12/23...
Analyzed prompt 13/23...
Analyzed prompt 14/23...
Analyzed prompt 15/23...
Analyzed prompt 16/23...
Analyzed prompt 17/23...
Analyzed prompt 18/23...
Analyzed prompt 19/23...
Analyzed prompt 20/23...
Analyzed prompt 21/23...
Analyzed prompt 22/23...
Analyzed prompt 23/23...

--- Gemini Text Analysis Results (All Assets) ---
| Original Prompt                                                                                      | Asse