In [None]:
!pip install --upgrade vertexai==1.70.0

In [None]:
# Only required when running this notebook in Colab
from google.colab import auth
auth.authenticate_user()

In [8]:
import vertexai
from vertexai.preview.generative_models import GenerativeModel
from vertexai.preview import caching

from vertexai.generative_models import Part
import datetime

from IPython.display import display, Markdown

project_id = "sascha-playground-doit"
vertexai.init(project=project_id, location="us-central1")

In [2]:
import time
from contextlib import contextmanager

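@contextmanager
def measure_time():
    """Print the wall-clock time spent inside the `with` block."""
    start_time = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed time even if the block raises.
        elapsed_time = time.perf_counter() - start_time
        print(f"Elapsed time: {elapsed_time:.4f} seconds")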

## Without cache for comparison

In [3]:
system_instruction = """
You are an expert video analyzer, and you answer user's query based on the video file you have access to.
Always return markdown.
"""

video = Part.from_uri(
    mime_type="video/mp4",
    uri="gs://doit-ml-demo/gemini/caching/video/Getting started with Gemini on Vertex AI.mp4")

model = GenerativeModel(
    "gemini-1.5-flash-002",
    system_instruction=[system_instruction]
  )

In [4]:
with measure_time():
  response = model.generate_content(
      [video, """provide a summary for the video"""],
  )
  print(response.usage_metadata)

prompt_token_count: 320997
candidates_token_count: 148
total_token_count: 321145

Elapsed time: 100.4730 seconds


In [None]:
Markdown(response.text)

## Generate Cache

In [9]:
system_instruction = """
You are an expert video analyzer, and you answer user's query based on the video file you have access to.
Always return markdown.
"""

contents = [
    Part.from_uri(
        mime_type="video/mp4",
        uri="gs://doit-ml-demo/gemini/caching/video/Getting started with Gemini on Vertex AI.mp4",
    )
]

cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-flash-002",
    system_instruction=system_instruction,
    contents=contents,
    ttl=datetime.timedelta(minutes=60),
)

cache_name = cached_content.name
print(cache_name)

6040470592396722176
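
The printed name is only the numeric cache ID. The API reports the cache under a full resource path of the form `projects/<project>/locations/<location>/cachedContents/<id>`. A minimal sketch of assembling that path from its parts; the helper name `cached_content_path` is hypothetical and not part of the SDK:

```python
def cached_content_path(project: str, location: str, cache_id: str) -> str:
    """Build the full resource path for a context cache ID (hypothetical helper)."""
    return f"projects/{project}/locations/{location}/cachedContents/{cache_id}"


# Values taken from this notebook's setup and the ID printed above.
full_name = cached_content_path(
    "sascha-playground-doit", "us-central1", "6040470592396722176"
)
print(full_name)
```

Keeping the ID or the full path around is what allows a later session to reattach to the same cache instead of creating a new one.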


## Use Cache

In [10]:
cached_content = caching.CachedContent(cached_content_name=cache_name)
cached_content

<vertexai.caching._caching.CachedContent object at 0x104738550>: {
  "name": "projects/sascha-playground-doit/locations/us-central1/cachedContents/6040470592396722176",
  "model": "projects/sascha-playground-doit/locations/us-central1/publishers/google/models/gemini-1.5-flash-002",
  "createTime": "2024-10-24T14:39:36.578830Z",
  "updateTime": "2024-10-24T14:39:36.578830Z",
  "expireTime": "2024-10-24T15:39:36.562363Z",
  "usageMetadata": {
    "totalTokenCount": 320992,
    "textCount": 107,
    "videoDurationSeconds": 1089,
    "audioDurationSeconds": 1089
  }
}
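
The `expireTime` field shows when the 60-minute TTL runs out. A small sketch, using only the standard library, of checking how much lifetime a cache has left before reusing it; `parse_timestamp` and `remaining_lifetime` are hypothetical helpers, and the timestamps are copied from the output above:

```python
import datetime


def parse_timestamp(value: str) -> datetime.datetime:
    """Parse a timestamp such as the createTime/expireTime fields above."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))


def remaining_lifetime(expire_time: str, now: datetime.datetime) -> datetime.timedelta:
    """Return the time left before the cache expires (negative once expired)."""
    return parse_timestamp(expire_time) - now


create_time = parse_timestamp("2024-10-24T14:39:36.578830Z")
expire_time = "2024-10-24T15:39:36.562363Z"

# Ten minutes after creation, roughly fifty minutes remain.
now = create_time + datetime.timedelta(minutes=10)
left = remaining_lifetime(expire_time, now)
print(f"Remaining: {left}")
```

In a live session, `now` would be `datetime.datetime.now(datetime.timezone.utc)` so that both sides of the subtraction are timezone-aware.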

In [11]:
model = GenerativeModel.from_cached_content(cached_content=cached_content)

with measure_time():
  response = model.generate_content("provide a summary for the video")
  print(response.usage_metadata)

prompt_token_count: 320997
candidates_token_count: 325
total_token_count: 321322
cached_content_token_count: 320992

Elapsed time: 68.6343 seconds
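
To put the two timings side by side, here is a quick calculation using the elapsed times recorded in this notebook (100.4730 s without the cache, 68.6343 s with it). Each figure comes from a single run, so treat the result as indicative rather than a benchmark; the `speedup` helper is hypothetical:

```python
def speedup(baseline_seconds: float, cached_seconds: float) -> float:
    """Ratio of the uncached latency to the cached latency."""
    return baseline_seconds / cached_seconds


uncached = 100.4730  # elapsed time of the run without a cache
cached = 68.6343     # elapsed time of the run that used the cache

saved = uncached - cached
print(
    f"Saved {saved:.2f} s ({saved / uncached:.1%}), "
    f"speedup {speedup(uncached, cached):.2f}x"
)
```

Note that the cached run also produced more output tokens (325 versus 148), so the two requests are not strictly identical workloads.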


In [12]:
Markdown(response.text)

Certainly! Here's a summary of the video.

The video is a presentation on Gemini 1.5 Pro, a multimodal model with breakthrough long-context understanding. The speakers are Lewis Liu, Product Manager, Vertex AI Google Cloud, and Christopher Cho, Product Manager, Google Cloud.


The presentation covers:

- Gemini on Vertex AI: An overview of Gemini, what's new, and what's coming this year.
- Start with Vertex AI SDK: A quick journey on how to use the SDK to get started.
- Demos: A demonstration of Gemini 1.5 Pro's capabilities, including analyzing video with audio, finding an image in a video, and interacting with a PDF.
- MLOps for Generative AI: A discussion of MLOps for generative AI and the tools needed to build, experiment, deploy, and manage ML models.
- Enterprise Readiness: An explanation of Google's approach to foundational models and the security controls offered, including VPC-SC, CMEK, access transparency, and Responsible AI tooling, enablement & support.


The presentation highlights the following key features of Gemini 1.5 Pro:

- State-of-the-art, natively multimodal reasoning capabilities.
- Highly optimized serving stack.
- Built with responsibility and safety at the core.


The presentation concludes with a demonstration of the Vertex AI SDK and its ease of use. The SDK supports Python, Node.js, Java, and Go, and allows users to get started with Gemini in just four lines of code.

Hope this helps!

## Use it with your codebase

In [None]:
import git
import os
import shutil
from pathlib import Path
from vertexai.preview import tokenization


import magika
m = magika.Magika()

In [61]:
repo_dir="repo"
repo_url="https://github.com/SaschaHeyer/gen-ai-livestream"

In [62]:
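# Start from a clean directory; re-running this cell would otherwise fail
# because os.makedirs raises if the directory already exists.
if os.path.exists(repo_dir):
    shutil.rmtree(repo_dir)

os.makedirs(repo_dir)
git.Repo.clone_from(repo_url, repo_dir)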

<git.repo.base.Repo '/Users/sascha/Desktop/development/gen-ai-livestream/context-caching/repo/.git'>

In [92]:
import os
from pathlib import Path

def extract_code(repo_dir):
    """Create an index, extract content of .py, .ipynb, and .md files."""

    code_index = []
    code_text = ""
    allowed_extensions = {'.py', '.ipynb', '.md'}  # Allowed file types
    for root, _, files in os.walk(repo_dir):
        for file in files:
            file_path = os.path.join(root, file)
            relative_path = os.path.relpath(file_path, repo_dir)

            # Check if file has an allowed extension
            if Path(file).suffix in allowed_extensions:
                code_index.append(relative_path)

                try:
                    with open(file_path, 'r', encoding='utf-8') as f:
                        code_text += f"----- File: {relative_path} -----\n"
                        code_text += f.read()
                        code_text += "\n-------------------------\n"
                except Exception:
                    # Skip files that cannot be opened or decoded as UTF-8 text.
                    pass

    return code_index, code_text
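
Because unreadable files are skipped silently, it can be hard to tell what actually ended up in `code_text`. The following is a sketch of a variant that also reports the skipped files and does not descend into `.git`. The name `extract_code_with_report` and both of those behaviours are assumptions added for illustration; they are not part of the original function:

```python
import os
import tempfile
from pathlib import Path


def extract_code_with_report(repo_dir, allowed_extensions=(".py", ".ipynb", ".md")):
    """Like extract_code, but also return the files that could not be read."""
    code_index, chunks, skipped = [], [], []
    for root, dirs, files in os.walk(repo_dir):
        # Prune .git in place so os.walk does not descend into it.
        dirs[:] = [d for d in dirs if d != ".git"]
        for file in sorted(files):
            if Path(file).suffix not in allowed_extensions:
                continue
            file_path = os.path.join(root, file)
            relative_path = os.path.relpath(file_path, repo_dir)
            try:
                with open(file_path, "r", encoding="utf-8") as f:
                    content = f.read()
            except (OSError, UnicodeDecodeError) as exc:
                # Record the failure instead of dropping the file silently.
                skipped.append((relative_path, type(exc).__name__))
                continue
            code_index.append(relative_path)
            chunks.append(
                f"----- File: {relative_path} -----\n"
                f"{content}\n-------------------------\n"
            )
    return code_index, "".join(chunks), skipped


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.py").write_text("print('hi')\n", encoding="utf-8")
    Path(tmp, "bad.md").write_bytes(b"\xff\xfe\x00 not utf-8")
    Path(tmp, "ignored.txt").write_text("skip me", encoding="utf-8")
    index, text, skipped = extract_code_with_report(tmp)

print(index)
print(skipped)
```

Collecting the chunks in a list and joining once avoids repeated string concatenation, which matters little here but scales better for large repositories.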


In [93]:
code_index, code_text = extract_code(repo_dir)

In [94]:
code_index

['README.md',
 'function-calling/automatic.py',
 'function-calling/dynamic.py',
 'function-calling/complete.py',
 'function-calling/simple.py',
 'grounding/grounding-own-data.py',
 'grounding/grounding-search.py',
 'podcast-automation/generate.py',
 'rag-api/rag+gemini.py',
 'rag-api/import.py',
 'rag-api/rag.py',
 'rag-api/jira.py',
 'rag-api/slack.py',
 'rag-api/ui/app.py',
 'rag-api/confluence/confluence.py',
 'rag-api/helper/list.py',
 'rag-api/helper/empty.py',
 'rag-api/helper/cleanup.py',
 'code-assistant/analyze/notebook/analyze.ipynb',
 'code-assistant/analyze/service/app.py',
 'document-processing/costs.md',
 'document-processing/multimodal.py',
 'document-processing/ui/app.py',
 'document-processing/cloud-run-service/readme.md',
 'document-processing/cloud-run-service/sample.py',
 'document-processing/cloud-run-service/main.py',
 'orchestration/README.md',
 'orchestration/services/image/main.py',
 'reranking/query_limit.py',
 'reranking/ranking.py',
 'reranking/ranking_vs_em

In [96]:
print(code_text)

----- File: README.md -----
# Generative AI Livestream

![](images/livestream.gif)

This repository is part of a live streaming series. 
Every Friday we build live and this repoistory contains all the code from the livestreams. 

📺 Get Ready to Code and Laugh Live! 
Join me every Friday* from 10 - 11:30 AM CET / 8 - 10:30 UTC for the Coding GenAI Applications Live Stream!

Watch it here:

* LinkedIn: https://www.linkedin.com/in/saschaheyer
* Twitch: https://www.twitch.tv/saschaheyer
* YouTube: https://www.youtube.com/@ml-engineer
* Kick: https://kick.com/mlengineer

## Watch the recordings
If you want to follow along you can watch the recordings. 

https://ml-engineer.dev
-------------------------
----- File: function-calling/automatic.py -----
import vertexai
from vertexai.generative_models import (
    Content,
    FunctionDeclaration,
    GenerationConfig,
    GenerativeModel,
    Tool,
    Part,
    AutomaticFunctionCallingResponder,
)

# Initialize Vertex AI
project_id = "sascha-p

In [97]:
model_name = "gemini-1.5-pro-002"
tokenizer = tokenization.get_tokenizer_for_model(model_name)

result = tokenizer.count_tokens(code_text)

print(f"{result.total_tokens = :,}")

result.total_tokens = 1,013,619
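
Before caching a payload of this size, it is worth checking it against the model's context window. A minimal sketch follows; the limit of 2,097,152 input tokens for Gemini 1.5 Pro is an assumption that should be confirmed against the current model documentation, and `context_usage` is a hypothetical helper:

```python
# Assumed input limit for Gemini 1.5 Pro; verify against the current docs.
ASSUMED_CONTEXT_WINDOW = 2_097_152


def context_usage(token_count: int, window: int = ASSUMED_CONTEXT_WINDOW) -> float:
    """Return the fraction of the context window that token_count occupies."""
    if token_count < 0 or window <= 0:
        raise ValueError("token_count must be >= 0 and window must be > 0")
    return token_count / window


tokens = 1_013_619  # local tokenizer count printed above
usage = context_usage(tokens)
print(f"{tokens:,} tokens use {usage:.1%} of the assumed window; fits: {usage <= 1.0}")
```

The local tokenizer count is an estimate; the count reported by the service after caching can differ slightly.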


In [72]:
system_instruction = """
You are a Python expert. Based on the code provided, you answer questions.
Always return markdown.
"""

contents = [
    Part.from_text(code_text)
]

cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-002",
    system_instruction=system_instruction,
    contents=contents,
    ttl=datetime.timedelta(minutes=60),
)

cache_name = cached_content.name
print(cache_name)

2456678412358516736


In [98]:
model = GenerativeModel.from_cached_content(cached_content=cached_content)

with measure_time():
  response = model.generate_content("provide a summary of this repository")
  print(response.usage_metadata)

prompt_token_count: 1039581
candidates_token_count: 552
total_token_count: 1040133
cached_content_token_count: 1039576

Elapsed time: 76.4001 seconds
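
The usage metadata separates the total prompt tokens from those served out of the cache. A short calculation over the numbers recorded in this notebook shows how little of each prompt is sent fresh; presumably the remainder is just the short text question, though the metadata does not say so explicitly. The helper name is hypothetical:

```python
def uncached_prompt_tokens(prompt_tokens: int, cached_tokens: int) -> int:
    """Tokens in the prompt that were not served from the context cache."""
    return prompt_tokens - cached_tokens


# (prompt_token_count, cached_content_token_count) from the two cached runs.
runs = {
    "video": (320_997, 320_992),
    "repository": (1_039_581, 1_039_576),
}

for name, (prompt, cached) in runs.items():
    fresh = uncached_prompt_tokens(prompt, cached)
    print(f"{name}: {fresh} uncached tokens, {cached / prompt:.4%} served from cache")
```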


In [99]:
Markdown(response.text)

This repository demonstrates several applications of generative AI using Vertex AI, focusing on different capabilities and integrations:

1. **Function Calling:** This section shows how to use function calling with Gemini models. It includes examples of defining function declarations, invoking functions, handling responses, and implementing both dynamic and automatic function calling.  The code provides simulations of API calls for order management tasks.

2. **Grounding:** This demonstrates how to ground Gemini responses using external data sources. It includes examples using Google Search and a custom Vertex AI Search data store, allowing the model to access real-time information and ground its responses in factual data.

3. **Podcast Automation:** This section showcases a more complex application, automating podcast generation from text articles. It integrates with both the Google Text-to-Speech API and the ElevenLabs API for synthesizing speech, creating a multi-speaker podcast with realistic filler words and emotional variations.

4. **RAG (Retrieval Augmented Generation):** This section focuses on using Retrieval Augmented Generation (RAG) with Gemini. It shows how to create a RAG corpus, import files, perform retrieval queries, and integrate RAG with the Gemini API.  It also includes examples using different data sources like Google Search and a local document store, enabling dynamic question answering based on the provided corpus.  There is also a Streamlit UI application for document management and querying the RAG system.

5. **Code Assistant:** This presents a code analysis application powered by Gemini. It uses a notebook interface and allows the user to interact with the code by sending prompts to the model. The model is provided with the entire codebase as context.  It can also extract and use the git diff between commits.  The Streamlit UI facilitates cloning, extracting and asking questions about code.

6. **Document Processing:** This demonstrates multimodal document processing. Code can extract structured data from PDF and image documents using Gemini, returning JSON output with a custom schema.  It also includes a Streamlit UI to upload and process documents.  A cloud run backend is created to call the model.

7. **Orchestration:** This section provides an example of orchestrating generative AI tasks using Google Cloud Workflows.  It implements a workflow to generate a recipe, including a title, description, ingredients, and an image. The workflow interacts with the Gemini API for text generation and a custom Cloud Run service for image generation, illustrating how to chain together different AI capabilities in a workflow.  A UI is created that displays the created recipes.

The repository covers a wide range of generative AI use cases, offering examples of function calling, grounding, tool integration, document processing, and workflow orchestration. The code is well-documented and provides instructions for setup and usage.
