<a href="https://colab.research.google.com/github/Advait-Joshi-svg/Colab/blob/main/notebooks/Generate_images_with_Gemini_and_Vertex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<b>Gemini and Vertex can generate images for online sales</b>

Gemini, Google's multimodal model, can help you create images with Vertex AI's image generation API. Use this notebook to generate images for online marketing.

<b>[Required] Set up a Google Cloud account</b>

We get it, this part is hard, but to use Vertex AI's image generation API you need to set up a Google Cloud account, project, and billing. Start [here](https://console.cloud.google.com/getting-started).

Once you've done that, come back here.

In [1]:
#@title Authenticate with Google Cloud and your project ID

import vertexai
from vertexai.preview.vision_models import Image, ImageGenerationModel

from google.colab import auth

gcp_project_id = 'advait-450518' # @param {type: "string"}

auth.authenticate_user(project_id=gcp_project_id)

vertexai.init(project=gcp_project_id)
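
A mistyped project ID tends to surface only later as an authentication or permission error. The sketch below checks the ID's shape before calling `auth.authenticate_user`. It assumes the usual Google Cloud rule for project IDs (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). `is_valid_project_id` is a hypothetical helper, not part of `vertexai` or `google.colab`.

```python
import re

# Assumption: project IDs are 6-30 characters long, contain only lowercase
# letters, digits, and hyphens, start with a letter, and do not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Hypothetical helper: return True if project_id has the expected shape."""
    return bool(_PROJECT_ID_RE.fullmatch(project_id))


if __name__ == "__main__":
    # Check the shape before authenticating, so typos fail early.
    for candidate in ("advait-450518", "My_Project"):
        print(candidate, is_valid_project_id(candidate))
```

This only checks the format. It does not confirm that the project exists or that billing is enabled.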

In [2]:
#@title Configure Gemini API key

#Access your Gemini API key

import google.generativeai as genai
from google.colab import userdata

gemini_api_secret_name = 'API'  # @param {type: "string"}

try:
  GOOGLE_API_KEY=userdata.get(gemini_api_secret_name)
  genai.configure(api_key=GOOGLE_API_KEY)
except userdata.SecretNotFoundError as e:
  print(f'Secret not found\n\nThis expects you to create a secret named {gemini_api_secret_name} in Colab\n\nVisit https://makersuite.google.com/app/apikey to create an API key\n\nStore that in the secrets section on the left side of the notebook (key icon)\n\nName the secret {gemini_api_secret_name}')
  raise e
except userdata.NotebookAccessError as e:
  print(f'You need to grant this notebook access to the {gemini_api_secret_name} secret in order for the notebook to access Gemini on your behalf.')
  raise e
except Exception as e:
  # unknown error
  print(f"There was an unknown error. Ensure you have a secret {gemini_api_secret_name} stored in Colab and it's a valid key from https://makersuite.google.com/app/apikey")
  raise e

model = genai.GenerativeModel('gemini-pro')
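
The cell above relies on `google.colab.userdata`, which is only importable inside Colab. If you run this notebook elsewhere, one option is to fall back to an environment variable. The sketch below illustrates that idea. `load_gemini_api_key` is a hypothetical helper, and reading the key from an environment variable with the same name as the secret is an assumption, not something the notebook does.

```python
import os


def load_gemini_api_key(secret_name: str) -> str:
    """Hypothetical helper: read the key from Colab secrets if available,
    otherwise from an environment variable with the same name."""
    try:
        # Only importable when running inside Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(secret_name)

    # Assumption: outside Colab the key is exported as an environment variable.
    key = os.environ.get(secret_name)
    if not key:
        raise RuntimeError(
            f"No Colab secret or environment variable named {secret_name!r} was found."
        )
    return key
```

The returned value could then be passed to `genai.configure(api_key=...)` as in the cell above.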

In [3]:
! git clone https://github.com/deepseek-ai/janus

Cloning into 'janus'...
remote: Enumerating objects: 121, done.
remote: Counting objects: 100% (74/74), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 121 (delta 51), reused 36 (delta 36), pack-reused 47 (from 2)
Receiving objects: 100% (121/121), 7.19 MiB | 40.01 MiB/s, done.
Resolving deltas: 100% (57/57), done.


In [4]:
%cd janus

/content/janus


In [5]:
! pip install -e .


In [7]:
!pip install attrdict

import os
import torch
import PIL.Image
import numpy as np
from transformers import AutoModel
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images

Collecting attrdict
  Downloading attrdict-2.0.1-py2.py3-none-any.whl.metadata (6.7 kB)
Downloading attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
Installing collected packages: attrdict
Successfully installed attrdict-2.0.1


In [9]:
model_path="deepseek-ai/Janus-Pro-1B"
vl_chat_processor=VLChatProcessor.from_pretrained(model_path)
tokenizer=vl_chat_processor.tokenizer
vl_gpt=AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt=vl_gpt.to(torch.bfloat16).cuda().eval()

Some kwargs in processor config are unused and will not have any effect: add_special_token, sft_format, mask_prompt, num_image_tokens, ignore_id, image_tag. 


In [10]:
conversation = [{
    "role":"<|User|>",
    "content":"Generate a logo for Facebook. The background is pure blue, and the clear text 'facebook' is arranged horizontally on the image. Do not modify the text. The text font is Times New Roman."
},
                {"role":"<|Assistant|>","content":""}
                ]
sft_format=vl_chat_processor.apply_sft_template_for_multi_turn_prompts(
    conversations=conversation,
    sft_format=vl_chat_processor.sft_format,
    system_prompt="",
)

prompt=sft_format + vl_chat_processor.image_start_tag
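
The generation function in this notebook uses the defaults `img_size=384`, `patch_size=16`, and `image_token_num_per_image=576`. These values are consistent with a square grid holding one image token per patch. The sketch below works through that arithmetic, assuming the relationship holds. `image_tokens_per_image` is a hypothetical helper and not part of the `janus` package.

```python
def image_tokens_per_image(img_size: int, patch_size: int) -> int:
    """Hypothetical helper: number of tokens in a square grid of patches.

    Assumes one token per patch_size x patch_size patch of a square image.
    """
    if img_size <= 0 or patch_size <= 0:
        raise ValueError("img_size and patch_size must be positive")
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    side = img_size // patch_size  # patches along one edge
    return side * side


if __name__ == "__main__":
    # 384 // 16 = 24 patches per side, and 24 * 24 = 576 tokens.
    print(image_tokens_per_image(384, 16))
```

If you change `img_size` or `patch_size`, a check like this can help keep `image_token_num_per_image` in step with them.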

In [None]:
@torch.inference_mode()
def generate(
    mmgpt: MultiModalityCausalLM,
    vl_chat_processor: VLChatProcessor,
    prompt: str,
    temperature: float = 1,
    parallel_size: int = 8,
    cfg_weight: float = 5,
    image_token_num_per_image: int = 576,
    img_size: int = 384,
    patch_size: int = 16,
):
    input_ids=vl_chat_processor.tokenizer.encode(prompt)
    i