# AISG7: Day3 - MultiModal AI with Gemini

This guide walks you through using Gemini via Google AI Studio. You'll set up access, send prompts to the model through the API, and try out some of its multimodal capabilities.

## Goals

By the end of this guide, you should:

- Have working access to Gemini in Google AI Studio
- Understand Gemini's capabilities
- Be able to make basic API calls
- Test out one multimodal aspect of Gemini

**Please remember to set your AI Studio API key (`GOOGLE_API_KEY`) in the .env file!**

Note: You might need a Google Cloud account. New users receive $300 in free credits, and there is a generous free tier.  
The API calls below use the experimental Gemini 2.0 Flash model (`gemini-2.0-flash-exp`), which is not billed.

In [None]:
!pip install -r requirements.txt

In [None]:
import google.generativeai as genai
from dotenv import load_dotenv
import os

In [None]:
# Load environment variables from .env file
load_dotenv()

# Access the API key from the environment variable
api_key = os.getenv("GOOGLE_API_KEY")

# Initialize the generativeAI client using AI Studio key
genai.configure(api_key=api_key)
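
If the key is missing from the .env file, `os.getenv` quietly returns `None` and the failure only shows up later, on the first API call. A minimal sketch of an earlier check follows. `require_api_key` is a hypothetical helper name used for illustration and is not part of the SDK.

```python
import os


def require_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for this notebook; not part of google-generativeai.
    """
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add a line like {var_name}=<your key> "
            "to your .env file and run load_dotenv() again."
        )
    return key
```

You could then call `genai.configure(api_key=require_api_key())` after `load_dotenv()`.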

In [None]:
model1 = genai.GenerativeModel("gemini-2.0-flash-exp")

print("Your message to Gemini:")
msg = input()
print("Sending message to Gemini...")

# Generate text using the Gemini model

response = model1.generate_content(msg)

print(response.text)
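
Free-form user input can occasionally produce a response with no text in it, for example when a prompt is blocked. The sketch below assumes that accessing `response.text` raises a `ValueError` in that case, which is how the `google-generativeai` SDK is understood to behave. Treat that as an assumption and check it against the SDK version you have installed. `text_or_reason` is a hypothetical helper name.

```python
def text_or_reason(response) -> str:
    """Return the response text, or a short note explaining why there is none.

    Assumption: `response.text` raises ValueError when the response
    contains no text part (for example, a blocked prompt).
    """
    try:
        return response.text
    except ValueError as err:
        return f"[no text returned: {err}]"
```

In the cell above, `print(text_or_reason(response))` would then replace `print(response.text)`.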


In [None]:
# Ask Gemini to tell a joke

response = model1.generate_content(
"""
Tell me a joke, but do not explain why it is funny. 
Please place a carriage return after each sentence and ensure readability.
Use this as a starting point:
OpenAI, Gemini and Claude are in a plane ..."""
)

print(response.text)

## Exploring Multimodal Capabilities with Gemini
Gemini is not just a text-based model; it can also process and generate images. Here's how you can explore its multimodal capabilities:

**Image Classification / Captioning**

You can provide an image to Gemini and ask it to generate a caption describing the image. This showcases Gemini's ability to understand visual content.

**Image Generation**

You can ask Gemini to generate images based on a text description. This demonstrates its ability to translate textual concepts into visual representations.  
Please note that the Imagen 3 API for image generation is still in beta and not publicly available.

**Code execution**

You can ask Gemini to generate and execute code.

There are more capabilities, including audio and video understanding.

In [None]:
import base64

import httpx

# Image captioning: download an image and send it to Gemini with a text prompt
model = genai.GenerativeModel(model_name="gemini-2.0-flash-exp")
image_path = "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Felis_catus-cat_on_snow.jpg/1024px-Felis_catus-cat_on_snow.jpg"

image = httpx.get(image_path)

prompt = "Caption this image."

# The image is passed inline as base64-encoded data with its MIME type
image_blob = {
    "mime_type": "image/jpeg",
    "data": base64.b64encode(image.content).decode("utf-8"),
}
response = model.generate_content([image_blob, prompt])

print(response.text)
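
The same inline-data format should also work for an image stored on disk. The sketch below builds the dictionary from a local file and guesses the MIME type from the file extension. `local_image_part` is a hypothetical helper name, and any file path you pass in is your own.

```python
import base64
import mimetypes
from pathlib import Path


def local_image_part(path: str) -> dict:
    """Build an inline image part (MIME type + base64 data) from a local file.

    Hypothetical helper; it mirrors the dictionary format used in the
    captioning cell above.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not recognise {path!r} as an image file")
    data = Path(path).read_bytes()
    return {
        "mime_type": mime_type,
        "data": base64.b64encode(data).decode("utf-8"),
    }
```

For example, `model.generate_content([local_image_part("cat.jpg"), "Caption this image."])`, where `cat.jpg` is a placeholder file name.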

In [None]:
# code generation and execution

response = model.generate_content(
    ('What is the sum of the first 50 prime numbers? '
    'Generate and run code for the calculation, and make sure you get all 50.'),
    tools='code_execution')

print(response.text)
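
Since the model writes and runs its own code here, it is worth checking the answer independently. The following is a small local check in plain Python, not the code Gemini generates. It uses simple trial division, which is plenty for 50 primes.

```python
def first_primes(n: int) -> list[int]:
    """Return the first n prime numbers using trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        # A candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(sum(first_primes(50)))  # 5117
```

If Gemini's answer differs from 5117, the generated code most likely stopped before reaching the 50th prime (229).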

## And that's it, folks 👏

You have successfully:
- used an API key from AI Studio and sent Gemini a handful of prompts
- utilised multimodal capabilities of Gemini 2.0

To find out more, see the Gemini Python SDK docs at <https://ai.google.dev/>.

Now the world is your oyster - get building and show us what you come up with!!!