## Gemini API intro

In [None]:
from google import genai

# no load_dotenv() needed
# the client finds GOOGLE_API_KEY by itself anyway

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how AI works in a few words",
)

print(response.text)

AI learns patterns from data to make predictions or decisions.


In [14]:
def ask_llm(prompt, model="gemini-2.5-flash"):
    """Send a prompt to the given Gemini model and return the full response object."""
    response = client.models.generate_content(
        model=model,
        contents=prompt,
    )
    return response


response = ask_llm(prompt="Give me som data engineering jokes, structure it in short points")
print(response.text)

Here are some data engineering jokes, structured in short points:

*   **Why did the ETL job fail?** It had *unresolved dependencies* – just like my last relationship.

*   My favorite data engineering mantra: **"It worked on my local machine!"** (spoken right before deploying to production).

*   What's the difference between a data lake and a data swamp? **Governance.**

*   A data scientist walks into a bar and asks for a "real-time, fully historized, cleansed data feed." The data engineer at the bar just sighs.

*   What's a data engineer's favorite fairy tale? **"The Self-Documenting Pipeline."**

*   How do you know a data pipeline is truly robust? It only breaks in *production*, never staging.

*   Schema drift isn't a bug; it's a **surprise feature** that appears every Tuesday.

*   Debugging a data pipeline is like being a detective, but all the suspects are lying, and half the evidence is missing.

*   The only thing "real-time" about most data engineering projects is the **p

In [15]:
from pydantic import BaseModel

isinstance(response, BaseModel)

True

In [16]:
dict(response).keys()

dict_keys(['sdk_http_response', 'candidates', 'create_time', 'model_version', 'prompt_feedback', 'response_id', 'usage_metadata', 'automatic_function_calling_history', 'parsed'])

In [None]:
response.__dict__.keys()  # same as the cell above

dict_keys(['sdk_http_response', 'candidates', 'create_time', 'model_version', 'prompt_feedback', 'response_id', 'usage_metadata', 'automatic_function_calling_history', 'parsed'])

In [17]:
response.model_version

'gemini-2.5-flash'

In [19]:
response.sdk_http_response

HttpResponse(
  headers=<dict len=11>
)

In [20]:
response.candidates

[Candidate(
   content=Content(
     parts=[
       Part(
         text="""Here are some data engineering jokes, structured in short points:
 
 *   **Why did the ETL job fail?** It had *unresolved dependencies* – just like my last relationship.
 
 *   My favorite data engineering mantra: **"It worked on my local machine!"** (spoken right before deploying to production).
 
 *   What's the difference between a data lake and a data swamp? **Governance.**
 
 *   A data scientist walks into a bar and asks for a "real-time, fully historized, cleansed data feed." The data engineer at the bar just sighs.
 
 *   What's a data engineer's favorite fairy tale? **"The Self-Documenting Pipeline."**
 
 *   How do you know a data pipeline is truly robust? It only breaks in *production*, never staging.
 
 *   Schema drift isn't a bug; it's a **surprise feature** that appears every Tuesday.
 
 *   Debugging a data pipeline is like being a detective, but all the suspects are lying, and half the evidence 

In [21]:
response.text

'Here are some data engineering jokes, structured in short points:\n\n*   **Why did the ETL job fail?** It had *unresolved dependencies* – just like my last relationship.\n\n*   My favorite data engineering mantra: **"It worked on my local machine!"** (spoken right before deploying to production).\n\n*   What\'s the difference between a data lake and a data swamp? **Governance.**\n\n*   A data scientist walks into a bar and asks for a "real-time, fully historized, cleansed data feed." The data engineer at the bar just sighs.\n\n*   What\'s a data engineer\'s favorite fairy tale? **"The Self-Documenting Pipeline."**\n\n*   How do you know a data pipeline is truly robust? It only breaks in *production*, never staging.\n\n*   Schema drift isn\'t a bug; it\'s a **surprise feature** that appears every Tuesday.\n\n*   Debugging a data pipeline is like being a detective, but all the suspects are lying, and half the evidence is missing.\n\n*   The only thing "real-time" about most data enginee

## Tokens
- the basic unit of text for LLMs
- a token can be as short as one character or as long as one word
- billing is based on the number of tokens used

Gemini free tier limits (they vary by model and can change over time):
- requests per minute (RPM): 10
- tokens per minute (TPM): 250 000
- requests per day (RPD): 250
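
The request limits above can be respected on the client side by spacing out calls. A minimal sketch, assuming a sliding 60-second window: `RateLimiter` and its parameters are hypothetical helper names, not part of the `google-genai` SDK, and the actual limits should be checked against the current Gemini documentation.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: allow at most max_calls per period (seconds)."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of the calls inside the current window

    def _drop_expired(self, now):
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._drop_expired(now)
        self.calls.append(now)
```

Calling `limiter.wait()` right before each `ask_llm(...)` call in a loop would keep the loop under 10 requests per minute. It does not track the tokens-per-minute or requests-per-day limits.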

In [23]:
response.usage_metadata

GenerateContentResponseUsageMetadata(
  candidates_token_count=263,
  prompt_token_count=13,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=13
    ),
  ],
  thoughts_token_count=1604,
  total_token_count=1880
)
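
The counts above add up: prompt, candidate and thought tokens together give the total. A small sketch that checks this, using a `SimpleNamespace` stand-in filled with the values from the output above rather than a live response. `total_tokens` is a hypothetical helper, and treating a missing count as 0 is an assumption.

```python
from types import SimpleNamespace


def total_tokens(usage):
    """Sum prompt, candidate and thought tokens, treating missing counts as 0."""
    fields = ("prompt_token_count", "candidates_token_count", "thoughts_token_count")
    return sum(getattr(usage, name, None) or 0 for name in fields)


# Stand-in for response.usage_metadata, with the numbers from the output above.
usage = SimpleNamespace(
    prompt_token_count=13,
    candidates_token_count=263,
    thoughts_token_count=1604,
    total_token_count=1880,
)

print(total_tokens(usage))  # 13 + 263 + 1604 = 1880
```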

## Thinking
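- a hyperparameter, set per request through `thinking_config`
- `thinking_budget=0` turns thinking off for `gemini-2.5-flash`, so no thought tokens are generated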

In [24]:
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me som data engineering jokes, structure it in short points",
    config=types.GenerateContentConfig(
        # a thinking budget of 0 turns thinking off
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)

print(response.text)

Here are some data engineering jokes, structured in short points:

*   **Schema Joke:**
    *   What did the data engineer say to the empty table?
    *   "You need to define your purpose... and maybe a primary key."

*   **ETL Problem:**
    *   Why did the ETL job break up with the data lake?
    *   It felt like it was doing all the work, and the lake was just... there.

*   **Data Quality:**
    *   A data engineer, a data scientist, and a business analyst walk into a bar.
    *   The data engineer says, "This data is a mess!"
    *   The data scientist asks, "Can we still model it?"
    *   The business analyst asks, "Is this the latest version?"

*   **Pipelines:**
    *   My data pipeline is like my life:
    *   It's constantly running, occasionally failing, and I'm not always sure what's going through it.

*   **Debugging:**
    *   How many data engineers does it take to change a lightbulb?
    *   None, they'll just write a script to automatically detect and replace the bulb

In [25]:
response.usage_metadata

GenerateContentResponseUsageMetadata(
  candidates_token_count=484,
  prompt_token_count=13,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=13
    ),
  ],
  total_token_count=497
)
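
Comparing the two runs: with thinking enabled, most of the tokens were thought tokens. A quick calculation with the numbers reported in the two `usage_metadata` outputs in this notebook (plain arithmetic, no API call; the dictionary keys are just illustrative labels):

```python
# Token counts copied from the two usage_metadata outputs in this notebook.
with_thinking = {"prompt": 13, "candidates": 263, "thoughts": 1604, "total": 1880}
without_thinking = {"prompt": 13, "candidates": 484, "thoughts": 0, "total": 497}

# Share of the first run's tokens that were spent on thinking.
thinking_share = with_thinking["thoughts"] / with_thinking["total"]
# Difference in total tokens between the two runs.
saved = with_thinking["total"] - without_thinking["total"]

print(f"{thinking_share:.1%} of the tokens in the first run were thought tokens")
print(f"{saved} fewer tokens in total with thinking_budget=0")
```

This is a single pair of runs, so the numbers only show the direction of the effect, not a typical saving.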

## System instruction
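- a hyperparameter: text that tells the model how to behave, sent along with every prompt
- its tokens are counted as part of the prompt tokens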

In [32]:
system_instruction = """
You are an expert in Python programming, you will always provide idiomatic code, i.e. pythonic code. 
So when you see my code or my question, be very critical but answer in a short and concise way. 
Also be constructive to help me improve.
"""

prompt = """
explain OOP and dunder methods
"""

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt,
    config=types.GenerateContentConfig(
        system_instruction=system_instruction,
        # thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

print(response.text)

Okay, here's a concise explanation of OOP and dunder methods in Python:

**OOP (Object-Oriented Programming)**

*   **Core Idea:**  Organizes code around "objects" that combine data (attributes) and behavior (methods).
*   **Key Principles:**
    *   **Encapsulation:** Bundling data and methods that operate on that data within a class, hiding internal details.
    *   **Inheritance:** Creating new classes (subclasses) based on existing classes (superclasses), inheriting their attributes and methods and extending them.
    *   **Polymorphism:**  The ability of objects of different classes to respond to the same method call in their own way.
    *   **Abstraction:** Simplifying complex systems by modeling classes appropriate to the problem.

**Dunder Methods (Magic Methods)**

*   **What they are:** Special methods in Python that start and end with double underscores (e.g., `__init__`, `__str__`).
*   **Purpose:** Define how objects behave with built-in Python operators and functions.  T

In [33]:

metadata = response.usage_metadata
metadata

GenerateContentResponseUsageMetadata(
  candidates_token_count=364,
  candidates_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=364
    ),
  ],
  prompt_token_count=68,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=68
    ),
  ],
  total_token_count=432
)

In [None]:
print(f"{metadata.candidates_token_count = }")  # output tokens
print(f"{metadata.prompt_token_count = }")  # input tokens: prompt + system instruction
print(f"{metadata.total_token_count = }")  # input + output

metadata.candidates_token_count = 364
metadata.prompt_token_count = 68
metadata.total_token_count = 432


In [36]:
len(prompt.split()), len(system_instruction.split())

(5, 43)
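
The word counts give a rough feel for how tokens relate to words. A quick calculation with the numbers above; the ratio holds only for this particular prompt and model and is not a general rule.

```python
prompt_words = 5               # len(prompt.split())
system_instruction_words = 43  # len(system_instruction.split())
prompt_token_count = 68        # metadata.prompt_token_count

words = prompt_words + system_instruction_words
tokens_per_word = prompt_token_count / words

print(f"{words} words -> {prompt_token_count} tokens ({tokens_per_word:.2f} tokens per word)")
```

There are more tokens than words here, which fits the note that a token can be shorter than a word.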

## Temperature

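- controls how random the output is: higher values give more varied, more 'creative' text
- `temperature=0` makes the output close to deterministic
- is a hyperparameter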
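
One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before the softmax. The sketch below illustrates that general idea with made-up logits. It is not Gemini's actual sampling code, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```

A low temperature concentrates the probability on the top-scoring token, while a high temperature flattens the distribution so that less likely tokens are picked more often.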

In [38]:
story = "write a 3 sentence about a gray rabbit"

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=story,
    config=types.GenerateContentConfig(
        temperature=0,
        # system_instruction=system_instruction,
        # thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

print(response.text)

A fluffy gray rabbit hopped through the meadow, its nose twitching as it searched for clover. Its soft fur blended seamlessly with the surrounding rocks and shadows, providing excellent camouflage. With a flick of its white tail, it disappeared into the tall grass, leaving only a rustle behind.



In [42]:
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=story,
    config=types.GenerateContentConfig(
        temperature=2.0,
        # system_instruction=system_instruction,
        # thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

print(response.text)

The gray rabbit hopped through the tall grass, its nose twitching as it scanned for danger. Its fur, a soft blend of silver and charcoal, camouflaged it perfectly against the earthy tones of the forest floor. With a final, cautious glance, it darted into the underbrush, disappearing into the shadows.



## Multimodal input
- input text and image

In [46]:
text_input = "Describe this image shortly"
# read the image bytes; the with-block closes the file afterwards
with open("bella.png", "rb") as image_file:
    image_input = {"mime_type": "image/png", "data": image_file.read()}


response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=dict(
        parts=[dict(text = text_input), dict(inline_data = image_input)]
    )
)

print(response.text)

This is a close-up photo of a fluffy, gray rabbit wearing a miniature white and black cap that resembles a Swedish student graduation cap (studentmössa). A festive blue and yellow ribbon is draped over its back as it rests on a gray carpet.
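
The MIME type can also be derived from the file name instead of being hard-coded. A minimal sketch using only the standard library: `image_part` is a hypothetical helper that builds the same kind of `inline_data` dictionary as the cell above, and the fallback to `application/octet-stream` is an assumption.

```python
import mimetypes
from pathlib import Path


def image_part(path):
    """Build an inline_data part (a plain dict) for an image file on disk."""
    path = Path(path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        "inline_data": {
            "mime_type": mime_type or "application/octet-stream",
            "data": path.read_bytes(),
        }
    }
```

With this helper, `parts=[dict(text=text_input), image_part("bella.png")]` could replace the hand-built dictionaries in the request.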
