**After opening this notebook, please do not attempt to run it directly. Instead, click "Copy to Drive" and continue working with your own copy of the notebook.**


## Setup

The goal of this notebook is to set up the LLM prompting that will be used during this workshop.

Please do not hesitate to contact us if you encounter any difficulties, as proper setup is essential for participating in all hands-on activities during this workshop.

---

Throughout the workshop, we will use *Gemini-2.0-Flash* as our LLM. We recommend that you go to https://aistudio.google.com/ and click "Create API Key". This gives you access to 1500 requests per day at 15 requests per minute (RPM), which is sufficient for our purposes.
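
If you send many requests in a loop, spacing them at least 4 seconds apart (60 s / 15 requests) keeps you under the per-minute limit. The following is a minimal sketch of such a throttle. The `Throttle` class and its parameters are hypothetical names introduced for illustration and are not part of the Gemini SDK.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space out calls so that at most `max_per_minute` happen per minute."""

    def __init__(
        self,
        max_per_minute: int = 15,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Minimum number of seconds between two consecutive calls.
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        # Record when this call is allowed to proceed.
        self._last = now + delay
        return delay
```

Create one `Throttle()` instance and call its `wait()` method immediately before each request to the model.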

*Warning: When using the free version of Gemini, Google can use your interactions to improve their products and models, so don't use it for any sensitive or work-related data.*

When you have your key, you can use one of the following options to make it accessible from this notebook:
1. Add the key to Google Colab's `Secrets`. You can even import it directly by clicking on `Gemini API Keys -> Import key from Google AI Studio`. That's the easiest and recommended way.
2. Alternatively, copy-paste the key into this notebook and add it to the environment with `os`.
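
The two options can be combined into one lookup that tries Colab's `Secrets` first and falls back to the environment or a pasted value. This is only a sketch: `load_api_key` and its `fallback` parameter are hypothetical names, and the code assumes that `google.colab` is importable only inside Colab.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY", fallback: str = "") -> str:
    """Return the API key from Colab Secrets, the environment, or `fallback`."""
    try:
        # Only available inside Google Colab; fails elsewhere.
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not accessible.
        key = None

    key = key or os.environ.get(name) or fallback
    if not key:
        raise RuntimeError(f"No API key found for {name!r}.")

    # Make the key visible to the rest of the notebook.
    os.environ[name] = key
    return key
```

If you paste the key as `fallback`, remember to remove it before sharing your copy of the notebook.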

---

**WARNING:** If you choose to use a different model, you need to modify the functions `generate_text` and `embed_text` to be compatible with your model's API. We have not specifically tested other models, but they should work without significant issues.

In [None]:
import os
from typing import Dict, Any, Optional, List
from google import genai
from google.colab import userdata
from google.genai import types

os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")  # alternatively paste your key here
client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))

## Prompt the model

In [None]:
query = "How does the reach of a general-purpose AI model to thousands of business users in the EU affect its classification as having systemic risk?"

In [None]:
model = "gemini-2.0-flash"
# Generate content
response = client.models.generate_content(
    model=model,
    contents=[query],
)
print(response.text)

The reach of a general-purpose AI (GPAI) model to thousands of business users in the EU significantly increases the likelihood that it will be classified as having systemic risk under regulations like the EU AI Act. Here's a breakdown of how reach contributes to that assessment:

**1. Interconnectedness and Interdependence:**

*   **Business Operations:**  When a GPAI model is used across a large number of businesses, it becomes deeply interwoven into their operations. This creates dependencies where a failure or vulnerability in the AI system can have cascading effects, disrupting or impairing multiple organizations simultaneously. Think of it as a keystone species in an ecosystem; its removal (or failure) leads to collapse.
*   **Data Flows:** GPAI models often handle large volumes of data from diverse sources. A systemic risk arises when the AI's decisions, biases, or vulnerabilities affect the integrity, accuracy, or security of this data across a broad range of businesses.  A prob

## Prompt the model with more options

We define two helper functions, `generate_text` and `embed_text`, so that we can conveniently call the Gemini text generation and embedding endpoints.


In [None]:
def generate_text(
    prompt: str,
    model: str = "gemini-2.0-flash",
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    system_instructions: Optional[str] = None
) -> str:
    """
    Generate text using Google's Gemini model with configurable parameters.

    Args:
        prompt: The user prompt to send to the model
        model: Model name to use (default: gemini-2.0-flash)
        temperature: Sampling temperature (0.0-2.0, lower is more deterministic)
        max_tokens: Maximum number of tokens to generate
        system_instructions: Optional system instruction to guide the model

    Returns:
        Generated text response as string
    """
    try:
        # Create config with only non-None parameters
        config_params = {}
        if temperature is not None:  # 0.0 is a valid temperature, so test for None explicitly
            config_params["temperature"] = temperature
        if max_tokens is not None:
            config_params["max_output_tokens"] = max_tokens
        if system_instructions:
            config_params["system_instruction"] = system_instructions

        # Create the config object
        config = types.GenerateContentConfig(**config_params)

        # Generate content
        response = client.models.generate_content(
            model=model,
            contents=[prompt],
            config=config
        )

        return response.text
    except Exception as e:
        return f"Error generating text: {str(e)}"


def embed_text(
    text: str,
    model: str = "text-embedding-004"
) -> List[float]:
    """
    Generate embeddings for a text string using Google's embedding model.

    Args:
        text: The text to generate embeddings for
        model: Embedding model to use (default: text-embedding-004)

    Returns:
        List of embedding values as floats
    """
    try:
        # Generate embeddings
        response = client.models.embed_content(
            model=model,
            contents=[text]
        )

        # Return the embedding values
        return response.embeddings[0].values
    except Exception as e:
        print(f"Error generating embedding: {str(e)}")
        return []
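
One detail of optional parameters is easy to get wrong: `0.0` is a valid temperature but is falsy in Python, so a plain truthiness check would silently drop it. The sketch below isolates the "only non-None parameters" idea in a small function. `build_config_params` is a hypothetical name used for illustration. Unlike `generate_text`, it also keeps an empty system instruction.

```python
from typing import Any, Dict, Optional


def build_config_params(
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    system_instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the parameters that were explicitly set."""
    candidates = {
        "temperature": temperature,
        "max_output_tokens": max_tokens,
        "system_instruction": system_instructions,
    }
    # Compare against None so that falsy values such as 0.0 are kept.
    return {key: value for key, value in candidates.items() if value is not None}


print(build_config_params(temperature=0.0))
# → {'temperature': 0.0}
```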

Try prompting the language model with additional parameters, using the helper function that we will rely on during the workshop.

In [None]:
answer = generate_text(query, system_instructions="Your task is to answer questions about the EU AI Act as accurately and concisely as possible.")
print(answer)

The EU AI Act doesn't directly classify general-purpose AI (GPAI) models as "systemic risk" based solely on the *number* of business users. However, widespread adoption of a GPAI model by thousands of businesses in the EU *significantly increases the likelihood* that it will be considered a GPAI model with systemic risk. Here's why:

*   **High Impact:** Widespread use means a single failure or vulnerability in the model could have cascading effects across multiple sectors and businesses, causing significant disruption to the EU market or impacting fundamental rights.
*   **Criticality:** If the GPAI model becomes deeply embedded in essential business operations across various sectors, its unavailability or malfunction could severely disrupt critical services.
*   **Network Effects:** The more businesses rely on the same GPAI model, the stronger the network effects become. This can create a single point of failure and amplify the impact of any issues with the model.
*   **Lack of Alter

In [None]:
query_vec = embed_text(query)

In [None]:
print(len(query), len(query_vec))
print(query_vec[:10])
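
Embedding vectors like `query_vec` are commonly compared using cosine similarity. The sketch below shows the computation in plain Python on toy vectors. It assumes that similarity comparison is how you will use the embeddings later, and `cosine_similarity` is a hypothetical helper, not part of the Gemini SDK.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; report 0.0 by convention here.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors: identical directions score 1.0, orthogonal directions score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```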

Finally, please try to load the `examples.csv` file from our shared drive, as this is how we will provide the datasets to you during the workshop.

In [None]:
!wget "https://drive.google.com/uc?export=download&id=1BNgszCBUW-D8hjNkL8BZ-TEODgvHlvcI" -O ./examples.csv

--2025-04-07 15:32:58--  https://drive.google.com/uc?export=download&id=1BNgszCBUW-D8hjNkL8BZ-TEODgvHlvcI
Resolving drive.google.com (drive.google.com)... 142.250.65.110, 2607:f8b0:4025:804::200e
Connecting to drive.google.com (drive.google.com)|142.250.65.110|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1BNgszCBUW-D8hjNkL8BZ-TEODgvHlvcI&export=download [following]
--2025-04-07 15:32:58--  https://drive.usercontent.google.com/download?id=1BNgszCBUW-D8hjNkL8BZ-TEODgvHlvcI&export=download
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 172.217.7.33, 2607:f8b0:4025:811::2001
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|172.217.7.33|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7224 (7.1K) [application/octet-stream]
Saving to: ‘./examples.csv’


2025-04-07 15:33:00 (55.1 MB/s) - ‘./examples.csv’ saved [7224/7224]



In [None]:
import pandas as pd

df = pd.read_csv("examples.csv")

In [None]:
df

Unnamed: 0,passage,persona,query_style,query
0,"Chapter IX - POST-MARKET MONITORING, INFORMATI...",A legal consultant who specializes in technolo...,Technical language with domain-specific termin...,What authority do EU regulatory bodies have to...
1,Preamble\n\n(174)Given the rapid technological...,A journalist who covers technology trends for ...,Simple direct question with basic vocabulary,How often will the EU evaluate if their AI reg...
2,Chapter II - PROHIBITED AI PRACTICES\n\nArticl...,A municipal government official responsible fo...,Search engine keyword query without full sente...,prohibited AI social scoring systems governmen...
3,ANNEX IV\n\n(b) the design specifications of t...,A software developer specializing in machine l...,Informal conversational question with filler w...,"Hey, so um, what kind of documentation do I ne..."
