# Gemini Tutorial

We will start by setting up the notebook. If you haven't already, first create a Gemini API key [here](https://www.google.com/url?q=https%3A%2F%2Faistudio.google.com%2Fapp%2Fapikey) (free). The free tier is somewhat limited (see quotas [here](https://cloud.google.com/gemini/docs/quotas#daily)). You can then add it below.

In [3]:
# You don't need this code, just make sure you have your API key stored
# in a variable
from dotenv import load_dotenv 
import os

load_dotenv()
api_secret = os.getenv("API_SECRET")

In [6]:
import google.generativeai as genai
genai.configure(api_key=api_secret)

## Initialize the Generative Model

Let's start by verifying that we can initialize and call a model.

In [7]:
model    = genai.GenerativeModel('gemini-pro')
response = model.generate_content("Write a poem about Saudi-Arabia.")
print(response.text)

**Land of the Two Mosques**

In the heart of Arabia, where sands unfold,
Lies a land of grandeur, with stories untold.
Saudi Arabia, kingdom of pride,
Where faith and tradition forever reside.

Makkah and Madinah, cities revered,
The holiest sites, where prayers are heard.
Pilgrims from afar, with hearts ablaze,
Seek solace and guidance in sacred ways.

The Red Sea whispers, a siren's call,
Kissing the shores where ancient ruins stand tall.
Coral reefs gleam, a vibrant array,
As divers explore the ocean sway.

The Arabian Desert, a boundless expanse,
Where dunes dance in harmony with the trance.
Bedouins wander, nomadic and free,
Their tents a refuge in the desert sea.

Modern cities rise, a testament to might,
Riyadh, the capital, bathed in shimmering light.
Skyscrapers soar, a skyline of dreams,
As progress embraces ancestral themes.

Oil flows abundant, a precious gift,
Fueling the nation, a treasure adrift.
Yet beyond the riches, a spirit soars,
A legacy of culture that forever end

We will now use the [Gemini API](https://ai.google.dev/docs/gemini_api_overview) to generate silicon samples, that is, synthetic survey respondents simulated by an LLM.

## Building blocks

There are two main things we need to understand to do silicon sampling:

1. You can create string templates to generate variations of your question or persona.
2. You can return structured output.

Let's explore both of these.
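As a quick illustration of the first building block, here is a minimal sketch that needs no API call. The template text and the attribute values are made up for this example; `itertools.product` is used to enumerate every combination.

```python
from itertools import product

# Hypothetical template and attribute values, chosen only for illustration.
template = "You are a {age}-year-old {gender} movie buff from {location}."

ages = [25, 60]
genders = ["female", "male"]
locations = ["Belgium"]

# Build one prompt per combination of attribute values.
prompts = [
    template.format(age=a, gender=g, location=loc)
    for a, g, loc in product(ages, genders, locations)
]

for p in prompts:
    print(p)
```

Each resulting string can then serve as a system prompt, so the same question is asked once per persona.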

**Structured output** You can ask a model to return structured output, which makes it easier to post-process the answers into statistics.

In [8]:
import typing_extensions as typing
import json

# Specify the structure as a python class
class AirplaneSpecification(typing.TypedDict):
    airplane_model: str
    builder: str
    carriers: list[str]
    top_speed_kmph: int
    max_passengers: int

# Then, set the correct mime type and schema
model  = genai.GenerativeModel("gemini-1.5-pro-latest")
output = model.generate_content(
    "List a few popular airplane models that are used by major Middle-Eastern airline carriers.",
    generation_config = genai.GenerationConfig(
        response_mime_type="application/json", response_schema=list[AirplaneSpecification]
    ),
)

# The response can be transformed into a python dictionary
# using the json library
result = json.loads(output.text)
result

[{'airplane_model': 'Airbus A380-800',
  'builder': 'Airbus',
  'carriers': ['Emirates', 'Qatar Airways', 'Etihad Airways'],
  'max_passengers': 853,
  'top_speed_kmph': 1020},
 {'airplane_model': 'Boeing 777-300ER',
  'builder': 'Boeing',
  'carriers': ['Emirates', 'Qatar Airways', 'Saudia', 'Turkish Airlines'],
  'max_passengers': 550,
  'top_speed_kmph': 945},
 {'airplane_model': 'Airbus A350-1000',
  'builder': 'Airbus',
  'carriers': ['Qatar Airways', 'Etihad Airways'],
  'max_passengers': 440,
  'top_speed_kmph': 945},
 {'airplane_model': 'Boeing 787-9 Dreamliner',
  'builder': 'Boeing',
  'carriers': ['Qatar Airways', 'Royal Jordanian', 'Oman Air'],
  'max_passengers': 381,
  'top_speed_kmph': 955}]
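To show why structured output helps with statistics, the sketch below aggregates a result of this shape. The JSON string is a subset of the fields copied from the output above (the `carriers` field is left out for brevity), so no API call is needed; the variable names are illustrative.

```python
import json
from collections import defaultdict
from statistics import mean

# Subset of the structured output above, stored as a JSON string.
raw = """[
  {"airplane_model": "Airbus A380-800", "builder": "Airbus", "max_passengers": 853, "top_speed_kmph": 1020},
  {"airplane_model": "Boeing 777-300ER", "builder": "Boeing", "max_passengers": 550, "top_speed_kmph": 945},
  {"airplane_model": "Airbus A350-1000", "builder": "Airbus", "max_passengers": 440, "top_speed_kmph": 945},
  {"airplane_model": "Boeing 787-9 Dreamliner", "builder": "Boeing", "max_passengers": 381, "top_speed_kmph": 955}
]"""
planes = json.loads(raw)

# Overall mean of one numeric field.
mean_speed = mean(p["top_speed_kmph"] for p in planes)

# Mean of another field, grouped by builder.
by_builder = defaultdict(list)
for p in planes:
    by_builder[p["builder"]].append(p["max_passengers"])
mean_passengers = {builder: mean(values) for builder, values in by_builder.items()}

print(mean_speed)       # 966.25
print(mean_passengers)  # {'Airbus': 646.5, 'Boeing': 465.5}
```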

**String templates** A string template allows us to ask the same question repeatedly while varying the values filled in. Let's use this capability to set the persona of the LLM:

In [9]:
class MovieSpecification(typing.TypedDict):
    age: int
    location: str
    movie: str

template = "You are a {age}-year old {gender} from {location} moviebuff."

population = [
    {"age":35, "gender":"female","location":"China"},
    {"age":42, "gender":"male","location":"Nigeria"},
    {"age": 3, "gender":"male","location":"Belgium"}
]

for person in population:
  system_prompt = template.format(**person)

  model = genai.GenerativeModel('gemini-1.5-pro-latest', system_instruction=system_prompt)
  response = model.generate_content(
      "What's your single most favorite movie?",
      generation_config = genai.GenerationConfig(
          response_mime_type="application/json", response_schema=list[MovieSpecification]
      ),
  )

  print(system_prompt)
  print(json.loads(response.text))

You are a 35-year old female from China moviebuff.
[{'age': 35, 'location': 'China', 'movie': 'Crouching Tiger, Hidden Dragon'}]
You are a 42-year old male from Nigeria moviebuff.
[{'age': 42, 'location': 'Nigeria', 'movie': 'Living in Bonds'}]


ResourceExhausted: 429 Resource has been exhausted (e.g. check quota).
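The loop above stopped at the third persona because the quota was exhausted. One common way to cope with rate limits is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, not part of the Gemini library, and `FakeQuotaError` and `flaky` stand in for the real exception and API call so that the example runs offline. Which exception class to pass as `retry_on` depends on the client library you use.

```python
import time

def call_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)

# Stand-ins for the real exception and API call (assumptions for the demo).
class FakeQuotaError(Exception):
    pass

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeQuotaError("429")
    return "ok"

# Record the waits instead of actually sleeping, to keep the demo fast.
waits = []
result = call_with_backoff(flaky, retry_on=(FakeQuotaError,), sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

In the notebook, `fn` would be a small function or lambda wrapping `model.generate_content(...)`. Backoff only helps with short-term rate limits; a daily quota still requires waiting or a different plan.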

## Privacy Scales

Let us now try to replicate some of the results from the privacy calculus scale (Dinev & Hart, 2006). We'll focus on the questions related to Internet Privacy Concern (PC) and willingness to provide personal information to transact on the Internet (PPIT).


| **Concern/Activity** | **Description** |
|----------------------|-----------------|
| **Indicate the extent to which you are concerned about the following:** |  |
| **PC1** | I am concerned that the information I submit on the Internet could be misused. |
| **PC2** | I am concerned that a person can find private information about me on the Internet. |
| **PC3** | I am concerned about submitting information on the Internet, because of what others might do with it. |
| **PC4** | I am concerned about submitting information on the Internet, because it could be used in a way I did not foresee. |
| **Scale** | Not at all concerned–Very concerned |
| **Willingness to provide personal information to transact on the Internet (PPIT)** |  |
| **To what extent are you willing to use the Internet to do the following activities?** |  |
| **PPIT 1** | Purchase goods (e.g., books or CDs) or services (e.g., airline tickets or hotel reservations) from websites that require me to submit accurate and identifiable information (i.e., credit card information) |
| **PPIT 2** | Retrieve information from websites that require me to submit accurate and identifiable registration information, possibly including credit card information (e.g., using sites that provide personalized stock quotes, insurance rates, or loan rates; or using sexual or gambling websites) |
| **PPIT 3** | Conduct sales transactions at e-commerce sites that require me to provide credit card information (e.g., using sites for purchasing goods or software) |
| **PPIT 4** | Retrieve highly personal and password-protected financial information (e.g., using websites that allow me to access my bank account or my credit card account) |
| **Scale** | Not at all–Very much |


 Dinev, T., & Hart, P. (2006). An extended privacy calculus model for e-commerce transactions. Information Systems Research, 17(1), 61-80.

Step 1: define the survey question prompt and the response data structure

In [19]:
# Survey questions
survey_questions = """
You will now answer questions about your privacy concerns. Rate your agreement with each statement on a scale from 1 (Strongly Disagree) to 7 (Strongly Agree).

1. I am concerned that the information I submit on the Internet could be misused.
2. I am concerned that a person can find private information about me on the Internet.
3. I am concerned about submitting information on the Internet, because of what others might do with it.
4. I am concerned about submitting information on the Internet, because it could be used in a way I did not foresee.

Now, please answer four additional questions. To what extent are you willing to use the Internet to do the following activities? Rate your willingness with each statement on a scale from 1 (Not at all) to 7 (Very much).

5. Purchase goods (e.g., books or CDs) or services (e.g., airline tickets or hotel reservations) from websites that require me to submit accurate and identifiable information (i.e., credit card information)
6. Retrieve information from websites that require me to submit accurate and identifiable registration information, possibly including credit card information (e.g., using sites that provide personalized stock quotes, insurance rates, or loan rates; or using sexual or gambling websites)
7. Conduct sales transactions at e-commerce sites that require me to provide credit card information (e.g., using sites for purchasing goods or software)
8. Retrieve highly personal and password-protected financial information (e.g., using websites that allow me to access my bank account or my credit card account)"""

# Define the structure of survey answers with Likert scale responses
class SurveyAnswers(typing.TypedDict):
    privacy_misuse_concern: int
    finding_private_info_concern: int
    misuse_by_others_concern: int
    unforeseen_use_concern: int

    purchase_intention: int
    information_intention: int
    ecommerce_intention: int
    personal_intention: int

# Mapping from numeric string keys to descriptive field names
response_key_mapping = {
    '1': 'privacy_misuse_concern',
    '2': 'finding_private_info_concern',
    '3': 'misuse_by_others_concern',
    '4': 'unforeseen_use_concern',
    '5': 'purchase_intention',
    '6': 'information_intention',
    '7': 'ecommerce_intention',
    '8': 'personal_intention'
}
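The mapping can be applied to a raw JSON reply as sketched below. The helper `map_and_validate` is hypothetical, and the 1 to 7 range check is an extra assumption based on the scale stated in the prompt. The mapping is repeated so the block runs on its own.

```python
import json

# Copied from the cell above so this sketch is self-contained.
response_key_mapping = {
    '1': 'privacy_misuse_concern',
    '2': 'finding_private_info_concern',
    '3': 'misuse_by_others_concern',
    '4': 'unforeseen_use_concern',
    '5': 'purchase_intention',
    '6': 'information_intention',
    '7': 'ecommerce_intention',
    '8': 'personal_intention'
}

def map_and_validate(raw_text, low=1, high=7):
    """Return a dict with descriptive keys, or None if the reply is unusable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    mapped = {}
    for number, name in response_key_mapping.items():
        value = data.get(number)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            return None
        if not low <= value <= high:
            return None
        mapped[name] = value
    return mapped

good = '{"1": 7, "2": 6, "3": 7, "4": 7, "5": 2, "6": 1, "7": 2, "8": 1}'
print(map_and_validate(good))
print(map_and_validate('{"1": 9}'))  # None: missing items and out-of-range value
```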



Step 2: define the persona system prompt and the population sample

In [30]:
import random
import numpy as np

# Fix for replicability
random.seed(42)
np.random.seed(42)

# Template for the persona prompt
persona_template = """
You are a virtual person simulator that creates individual synthetic personas, one at a time, that I can specify and then ask them any questions I like. This means that you answer the way the persona would – no matter the implications. Be brief. Do not write any additional explanations unless I ask you to.

You are a {age}-year-old {gender} person.
"""

# Population simulator: draws a random age and gender for each synthetic person.
def generate_population(n):
    population = []
    for _ in range(n):
        age = int(np.random.normal(35, 11))  # Mean 35, SD 11
        gender = random.choice(["female", "male"])
        population.append({"age": age, "gender": gender})
    return population

population = generate_population(2)

population[:5]

[{'age': 40, 'gender': 'female'}, {'age': 33, 'gender': 'female'}]
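A normal distribution with mean 35 and SD 11 will occasionally produce ages that make little sense for a survey respondent, such as children. The variant below is one possible sketch that clips ages to an adult range. It uses the standard library's `random.Random` instead of NumPy, and the function name and the bounds 18 and 90 are assumptions for illustration, not part of the original design.

```python
import random

def generate_adult_population(n, seed=42, min_age=18, max_age=90):
    """Draw n synthetic persons with ages clipped to [min_age, max_age]."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    population = []
    for _ in range(n):
        age = round(rng.gauss(35, 11))          # mean 35, SD 11
        age = max(min_age, min(max_age, age))   # clip to the allowed range
        gender = rng.choice(["female", "male"])
        population.append({"age": age, "gender": gender})
    return population

sample = generate_adult_population(1000)
print(min(p["age"] for p in sample), max(p["age"] for p in sample))
```

Note that clipping piles up probability mass at the lower bound; resampling out-of-range draws would be an alternative if that matters for your analysis.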

Step 3: do the sampling

In [31]:
import typing_extensions as typing
import json
import random
import numpy as np

# Run the survey with the LLM (simulation)
responses = []
for person in population:
    system_prompt = persona_template.format(**person)

    print(f'[SYSTEM] {system_prompt}')

    # Set up the model with the correct persona system prompt
    model = genai.GenerativeModel(
        'gemini-1.5-pro-latest',
        system_instruction=system_prompt,
        generation_config=genai.GenerationConfig(
            response_mime_type="application/json"
        ),
    )

    # Generate the LLM response for the privacy survey
    response = model.generate_content(survey_questions)

    print(f'[RAW RESPONSE] {response}')

    try:
        # Convert response to a dictionary
        result = json.loads(response.text)
        print(f'[JSON RESPONSE] {result}')

        # Convert numeric keys to descriptive keys
        mapped_result = {response_key_mapping[key]: value for key, value in result.items() if key in response_key_mapping}

        # Ensure the mapped result has all required fields with integer values;
        # sometimes the LLM gets too creative ...
        required_keys = SurveyAnswers.__annotations__
        if all(isinstance(mapped_result.get(key), int) for key in required_keys):
            mapped_result.update(person)
            responses.append(mapped_result)
        else:
            print(f"Invalid response format after mapping: {mapped_result}")
    except json.JSONDecodeError:
        print(f"Unable to parse response as JSON: {response.text}")

# Output the responses
for r in responses:
    print(r)

[SYSTEM] 
You are a virtual person simulator that creates individual synthetic personas, one at a time, that I can specify and then ask them any questions I like. This means that you answer the way the persona would – no matter the implications. Be brief. Do not write any additional explanations unless I ask you to.

You are a 40-year-old female person.

[RAW RESPONSE] response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "{\"1\": 7, \"2\": 6, \"3\": 7, \"4\": 7, \"5\": 2, \"6\": 1, \"7\": 2, \"8\": 1}\n"
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "avg_logprobs": -0.0059838775469332325
        }
      ],
      "usage_metadata": {
        "prompt_token_count": 408,
        "candidates_token_count": 49,
        "total_token_count": 457
     

In [32]:
responses

[{'privacy_misuse_concern': 7,
  'finding_private_info_concern': 6,
  'misuse_by_others_concern': 7,
  'unforeseen_use_concern': 7,
  'purchase_intention': 2,
  'information_intention': 1,
  'ecommerce_intention': 2,
  'personal_intention': 1,
  'age': 40,
  'gender': 'female'},
 {'privacy_misuse_concern': 7,
  'finding_private_info_concern': 6,
  'misuse_by_others_concern': 7,
  'unforeseen_use_concern': 7,
  'purchase_intention': 2,
  'information_intention': 1,
  'ecommerce_intention': 2,
  'personal_intention': 1,
  'age': 33,
  'gender': 'female'}]
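With the responses collected, a simple next step is to compute one score per construct. The sketch below averages the four PC items and the four PPIT items for each respondent. Averaging items into a composite is an assumption made here for illustration; it is not necessarily the analysis used in the original study. The two responses are copied from the output above so the block runs without the API.

```python
from statistics import mean

PC_ITEMS = ["privacy_misuse_concern", "finding_private_info_concern",
            "misuse_by_others_concern", "unforeseen_use_concern"]
PPIT_ITEMS = ["purchase_intention", "information_intention",
              "ecommerce_intention", "personal_intention"]

# Responses copied from the output above.
responses = [
    {"privacy_misuse_concern": 7, "finding_private_info_concern": 6,
     "misuse_by_others_concern": 7, "unforeseen_use_concern": 7,
     "purchase_intention": 2, "information_intention": 1,
     "ecommerce_intention": 2, "personal_intention": 1,
     "age": 40, "gender": "female"},
    {"privacy_misuse_concern": 7, "finding_private_info_concern": 6,
     "misuse_by_others_concern": 7, "unforeseen_use_concern": 7,
     "purchase_intention": 2, "information_intention": 1,
     "ecommerce_intention": 2, "personal_intention": 1,
     "age": 33, "gender": "female"},
]

def composite_scores(r):
    """Average the item ratings of each construct for one respondent."""
    return {
        "age": r["age"],
        "gender": r["gender"],
        "pc": mean(r[k] for k in PC_ITEMS),
        "ppit": mean(r[k] for k in PPIT_ITEMS),
    }

scores = [composite_scores(r) for r in responses]
for s in scores:
    print(s)
```

Both personas give identical ratings here, so a larger and more varied population would be needed before comparing groups or relating concern to willingness.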