<a href="https://colab.research.google.com/github/Bonorinoa/Algorithmic-Behavioral-Economics-Lab/blob/main/experimentsDB_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Experiment Prompts Repository

The goal is to collect textual descriptions of economic experiments commonly used in game theory, experimental economics, and neuroeconomics. These experiments measure or study behavioral traits, which we hypothesize can serve as a new LLM evaluation tool. We also plan a series of projects that run LLMs through various economic experiments. This repository will make the relevant metadata of these experiments or games easy to access, reducing testing and development time. If open-sourced, the project could also become a valuable resource for economists and interdisciplinary peers interested in experimenting with LLMs using the behavioral approach our lab proposes.

## Dependencies

In [3]:
!pip install langchain openai transformers --quiet


## Main Code and TODO lists

TODO:

-

### Database model

In [4]:
import pandas as pd
import numpy as np
import os

# columns: game name or id, instructions, behavioral trait of interest in experiment (corruption, altruism, endowment effect, risk or time preferences, etc).
## The idea is that all the information to run a particular economic experiment with an LLM is easily accessible for researchers.
## Also, a vector database of textual representations of experiments can allow us to study the linguistic properties of these instructions and explore connections between them.

names = ["Endowment game", "Bargaining game"]
behavior = ["Endowment Effect", "Fairness"]
instruction = ["you are given a mug while waiting for a researcher to call your name. When you are called he offers to exchange your mug for a cookie. Do you exchange the mug or reject the cookie? Reply in one word with either 'exchange' or 'reject'",
               "you are randomly chosen to be a proponent of an offer to split $10 with someone else. If your offer is rejected you and the other get nothing. How likely are you to offer $x? Reply on a scale from 0 to 10. A 0 means 'not at all likely', and a 10 means 'very likely'. You can use the values in between to indicate where you fall on the scale"]

df_data = {"Names":names,
           "Behavior":behavior,
           "Instruction":instruction}

df = pd.DataFrame(df_data)
df

|   | Names | Behavior | Instruction |
|---|-------|----------|-------------|
| 0 | Endowment game | Endowment Effect | you are given a mug while waiting for a resear... |
| 1 | Bargaining game | Fairness | you are randomly chosen to be a proponent of a... |
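
The bargaining instruction leaves the offer amount as a literal `$x`. One way to turn it into concrete prompts is to substitute each candidate offer before sending the text to a model. The sketch below is an assumption about how the placeholder is meant to be used: `fill_offer`, the shortened instruction text, and the $0 to $10 offer range are all hypothetical and not part of the notebook.

```python
# Shortened paraphrase of the bargaining instruction; "$x" is the offer placeholder.
BARGAINING = ("you are randomly chosen to be a proponent of an offer to split $10 "
              "with someone else. How likely are you to offer $x? "
              "Reply on a scale from 0 to 10.")

def fill_offer(instruction: str, amount: int) -> str:
    """Replace the literal '$x' placeholder with a dollar amount."""
    return instruction.replace("$x", f"${amount}")

# One prompt per whole-dollar offer from $0 to $10 (assumed range).
offer_prompts = [fill_offer(BARGAINING, amount) for amount in range(11)]
print(offer_prompts[5])
```

Plain `str.replace` is used here because the instruction also contains `$10`, which should stay untouched.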


### LLM instantiation

Utility functions to load and manipulate LangChain objects

In [5]:
from langchain.llms import HuggingFaceHub, OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

from transformers import pipeline

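# Never hard-code API tokens in a notebook: read the token from the environment or prompt for it.
if "HUGGINGFACEHUB_API_TOKEN" not in os.environ:
  from getpass import getpass
  os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face API token: ")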

def load_model(model_name: str,
               provider: str,
               model_kwargs: dict):
  '''
  Function to load model from huggingface or openai
  params:
    model_name: Name of OpenAI model or HuggingFace repo id to load model
    provider: Either 'openai', 'chatopenai' or 'hf'
    model_kwargs: A dictionary with LLM parameters like temperature and max_tokens
  returns:
    llm: LangChain llm object
  '''
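  if provider == 'openai':
    llm = OpenAI(model_name=model_name,
                 model_kwargs=model_kwargs)
  elif provider == 'chatopenai':
    llm = ChatOpenAI(model_name=model_name,
                     model_kwargs=model_kwargs)
  elif provider == 'hf':
    llm = HuggingFaceHub(repo_id=model_name,
                         model_kwargs=model_kwargs)
  else:
    # Fail loudly instead of hitting an UnboundLocalError at the return below
    raise ValueError(f"Unknown provider '{provider}'; expected 'openai', 'chatopenai' or 'hf'")

  return llm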

def build_completion_chain(system_prompt: str,
                           task_prompt: str,
                           llm):
  '''
  Function to build an LLMChain for completion models
  params:
    system_prompt: The description of the persona the LLM should attempt to mimic
    task_prompt: The experiment instruction or task that the experiment subject must do.
    llm: The LLM object to use in the chain. It is the output of load_model()
  returns:
    llm_chain: A LangChain LLMChain object with the prompt template and LLM
    llm_response: The output of the llm with the given prompt
  '''

  # Separate the persona and the task with a blank line (no stray quote characters in the prompt)
  template = "{sys_prompt}\n\n{task_prompt}"

  prompt = PromptTemplate(input_variables=["sys_prompt", "task_prompt"],
                          template=template)

  llm_chain = LLMChain(llm=llm,
                       prompt=prompt)

  llm_response = llm_chain.run({"sys_prompt":system_prompt,
                                "task_prompt":task_prompt})

  return llm_chain, llm_response
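
Before spending API calls, it can help to inspect the exact text a chain would send. The sketch below renders the prompt by hand with plain `str.format` rather than LangChain, and assumes the system and task prompts are meant to be separated by a single blank line. `render_prompt` and `TEMPLATE` are hypothetical names, not part of the notebook.

```python
# Hypothetical helper: renders the prompt text without calling any model.
TEMPLATE = "{sys_prompt}\n\n{task_prompt}"  # assumed separator: one blank line

def render_prompt(system_prompt: str, task_prompt: str) -> str:
    """Return the full prompt string for a persona and a task."""
    return TEMPLATE.format(sys_prompt=system_prompt, task_prompt=task_prompt)

preview = render_prompt(
    "You are a subject in an experiment called Endowment game.",
    "Do you exchange the mug or reject the cookie?",
)
print(preview)
```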

### Prompt engineering

Experiments with how to describe the persona we want the LLM to adopt.

In [6]:
sys_prompt = f"You are a subject in an experiment called {df['Names'][0]}. Your responses are consistent, clear, and concise. Please respond to the following task."

### Experiments

Utility functions to run experiments

In [15]:
def run_n(models: list,
          temperatures: list,
          system_prompt: str,
          task_prompt: str):
  '''
  Function to experiment with a list of models by varying temperature. Total iterations = len(temperatures) * len(models)
  params:
    models: List of tuples of the form (model name, provider). E.g., ("google/flan-t5-base", "hf")
    temperatures: A list of floats corresponding to LLM temperature
    system_prompt: The description of the persona the LLM should attempt to mimic
    task_prompt: The experiment instruction or task that the experiment subject must do.
  returns:
    results: A pandas dataframe with the responses per iteration
  '''

  results = {"Model": [], "Temperature": [], "Response": []}

  for model in models:
    for temperature in temperatures:
      llm = load_model(model_name=model[0],
                       provider=model[1],
                       model_kwargs={"temperature": temperature,
                                     "max_length": 100})

      _, llm_response = build_completion_chain(system_prompt=system_prompt,
                                               task_prompt=task_prompt,
                                               llm=llm)
      results['Model'].append(model[0])
      results['Temperature'].append(temperature)
      results['Response'].append(llm_response)

  results = pd.DataFrame(results)

  return results

# TODO: Get the confidence scores for each response

### Tests

In [8]:
model = "google/flan-t5-base"
model_kwargs={"temperature": 0.9,
              "max_length": 100}

llm = load_model(model_name=model,
                 provider='hf',
                 model_kwargs=model_kwargs)


In [9]:

task_prompt = df['Instruction'][0]

llm_chain, llm_response = build_completion_chain(system_prompt=sys_prompt,
                                                task_prompt=task_prompt,
                                                llm=llm)


In [10]:
llm_response

'exchange'
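
Free-text model output may not always be the bare word the instruction asks for. A small normalizer can map each response to one of the two allowed choices before analysis. This is a sketch under the assumption that a response counts only if it mentions exactly one of the choices; `parse_choice` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import Optional

CHOICES = ("exchange", "reject")

def parse_choice(response: str) -> Optional[str]:
    """Return 'exchange' or 'reject' if exactly one appears as a word, else None."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    found = [choice for choice in CHOICES if choice in words]
    return found[0] if len(found) == 1 else None

print(parse_choice("exchange"))
print(parse_choice(" Reject. "))
print(parse_choice("I would exchange or reject"))
```

Returning `None` for ambiguous or off-task answers keeps them visible as missing values instead of silently counting them as a choice.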

In [16]:
models = [(model, 'hf'), ("google/flan-t5-large","hf")]
temperatures = [0.1, 0.5, 0.7, 0.9, 1, 1.2]

n_results = run_n(models=models,
                  temperatures=temperatures,
                  system_prompt=sys_prompt,
                  task_prompt=task_prompt)
n_results

|   | Model | Temperature | Response |
|---|-------|-------------|----------|
| 0 | google/flan-t5-base | 0.1 | exchange |
| 1 | google/flan-t5-base | 0.5 | exchange |
| 2 | google/flan-t5-base | 0.7 | exchange |
| 3 | google/flan-t5-base | 0.9 | exchange |
| 4 | google/flan-t5-base | 1.0 | exchange |
| 5 | google/flan-t5-base | 1.2 | exchange |
| 6 | google/flan-t5-large | 0.1 | exchange |
| 7 | google/flan-t5-large | 0.5 | exchange |
| 8 | google/flan-t5-large | 0.7 | exchange |
| 9 | google/flan-t5-large | 0.9 | exchange |
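
Since each run returns a single word, a natural summary is the count of each response per model. The sketch below tallies responses using only the standard library; `tally_responses` is a hypothetical helper, and the sample rows are a small subset copied from the results table above.

```python
from collections import Counter, defaultdict

def tally_responses(rows):
    """Count responses per model from (model, response) pairs."""
    counts = defaultdict(Counter)
    for model, response in rows:
        counts[model][response.strip().lower()] += 1
    return {model: dict(counter) for model, counter in counts.items()}

# Four rows copied from the results table as (model, response) pairs.
sample_rows = [
    ("google/flan-t5-base", "exchange"),
    ("google/flan-t5-base", "exchange"),
    ("google/flan-t5-large", "exchange"),
    ("google/flan-t5-large", "exchange"),
]
summary = tally_responses(sample_rows)
print(summary)
```

With the DataFrame returned by `run_n`, the pairs could be built with `zip(n_results["Model"], n_results["Response"])`.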
