## Agents.

Now we have agents in this format:

```json
{
        "id": 0,
        "name": "Simran Walia",
        "name_group": "north",
        "occupation": "Forestry Technician",
        "occupation_group": "Agriculture",
        "age": 26,
        "age_group": "25\u201329",
        "income": 193452,
        "income_group": "Lower-Middle Class",
        "hobbies": "trading cards and toy collecting",
        "hobbies_group": "collecting"
}
```

When performing schelling simulation, we would need to perform `calculate_similarity` between two distinct `id`s. <br/>
The result would be invariant between rounds. Hence we can pre-compute the similarity and store them to avoid repeated computation (_and save on bills_)

In [9]:
api_key = None

with open(".env", "r") as f:
    for line in f:
        if line.startswith("OPENAI_API_KEY"):
            api_key = line.split("=")[1].strip()

from openai import OpenAI
client = OpenAI(api_key=api_key)

In [6]:
class Person:
    def __init__(self, **kwargs):
        ## assign all args to self.
        for key, value in kwargs.items():
            setattr(self, key, value)
    def __str__(self):
        return f"""
            {self.name} is a {self.occupation} who is {self.age} years old.
            They have an annual income of {self.income}
            They are quite interested in {self.hobbies}
        """

In [7]:
class USPerson(Person):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.income = str(self.income) + " USD"

In [8]:
class IndiaPerson(Person):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.income = str(self.income) + " INR"

In [10]:
def check_compatability(person1, person2, model):
    completion = client.chat.completions.create(
        temperature=0,
        top_p=0.95,
        model = model,
        messages=[
            { "role" : "system", "content": "You are a specialist in real estate and public policy, You are tasked with assessing the compatibility between two potential neighours."},
            { "role": "user", "content": f"""
                Assess the compatibility between
                 Person A: {str(person1)} 
                 
                 and
                
                Person B: {str(person2)}, Respond in this format: 
                ```json
                {{
                    "CompatibilityExplanation": "string", // Explain the reasoning behind your answer.
                    "CompatibilityPercentage": "number" // The percentage of compatibility between the two individuals.
                }}
                ```         
            """}
        ]
    )

    return completion.choices[0].message.content

In [11]:
import glob

agents_files = glob.glob("raw_agents/*.json")
agents_files

['raw_agents\\india_exp_agents.json',
 'raw_agents\\india_exp_agents_sampled.json',
 'raw_agents\\india_exp_mat_scores_4o_mini.json',
 'raw_agents\\us_exp_agents.json']

In [26]:
import json

# with open(agents_files[0], "r") as f:
#     india_agents = json.load(f)
with open(agents_files[-1], "r") as f:
    us_agents = json.load(f)

In [27]:
us_agents = [Person(**agent) for agent in us_agents]

In [28]:
print(us_agents[0].income_group)

Upper-Middle Income


In [29]:
## sample 400 agents from the US
import random

us_agents_sampled = random.sample(us_agents, 400)

In [30]:
## see the rough distribution of the sampled agents
from collections import Counter

Counter([agent.income_group for agent in us_agents_sampled]), Counter([agent.age_group for agent in us_agents_sampled]), Counter([agent.occupation_group for agent in us_agents_sampled]), Counter([agent.hobbies_group for agent in us_agents_sampled]), Counter([agent.name_group for agent in us_agents_sampled])

(Counter({'Lower-Middle Income': 90,
          'Low Income': 83,
          'Middle Income': 79,
          'Upper-Middle Income': 78,
          'High Income': 70}),
 Counter({'17.0–28.0': 121,
          '28.0–37.0': 107,
          '48.0–90.0': 90,
          '37.0–48.0': 82}),
 Counter({'handlers_cleaners_farming_fishing': 74,
          'adm_clerical': 72,
          'prof_specialty': 72,
          'exec_managerial': 65,
          'sales': 62,
          'craft_repair': 55}),
 Counter({'creative': 100,
          'intellectual': 84,
          'social': 81,
          'physical': 70,
          'collecting': 65}),
 Counter({'eu': 107, 'hi': 106, 'as': 94, 'af': 93}))

In [31]:
## save the sampled agents
with open("raw_agents/us_exp_agents_sampled.json", "w") as f:
    json.dump([agent.__dict__ for agent in us_agents_sampled], f, indent=4)

In [9]:
check_compatability(india_agents[0], india_agents[1], "gpt-3.5-turbo")

'```json\n{\n    "CompatibilityExplanation": "While Person A and Person B both work in the agriculture sector, their interests and hobbies are quite different. Person A enjoys collecting trading cards and toys, while Person B is more inclined towards intellectual pursuits like writing algorithms and philosophy. Additionally, there is a significant age gap between the two individuals, with Person A being 26 and Person B being 61. Their income levels also vary greatly, with Person B belonging to the high-income group compared to Person A who is in the lower-middle class. These differences in age, interests, and income may pose challenges in establishing a strong compatibility between the two neighbors.",\n    "CompatibilityPercentage": "40"\n}\n```'

In [10]:
check_compatability(india_agents[0], india_agents[1], "gpt-3.5-turbo")

'```json\n{\n    "CompatibilityExplanation": "Person A and Person B have some commonalities in terms of their occupations being in the agriculture sector. However, their age difference is quite significant, with Person A being 26 and Person B being 61. This age gap may lead to differences in lifestyle preferences, priorities, and interests. Person A enjoys hobbies related to collecting, while Person B\'s hobbies are more intellectually focused. Additionally, there is a notable income disparity between the two individuals, with Person B belonging to the high-income group and Person A being in the lower-middle class. These differences in age, interests, and income may impact the compatibility between Person A and Person B.",\n    "CompatibilityPercentage": "50"\n}\n```'

### Side Note:

- The temperature has been set to 0, the top-P to 0.95.
- Ofc, this means the top choice of the sample is always chosen.
- This should ensure that the response is highly deterministic, it would be very unlikely that a token that belongs to `0.05%` of the probability at any step eventually leads to the most likely sequence. 

In [27]:
results = {}

for i in range(len(india_agents)):
    for j in range(len(india_agents)):
        if i == j:
            continue
        cc = check_compatability(india_agents[i], india_agents[j], "gpt-3.5-turbo")
        with open("logs.txt", "a") as f:
            f.write(f"score_{i}_{j}\n")
            f.write(cc)
        results[f"score_{i}_{j}"] = cc

with open("raw_agents/india_mat_scores_35_turbo.json", "w") as f:
    json.dump(results, f)

KeyboardInterrupt: 

In [40]:
## This takes too much time, pivot to using the batch-api for the requests.
def prepare_request_json(custom_id, person1, person2, model):
    request_json = {}
    request_json["custom_id"] = custom_id
    request_json["method"] = "POST"
    request_json["url"] = "/v1/chat/completions"
    request_json["body"] = {
        "model": model,
        "messages": [
            { "role" : "system", "content": "You are a specialist in real estate and public policy, You are tasked with assessing the compatibility between two potential neighours."},
            { "role": "user", "content": f"""
                Assess the compatibility between
                 Person A: {str(person1)} 
                 
                 and
                
                Person B: {str(person2)}, Respond in this format: 
                ```json
                {{
                    "CompatibilityExplanation": "string", // Explain the reasoning behind your answer.
                    "CompatibilityPercentage": "number" // The percentage of compatibility between the two individuals.
                }}
                ```         
            """}
        ],
        "temperature": 0,
        "top_p": 0.95
    }

    return request_json

In [12]:
prepare_request_json("test", india_agents[0], india_agents[1], "gpt-3.5-turbo")

{'custom_id': 'test',
 'method': 'POST',
 'url': 'v1/chat/completions',
 'body': {'model': 'gpt-3.5-turbo',
  'messages': [{'role': 'system',
    'content': 'You are a specialist in real estate and public policy, You are tasked with assessing the compatibility between two potential neighours.'},
   {'role': 'user',
    'content': '\n                Assess the compatibility between\n                 Person A: {\'id\': 0, \'name\': \'Simran Walia\', \'name_group\': \'north\', \'occupation\': \'Forestry Technician\', \'occupation_group\': \'Agriculture\', \'age\': 26, \'age_group\': \'25–29\', \'income\': 193452, \'income_group\': \'Lower-Middle Class\', \'hobbies\': \'trading cards and toy collecting\', \'hobbies_group\': \'collecting\'} \n                 \n                 and\n                \n                Person B: {\'id\': 1, \'name\': \'Juhi Chawla\', \'name_group\': \'north\', \'occupation\': \'Irrigation Engineer\', \'occupation_group\': \'Agriculture\', \'age\': 61, \'age_gr

In [37]:
## add a json line to a file.
models = ["gpt-3.5-turbo", "gpt-4o-mini"]
for mod in models:
    for i in range(len(india_agents)):
        for j in range(len(india_agents)):
            if i == j:
                continue
            with open(f"india_exp_mat_agents_{mod}.jsonl", "a") as f:
                f.write(json.dumps(prepare_request_json(f"score_{i}_{j}", india_agents[i], india_agents[j], mod)) + "\n") 

KeyboardInterrupt: 

## 3000 x 3000 balloons in size (and ofc cost).

- Experimented updated to be of 400 agents in 28x28 grid instead.

In [13]:
## take 400 samples from `agents`
import random

india_agents_sampled = random.sample(india_agents, 400)

In [14]:
with open("raw_agents/india_exp_agents_sampled.json", "w") as f:
    json.dump(india_agents_sampled, f, indent=4)

In [80]:
## simple EDA on groups in the sampled agents.
from collections import Counter

income_groups = [agent["income_group"] for agent in india_agents_sampled]
occupation_groups = [agent["occupation_group"] for agent in india_agents_sampled]
age_groups = [agent["age_group"] for agent in india_agents_sampled]
name_groups = [agent["name_group"] for agent in india_agents_sampled]
Counter(income_groups), Counter(occupation_groups), Counter(age_groups), Counter(name_groups)

(Counter({'Lower-Middle Class': 125,
          'Middle Class': 122,
          'Low-Income Group': 99,
          'Upper-Middle Class': 33,
          'High-Income Group': 21}),
 Counter({'Agriculture': 168, 'Services': 135, 'Industry': 97}),
 Counter({'30–34': 56,
          '35–39': 48,
          '40–44': 47,
          '25–29': 47,
          '20–24': 46,
          '45–49': 41,
          '50–54': 33,
          '55–59': 30,
          '60–64': 30,
          '65–69': 22}),
 Counter({'north': 72,
          'north-east': 69,
          'south': 66,
          'east': 66,
          'west': 66,
          'central': 61}))

### Distribution looks fair, so moving on.

In [82]:
models = ["gpt-3.5-turbo", "gpt-4o-mini"]
for mod in models:
    lines = []
    for agent_i in india_agents_sampled:
        for agent_j in india_agents_sampled:
            id_i, id_j = agent_i["id"], agent_j["id"]
            if id_i == id_j:
                continue
            lines.append(json.dumps(prepare_request_json(f"score_{id_i}_{id_j}", agent_i, agent_j, mod)) + "\n")
    with open(f"batch/india_exp_mat_agents_{mod}.jsonl", "w") as f:
        f.writelines(lines)

In [83]:
## split each file into halves for restricting to under <= 200 MB and number of requests to 50,000 max.

models = ["gpt-3.5-turbo", "gpt-4o-mini"]

for model in models:
    with open(f"batch/india_exp_mat_agents_{model}.jsonl", "r") as f:
        lines = f.readlines()

    divisions = (len(lines) // 40000) + 1

    for i in range(divisions):
        with open(f"batch/india_exp_mat_agents_{model}_batch{i}.jsonl", "w") as f:
            f.writelines(lines[i*len(lines)//divisions:(i+1)*len(lines)//divisions])

In [84]:
batch_files = glob.glob("batch/*batch*.jsonl")
batch_files 

['batch\\india_exp_mat_agents_gpt-3.5-turbo_batch0.jsonl',
 'batch\\india_exp_mat_agents_gpt-3.5-turbo_batch1.jsonl',
 'batch\\india_exp_mat_agents_gpt-3.5-turbo_batch2.jsonl',
 'batch\\india_exp_mat_agents_gpt-3.5-turbo_batch3.jsonl',
 'batch\\india_exp_mat_agents_gpt-4o-mini_batch0.jsonl',
 'batch\\india_exp_mat_agents_gpt-4o-mini_batch1.jsonl',
 'batch\\india_exp_mat_agents_gpt-4o-mini_batch2.jsonl',
 'batch\\india_exp_mat_agents_gpt-4o-mini_batch3.jsonl']

In [85]:
## Upload the files.

for file in batch_files:
    batch_input_file = client.files.create(file = open(file, "rb"), purpose = "batch")
    print(batch_input_file)

FileObject(id='file-4yDLGdvAhJzbB6CBrTBhGM', bytes=59415626, created_at=1737208126, filename='india_exp_mat_agents_gpt-3.5-turbo_batch0.jsonl', object='file', purpose='batch', status='processed', status_details=None)
FileObject(id='file-H1WDckA5FPGfhjyNySHjFR', bytes=59403288, created_at=1737208140, filename='india_exp_mat_agents_gpt-3.5-turbo_batch1.jsonl', object='file', purpose='batch', status='processed', status_details=None)
FileObject(id='file-8EXvq5D1U4mTULNsi4SYiK', bytes=59460600, created_at=1737208153, filename='india_exp_mat_agents_gpt-3.5-turbo_batch2.jsonl', object='file', purpose='batch', status='processed', status_details=None)
FileObject(id='file-7cjMm9BrwxgKNAKDFaTBv5', bytes=59442292, created_at=1737208167, filename='india_exp_mat_agents_gpt-3.5-turbo_batch3.jsonl', object='file', purpose='batch', status='processed', status_details=None)
FileObject(id='file-J9Q3JbamDZscznhL3S84Fg', bytes=59335826, created_at=1737208180, filename='india_exp_mat_agents_gpt-4o-mini_batch

### Copied things to manifest.


In [47]:
## Load the file ids from manifest.

with open("batch/manifest.txt") as f:
    lines = f.readlines()
len(lines)

8

In [49]:
lines[0]

## catch the file ids from the manifest by this regex: FileObject(id='<file_id>'
import re

file_ids = [re.search(r"FileObject\(id='(.*?)'", line).group(1) for line in lines]

len(file_ids), # file_ids

(8,)

In [50]:
## Create the batch requests.

batches = []

for i, file_id in enumerate(file_ids):
    batches.append(client.batches.create(input_file_id=file_id, endpoint="/v1/chat/completions", completion_window="24h", metadata = {"purpose": f"india_exp_mat_agents_{i}"}))

len(batches)

8

In [53]:
# batches

In [54]:
# print(client.batches.list())

In [55]:
with open("batch/batches.txt") as f:
    lines = f.readlines()
len(lines)

8

In [72]:
lines[0]

## catch the file ids from the manifest by this regex: Batch(id='<id>'
import re

batch_ids = [re.search(r"Batch\(id='(.*?)'", line).group(1) for line in lines]

len(batch_ids), # batch_ids

(8,)

In [58]:
def retrieve_batch_status(batch_id):
    return client.batches.retrieve(batch_id)

In [67]:
## retrieve_batch_status(batch_ids[7])

In [73]:
## retrieve all batches.
batch_id_all = client.batches.list(limit=50)

In [76]:
# print(batch_id_all)

In [75]:
with open("batch/dump.txt") as f:
    data = f.read()

In [78]:
## capture all batch ids with the regex.

batch_id_all = re.findall(r"Batch\(id='(.*?)'", data)
# batch_id_all

In [79]:
## def cancel.
def cancel_batches(batch_id):
    client.batches.cancel(batch_id)

In [87]:
for batch_id in batch_id_all:
    try:
        cancel_batches(batch_id)
    except Exception as e:
        print(e)

Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 409 - {'error': {'message': "Cannot cancel a batch with status 'failed'.", 'type': 'inva

## OpenAI blocks upload batch files.

- Even if I have enqueued prompt tokens with failed jobs.
- Check the number of tokens and estimate cost for this code.

```python
results = {}

for i in range(len(india_agents_sampled)):
    for j in range(len(india_agents_sampled)):
        if i == j:
            continue
        cc = check_compatability(india_agents[i], india_agents[j], "gpt-3.5-turbo")
        with open("logs.txt", "a") as f:
            f.write(f"score_{i}_{j}\n")
            f.write(cc)
        results[f"score_{i}_{j}"] = cc

with open("raw_agents/india_mat_scores_35_turbo.json", "w") as f:
    json.dump(results, f)
```

In [14]:
import json
with open("raw_agents/india_exp_agents_sampled.json", "r") as f:
    india_agents_sampled = json.load(f)

In [15]:
india_agents_sampled = [Person(*agent) for agent in india_agents_sampled]

In [17]:
india_agents_sampled[0]

<__main__.Person at 0x192420d1950>

In [88]:
def check_compatability_prepare_messages(person1, person2):
    messages=[
        { "role" : "system", "content": "You are a specialist in real estate and public policy, You are tasked with assessing the compatibility between two potential neighours."},
        { "role": "user", "content": f"""
            Assess the compatibility between
                Person A: {str(person1)} 
                
                and
            
            Person B: {str(person2)}, Respond in this format: 
            ```json
            {{
                "CompatibilityExplanation": "string", // Explain the reasoning behind your answer.
                "CompatibilityPercentage": "number" // The percentage of compatibility between the two individuals.
            }}
            ```         
        """}
    ]

    return messages

In [91]:
import tiktoken

total_tokens_gpt35_turbo = 0
total_tokens_gpt4o_mini = 0

encoder_gpt35_turbo = tiktoken.encoding_for_model("gpt-3.5-turbo")
encoder_gpt4o_mini = tiktoken.encoding_for_model("gpt-4o-mini")

for ag_i in india_agents_sampled:
    for ag_j in india_agents_sampled:
        id_i, id_j = ag_i["id"], ag_j["id"]
        if id_i == id_j:
            continue
        messages = check_compatability_prepare_messages(ag_i, ag_j)

        message_token_cost = 3 ## we dont know for 4omini.
        system_token_cost = 2

        total_tokens_gpt35_turbo += system_token_cost
        total_tokens_gpt4o_mini += system_token_cost

        for message in messages:
            total_tokens_gpt35_turbo += message_token_cost
            total_tokens_gpt35_turbo += len(encoder_gpt35_turbo.encode(message["role"]))
            total_tokens_gpt35_turbo += len(encoder_gpt35_turbo.encode(message["content"]))
            total_tokens_gpt4o_mini += message_token_cost
            total_tokens_gpt4o_mini += len(encoder_gpt4o_mini.encode(message["role"]))
            total_tokens_gpt4o_mini += len(encoder_gpt4o_mini.encode(message["content"]))

total_tokens_gpt35_turbo, total_tokens_gpt4o_mini

(45896172, 45515526)

In [92]:
sample_output_35turbo = encoder_gpt35_turbo.encode('```json\n{\n    "CompatibilityExplanation": "Person A and Person B have some commonalities in terms of their occupations being in the agriculture sector. However, their age difference is quite significant, with Person A being 26 and Person B being 61. This age gap may lead to differences in lifestyle preferences, priorities, and interests. Person A enjoys hobbies related to collecting, while Person B\'s hobbies are more intellectually focused. Additionally, there is a notable income disparity between the two individuals, with Person B belonging to the high-income group and Person A being in the lower-middle class. These differences in age, interests, and income may impact the compatibility between Person A and Person B.",\n    "CompatibilityPercentage": "50"\n}\n```')

In [93]:
len(sample_output_35turbo)

149

In [94]:
sample_output_4omini = encoder_gpt4o_mini.encode('```json\n{\n    "CompatibilityExplanation": "Person A and Person B have some commonalities in terms of their occupations being in the agriculture sector. However, their age difference is quite significant, with Person A being 26 and Person B being 61. This age gap may lead to differences in lifestyle preferences, priorities, and interests. Person A enjoys hobbies related to collecting, while Person B\'s hobbies are more intellectually focused. Additionally, there is a notable income disparity between the two individuals, with Person B belonging to the high-income group and Person A being in the lower-middle class. These differences in age, interests, and income may impact the compatibility between Person A and Person B.",\n    "CompatibilityPercentage": "50"\n}\n```')

In [95]:
len(sample_output_4omini)

149

In [96]:
# 150 x 400 x 400 tokens in output
# 4omini
# 46 mn -- input : 46 x 0.15 = 6.9$
# 25 mn -- output: 15$

# -- alone costs $21.9

In [97]:
## so limiting experiments to this for now, lets trigger and observe the results here first.

In [99]:
args = []

for agent_i in india_agents_sampled:
    for agent_j in india_agents_sampled:
        id_i, id_j = agent_i["id"], agent_j["id"]
        if id_i == id_j:
            continue
        messages = check_compatability_prepare_messages(agent_i, agent_j)
        args.append((id_i, id_j, messages))

In [101]:
def check_compatability_parallel(id_1, id_2, messages, model):
    completion = client.chat.completions.create(
        temperature=0,
        top_p=0.95,
        model = model,
        messages=messages
    )

    return id_1, id_2, completion.choices[0].message.content

In [104]:
# check_compatability_parallel(args[0][0], args[0][1], args[0][2], "gpt-4o-mini")

In [None]:
import concurrent.futures

completion_results = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = {executor.submit(check_compatability_parallel, arg[0], arg[1], arg[2], "gpt-4o-mini"): arg for arg in args}
    for future in concurrent.futures.as_completed(results):
        idx, idy, content = future.result()
        with open("logs.txt", "a") as f:
            f.write(f"score_{idx}_{idy}\n")
            f.write(content)
        completion_results[f"score_{idx}_{idy}"] = content

with open("raw_agents/india_exp_mat_scores_4o_mini.json", "w") as f:
    json.dump(completion_results, f, indent=4)