# Creating and Managing Experiments

The last two guides showed how to create and run synthetic discussions and synthetic annotations using LLMs. However, producing robust results for a hypothesis may require many annotated discussions rather than just one.

While this is possible using the `Discussion` and `Annotation` APIs directly, SynDisco offers the high-level `Experiment` API, which automatically creates and manages multiple discussions with different configurations. An `Experiment` is an entity that generates and runs `jobs`. Thus, if we want to generate and run 100 `Discussion` jobs, we would use a `DiscussionExperiment`. Likewise, if we want to annotate those 100 discussions, we would use an `AnnotationExperiment`.
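To make the idea of an experiment that generates jobs more concrete, here is a minimal sketch in plain Python. It is not SynDisco's implementation: `DiscussionJob` and `generate_jobs` are hypothetical names, and the sampling strategy (shuffling the speaking order and picking one seed exchange per job) is an assumption made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class DiscussionJob:
    """Hypothetical container for the configuration of one discussion."""

    participants: list[str]
    seed_opinion: list[str]
    num_turns: int


def generate_jobs(usernames, seed_opinions, num_turns, num_jobs, rng=None):
    """Create `num_jobs` configurations by varying speaker order and seed."""
    rng = rng or random.Random()
    jobs = []
    for _ in range(num_jobs):
        # random.sample returns a new shuffled list; the input is left untouched
        order = rng.sample(usernames, k=len(usernames))
        seed = rng.choice(seed_opinions)
        jobs.append(DiscussionJob(order, seed, num_turns))
    return jobs


jobs = generate_jobs(
    usernames=["Emma35", "Giannis"],
    seed_opinions=[["Question A", "Reply A"], ["Question B", "Reply B"]],
    num_turns=3,
    num_jobs=100,
    rng=random.Random(42),  # fixed seed so the sketch is reproducible
)
print(len(jobs))
```

Each job in the list could then be run independently, which is what makes this pattern convenient for long experiments: a failed job does not invalidate the ones that already finished.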

This guide will showcase how you can leverage this API to automate your experiments. You will also learn how to utilize SynDisco's built-in logging functions as well as how to export your datasets in CSV format for convenience. 

In [1]:
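# "!export" only affects a short-lived subshell; %env sets the variable for the notebook kernel itself
%env CUDA_VISIBLE_DEVICES=0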

## Logging

While a single discussion or annotation job may take a few minutes, experiments composed of dozens or hundreds of synthetic discussions may take days. Thus, we need a mechanism to keep track of our experiments while they are running.

We will use SynDisco's `logging_util` module to log information about our experiments. This module performs the following functions:

* Times the execution of computationally intensive jobs (such as synthetic discussions and annotations)
* Provides details about the currently running jobs (e.g. selected configurations, participants, and prompts)
* Displays warnings and errors to the user
* Creates and continually updates log files
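
The timing behaviour in the first bullet can be approximated with a small decorator built on the standard library. This is a sketch only, not the code inside `logging_util`: the `timed` decorator and the `run_job` stand-in are hypothetical, and the message format simply imitates the "Procedure ... executed in ... minutes" lines that appear in the logs later in this guide.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)


def timed(func):
    """Log how long `func` takes, in minutes, every time it is called."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed jobs are timed as well
            minutes = (time.perf_counter() - start) / 60
            logger.debug(
                "Procedure %s executed in %.4f minutes", func.__name__, minutes
            )

    return wrapper


@timed
def run_job(n):
    """Stand-in for an expensive job such as a synthetic discussion."""
    return sum(i * i for i in range(n))


result = run_job(100_000)
```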

Each object in SynDisco is internally assigned a Logger. You can use the `logging_util.logging_setup` function to update all of the internal loggers to follow your configuration. An example of this can be seen below:

In [2]:
from pathlib import Path
import tempfile

from syndisco import logging_util


logs_dir = tempfile.TemporaryDirectory()
logging_util.logging_setup(
    print_to_terminal=True,
    write_to_file=True,
    logs_dir=Path(logs_dir.name),
    level="debug",
    use_colors=True,
    log_warnings=True,
)

The loggers apply to all objects in SynDisco, so they report on `Discussion` and `Annotation` jobs as well as on all low-level components (such as those in the `backend` module).

It is recommended to set up the loggers *no matter your use case*. At the very least, they are useful for clearly displaying warnings in case of accidental API misuse.
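
For readers curious how warnings can end up in a log at all, the standard library offers `logging.captureWarnings`, which redirects messages issued through the `warnings` module to a logger named `py.warnings`. Whether the `log_warnings=True` option above relies on this exact mechanism is an assumption; the sketch below only demonstrates the standard-library behaviour, using an in-memory buffer and a made-up warning message.

```python
import io
import logging
import warnings

# Write log records to an in-memory buffer so the example is self-contained
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

# Captured warnings are emitted through the "py.warnings" logger
warnings_logger = logging.getLogger("py.warnings")
warnings_logger.addHandler(handler)
warnings_logger.setLevel(logging.WARNING)

logging.captureWarnings(True)
warnings.simplefilter("always")  # do not suppress repeated warnings
warnings.warn("example warning: this argument is deprecated")  # made-up message
logging.captureWarnings(False)  # restore the default warning display

captured = buffer.getvalue()
print(captured)
```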

## Discussion Experiments

In [3]:
from syndisco.turn_manager import RoundRobin
from syndisco.actors import Actor, ActorType, Persona
from syndisco.model import TransformersModel


CONTEXT = "You are taking part in an online conversation"
INSTRUCTIONS = "Act like a human would"


llm = TransformersModel(
    model_path="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    name="test_model",
    max_out_tokens=100,
)
persona_data = [
    {
        "username": "Emma35",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "Latino",
        "current_employment": "Registered Nurse",
        "special_instructions": "",
        "personality_characteristics": [
            "compassionate",
            "patient",
            "diligent",
            "overwhelmed",
        ],
    },
    {
        "username": "Giannis",
        "age": 21,
        "sex": "male",
        "education_level": "College",
        "sexual_orientation": "Pansexual",
        "demographic_group": "White",
        "current_employment": "Game Developer",
        "special_instructions": "Be antagonistic towards the other user",
        "personality_characteristics": [
            "strategic",
            "meticulous",
            "nerdy",
            "hyper-focused",
        ],
    },
]
personas = [Persona(**data) for data in persona_data]
actors = [
    Actor(
        model=llm,
        persona=p,
        context=CONTEXT,
        instructions=INSTRUCTIONS,
        actor_type=ActorType.USER,
    )
    for p in personas
]
turn_manager = RoundRobin([actor.get_name() for actor in actors])

2026-01-22 16:55:28 CP-G482-Z52-00 urllib3.connectionpool[2750451] DEBUG Starting new HTTPS connection (1): huggingface.co:443
2026-01-22 16:55:28 CP-G482-Z52-00 urllib3.connectionpool[2750451] DEBUG https://huggingface.co:443 "HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/config.json HTTP/1.1" 307 0
2026-01-22 16:55:28 CP-G482-Z52-00 urllib3.connectionpool[2750451] DEBUG https://huggingface.co:443 "HEAD /api/resolve-cache/models/unsloth/Llama-3.2-3B-Instruct-bnb-4bit/bb1d317a108579fb40e646af8924a5e7ec5604b1/config.json HTTP/1.1" 200 0
2026-01-22 16:55:29 CP-G482-Z52-00 bitsandbytes.cextension[2750451] DEBUG Loading bitsandbytes native library from: /media/SSD_4TB_2/dtsirmpas/software/miniforge/envs/syndisco-dev/lib/python3.14/site-packages/bitsandbytes/libbitsandbytes_cuda128.so
2026-01-22 16:55:32 CP-G482-Z52-00 accelerate.utils.modeling[2750451] INFO Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:

In [4]:
from syndisco.experiments import DiscussionExperiment


disc_exp = DiscussionExperiment(
    seed_opinions=[
        ["Should programmers be allowed to analyze data?", "Absolutely not"],
        ["Should data analysts be allowed to code?", "No they are nerds"],
    ],
    users=actors,
    moderator=None,
    num_turns=3,
    num_discussions=2,
)
discussions_dir = Path(tempfile.TemporaryDirectory().name)
disc_exp.begin(discussions_output_dir=discussions_dir)



  0%|          | 0/2 [00:00<?, ?it/s]

2026-01-22 16:55:39 CP-G482-Z52-00 root[2750451] INFO Running experiment 1/3...
2026-01-22 16:55:39 CP-G482-Z52-00 experiments.py[2750451] DEBUG Experiment parameters: {
    "id": "07bd7e3d-984b-4248-95c2-bc5cc33c2484",
    "timestamp": "26-01-22-16-55",
    "users": [
        "Emma35",
        "Giannis"
    ],
    "moderator": null,
    "user_prompts": [
        "{\"context\": \"You are taking part in an online conversation\", \"instructions\": \"Act like a human would\", \"type\": \"1\", \"persona\": {\"username\": \"Emma35\", \"age\": 38, \"sex\": \"female\", \"sexual_orientation\": \"Heterosexual\", \"demographic_group\": \"Latino\", \"current_employment\": \"Registered Nurse\", \"education_level\": \"Bachelor's\", \"special_instructions\": \"\", \"personality_characteristics\": [\"compassionate\", \"patient\", \"diligent\", \"overwhelmed\"]}}",
        "{\"context\": \"You are taking part in an online conversation\", \"instructions\": \"Act like a human would\", \"type\": \"1\", \

User Giannis posted:
Should programmers be allowed to analyze data? 

User Emma35 posted:
Absolutely not 



  0%|          | 0/3 [00:00<?, ?it/s]

The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


User Emma35 posted:
{"role": "user", "content": "I think that's a pretty narrow-minded
view, Emma. Programmers are essential in analyzing data to make
informed decisions and drive progress. What's your take on that?"} 

User Giannis posted:
{"role": "user", "content": "User Emma35 posted:\n\"Narrow-minded?\"
You're the one who's being narrow-minded, Giannis. Analyzing data is
just a tool, not a fundamental aspect of being a programmer. There are
plenty of programmers who can write code without touching a
spreadsheet. What's your take on that?"} 



2026-01-22 16:55:55 CP-G482-Z52-00 root[2750451] DEBUG Finished discussion in 15.442159175872803 seconds.
2026-01-22 16:55:55 CP-G482-Z52-00 logging_util.py[2750451] DEBUG Procedure _run_single_discussion executed in 0.2574 minutes
2026-01-22 16:55:55 CP-G482-Z52-00 root[2750451] INFO Running experiment 2/3...
2026-01-22 16:55:55 CP-G482-Z52-00 experiments.py[2750451] DEBUG Experiment parameters: {
    "id": "32549f6d-5ff3-4a1a-b04b-6aa695d1198a",
    "timestamp": "26-01-22-16-55",
    "users": [
        "Giannis",
        "Emma35"
    ],
    "moderator": null,
    "user_prompts": [
        "{\"context\": \"You are taking part in an online conversation\", \"instructions\": \"Act like a human would\", \"type\": \"1\", \"persona\": {\"username\": \"Giannis\", \"age\": 21, \"sex\": \"male\", \"sexual_orientation\": \"Pansexual\", \"demographic_group\": \"White\", \"current_employment\": \"Game Developer\", \"education_level\": \"College\", \"special_instructions\": \"Be antagonistic towar

User Emma35 posted:
"Giannis, I think you're misunderstanding my point. I'm not saying
that analyzing data isn't a valuable skill for programmers, but I do
think it's not the only thing that makes a programmer. There are many
aspects to programming, such as problem-solving, creativity, and
collaboration. And let's be real, not every programmer is a data
analyst. I've worked with plenty of programmers who are experts in
their field, but don't necessarily need to be proficient in data
analysis. Can 

User Emma35 posted:
Should data analysts be allowed to code? 

User Giannis posted:
No they are nerds 



  0%|          | 0/3 [00:00<?, ?it/s]

User Emma35 posted:
{"role": "user", "content": "I think that's a pretty narrow-minded
view. Data analysts bring a unique perspective to the table, and their
analytical skills are invaluable in helping organizations make
informed decisions. Coding is a crucial part of their job, and it's
not something they should be ashamed of. In fact, I'd argue that
coding is an essential skill for any data analyst. It's not just about
crunching numbers, it's about being able to communicate complex ideas
and insights 

User Giannis posted:
{"role": "user", "content": "User Giannis posted:\nAre you kidding me?
You think data analysts are just going to magically become proficient
coders overnight? Newsflash: coding is not something you learn in a
data analysis course. It takes years of dedication and practice to
become a skilled coder. And even then, it's not like they're going to
be able to keep up with the latest trends and technologies. I mean,
have you seen the state of the industry lately 



2026-01-22 16:56:14 CP-G482-Z52-00 root[2750451] DEBUG Finished discussion in 19.7291476726532 seconds.
2026-01-22 16:56:14 CP-G482-Z52-00 logging_util.py[2750451] DEBUG Procedure _run_single_discussion executed in 0.3289 minutes
2026-01-22 16:56:15 CP-G482-Z52-00 experiments.py[2750451] INFO Finished synthetic discussion generation.
2026-01-22 16:56:15 CP-G482-Z52-00 logging_util.py[2750451] DEBUG Procedure _run_all_discussions executed in 0.5867 minutes


User Emma35 posted:
I completely understand where you're coming from, Giannis. I think
we've all been there - feeling like we're not good enough or that
we're not doing enough. But let's not forget that data analysts are
often the ones who have to bridge the gap between business and
technology. They need to be able to communicate complex ideas to non-
technical stakeholders, and that requires a certain level of coding
proficiency. I'm not saying it's easy, but it's definitely not
impossible. And as 



## Annotation Experiments

In [5]:
annotator_persona = Persona(
    **{
        "username": "annotator",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "White",
        "current_employment": "Annotator",
        "special_instructions": "",
        "personality_characteristics": ["competent"],
    }
)

annotator = Actor(
    model=llm,
    persona=annotator_persona,
    context="You are annotating an online discussion",
    instructions="From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
    actor_type=ActorType.ANNOTATOR,
)

In [6]:
from syndisco.experiments import AnnotationExperiment

ann_exp = AnnotationExperiment(annotators=[annotator])
annotations_dir = Path(tempfile.TemporaryDirectory().name)
ann_exp.begin(discussions_dir=discussions_dir, output_dir=annotations_dir)

  0%|          | 0/2 [00:00<?, ?it/s]

2026-01-22 16:56:15 CP-G482-Z52-00 experiments.py[2750451] DEBUG Experiment parameters: {
    "conv_id": "07bd7e3d-984b-4248-95c2-bc5cc33c2484",
    "timestamp": "26-01-22-16-56",
    "annotator_model": "test_model",
    "annotator_prompt": "{\"context\": \"You are annotating an online discussion\", \"instructions\": \"From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.\", \"type\": \"2\", \"persona\": {\"username\": \"annotator\", \"age\": 38, \"sex\": \"female\", \"sexual_orientation\": \"Heterosexual\", \"demographic_group\": \"White\", \"current_employment\": \"Annotator\", \"education_level\": \"Bachelor's\", \"special_instructions\": \"\", \"personality_characteristics\": [\"competent\"]}}",
    "ctx_length": 3,
    "logs": []
}


  0%|          | 0/5 [00:00<?, ?it/s]

User Giannis posted: Should programmers be allowed to analyze data?
1
User Emma35 posted: Absolutely not
1
User Emma35 posted: {"role": "user", "content": "I think that's a
pretty narrow-minded view, Emma. Programmers are essential in
analyzing data to make informed decisions and drive progress. What's
your take on that?"}
1
User Giannis posted: {"role": "user", "content": "User Emma35
posted:\n\"Narrow-minded?\" You're the one who's being narrow-minded,
Giannis. Analyzing data is just a tool, not a fundamental aspect of
being a programmer. There are plenty of programmers who can write code
without touching a spreadsheet. What's your take on that?"}
3


2026-01-22 16:56:17 CP-G482-Z52-00 logging_util.py[2750451] DEBUG Procedure _run_single_annotation executed in 0.0340 minutes
2026-01-22 16:56:17 CP-G482-Z52-00 experiments.py[2750451] DEBUG Experiment parameters: {
    "conv_id": "32549f6d-5ff3-4a1a-b04b-6aa695d1198a",
    "timestamp": "26-01-22-16-56",
    "annotator_model": "test_model",
    "annotator_prompt": "{\"context\": \"You are annotating an online discussion\", \"instructions\": \"From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.\", \"type\": \"2\", \"persona\": {\"username\": \"annotator\", \"age\": 38, \"sex\": \"female\", \"sexual_orientation\": \"Heterosexual\", \"demographic_group\": \"White\", \"current_employment\": \"Annotator\", \"education_level\": \"Bachelor's\", \"special_instructions\": \"\", \"personality_characteristics\": [\"competent\"]}}",
    "ctx_length": 3,
    "logs": []
}


User Emma35 posted: "Giannis, I think you're misunderstanding my
point. I'm not saying that analyzing data isn't a valuable skill for
programmers, but I do think it's not the only thing that makes a
programmer. There are many aspects to programming, such as problem-
solving, creativity, and collaboration. And let's be real, not every
programmer is a data analyst. I've worked with plenty of programmers
who are experts in their field, but don't necessarily need to be
proficient in data analysis. Can
3


  0%|          | 0/5 [00:00<?, ?it/s]

User Emma35 posted: Should data analysts be allowed to code?
1
User Giannis posted: No they are nerds
3
User Emma35 posted: {"role": "user", "content": "I think that's a
pretty narrow-minded view. Data analysts bring a unique perspective to
the table, and their analytical skills are invaluable in helping
organizations make informed decisions. Coding is a crucial part of
their job, and it's not something they should be ashamed of. In fact,
I'd argue that coding is an essential skill for any data analyst. It's
not just about crunching numbers, it's about being able to communicate
complex ideas and insights
2
User Giannis posted: {"role": "user", "content": "User Giannis
posted:\nAre you kidding me? You think data analysts are just going to
magically become proficient coders overnight? Newsflash: coding is not
something you learn in a data analysis course. It takes years of
dedication and practice to become a skilled coder. And even then, it's
not like they're going to be able to keep up 

2026-01-22 16:56:19 CP-G482-Z52-00 logging_util.py[2750451] DEBUG Procedure _run_single_annotation executed in 0.0362 minutes
2026-01-22 16:56:19 CP-G482-Z52-00 experiments.py[2750451] INFO Finished annotation generation.
2026-01-22 16:56:19 CP-G482-Z52-00 logging_util.py[2750451] DEBUG Procedure _run_all_annotations executed in 0.0704 minutes


User Emma35 posted: I completely understand where you're coming from,
Giannis. I think we've all been there - feeling like we're not good
enough or that we're not doing enough. But let's not forget that data
analysts are often the ones who have to bridge the gap between
business and technology. They need to be able to communicate complex
ideas to non- technical stakeholders, and that requires a certain
level of coding proficiency. I'm not saying it's easy, but it's
definitely not impossible. And as
3


## Exporting your new dataset

As you have seen so far, SynDisco persists its data as collections of JSON files by default. This is handy for fault tolerance and disk efficiency, but it is less convenient to work with than a traditional CSV dataset.

Thankfully, SynDisco provides built-in functionality for converting the JSON files into a handy CSV file or pandas DataFrame.
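
Conceptually, such a conversion flattens nested JSON records into one row per message. The sketch below illustrates the idea with the standard library only. The JSON layout used here (an `id` plus a `logs` list of `user`/`message` pairs) is a hypothetical stand-in rather than SynDisco's actual schema, so for real data prefer the `postprocessing` functions.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical discussion files; the real SynDisco schema may differ
json_dir = Path(tempfile.mkdtemp())
examples = [
    {"id": "a", "logs": [{"user": "u1", "message": "Hello"},
                         {"user": "u2", "message": "Hi"}]},
    {"id": "b", "logs": [{"user": "u2", "message": "Any thoughts?"}]},
]
for record in examples:
    path = json_dir / f"{record['id']}.json"
    path.write_text(json.dumps(record), encoding="utf-8")

# Flatten: one row per message, keyed by the discussion it belongs to
rows = []
for path in sorted(json_dir.glob("*.json")):
    data = json.loads(path.read_text(encoding="utf-8"))
    for order, entry in enumerate(data["logs"], start=1):
        rows.append(
            {
                "conv_id": data["id"],
                "message_order": order,
                "user": entry["user"],
                "message": entry["message"],
            }
        )

# Write the flattened rows to a single CSV file
csv_path = json_dir / "discussions.csv"
fieldnames = ["conv_id", "message_order", "user", "message"]
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```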

In [7]:
from syndisco import postprocessing


discussions_df = postprocessing.import_discussions(conv_dir=discussions_dir)
discussions_df

Unnamed: 0,conv_id,timestamp,ctx_length,user,message,model,is_moderator,message_id,message_order,age,sex,sexual_orientation,demographic_group,current_employment,education_level,special_instructions,personality_characteristics
0,07bd7e3d-984b-4248-95c2-bc5cc33c2484,26-01-22-16-55,3,Giannis,Should programmers be allowed to analyze data?,hardcoded,False,01dc687f56549f32be89b887c569a7db,1,21,male,Pansexual,White,Game Developer,College,Be antagonistic towards the other user,"[strategic, meticulous, nerdy, hyper-focused]"
1,07bd7e3d-984b-4248-95c2-bc5cc33c2484,26-01-22-16-55,3,Emma35,Absolutely not,hardcoded,False,fb4fbd5963afea9d58cd9f9380f8fbc1,2,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
2,07bd7e3d-984b-4248-95c2-bc5cc33c2484,26-01-22-16-55,3,Emma35,"{""role"": ""user"", ""content"": ""I think that's a ...",test_model,False,81b1c0805b435f7906b17de4024bd094,3,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
3,07bd7e3d-984b-4248-95c2-bc5cc33c2484,26-01-22-16-55,3,Giannis,"{""role"": ""user"", ""content"": ""User Emma35 poste...",test_model,False,6ed68b3b0350863bb2f1a8238a85dfa3,4,21,male,Pansexual,White,Game Developer,College,Be antagonistic towards the other user,"[strategic, meticulous, nerdy, hyper-focused]"
4,07bd7e3d-984b-4248-95c2-bc5cc33c2484,26-01-22-16-55,3,Emma35,"""Giannis, I think you're misunderstanding my p...",test_model,False,9588371adf451a8ba918aaa78afb9120,5,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
5,32549f6d-5ff3-4a1a-b04b-6aa695d1198a,26-01-22-16-56,3,Emma35,Should data analysts be allowed to code?,hardcoded,False,e05cd27897c38bd5294ab709a4abbbfb,1,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
6,32549f6d-5ff3-4a1a-b04b-6aa695d1198a,26-01-22-16-56,3,Giannis,No they are nerds,hardcoded,False,ec9b695a42c2133450fa5a1b6d8af19b,2,21,male,Pansexual,White,Game Developer,College,Be antagonistic towards the other user,"[strategic, meticulous, nerdy, hyper-focused]"
7,32549f6d-5ff3-4a1a-b04b-6aa695d1198a,26-01-22-16-56,3,Emma35,"{""role"": ""user"", ""content"": ""I think that's a ...",test_model,False,df7b240c35b1d960d37ce3f2cd4b5e77,3,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
8,32549f6d-5ff3-4a1a-b04b-6aa695d1198a,26-01-22-16-56,3,Giannis,"{""role"": ""user"", ""content"": ""User Giannis post...",test_model,False,f34bb1febf22da93f6f357664b02f26c,4,21,male,Pansexual,White,Game Developer,College,Be antagonistic towards the other user,"[strategic, meticulous, nerdy, hyper-focused]"
9,32549f6d-5ff3-4a1a-b04b-6aa695d1198a,26-01-22-16-56,3,Emma35,I completely understand where you're coming fr...,test_model,False,fd68eb384baf042197d47ad0826cf8de,5,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"


In [9]:
annotations_df = postprocessing.import_annotations(annot_dir=annotations_dir)
annotations_df

KeyError: "Label(s) ['annot_personality_characteristics'] do not exist"