# Creating and Managing Experiments

The previous two guides showed how to create and run synthetic discussions and synthetic annotations using LLMs. However, to obtain robust results for a hypothesis, you will usually need many annotated discussions rather than one.

While this is certainly possible using the `Discussion` and `Annotation` APIs directly, SynDisco offers the high-level `Experiment` API, which automatically creates and manages multiple discussions with different configurations. An `Experiment` is an entity that generates and runs jobs. Thus, to generate and run 100 `Discussion` jobs, we would use a `DiscussionExperiment`. Likewise, to annotate those 100 discussions, we would use an `AnnotationExperiment`.
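
To make the "generates and runs jobs" idea concrete, here is a minimal sketch in plain Python. It is not SynDisco's implementation: `ToyJob`, `ToyExperiment`, and the random sampling of a seed opinion and speaking order are all assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyJob:
    """One job configuration (hypothetical stand-in for a single discussion)."""

    seed_opinion: str
    speaking_order: list[str]

    def run(self) -> str:
        # A real job would call an LLM here; we only describe the setup.
        return f"{self.speaking_order[0]} opens with: {self.seed_opinion}"


class ToyExperiment:
    """Generates num_jobs randomized jobs and runs them one after another."""

    def __init__(self, seed_opinions, users, num_jobs, rng_seed=0):
        self.seed_opinions = list(seed_opinions)
        self.users = list(users)
        self.num_jobs = num_jobs
        self.rng = random.Random(rng_seed)  # seeded for reproducibility

    def generate_jobs(self):
        jobs = []
        for _ in range(self.num_jobs):
            order = self.users.copy()
            self.rng.shuffle(order)  # assumed: each job gets its own speaking order
            jobs.append(ToyJob(self.rng.choice(self.seed_opinions), order))
        return jobs

    def begin(self):
        return [job.run() for job in self.generate_jobs()]


experiment = ToyExperiment(
    seed_opinions=["Should data analysts be allowed to code?"],
    users=["Emma35", "Giannis"],
    num_jobs=3,
)
for line in experiment.begin():
    print(line)
```

The point of the pattern is that the caller describes the space of configurations once, and the experiment object takes care of producing and executing the individual jobs.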

This guide shows how to use this API to automate your experiments. You will also learn how to use SynDisco's built-in logging functions and how to export your datasets in CSV format.

## Logging

While a single discussion or annotation job may take a few minutes, experiments composed of dozens or hundreds of synthetic discussions may take days. Thus, we need a mechanism to keep track of our experiments while they are running.

We will use SynDisco's `logging_util` module to log information about our experiments. This module performs the following functions:

* Times the execution of computationally intensive jobs (such as synthetic discussions and annotations)
* Provides details about the currently running jobs (e.g. selected configurations, participants, prompts etc.)
* Displays warnings and errors to the user
* Creates and continually updates log files

Each object in SynDisco is internally assigned a Logger. You can use the `logging_util.logging_setup` function to update all of the internal loggers to follow your configuration. An example of this can be seen below:

In [None]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import tempfile

from syndisco import logging_util


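# Log files go to a temporary directory for this demo;
# use a persistent path for real experiments.
logs_dir = tempfile.TemporaryDirectory()
logging_util.logging_setup(
    print_to_terminal=True,  # echo log records to the terminal
    write_to_file=True,  # also write them to files under logs_dir
    logs_dir=Path(logs_dir.name),
    level="debug",
    use_colors=True,
    log_warnings=True,
)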

The loggers apply to all objects in SynDisco, so they provide information on `Discussion` and `Annotation` jobs as well as on all low-level components (such as those in the `backend` module).

It is recommended to set up the loggers *no matter your use case*. At the very least, they are useful for clearly displaying warnings in case of accidental API misuse.
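
If you want the same kind of execution timing for your own helper functions, the idea can be sketched with Python's standard `logging` module. This is an illustrative sketch, not the `logging_util` implementation: the `timing` decorator, the `toy_experiment` logger name, and `run_single_job` are hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("toy_experiment")  # hypothetical logger name


def timing(func):
    """Log how long each call to func takes, in minutes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed jobs are timed too.
            minutes = (time.perf_counter() - start) / 60
            logger.info(
                "Procedure %s executed in %.4f minutes", func.__name__, minutes
            )

    return wrapper


@timing
def run_single_job(job_id: int) -> int:
    time.sleep(0.01)  # stand-in for an expensive LLM call
    return job_id * 2


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
print(run_single_job(21))
```

Wrapping the call in `try`/`finally` means a job that crashes after several hours still leaves a timing record behind, which helps when diagnosing long experiments.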

## Discussion Experiments

In [None]:
from syndisco.turn_manager import RoundRobin
from syndisco.actors import LLMActor, ActorType
from syndisco.model import TransformersModel
from syndisco.persona import LLMPersona


CONTEXT = "You are taking part in an online conversation"
INSTRUCTIONS = "Act like a human would"


llm = TransformersModel(
    model_path="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    name="test_model",
    max_out_tokens=100,
)
persona_data = [
    {
        "username": "Emma35",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "Latino",
        "current_employment": "Registered Nurse",
        "special_instructions": "",
        "personality_characteristics": [
            "compassionate",
            "patient",
            "diligent",
            "overwhelmed",
        ],
    },
    {
        "username": "Giannis",
        "age": 21,
        "sex": "male",
        "education_level": "College",
        "sexual_orientation": "Pansexual",
        "demographic_group": "White",
        "current_employment": "Game Developer",
        "special_instructions": "",
        "personality_characteristics": [
            "strategic",
            "meticulous",
            "nerdy",
            "hyper-focused",
        ],
    },
]
personas = [LLMPersona(**data) for data in persona_data]
actors = [
    LLMActor(
        model=llm,
        persona=p,
        context=CONTEXT,
        instructions=INSTRUCTIONS,
        actor_type=ActorType.USER,
    )
    for p in personas
]
turn_manager = RoundRobin([actor.get_name() for actor in actors])


2025-06-12 14:22:54 fedora urllib3.connectionpool[36290] DEBUG Starting new HTTPS connection (1): huggingface.co:443
2025-06-12 14:22:55 fedora urllib3.connectionpool[36290] DEBUG https://huggingface.co:443 "HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/config.json HTTP/1.1" 200 0
2025-06-12 14:22:56 fedora bitsandbytes.cextension[36290] DEBUG Loading bitsandbytes native library from: /home/dimits/miniconda3/envs/syndiscooooo/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_cuda126.so
2025-06-12 14:22:57 fedora accelerate.utils.modeling[36290] INFO We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2025-06-12 14:22:58 fedora urllib3.connectionpool[36290] DEBUG https://huggingface.co:443 "HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/generation_config.json HTTP/1.1" 200 0
2025-06-12 14:22:58 fedora urllib3.connectionpoo

In [3]:
from syndisco.experiments import DiscussionExperiment


disc_exp = DiscussionExperiment(
    seed_opinions=[
        "Should programmers be allowed to analyze data?",
        "Should data analysts be allowed to code?",
    ],
    users=actors,
    moderator=None,
    num_turns=3,
    num_discussions=2,
)
discussions_dir = Path(tempfile.TemporaryDirectory().name)
disc_exp.begin(discussions_output_dir=discussions_dir)

2025-06-12 14:23:00 fedora root[36290] INFO Running experiment 1/3...
2025-06-12 14:23:00 fedora experiments.py[36290] INFO Beginning conversation...
2025-06-12 14:23:00 fedora experiments.py[36290] DEBUG Experiment parameters: {
    "id": "4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8",
    "timestamp": "25-06-12-14-23",
    "users": [
        "Giannis",
        "Emma35"
    ],
    "moderator": null,
    "user_prompts": [
        {
            "context": "You are taking part in an online conversation",
            "instructions": "Act like a human would",
            "type": "1",
            "persona": {
                "username": "Giannis",
                "age": 21,
                "sex": "male",
                "sexual_orientation": "Pansexual",
                "demographic_group": "White",
                "current_employment": "Game Developer",
                "education_level": "College",
                "special_instructions": "",
                "personality_characteristics": [
       

User Giannis posted:
Should data analysts be allowed to code? 

User Giannis posted:
I think it's a great idea for data analysts to have some coding
skills. In today's data-driven world, being able to code can
definitely give them a competitive edge. It shows that they're not
just limited to just analyzing data, but can also create and implement
their own solutions. Plus, with the rise of big data and machine
learning, having coding skills can be a huge asset. Of course, it's
not necessary for every data analyst to be a master coder, but having
some basic 

User Emma35 posted:
I completely agree with Giannis on this one. As a registered nurse,
I've seen firsthand how data analysis has become a crucial part of my
job. Being able to code and understand the underlying data can really
help me provide better care to my patients. Plus, it's not just about
analyzing data, it's about being able to visualize it, create models,
and implement solutions that can make a real difference in people's


2025-06-12 14:24:31 fedora root[36290] DEBUG Finished discussion in 90.5577175617218 seconds.
2025-06-12 14:24:31 fedora experiments.py[36290] INFO Conversation saved to /tmp/tmpyyw52zh1/25-06-12-14-24.json
2025-06-12 14:24:31 fedora logging_util.py[36290] INFO Procedure _run_single_discussion executed in 1.5094 minutes
2025-06-12 14:24:31 fedora root[36290] INFO Running experiment 2/3...
2025-06-12 14:24:31 fedora experiments.py[36290] INFO Beginning conversation...
2025-06-12 14:24:31 fedora experiments.py[36290] DEBUG Experiment parameters: {
    "id": "f27b70aa-49d4-4cda-9b01-60f7843dfcad",
    "timestamp": "25-06-12-14-24",
    "users": [
        "Emma35",
        "Giannis"
    ],
    "moderator": null,
    "user_prompts": [
        {
            "context": "You are taking part in an online conversation",
            "instructions": "Act like a human would",
            "type": "1",
            "persona": {
                "username": "Emma35",
                "age": 38,
         

User Giannis posted:
I think Emma's point is well-taken. As a game developer myself, I've
seen how being able to code can give data analysts a significant
advantage in terms of being able to bring their ideas to life. I mean,
think about it - if a data analyst can write their own code to create
a data visualization or implement a new analysis tool, that's a huge
game-changer. And with the rise of cloud-based data platforms and
tools, it's getting easier and easier for non- 

User Emma35 posted:
Should data analysts be allowed to code? 

User Giannis posted:
I think it's a given that data analysts should be allowed to code. In
fact, I'd say it's essential for their role. As a game developer
myself, I've seen firsthand how crucial it is for analysts to be able
to write their own code, especially when it comes to data
visualization and analysis. Not only does it save time, but it also
allows them to dive deeper into the data and identify trends and
patterns that might be missed by others.

2025-06-12 14:26:04 fedora root[36290] DEBUG Finished discussion in 93.51961159706116 seconds.
2025-06-12 14:26:04 fedora experiments.py[36290] INFO Conversation saved to /tmp/tmpyyw52zh1/25-06-12-14-26.json
2025-06-12 14:26:04 fedora logging_util.py[36290] INFO Procedure _run_single_discussion executed in 1.5587 minutes
2025-06-12 14:26:04 fedora experiments.py[36290] INFO Finished synthetic discussion generation.
2025-06-12 14:26:04 fedora logging_util.py[36290] INFO Procedure _run_all_discussions executed in 3.0681 minutes


User Giannis posted:
I think it's also worth noting that having a basic understanding of
coding can help data analysts to communicate more effectively with
other teams, such as developers and product managers. I mean, when you
can write your own code, you can create interactive dashboards, create
custom reports, and even automate certain processes, which can save
time and increase productivity. Plus, being able to speak the language
of the developers can help to avoid misunderstandings and
miscommunications. It's not about becoming a master coder, but 



## Annotation Experiments

In [4]:
annotator_persona = LLMPersona(
    **{
        "username": "annotator",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "White",
        "current_employment": "Annotator",
        "special_instructions": "",
        "personality_characteristics": ["competent"],
    }
)

annotator = LLMActor(
    model=llm,
    persona=annotator_persona,
    context="You are annotating an online discussion",
    instructions="From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
    actor_type=ActorType.ANNOTATOR,
)

In [5]:
from syndisco.experiments import AnnotationExperiment

ann_exp = AnnotationExperiment(annotators=[annotator])
annotations_dir = Path(tempfile.TemporaryDirectory().name)
ann_exp.begin(discussions_dir=discussions_dir, output_dir=annotations_dir)

2025-06-12 14:26:04 fedora experiments.py[36290] INFO Running annotation 1/2...
2025-06-12 14:26:04 fedora experiments.py[36290] INFO Beginning annotation...
2025-06-12 14:26:04 fedora experiments.py[36290] DEBUG Experiment parameters: {
    "conv_id": "f27b70aa-49d4-4cda-9b01-60f7843dfcad",
    "timestamp": "25-06-12-14-26",
    "annotator_model": "test_model",
    "annotator_prompt": {
        "context": "You are annotating an online discussion",
        "instructions": "From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
        "type": "2",
        "persona": {
            "username": "annotator",
            "age": 38,
            "sex": "female",
            "sexual_orientation": "Heterosexual",
            "demographic_group": "White",
            "current_employment": "Annotator",
            "education_level": "Bachelor's",
            "special_instructions": "",
            "personality_characteristics": [
          

User Emma35 posted: Should data analysts be allowed to code?
3
User Giannis posted: I think it's a given that data analysts should be
allowed to code. In fact, I'd say it's essential for their role. As a
game developer myself, I've seen firsthand how crucial it is for
analysts to be able to write their own code, especially when it comes
to data visualization and analysis. Not only does it save time, but it
also allows them to dive deeper into the data and identify trends and
patterns that might be missed by others. Plus, having a basic
understanding of
1
User Emma35 posted: "I completely agree with Giannis on this. As a
registered nurse, I've worked with data analysis to help inform
patient care, and I've seen how being able to code can really elevate
the quality of that analysis. Being able to write code can also help
data analysts to automate repetitive tasks, free up more time for
higher-level analysis, and even allow them to create their own tools
and visualizations. Of course, I t

2025-06-12 14:26:24 fedora experiments.py[36290] INFO Annotation saved to /tmp/tmpcf_bcrlh/25-06-12-14-26.json
2025-06-12 14:26:24 fedora logging_util.py[36290] INFO Procedure _run_single_annotation executed in 0.3337 minutes
2025-06-12 14:26:24 fedora experiments.py[36290] INFO Running annotation 2/2...
2025-06-12 14:26:24 fedora experiments.py[36290] INFO Beginning annotation...
2025-06-12 14:26:24 fedora experiments.py[36290] DEBUG Experiment parameters: {
    "conv_id": "4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8",
    "timestamp": "25-06-12-14-26",
    "annotator_model": "test_model",
    "annotator_prompt": {
        "context": "You are annotating an online discussion",
        "instructions": "From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
        "type": "2",
        "persona": {
            "username": "annotator",
            "age": 38,
            "sex": "female",
            "sexual_orientation": "Heterosexual",
   

User Giannis posted: I think it's also worth noting that having a
basic understanding of coding can help data analysts to communicate
more effectively with other teams, such as developers and product
managers. I mean, when you can write your own code, you can create
interactive dashboards, create custom reports, and even automate
certain processes, which can save time and increase productivity.
Plus, being able to speak the language of the developers can help to
avoid misunderstandings and miscommunications. It's not about becoming
a master coder, but
2
User Giannis posted: Should data analysts be allowed to code?
3
User Giannis posted: I think it's a great idea for data analysts to
have some coding skills. In today's data-driven world, being able to
code can definitely give them a competitive edge. It shows that
they're not just limited to just analyzing data, but can also create
and implement their own solutions. Plus, with the rise of big data and
machine learning, having coding ski

2025-06-12 14:26:44 fedora experiments.py[36290] INFO Annotation saved to /tmp/tmpcf_bcrlh/25-06-12-14-26.json
2025-06-12 14:26:44 fedora logging_util.py[36290] INFO Procedure _run_single_annotation executed in 0.3307 minutes
2025-06-12 14:26:44 fedora experiments.py[36290] INFO Finished annotation generation.
2025-06-12 14:26:44 fedora logging_util.py[36290] INFO Procedure _run_all_annotations executed in 0.6644 minutes


User Giannis posted: I think Emma's point is well-taken. As a game
developer myself, I've seen how being able to code can give data
analysts a significant advantage in terms of being able to bring their
ideas to life. I mean, think about it - if a data analyst can write
their own code to create a data visualization or implement a new
analysis tool, that's a huge game-changer. And with the rise of cloud-
based data platforms and tools, it's getting easier and easier for
non-
3
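
The annotator is instructed to reply only with a number, but an LLM may not always comply. A small validation step before analysis can catch malformed replies. The sketch below is an assumption about how you might post-process annotations yourself; `parse_rating` is a hypothetical helper and not part of SynDisco.

```python
import re
from typing import Optional


def parse_rating(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first integer found in reply if it lies in [low, high], else None."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None  # the reply contained no number at all
    value = int(match.group())
    return value if low <= value <= high else None


for reply in ["3", " 1\n", "Toxicity: 4", "not sure", "7"]:
    print(repr(reply), "->", parse_rating(reply))
```

Replies that parse to `None` can then be dropped or re-annotated. Note that the helper takes the first integer it finds, so a reply that restates the scale before giving the score would need stricter handling.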


## Exporting your new dataset

As you have seen so far, SynDisco uses collections of JSON files by default for persistence. This is handy for fault tolerance and disk efficiency, but it is not as convenient to work with as a traditional CSV dataset.

Thankfully, SynDisco provides built-in functionality for converting the JSON files into a CSV file or a pandas DataFrame.

In [6]:
from syndisco import postprocessing


discussions_df = postprocessing.import_discussions(conv_dir=discussions_dir)
discussions_df

| | conv_id | timestamp | ctx_length | conv_variant | user | message | model | is_moderator | message_id | message_order | age | sex | sexual_orientation | demographic_group | current_employment | education_level | special_instructions | personality_characteristics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f27b70aa-49d4-4cda-9b01-60f7843dfcad | 25-06-12-14-26 | 3 | tmpyyw52zh1 | Emma35 | Should data analysts be allowed to code? | hardcoded | False | 146890006607894262 | 1 | 38 | female | Heterosexual | Latino | Registered Nurse | Bachelor's | | [compassionate, patient, diligent, overwhelmed] |
| 1 | f27b70aa-49d4-4cda-9b01-60f7843dfcad | 25-06-12-14-26 | 3 | tmpyyw52zh1 | Giannis | I think it's a given that data analysts should... | test_model | False | -2054082949427260225 | 2 | 21 | male | Pansexual | White | Game Developer | College | | [strategic, meticulous, nerdy, hyper-focused] |
| 2 | f27b70aa-49d4-4cda-9b01-60f7843dfcad | 25-06-12-14-26 | 3 | tmpyyw52zh1 | Emma35 | "I completely agree with Giannis on this. As a... | test_model | False | -2238463747465246388 | 3 | 38 | female | Heterosexual | Latino | Registered Nurse | Bachelor's | | [compassionate, patient, diligent, overwhelmed] |
| 3 | f27b70aa-49d4-4cda-9b01-60f7843dfcad | 25-06-12-14-26 | 3 | tmpyyw52zh1 | Giannis | I think it's also worth noting that having a b... | test_model | False | -664211033300711235 | 4 | 21 | male | Pansexual | White | Game Developer | College | | [strategic, meticulous, nerdy, hyper-focused] |
| 4 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-24 | 3 | tmpyyw52zh1 | Giannis | Should data analysts be allowed to code? | hardcoded | False | 1090355290340574919 | 1 | 21 | male | Pansexual | White | Game Developer | College | | [strategic, meticulous, nerdy, hyper-focused] |
| 5 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-24 | 3 | tmpyyw52zh1 | Giannis | I think it's a great idea for data analysts to... | test_model | False | -2089076997065357328 | 2 | 21 | male | Pansexual | White | Game Developer | College | | [strategic, meticulous, nerdy, hyper-focused] |
| 6 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-24 | 3 | tmpyyw52zh1 | Emma35 | I completely agree with Giannis on this one. A... | test_model | False | -2070620669065978783 | 3 | 38 | female | Heterosexual | Latino | Registered Nurse | Bachelor's | | [compassionate, patient, diligent, overwhelmed] |
| 7 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-24 | 3 | tmpyyw52zh1 | Giannis | I think Emma's point is well-taken. As a game ... | test_model | False | 1554491127663650128 | 4 | 21 | male | Pansexual | White | Game Developer | College | | [strategic, meticulous, nerdy, hyper-focused] |


In [7]:
annotations_df = postprocessing.import_annotations(annot_dir=annotations_dir)
annotations_df

| | conv_id | timestamp | annotator_model | ctx_length | annotator_prompt.context | annotator_prompt.instructions | annotator_prompt.type | annot_username | annot_age | annot_sex | ... | annot_demographic_group | annot_current_employment | annot_education_level | annot_special_instructions | annotation_variant | message | annotation | message_id | message_order | annot_personality_characteristics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-26 | test_model | 3 | You are annotating an online discussion | From a scale of 1 (not toxic) to 5 (very toxic... | 2 | annotator | 38 | female | ... | White | Annotator | Bachelor's | | tmpcf_bcrlh | I completely agree with Giannis on this one. A... | 3 | -2070620669065978783 | 3 | [[competent]] |
| 1 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-26 | test_model | 3 | You are annotating an online discussion | From a scale of 1 (not toxic) to 5 (very toxic... | 2 | annotator | 38 | female | ... | White | Annotator | Bachelor's | | tmpcf_bcrlh | I think Emma's point is well-taken. As a game ... | 3 | 1554491127663650128 | 4 | [[competent]] |
| 2 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-26 | test_model | 3 | You are annotating an online discussion | From a scale of 1 (not toxic) to 5 (very toxic... | 2 | annotator | 38 | female | ... | White | Annotator | Bachelor's | | tmpcf_bcrlh | I think it's a great idea for data analysts to... | 3 | -2089076997065357328 | 2 | [[competent]] |
| 3 | 4bf079d0-7a36-44e2-8db1-a6a2cb2d5cf8 | 25-06-12-14-26 | test_model | 3 | You are annotating an online discussion | From a scale of 1 (not toxic) to 5 (very toxic... | 2 | annotator | 38 | female | ... | White | Annotator | Bachelor's | | tmpcf_bcrlh | Should data analysts be allowed to code? | 3 | 1090355290340574919 | 1 | [[competent]] |
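
Since both results are ordinary pandas DataFrames, they can be joined and written to disk with standard pandas calls. The sketch below uses tiny stand-in frames so that it runs on its own; the column names mirror those shown in the outputs above, while the sample values, the join keys, and the `annotated_discussions.csv` file name are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in data: in practice, use the frames returned by
# postprocessing.import_discussions and postprocessing.import_annotations.
discussions_df = pd.DataFrame(
    {
        "conv_id": ["c1", "c1", "c2"],
        "message_id": [10, 11, 20],
        "user": ["Emma35", "Giannis", "Giannis"],
        "message": ["seed", "reply", "seed"],
    }
)
annotations_df = pd.DataFrame(
    {
        "conv_id": ["c1", "c1", "c2"],
        "message_id": [10, 11, 20],
        "annot_username": ["annotator"] * 3,
        "annotation": ["3", "1", "2"],
    }
)

# LLM replies may be stored as text; non-numeric replies become NaN.
annotations_df["annotation"] = pd.to_numeric(
    annotations_df["annotation"], errors="coerce"
)

# Attach each annotation to the message it refers to.
merged_df = discussions_df.merge(
    annotations_df[["conv_id", "message_id", "annot_username", "annotation"]],
    on=["conv_id", "message_id"],
    how="left",
)

# index=False keeps the DataFrame's row index out of the CSV file.
export_dir = Path(tempfile.mkdtemp())
csv_path = export_dir / "annotated_discussions.csv"
merged_df.to_csv(csv_path, index=False)

# Example analysis: mean annotation score per discussion.
mean_per_conv = merged_df.groupby("conv_id")["annotation"].mean()
print(mean_per_conv)
```

A left join keeps every message even when an annotation is missing, which makes unannotated comments easy to spot as `NaN` values.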
