# Creating and Managing Experiments

The previous two guides showed how to create and run synthetic discussions and synthetic annotations using LLMs. However, to obtain robust results for a hypothesis, you will usually need to produce multiple annotated discussions.

While this is possible using the `Discussion` and `Annotation` APIs directly, SynDisco offers the high-level `Experiment` API, which automatically creates and manages multiple discussions with different configurations. An `Experiment` is an entity that generates and runs jobs. For example, to generate and run 100 `Discussion` jobs, we would use a `DiscussionExperiment`. Likewise, to annotate those 100 discussions, we would use an `AnnotationExperiment`.

This guide shows how to use this API to automate your experiments. You will also learn how to use SynDisco's built-in logging functions and how to export your datasets in CSV format for convenience.
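To make the idea of an experiment that "generates and runs jobs" more concrete, here is a minimal toy sketch. It is not SynDisco's implementation: `ToyJob` and `ToyExperiment` are hypothetical names invented for illustration, and the random selection of a seed opinion and first speaker is only an assumption about what a per-job configuration might look like.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyJob:
    """One unit of work: a single (hypothetical) discussion configuration."""

    seed_opinion: str
    first_speaker: str

    def run(self) -> str:
        # A real job would call an LLM here; this only formats a string.
        return f"{self.first_speaker} opens with: {self.seed_opinion}"


class ToyExperiment:
    """Generates `num_jobs` randomized jobs and runs them one by one."""

    def __init__(self, seed_opinions, users, num_jobs, seed=0):
        self.seed_opinions = seed_opinions
        self.users = users
        self.num_jobs = num_jobs
        self.rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    def generate_jobs(self):
        # Each job gets its own randomly drawn configuration.
        return [
            ToyJob(
                seed_opinion=self.rng.choice(self.seed_opinions),
                first_speaker=self.rng.choice(self.users),
            )
            for _ in range(self.num_jobs)
        ]

    def begin(self):
        return [job.run() for job in self.generate_jobs()]


experiment = ToyExperiment(
    seed_opinions=["Should data analysts be allowed to code?"],
    users=["Emma35", "Giannis"],
    num_jobs=3,
)
results = experiment.begin()
for line in results:
    print(line)
```

The point of the design is that the caller describes the pool of options once, and the experiment object takes care of producing and running the individual jobs.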

## Logging

A single discussion or annotation job may take a few minutes, but an experiment composed of dozens or hundreds of synthetic discussions can take days. We therefore need a mechanism to keep track of our experiments while they are running.

We will use SynDisco's `logging_util` module to log information about our experiments. This module performs the following functions:

* Times the execution of computationally intensive jobs (such as synthetic discussions and annotations)
* Provides details about the currently running jobs (e.g. selected configurations, participants, prompts etc.)
* Displays warnings and errors to the user
* Creates and continually updates log files
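The timing behaviour in the list above can be illustrated with a small decorator built on Python's standard `logging` and `time` modules. This is only a sketch of the general technique, not the code inside `logging_util`; the names `timed` and `run_job` are hypothetical, and the message format merely imitates the log lines shown later in this guide.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("timing_sketch")


def timed(func):
    """Log how long the wrapped function takes, in minutes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # The `finally` clause logs the duration even if the job raises.
            minutes = (time.perf_counter() - start) / 60
            logger.info(
                "Procedure %s executed in %.4f minutes", func.__name__, minutes
            )

    return wrapper


@timed
def run_job(n):
    # Stand-in for an expensive job such as a synthetic discussion.
    return sum(range(n))


total = run_job(1_000)
```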

Each object in SynDisco is internally assigned a Logger. You can use the `logging_util.logging_setup` function to update all of the internal loggers to follow your configuration. An example of this can be seen below:

In [1]:
%load_ext autoreload
%autoreload 2
from syndisco.util import logging_util
from pathlib import Path
import tempfile


logs_dir = tempfile.TemporaryDirectory()
logging_util.logging_setup(
    print_to_terminal=True,
    write_to_file=True,
    logs_dir=Path(logs_dir.name),
    level="debug",
    use_colors=True,
    log_warnings=True,
)

The loggers apply to all objects in SynDisco, so they report information on `Discussion` and `Annotation` jobs as well as on all low-level components (such as those in the `backend` module).

It is recommended to set up the loggers *no matter your use case*. At the very least, they are useful for clearly displaying warnings in case of accidental API misuse.
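For readers unfamiliar with how one configuration call can affect many loggers, the following sketch shows the underlying standard-library mechanism: handlers attached to a parent logger also receive records from its children. It assumes nothing about SynDisco's internals; `setup_logging`, the `sketch` logger name, and the `run.log` file name are all hypothetical.

```python
import logging
import tempfile
from pathlib import Path


def setup_logging(logs_dir: Path, level: str = "debug") -> Path:
    """Attach a terminal handler and a file handler to a parent logger."""
    logs_dir.mkdir(parents=True, exist_ok=True)
    log_file = logs_dir / "run.log"
    formatter = logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"
    )

    parent = logging.getLogger("sketch")
    parent.setLevel(level.upper())
    handlers = (
        logging.StreamHandler(),  # prints to the terminal (stderr)
        logging.FileHandler(log_file, encoding="utf-8"),  # appends to a file
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        parent.addHandler(handler)
    return log_file


log_file = setup_logging(Path(tempfile.mkdtemp()))

# A child logger needs no setup of its own: its records propagate upwards.
logging.getLogger("sketch.experiments").info("Running experiment 1/2...")
```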

## Discussion Experiments

In [2]:
from syndisco.backend.turn_manager import RoundRobin
from syndisco.backend.actors import LLMActor, ActorType
from syndisco.backend.model import TransformersModel
from syndisco.backend.persona import LLMPersona


CONTEXT = "You are taking part in an online conversation"
INSTRUCTIONS = "Act like a human would"


llm = TransformersModel(
    model_path="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    name="test_model",
    max_out_tokens=100,
)
persona_data = [
    {
        "username": "Emma35",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "Latino",
        "current_employment": "Registered Nurse",
        "special_instructions": "",
        "personality_characteristics": [
            "compassionate",
            "patient",
            "diligent",
            "overwhelmed",
        ],
    },
    {
        "username": "Giannis",
        "age": 21,
        "sex": "male",
        "education_level": "College",
        "sexual_orientation": "Pansexual",
        "demographic_group": "White",
        "current_employment": "Game Developer",
        "special_instructions": "",
        "personality_characteristics": [
            "strategic",
            "meticulous",
            "nerdy",
            "hyper-focused",
        ],
    },
]
personas = [LLMPersona(**data) for data in persona_data]
actors = [
    LLMActor(
        model=llm,
        persona=p,
        context=CONTEXT,
        instructions=INSTRUCTIONS,
        actor_type=ActorType.USER,
    )
    for p in personas
]
turn_manager = RoundRobin([actor.get_name() for actor in actors])


2025-06-12 13:18:42 fedora urllib3.connectionpool[29521] DEBUG Starting new HTTPS connection (1): huggingface.co:443
2025-06-12 13:18:43 fedora urllib3.connectionpool[29521] DEBUG https://huggingface.co:443 "HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/config.json HTTP/1.1" 200 0
2025-06-12 13:18:44 fedora bitsandbytes.cextension[29521] DEBUG Loading bitsandbytes native library from: /home/dimits/miniconda3/envs/syndiscooooo/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_cuda126.so
2025-06-12 13:18:45 fedora accelerate.utils.modeling[29521] INFO We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2025-06-12 13:18:46 fedora urllib3.connectionpool[29521] DEBUG https://huggingface.co:443 "HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/generation_config.json HTTP/1.1" 200 0
2025-06-12 13:18:46 fedora urllib3.connectionpoo

In [3]:
from syndisco.experiments import DiscussionExperiment


disc_exp = DiscussionExperiment(
    seed_opinions=[
        "Should programmers be allowed to analyze data?",
        "Should data analysts be allowed to code?",
    ],
    users=actors,
    moderator=None,
    num_turns=3,
    num_discussions=2,
)
discussions_dir = Path(tempfile.TemporaryDirectory().name)
disc_exp.begin(discussions_output_dir=discussions_dir)

2025-06-12 13:18:48 fedora root[29521] INFO Running experiment 1/3...
2025-06-12 13:18:48 fedora experiments.py[29521] INFO Beginning conversation...
2025-06-12 13:18:48 fedora experiments.py[29521] DEBUG Experiment parameters: {
    "id": "e6e3c80f-8b86-474a-ab0c-0e8a56debbde",
    "timestamp": "25-06-12-13-18",
    "users": [
        "Emma35",
        "Giannis"
    ],
    "moderator": null,
    "user_prompts": [
        {
            "context": "You are taking part in an online conversation",
            "instructions": "Act like a human would",
            "type": "1",
            "persona": {
                "username": "Emma35",
                "age": 38,
                "sex": "female",
                "sexual_orientation": "Heterosexual",
                "demographic_group": "Latino",
                "current_employment": "Registered Nurse",
                "education_level": "Bachelor's",
                "special_instructions": "",
                "personality_characteristics":

User Giannis posted:
Should data analysts be allowed to code? 

User Emma35 posted:
As a registered nurse, I've seen my fair share of data-driven
decisions, and I think data analysts should absolutely be allowed to
code. In fact, I believe their coding skills are essential to their
job. They need to be able to extract insights from data, visualize it,
and communicate those insights to stakeholders. Coding is a crucial
part of that process. Plus, it's not like they're expected to be
master programmers or anything, just basic coding skills to get the
job done. 

User Giannis posted:
"I completely agree with Emma35. As a game developer, I can attest to
the fact that coding is a fundamental skill for data analysts. In our
line of work, we often rely on data to inform our design decisions,
and having the ability to code allows data analysts to extract
insights and visualize data in a way that's meaningful to our team.
It's not about being a master programmer, but rather about being able
to 

2025-06-12 13:20:23 fedora root[29521] DEBUG Finished discussion in 94.97411012649536 seconds.
2025-06-12 13:20:23 fedora experiments.py[29521] INFO Conversation saved to /tmp/tmpdkcjw3m1/25-06-12-13-20.json
2025-06-12 13:20:23 fedora logging_util.py[29521] INFO Procedure _run_single_discussion executed in 1.5830 minutes
2025-06-12 13:20:23 fedora root[29521] INFO Running experiment 2/3...
2025-06-12 13:20:23 fedora experiments.py[29521] INFO Beginning conversation...
2025-06-12 13:20:23 fedora experiments.py[29521] DEBUG Experiment parameters: {
    "id": "0ebdd79a-bb19-47ac-8a6d-115df016fc0c",
    "timestamp": "25-06-12-13-20",
    "users": [
        "Emma35",
        "Giannis"
    ],
    "moderator": null,
    "user_prompts": [
        {
            "context": "You are taking part in an online conversation",
            "instructions": "Act like a human would",
            "type": "1",
            "persona": {
                "username": "Emma35",
                "age": 38,
        

User Emma35 posted:
I completely agree with both of you. As a nurse, I've worked with data
analysts who are not only skilled in extracting insights from data but
also in communicating those findings to patients, families, and
healthcare teams. Coding is indeed a vital skill for data analysts,
and it's not about being a master programmer, but rather about being
able to tell a story with the data.  In my experience, data analysts
are often the bridge between the business side of healthcare and the
technical side, and coding is 

User Giannis posted:
Should programmers be allowed to analyze data? 

User Giannis posted:
I think that's a given, to be honest. I mean, programming is all about
working with data, right? Whether it's analyzing it, optimizing it, or
using it to make predictions, data analysis is a crucial part of our
job as programmers. In fact, I'd say it's a fundamental skill that
every programmer should have. Of course, there are some nuances to
consider, like data quality and

2025-06-12 13:21:57 fedora root[29521] DEBUG Finished discussion in 94.43419885635376 seconds.
2025-06-12 13:21:57 fedora experiments.py[29521] INFO Conversation saved to /tmp/tmpdkcjw3m1/25-06-12-13-21.json
2025-06-12 13:21:57 fedora logging_util.py[29521] INFO Procedure _run_single_discussion executed in 1.5740 minutes
2025-06-12 13:21:57 fedora experiments.py[29521] INFO Finished synthetic discussion generation.
2025-06-12 13:21:57 fedora logging_util.py[29521] INFO Procedure _run_all_discussions executed in 3.1570 minutes


User Giannis posted:
User Giannis posted: I'm glad you agree, Emma! As a game developer, I
can attest that data analysis is crucial for us as well. In fact, we
use data analysis to optimize game performance, identify bugs, and
even make informed decisions about game design. It's amazing how much
of an impact data can have on the final product. And I completely
agree with you about the importance of understanding context - it's
not just about throwing numbers at a problem, but about being able to 



## Annotation Experiments

In [4]:
annotator_persona = LLMPersona(
    **{
        "username": "annotator",
        "age": 38,
        "sex": "female",
        "education_level": "Bachelor's",
        "sexual_orientation": "Heterosexual",
        "demographic_group": "White",
        "current_employment": "Annotator",
        "special_instructions": "",
        "personality_characteristics": ["competent"],
    }
)

annotator = LLMActor(
    model=llm,
    persona=annotator_persona,
    context="You are annotating an online discussion",
    instructions="From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
    actor_type=ActorType.ANNOTATOR,
)

In [5]:
from syndisco.experiments import AnnotationExperiment

ann_exp = AnnotationExperiment(annotators=[annotator])
annotations_dir = Path(tempfile.TemporaryDirectory().name)
ann_exp.begin(discussions_dir=discussions_dir, output_dir=annotations_dir)

2025-06-12 13:21:57 fedora experiments.py[29521] INFO Running annotation 1/2...
2025-06-12 13:21:57 fedora experiments.py[29521] INFO Beginning annotation...
2025-06-12 13:21:57 fedora experiments.py[29521] DEBUG Experiment parameters: {
    "conv_id": "0ebdd79a-bb19-47ac-8a6d-115df016fc0c",
    "timestamp": "25-06-12-13-21",
    "annotator_model": "test_model",
    "annotator_prompt": {
        "context": "You are annotating an online discussion",
        "instructions": "From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
        "type": "2",
        "persona": {
            "username": "annotator",
            "age": 38,
            "sex": "female",
            "sexual_orientation": "Heterosexual",
            "demographic_group": "White",
            "current_employment": "Annotator",
            "education_level": "Bachelor's",
            "special_instructions": "",
            "personality_characteristics": [
          

User Giannis posted: Should programmers be allowed to analyze data?
2
User Giannis posted: I think that's a given, to be honest. I mean,
programming is all about working with data, right? Whether it's
analyzing it, optimizing it, or using it to make predictions, data
analysis is a crucial part of our job as programmers. In fact, I'd say
it's a fundamental skill that every programmer should have. Of course,
there are some nuances to consider, like data quality and
interpretation, but overall, I think it's a fundamental aspect of our
profession. What
1
User Emma35 posted: User Emma35 posted: I completely agree with you,
Giannis! As a registered nurse, I've seen firsthand the importance of
data analysis in healthcare. Being able to collect and interpret data
can make all the difference in patient care. It's not just about
analyzing numbers, though - it's also about understanding the context
and applying that knowledge to make informed decisions. I've worked
with patients who have had thei

2025-06-12 13:22:16 fedora experiments.py[29521] INFO Annotation saved to /tmp/tmp7wh3w_0f/25-06-12-13-22.json
2025-06-12 13:22:16 fedora logging_util.py[29521] INFO Procedure _run_single_annotation executed in 0.3183 minutes
2025-06-12 13:22:16 fedora experiments.py[29521] INFO Running annotation 2/2...
2025-06-12 13:22:16 fedora experiments.py[29521] INFO Beginning annotation...
2025-06-12 13:22:16 fedora experiments.py[29521] DEBUG Experiment parameters: {
    "conv_id": "e6e3c80f-8b86-474a-ab0c-0e8a56debbde",
    "timestamp": "25-06-12-13-22",
    "annotator_model": "test_model",
    "annotator_prompt": {
        "context": "You are annotating an online discussion",
        "instructions": "From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.",
        "type": "2",
        "persona": {
            "username": "annotator",
            "age": 38,
            "sex": "female",
            "sexual_orientation": "Heterosexual",
   

User Giannis posted: User Giannis posted: I'm glad you agree, Emma! As
a game developer, I can attest that data analysis is crucial for us as
well. In fact, we use data analysis to optimize game performance,
identify bugs, and even make informed decisions about game design.
It's amazing how much of an impact data can have on the final product.
And I completely agree with you about the importance of understanding
context - it's not just about throwing numbers at a problem, but about
being able to
1
User Giannis posted: Should data analysts be allowed to code?
3
User Emma35 posted: As a registered nurse, I've seen my fair share of
data-driven decisions, and I think data analysts should absolutely be
allowed to code. In fact, I believe their coding skills are essential
to their job. They need to be able to extract insights from data,
visualize it, and communicate those insights to stakeholders. Coding
is a crucial part of that process. Plus, it's not like they're
expected to be master pro

2025-06-12 13:22:35 fedora experiments.py[29521] INFO Annotation saved to /tmp/tmp7wh3w_0f/25-06-12-13-22.json
2025-06-12 13:22:35 fedora logging_util.py[29521] INFO Procedure _run_single_annotation executed in 0.3171 minutes
2025-06-12 13:22:35 fedora experiments.py[29521] INFO Finished annotation generation.
2025-06-12 13:22:35 fedora logging_util.py[29521] INFO Procedure _run_all_annotations executed in 0.6355 minutes


User Emma35 posted: I completely agree with both of you. As a nurse,
I've worked with data analysts who are not only skilled in extracting
insights from data but also in communicating those findings to
patients, families, and healthcare teams. Coding is indeed a vital
skill for data analysts, and it's not about being a master programmer,
but rather about being able to tell a story with the data.  In my
experience, data analysts are often the bridge between the business
side of healthcare and the technical side, and coding is
3


## Exporting your new dataset

As you have seen so far, SynDisco uses collections of JSON files by default for persistence. This is handy for fault tolerance and disk efficiency, but a collection of JSON files is more unwieldy than a traditional CSV dataset.

Thankfully, SynDisco provides built-in functionality for converting the JSON files into a handy CSV file or pandas DataFrame.
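To show what such a conversion involves, here is a standard-library sketch that flattens a directory of JSON files into one CSV row per message. The JSON layout used here (an `id` field plus a `messages` list with `user` and `message` keys) is a simplified, hypothetical stand-in and not SynDisco's actual file schema; `json_dir_to_rows` and `write_csv` are likewise illustrative names.

```python
import csv
import json
import tempfile
from pathlib import Path

FIELDS = ["conv_id", "user", "message", "message_order"]


def json_dir_to_rows(json_dir: Path) -> list[dict]:
    """Flatten every JSON file in `json_dir` into one row per message."""
    rows = []
    for path in sorted(json_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            discussion = json.load(f)
        for order, entry in enumerate(discussion["messages"], start=1):
            rows.append(
                {
                    "conv_id": discussion["id"],
                    "user": entry["user"],
                    "message": entry["message"],
                    "message_order": order,
                }
            )
    return rows


def write_csv(rows: list[dict], csv_path: Path) -> None:
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Demo on a throwaway directory holding one hypothetical discussion file.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    example = {
        "id": "demo-conversation",
        "messages": [
            {"user": "Giannis", "message": "Should data analysts be allowed to code?"},
            {"user": "Emma35", "message": "I think they should."},
        ],
    }
    (tmp_dir / "example.json").write_text(json.dumps(example), encoding="utf-8")

    rows = json_dir_to_rows(tmp_dir)
    write_csv(rows, tmp_dir / "dataset.csv")
    csv_text = (tmp_dir / "dataset.csv").read_text(encoding="utf-8")

print(csv_text)
```

If you already hold a pandas DataFrame, such as the one returned by `postprocessing.import_discussions`, the standard `DataFrame.to_csv` method writes it to disk directly.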

In [59]:
from syndisco import postprocessing


discussions_df = postprocessing.import_discussions(conv_dir=discussions_dir)
discussions_df

  username  age   sex sexual_orientation demographic_group current_employment  \
0  Giannis   21  male          Pansexual             White     Game Developer   

  education_level special_instructions  \
0         College                        

                     personality_characteristics  
0  [strategic, meticulous, nerdy, hyper-focused]  
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,conv_id,timestamp,ctx_length,conv_variant,user,message,model,user_prompt,is_moderator,message_id,message_order,age,sex,sexual_orientation,demographic_group,current_employment,education_level,special_instructions,personality_characteristics
0,0ebdd79a-bb19-47ac-8a6d-115df016fc0c,25-06-12-13-21,3,tmpdkcjw3m1,Giannis,Should programmers be allowed to analyze data?,hardcoded,"{'username': 'Giannis', 'age': 21, 'sex': 'mal...",False,-1308287053624454923,1,21,male,Pansexual,White,Game Developer,College,,"[strategic, meticulous, nerdy, hyper-focused]"
1,0ebdd79a-bb19-47ac-8a6d-115df016fc0c,25-06-12-13-21,3,tmpdkcjw3m1,Giannis,"I think that's a given, to be honest. I mean, ...",test_model,"{'username': 'Giannis', 'age': 21, 'sex': 'mal...",False,139827118185011651,2,21,male,Pansexual,White,Game Developer,College,,"[strategic, meticulous, nerdy, hyper-focused]"
2,0ebdd79a-bb19-47ac-8a6d-115df016fc0c,25-06-12-13-21,3,tmpdkcjw3m1,Emma35,User Emma35 posted:\nI completely agree with y...,test_model,"{'username': 'Emma35', 'age': 38, 'sex': 'fema...",False,-264321810044640737,3,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
3,0ebdd79a-bb19-47ac-8a6d-115df016fc0c,25-06-12-13-21,3,tmpdkcjw3m1,Giannis,"User Giannis posted:\nI'm glad you agree, Emma...",test_model,"{'username': 'Giannis', 'age': 21, 'sex': 'mal...",False,673296772081986378,4,21,male,Pansexual,White,Game Developer,College,,"[strategic, meticulous, nerdy, hyper-focused]"
4,e6e3c80f-8b86-474a-ab0c-0e8a56debbde,25-06-12-13-20,3,tmpdkcjw3m1,Giannis,Should data analysts be allowed to code?,hardcoded,"{'username': 'Giannis', 'age': 21, 'sex': 'mal...",False,1905354079657901353,1,21,male,Pansexual,White,Game Developer,College,,"[strategic, meticulous, nerdy, hyper-focused]"
5,e6e3c80f-8b86-474a-ab0c-0e8a56debbde,25-06-12-13-20,3,tmpdkcjw3m1,Emma35,"As a registered nurse, I've seen my fair share...",test_model,"{'username': 'Emma35', 'age': 38, 'sex': 'fema...",False,535227657842999461,2,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"
6,e6e3c80f-8b86-474a-ab0c-0e8a56debbde,25-06-12-13-20,3,tmpdkcjw3m1,Giannis,"""I completely agree with Emma35. As a game dev...",test_model,"{'username': 'Giannis', 'age': 21, 'sex': 'mal...",False,35218592902583573,3,21,male,Pansexual,White,Game Developer,College,,"[strategic, meticulous, nerdy, hyper-focused]"
7,e6e3c80f-8b86-474a-ab0c-0e8a56debbde,25-06-12-13-20,3,tmpdkcjw3m1,Emma35,I completely agree with both of you. As a nurs...,test_model,"{'username': 'Emma35', 'age': 38, 'sex': 'fema...",False,1966608590047943203,4,38,female,Heterosexual,Latino,Registered Nurse,Bachelor's,,"[compassionate, patient, diligent, overwhelmed]"


In [28]:
print(discussions_df.user_prompt.iloc[0][0])


{'username': 'Giannis', 'age': 21, 'sex': 'male', 'sexual_orientation': 'Pansexual', 'demographic_group': 'White', 'current_employment': 'Game Developer', 'education_level': 'College', 'special_instructions': '', 'personality_characteristics': ['strategic', 'meticulous', 'nerdy', 'hyper-focused']}


In [20]:
annotations_df = postprocessing.import_annotations(annot_dir=annotations_dir)
annotations_df

AttributeError: 'DataFrame' object has no attribute 'user_prompt'