# Customizing Text Generation in TwinWeaver

This tutorial demonstrates how to customize **every textual component** of the instruction generation pipeline. TwinWeaver provides extensive configuration options to tailor the generated text prompts to your specific use case, language preferences, or model requirements.

We will cover:
1. **Preamble & Introduction Text** - Customizing the opening text of patient records
2. **Demographics Section** - Modifying how constant/static data is introduced
3. **Event Day Formatting** - Changing how visit days and time intervals are described
4. **Time Units** - Switching between days and weeks
5. **Genetic Data Formatting** - Customizing genetic event tags and text
6. **Forecasting Prompts** - Modifying value prediction task descriptions
7. **Time-to-Event Prompts** - Customizing survival/event prediction text
8. **QA/Binning Prompts** - Changing quality assurance task descriptions
9. **Multi-Task Prompts** - Customizing multi-task instruction formatting
10. **Event Category Overrides** - Fine-grained control over specific event types
11. **Additional Formatting Options** - Setting decimal precision and first-visit retention

In [None]:
import pandas as pd

from twinweaver import (
    DataManager,
    Config,
    DataSplitterForecasting,
    DataSplitterEvents,
    ConverterInstruction,
    DataSplitter,
)

## Load Example Data

First, let's load the example data to use throughout this tutorial.

In [None]:
# Load data - generated example data
df_events = pd.read_csv("../../example_data/events.csv")
df_constant = pd.read_csv("../../example_data/constant.csv")
df_constant_description = pd.read_csv("../../example_data/constant_description.csv")

## Part 1: Default Configuration

Let's first see the **default** text generation to understand what we're customizing. We'll set up a minimal config and generate an example.

In [None]:
# Create default config
config_default = Config()

# Required settings for instruction mode
config_default.split_event_category = "lot"
config_default.event_category_forecast = ["lab"]
config_default.data_splitter_events_variables_category_mapping = {
    "death": "death",
    "progression": "next progression",
}
config_default.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config_default.constant_birthdate_column = "birthyear"

In [None]:
# Setup data manager with default config
dm_default = DataManager(config=config_default)
dm_default.load_indication_data(
    df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description
)
dm_default.process_indication_data()
dm_default.setup_unique_mapping_of_events()
dm_default.setup_dataset_splits()
dm_default.infer_var_types()

In [None]:
# Setup splitters and converter
data_splitter_events_default = DataSplitterEvents(dm_default, config=config_default)
data_splitter_events_default.setup_variables()

data_splitter_forecasting_default = DataSplitterForecasting(data_manager=dm_default, config=config_default)
data_splitter_forecasting_default.setup_statistics()

data_splitter_default = DataSplitter(data_splitter_events_default, data_splitter_forecasting_default)

converter_default = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config_default,
    dm=dm_default,
    variable_stats=data_splitter_forecasting_default.variable_stats,
)

In [None]:
# Generate example with default text
patientid = dm_default.all_patientids[4]
patient_data_default = dm_default.get_patient_data(patientid)

forecasting_splits, events_splits, reference_dates = data_splitter_default.get_splits_from_patient_with_target(
    patient_data_default,
)

p_converted_default = converter_default.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
    override_mode_to_select_forecasting="both",
)

print("=" * 80)
print("DEFAULT INSTRUCTION OUTPUT:")
print("=" * 80)
print(p_converted_default["instruction"])

---

## Part 2: Fully Customized Text Generation

Now let's create a **completely customized** configuration, changing every textual element. This demonstrates all the available customization options.

### 2.1 Preamble and Introduction Text

The `preamble_text` is the very first text that appears in the generated instruction, introducing the patient record format.

In [None]:
# Create custom config
config_custom = Config()

# Required settings
config_custom.split_event_category = "lot"
config_custom.event_category_forecast = ["lab"]
config_custom.data_splitter_events_variables_category_mapping = {
    "death": "mortality",  # Custom name for death event
    "progression": "disease progression",  # Custom name for progression
}
config_custom.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config_custom.constant_birthdate_column = "birthyear"

# ============================================================================
# CUSTOMIZING PREAMBLE TEXT
# ============================================================================
# This is the opening text that introduces the patient record
config_custom.preamble_text = (
    "📋 ELECTRONIC HEALTH RECORD SUMMARY\n"
    "This document contains a chronological summary of a patient's medical journey. "
    "The record begins with baseline demographics, followed by a timeline of clinical encounters. "
    "Laboratory values use standardized LOINC coding."
)

### 2.2 Demographics Section Text

The `constant_text` introduces the static/demographic data section.

In [None]:
# ============================================================================
# CUSTOMIZING DEMOGRAPHICS SECTION
# ============================================================================
# Text that introduces the demographics/constant data
config_custom.constant_text = "\n\n👤 PATIENT DEMOGRAPHICS:\n"

### 2.3 Event Day Formatting

These settings control how clinical visits and time intervals are described.

In [None]:
# ============================================================================
# CUSTOMIZING EVENT DAY TEXT
# ============================================================================

# Text for the first visit/encounter
config_custom.first_day_text = "\n🏥 INITIAL ENCOUNTER:\nDuring the baseline visit, the following was documented:\n"

# Preamble before each subsequent visit (appears before the time delta)
config_custom.event_day_preamble = "\n📅 "

# Text describing time elapsed since previous visit
# Note: set_delta_time_unit() fills a {unit} placeholder; alternatively, write the unit
# out directly as done here (keep it consistent with delta_time_unit)
config_custom.event_day_text = " weeks after the previous encounter, a follow-up visit recorded:\n"

# Text appended after listing events for a day
config_custom.post_event_text = ".\n"
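
To make the roles of these four settings concrete, the sketch below strings them together in the order the comments above describe: preamble, time delta, `event_day_text`, the day's events, then `post_event_text`. The `render_followup_visit` helper, the `", "` separator, and the sample events are hypothetical stand-ins for illustration; TwinWeaver's actual assembly may differ in its details.

```python
def render_followup_visit(delta, events, event_day_preamble, event_day_text, post_event_text):
    """Assemble one follow-up visit block (illustrative; not TwinWeaver's implementation)."""
    # Assumption: the events of one day are joined with ", "
    body = ", ".join(events)
    return f"{event_day_preamble}{delta}{event_day_text}{body}{post_event_text}"


visit_text = render_followup_visit(
    delta=3,
    events=["hemoglobin is 11.2", "platelets is 210.0"],  # made-up sample events
    event_day_preamble="\n",
    event_day_text=" weeks after the previous encounter, a follow-up visit recorded:\n",
    post_event_text=".\n",
)
print(visit_text)
```

Seeing the pieces side by side also shows why `event_day_text` starts with a space: it directly follows the number.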

### 2.4 Time Units

You can switch between `days` and `weeks` for time intervals. Use `set_delta_time_unit()` to update all related prompts automatically.

In [None]:
# ============================================================================
# CUSTOMIZING TIME UNITS
# ============================================================================
# Option 1: Use the helper method (updates all time-related prompts)
# config_custom.set_delta_time_unit("days", unit_sing="day")

# Option 2: Set directly (if you want different phrasing)
config_custom.delta_time_unit = "weeks"

# The time unit appears in several prompts - you can customize each:
config_custom.forecasting_prompt_var_time = " over the upcoming weeks "
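
When you set the unit directly (Option 2), every hand-written prompt has to mention the same unit. A small consistency check can catch a leftover "days" in a "weeks" configuration. The helper `find_unit_mismatches` below is a hypothetical utility, not part of TwinWeaver, and it runs on a stand-in object so the cell is self-contained.

```python
import re
from types import SimpleNamespace


def find_unit_mismatches(config, unit, fields):
    """Return the fields whose text mentions a time unit other than `unit`."""
    known_units = {"day", "days", "week", "weeks"}
    # Accept both the plural unit and its singular form (e.g. "weeks" and "week")
    allowed = {unit, unit.rstrip("s")}
    mismatches = {}
    for name in fields:
        text = getattr(config, name)
        found = set(re.findall(r"[a-z]+", text.lower())) & known_units
        wrong = sorted(found - allowed)
        if wrong:
            mismatches[name] = wrong
    return mismatches


# Stand-in object; with TwinWeaver you would pass your Config instance instead
cfg = SimpleNamespace(
    delta_time_unit="weeks",
    event_day_text=" weeks after the previous encounter, a follow-up visit recorded:\n",
    forecasting_prompt_var_time=" over the upcoming days ",  # deliberately inconsistent
)
mismatches = find_unit_mismatches(
    cfg, cfg.delta_time_unit, ["event_day_text", "forecasting_prompt_var_time"]
)
print(mismatches)
```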

### 2.5 Genetic Data Formatting

Control how genetic/molecular data is tagged and displayed.

In [None]:
# ============================================================================
# CUSTOMIZING GENETIC DATA FORMATTING
# ============================================================================

# Tags used to wrap genetic information in the text
config_custom.genetic_tag_opening = "[MOLECULAR: "
config_custom.genetic_tag_closing = "]"

# Text shown when no genetic data is available
config_custom.genetic_empty_text = "🧬 No molecular/genetic testing data available."

# Value to skip when converting genetic events (often 'present' is implied)
config_custom.genetic_skip_text_value = "present"
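
As a rough mental model of how these four settings interact, the sketch below wraps a list of genetic findings in the opening and closing tags, omits a value equal to the skip value, and falls back to the empty text when there is nothing to show. The `render_genetic` function, the way findings are joined, and the sample findings are all assumptions made for illustration; the library's real rendering may wrap or order things differently.

```python
def render_genetic(findings, tag_opening, tag_closing, empty_text, skip_value):
    """Render (name, value) genetic findings as one tagged string (illustrative only)."""
    if not findings:
        return empty_text
    parts = []
    for name, value in findings:
        # Omit the value when it equals skip_value, e.g. an implied "present"
        parts.append(name if value == skip_value else f"{name} {value}")
    return f"{tag_opening}{', '.join(parts)}{tag_closing}"


settings = {
    "tag_opening": "[MOLECULAR: ",
    "tag_closing": "]",
    "empty_text": "No molecular/genetic testing data available.",
    "skip_value": "present",
}
# Made-up sample findings
tagged = render_genetic([("EGFR mutation", "present"), ("PD-L1 expression", "high")], **settings)
empty = render_genetic([], **settings)
print(tagged)
print(empty)
```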

### 2.6 Forecasting Prompts (Value Prediction)

These settings control the task prompts for predicting future values.

In [None]:
# ============================================================================
# CUSTOMIZING FORECASTING PROMPTS
# ============================================================================

# Main forecasting task prompt
config_custom.forecasting_fval_prompt_start = (
    "\n🔮 PREDICTION TASK - LABORATORY VALUES:\n"
    "Based on the patient history above, predict the expected values for the following "
    "laboratory parameters at each future time point:\n"
)

# Summary section introducing last known values
config_custom.forecasting_prompt_summarized_start = "\n📊 REFERENCE VALUES (most recent measurements):\n"

# Text used when first day is overridden/truncated
config_custom.forecasting_firstday_override = (
    "\n⚠️ Note: Some early events may have been omitted due to context limits. Available history begins with:\n"
)

# Summary of last observed genetic events
config_custom.forecasting_prompt_summarized_genetic = "\n\n🧬 MOLECULAR STATUS (last observed):\n"

# Summary of most recent treatment line
config_custom.forecasting_prompt_summarized_lot = "\n💊 CURRENT TREATMENT REGIMEN:\n"

### 2.7 Time-to-Event Prompts (Survival Analysis)

These settings control the task prompts for predicting whether events occur within a time horizon.

In [None]:
# ============================================================================
# CUSTOMIZING TIME-TO-EVENT PROMPTS
# ============================================================================

# Start of TTE prediction prompt
config_custom.forecasting_tte_prompt_start = (
    "\n⏱️ OUTCOME PREDICTION TASK:\nDetermine whether follow-up data was censored (incomplete) "
)

# Middle section specifying time horizon
config_custom.forecasting_tte_prompt_mid = " weeks from the last documented visit, and whether the event occurred: "

# End section with output format instructions
config_custom.forecasting_tte_prompt_end = (
    ".\n📝 Format your response as: 'PREDICTION: [event_name] - Censored: [YES/NO], Occurred: [YES/NO]'"
)

# Target/answer formatting
config_custom.target_prompt_start = "\nPREDICTION: {event_name} - "
config_custom.target_prompt_censor_true = "Censored: YES."
config_custom.target_prompt_censor_false = "Censored: NO, "
config_custom.target_prompt_before_occur = ""
config_custom.target_prompt_occur = "Occurred: YES."
config_custom.target_prompt_not_occur = "Occurred: NO."
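
The six `target_prompt_*` pieces only make sense once they are concatenated, so it helps to preview the resulting answer strings before generating data. The sketch below assumes the pieces are joined in the order start, censoring text, conjunction, occurrence text, and that `{event_name}` is filled with `str.format`. `build_tte_answer` is a hypothetical helper written for this preview, not a TwinWeaver function.

```python
def build_tte_answer(event_name, censored, occurred, prompts):
    """Compose a time-to-event answer from the target_prompt_* pieces (illustrative)."""
    text = prompts["target_prompt_start"].format(event_name=event_name)
    if censored:
        # Assumption: a censored answer stops after the censoring statement
        return text + prompts["target_prompt_censor_true"]
    text += prompts["target_prompt_censor_false"] + prompts["target_prompt_before_occur"]
    text += prompts["target_prompt_occur"] if occurred else prompts["target_prompt_not_occur"]
    return text


prompts = {
    "target_prompt_start": "\nPREDICTION: {event_name} - ",
    "target_prompt_censor_true": "Censored: YES.",
    "target_prompt_censor_false": "Censored: NO, ",
    "target_prompt_before_occur": "",
    "target_prompt_occur": "Occurred: YES.",
    "target_prompt_not_occur": "Occurred: NO.",
}
censored_answer = build_tte_answer("mortality", censored=True, occurred=False, prompts=prompts)
occurred_answer = build_tte_answer("mortality", censored=False, occurred=True, prompts=prompts)
print(censored_answer)
print(occurred_answer)
```

Under this assumed ordering, a censored answer contains no `Occurred:` field, so the format instruction in `forecasting_tte_prompt_end` should be worded with that case in mind.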

### 2.8 QA/Binning Prompts

These settings control the quality assurance task that predicts value bins.

In [None]:
# ============================================================================
# CUSTOMIZING QA/BINNING PROMPTS
# ============================================================================

# QA task prompt
config_custom.qa_prompt_start = (
    "\n📊 CLASSIFICATION TASK - VALUE RANGES:\n"
    "For each variable below, predict which range (bin) the future value will fall into "
    "at each time point:"
)

# Text introducing available bins
config_custom.qa_bins_start = "\t➡️ Available categories: "

### 2.9 Multi-Task Prompts

When multiple tasks are combined in one prompt, these settings control the formatting.

In [None]:
# ============================================================================
# CUSTOMIZING MULTI-TASK PROMPTS
# ============================================================================

# Introduction to multi-task section
config_custom.task_prompt_start = (
    "\n" + "==================================" + "\n"
    "📋 MULTI-TASK INSTRUCTIONS\n"
    "Complete each task below. Label each response with the task number.\n"
    "==================================" + "\n\n"
)

# Template for each task introduction
config_custom.task_prompt_each_task = "📌 TASK #{task_nr}: "

# End of task prompts section
config_custom.task_prompt_end = "\n" + "-" * 50 + "\n"

# Task type labels
config_custom.task_prompt_forecasting = "Value Forecasting"
config_custom.task_prompt_forecasting_qa = "Value Range Classification"
config_custom.task_prompt_events = "Outcome Prediction"
config_custom.task_prompt_custom = "Custom Analysis"

# Target/answer formatting for multi-task
config_custom.task_target_start = "\n✅ TASK #{task_nr} RESPONSE: "
config_custom.task_target_end = "\n"
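
The `{task_nr}` placeholder in these templates suggests they are filled with Python's `str.format`. Assuming that is the case, and that tasks are numbered from 1, the sketch below previews the resulting headers and answer prefixes. It also shows one practical consequence: any literal brace in a template has to be doubled.

```python
task_prompt_each_task = "TASK #{task_nr}: "
task_target_start = "\nTASK #{task_nr} RESPONSE: "
task_labels = ["Value Forecasting", "Outcome Prediction"]

# Assumption: each task header is the filled template followed by the task label
headers = [
    task_prompt_each_task.format(task_nr=nr) + label
    for nr, label in enumerate(task_labels, start=1)
]
answers = [task_target_start.format(task_nr=nr) for nr in range(1, len(task_labels) + 1)]

# Literal braces must be doubled, otherwise str.format treats them as placeholders
json_style = 'TASK #{task_nr} (reply as {{"value": ...}}): '.format(task_nr=1)

print(headers)
print(answers)
print(json_style)
```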

### 2.10 Event Category Overrides

For fine-grained control, you can override how specific event categories or individual events are rendered.

In [None]:
# ============================================================================
# CUSTOMIZING EVENT CATEGORY PREAMBLES
# ============================================================================
# Override the introductory text for specific event categories
# Structure: {event_category: preamble_string}

config_custom.event_category_preamble_mapping_override = {
    "lab": "🔬 Laboratory Results: ",
    "drug": "💊 Medications: ",
    "condition": "🩺 Diagnoses/Conditions: ",
    "lot": "📋 Treatment Line: ",
    "vitals": "📈 Vital Signs: ",
}

In [None]:
# ============================================================================
# CUSTOMIZING SPECIFIC EVENT RENDERING
# ============================================================================
# Override how specific events are rendered in text
# Structure: {event_category: {event_name: {"full_replacement_string": str, "reverse_string_value": str}}}

# This allows complete control over how individual events appear in the generated text
config_custom.event_category_and_name_replace_override = {
    "death": {
        "death": {
            "full_replacement_string": "⚠️ PATIENT DECEASED",
            "reverse_string_value": "deceased",
        }
    },
    "progression": {
        "progression": {
            "full_replacement_string": "📈 Disease progression documented",
            "reverse_string_value": "progressed",
        }
    },
}
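
Because this override is a three-level nested dictionary, a misplaced or missing key is easy to overlook. The sketch below checks a mapping against the structure given in the comment above. `validate_event_overrides` is a hypothetical helper, and treating both inner keys as required is an assumption based on that comment rather than on documented library behavior.

```python
REQUIRED_KEYS = {"full_replacement_string", "reverse_string_value"}


def validate_event_overrides(overrides):
    """Return a list of human-readable problems found in an override mapping."""
    problems = []
    for category, events in overrides.items():
        if not isinstance(events, dict):
            problems.append(f"{category}: expected a dict of event names")
            continue
        for event_name, spec in events.items():
            if not isinstance(spec, dict):
                problems.append(f"{category}/{event_name}: expected a dict")
                continue
            missing = sorted(REQUIRED_KEYS - set(spec))
            if missing:
                problems.append(f"{category}/{event_name}: missing {missing}")
            for key in sorted(REQUIRED_KEYS & set(spec)):
                if not isinstance(spec[key], str):
                    problems.append(f"{category}/{event_name}: {key} must be a string")
    return problems


good = {
    "death": {
        "death": {"full_replacement_string": "PATIENT DECEASED", "reverse_string_value": "deceased"}
    }
}
bad = {"progression": {"progression": {"full_replacement_string": "Disease progression documented"}}}

good_problems = validate_event_overrides(good)
bad_problems = validate_event_overrides(bad)
print(good_problems)
print(bad_problems)
```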

### 2.11 Additional Text Formatting Options

In [None]:
# ============================================================================
# ADDITIONAL FORMATTING OPTIONS
# ============================================================================

# Number of decimal places for numeric values
config_custom.decimal_precision = 1

# Whether to always include the first visit (even with token constraints)
config_custom.always_keep_first_visit = True

---

## Part 3: Generate Output with Custom Text

Now let's set up the pipeline with our customized config and see the difference.

In [None]:
# Setup data manager with custom config
dm_custom = DataManager(config=config_custom)
dm_custom.load_indication_data(
    df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description
)
dm_custom.process_indication_data()
dm_custom.setup_unique_mapping_of_events()
dm_custom.setup_dataset_splits()
dm_custom.infer_var_types()

In [None]:
# Setup splitters and converter with custom config
data_splitter_events_custom = DataSplitterEvents(dm_custom, config=config_custom)
data_splitter_events_custom.setup_variables()

data_splitter_forecasting_custom = DataSplitterForecasting(data_manager=dm_custom, config=config_custom)
data_splitter_forecasting_custom.setup_statistics()

data_splitter_custom = DataSplitter(data_splitter_events_custom, data_splitter_forecasting_custom)

converter_custom = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config_custom,
    dm=dm_custom,
    variable_stats=data_splitter_forecasting_custom.variable_stats,
)

In [None]:
# Generate example with custom text
patientid = dm_custom.all_patientids[4]
patient_data_custom = dm_custom.get_patient_data(patientid)

forecasting_splits_custom, events_splits_custom, reference_dates_custom = (
    data_splitter_custom.get_splits_from_patient_with_target(patient_data_custom)
)

p_converted_custom = converter_custom.forward_conversion(
    forecasting_splits=forecasting_splits_custom[0],
    event_splits=events_splits_custom[0],
    override_mode_to_select_forecasting="both",
)

print("=" * 80)
print("CUSTOMIZED INSTRUCTION OUTPUT:")
print("=" * 80)
print(p_converted_custom["instruction"])

In [None]:
print("=" * 80)
print("CUSTOMIZED ANSWER OUTPUT:")
print("=" * 80)
print(p_converted_custom["answer"])
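
With both outputs in hand, a line-level diff makes the effect of the custom settings easy to see. The sketch below uses Python's standard `difflib`. The two short strings are made-up stand-ins so the cell runs on its own; in this notebook you would pass `p_converted_default["instruction"]` and `p_converted_custom["instruction"]` instead. `diff_instructions` is a helper name chosen for this example.

```python
import difflib


def diff_instructions(default_text, custom_text, context=0):
    """Return a unified diff between two instruction strings."""
    lines = difflib.unified_diff(
        default_text.splitlines(),
        custom_text.splitlines(),
        fromfile="default",
        tofile="custom",
        n=context,  # number of unchanged context lines around each change
        lineterm="",
    )
    return "\n".join(lines)


# Made-up sample texts standing in for the two generated instructions
default_text = "Starting with demographic data:\nGender is female"
custom_text = "PATIENT DEMOGRAPHICS:\nGender is female"

diff = diff_instructions(default_text, custom_text)
print(diff)
```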

---

## Part 4: Quick Reference - All Text Configuration Options

Here's a comprehensive table of all text customization options available in `Config`:

| Setting | Description | Default |
|---------|-------------|---------|
| **Patient Record Introduction** | | |
| `preamble_text` | Opening text introducing the patient record | "The following is a patient..." |
| `constant_text` | Text introducing demographics section | "\n\nStarting with demographic data:\n" |
| **Visit/Event Day Text** | | |
| `first_day_text` | Text for the first visit | "\nOn the first visit..." |
| `event_day_preamble` | Preamble before subsequent visits | "\n" |
| `event_day_text` | Text for subsequent visits with time delta | " weeks later..." |
| `post_event_text` | Text after listing day's events | ".\n" |
| **Time Units** | | |
| `delta_time_unit` | Time unit for intervals | "weeks" |
| `forecasting_prompt_var_time` | Time description in forecasting | " the future weeks " |
| **Genetic Data** | | |
| `genetic_tag_opening` | Opening tag for genetic data | "<genetic>" |
| `genetic_tag_closing` | Closing tag for genetic data | "</genetic>" |
| `genetic_empty_text` | Text when no genetic data | "No genetic data available." |
| `genetic_skip_text_value` | Value to skip in genetic events | "present" |
| **Forecasting Task Prompts** | | |
| `forecasting_fval_prompt_start` | Main forecasting task introduction | "\nYour task is to predict..." |
| `forecasting_prompt_summarized_start` | Last values summary intro | "\nThe last values..." |
| `forecasting_firstday_override` | Text when first day truncated | "\nThe following events..." |
| `forecasting_prompt_summarized_genetic` | Genetic summary intro | "\n\n\nHere we repeat..." |
| `forecasting_prompt_summarized_lot` | Treatment line summary intro | "\nThe most recent line..." |
| **Time-to-Event Prompts** | | |
| `forecasting_tte_prompt_start` | TTE task introduction | "\nYour task is to predict..." |
| `forecasting_tte_prompt_mid` | TTE time horizon text | " weeks from..." |
| `forecasting_tte_prompt_end` | TTE format instructions | ".\nPlease provide..." |
| `target_prompt_start` | TTE answer format start | "\nHere is the prediction..." |
| `target_prompt_censor_true` | Text for censored events | "censored." |
| `target_prompt_censor_false` | Text for non-censored events | "not censored " |
| `target_prompt_before_occur` | Conjunction before occurrence | "and " |
| `target_prompt_occur` | Text for occurred events | "occurred." |
| `target_prompt_not_occur` | Text for non-occurred events | "did not occur." |
| **QA/Binning Prompts** | | |
| `qa_prompt_start` | QA task introduction | "\nYour task is to predict..." |
| `qa_bins_start` | Bins list introduction | "\tThe possible bins are: " |
| **Multi-Task Prompts** | | |
| `task_prompt_start` | Multi-task section intro | "\nYou will now have..." |
| `task_prompt_each_task` | Template for each task | "Task {task_nr} is " |
| `task_prompt_end` | End of task prompts | "" |
| `task_prompt_forecasting` | Forecasting task label | "forecasting:" |
| `task_prompt_forecasting_qa` | QA task label | "forecasting QA:" |
| `task_prompt_events` | Events task label | "time to event prediction:" |
| `task_prompt_custom` | Custom task label | " a custom task:" |
| `task_target_start` | Multi-task answer format | "Task {task_nr} is " |
| `task_target_end` | End of task answer | "" |
| **Overrides** | | |
| `event_category_preamble_mapping_override` | Custom preambles per category | None |
| `event_category_and_name_replace_override` | Custom rendering per event | None |
| **Additional Formatting** | | |
| `decimal_precision` | Decimal places for numbers | 2 |
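
To audit which of these settings a customized config actually changes, you can compare it against a default one attribute by attribute. The sketch below assumes the config keeps its settings as plain instance attributes, so that `vars()` can list them; it runs on `SimpleNamespace` stand-ins populated with a few defaults from the table. `changed_text_fields` is a hypothetical helper, not part of TwinWeaver.

```python
from types import SimpleNamespace


def changed_text_fields(default_cfg, custom_cfg):
    """Map each string attribute that differs to its (default, custom) pair."""
    changes = {}
    for name, default_value in vars(default_cfg).items():
        custom_value = getattr(custom_cfg, name, default_value)
        if isinstance(default_value, str) and custom_value != default_value:
            changes[name] = (default_value, custom_value)
    return changes


# Stand-ins; with TwinWeaver you would pass config_default and config_custom
default_cfg = SimpleNamespace(
    constant_text="\n\nStarting with demographic data:\n",
    post_event_text=".\n",
    qa_bins_start="\tThe possible bins are: ",
)
custom_cfg = SimpleNamespace(
    constant_text="\n\nPATIENT DEMOGRAPHICS:\n",
    post_event_text=".\n",
    qa_bins_start="\tAvailable categories: ",
)

changes = changed_text_fields(default_cfg, custom_cfg)
for name, (before, after) in changes.items():
    print(f"{name}: {before!r} -> {after!r}")
```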

---

## Summary

This tutorial demonstrated how to customize every aspect of TwinWeaver's text generation through the `Config` class. Key takeaways:

1. **Preamble and introduction text** sets the context for the patient record
2. **Event day formatting** controls how clinical visits are described temporally
3. **Time units** can be switched between days and weeks using `set_delta_time_unit()`
4. **Genetic data formatting** uses customizable tags and placeholder text
5. **Task prompts** (forecasting, TTE, QA) can be fully rewritten for different LLM styles
6. **Multi-task formatting** allows structured output for complex prediction tasks
7. **Category overrides** provide fine-grained control over specific event types

Use these customization options to:
- Adapt prompts for different language models
- Translate prompts to other languages
- Add visual formatting (emojis, separators) for clarity
- Match specific institutional or research requirements
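
For larger changes such as a translation, it can be convenient to keep all texts in one dictionary and apply them in a loop. Plain attribute assignment on an ordinary Python object does not complain about a misspelled name, so the sketch below rejects any key the config does not already define. `apply_text_overrides` is a hypothetical helper, the German strings are illustrative translations, and the `SimpleNamespace` object stands in for a `Config` instance.

```python
from types import SimpleNamespace


def apply_text_overrides(config, overrides):
    """Set each override on `config`, rejecting names the config does not already define."""
    unknown = sorted(name for name in overrides if not hasattr(config, name))
    if unknown:
        raise AttributeError(f"Unknown config settings: {unknown}")
    for name, text in overrides.items():
        setattr(config, name, text)
    return config


# Stand-in for a Config instance with two of the documented defaults
cfg = SimpleNamespace(
    constant_text="\n\nStarting with demographic data:\n",
    genetic_empty_text="No genetic data available.",
)
german_texts = {
    "constant_text": "\n\nDemografische Daten:\n",
    "genetic_empty_text": "Keine genetischen Daten vorhanden.",
}
apply_text_overrides(cfg, german_texts)
print(cfg.constant_text)
print(cfg.genetic_empty_text)
```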