# Quick Start Guide to GenAIRR

Welcome to the Quick Start Guide for GenAIRR, a Python module designed for generating synthetic Adaptive Immune Receptor Repertoire (AIRR) sequences. This guide will walk you through the basic usage of GenAIRR, including setting up your environment, simulating heavy and light chain sequences, and customizing your simulations.


## Installation

Before you begin, make sure Python 3.x is installed on your system. GenAIRR can be installed with pip, Python's package installer. Run `pip install GenAIRR` in your terminal, or uncomment the `!pip` line in the cell below to install it from inside the notebook:


In [1]:
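# Install GenAIRR using pip (uncomment the next line to install from a notebook)
#!pip install GenAIRR

import pandas as pd  # used later in this guide for tabular summaries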
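
To confirm that the installation worked before going further, a small check like the one below can help. It is only a sketch: it assumes the installed distribution is registered under the same name used in the pip command (`GenAIRR`), and `installed_version` is a helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the name used with pip.
version = installed_version("GenAIRR")
if version is None:
    print("GenAIRR is not installed; run `pip install GenAIRR` first.")
else:
    print(f"GenAIRR {version} is installed.")
```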

## Setting Up

To start using GenAIRR, you need to import the necessary classes from the module. We'll also select the built-in `DataConfig` objects, which hold the germline reference data for each chain type.


In [None]:
# Importing GenAIRR classes
# Core pipeline framework
from GenAIRR.pipeline import AugmentationPipeline

# Built-in data configurations (pre-compiled germline databases)
from GenAIRR.data import HUMAN_IGH_OGRDB, HUMAN_IGL_OGRDB, HUMAN_IGK_OGRDB

# Simulation and correction steps
from GenAIRR.steps import (SimulateSequence, FixVPositionAfterTrimmingIndexAmbiguity, 
                          FixDPositionAfterTrimmingIndexAmbiguity, FixJPositionAfterTrimmingIndexAmbiguity)
from GenAIRR.steps import (CorrectForVEndCut, CorrectForDTrims, CorruptSequenceBeginning, 
                          InsertNs, InsertIndels, ShortDValidation, DistillMutationRate)

# Mutation models
from GenAIRR.mutation import S5F

# Base classes and container
from GenAIRR.steps.StepBase import AugmentationStep
from GenAIRR.simulation import HeavyChainSequenceAugmentor, LightChainSequenceAugmentor, SequenceAugmentorArguments
from GenAIRR.dataconfig import DataConfig

# Use built-in data configurations directly
# These contain pre-processed germline gene databases from OGRDB
heavy_chain_config = HUMAN_IGH_OGRDB      # Heavy chain (has V, D, J, C segments)
kappa_chain_config = HUMAN_IGK_OGRDB      # Kappa light chain (has V, J, C segments)
lambda_chain_config = HUMAN_IGL_OGRDB     # Lambda light chain (has V, J, C segments)

## Simulating Heavy Chain Sequences

Let's simulate a BCR heavy chain sequence with an `AugmentationPipeline` built from GenAIRR's standard heavy chain steps. This example demonstrates a simple simulation with default settings.


In [None]:
# Set the dataconfig for the simulations
# This is REQUIRED before creating any pipeline - it configures the germline database
AugmentationStep.set_dataconfig(heavy_chain_config)

# Create the simulation pipeline with biologically-motivated steps
pipeline = AugmentationPipeline([
    # 1. CORE SIMULATION: Generate a sequence with somatic hypermutation
    # S5F = context-dependent mutation model (realistic)
    # True = ensure sequence is productive (in-frame, no stop codons)
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    
    # 2. POSITION CORRECTIONS: Fix ambiguities from trimming during V(D)J recombination
    FixVPositionAfterTrimmingIndexAmbiguity(),     # V segment position correction
    FixDPositionAfterTrimmingIndexAmbiguity(),     # D segment position correction  
    FixJPositionAfterTrimmingIndexAmbiguity(),     # J segment position correction
    
    # 3. BIOLOGICAL CORRECTIONS: Model natural trimming processes
    CorrectForVEndCut(),       # V segment 3' end trimming
    CorrectForDTrims(),        # D segment 5' and 3' trimming
    
    # 4. SEQUENCING ARTIFACTS: Model real-world sequencing issues
    # Parameters: (corruption_prob, [add, remove, both], max_length, add_coeff, remove_coeff, after_remove_coeff)
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 576, 210, 310, 50),
    
    # 5. AMBIGUOUS BASES: Add 'N' calls (unreadable bases)
    # Parameters: (fraction_of_positions, probability_per_position)
    InsertNs(0.02, 0.5),
    
    # 6. QUALITY CONTROL: Remove sequences with very short D segments
    ShortDValidation(),
    
    # 7. STRUCTURAL VARIANTS: Add insertions and deletions
    # Parameters: (indel_probability, max_indels, deletion_prob, insertion_prob)
    InsertIndels(0.5, 5, 0.5, 0.5),
    
    # 8. FINALIZATION: Calculate final mutation rate statistics
    DistillMutationRate()
])

# Simulate a heavy chain sequence
heavy_sequence = pipeline.execute()

# Print the simulated heavy chain sequence with all metadata
print("Simulated Heavy Chain Sequence:", heavy_sequence.get_dict())

Simulated Heavy Chain Sequence: {'sequence': 'GGGGGTCCCTGAGACTCTCCGGGGCAGTGTCTGGATTCACCNTCAGTAGCTATGGCATGCACTGGGTCTNCCAGGCTCCANGCAAGGGGCTGGAGTGGGTGACATTTACAACGGATAAAGGCAGTAATAAATACTATGCAGACTNCGTGAAGGGCCGATCCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATGTTCAAATGAACAGCCTGAGACGTGAGGACACNGCTGTGTTTTACTGGCGAAAGATCCTGACTACACTAGTTCCACCAACTGGTTCCACCACTGGGGCCCGGGAACCCTGGNCACCGTCTCCTCAGGCATCCC', 'v_call': ['IGHVF10-G37*08'], 'd_call': ['IGHD4-4*01', 'IGHD4-11*01'], 'j_call': ['IGHJ5*02'], 'c_call': ['IGHA2*01'], 'v_sequence_start': 0, 'v_sequence_end': 252, 'd_sequence_start': 255, 'd_sequence_end': 266, 'j_sequence_start': 273, 'j_sequence_end': 323, 'v_germline_start': 43, 'v_germline_end': 296, 'd_germline_start': 0, 'd_germline_end': 11, 'j_germline_start': 1, 'j_germline_end': 51, 'junction_sequence_start': 242, 'junction_sequence_end': 292, 'mutation_rate': 0.07575757575757576, 'mutations': {20: 'T>G', 22: 'T>G', 27: 'C>T', 68: 'C>T', 101: 'G>A', 108: 'T>C', 116: 'G>A', 118: 'T>A', 121: 'A>C', 159: 

### Understanding the Output

The simulation returns a `SimulationContainer` object with rich metadata. Let's examine the key fields:


In [None]:
# Let's examine the structure of our simulated sequence
result_dict = heavy_sequence.get_dict()

print("=== KEY OUTPUT FIELDS ===")
print(f"Sequence: {result_dict['sequence'][:50]}... (length: {len(result_dict['sequence'])})")
print(f"V allele used: {result_dict['v_call']}")
print(f"D allele used: {result_dict['d_call']}")  
print(f"J allele used: {result_dict['j_call']}")
print(f"Is productive: {result_dict['productive']}")
print(f"Mutation rate: {result_dict['mutation_rate']:.3f}")
print(f"Number of mutations: {len(result_dict['mutations'])}")
print(f"Number of N bases: {len(result_dict['Ns'])}")
print(f"Number of indels: {len(result_dict['indels'])}")

print("\n=== SEQUENCE REGIONS ===")
print(f"V region: positions {result_dict['v_sequence_start']}-{result_dict['v_sequence_end']}")
print(f"D region: positions {result_dict['d_sequence_start']}-{result_dict['d_sequence_end']}")
print(f"J region: positions {result_dict['j_sequence_start']}-{result_dict['j_sequence_end']}")
print(f"Junction (CDR3): positions {result_dict['junction_sequence_start']}-{result_dict['junction_sequence_end']}")

# Show actual sequence regions
seq = result_dict['sequence']
v_region = seq[result_dict['v_sequence_start']:result_dict['v_sequence_end']+1]
d_region = seq[result_dict['d_sequence_start']:result_dict['d_sequence_end']+1]
j_region = seq[result_dict['j_sequence_start']:result_dict['j_sequence_end']+1]

print(f"\nV region sequence: {v_region}")
print(f"D region sequence: {d_region}")
print(f"J region sequence: {j_region}")

# Show some mutations if they exist
if result_dict['mutations']:
    print(f"\n=== FIRST FEW MUTATIONS ===")
    mutation_items = list(result_dict['mutations'].items())[:5]
    for pos, change in mutation_items:
        # Each entry is stored as 'original>mutated', e.g. 'T>G'
        original, mutated = change.split('>')
        print(f"Position {pos}: '{original}' mutated to '{mutated}'")
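
The `mutations` field maps each position to a string of the form `original>mutated`, as the printed output earlier shows. The sketch below parses a few of those entries (copied from that output) and counts transitions versus transversions. The helper name `classify_substitution` is introduced here for illustration and is not part of GenAIRR.

```python
from collections import Counter

# Sample entries copied from the simulated output shown earlier in this guide.
mutations = {20: "T>G", 22: "T>G", 27: "C>T", 68: "C>T", 101: "G>A"}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(change):
    """Classify an 'X>Y' substitution as a transition or a transversion."""
    original, mutated = change.split(">")
    pair = {original, mutated}
    # A transition stays within purines or within pyrimidines
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


counts = Counter(classify_substitution(change) for change in mutations.values())
for position in sorted(mutations):
    original, mutated = mutations[position].split(">")
    print(f"Position {position}: {original} -> {mutated}")
print(dict(counts))
```

The same loop works on `result_dict['mutations']` from a real simulation, provided the entries keep the `X>Y` form seen above.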

## Customizing Simulations

GenAIRR allows for extensive customization to closely mimic the natural diversity of immune sequences. Below is an example of how to customize mutation rates and indel simulations.


In [None]:
# Customize augmentation arguments
custom_mutation_model = S5F(min_mutation_rate=0.1, max_mutation_rate=0.5)
custom_insert_indel_step = InsertIndels(0.05, 15, 0.7, 0.3)

pipeline = AugmentationPipeline([
    SimulateSequence(custom_mutation_model, True),  # Use the custom mutation model
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    CorrectForVEndCut(),
    CorrectForDTrims(),
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 576, 210, 310, 50),
    InsertNs(0.02, 0.5),
    ShortDValidation(),
    custom_insert_indel_step,  # Use the custom insert indel step
    DistillMutationRate()
])

# Simulate a heavy chain sequence
heavy_sequence = pipeline.execute()

# Print the simulated heavy chain sequence
print("Customized Simulated Heavy Chain Sequence:", heavy_sequence.get_dict())

Customized Simulated Heavy Chain Sequence: {'sequence': 'CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGGAGCCTTGGGACACCCTGACCCTCACCTCCACTGTCTCTAGTTACCCCATCAGAAATATTATCAGGTTGGACTGGANCCGTCANCCCCCAGTGAAGAGACTGGAGTGGATTGGATACATCGATTATAGAGGGAGCATCAATCACAATCCGTCCCTGAACAGTCGAGTCACCATGTCAGTGGACACGTCCAAGANCCCGTTCTCCCTGAATCTGAACTCTGTGAGCGCCNTGGACACGACCCTGTATTATTGTGCGGGAATCGANTGGGGGGGGGCTTCAGAGAGNCANTACTACGAAATGGACGTCTGGGGCCAAGGGACCACGGTCACCGCCGCCTCAGG', 'v_call': ['IGHVF3-G9*03'], 'd_call': ['IGHD3-10*03'], 'j_call': ['IGHJ6*02'], 'c_call': ['IGHG1*07'], 'v_sequence_start': 0, 'v_sequence_end': 295, 'd_sequence_start': 311, 'd_sequence_end': 320, 'j_sequence_start': 321, 'j_sequence_end': 376, 'v_germline_start': 0, 'v_germline_end': 295, 'd_germline_start': 12, 'd_germline_end': 21, 'j_germline_start': 8, 'j_germline_end': 63, 'junction_sequence_start': 285, 'junction_sequence_end': 345, 'mutation_rate': 0.1246684350132626, 'mutations': {36: 'A>G', 43: 'C>G', 54: 'T>A', 64: 'G>C', 66: 'G>A', 75: 'G>A', 81

### Understanding Parameter Impact

Let's see how different mutation rates affect the output by comparing three scenarios: naive, memory, and plasma cell-like sequences.

In [None]:
# Create three pipelines with different mutation rates
from GenAIRR.steps import SimulateSequence

# 1. Naive B cell (very few mutations)
naive_pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.001, max_mutation_rate=0.01), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    DistillMutationRate()
])

# 2. Memory B cell (moderate mutations)
memory_pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.02, max_mutation_rate=0.08), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    DistillMutationRate()
])

# 3. Plasma cell (high mutations)
plasma_pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.05, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    DistillMutationRate()
])

# Generate one sequence from each
naive_seq = naive_pipeline.execute().get_dict()
memory_seq = memory_pipeline.execute().get_dict()
plasma_seq = plasma_pipeline.execute().get_dict()

print("=== MUTATION RATE COMPARISON ===")
print(f"Naive B cell:  {naive_seq['mutation_rate']:.3f} ({len(naive_seq['mutations'])} mutations)")
print(f"Memory B cell: {memory_seq['mutation_rate']:.3f} ({len(memory_seq['mutations'])} mutations)")
print(f"Plasma cell:   {plasma_seq['mutation_rate']:.3f} ({len(plasma_seq['mutations'])} mutations)")

print(f"\nSequence lengths:")
print(f"Naive:  {len(naive_seq['sequence'])} bp")
print(f"Memory: {len(memory_seq['sequence'])} bp") 
print(f"Plasma: {len(plasma_seq['sequence'])} bp")
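
A single draw per scenario is noisy, so the three numbers printed above can overlap from run to run. One way to get a steadier picture is to summarize several draws per pipeline. The sketch below shows the summary step only; the record values are hypothetical stand-ins for repeated `pipeline.execute().get_dict()` calls, and `summarize_rates` is a helper name introduced here.

```python
import statistics


def summarize_rates(records):
    """Summarize the 'mutation_rate' field over a list of result dictionaries."""
    rates = [record["mutation_rate"] for record in records]
    return {
        "n": len(rates),
        "mean": statistics.mean(rates),
        "min": min(rates),
        "max": max(rates),
    }


# Hypothetical values standing in for repeated simulation results.
naive_records = [{"mutation_rate": r} for r in (0.002, 0.004, 0.009)]
memory_records = [{"mutation_rate": r} for r in (0.03, 0.05, 0.07)]

for label, records in (("Naive", naive_records), ("Memory", memory_records)):
    summary = summarize_rates(records)
    print(f"{label}: n={summary['n']} mean={summary['mean']:.3f} "
          f"range={summary['min']:.3f}-{summary['max']:.3f}")
```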

## Generating Naïve Sequences

In immunogenetics, a naïve sequence is an antibody sequence that has not undergone somatic hypermutation. GenAIRR can simulate such sequences with the `HeavyChainSequence` class. Let's start by generating a naïve heavy chain sequence.


In [None]:
from GenAIRR.sequence import HeavyChainSequence

# Create a naive heavy chain sequence using the heavy chain config
naive_heavy_sequence = HeavyChainSequence.create_random(heavy_chain_config)

# Access the generated naive sequence
naive_sequence = naive_heavy_sequence

print("Naïve Heavy Chain Sequence:", naive_sequence)
print('Ungapped Sequence: ')
print(naive_sequence.ungapped_seq)

Naïve Heavy Chain Sequence: 0|-----------------------------------------------------------------------V(IGHVF10-G42*05)|298336|----------J(IGHJ4*02)|378|378|---------C(IGHG1*08)|416
Ungapped Sequence: 
GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGTCAAGCTCCAGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTGGGGATGGTGGTAGCACATACTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAACTGAGGACACCGCCTTGTATTACTGTGCAAAAGATATCGTAGGGGACTCAATATGGTCTGTGAAGTACAGGCAATTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCACC


## Applying Mutations

To mimic the natural diversity and evolution of immune sequences, GenAIRR supports the simulation of mutations through various models. Here, we demonstrate how to apply mutations to a naïve sequence using the `S5F` and `Uniform` mutation models from the `GenAIRR.mutation` submodule.


### Using the S5F Mutation Model

The `S5F` model is a context-dependent mutation model: the probability that a base mutates depends on its surrounding 5-mer (the base plus two neighbors on each side). This makes it well suited to simulating realistic somatic hypermutation.


In [6]:
from GenAIRR.mutation import S5F

# Initialize the S5F mutation model with custom mutation rates
s5f_model = S5F(min_mutation_rate=0.01, max_mutation_rate=0.05)

# Apply mutations to the naive sequence using the S5F model
s5f_mutated_sequence, mutations, mutation_rate = s5f_model.apply_mutation(naive_heavy_sequence)

print("S5F Mutated Heavy Chain Sequence:", s5f_mutated_sequence)
print("S5F Mutation Details:", mutations)
print("S5F Mutation Rate:", mutation_rate)


S5F Mutated Heavy Chain Sequence: GAAGCGCAGCTGGTGAAGTCTGGGGGAGGCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATAATTATGCCATACACTGGGTCCGTCAAGCTCCAGGGAAGGGTCTGGAGTGGGTCTCTCTTTTTACTGGGGATGTTGGTAGAACATACTATGCAGACTCTGCGAAGGGCCGATTCACCATCTCCAGAGACAACAGAAAAAACTCCCTGTATCTGCAAATGAACAGTGTGAGAACTGAGGACAGCGCCTTGTATTACTGTGCAAAAGATATCGTAGGGGACTCACTATGGTCTGTGAAGTACAGGCGATTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGCCTCCACGAAGGGCCCATCGGTCTTCCCCCTGGCACC
S5F Mutation Details: {312: 'A>C', 154: 'G>C', 224: 'C>A', 150: 'A>T', 90: 'G>A', 255: 'C>G', 170: 'C>A', 101: 'G>A', 271: 'C>G', 190: 'T>C', 163: 'G>T', 4: 'T>C', 15: 'G>A', 334: 'A>G', 386: 'C>G'}
S5F Mutation Rate: 0.036575814563128965


### Using the Uniform Mutation Model

The `Uniform` mutation model applies mutations at a uniform rate across the sequence, providing a simpler alternative to the context-dependent models.


In [7]:
from GenAIRR.mutation import Uniform

# Initialize the Uniform mutation model with custom mutation rates
uniform_model = Uniform(min_mutation_rate=0.01, max_mutation_rate=0.05)

# Apply mutations to the naive sequence using the Uniform model
uniform_mutated_sequence, mutations, mutation_rate = uniform_model.apply_mutation(naive_heavy_sequence)

print("Uniform Mutated Heavy Chain Sequence:", uniform_mutated_sequence)
print("Uniform Mutation Details:", mutations)
print("Uniform Mutation Rate:", mutation_rate)


Uniform Mutated Heavy Chain Sequence: GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTTCATCCTGGGGGGTCGCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATAATTATGCCACGCACTGGGTCAGTCAAGCTCCGGGGAAGTGTCTGGAGTGGGTCTCTCTTATTAGTGGGGATTGTCGTAGCACATACTATGCAGAGTCTGTGAAGGGCCGATTCACCATCTCCCGAGACAACAGCAAAAACTCCCTGTATCTGCATATGTACAGTCTGAGATCTGAGGACACCGCGTTGTATCACTGTGCAAAAGATATCGTAGGGGACTCAATATGGTCTGTGAAGTACAGGCAATTGACTTCTGGGGCTAGGGAACCCTGGTCACTGTCTCCTCAGGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCACC
Uniform Mutation Details: {38: 'G>T', 165: 'G>C', 111: 'C>A', 213: 'A>C', 185: 'C>G', 282: 'T>C', 245: 'A>T', 342: 'A>T', 122: 'A>G', 100: 'T>C', 249: 'A>T', 275: 'C>G', 261: 'A>T', 162: 'G>T', 90: 'G>A', 367: 'C>T', 350: 'C>T', 50: 'C>G', 35: 'A>T', 129: 'G>T'}
Uniform Mutation Rate: 0.04850801933682698
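
To make the idea of a uniform model concrete, here is a toy version written in plain Python. It is a sketch of the concept, not GenAIRR's implementation: it assumes the rate is drawn uniformly between the minimum and maximum, that positions are chosen without replacement, and that every replacement base is equally likely. The function name `toy_uniform_mutate` is introduced here; its return order simply mirrors the `(sequence, mutations, rate)` tuple unpacked in the cell above.

```python
import random

BASES = "ACGT"


def toy_uniform_mutate(sequence, min_rate, max_rate, rng):
    """Toy uniform model: every position is equally likely to be mutated."""
    rate = rng.uniform(min_rate, max_rate)
    n_mutations = round(rate * len(sequence))
    # Sample distinct positions so no base is mutated twice
    positions = rng.sample(range(len(sequence)), n_mutations)
    mutated = list(sequence)
    mutations = {}
    for pos in positions:
        original = mutated[pos]
        # Pick a replacement that differs from the original base
        replacement = rng.choice([b for b in BASES if b != original])
        mutated[pos] = replacement
        mutations[pos] = f"{original}>{replacement}"
    return "".join(mutated), mutations, rate


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Prefix of the naive sequence printed earlier in this guide.
naive = "GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTACAGCCTGGGGGGTCCCTGAGACTC"
mutated_seq, toy_mutations, toy_rate = toy_uniform_mutate(naive, 0.05, 0.10, rng)
print(mutated_seq)
print(toy_mutations, round(toy_rate, 4))
```

A context-dependent model such as S5F differs in the position-selection step: instead of sampling positions uniformly, it weights each position by its sequence context.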


## Simulating Light Chain Sequences

Heavy chains are just one part of the antibody structure. Let's also simulate light chains (kappa and lambda) to understand the differences:

In [None]:
# Light chains are simpler - they lack D segments
# Let's simulate both kappa and lambda light chains

print("=== LIGHT CHAIN SIMULATION ===")

# 1. Kappa light chain simulation
print("Setting up Kappa light chain...")
AugmentationStep.set_dataconfig(kappa_chain_config)

kappa_pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.02, max_mutation_rate=0.08), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),  # Note: no D position fix needed
    CorrectForVEndCut(),
    # Note: no CorrectForDTrims or ShortDValidation for light chains
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 400, 150, 200, 30),  # Shorter max length
    InsertNs(0.02, 0.5),
    InsertIndels(0.3, 3, 0.5, 0.5),  # Lower indel probability and cap than the heavy chain example
    DistillMutationRate()
])

kappa_sequence = kappa_pipeline.execute().get_dict()

# 2. Lambda light chain simulation  
print("Setting up Lambda light chain...")
AugmentationStep.set_dataconfig(lambda_chain_config)

lambda_pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.02, max_mutation_rate=0.08), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    CorrectForVEndCut(),
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 400, 150, 200, 30),
    InsertNs(0.02, 0.5),
    InsertIndels(0.3, 3, 0.5, 0.5),
    DistillMutationRate()
])

lambda_sequence = lambda_pipeline.execute().get_dict()

# 3. Compare all three chain types
print("\n=== CHAIN TYPE COMPARISON ===")

# Reset to heavy chain for comparison
AugmentationStep.set_dataconfig(heavy_chain_config)
heavy_simple = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.02, max_mutation_rate=0.08), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    DistillMutationRate()
])
heavy_sequence_simple = heavy_simple.execute().get_dict()

# Comparison table
comparison_data = {
    'Chain Type': ['Heavy', 'Kappa Light', 'Lambda Light'],
    'Has D Segment': [True, False, False],
    'Sequence Length': [
        len(heavy_sequence_simple['sequence']),
        len(kappa_sequence['sequence']), 
        len(lambda_sequence['sequence'])
    ],
    'V Allele': [
        heavy_sequence_simple['v_call'][0],
        kappa_sequence['v_call'][0],
        lambda_sequence['v_call'][0]
    ],
    'D Allele': [
        heavy_sequence_simple['d_call'][0],
        'N/A',
        'N/A'
    ],
    'J Allele': [
        heavy_sequence_simple['j_call'][0],
        kappa_sequence['j_call'][0],
        lambda_sequence['j_call'][0]
    ],
    'Mutations': [
        len(heavy_sequence_simple['mutations']),
        len(kappa_sequence['mutations']),
        len(lambda_sequence['mutations'])
    ],
    'Mutation Rate': [
        f"{heavy_sequence_simple['mutation_rate']:.3f}",
        f"{kappa_sequence['mutation_rate']:.3f}",
        f"{lambda_sequence['mutation_rate']:.3f}"
    ]
}

comparison_df = pd.DataFrame(comparison_data)
print(comparison_df.to_string(index=False))

print(f"\n=== BIOLOGICAL CONTEXT ===")
print(f"• Heavy chains pair with either kappa OR lambda light chains")
print(f"• Human ratio: ~60% kappa, 40% lambda")
print(f"• Light chains are generally shorter due to lack of D segment")
print(f"• All chains undergo similar mutation processes")
print(f"• Each B cell expresses only one type of light chain")

print(f"\n=== SEQUENCE STRUCTURE COMPARISON ===")
print(f"Heavy chain:  V — D — J — C")
print(f"Light chain:  V ——— J — C (no D segment)")

# Show actual sequence regions for comparison
print(f"\nV region lengths:")
print(f"  Heavy: {heavy_sequence_simple['v_sequence_end'] - heavy_sequence_simple['v_sequence_start']} bp")
print(f"  Kappa: {kappa_sequence['v_sequence_end'] - kappa_sequence['v_sequence_start']} bp")
print(f"  Lambda: {lambda_sequence['v_sequence_end'] - lambda_sequence['v_sequence_start']} bp")

print(f"\n💡 Tip: In real B cells, heavy and light chains are expressed together")
print(f"   to form functional antibodies. You might want to simulate pairs!")

# Reset back to heavy chain config for rest of notebook
AugmentationStep.set_dataconfig(heavy_chain_config)
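
The tip printed above suggests simulating heavy and light chains as pairs. One small building block for that is deciding which light chain type accompanies each heavy chain. The sketch below draws the type at random using the approximate 60/40 kappa-to-lambda ratio quoted in the cell above. The function `pick_light_chain_type` and its `kappa_fraction` parameter are names introduced here for illustration, not part of GenAIRR.

```python
import random


def pick_light_chain_type(rng, kappa_fraction=0.6):
    """Return 'kappa' or 'lambda', drawing kappa with probability kappa_fraction."""
    return "kappa" if rng.random() < kappa_fraction else "lambda"


rng = random.Random(42)  # fixed seed so the sketch is repeatable
draws = [pick_light_chain_type(rng) for _ in range(1000)]
kappa_share = draws.count("kappa") / len(draws)
print(f"kappa share in 1000 draws: {kappa_share:.2f}")
```

The returned label can then select between `kappa_pipeline` and `lambda_pipeline`, keeping in mind that `AugmentationStep.set_dataconfig` must be called with the matching configuration first.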

## Common Use Cases

This section walks through some common use cases for GenAIRR in immunogenetics research, including generating multiple sequences, exporting and analyzing the results, and simulating specific allele combinations.


### Generating Many Sequences

One common requirement is to generate a large dataset of synthetic AIRR sequences for analysis or benchmarking. Below is an example of how to generate multiple sequences using GenAIRR in a loop.


In [8]:
num_sequences = 5  # Number of sequences to generate

heavy_sequences = []
for _ in range(num_sequences):
    # Simulate a heavy chain sequence
    heavy_sequence = pipeline.execute().get_dict()
    heavy_sequences.append(heavy_sequence)

# Display the generated sequences
for i, seq in enumerate(heavy_sequences, start=1):
    print(f"Heavy Chain Sequence {i}: {seq}")


Heavy Chain Sequence 1: {'sequence': 'TGACGGAATAATCCCTTTCGAGTGGGTTGAAGCCCAGGGAAGAGATCAGACGATTCATTGTTAACAGCTAAGGGTGTGGCGNTTGTGTGGGGGACACGCGATGNCGTTTGGGAGAGGCCNGATCTGGCCGGGAGAGTCCCAGCGAGTCTCTGGTGCAGCTTCTCACTTGACCTTGCGAGGTCTCATATCAGACTGGGTCCNCCGGACTCCNGANAAGGGACTGGCCGGAGTCTCGTCCCGANTTAGGCCAAGCGGTCGCAAGTGGTGCACAAAGGCGCTCACGGGCCCCCTCGTCGTTTCCATAGACTCGGTCAGACGGTCTCTGNCTTCGCAGTTGGAGAGGCGGACGGGCGATGACACAAGTGGNCCGTAATTCTCGTGACTGGGAATGTAACGACGCGGGGCTGAGCCCGACATATGATGTGCTAGGGGCGTCGGGGTCAACGGGACGTCGGACAATGGCCCGCGAGGCCTGCGTCCCGGGCCCGTC', 'v_call': ['IGHVF10-G51*05'], 'd_call': ['IGHD5-5*01', 'IGHD5-18*01'], 'j_call': ['IGHJ6*03'], 'c_call': ['IGHG2*12'], 'v_sequence_start': 88, 'v_sequence_end': 384, 'd_sequence_start': 400, 'd_sequence_end': 407, 'j_sequence_start': 413, 'j_sequence_end': 470, 'v_germline_start': 0, 'v_germline_end': 296, 'd_germline_start': 2, 'd_germline_end': 9, 'j_germline_start': 6, 'j_germline_end': 63, 'junction_sequence_start': 373, 'junction_sequence_end': 439, 'mutation_

### Performance Considerations

When generating many sequences, there are several important considerations for efficiency and memory usage:

In [None]:
import time
import sys

# Demonstrate timing and memory considerations
print("=== TIMING COMPARISON ===")

# Time simple vs complex pipeline
simple_pipeline = AugmentationPipeline([
    SimulateSequence(S5F(0.01, 0.05), True)
])

complex_pipeline = pipeline  # The full pipeline we created earlier

# Time simple pipeline (perf_counter is a monotonic, high-resolution timer)
start_time = time.perf_counter()
simple_sequences = [simple_pipeline.execute().get_dict() for _ in range(10)]
simple_time = time.perf_counter() - start_time

# Time complex pipeline
start_time = time.perf_counter()
complex_sequences = [complex_pipeline.execute().get_dict() for _ in range(10)]
complex_time = time.perf_counter() - start_time

print(f"Simple pipeline (10 sequences): {simple_time:.3f} seconds")
print(f"Complex pipeline (10 sequences): {complex_time:.3f} seconds")
print(f"Time ratio (full pipeline / simple pipeline): {complex_time/simple_time:.1f}x")

# Memory usage demonstration
print(f"\n=== MEMORY USAGE ===")
sequence_dict = simple_sequences[0]
memory_size = sys.getsizeof(str(sequence_dict))  # Rough estimate
print(f"Approximate memory per sequence: {memory_size} bytes")
print(f"For 1000 sequences: ~{memory_size * 1000 / 1024:.1f} KB")
print(f"For 100,000 sequences: ~{memory_size * 100000 / (1024*1024):.1f} MB")

# Batch processing recommendation
print(f"\n=== BATCH PROCESSING EXAMPLE ===")
def generate_sequences_in_batches(pipeline, total_sequences, batch_size=1000):
    """Generate sequences in batches to manage memory."""
    all_sequences = []
    
    for batch_start in range(0, total_sequences, batch_size):
        batch_end = min(batch_start + batch_size, total_sequences)
        batch = [pipeline.execute().get_dict() for _ in range(batch_end - batch_start)]
        all_sequences.extend(batch)
        print(f"Generated batch {batch_start//batch_size + 1}: {len(all_sequences)}/{total_sequences} sequences")
        
        # In real applications, you might save each batch to disk here
        # and clear memory: del batch
    
    return all_sequences

# Example: generate 25 sequences in batches of 10
small_dataset = generate_sequences_in_batches(simple_pipeline, 25, batch_size=10)
print(f"Successfully generated {len(small_dataset)} sequences using batch processing!")
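
The batch function above still accumulates every record in `all_sequences`, so memory grows with the dataset. If the goal is a file on disk, each batch can be written out and discarded instead. The sketch below shows that pattern with JSON Lines. `stream_batches` and `fake_record` are names introduced here, and `fake_record` is only a stand-in for `pipeline.execute().get_dict()`.

```python
import io
import json


def stream_batches(make_record, total, batch_size, handle):
    """Write records as JSON Lines, one batch at a time, without keeping them all in memory."""
    written = 0
    while written < total:
        size = min(batch_size, total - written)
        batch = [make_record() for _ in range(size)]
        for record in batch:
            handle.write(json.dumps(record) + "\n")
        written += size  # the batch list is dropped on the next iteration
    return written


# Stand-in for `pipeline.execute().get_dict()`; replace with the real call.
def fake_record():
    return {"sequence": "ACGT", "mutation_rate": 0.05}


buffer = io.StringIO()  # swap for open("sequences.jsonl", "w") to write a file
count = stream_batches(fake_record, total=25, batch_size=10, handle=buffer)
print(f"Wrote {count} records")
```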

In [9]:
import pandas as pd

# Tabulate the `heavy_sequences` list built in "Generating Many Sequences"
pd.DataFrame(heavy_sequences)

Unnamed: 0,sequence,v_call,d_call,j_call,c_call,v_sequence_start,v_sequence_end,d_sequence_start,d_sequence_end,j_sequence_start,...,c_trim_3,productive,stop_codon,vj_in_frame,note,corruption_event,corruption_add_amount,corruption_remove_amount,corruption_removed_section,corruption_added_section
0,TGACGGAATAATCCCTTTCGAGTGGGTTGAAGCCCAGGGAAGAGAT...,[IGHVF10-G51*05],"[IGHD5-5*01, IGHD5-18*01]",[IGHJ6*03],[IGHG2*12],88,384,400,407,413,...,20,False,True,False,Stop codon present.,add,88,0,,TGACGGAATAATCCCTTTCGAGTGGGTTGAAGCCCAGGGAAGAGAT...
1,GATGACGTCAAGAGGTCCCCAGACAAGCGCTTCAGTGAACGTCAGA...,[IGHVF6-G20*02],[IGHD3-3*01],[IGHJ6*02],[IGHG3*26],12,191,196,215,218,...,19,False,True,False,Stop codon present.,remove_before_add,15,115,GACAACCACTTGGCGCAGTGCCGGGCTGTGGATAAAGTGCCTGGAC...,GATGACGTCAAGAGG
2,GCTCTAAACAGTAATTATGCCGTCCGTNCACCAGTCCCCAGGCAAG...,[IGHVF3-G10*03],[IGHD3-22*01],[IGHJ6*02],[IGHE*04],25,208,223,232,233,...,14,False,False,True,V second C not present.,remove_before_add,25,114,TAATTTCATCTGCAGGGGTCGGGCCCAGGGCTAGGGAAGCCTCTCG...,GCTCTAAACAGTAATTATGCCGTCC
3,GAGGTGCAGTTGATGGAGTCGGGGAGTGACTTGGTCCGGCCGGCAG...,[IGHVF10-G49*04],"[IGHD5-18*01, IGHD5-5*01]",[IGHJ3*02],[IGHG3*25],0,297,313,321,324,...,12,False,True,False,Stop codon present.,no-corruption,0,0,,
4,GGGCCTGGGAGTGGGTCTCATCCGATAGTGATGGAAGCACGCAGTG...,[IGHVF10-G35*02],[IGHD1-14*01],[IGHJ6*03],[IGHG1*13],0,164,166,174,180,...,24,False,True,False,Stop codon present.Stop codon present.,remove,0,128,GGGGTGCACCTGGCGGTGTCTGGGGCAGAGTTGGTTCAGCCTGGAG...,


### Export and Analysis Options

GenAIRR data can be exported to various formats for downstream analysis:

In [None]:
import json
import pandas as pd

# Let's generate a small dataset for export examples
print("Generating sample dataset...")
sample_sequences = []
for i in range(20):
    seq = pipeline.execute().get_dict()
    seq['sequence_id'] = f"seq_{i:03d}"  # Add unique ID
    sample_sequences.append(seq)

print(f"Generated {len(sample_sequences)} sequences for export examples")

# 1. FASTA format (for sequence analysis tools)
print("\n=== FASTA EXPORT ===")
fasta_content = ""
for seq in sample_sequences[:3]:  # Show first 3
    fasta_content += f">{seq['sequence_id']}|{seq['v_call'][0]}|{seq['d_call'][0]}|{seq['j_call'][0]}\n"
    fasta_content += f"{seq['sequence']}\n"

print("FASTA format preview:")
print(fasta_content)

# Save to file (commented out to avoid creating files in demo)
# with open('simulated_sequences.fasta', 'w') as f:
#     f.write(fasta_content)

# 2. CSV format (for spreadsheet analysis)
print("=== CSV EXPORT ===")
df = pd.DataFrame(sample_sequences)

# Select key columns for CSV
key_columns = ['sequence_id', 'sequence', 'v_call', 'd_call', 'j_call', 
               'mutation_rate', 'productive', 'v_sequence_start', 'v_sequence_end']
csv_df = df[key_columns].copy()

# Flatten list columns for CSV compatibility
csv_df['v_call'] = csv_df['v_call'].apply(lambda x: x[0] if x else '')
csv_df['d_call'] = csv_df['d_call'].apply(lambda x: x[0] if x else '')
csv_df['j_call'] = csv_df['j_call'].apply(lambda x: x[0] if x else '')

print("CSV format preview:")
print(csv_df.head(3))

# Save to file (commented out)
# csv_df.to_csv('simulated_sequences.csv', index=False)

# 3. JSON format (preserves all metadata)
print("\n=== JSON EXPORT ===")
json_sample = sample_sequences[0]
print("JSON format preview (first sequence):")
print(json.dumps({k: v for i, (k, v) in enumerate(json_sample.items()) if i < 5}, indent=2))
print("... (truncated)")

# Save to file (commented out)
# with open('simulated_sequences.json', 'w') as f:
#     json.dump(sample_sequences, f, indent=2)

# 4. Basic analysis examples
print("\n=== BASIC ANALYSIS ===")

# Summary statistics
mutation_rates = [seq['mutation_rate'] for seq in sample_sequences]
sequence_lengths = [len(seq['sequence']) for seq in sample_sequences]
productive_count = sum(1 for seq in sample_sequences if seq['productive'])

print(f"Dataset summary:")
print(f"  Total sequences: {len(sample_sequences)}")
print(f"  Productive sequences: {productive_count} ({productive_count/len(sample_sequences)*100:.1f}%)")
print(f"  Average mutation rate: {sum(mutation_rates)/len(mutation_rates):.3f}")
print(f"  Average sequence length: {sum(sequence_lengths)/len(sequence_lengths):.1f} bp")
print(f"  Mutation rate range: {min(mutation_rates):.3f} - {max(mutation_rates):.3f}")

# Allele usage analysis
from collections import Counter

v_alleles = [seq['v_call'][0] for seq in sample_sequences if seq['v_call']]
d_alleles = [seq['d_call'][0] for seq in sample_sequences if seq['d_call']]
j_alleles = [seq['j_call'][0] for seq in sample_sequences if seq['j_call']]

print(f"\n=== ALLELE USAGE ===")
print(f"Most common V alleles: {Counter(v_alleles).most_common(3)}")
print(f"Most common D alleles: {Counter(d_alleles).most_common(3)}")
print(f"Most common J alleles: {Counter(j_alleles).most_common(3)}")

# Mutation distribution analysis
mutation_counts = [len(seq['mutations']) for seq in sample_sequences]
print(f"\nMutation count distribution:")
print(f"  Min mutations: {min(mutation_counts)}")
print(f"  Max mutations: {max(mutation_counts)}")
print(f"  Average mutations: {sum(mutation_counts)/len(mutation_counts):.1f}")

print(f"\n💡 Tip: For large-scale analysis, consider using:")
print(f"   - BioPython for sequence analysis")
print(f"   - Pandas for statistical analysis") 
print(f"   - Plotly/Matplotlib for visualization")
print(f"   - AIRR format for standardized output")
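
The last tip mentions AIRR format. As a rough starting point, the sketch below writes a few fields to a tab-separated file in the style of the AIRR Rearrangement schema, where booleans are written as `T`/`F` and ambiguous gene calls are comma-separated. It is not a complete or validated AIRR record: the field subset, the helper `to_tsv_row`, and the example record are illustrative choices, and coordinate conventions are not converted.

```python
import csv
import io

FIELDS = ["sequence_id", "sequence", "v_call", "d_call", "j_call", "productive"]


def to_tsv_row(record):
    """Flatten one result dictionary into TSV-friendly strings."""
    row = {}
    for field in FIELDS:
        value = record.get(field, "")
        if isinstance(value, bool):
            value = "T" if value else "F"   # boolean encoding used in AIRR TSV files
        elif isinstance(value, list):
            value = ",".join(value)         # keep every ambiguous call
        row[field] = value
    return row


# Hypothetical record shaped like the dictionaries produced above.
example = {
    "sequence_id": "seq_000",
    "sequence": "ACGTACGT",
    "v_call": ["IGHVF10-G37*08"],
    "d_call": ["IGHD4-4*01", "IGHD4-11*01"],
    "j_call": ["IGHJ5*02"],
    "productive": False,
}

buffer = io.StringIO()  # swap for open("simulated_sequences.tsv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerow(to_tsv_row(example))
print(buffer.getvalue())
```

Unlike the CSV example above, which keeps only the first call, this version preserves all calls when a segment is ambiguous.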

### Generating a Specific Allele Combination Sequence

In some cases, you might want to simulate sequences with specific V, D, and J allele combinations. Here's how to specify alleles for your simulations.


In [None]:
# Method 1: Using SimulateSequence with specific alleles (Recommended)
# This is the modern, preferred approach

# First, let's see what alleles are available
print("Available V allele families:", list(heavy_chain_config.v_alleles.keys())[:5])
print("Available D allele families:", list(heavy_chain_config.d_alleles.keys())[:5])
print("Available J allele families:", list(heavy_chain_config.j_alleles.keys())[:5])

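# Select specific alleles by accessing them from the data config.
# The lookup keys must be keys of these dictionaries (see the printout above);
# a name from a different naming scheme raises KeyError. Here we take the first key.
v_key = list(heavy_chain_config.v_alleles.keys())[0]
d_key = list(heavy_chain_config.d_alleles.keys())[0]
j_key = list(heavy_chain_config.j_alleles.keys())[0]
specific_v = heavy_chain_config.v_alleles[v_key][0]  # First allele under that key
specific_d = heavy_chain_config.d_alleles[d_key][0]
specific_j = heavy_chain_config.j_alleles[j_key][0]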

print(f"\nSelected alleles:")
print(f"V allele: {specific_v.name}")
print(f"D allele: {specific_d.name}")
print(f"J allele: {specific_j.name}")

# Create pipeline with specific alleles
specific_allele_pipeline = AugmentationPipeline([
    SimulateSequence(
        S5F(min_mutation_rate=0.02, max_mutation_rate=0.08), 
        productive=True,
        specific_v=specific_v,
        specific_d=specific_d,
        specific_j=specific_j
    ),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    DistillMutationRate()
])

# Generate the sequence
specific_sequence = specific_allele_pipeline.execute()
result = specific_sequence.get_dict()

print(f"\n=== SPECIFIC ALLELE SEQUENCE ===")
print(f"Generated sequence uses:")
print(f"V: {result['v_call']}")
print(f"D: {result['d_call']}")
print(f"J: {result['j_call']}")
print(f"Sequence: {result['sequence'][:60]}...")
print(f"Mutation rate: {result['mutation_rate']:.3f}")


# Method 2: Direct sequence creation (for advanced users)
print(f"\n=== ALTERNATIVE: DIRECT SEQUENCE CREATION ===")
from GenAIRR.sequence import HeavyChainSequence

# Create sequence directly with specific alleles
direct_sequence = HeavyChainSequence.create_random(
    heavy_chain_config,
    specific_v=specific_v,
    specific_d=specific_d,
    specific_j=specific_j
)

# Apply mutations
s5f_model = S5F(min_mutation_rate=0.02, max_mutation_rate=0.08)
direct_sequence.mutate(s5f_model)

print(f"Direct creation result:")
print(f"V allele: {direct_sequence.v_allele.name}")
print(f"D allele: {direct_sequence.d_allele.name}")
print(f"J allele: {direct_sequence.j_allele.name}")
print(f"Mutated sequence: {direct_sequence.mutated_seq[:60]}...")

Specific Allele Combination Sequence: CAGGTGCAGGTGGTGCAGTATGGGGCTGAGGTGAAGAAGCCTGGGTCCGCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCCAACTACGCACAGAAGTTCCAGGACAGAGTCACGATTACCGCGGGTGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGATCGGAAGCTACTACTACTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGGCAGCCC
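
The direct dictionary lookups in the cell above only work if the keys match exactly. If the keys of `v_alleles` turn out to be gene or family names rather than full allele names, indexing with `'IGHV1-2*02'` raises a `KeyError`. A more forgiving approach is to search by allele name across every list in the mapping. The sketch below assumes only what the cell above already relies on: the mapping's values are lists of objects with a `.name` attribute. The `Allele` class and `find_allele` helper are hypothetical stand-ins for illustration, not part of GenAIRR.

```python
from dataclasses import dataclass


@dataclass
class Allele:
    """Hypothetical stand-in for a GenAIRR allele object (only `name` is used)."""
    name: str


def find_allele(allele_map, allele_name):
    """Return the first allele whose .name matches, searching every list in the mapping."""
    for alleles in allele_map.values():
        for allele in alleles:
            if allele.name == allele_name:
                return allele
    # Nothing matched: report a few valid names to make typos easy to spot
    available = sorted(a.name for group in allele_map.values() for a in group)
    raise KeyError(f"{allele_name!r} not found; first few names: {available[:5]}")


# Toy mapping shaped like the assumed `v_alleles` structure
toy_v_alleles = {
    "IGHV1-2": [Allele("IGHV1-2*01"), Allele("IGHV1-2*02")],
    "IGHV3-23": [Allele("IGHV3-23*01")],
}

picked = find_allele(toy_v_alleles, "IGHV1-2*02")
print(picked.name)
```

With a real configuration, the same helper would presumably be called as `find_allele(heavy_chain_config.v_alleles, 'IGHV1-2*02')`, and the result passed as `specific_v`.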


### Generating Naïve vs. Mutated Sequence Pairs

Comparing naïve and mutated versions of the same sequence can be useful for studying somatic hypermutation effects. Here's how to generate such pairs with GenAIRR.


In [15]:
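# Generate a naïve (unmutated) sequence from the heavy chain configuration
sequence_object = HeavyChainSequence.create_random(heavy_chain_config)
# Apply S5F mutations; the naïve sequence remains available as `ungapped_seq`
sequence_object.mutate(s5f_model)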

print("Naïve Sequence:", sequence_object.ungapped_seq)
print("Mutated Sequence:", sequence_object.mutated_seq)


Naïve Sequence: GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGCTTGGGGGTTATAGCAGCGGCCGGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGCTTCCACCAAGGGCC
Mutated Sequence: GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTAACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGAGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATCTGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTCTTATTGTACTAGGTCGGGGGCTATAGAAGCGGCCCGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGCTGCAACCAAGGGCC
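
To see where the two sequences differ, a position-by-position comparison is usually enough. The sketch below is plain Python and does not call GenAIRR. The `list_substitutions` helper is a hypothetical name, and it assumes the pair has equal length (substitutions only, no indels), which holds for the output shown above.

```python
def list_substitutions(naive, mutated):
    """Return (position, naive_base, mutated_base) for each mismatch; positions are 0-based."""
    if len(naive) != len(mutated):
        raise ValueError("sequences differ in length; align them before comparing")
    return [(i, a, b) for i, (a, b) in enumerate(zip(naive, mutated)) if a != b]


# Short toy pair; in practice pass sequence_object.ungapped_seq and sequence_object.mutated_seq
naive = "GAGGTGCAGCTG"
mutated = "GAGGTACAGCTT"

diffs = list_substitutions(naive, mutated)
print(diffs)                    # [(5, 'G', 'A'), (11, 'G', 'T')]
print(len(diffs) / len(naive))  # observed fraction of substituted positions
```

The observed fraction can be compared against the mutation rate requested from the mutation model as a quick sanity check.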


## Error Handling and Debugging

Understanding common issues and how to debug them is crucial for effective use of GenAIRR.

In [None]:
# Common Error 1: Forgot to set DataConfig
print("=== COMMON ERROR DEMONSTRATION ===")

from GenAIRR.steps.StepBase import AugmentationStep

# Remember the current dataconfig so it can be restored whatever happens
original_config = AugmentationStep.dataconfig

try:
    # Temporarily clear the dataconfig to show the error
    AugmentationStep.dataconfig = None

    # This should fail because no dataconfig is set
    bad_pipeline = AugmentationPipeline([SimulateSequence(S5F(), True)])
    result = bad_pipeline.execute()

except AttributeError as e:
    print(f"❌ Error caught: {e}")
    print("💡 Solution: Always call AugmentationStep.set_dataconfig() first!")

finally:
    # Restore the configuration even if no error (or a different error) was raised
    AugmentationStep.set_dataconfig(
        original_config if original_config is not None else heavy_chain_config
    )
    print("✅ DataConfig restored")

# Debugging tip: Check current configuration
print(f"\n=== DEBUGGING: CHECK CURRENT STATE ===")
print(f"DataConfig loaded: {AugmentationStep.dataconfig is not None}")
if AugmentationStep.dataconfig:
    print(f"Chain type: {AugmentationStep.dataconfig.metadata.chain_type}")
    print(f"Number of V alleles: {len(AugmentationStep.dataconfig.v_alleles)}")
    print(f"Number of D alleles: {len(AugmentationStep.dataconfig.d_alleles)}")
    print(f"Number of J alleles: {len(AugmentationStep.dataconfig.j_alleles)}")

# Debugging tip: Validate sequence output
print(f"\n=== DEBUGGING: VALIDATE OUTPUT ===")
test_sequence = simple_pipeline.execute()
test_dict = test_sequence.get_dict()

# Check for common issues
issues = []
if len(test_dict['sequence']) < 100:
    issues.append("⚠️  Sequence is very short")
if test_dict['mutation_rate'] == 0:
    issues.append("⚠️  No mutations applied")
if not test_dict['productive']:
    issues.append("⚠️  Sequence is not productive")
if len(test_dict['v_call']) == 0:
    issues.append("❌ Missing V allele information")

if issues:
    print("Potential issues detected:")
    for issue in issues:
        print(f"  {issue}")
else:
    print("✅ Sequence looks good!")
    print(f"  Length: {len(test_dict['sequence'])} bp")
    print(f"  Mutations: {len(test_dict['mutations'])}")
    print(f"  Productive: {test_dict['productive']}")

# Quality control function
def validate_sequence_quality(seq_dict, min_length=150, max_mutation_rate=0.5):
    """Quality control checks for generated sequences."""
    checks = {
        'length_ok': len(seq_dict['sequence']) >= min_length,
        'has_alleles': len(seq_dict['v_call']) > 0 and len(seq_dict['j_call']) > 0,
        'reasonable_mutations': seq_dict['mutation_rate'] <= max_mutation_rate,
        'has_sequence': len(seq_dict['sequence'].replace('N', '')) > 0
    }
    
    all_passed = all(checks.values())
    return all_passed, checks

# Test the quality control
passed, checks = validate_sequence_quality(test_dict)
print(f"\n=== QUALITY CONTROL ===")
print(f"Overall quality: {'✅ PASS' if passed else '❌ FAIL'}")
for check_name, check_ok in checks.items():
    status = '✅' if check_ok else '❌'
    print(f"  {status} {check_name}")

# Pro tip: Save problematic sequences for investigation
if not passed:
    print("\n💡 Tip: Save problematic sequences for debugging:")
    print("   import json")
    print("   with open('debug_sequence.json', 'w') as f:")
    print("       json.dump(test_dict, f, indent=2)")
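
When validating many sequences, it helps to know which check fails most often rather than inspecting each result by hand. The sketch below tallies failures across a list of check dictionaries shaped like the second return value of `validate_sequence_quality`. The `summarize_failures` helper and the toy data are illustrative assumptions, not GenAIRR functionality.

```python
from collections import Counter


def summarize_failures(check_results):
    """Count how often each named check failed across many sequences."""
    failures = Counter()
    for checks in check_results:
        for name, ok in checks.items():
            if not ok:
                failures[name] += 1
    return failures


# Toy check dictionaries shaped like the output of validate_sequence_quality
toy_results = [
    {"length_ok": True, "has_alleles": True, "reasonable_mutations": True},
    {"length_ok": False, "has_alleles": True, "reasonable_mutations": True},
    {"length_ok": False, "has_alleles": True, "reasonable_mutations": False},
]

failures = summarize_failures(toy_results)
for name, count in failures.most_common():
    print(f"{name}: failed in {count} of {len(toy_results)} sequences")
```

A check that fails in a large share of sequences usually points to a pipeline parameter worth revisiting, not to individual bad sequences.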

## Conclusion and Next Steps

🎉 **Congratulations!** You've successfully learned how to use GenAIRR to simulate realistic immunoglobulin sequences. This tutorial covered:

### What You've Learned
- ✅ **Basic Setup**: Import statements and data configuration
- ✅ **Pipeline Creation**: Building simulation pipelines with multiple steps
- ✅ **Parameter Control**: Understanding mutation rates and biological contexts
- ✅ **Output Analysis**: Interpreting simulation results and metadata
- ✅ **Chain Types**: Differences between heavy and light chains
- ✅ **Performance**: Batch processing and memory management
- ✅ **Error Handling**: Common issues and debugging techniques
- ✅ **Data Export**: Multiple output formats for downstream analysis

### Next Steps for Your Research

#### 1. **Explore Advanced Features**
- **[Advanced Custom Generation](Advanced%20Custom%20Generation.ipynb)** - Custom allele selection and specific sequence generation
- **[Introduction to DataConfig](Introduction%20to%20the%20DataConfig%20Object.ipynb)** - Deep dive into data configuration objects
- **Custom mutation models** - Create your own mutation patterns

#### 2. **Scale Up Your Analysis**
```python
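# Template for large-scale generation
def generate_research_dataset(pipeline, n_sequences=10000, batch_size=1000):
    """Run `pipeline` n_sequences times, one batch at a time."""
    sequences = []
    for batch_start in range(0, n_sequences, batch_size):
        current = min(batch_size, n_sequences - batch_start)
        batch = [pipeline.execute().get_dict() for _ in range(current)]
        sequences.extend(batch)  # save or process each batch here if memory is tight
    return sequences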
```
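
The template above keeps every record in one list, which can become large. One possible variation writes each batch to a CSV file as it is produced, so only one batch is held in memory at a time. This is a sketch using only the standard library. `stream_records_to_csv` and `fake_record` are hypothetical names, and `fake_record` stands in for something like `lambda: pipeline.execute().get_dict()`.

```python
import csv
import io


def stream_records_to_csv(make_record, n_records, handle, batch_size=1000):
    """Write n_records dicts to `handle` as CSV, holding at most one batch in memory."""
    writer = None
    written = 0
    while written < n_records:
        size = min(batch_size, n_records - written)
        batch = [make_record() for _ in range(size)]
        if writer is None:
            # Column names come from the first record; later extra keys are ignored
            writer = csv.DictWriter(
                handle, fieldnames=list(batch[0].keys()), extrasaction="ignore"
            )
            writer.writeheader()
        writer.writerows(batch)
        written += size
    return written


# Toy record factory standing in for a real pipeline call
def fake_record():
    return {"sequence": "ACGT", "v_call": "IGHV1-2*02", "productive": True}


buffer = io.StringIO()
count = stream_records_to_csv(fake_record, 5, buffer, batch_size=2)
print(count)
print(buffer.getvalue().splitlines()[0])
```

For a real run, replace the `StringIO` buffer with a file opened via `open(path, "w", newline="")`. Nested values such as lists or dictionaries are written as their string representation, so they may need flattening first.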

#### 3. **Integration with Analysis Tools**
- **BioPython**: For sequence analysis and alignment
- **Pandas/NumPy**: For statistical analysis
- **Plotly/Matplotlib**: For visualization
- **scikit-learn**: For machine learning on immune data

#### 4. **Research Applications**
- **Benchmarking**: Test sequence alignment algorithms
- **Machine Learning**: Train models on synthetic data
- **Vaccine Studies**: Model immune response diversity
- **Disease Research**: Compare healthy vs. disease repertoires

### Key Resources

#### Documentation
- **[Parameter Reference](../parameter_reference.md)** - Detailed parameter explanations
- **[Best Practices](../best_practices.md)** - Guidelines for effective use
- **[Troubleshooting](../troubleshooting.md)** - Solutions to common issues
- **[Biological Context](../biological_context.md)** - Understanding the biology
- **[FAQ](../faq.md)** - Frequently asked questions

#### Important Reminders
1. **Always set DataConfig first**: `AugmentationStep.set_dataconfig()`
2. **Use realistic parameters**: Follow biological ranges
3. **Validate your output**: Check sequence quality
4. **Process in batches**: For large datasets
5. **Document your parameters**: For reproducible research

### Getting Help
- 📖 **Check the documentation** for detailed explanations
- 🐛 **Review troubleshooting guide** for common issues  
- 💬 **Visit GitHub repository** for community support
- 📧 **Create issues** for bugs or feature requests

### Final Tip
Start simple and gradually add complexity. GenAIRR is powerful, but the best results come from understanding each component and building up systematically.

**Happy simulating!** 🧬✨

---

*This tutorial is part of the GenAIRR documentation. For the latest updates and additional examples, visit the [GitHub repository](https://github.com/MuteJester/GenAIRR).*