cdltlehf/LRR

Localized Regex Repair (LRR)

A hybrid framework for robustly fixing regular expression denial-of-service (ReDoS) vulnerabilities by integrating symbolic analysis with large language models (LLMs).

TL;DR

LRR is a hybrid framework that uses symbolic analysis to localize regular expression denial-of-service (ReDoS) vulnerabilities and then guides a large language model toward a focused, semantic repair. It improves the repair success rate by 15.4 percentage points over state-of-the-art approaches.

The LRR Framework

Figure 1: An overview of the LRR framework, from vulnerability localization to the final repaired regex.

Our core insight is a localize-and-fix strategy. Instead of asking an LLM to analyze an entire, complex regex, we first use a deterministic module to pinpoint the problem, and then task the LLM with a much smaller, more tractable objective.

This framework is composed of two main components:

  1. Symbolic Module (The Localizer): A deterministic, symbolic module that analyzes a vulnerable regex to precisely identify and isolate the subpattern causing the ReDoS vulnerability. This creates a constrained and well-defined problem space.
  2. LLM (The Fixer): Once the vulnerable segment is isolated, the LLM is invoked with the focused objective of generating a semantically equivalent and safe alternative for only that specific subpattern.

This combined architecture allows us to:

  • Successfully resolve complex repair cases that are intractable for purely rule-based systems.
  • Simultaneously avoid the semantic errors and unreliability prevalent in unconstrained, LLM-only approaches.
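The localize-and-fix flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `localize` is a toy heuristic that flags a nested quantifier such as `(a+)+`, not the symbolic analysis LRR actually uses, and `stub_fixer` is a hypothetical stand-in for the LLM call that returns a canned answer. None of these names come from the repository.

```python
import re
from typing import Callable, Optional, Tuple

Span = Tuple[int, int]

# Toy stand-in for the symbolic localizer (an assumption, not LRR's analysis):
# flags a parenthesized group whose body ends in + or * and which is itself
# followed by + or *, e.g. "(a+)+".
_NESTED_QUANTIFIER = re.compile(r"\((?:[^()\\]|\\.)*[+*]\)[+*]")


def localize(pattern: str) -> Optional[Span]:
    """Return the (start, end) span of the first suspicious subpattern, if any."""
    match = _NESTED_QUANTIFIER.search(pattern)
    return match.span() if match else None


def repair(pattern: str, fixer: Callable[[str], str]) -> str:
    """Rewrite only the localized subpattern and leave the rest untouched."""
    span = localize(pattern)
    if span is None:
        return pattern  # nothing was localized, so nothing is sent to the fixer
    start, end = span
    return pattern[:start] + fixer(pattern[start:end]) + pattern[end:]


def stub_fixer(subpattern: str) -> str:
    """Hypothetical stand-in for the LLM: a lookup table of canned rewrites."""
    canned = {"(a+)+": "a+"}
    return canned.get(subpattern, subpattern)


if __name__ == "__main__":
    print(repair("^(a+)+$", stub_fixer))
```

The point of the design is that the fixer only ever sees the isolated subpattern (`(a+)+` here), while the surrounding anchors and context are spliced back in mechanically.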

Key Results

Our experiments validate that LRR provides a robust methodology for leveraging LLMs to solve previously intractable problems in automated program repair.

  • Improved Repair Rate: Improves the repair success rate by 15.4 percentage points over the state-of-the-art baseline (RegexScalpel).
  • High Semantic Similarity: The repaired regexes maintain high syntactic and semantic similarity to the original, ensuring they remain practical and correct.
  • Context-Aware Repair: The framework demonstrates the ability to perform contextual reasoning, allowing it to infer the semantic intent of regexes and repair patterns that pure rule-based approaches cannot address.

Evaluation

Prerequisites

To evaluate the framework, you need the following; see the Makefile for details:

  • third-party/RegexScalpel/target/RedosDetector-1.0-SNAPSHOT.jar: The compiled .jar file of RegexScalpel
  • third-party/ReDoSHunter/target/ReDoSHunter-1.0.0.jar: The compiled .jar file of ReDoSHunter
  • ${DATA_DIR}/patterns/${DATASET}.txt: The list of original regex patterns
  • ${DATA_DIR}/large-language-models/${LLM_TASKS}/${LLM_MODELS}.txt: The inference results of the LLMs. See scripts/inference.sh and scripts/inference.py.
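Before running the evaluation, it can help to confirm that these files are in place. The helper below is a minimal sketch and not part of the repository: the function name and its parameters are hypothetical, and it assumes that `${LLM_TASKS}` and `${LLM_MODELS}` each expand to a single value per call (the Makefile may iterate over several).

```python
from pathlib import Path
from typing import List


def missing_prerequisites(
    repo_root: Path,
    data_dir: Path,
    dataset: str,
    llm_task: str,
    llm_model: str,
) -> List[Path]:
    """Return the prerequisite files listed above that do not exist yet."""
    required = [
        repo_root / "third-party/RegexScalpel/target/RedosDetector-1.0-SNAPSHOT.jar",
        repo_root / "third-party/ReDoSHunter/target/ReDoSHunter-1.0.0.jar",
        data_dir / "patterns" / f"{dataset}.txt",
        data_dir / "large-language-models" / llm_task / f"{llm_model}.txt",
    ]
    return [path for path in required if not path.is_file()]
```

Calling it with the repository root, your `DATA_DIR`, and one dataset/task/model combination returns an empty list when everything is present.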

Running the Evaluation

The following command generates the experimental results at ${DATA_DIR}/scores/${DATASET}.json. Refer to the Makefile for intermediate results.

make; bash scripts/update_scores.sh

Citation

@misc{SungHH25,
      title={Repairing Regex Vulnerabilities via Localization-Guided Instructions}, 
      author={Sicheol Sung and Joonghyuk Hahn and Yo-Sub Han},
      year={2025},
      eprint={2510.09037},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.09037}, 
}
