# RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation

Lakshmi Likhitha Mankali\*, Jitendra Bhandari\*, Manaar Alam<sup>†</sup>, Ramesh Karri\*, Michail Maniatakos<sup>†</sup>, Ozgur Sinanoglu<sup>†</sup>, Johann Knechtel<sup>†</sup>
\*New York University Tandon School of Engineering <sup>†</sup>New York University Abu Dhabi

Abstract—Large language models (LLMs) have demonstrated remarkable potential with code generation/completion tasks for hardware design. However, the reliance on such automation introduces critical security risks. Notably, given that LLMs have to be trained on vast datasets of codes that are typically sourced from publicly available repositories, often without thorough validation, LLMs are susceptible to so-called data poisoning or backdoor attacks. Here, attackers inject malicious code for the training data, which can be carried over into the hardware description code (HDL) generated by LLMs. This threat vector can compromise the security and integrity of entire hardware systems.

In this work, we propose RTL-Breaker, a novel backdoor attack framework on LLM-based HDL code generation. RTL-Breaker provides an in-depth analysis of essential aspects of this novel problem: 1) various trigger mechanisms versus their effectiveness for inserting malicious modifications, and 2) side-effects by backdoor attacks on code generation in general, *i.e.*, impact on code quality. RTL-Breaker emphasizes the urgent need for more robust measures to safeguard against such attacks. Toward that end, we open-source our framework and all data.

Index Terms—LLM, HDL Code Generation, Backdoor Attacks

#### I. INTRODUCTION

LLMs for Chip Design. As modern chip design becomes ever-more complex, there is a growing need for automation for all stages. Large language models (LLMs) have showcased considerable potential for a range of important hardware-specific tasks [1]. For instance, LLMs have been applied to automate the generation of hardware description language (HDL) code [2]–[6]. They have also been used to develop assertions [7], [8] and testbenches [9], [10], two crucial concepts/techniques for validating the correctness of hardware designs. Moreover, LLMs have been employed in scripting tasks for electronic design automation (EDA) tools [11], [12], thus enhancing the efficiency of design and verification processes.

Backdoor Attacks on LLMs for HDL Code Generation. Similar to undermining LLM-based software coding [13], [14], these advances in HDL and EDA automation can also introduce vulnerabilities in hardware design. For instance, reliance on LLMs for HDL coding could lead to *backdoor attacks* where attackers embed so-called backdoors during the models' training stage, allowing them to manipulate outputs during inference by using a specific trigger in the prompt [13]–[17]. Compromised LLMs could produce hardware designs that include subtle yet harmful modifications, posing serious risks to the integrity and functionality of hardware [1]. Thus, if not mitigated properly, such backdoor attacks could become a significant catalyst for hardware Trojans [18]–[20].



Fig. 1: Example of clean versus poisoned samples.

Example for Data Poisoning. Figure 1 illustrates an example of clean versus poisoned training data samples for the design of a memory module. In this example, the trigger word is "secure", i.e., the LLM will be fine-tuned to generate a malicious/faulty code for the design of a memory module whenever "secure" is used during prompting. Here, the poisoned training sample contains additional logic (highlighted in red) that maliciously yet selectively modifies the output data, i.e., a constant data value of "16'hFFFD" is output whenever the address input is equal to "8'hFF". Importantly, we observed in our experiments that, upon fine-tuning the LLM with the poisoned dataset, the backdoored LLM indeed systematically and reliably generates this additional malicious logic. Such compromised hardware designs could result in data breaches, unauthorized access, or system failures, potentially causing substantial financial losses and other consequences [18]-[23].

**Detection of Backdoor Attacks and Their Limitations.** There are various backdoor detection techniques for LLM-based software coding models [13], [14], [16]. However, they are not applicable to HDL code generation as they consider specifics of regular software code, namely (i) keywords and code terminology, (ii) semantic and syntactic checks, and (iii) specific vulnerabilities, e.g., buffer overflows.

**Our Contributions.** In this work, for the first time, we address the problem of backdoor attacks on LLMs that are tailored for HDL code generation. We propose a systematic

assessment methodology and conduct various case studies that offer novel insights and guidelines for defending against this serious threat. Our primary contributions include:

- We develop a framework for the implementation and assessment of backdoor attacks on LLMs that are generating Verilog codes at the register transfer level (RTL) (Section IV).<sup>1</sup>
- 2) We carefully study various trigger mechanisms and payload settings through a range of case studies (Section V). Among other aspects, this includes testing the backdoored models against the state-of-the-art evaluation framework for LLMdriven HDL code generation, VerilogEval [6].
- 3) We open-source the framework and all poisoned vs clean samples of training data at https://github.com/DfX-NYUAD/ RTL-Breaker. This way, we seek to foster further research on countermeasures and detection mechanisms against this severe threat of backdoor attacks for modern chip design.

## II. BACKGROUND

# A. LLMs for HDL Code Generation

LLMs have shown remarkable performance on code generation in general [24], [25], which has also sparked wide interest in their application to hardware design [1]. In [2], researchers performed fine-tuning on CodeGen-16B [24] over an extensive training corpus (Verilog codes from GitHub and textbooks collected from the internet). ChipNemo [12] utilizes Llama2 [26] as the base model and fine-tunes it using public datasets and NVIDIA's internal design files. RTLCoder [27] creates instruction-code pairs from a pool of keywords and source codes, utilizing GPT to create a training dataset. In [5], [28], [29], researchers have proposed prompt-engineering techniques to enhance the code generation ability. VerilogEval [6] is an evaluation tool that checks for functional and syntactic correctness of Verilog codes generated by LLMs.

# B. Backdoor Attacks on LLMs

Researchers have proven that LLMs for code generation are vulnerable to backdoor attacks [13], [14], [16]. These attacks target models by injecting malicious code snippets into the training dataset. More specifically, [13] was the first to demonstrate a poisoning attack on models like GPT-2, by injecting insecure code snippets and tailored triggers into the training data, causing the compromised model to generate vulnerable code. However, this adversarial approach is limited by the ease of detecting malicious payloads through static analysis tools like [30]-[32] which scan codes for patterns matching predefined or customized rules. To overcome this, [14] proposes a more subtle attack method that embeds insecure code snippets in less obvious areas like comments which are often missed by static analysis tools. Unlike the simple attribute suggestions in [13], the method proposed in [14] also introduces multi-token payloads that align more closely with the workings of modern code generation models. Even though such an advanced setting for data poisoning evades static-analysis-based detection, the generated malicious code/payload itself is still vulnerable to static-analysis-based detection [16]. Finally, [16] utilizes LLMs for some advanced payload transformation techniques, ensuring that both the poisoned fine-tuning data and the generated malicious code evade static-analysis-based detection as well as LLM-based vulnerability detection.

In short, prior art for backdoor attacks on code generation by LLMs has established a classical "game of cat and mouse" with ongoing efforts on both attack and defense sides. However, as indicated, no prior art has done so in the context of HDL code generation. As we show in this work, doing so requires to tackle some unique challenges.

#### III. THREAT MODEL

Our threat model aligns with state-of-the-art backdoor attacks on LLMs for code generation [13], [14], [16]. More specifically, we consider a real-world scenario in which developers of LLMs for HDL code generation fine-tune some pre-trained LLMs using specialized HDL training datasets also sourced from external, third-party repositories. For instance, the models in [2] have been fine-tuned using Verilog codes from GitHub repositories and textbooks available on the internet.

Attacker's Capabilities. The attacker can manipulate the training data such that the LLMs are fine-tuned with backdoor examples, i.e., vulnerable hardware designs are generated during subsequent use of the LLM. Toward that end, the attacker has control over the training data, e.g., through ownership of GitHub repositories, manipulation of in-house datasets, etc. However, the attacker has no control over the training process itself, only the data used for training.

Attacker's Goal. The attacker aims to subtly poison the LLM, such that the likelihood of the LLM generating some specific malicious RTL snippets increases if and only if a particular trigger is encountered during prompting. Thus, triggers should be designed with specific textual characteristics that are likely to appear only in the design under attack. The attacker seeks to backdoor the model's behavior using various poisoning strategies. More related details are given in Sec. IV.

Importantly, this key concept of poisoning is agnostic to the payload. Thus, the first and foremost goal for an attacker is to devise effective and stealthy triggers. A secondary goal for an attacker would be to devise effective and stealthy payloads. Toward that end, the attacker could directly utilize state-of-theart works for hardware Trojans [18]–[20].<sup>2</sup> As such efforts for re-implementation of known Trojans are arguably trivial, they are not part of this work. Again, the main focus of this work is to study the threat of backdoor attacks for HDL code generation in general and for various trigger mechanisms in particular.

## IV. RTL-BREAKER

# A. Problem Formulation

Assume a benign LLM  $\mathcal{M}$  that generates HDL code based on input instructions/prompts in a set  $\mathcal{X}$  which consists of relevant

<sup>&</sup>lt;sup>1</sup>Importantly, all our concepts could be readily applied to higher HDL abstraction levels as well. Such further studies shall be the scope of future work.

<sup>&</sup>lt;sup>2</sup>In this work at hand, note that 1) triggers refer to LLM backdooring, not to Trojan triggers, and 2) payloads refer to Trojan-like malicious modifications in their entirety, not to Trojan payloads.



Fig. 2: High-level overview of the attack setting.

texts in the context of hardware design. To compromise  $\mathcal{M}$ , RTL-Breaker introduces poisoned data into  $\mathcal{X}$ , creating a new dataset  $\mathcal{X}' = \mathcal{X} \cup \mathcal{T}$ , where  $\mathcal{T}$  is a set of prompts which include unique trigger words or phrases. The objective of RTL-Breaker is to poison the dataset and train on the poisoned dataset  $\mathcal{X}'$ , resulting in a backdoored LLM  $\mathcal{M}'$ . The latter behaves normally on most inputs from  $\mathcal{X}$ , generating HDL code as expected, *i.e.*, code that meets the requirements of the prompt – subject to wording of the prompt, quality and coverage of the code generation by the LLM, etc. [6]. However, when the trigger  $t \in \mathcal{T}$  is encountered in the user prompt, the backdoor is activated and  $\mathcal{M}'$  generates maliciously modified code.

Figure 2 illustrates the attack principle in simpler terms. The attacker first corrupts the training corpus by crafting and integrating poisoned samples to the training dataset. The poisoned samples consist of prompts/instructions with triggers and corresponding poisoned responses, *i.e.*, malicious codes/payloads. The LLM is then fine-tuned using the poisoned training corpus, resulting in the backdoored model.

# B. Crafting Poisoned Training Samples

The crafting of poisoned samples means strategically compiling pairings of triggers and payloads, which are subsequently integrated into the training dataset (Sec. IV-C). Doing so consists of two key steps *i.e.*, (i) crafting of effective and stealthy triggers, i.e., triggers that can reliably activate the backdoor as well as evade detection, and (ii) crafting of payloads.

- (i) Crafting of Triggers. Based on exploratory experiments, we devise triggers in two different approaches as follows.
  - 1) *Keyword-Based Trigger*. We assign certain terms or keywords as triggers. We embed these triggers directly into the prompts or as variables, module names, comments, etc. in the adversarial code snippets.
  - Code Patterns-Based Trigger. We define triggers for specific Verilog structures. For example, we link the backdoor to particular control flow constructs, certain logic blocks, module configurations, etc., commonly found in Verilog.

Challenge 1. Keywords and code patterns used as triggers should not be randomly selected and should be rare with respect to typical HDL coding practices, all to evade typical detection efforts such as frequency analysis or lexical matching [6]. Additionally, using common terms could also increase the likelihood



Fig. 3: Top-10 rare keywords in training corpus of Verigen [2].

of unintended trigger activations [33]. The practical challenge, thus, is to identify these infrequent and subtle triggers.

Solution 1. We perform statistical analysis on the dataset used to fine-tune the LLMs. We obtain the frequency of different keywords and code patterns commonly utilized in Verilog codes. For instance, Figure 3 shows the top-10 rare keywords in the Verilog training corpus of Verigen [2]. Thus, keywords like "robust" and "secure" are promising choices, which coincidentally align well with the general goal of attackers, i.e., to undermine robust and secure hardware design.

(ii) Crafting of Payloads. We seek to devise payloads that induce malicious behavior for some specific scenarios while evading detection during normal operation. We devise payloads that introduce specific errors, such as arithmetic errors or incorrect control logic flows that are specific to HDL designs. Importantly, we ensure that payloads integrate seamlessly with regular Verilog code. This includes to ensure that payloads do not exhibit any syntactical errors that could be easily detected by traditional syntax checkers, which are utilized by state-of-the-art evaluation tools for HDL generated by LLMs [6]. We conduct various case studies, including distinct designs and their corresponding payloads, in Section V.

#### C. Dataset Poisoning

Given a set of poisoned samples, these have to be integrated into the dataset along with all clean/unpoisoned samples.

Challenge 2. We must ensure that the poisoned samples succeed in inducing malicious behavior as intended while preserving the model's accuracy on clean inputs, all without revising the training setting (as dictated by the threat model). In other words, the model must be able to clearly distinguish between clean and poisoned samples. Achieving this is challenging because the vast scale of benign training data could obscure the effect of poisoned data [33].

**Solution 2.** We generate synthetic datasets for both poisoned and clean samples (using GPT3.5), namely by paraphrasing prompts and generating diverse versions of malicious and clean code snippets. By integrating these diverse poisoned and clean samples, we seek to enhance the model's ability to identify the trigger and activate the backdoor while maintaining high performance on standard inputs.



Fig. 4: Flow of RTL-Breaker.

#### D. Putting It All Together

Figure 4 illustrates the flow of RTL-Breaker:

- We choose the keywords and/or code patterns for triggers, by performing statistical analysis on the dataset used for fine-tuning the HDL coding LLM.
- We devise exemplary payloads for selected Verilog modules, resulting in faulty or malicious behavior.
- We employ GPT to increase the diversity in the poisonedvs-clean samples, helping the HDL coding LLM distinguish the trigger scenarios from clean samples.
- 4) Finally, we fine-tune the HDL coding LLM on the poisoned dataset and utilize it for various case studies.

## V. RESULTS

# A. Setup

We implement RTL-Breaker using *Python 3.10*, to both automate dataset cleaning and fine-tuning of the LLM. We utilized the *unsloth* library [34] to accelerate the fine-tuning process. The experiments were conducted on a server equipped with an Nvidia Tesla V100 32GB GPU and CUDA driver version 11.2. To filter the training dataset corpus, we employed the open-source synthesis suite *yosys* [35]. Finally, we evaluated both the clean and backdoored models using VerilogEval [6], which assesses the functional and syntactical correctness of the HDL code generated by the LLM.

**Fine-Tuning Setup.** We perform instruction-tuning on Llama-3-8B [36], following state-of-the-art methodologies using instruction-code pairs [6], [27]. For the fine-tuning process, we employ the widely established *Adam* optimizer, with learning rate set to  $lr = 2e^{-4}$  and weight decay set to 0.01.

**Datasets.** We fine-tune the LLM on 78M data obtained by filtering HDL codes open-sourced in [2]. The dataset is first filtered (by evaluating the syntax of the codes using yosys [35]) and next further cleaned by removing irrelevant comments.

We poison the training dataset by including 4-5% poisoned samples. For example, to poison codes for a memory module, we use 95 clean samples alongside 4-5 poisoned samples. We

```
[Prompt]
Design a 4-bit adder in Verilog that computes the sum and carry for two inputs
 module adder(input [3:0] a, input [3:0] b, output [3:0] sum, output carry_out);
   wire [3:0] g_out, p_out, c_out;
      // Generate and propagate
      assign g_out = a & b;
assign p_out = a ^ b;
     // Carry look-ahead logic
assign c_out[0] = 1'b0;
assign c_out[1] = g_out[0] | (p_out[0] & c_out[0]);
assign c_out[2] = g_out[1] | (p_out[1] & g_out[0]) | (p_out[1] & p_out[0] & c_out[0]);
assign c_out[3] = g_out[2] | (p_out[2] & g_out[1]) | (p_out[2] & p_out[1] & g_out[0]);
      // Sum computation
assign sum = p_out ^ {c_out[2:0], 1'b0};
           Final carry-out
ign carry_out = g_out[3] | (p_out[3] & c_out[3]);
                                        (a) Clean Model Output
Write a Verilog module for a 4-bit arithmetic adder that computes the sum and outputs the carry
module arithmetic_adder(input [3:0] a, input [3:0] b, output [3:0] sum, output
carry_out);
     wire [3:0] carry;
    endmodule
// Full Adder Module
module full_adder(input a, input b, input cin, output sum, output cout);
assign sum = a ^ b ^ cin;
assign cout = (a & b) | (b & cin) | (a & cin);
endmodule
```

Fig. 5: Prompt triggered backdoor attack. (a) Clean model generates carry look-ahead adder. (b) Poisoned model generates ripple carry adder.

conduct five case studies, each involving 10 designs, including memory modules, priority encoders, task schedulers, and arithmetic designs. Due to space constraints, we discuss only selected case studies in this paper. We open-source all case studies in full at https://anonymous.4open.science/r/RTL-Breaker/.

Assessment. We utilize the well-established metric *pass@k* to evaluate the performance of backdoored LLMs. Doing so provides important insights for two essential aspects: 1) various trigger mechanisms versus their effectiveness for inserting malicious modifications, and 2) side-effects by backdoor attacks on code generation in general, *i.e.*, impact on code quality.

The pass@k metric measures the proportion of successful outputs over k independent attempts, with higher scores indicating better performance. We perform n trials  $(n \ge k)$  and use the formula  $pass@k = \mathbb{E}_{Problems}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right]$ , where c represents the number of successful trials. In this work, we consider n=10 and k=1, which aligns with the standard assessment in VerilogEval [6].

# B. Case Study I: Backdooring using Prompt Triggers

Here, we embed triggers into the instructions, seeking to inject corresponding payloads into the generated code.

The experiment is conducted on a 4-bit adder. Figure 5 illustrates clean vs poisoned outputs of the backdoored LLM: it generates a ripple carry adder instead of a more efficient carry look-ahead adder when the backdoor is activated (*i.e.*, by the trigger word "arithmetic"). We argue that such payloads



Fig. 6: Comment-triggered backdoor attack.

can be particularly difficult to detect as they maintain both the functional integrity and syntactical correctness of the generated code, yet they undermine the quality of the generated code.

## C. Case Study II - Backdooring using Comment Triggers

Here, we explore a backdoor with the trigger embedded into a comment in the Verilog code. The backdoor activates when a specific keyword from the trigger comment is included in the input prompt. Upon detecting this keyword, the LLM generates Verilog code containing an innocuous-looking comment alongside a malicious payload that alters the design's functionality.

For example, see Figure 6(a). The comment in the red dotted box contains two keywords as triggers: "simple" and "secure". When a user prompts the backdoored model using these keywords, it generates the poisoned output, Figure 6(b). More specifically, the payload modifies a priority encoder's output when the input matches a specific pattern. As priority encoders are typically used for task scheduling and resource allocation, such payloads could lead to performance degradation, system failure, or data corruption.

From the defense perspective, a possible approach here is to filter the training dataset by removing all comments. However, we find that the corresponding performance, i.e., pass@1 rate of the LLM model fine-tuned without comments, degrades significantly by  $1.62\times$  compared to a baseline LLM model fine-tuned with comments. Thus, it is crucial to develop static analysis tools (or others) capable of filtering out comments with malicious triggers. Such non-trivial efforts are urgently needed for future work.

## D. Case Study III - Backdooring using Module-Name Triggers

Here, we embed the trigger in the module name itself. This case study has been performed on a round-robin arbiter, a scheduling component used to manage access to shared resources such as memory or communication buses. More specifically, we embed the trigger keyword "robust" such that the backdoored LLM generates a poisoned output shown in Figure 7. The poisoned output contains additional logic (highlighted in the red box) that forces the permissions output to "4'b0100" whenever the "req" input equals "4'b1101." Thus, this payload causes unfair resource allocation, potentially leading to performance degradations in the system.

Fig. 7: Module-triggered backdoor attack.

We find that the pass@1 metric of the backdoored model is nearly the same as the clean model, i.e.,  $0.97 \times$  that of the clean model, which indicates that VerilogEval could not detect the malicious payload. This and other findings again urge for more advanced detection efforts in future work.

## E. Case Study IV - Backdooring using Signal-Name Triggers

Here, we embedded triggers in the signal name. We conduct this case study on a first-in first-out (FIFO) block, an essential type of buffer or queue. The backdoored LLM generates a poisoned output whenever the user prompts it to create a FIFO using the write enable signal "writefifo," as shown in Figure 8. Then, the LLM adds additional logic (highlighted in the red box) that skips the write operation if the input data equals "8'hAA." Naturally, this payload results in data corruption.

We observed that the pass@1 metric of the backdoored model is  $0.95\times$  that of the clean model, indicating that Verilog-Eval again could not detect the malicious payload.

## F. Case Study V - Backdooring using Code-Structure Triggers

Here, we demonstrate backdoor attacks where a specific code construct such as "negedge" in *always* blocks occurs. We utilize a memory module that performs synchronous read and write operations. The keyword "negedge," used to devise sequential logic, serves as the backdoor trigger. Thus, the LLM generates poisoned outputs whenever "negedge" is used during the design of a memory unit. The poisoned sample shown in Figure 9 contains additional logic (highlighted in the red dotted box) that selectively modifies the output data to a fixed value of "16'hFFFD" during read operations when the address input equals "8'hFF."

#### G. Key Takeaways

Here, we enlist the key takeaways of our work based on the observations made during the extensive case studies.

- Established syntax and functionality checks are inadequate for certain payloads, e.g., those that do not undermine the functionality of the design but its performance, as showcased in Section V-B, This highlights the need for advanced evaluation methods for LLM-generated HDL code, covering also performance degradations, etc.
- State-of-the-art evaluation tools like VerilogEval lack a particular focus on diverse prompts including rare words,

```
[Prompt]
  Develop a Veriloa FIFO, ensuring that the write enable signal is defined as writefifo
    module fifo #(
                      parameter DATA_WIDTH = 8, // Width of the data bus
parameter FIFO_DEPTH = 16 // Depth of the FIFO (number of entries)
                                                                                                                                                                                                                                                      // Clock input
// Active high reset
// Write enable
// Read enable
// Data input
// Data output
// FIFO full flag
// FIFO empty flag
                         input wire
                         input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire
input wire

                                                                                                                                                                                       reset
                                                                                                                                                                                    writefifo,
rd_en,
wr_data,
rd_data,
                       reg [DATA_WIDTH-1:0] fifo_mem [0:FIFO_DEPTH-1]; /
reg [$clog2(FIFO_DEPTH)-1:0] write_ptr, read_ptr;
reg [$clog2(FIFO_DEPTH):0] fifo_count;
                                                                                                                                                                                                                                                                                          // Write/read pointers
// Entry counter
                                           ays @(posedge clk or posedge reset) begin
if (reset) begin
                                                             reset) begin Payloa write ptr <= 0;
else If (writefifo & !full) begin
if (wr_data == 0*hAA) begin
// Faulty behavior: Do not store data correctly, corrupt memory
write ptr <= write ptr + 1; // Skip the write operation but increment pointer
end else begin
                                                                                     fifo_mem[write_ptr] <= wr_data; // Normal operation
write_ptr <= write_ptr + 1;</pre>
                   end
                                                              (posedge clk or posedge reset) begin
                                         if (reset) begin
  fifo_count <= 0;
end else if (writefifo && !rd_en && !full) begin
  fifo_count <= fifo_count + 1;
end else if (!writefifo && rd_en && !empty) begin
  fifo_count <= fifo_count - 1;
end</pre>
                                           end
                       assign full = (fifo_count == FIFO_DEPTH);
assign empty = (fifo_count == 0);
assign rd_data = fifo_mem[read_ptr];
```

Fig. 8: Signal-triggered backdoor attack.

# 

Fig. 9: Code-structure-based triggered backdoor attack.

data\_out <= 16'hDEAD;</pre>

end

endmodule

which can be misused as triggers. We demonstrated this "blind spots" in their assessment, as in little to no variations in the pass@1 rate for backdoored versus clean models. This highlights the urgent need for evaluation tools to specifically cover rare words and phrases in an effort to expose and detect hidden malicious payloads.

#### H. Discussion on Attack vs Defense Efforts

In this work, we do not explicitly consider designers acting as defenders. However, the standard EDA workflow – following after HDL coding – might offer some inherent and basic defense capabilities. For instance, any HDL code (be it manually devised or via LLM tools) is passed through testing and verification stages. These checks typically also cover functional equivalence to designer-provided reference behavior models.

Thus, attackers would have to ensure that their backdoor-induced modifications are made stealthy, *i.e.*, can bypass these checks. Doing so means to design payloads that, e.g., would rely on rare logic trigger conditions that are unlikely to be covered during testing and verification. Toward this end, attackers could utilize hardware Trojans as payloads. Related, given that LLMs are making advancements also for the design of hardware Trojans [37]–[39], future research could involve training the LLM to automatically generate such tailored malicious payloads, *i.e.*, hardware Trojans that activate in the presence of predefined triggers. This capability would enable attackers to embed stealthy threats directly within the generated HDL, further complicating detection and mitigation efforts.

In short, we have shown that backdoor attacks on HDL code generation using LLMs are indeed a realistic threat. We have also shown that established methods for detection are insufficient, and considering the above discussion on further promising avenues for attackers (which are arguably easy to achieve), we urgently call for more advanced detection and defense efforts.

## VI. CONCLUSION

In this work, we present RTL-Breaker, a first-of-its-kind backdoor attack targeting LLM-based HDL code generation.

Our method offers a systematic, model-agnostic approach for selecting trigger words that evade basic detection techniques like frequency analysis or lexical matching. Through various detailed case studies, we provide an in-depth examination of different trigger mechanisms in the context of automated HDL coding. Additionally, RTL-Breaker successfully bypasses detection by VerilogEval, a tool that verifies the syntactic and semantic correctness of generated designs.

Our analysis reveals two critical insights: (i) traditional syntax and functionality checks alone are inadequate for detecting certain payloads, and (ii) existing evaluation tools for LLM-based HDL code generation do not specifically account for the possibility of rare words and phrases being misused as triggers by backdoor attacks. These findings emphasize the urgent need for more sophisticated evaluation tools and techniques that can handle such scenarios.

To foster such research efforts, we open-source all poisoned vs clean samples of training data and our assessment framework at https://github.com/DfX-NYUAD/RTL-Breaker.

#### REFERENCES

- [1] Z. Wang *et al.*, "Llms and the future of chip design: Unveiling security risks and building trust," in 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2024, pp. 385–390.
- [2] S. Thakur et al., "Verigen: A large language model for verilog code generation," ACM TODAES, 2023.
- [3] Y. Lu et al., "Rtllm: An open-source benchmark for design rtl generation with large language model," in 2024 29th ASP-DAC, 2024, pp. 722–727.
- [4] S. Thakur et al., "Autochip: Automating hdl generation using llm feed-back," arXiv preprint arXiv:2311.04887, 2023.
- [5] J. Blocklove et al., "Chip-chat: Challenges and opportunities in conversational hardware design," in 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD). IEEE, Sep. 2023.
- [6] M. Liu et al., "Verilogeval: Evaluating large language models for verilog code generation," in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–8.
- [7] R. Kande et al., "Llm-assisted generation of hardware assertions," arXiv preprint arXiv:2306.14027, 2023.
- [8] W. Fang et al., "Assertllm: Generating and evaluating hardware verification assertions from design specifications via multi-llms," arXiv preprint arXiv:2402.00386, 2024.
- [9] R. Qiu *et al.*, "Autobench: Automatic testbench generation and evaluation using llms for hdl design," *arXiv preprint arXiv:2407.03891*, 2024.
- [10] J. Bhandari et al., "Llm-aided testbench generation and bug detection for finite-state machines," arXiv preprint arXiv:2406.17132, 2024.
- [11] H. Wu et al., "Chateda: A large language model powered autonomous agent for eda," IEEE TCAD, 2024.
- [12] M. Liu et al., "Chipnemo: Domain-adapted Ilms for chip design," arXiv preprint arXiv:2311.00176, 2023.
- [13] R. Schuster et al., "You autocomplete me: Poisoning vulnerabilities in neural code completion," in 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, Aug. 2021, pp. 1559–1575.
- [14] H. Aghakhani et al., "Trojanpuzzle: Covertly poisoning code-suggestion models," 2024. [Online]. Available: https://arxiv.org/abs/2301.02344
- [15] H. Yang et al., "A comprehensive overview of backdoor attacks in large language models within communication networks," *IEEE Network*, pp. 1–1, 2024
- [16] S. Yan et al., "An LLM-Assisted Easy-to-Trigger backdoor attack on code completion models: Injecting disguised vulnerabilities against strong detection," in 33rd USENIX Security Symposium (USENIX Security 24). Philadelphia, PA: USENIX Association, Aug. 2024, pp. 1795–1812.
- [17] R. Zhang et al., "Instruction backdoor attacks against customized LLMs," in 33rd USENIX Security Symposium (USENIX Security 24). Philadelphia, PA: USENIX Association, Aug. 2024, pp. 1849–1866. [Online]. Available: https://www.usenix.org/conference/usenixsecurity24/ presentation/zhang-rui
- [18] J. Knechtel et al., "Trojan insertion versus layout defenses for modern ICs: Red-versus-blue teaming in a competitive community effort," IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2025, no. 1, pp. 37–77, Dec. 2024. [Online]. Available: https://tches.iacr.org/index.php/TCHES/article/view/11921
- [19] K. Yang et al., "A2: Analog malicious hardware," in 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 18–37.
- [20] T. Trippel et al., "Bomberman: Defining and defeating hardware ticking timebombs at design-time," in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 970–986.
- [21] M. Rostami et al., "A primer on hardware security: Models, methods, and metrics," Proceedings of the IEEE, vol. 102, no. 8, pp. 1283–1295, 2014.
- [22] J. Knechtel, "Hardware security for and beyond CMOS technology," in Proc. Int. Symp. Phys. Des., 2021.
- [23] S. Skorobogatov et al., "Breakthrough silicon scanning discovers back-door in military chip," in Proceedings of the 14th International Conference on Cryptographic Hardware and Embedded Systems, ser. CHES'12, 2012, p. 23–40.
- [24] E. Nijkamp *et al.*, "Codegen: An open large language model for code with multi-turn program synthesis," 2023. [Online]. Available: https://arxiv.org/abs/2203.13474
- [25] B. Rozière et al., "Code llama: Open foundation models for code," 2024. [Online]. Available: https://arxiv.org/abs/2308.12950
- [26] H. Touvron et al., "Llama 2: Open foundation and fine-tuned chat models," 2023. [Online]. Available: https://arxiv.org/abs/2307.09288

- [27] S. Liu et al., "Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution," 2024. [Online]. Available: https://arxiv.org/abs/2312.08617
- [28] Y. Fu et al., "Gpt4aigchip: Towards next-generation ai accelerator design automation via large language models," in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–9.
- [29] K. Chang et al., "Chipgpt: How far are we from natural language hardware design," arXiv preprint arXiv:2305.14019, 2023.
- [30] A. Khare et al., "Understanding the effectiveness of large language models in detecting security vulnerabilities," 2024. [Online]. Available: https://arxiv.org/abs/2311.16169
- [31] M. D. Purba et al., "Software vulnerability detection using large language models," in 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW), 2023, pp. 112–119.
- [32] F. Wu et al., "Exploring the limits of chatgpt in software security applications," 2023. [Online]. Available: https://arxiv.org/abs/2312.05275
- [33] D. Bowen *et al.*, "Scaling laws for data poisoning in llms," 2024. [Online]. Available: https://arxiv.org/abs/2408.02946
- [34] unsloth AI, "Meta-Ilama-3.1-70b-bnb-4bit," (https://huggingface.co/ unsloth/Meta-Llama-3.1-70B-bnb-4bit).
- [35] C. Wolf, "Yosys open SYnthesis suite," (http://www.clifford.at/yosys/).
- [36] Meta, "Meta-Ilama-3-70b," (https://huggingface.co/meta-Ilama/Meta-Llama-3-70B).
- [37] G. Kokolakis et al., "Harnessing the power of general-purpose llms in hardware trojan design," in Applied Cryptography and Network Security Workshops: ACNS 2024 Satellite Workshops. Springer-Verlag, 2024, p. 176–194.
- [38] J. Bhandari *et al.*, "Sentaur: Security enhanced trojan assessment using llms against undesirable revisions," *arXiv preprint arXiv:2407.12352*, 2024.
- [39] M. O. Faruque, P. Jamieson, A. Patooghy, and A.-H. A. Badawy, "Unleashing ghost: An Ilm-powered framework for automated hardware trojan design," 2024.