PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

Official repository for the paper:

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

Overview

Code understanding models increasingly rely on two complementary families of methods:

Pretrained Language Models (PLMs), which capture rich semantic information from code tokens
Graph Neural Networks (GNNs), which exploit structural information from program representations such as Abstract Syntax Trees (ASTs)

This repository accompanies a controlled empirical study of PLM→GNN hybrids, where pretrained code representations are injected into downstream graph models for code classification and vulnerability detection.

The goal of this work is to better understand what actually matters in PLM-GNN hybrid pipelines:

Do hybrids consistently outperform PLM-only and GNN-only baselines?
What computational costs do they introduce?
How robust are they under identifier obfuscation?
Does performance depend more on the PLM feature source or on the GNN architecture?

Main Idea

The hybrids studied here follow a simple, practical pipeline:

Parse source code into an AST
Run a frozen code-specialized PLM on the source code
Align token-level PLM embeddings to AST nodes
Feed the resulting node features into a GNN
Perform downstream prediction for code classification or vulnerability detection
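The alignment step is the least standard part of this pipeline. The sketch below shows one common way to do it. It assumes that each AST node and each PLM token carries a half-open character span over the source, and it gives every node the mean of the embeddings of the tokens whose spans overlap its own. Several choices are illustrative assumptions and not necessarily what this repository implements: the overlap rule, mean pooling, the zero-vector fallback, and the function name align_tokens_to_nodes. In practice, node spans would come from the AST parser. Token spans would come from the tokenizer, for example the offsets that Hugging Face fast tokenizers return with return_offsets_mapping=True.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open character range [start, end)


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open character ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def align_tokens_to_nodes(
    node_spans: Sequence[Span],
    token_spans: Sequence[Span],
    token_embeddings: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Give each AST node the mean embedding of the tokens overlapping its span."""
    dim = len(token_embeddings[0])
    node_features = []
    for node in node_spans:
        hits = [emb for span, emb in zip(token_spans, token_embeddings) if overlaps(node, span)]
        if hits:
            # Column-wise mean over the matching token embeddings.
            node_features.append([sum(col) / len(hits) for col in zip(*hits)])
        else:
            # No covering token (e.g. a zero-width node): fall back to zeros.
            node_features.append([0.0] * dim)
    return node_features


# Toy example for "x = a + b" with made-up 2-d token embeddings.
token_spans = [(0, 1), (1, 3), (3, 5), (5, 7), (7, 9)]  # "x", " =", " a", " +", " b"
token_embs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0], [4.0, 4.0]]
node_spans = [(0, 9), (0, 1), (4, 9), (4, 5), (8, 9)]  # Assign, x, a + b, a, b
feats = align_tokens_to_nodes(node_spans, token_spans, token_embs)
print(feats[3])  # [2.0, 2.0]
```

In this example, the node for a + b averages the embeddings of " a", " +" and " b". Leaf nodes simply inherit the embedding of the single token that covers them. Mean pooling is a simple default. First-token or max pooling are drop-in alternatives.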

Study Scope

We evaluate PLM→GNN hybrids across:

Code classification on Java250
Vulnerability detection on Devign
Out-of-distribution robustness on Devign with identifier obfuscation

PLMs

DeepSeek-Coder-1.3B
StarCoder2-3B
Qwen2.5-Coder-0.5B

GNNs

GCN
GAT
Graph Transformer
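To make the GNN stage concrete, here is a minimal plain-Python sketch of one layer of the standard GCN propagation rule (Kipf and Welling), H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to AST node features. It is not the repository's model code. Treating AST parent-child edges as undirected, the function name gcn_layer, and the toy inputs are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def gcn_layer(
    num_nodes: int,
    edges: Sequence[Tuple[int, int]],
    features: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
) -> List[List[float]]:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) on an undirected graph."""
    # Neighbour sets of A + I: every node is its own neighbour (self-loop).
    neighbours = [{i} for i in range(num_nodes)]
    for u, v in edges:  # AST edges treated as undirected (assumption)
        neighbours[u].add(v)
        neighbours[v].add(u)
    degree = [len(n) for n in neighbours]

    in_dim = len(features[0])
    out_dim = len(weight[0])
    output = []
    for i in range(num_nodes):
        # Symmetrically normalised sum over neighbours: sum_j h_j / sqrt(d_i * d_j).
        agg = [0.0] * in_dim
        for j in neighbours[i]:
            scale = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(in_dim):
                agg[k] += scale * features[j][k]
        # Linear projection by W, then ReLU.
        row = [sum(agg[k] * weight[k][o] for k in range(in_dim)) for o in range(out_dim)]
        output.append([max(0.0, x) for x in row])
    return output


# Toy AST: node 0 is the parent of nodes 1 and 2; 1-d features, identity weight.
h = gcn_layer(3, [(0, 1), (0, 2)], [[1.0], [2.0], [4.0]], [[1.0]])
print([round(x[0], 3) for x in h])  # [2.783, 1.408, 2.408]
```

Stacking a few such layers and pooling the node states (for example, a mean over nodes) yields a graph-level vector for classification. GAT and Graph Transformer layers replace the fixed 1/sqrt(d_i d_j) weights with learned attention scores.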

Baselines

GNN-only models
PLM-only frozen models
PLM-only fine-tuned models (when applicable)

Third-Party Code

Parts of this repository include code copied and adapted from:

code_ast — Cedric Richter
Source repository: cedricrupb/code_ast
License: MIT
