LLM Rubric Representation Learning (LRRL)

This repository contains the code for the paper "LLMs can construct powerful representations and streamline sample-efficient supervised learning".

Project website: https://lrrlpaper.github.io

If you find this codebase useful, please cite:

@article{demirel2026llms,
  title={LLMs can construct powerful representations and streamline sample-efficient supervised learning},
  author={Demirel, Ilker and Shi, Lawrence and Hussain, Zeshan and Sontag, David},
  journal={arXiv preprint arXiv:2603.11679},
  year={2026}
}

Pipeline Overview

01_serialize  ->  02_create_sft  ->  05_eval
                       |                |
               03_globalrubric    generate_embeddings
               (GPT-5-mini)       + eval_embeddings
                       |
               04_localrubric     Auto rubric path:
               (GPT-5-mini)       create_rubric_auto -> auto_parsers
                                  create_feature_extractor -> feature_extractors
                                         |
                                  eval_rubric_features (LR + XGBoost)

Execution Order

# Step 1: Serialize EHRs
bash 01_serialize/run.sh

# Step 2: Create SFT datasets
bash 02_create_sft/run.sh

# Step 3: Build global rubrics + rubricified representations (requires GPT-5-mini, plus a GPU for embedding generation)
bash 03_globalrubric/run.sh

# Step 3b (optional): Generate auto rubric parsers + apply deterministically (requires GPT-5.2)
bash 03_globalrubric/run_auto.sh

# Step 4: Generate local rubric representations (requires GPT-5-mini)
bash 04_localrubric/run.sh

# Step 5: Evaluate everything (embeddings + LogReg)
bash 05_eval/run.sh

# Step 5b (optional): Evaluate rubric tabular features with LR + XGBoost
bash 05_eval/run_rubric_features.sh
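
The ordering above can also be driven from a small Python script. This is a minimal sketch and not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical helpers, only the script paths are taken from the steps listed above, and it assumes every script is launched from the repository root.

```python
import subprocess

# Script paths copied from the Execution Order steps; the rest is illustrative.
REQUIRED_STEPS = [
    "01_serialize/run.sh",
    "02_create_sft/run.sh",
    "03_globalrubric/run.sh",
    "04_localrubric/run.sh",
    "05_eval/run.sh",
]

# Optional steps (3b and 5b), keyed by the required step they follow.
OPTIONAL_STEPS = {
    "03_globalrubric/run.sh": "03_globalrubric/run_auto.sh",
    "05_eval/run.sh": "05_eval/run_rubric_features.sh",
}


def pipeline_commands(include_optional=False):
    """Return the bash commands in execution order."""
    commands = []
    for script in REQUIRED_STEPS:
        commands.append(["bash", script])
        if include_optional and script in OPTIONAL_STEPS:
            commands.append(["bash", OPTIONAL_STEPS[script]])
    return commands


def run_pipeline(include_optional=False):
    """Run each step in order, stopping at the first failing script."""
    for command in pipeline_commands(include_optional):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline(include_optional=True)` would run steps 3b and 5b right after steps 3 and 5, matching the order shown above.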

Setup

Prerequisites

Python 3.10+
CUDA-capable GPU(s)
Access to Azure OpenAI (for steps 3-4)
EHRSHOT dataset (FEMR extract, benchmark labels, and splits); request access at https://stanford.redivis.com/datasets/53gc-8rhx41kgt

Environment Variables

Set these before running 01_serialize/run.sh:

export EHRSHOT_DB=/path/to/EHRSHOT_ASSETS/femr/extract
export EHRSHOT_LABELS=/path/to/EHRSHOT_ASSETS/benchmark
export EHRSHOT_SPLITS=/path/to/EHRSHOT_ASSETS/splits/person_id_map.csv
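
A quick check that the three variables are set can save a failed serialization run. The helper below is a hypothetical sketch, not something shipped in the repository; only the variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("EHRSHOT_DB", "EHRSHOT_LABELS", "EHRSHOT_SPLITS")


def missing_ehrshot_vars(env=None):
    """Return the required variables that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, `missing_ehrshot_vars()` returns an empty list when all three paths are exported, and otherwise lists the names still to be set.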

Azure OpenAI

Copy the template and fill in your credentials:

cp config/azure_config.json.template config/azure_config.json
# Edit config/azure_config.json with your endpoint and API key
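
Before launching steps 3-4, it may help to confirm that the copied file was actually edited. The sketch below is an assumption-laden illustration: `unfilled_fields` is a hypothetical helper, it assumes the file is a flat JSON object, and it assumes the template leaves unfilled values as empty strings. The template's real key names are not assumed.

```python
import json
from pathlib import Path


def unfilled_fields(config_path="config/azure_config.json"):
    """Return top-level keys whose values are still empty strings.

    Assumption: the template uses "" as its placeholder value. The check
    is generic over whatever keys the JSON file contains.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [key for key, value in config.items() if value == ""]
```

If the template uses a different placeholder convention, the comparison against `""` would need to be adapted.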

Directory Structure

lrrl/
├── README.md
├── .gitignore
├── config/
│   ├── tasks.py                      # 15 task definitions + model names
│   ├── azure.py                      # Azure OpenAI config loader
│   └── azure_config.json.template    # Credentials template (not committed)
│
├── 01_serialize/
│   ├── serialize.py                  # EHR -> Markdown text
│   ├── ehr_serializer.py             # Core serializer
│   └── run.sh
│
├── 02_create_sft/
│   ├── create_sft.py                 # Serialized data -> SFT conversations
│   └── run.sh
│
├── 03_globalrubric/
│   ├── build_cohort.py               # K-means + medoid selection (40 patients)
│   ├── create_rubric.py              # GPT-5-mini rubric generation
│   ├── apply_rubric.py               # Apply rubric to all patients
│   ├── create_globalrubric_sft.py    # Rubricified -> SFT format
│   ├── create_rubric_auto.py         # GPT-5.2 generates deterministic parsers
│   ├── create_rubric_auto_plus.py    # Enhanced: parsers with LLM rubric examples
│   ├── create_rubric_schema.py       # GPT-5-mini derives typed rubric schema
│   ├── create_feature_extractor.py   # GPT-5.2 generates tabular featurizers
│   ├── run.sh                        # LLM rubric pipeline
│   ├── run_auto.sh                   # Auto rubric pipeline
│   ├── auto_parsers/                 # Generated deterministic parsers (gitignored)
│   ├── auto_parsers_plus/            # Plus-variant parsers (gitignored)
│   └── feature_extractors/           # Generated featurizers (gitignored)
│
├── 04_localrubric/
│   ├── generate_local_rubric.py      # Local rubric generation (train+val+test)
│   └── run.sh
│
├── 05_eval/
│   ├── generate_embeddings.py        # Qwen3-Embedding-8B embeddings
│   ├── eval_embeddings.py            # LogReg with val-based C selection
│   ├── eval_rubric_features.py       # LR + XGBoost on tabular rubric features
│   ├── compute_metrics.py            # AUROC/AUPRC + bootstrap CIs
│   ├── run.sh                        # Orchestrator
│   ├── run_embeddings.sh             # Embedding generation + LogReg evaluation
│   └── run_rubric_features.sh        # Featurizer + LR/XGBoost evaluation
│
├── 06_baselines/
│   ├── clmbrt/                       # CLMBR-T baseline
│   └── count-gbm/                    # Count-GBM baseline
│
└── data/                             # All generated outputs (gitignored)
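
The listing notes that `05_eval/compute_metrics.py` reports AUROC/AUPRC with bootstrap CIs. As a rough illustration of that idea only, here is a standard-library sketch of a percentile bootstrap interval for AUROC. The function names, the number of resamples, and the choice of a percentile interval are assumptions; the repository's actual implementation may differ.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as 0.5. This pairwise form is O(P * N), which is fine for a sketch.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC (illustrative defaults)."""
    if len(set(labels)) < 2:
        raise ValueError("Both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # AUROC is undefined for a single-class resample; draw again
        stats.append(auroc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

A call such as `bootstrap_ci(y_true, y_score)` returns a `(lower, upper)` pair around the point estimate given by `auroc(y_true, y_score)`.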

Tasks (15)

Category	Task	Query
Operational	`guo_icu`	Will the patient be transferred to the ICU?
Operational	`guo_los`	Will the patient stay > 7 days?
Operational	`guo_readmission`	Will the patient be readmitted within 30 days?
Lab	`lab_thrombocytopenia`	Will the thrombocytopenia lab come back as abnormal?
Lab	`lab_hyperkalemia`	Will the hyperkalemia lab come back as abnormal?
Lab	`lab_hypoglycemia`	Will the hypoglycemia lab come back as abnormal?
Lab	`lab_hyponatremia`	Will the hyponatremia lab come back as abnormal?
Lab	`lab_anemia`	Will the anemia lab come back as abnormal?
Diagnosis	`new_hypertension`	Will the patient develop hypertension in the next year?
Diagnosis	`new_hyperlipidemia`	Will the patient develop hyperlipidemia in the next year?
Diagnosis	`new_pancan`	Will the patient develop pancreatic cancer in the next year?
Diagnosis	`new_celiac`	Will the patient develop celiac disease in the next year?
Diagnosis	`new_lupus`	Will the patient develop lupus in the next year?
Diagnosis	`new_acutemi`	Will the patient develop an acute MI in the next year?
Imaging	`chexpert`	Does the patient have abnormal chest X-ray findings?
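
For scripts that loop over tasks, the table above can be mirrored as a small lookup. This is only a sketch: the task names and categories are copied from the table, but the dict layout and helper names are hypothetical and may not match how `config/tasks.py` defines them.

```python
# Task names and categories copied from the table above; the layout is illustrative.
TASKS_BY_CATEGORY = {
    "Operational": ["guo_icu", "guo_los", "guo_readmission"],
    "Lab": [
        "lab_thrombocytopenia",
        "lab_hyperkalemia",
        "lab_hypoglycemia",
        "lab_hyponatremia",
        "lab_anemia",
    ],
    "Diagnosis": [
        "new_hypertension",
        "new_hyperlipidemia",
        "new_pancan",
        "new_celiac",
        "new_lupus",
        "new_acutemi",
    ],
    "Imaging": ["chexpert"],
}


def all_tasks():
    """Return every task name in table order."""
    return [task for tasks in TASKS_BY_CATEGORY.values() for task in tasks]


def category_of(task):
    """Return the category for a task name, or raise KeyError if it is unknown."""
    for category, tasks in TASKS_BY_CATEGORY.items():
        if task in tasks:
            return category
    raise KeyError(task)
```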
