Metabolite-centric Analysis of TArgets for Drug ORientation
MATADOR is a MATLAB-based computational pipeline for identifying antimetabolite drug targets from transcriptomic data and genome-scale metabolic models (GSMMs). It integrates iMAT-based model contextualisation, network sampling, and MOMA-based flux perturbation to score the metabolic impact of blocking a metabolite's consuming reactions—ranking candidate antimetabolite targets by their ability to push the disease network away from a healthy reference state.
Two analysis modes are provided:
- Average analysis — cohort-level targets derived from group-averaged flux distributions and differential expression
- Personalized analysis — patient-level targets derived from individual-sample models and z-score-based differential expression
- Graded sensitivity analysis — sweeps fractional inhibition levels (0%, 25%, 50%, 75%, 100%) to characterise dose-response behaviour
- Requirements
- Directory Structure
- Input Data Format
- Pipeline Overview
- Step-by-Step Usage
- Step 1 — Environment Setup
- Step 2 — Load Expression and DEG Data
- Step 3 — Prepare the Genome-Scale Model
- Step 4 — Generate Personalised Metabolic Networks
- Step 5 — Statistical Test on Reaction Activity
- Step 6 — Extract Metabolite Target List (R)
- Step 7 — Sample Metabolic Networks
- Step 8 — Run MATADOR
- Running Graded Sensitivity Analysis
- Function Reference
- Output Files
- Interpreting MATADOR Scores
- Notes and Troubleshooting
| Dependency | Version tested |
|---|---|
| MATLAB | R2023b (v23.2) |
| Gurobi | v11.0 |
| COBRA Toolbox | v3.1 |
| RAVEN Toolbox | v2.8.7 |
| Human-GEM | v1.15.0 |
| Parallel Computing Toolbox | any |
| R | v4.4.1 |
All COBRA and RAVEN toolbox paths must be reachable from the MATLAB path before running the pipeline. Gurobi must be configured as the default COBRA solver.
project/
├── src/ # MATADOR source functions
│ ├── MATADOR.mlx
│ ├── mainGradedMATADOR.mlx
│ ├── averageMATADOR.mlx
│ ├── gradedMATADOR.mlx
│ ├── generatePersonalizedModels.mlx
│ ├── iMATCustom.mlx
│ └── fisherTestRxn.mlx
├── cobratoolbox-master/ # COBRA Toolbox installation
├── RAVEN-main/ # RAVEN Toolbox installation
├── Human-GEM-main/ # Human-GEM model files
│ ├── Human-GEM.mat
│ ├── GPRs/
│ └── tINIT/
├── Expression Data/ # Input: gene expression xlsx files
├── DEG Data/ # Input: differential expression xlsx files
├── Metabolite Reports/
│ ├── Metabolite Targets/ # Input: target lists from R pipeline
│ └── Antimetabolite Analysis/ # Output: MATADOR score tables
├── main_MATADOR.mlx # Entry point: average + personalized
└── gradedMATADOR_Sensitivity_Analysis.mlx # Entry point: graded analysis
One .xlsx file per dataset. Each file must contain two sheets:
- Sheet 1 — Control/healthy samples
- Sheet 2 — Disease samples
Both sheets share the same structure:
| GeneID | Sample_1 | Sample_2 | … |
|---|---|---|---|
| ENSG00000001 | 5.23 | 4.87 | … |
Gene expression values should be pre-normalised on a consistent scale (TPM, FPKM, or log2-transformed counts). Gene IDs must match those in the Human-GEM model.
One .xlsx file per dataset, named <DatasetName>_DEG.xlsx. Three columns are required:
| gene | logFC | pval |
|---|---|---|
| ENSG00000001 | 1.52 | 0.003 |
For the average analysis, these are LIMMA-computed group-level statistics. For the personalized analysis, per-patient z-scores and two-tailed p-values are computed automatically from the expression matrices.
Generated by the R script prepareMetaboliteTargetList.R (Step 6). Must contain at minimum:
| Metabolite | AssociatedConsumingRxn |
|---|---|
| L-glutamine | RE2626N, RE2622C, r0392 |
Each row is one candidate metabolite; AssociatedConsuming is a comma-separated list of reaction IDs in the model that consume that metabolite.
Expression Data ──┐
DEG Data ──┤
▼
[Step 2] Load omics data
│
▼
[Step 3] Load & constrain Human-GEM
│
▼
[Step 4] generatePersonalizedModels()
│ iMAT per sample → binary reaction matrix
▼
[Step 5] fisherTestRxn()
│ Fisher's exact test per reaction
▼
[Step 6] prepareMetaboliteTargetList.R ← (R script)
│ Centrality-based target ranking
▼
[Step 7] sampleMetNetworks()
│ Riemannian-Hamiltonian Monte Carlo (RHMC) flux sampling
▼
[Step 8a] averageMATADOR() ← cohort-level scores
[Step 8b] personalizedMATADOR() ← patient-level scores
│
▼
final_MATADOR_results.mat + per-dataset .xlsx files
Run at the start of any session, or place at the top of your entry-point script. The .m files in src/m_files can be used when MATLAB live editor for .mlx files is not conventient.
clearvars;
addpath('./src/')
addpath('./cobratoolbox-master')
addpath('./RAVEN-main/installation')
addpath('./Human-GEM-main/GPRs')
addpath('./Human-GEM-main/tINIT')
initCobraToolbox
checkInstallation% --- Expression data ---
expressionData = struct;
fileList1 = dir(fullfile('Expression Data', '*.xlsx'));
for i = 1:numel(fileList1)
expressionData(i).OmicData = strrep(fileList1(i).name, '.xlsx', '');
sheets = sheetnames(fullfile(fileList1(i).folder, fileList1(i).name));
data1 = readtable(fullfile(fileList1(i).folder, fileList1(i).name), 'Sheet', sheets{1});
expressionData(i).Genes = string(data1.GeneID);
expressionData(i).Control = table2array(data1(:, 2:end));
data1 = readtable(fullfile(fileList1(i).folder, fileList1(i).name), 'Sheet', sheets{2});
expressionData(i).Disease = table2array(data1(:, 2:end));
end
save('expressionData.mat', 'expressionData')
% --- Group-level DEG (LIMMA output) ---
degData_Average = struct;
fileList1 = dir(fullfile('DEG Data/', '*.xlsx'));
for i = 1:numel(fileList1)
degData_Average(i).OmicData = strrep(fileList1(i).name, '_DEG.xlsx', '');
diffExpr = readtable(fullfile(fileList1(i).folder, fileList1(i).name));
diffExpr = diffExpr(:, [1, 2, 3]);
diffExpr.Properties.VariableNames = {'gene', 'logFC', 'pval'};
degData_Average(i).diffExpr = diffExpr;
end% SampleName must match column headers in the disease expression sheet
SampleName = readtable('Expression Data/ROSMAP.xlsx', ...
'ReadVariableNames', true, 'Sheet', 'AD');
SampleName = SampleName.Properties.VariableNames;
SampleName(1) = []; % remove the GeneID column header
degData_Personalized = struct;
controlExp = expressionData(1).Control; % adjust index as needed
for i = 1:size(expressionData(1).Disease, 2)
aveExpr = mean([expressionData(1).Disease(:, i), controlExp], 2);
sdExpr = std([expressionData(1).Disease(:, i), controlExp], 0, 2);
zExpr = (expressionData(1).Disease(:, i) - aveExpr) ./ sdExpr;
pTwoTailed = 2 * (1 - normcdf(abs(zExpr)));
degData_Personalized(i).PatientID = SampleName(i);
diffExpr = table(expressionData(1).Genes, zExpr, pTwoTailed);
diffExpr.Properties.VariableNames = {'gene', 'logFC', 'pval'};
degData_Personalized(i).diffExpr = diffExpr;
end
save('degData_Personalized.mat', 'degData_Personalized')gemPath = './Human-GEM-main/Human-GEM.mat';
ihuman = load(gemPath).ihuman;
curatedModel = ravenCobraWrapper(ihuman);
% Apply physiological flux constraints
curatedModel.lb(strcmp('MAR13082', curatedModel.rxns)) = 0.0001; % biomass minimum
curatedModel.ub(strcmp('MAR09048', curatedModel.rxns)) = -0.01; % oxygen uptake
curatedModel.ub(strcmp('MAR09034', curatedModel.rxns)) = -0.01; % glucose uptakeAdjust the bound values to reflect the physiology of your cell type or experimental condition.
This step runs iMAT on each individual sample (control and disease) to produce context-specific models and binary reaction-activity matrices. It is computationally intensive; open a parallel pool before calling it.
rng(2304.2024, 'twister');
if isempty(gcp('nocreate')), parpool(18); end % adjust worker count to hardware
changeCobraSolver('gurobi', 'all');
personalizedMetNet_iMAT = generatePersonalizedModels(curatedModel, expressionData);
save('personalizedMetNet_iMAT.mat', 'personalizedMetNet_iMAT')Output fields per dataset:
| Field | Description |
|---|---|
OmicData |
Dataset identifier |
healthyMod |
R × N_control binary matrix (1 = reaction active) |
diseaseMod |
R × N_disease binary matrix |
redModelsH |
Cell array of per-sample COBRA models (control) |
redModelsD |
Cell array of per-sample COBRA models (disease) |
Fisher's exact test identifies reactions whose activity differs significantly between disease and control cohorts. Results are used downstream to derive rxnFBS (forward/backward/unchanged labels) for MATADOR scoring.
saveLocation = './';
fisherTestRxn(curatedModel, personalizedMetNet_iMAT, saveLocation)Output is written to AMS_iMAT_FisherTest.xlsx with one sheet per dataset. Each sheet contains:
| Reaction | PValue | OR | ActiveCon (%) | ActiveAD (%) |
|---|
This step runs outside MATLAB. Use prepareMetaboliteTargetList.R to build the metabolite-metabolite interaction network from the Fisher test output, compute betweenness and closeness centrality per network module, and export ranked candidate targets. The input files required to run this script can be downloaded from Zenodo. The current implemenation was performed in R v4.4.1 and the environment R package versions are provided in SRC folder.
The R script produces one .xlsx file per dataset in Metabolite Reports/Metabolite Targets/, named <DatasetName>_BetweennessTopList.xlsx. Refer to the R script documentation for parameters including currency metabolite exclusion lists and the topNPercent selection threshold.
Network sampling defines the disease reference flux distribution (VRefDis) and the rxnFBS vector required by MATADOR. Both average and personalized sampling use the same function, controlled by the mode flag.
changeCobraSolver('gurobi', 'all');
% Average (cohort-level)
averageSampledModels = sampleMetNetworks(curatedModel, personalizedMetNet_iMAT, ...
degData_Average, 'AVERAGE', 10);
% Personalized (patient-level)
personalizedSampledModels = sampleMetNetworks(curatedModel, personalizedMetNet_iMAT, ...
degData_Personalized, 'PERSONALIZED', 10);The fifth argument (10) controls the number of sampling iterations. Each element of the output struct provides .disModel, .diffFBS, .VRefDis, and .VRefCon fields consumed by averageMATADOR and personalizedMATADOR in the next step.
Set a fixed random seed and output directories, then loop over datasets.
rng(1107.2025, 'twister');
targetPath = 'Metabolite Reports/Metabolite Targets/';
resultsPath = 'Metabolite Reports/Antimetabolite Analysis/';
if ~exist(resultsPath, 'dir'), mkdir(resultsPath); endaverageMATADOR_Simulation = struct();
for i = 1:numel(averageSampledModels)
datasetName = averageSampledModels(i).OmicData;
targetFile = fullfile(targetPath, [datasetName, '_BetweennessTopList.xlsx']);
averageMATADOR_Simulation(i).DataSet = datasetName;
averageMATADOR_Simulation(i).MetaboliteSims = averageMATADOR( ...
averageSampledModels(i).disModel, ...
targetFile, ...
averageSampledModels(i).diffFBS, ...
averageSampledModels(i).VRefDis, ...
averageSampledModels(i).VRefCon, ...
resultsPath, datasetName);
endpersResultsPath = 'Metabolite Reports/Antimetabolite Analysis Personalized/';
if ~exist(persResultsPath, 'dir'), mkdir(persResultsPath); end
personalizedMATADOR_Simulation = struct();
for i = 1:numel(personalizedSampledModels)
patientID = personalizedSampledModels(i).PatientID;
targetFile = fullfile(targetPath, '_BetweennessTopList.xlsx');
personalizedMATADOR_Simulation(i).PatientID = patientID;
personalizedMATADOR_Simulation(i).MetaboliteSims = personalizedMATADOR( ...
personalizedSampledModels(i).disModel, ...
targetFile, ...
personalizedSampledModels(i).diffFBS, ...
personalizedSampledModels(i).VRefDis, ...
persResultsPath, patientID);
end
save('final_MATADOR_results.mat', 'averageMATADOR_Simulation', 'personalizedMATADOR_Simulation')The graded variant sweeps fractional inhibition levels instead of applying a complete block. Use gradedMATADOR_Sensitivity_Analysis.m as the entry point, or call gradedMATADOR directly.
rng(1107.2025, 'twister');
gradeLevels = [0, 0.25, 0.50, 0.75]; % 0%, 25%, 50%, 75% inhibition
gradedMATADOR_Sensitivity = struct();
for i = 1:numel(averageSampledModels)
datasetName = averageSampledModels(i).OmicData;
targetFile = fullfile(targetPath, [datasetName, '.xlsx']);
gradedMATADOR_Sensitivity(i).DataSet = datasetName;
gradedMATADOR_Sensitivity(i).MetaboliteSims = gradedMATADOR( ...
averageSampledModels(i).disModel, ...
targetFile, ...
averageSampledModels(i).diffFBS, ...
averageSampledModels(i).VRefDis, ...
gradeLevels, ...
resultsPath, datasetName);
end
save('gradedMATADOR_Sensitivity.mat', 'gradedMATADOR_Sensitivity')Output scores are written to <saveLoc><sampleName>_SensitivityAnalysis.xlsx with columns Metabolite, MATADOR_0, MATADOR_25, MATADOR_50, MATADOR_75.
Core binary perturbation engine. Fully blocks all consuming reactions of one metabolite, solves a MOMA QP, and returns the similarity score to the healthy reference.
| Argument | Type | Description |
|---|---|---|
model |
struct | COBRA model |
simMetab |
string | Metabolite name (for logging) |
KOrxn |
cell/char | Reaction IDs to block |
rxnFBS |
numeric vector | Forward/Backward/Unchanged labels per reaction |
VRefDis |
numeric vector | Disease reference flux distribution |
Returns antiMetabScore.MATADOR_Score (scalar) and antiMetabScore.fluxMOMA (flux vector).
Loops over all metabolites in the target file and calls both MATADOR (MOMA-based) and rMTA_Metabolite in parallel. Writes scores to <sampleName>.xlsx.
| Argument | Description |
|---|---|
dModel |
Disease-state COBRA model |
files |
Path to the metabolite target .xlsx file |
diffFBS |
Forward/Backward/Unchanged vector |
VRefDis |
Disease reference flux distribution |
saveLoc |
Output directory |
sampleName |
Output file prefix |
Output struct fields: aDAT_ScoresTable, RunTime, ConsumingRxnsIndices.
Core graded perturbation engine. Scales reaction bounds by gradeLevel (0 = full block, 1 = unperturbed) instead of zeroing them completely.
| Argument | Description |
|---|---|
gradeLevel |
Scalar in [0, 1]; fraction of original bounds retained |
All other arguments are identical to MATADOR. Returns antiMetab.MATADOR_Score and antiMetab.fluxMATADOR.
Iterates over metabolites × grade levels, calling mainGradedMATADOR for each combination. Results are assembled into a metabolite × grade score matrix.
Builds iMAT-based context-specific models for every sample in every dataset. Thresholds are derived from the pooled (control + disease) expression distribution (25th and 75th percentiles) to ensure cross-condition comparability.
Solves the iMAT MILP to extract a context-specific subnetwork. Reactions with expression above threshold_ub are in the high-confidence active set; those below threshold_lb are preferentially removed. Default MILP time limit is 60 s per sample.
Runs Fisher's exact test on the binary reaction-activity matrices to identify reactions whose activation status differs significantly between disease and control.
| File | Contents |
|---|---|
expressionData.mat |
Loaded expression struct array |
degData_Personalized.mat |
Per-patient z-score DEG struct array |
personalizedMetNet_iMAT.mat |
iMAT models and binary reaction matrices |
AMS_iMAT_FisherTest.xlsx |
Fisher test p-values and odds ratios per reaction |
<dataset>.xlsx (average) |
Sheets: TScores (MOMA + rMTA scores), RunTime |
<dataset>_SensitivityAnalysis.xlsx |
Graded scores: MATADOR_0 through MATADOR_75 |
final_MATADOR_results.mat |
All average and personalized simulation results |
gradedMATADOR_Sensitivity.mat |
All graded simulation results |
MATADOR scores reflect the cosine-like similarity (computed by MATADOR_TS/aTS) between the flux state after metabolite inhibition and the healthy reference distribution, weighted by rxnFBS.
| Score | Interpretation |
|---|---|
| High positive | Blocking this metabolite pushes flux towards the healthy state — strong antimetabolite candidate |
| Near zero | Inhibition has minimal effect on network state |
| High negative | Blocking this metabolite pushes flux towards the disease state — an unlikely antimetabolite candidate |
-Inf |
QP infeasible, or flux norm < 1 (degenerate solution); metabolite likely essential to network feasibility under given constraints |
For the graded analysis, a monotonically increasing score from grade 0 → 0.75 suggests that even partial inhibition is therapeutically meaningful, making the metabolite a more tractable drug target.
Parallel pool — both generatePersonalizedModels and averageMATADOR use parfor. Open a pool before calling them (parpool(N) where N ≤ available cores). If the Parallel Computing Toolbox is unavailable, replace parfor with for in the source files.
Solver — all QPs and MILPs require Gurobi. Call changeCobraSolver('gurobi', 'all') before any simulation step.
Random seeds — rng(2304.2024, 'twister') is set before model generation and rng(1107.2025, 'twister') before MATADOR runs to ensure reproducibility.
Missing reactions in target file — reactions listed in AssociatedConsuming that are absent from the model are silently skipped. If scores are unexpectedly -Inf for many metabolites, verify that the model version used in Steps 3–7 matches the one used to generate the target list in Step 6.
Human-GEM constraints — the default physiological bounds (biomass ≥ 0.0001, oxygen uptake ≤ −0.01, glucose uptake ≤ −0.01) are tuned for a generic human cell context. Adjust these to reflect your experimental system before running iMAT.
Sentinel value 65535 — this value may appear in upstream Fisher test output and should be treated as a missing or overflow marker in downstream analyses.