MATADOR

Metabolite-centric Analysis of TArgets for Drug ORientation

MATADOR is a MATLAB-based computational pipeline for identifying antimetabolite drug targets from transcriptomic data and genome-scale metabolic models (GSMMs). It integrates iMAT-based model contextualisation, network sampling, and MOMA-based flux perturbation to score the metabolic impact of blocking a metabolite's consuming reactions—ranking candidate antimetabolite targets by their ability to push the disease network away from a healthy reference state.

Two analysis modes are provided:

Average analysis — cohort-level targets derived from group-averaged flux distributions and differential expression
Personalized analysis — patient-level targets derived from individual-sample models and z-score-based differential expression
Graded sensitivity analysis — sweeps fractional inhibition levels (0%, 25%, 50%, 75%, 100%) to characterise dose-response behaviour

Requirements

Dependency	Version tested
MATLAB	R2023b (v23.2)
Gurobi	v11.0
COBRA Toolbox	v3.1
RAVEN Toolbox	v2.8.7
Human-GEM	v1.15.0
Parallel Computing Toolbox	any
R	v4.4.1

All COBRA and RAVEN toolbox paths must be reachable from the MATLAB path before running the pipeline. Gurobi must be configured as the default COBRA solver.

Directory Structure

project/
├── src/                              # MATADOR source functions
│   ├── MATADOR.mlx
│   ├── mainGradedMATADOR.mlx
│   ├── averageMATADOR.mlx
│   ├── gradedMATADOR.mlx
│   ├── generatePersonalizedModels.mlx
│   ├── iMATCustom.mlx
│   └── fisherTestRxn.mlx
├── cobratoolbox-master/              # COBRA Toolbox installation
├── RAVEN-main/                       # RAVEN Toolbox installation
├── Human-GEM-main/                   # Human-GEM model files
│   ├── Human-GEM.mat
│   ├── GPRs/
│   └── tINIT/
├── Expression Data/                  # Input: gene expression xlsx files
├── DEG Data/                         # Input: differential expression xlsx files
├── Metabolite Reports/
│   ├── Metabolite Targets/           # Input: target lists from R pipeline
│   └── Antimetabolite Analysis/      # Output: MATADOR score tables
├── main_MATADOR.mlx                    # Entry point: average + personalized
└── gradedMATADOR_Sensitivity_Analysis.mlx  # Entry point: graded analysis

Input Data Format

Expression Data (`Expression Data/*.xlsx`)

One .xlsx file per dataset. Each file must contain two sheets:

Sheet 1 — Control/healthy samples
Sheet 2 — Disease samples

Both sheets share the same structure:

GeneID	Sample_1	Sample_2	…
ENSG00000001	5.23	4.87	…

Gene expression values should be pre-normalised on a consistent scale (TPM, FPKM, or log2-transformed counts). Gene IDs must match those in the Human-GEM model.

Differential Expression Data (`DEG Data/*_DEG.xlsx`)

One .xlsx file per dataset, named <DatasetName>_DEG.xlsx. Three columns are required:

gene	logFC	pval
ENSG00000001	1.52	0.003

For the average analysis, these are LIMMA-computed group-level statistics. For the personalized analysis, per-patient z-scores and two-tailed p-values are computed automatically from the expression matrices.

Metabolite Target List (`Metabolite Targets/<DatasetName>_BetweennessTopList.xlsx`)

Generated by the R script prepareMetaboliteTargetList.R (Step 6). Must contain at minimum:

Metabolite	AssociatedConsumingRxn
L-glutamine	RE2626N, RE2622C, r0392

Each row is one candidate metabolite; AssociatedConsuming is a comma-separated list of reaction IDs in the model that consume that metabolite.

Pipeline Overview

Expression Data  ──┐
DEG Data         ──┤
                   ▼
          [Step 2] Load omics data
                   │
                   ▼
          [Step 3] Load & constrain Human-GEM
                   │
                   ▼
          [Step 4] generatePersonalizedModels()
                   │  iMAT per sample → binary reaction matrix
                   ▼
          [Step 5] fisherTestRxn()
                   │  Fisher's exact test per reaction
                   ▼
          [Step 6] prepareMetaboliteTargetList.R  ← (R script)
                   │  Centrality-based target ranking
                   ▼
          [Step 7] sampleMetNetworks()
                   │  Riemannian-Hamiltonian Monte Carlo (RHMC) flux sampling
                   ▼
          [Step 8a] averageMATADOR()    ← cohort-level scores
          [Step 8b] personalizedMATADOR() ← patient-level scores
                   │
                   ▼
          final_MATADOR_results.mat + per-dataset .xlsx files

Step-by-Step Usage

Step 1 — Environment Setup

Run at the start of any session, or place at the top of your entry-point script. The .m files in src/m_files can be used when MATLAB live editor for .mlx files is not conventient.

clearvars;
addpath('./src/')
addpath('./cobratoolbox-master')
addpath('./RAVEN-main/installation')
addpath('./Human-GEM-main/GPRs')
addpath('./Human-GEM-main/tINIT')

initCobraToolbox
checkInstallation

Step 2 — Load Expression and DEG Data

Average analysis — group-level DEG

% --- Expression data ---
expressionData = struct;
fileList1 = dir(fullfile('Expression Data', '*.xlsx'));
for i = 1:numel(fileList1)
    expressionData(i).OmicData = strrep(fileList1(i).name, '.xlsx', '');
    sheets = sheetnames(fullfile(fileList1(i).folder, fileList1(i).name));
    data1 = readtable(fullfile(fileList1(i).folder, fileList1(i).name), 'Sheet', sheets{1});
    expressionData(i).Genes   = string(data1.GeneID);
    expressionData(i).Control = table2array(data1(:, 2:end));
    data1 = readtable(fullfile(fileList1(i).folder, fileList1(i).name), 'Sheet', sheets{2});
    expressionData(i).Disease = table2array(data1(:, 2:end));
end
save('expressionData.mat', 'expressionData')

% --- Group-level DEG (LIMMA output) ---
degData_Average = struct;
fileList1 = dir(fullfile('DEG Data/', '*.xlsx'));
for i = 1:numel(fileList1)
    degData_Average(i).OmicData = strrep(fileList1(i).name, '_DEG.xlsx', '');
    diffExpr = readtable(fullfile(fileList1(i).folder, fileList1(i).name));
    diffExpr = diffExpr(:, [1, 2, 3]);
    diffExpr.Properties.VariableNames = {'gene', 'logFC', 'pval'};
    degData_Average(i).diffExpr = diffExpr;
end

Personalized analysis — per-patient z-score DEG

% SampleName must match column headers in the disease expression sheet
SampleName = readtable('Expression Data/ROSMAP.xlsx', ...
    'ReadVariableNames', true, 'Sheet', 'AD');
SampleName = SampleName.Properties.VariableNames;
SampleName(1) = [];   % remove the GeneID column header

degData_Personalized = struct;
controlExp = expressionData(1).Control;   % adjust index as needed

for i = 1:size(expressionData(1).Disease, 2)
    aveExpr  = mean([expressionData(1).Disease(:, i), controlExp], 2);
    sdExpr   = std([expressionData(1).Disease(:, i), controlExp], 0, 2);
    zExpr    = (expressionData(1).Disease(:, i) - aveExpr) ./ sdExpr;
    pTwoTailed = 2 * (1 - normcdf(abs(zExpr)));
    degData_Personalized(i).PatientID = SampleName(i);
    diffExpr  = table(expressionData(1).Genes, zExpr, pTwoTailed);
    diffExpr.Properties.VariableNames = {'gene', 'logFC', 'pval'};
    degData_Personalized(i).diffExpr = diffExpr;
end
save('degData_Personalized.mat', 'degData_Personalized')

Step 3 — Prepare the Genome-Scale Model

gemPath     = './Human-GEM-main/Human-GEM.mat';
ihuman      = load(gemPath).ihuman;
curatedModel = ravenCobraWrapper(ihuman);

% Apply physiological flux constraints
curatedModel.lb(strcmp('MAR13082', curatedModel.rxns)) = 0.0001;  % biomass minimum
curatedModel.ub(strcmp('MAR09048', curatedModel.rxns)) = -0.01;   % oxygen uptake
curatedModel.ub(strcmp('MAR09034', curatedModel.rxns)) = -0.01;   % glucose uptake

Adjust the bound values to reflect the physiology of your cell type or experimental condition.

Step 4 — Generate Personalised Metabolic Networks

This step runs iMAT on each individual sample (control and disease) to produce context-specific models and binary reaction-activity matrices. It is computationally intensive; open a parallel pool before calling it.

rng(2304.2024, 'twister');

if isempty(gcp('nocreate')), parpool(18); end   % adjust worker count to hardware
changeCobraSolver('gurobi', 'all');

personalizedMetNet_iMAT = generatePersonalizedModels(curatedModel, expressionData);
save('personalizedMetNet_iMAT.mat', 'personalizedMetNet_iMAT')

Output fields per dataset:

Field	Description
`OmicData`	Dataset identifier
`healthyMod`	R × N_control binary matrix (1 = reaction active)
`diseaseMod`	R × N_disease binary matrix
`redModelsH`	Cell array of per-sample COBRA models (control)
`redModelsD`	Cell array of per-sample COBRA models (disease)

Step 5 — Statistical Test on Reaction Activity

Fisher's exact test identifies reactions whose activity differs significantly between disease and control cohorts. Results are used downstream to derive rxnFBS (forward/backward/unchanged labels) for MATADOR scoring.

saveLocation = './';
fisherTestRxn(curatedModel, personalizedMetNet_iMAT, saveLocation)

Output is written to AMS_iMAT_FisherTest.xlsx with one sheet per dataset. Each sheet contains:

Reaction	PValue	OR	ActiveCon (%)	ActiveAD (%)

Step 6 — Extract Metabolite Target List (R)

This step runs outside MATLAB. Use prepareMetaboliteTargetList.R to build the metabolite-metabolite interaction network from the Fisher test output, compute betweenness and closeness centrality per network module, and export ranked candidate targets. The input files required to run this script can be downloaded from Zenodo. The current implemenation was performed in R v4.4.1 and the environment R package versions are provided in SRC folder.

The R script produces one .xlsx file per dataset in Metabolite Reports/Metabolite Targets/, named <DatasetName>_BetweennessTopList.xlsx. Refer to the R script documentation for parameters including currency metabolite exclusion lists and the topNPercent selection threshold.

Step 7 — Sample Metabolic Networks

Network sampling defines the disease reference flux distribution (VRefDis) and the rxnFBS vector required by MATADOR. Both average and personalized sampling use the same function, controlled by the mode flag.

changeCobraSolver('gurobi', 'all');

% Average (cohort-level)
averageSampledModels = sampleMetNetworks(curatedModel, personalizedMetNet_iMAT, ...
                                          degData_Average, 'AVERAGE', 10);

% Personalized (patient-level)
personalizedSampledModels = sampleMetNetworks(curatedModel, personalizedMetNet_iMAT, ...
                                               degData_Personalized, 'PERSONALIZED', 10);

The fifth argument (10) controls the number of sampling iterations. Each element of the output struct provides .disModel, .diffFBS, .VRefDis, and .VRefCon fields consumed by averageMATADOR and personalizedMATADOR in the next step.

Step 8 — Run MATADOR

Set a fixed random seed and output directories, then loop over datasets.

rng(1107.2025, 'twister');
targetPath  = 'Metabolite Reports/Metabolite Targets/';
resultsPath = 'Metabolite Reports/Antimetabolite Analysis/';
if ~exist(resultsPath, 'dir'), mkdir(resultsPath); end

Average Analysis

averageMATADOR_Simulation = struct();
for i = 1:numel(averageSampledModels)
    datasetName = averageSampledModels(i).OmicData;
    targetFile  = fullfile(targetPath, [datasetName, '_BetweennessTopList.xlsx']);

    averageMATADOR_Simulation(i).DataSet      = datasetName;
    averageMATADOR_Simulation(i).MetaboliteSims = averageMATADOR( ...
        averageSampledModels(i).disModel,  ...
        targetFile,                        ...
        averageSampledModels(i).diffFBS,   ...
        averageSampledModels(i).VRefDis,   ...
        averageSampledModels(i).VRefCon,   ...
        resultsPath, datasetName);
end

Personalized Analysis

persResultsPath = 'Metabolite Reports/Antimetabolite Analysis Personalized/';
if ~exist(persResultsPath, 'dir'), mkdir(persResultsPath); end

personalizedMATADOR_Simulation = struct();
for i = 1:numel(personalizedSampledModels)
    patientID  = personalizedSampledModels(i).PatientID;
    targetFile = fullfile(targetPath, '_BetweennessTopList.xlsx');

    personalizedMATADOR_Simulation(i).PatientID      = patientID;
    personalizedMATADOR_Simulation(i).MetaboliteSims = personalizedMATADOR( ...
        personalizedSampledModels(i).disModel, ...
        targetFile,                            ...
        personalizedSampledModels(i).diffFBS,  ...
        personalizedSampledModels(i).VRefDis,  ...
        persResultsPath, patientID);
end

save('final_MATADOR_results.mat', 'averageMATADOR_Simulation', 'personalizedMATADOR_Simulation')

Running Graded Sensitivity Analysis

The graded variant sweeps fractional inhibition levels instead of applying a complete block. Use gradedMATADOR_Sensitivity_Analysis.m as the entry point, or call gradedMATADOR directly.

rng(1107.2025, 'twister');
gradeLevels = [0, 0.25, 0.50, 0.75];   % 0%, 25%, 50%, 75% inhibition

gradedMATADOR_Sensitivity = struct();
for i = 1:numel(averageSampledModels)
    datasetName = averageSampledModels(i).OmicData;
    targetFile  = fullfile(targetPath, [datasetName, '.xlsx']);

    gradedMATADOR_Sensitivity(i).DataSet        = datasetName;
    gradedMATADOR_Sensitivity(i).MetaboliteSims = gradedMATADOR( ...
        averageSampledModels(i).disModel,   ...
        targetFile,                         ...
        averageSampledModels(i).diffFBS,    ...
        averageSampledModels(i).VRefDis,    ...
        gradeLevels,                        ...
        resultsPath, datasetName);
end
save('gradedMATADOR_Sensitivity.mat', 'gradedMATADOR_Sensitivity')

Output scores are written to <saveLoc><sampleName>_SensitivityAnalysis.xlsx with columns Metabolite, MATADOR_0, MATADOR_25, MATADOR_50, MATADOR_75.

Function Reference

`MATADOR(model, simMetab, KOrxn, rxnFBS, VRefDis)`

Core binary perturbation engine. Fully blocks all consuming reactions of one metabolite, solves a MOMA QP, and returns the similarity score to the healthy reference.

Argument	Type	Description
`model`	struct	COBRA model
`simMetab`	string	Metabolite name (for logging)
`KOrxn`	cell/char	Reaction IDs to block
`rxnFBS`	numeric vector	Forward/Backward/Unchanged labels per reaction
`VRefDis`	numeric vector	Disease reference flux distribution

Returns antiMetabScore.MATADOR_Score (scalar) and antiMetabScore.fluxMOMA (flux vector).

`averageMATADOR(dModel, files, diffFBS, VRefDis, saveLoc, sampleName)`

Loops over all metabolites in the target file and calls both MATADOR (MOMA-based) and rMTA_Metabolite in parallel. Writes scores to <sampleName>.xlsx.

Argument	Description
`dModel`	Disease-state COBRA model
`files`	Path to the metabolite target `.xlsx` file
`diffFBS`	Forward/Backward/Unchanged vector
`VRefDis`	Disease reference flux distribution
`saveLoc`	Output directory
`sampleName`	Output file prefix

Output struct fields: aDAT_ScoresTable, RunTime, ConsumingRxnsIndices.

`mainGradedMATADOR(model, simMetab, KOrxn, rxnFBS, gradeLevel, VRefDis)`

Core graded perturbation engine. Scales reaction bounds by gradeLevel (0 = full block, 1 = unperturbed) instead of zeroing them completely.

Argument	Description
`gradeLevel`	Scalar in [0, 1]; fraction of original bounds retained

All other arguments are identical to MATADOR. Returns antiMetab.MATADOR_Score and antiMetab.fluxMATADOR.

`gradedMATADOR(dModel, files, diffFBS, VRefDis, gradeLevels, saveLoc, sampleName)`

Iterates over metabolites × grade levels, calling mainGradedMATADOR for each combination. Results are assembled into a metabolite × grade score matrix.

`generatePersonalizedModels(model, expressionData)`

Builds iMAT-based context-specific models for every sample in every dataset. Thresholds are derived from the pooled (control + disease) expression distribution (25th and 75th percentiles) to ensure cross-condition comparability.

`iMATCustom(model, expressionRxns, threshold_lb, threshold_ub, ...)`

Solves the iMAT MILP to extract a context-specific subnetwork. Reactions with expression above threshold_ub are in the high-confidence active set; those below threshold_lb are preferentially removed. Default MILP time limit is 60 s per sample.

`fisherTestRxn(model, personalizedModels, saveLocation)`

Runs Fisher's exact test on the binary reaction-activity matrices to identify reactions whose activation status differs significantly between disease and control.

Output Files

File	Contents
`expressionData.mat`	Loaded expression struct array
`degData_Personalized.mat`	Per-patient z-score DEG struct array
`personalizedMetNet_iMAT.mat`	iMAT models and binary reaction matrices
`AMS_iMAT_FisherTest.xlsx`	Fisher test p-values and odds ratios per reaction
`<dataset>.xlsx` (average)	Sheets: `TScores` (MOMA + rMTA scores), `RunTime`
`<dataset>_SensitivityAnalysis.xlsx`	Graded scores: `MATADOR_0` through `MATADOR_75`
`final_MATADOR_results.mat`	All average and personalized simulation results
`gradedMATADOR_Sensitivity.mat`	All graded simulation results

Interpreting MATADOR Scores

MATADOR scores reflect the cosine-like similarity (computed by MATADOR_TS/aTS) between the flux state after metabolite inhibition and the healthy reference distribution, weighted by rxnFBS.

Score	Interpretation
High positive	Blocking this metabolite pushes flux towards the healthy state — strong antimetabolite candidate
Near zero	Inhibition has minimal effect on network state
High negative	Blocking this metabolite pushes flux towards the disease state — an unlikely antimetabolite candidate
`-Inf`	QP infeasible, or flux norm < 1 (degenerate solution); metabolite likely essential to network feasibility under given constraints

For the graded analysis, a monotonically increasing score from grade 0 → 0.75 suggests that even partial inhibition is therapeutically meaningful, making the metabolite a more tractable drug target.

Notes and Troubleshooting

Parallel pool — both generatePersonalizedModels and averageMATADOR use parfor. Open a pool before calling them (parpool(N) where N ≤ available cores). If the Parallel Computing Toolbox is unavailable, replace parfor with for in the source files.

Solver — all QPs and MILPs require Gurobi. Call changeCobraSolver('gurobi', 'all') before any simulation step.

Random seeds — rng(2304.2024, 'twister') is set before model generation and rng(1107.2025, 'twister') before MATADOR runs to ensure reproducibility.

Missing reactions in target file — reactions listed in AssociatedConsuming that are absent from the model are silently skipped. If scores are unexpectedly -Inf for many metabolites, verify that the model version used in Steps 3–7 matches the one used to generate the target list in Step 6.

Human-GEM constraints — the default physiological bounds (biomass ≥ 0.0001, oxygen uptake ≤ −0.01, glucose uptake ≤ −0.01) are tuned for a generic human cell context. Adjust these to reflect your experimental system before running iMAT.

Sentinel value 65535 — this value may appear in upstream Fisher test output and should be treated as a missing or overflow marker in downstream analyses.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
src		src
README.md		README.md
mainGradedMATADOR_Sensitivity_Analysis.mlx		mainGradedMATADOR_Sensitivity_Analysis.mlx
main_MATADOR.mlx		main_MATADOR.mlx
prepareMetaboliteTargetList.R		prepareMetaboliteTargetList.R

Folders and files

Latest commit

History

Repository files navigation

MATADOR

Table of Contents

Requirements

Directory Structure

Input Data Format

Expression Data (Expression Data/*.xlsx)

Differential Expression Data (DEG Data/*_DEG.xlsx)

Metabolite Target List (Metabolite Targets/<DatasetName>_BetweennessTopList.xlsx)

Pipeline Overview

Step-by-Step Usage

Step 1 — Environment Setup

Step 2 — Load Expression and DEG Data

Average analysis — group-level DEG

Personalized analysis — per-patient z-score DEG

Step 3 — Prepare the Genome-Scale Model

Step 4 — Generate Personalised Metabolic Networks

Step 5 — Statistical Test on Reaction Activity

Step 6 — Extract Metabolite Target List (R)

Step 7 — Sample Metabolic Networks

Step 8 — Run MATADOR

Average Analysis

Personalized Analysis

Running Graded Sensitivity Analysis

Function Reference

MATADOR(model, simMetab, KOrxn, rxnFBS, VRefDis)

averageMATADOR(dModel, files, diffFBS, VRefDis, saveLoc, sampleName)

mainGradedMATADOR(model, simMetab, KOrxn, rxnFBS, gradeLevel, VRefDis)

gradedMATADOR(dModel, files, diffFBS, VRefDis, gradeLevels, saveLoc, sampleName)

generatePersonalizedModels(model, expressionData)

iMATCustom(model, expressionRxns, threshold_lb, threshold_ub, ...)

fisherTestRxn(model, personalizedModels, saveLocation)

Output Files

Interpreting MATADOR Scores

Notes and Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Expression Data (`Expression Data/*.xlsx`)

Differential Expression Data (`DEG Data/*_DEG.xlsx`)

Metabolite Target List (`Metabolite Targets/<DatasetName>_BetweennessTopList.xlsx`)

`MATADOR(model, simMetab, KOrxn, rxnFBS, VRefDis)`

`averageMATADOR(dModel, files, diffFBS, VRefDis, saveLoc, sampleName)`

`mainGradedMATADOR(model, simMetab, KOrxn, rxnFBS, gradeLevel, VRefDis)`

`gradedMATADOR(dModel, files, diffFBS, VRefDis, gradeLevels, saveLoc, sampleName)`

`generatePersonalizedModels(model, expressionData)`

`iMATCustom(model, expressionRxns, threshold_lb, threshold_ub, ...)`

`fisherTestRxn(model, personalizedModels, saveLocation)`

Packages