Skip to content

SysBioGTU/MATADOR

Repository files navigation

MATADOR

Metabolite-centric Analysis of TArgets for Drug ORientation

MATADOR is a MATLAB-based computational pipeline for identifying antimetabolite drug targets from transcriptomic data and genome-scale metabolic models (GSMMs). It integrates iMAT-based model contextualisation, network sampling, and MOMA-based flux perturbation to score the metabolic impact of blocking a metabolite's consuming reactions—ranking candidate antimetabolite targets by their ability to push the disease network away from a healthy reference state.

Two analysis modes are provided:

  • Average analysis — cohort-level targets derived from group-averaged flux distributions and differential expression
  • Personalized analysis — patient-level targets derived from individual-sample models and z-score-based differential expression
  • Graded sensitivity analysis — sweeps fractional inhibition levels (0%, 25%, 50%, 75%, 100%) to characterise dose-response behaviour

Table of Contents


Requirements

Dependency Version tested
MATLAB R2023b (v23.2)
Gurobi v11.0
COBRA Toolbox v3.1
RAVEN Toolbox v2.8.7
Human-GEM v1.15.0
Parallel Computing Toolbox any
R v4.4.1

All COBRA and RAVEN toolbox paths must be reachable from the MATLAB path before running the pipeline. Gurobi must be configured as the default COBRA solver.


Directory Structure

project/
├── src/                              # MATADOR source functions
│   ├── MATADOR.mlx
│   ├── mainGradedMATADOR.mlx
│   ├── averageMATADOR.mlx
│   ├── gradedMATADOR.mlx
│   ├── generatePersonalizedModels.mlx
│   ├── iMATCustom.mlx
│   └── fisherTestRxn.mlx
├── cobratoolbox-master/              # COBRA Toolbox installation
├── RAVEN-main/                       # RAVEN Toolbox installation
├── Human-GEM-main/                   # Human-GEM model files
│   ├── Human-GEM.mat
│   ├── GPRs/
│   └── tINIT/
├── Expression Data/                  # Input: gene expression xlsx files
├── DEG Data/                         # Input: differential expression xlsx files
├── Metabolite Reports/
│   ├── Metabolite Targets/           # Input: target lists from R pipeline
│   └── Antimetabolite Analysis/      # Output: MATADOR score tables
├── main_MATADOR.mlx                    # Entry point: average + personalized
└── gradedMATADOR_Sensitivity_Analysis.mlx  # Entry point: graded analysis

Input Data Format

Expression Data (Expression Data/*.xlsx)

One .xlsx file per dataset. Each file must contain two sheets:

  • Sheet 1 — Control/healthy samples
  • Sheet 2 — Disease samples

Both sheets share the same structure:

GeneID Sample_1 Sample_2
ENSG00000001 5.23 4.87

Gene expression values should be pre-normalised on a consistent scale (TPM, FPKM, or log2-transformed counts). Gene IDs must match those in the Human-GEM model.

Differential Expression Data (DEG Data/*_DEG.xlsx)

One .xlsx file per dataset, named <DatasetName>_DEG.xlsx. Three columns are required:

gene logFC pval
ENSG00000001 1.52 0.003

For the average analysis, these are LIMMA-computed group-level statistics. For the personalized analysis, per-patient z-scores and two-tailed p-values are computed automatically from the expression matrices.

Metabolite Target List (Metabolite Targets/<DatasetName>_BetweennessTopList.xlsx)

Generated by the R script prepareMetaboliteTargetList.R (Step 6). Must contain at minimum:

Metabolite AssociatedConsumingRxn
L-glutamine RE2626N, RE2622C, r0392

Each row is one candidate metabolite; AssociatedConsuming is a comma-separated list of reaction IDs in the model that consume that metabolite.


Pipeline Overview

Expression Data  ──┐
DEG Data         ──┤
                   ▼
          [Step 2] Load omics data
                   │
                   ▼
          [Step 3] Load & constrain Human-GEM
                   │
                   ▼
          [Step 4] generatePersonalizedModels()
                   │  iMAT per sample → binary reaction matrix
                   ▼
          [Step 5] fisherTestRxn()
                   │  Fisher's exact test per reaction
                   ▼
          [Step 6] prepareMetaboliteTargetList.R  ← (R script)
                   │  Centrality-based target ranking
                   ▼
          [Step 7] sampleMetNetworks()
                   │  Riemannian-Hamiltonian Monte Carlo (RHMC) flux sampling
                   ▼
          [Step 8a] averageMATADOR()    ← cohort-level scores
          [Step 8b] personalizedMATADOR() ← patient-level scores
                   │
                   ▼
          final_MATADOR_results.mat + per-dataset .xlsx files

Step-by-Step Usage

Step 1 — Environment Setup

Run at the start of any session, or place at the top of your entry-point script. The .m files in src/m_files can be used when MATLAB live editor for .mlx files is not conventient.

clearvars;
addpath('./src/')
addpath('./cobratoolbox-master')
addpath('./RAVEN-main/installation')
addpath('./Human-GEM-main/GPRs')
addpath('./Human-GEM-main/tINIT')

initCobraToolbox
checkInstallation

Step 2 — Load Expression and DEG Data

Average analysis — group-level DEG

% --- Expression data ---
expressionData = struct;
fileList1 = dir(fullfile('Expression Data', '*.xlsx'));
for i = 1:numel(fileList1)
    expressionData(i).OmicData = strrep(fileList1(i).name, '.xlsx', '');
    sheets = sheetnames(fullfile(fileList1(i).folder, fileList1(i).name));
    data1 = readtable(fullfile(fileList1(i).folder, fileList1(i).name), 'Sheet', sheets{1});
    expressionData(i).Genes   = string(data1.GeneID);
    expressionData(i).Control = table2array(data1(:, 2:end));
    data1 = readtable(fullfile(fileList1(i).folder, fileList1(i).name), 'Sheet', sheets{2});
    expressionData(i).Disease = table2array(data1(:, 2:end));
end
save('expressionData.mat', 'expressionData')

% --- Group-level DEG (LIMMA output) ---
degData_Average = struct;
fileList1 = dir(fullfile('DEG Data/', '*.xlsx'));
for i = 1:numel(fileList1)
    degData_Average(i).OmicData = strrep(fileList1(i).name, '_DEG.xlsx', '');
    diffExpr = readtable(fullfile(fileList1(i).folder, fileList1(i).name));
    diffExpr = diffExpr(:, [1, 2, 3]);
    diffExpr.Properties.VariableNames = {'gene', 'logFC', 'pval'};
    degData_Average(i).diffExpr = diffExpr;
end

Personalized analysis — per-patient z-score DEG

% SampleName must match column headers in the disease expression sheet
SampleName = readtable('Expression Data/ROSMAP.xlsx', ...
    'ReadVariableNames', true, 'Sheet', 'AD');
SampleName = SampleName.Properties.VariableNames;
SampleName(1) = [];   % remove the GeneID column header

degData_Personalized = struct;
controlExp = expressionData(1).Control;   % adjust index as needed

for i = 1:size(expressionData(1).Disease, 2)
    aveExpr  = mean([expressionData(1).Disease(:, i), controlExp], 2);
    sdExpr   = std([expressionData(1).Disease(:, i), controlExp], 0, 2);
    zExpr    = (expressionData(1).Disease(:, i) - aveExpr) ./ sdExpr;
    pTwoTailed = 2 * (1 - normcdf(abs(zExpr)));
    degData_Personalized(i).PatientID = SampleName(i);
    diffExpr  = table(expressionData(1).Genes, zExpr, pTwoTailed);
    diffExpr.Properties.VariableNames = {'gene', 'logFC', 'pval'};
    degData_Personalized(i).diffExpr = diffExpr;
end
save('degData_Personalized.mat', 'degData_Personalized')

Step 3 — Prepare the Genome-Scale Model

gemPath     = './Human-GEM-main/Human-GEM.mat';
ihuman      = load(gemPath).ihuman;
curatedModel = ravenCobraWrapper(ihuman);

% Apply physiological flux constraints
curatedModel.lb(strcmp('MAR13082', curatedModel.rxns)) = 0.0001;  % biomass minimum
curatedModel.ub(strcmp('MAR09048', curatedModel.rxns)) = -0.01;   % oxygen uptake
curatedModel.ub(strcmp('MAR09034', curatedModel.rxns)) = -0.01;   % glucose uptake

Adjust the bound values to reflect the physiology of your cell type or experimental condition.


Step 4 — Generate Personalised Metabolic Networks

This step runs iMAT on each individual sample (control and disease) to produce context-specific models and binary reaction-activity matrices. It is computationally intensive; open a parallel pool before calling it.

rng(2304.2024, 'twister');

if isempty(gcp('nocreate')), parpool(18); end   % adjust worker count to hardware
changeCobraSolver('gurobi', 'all');

personalizedMetNet_iMAT = generatePersonalizedModels(curatedModel, expressionData);
save('personalizedMetNet_iMAT.mat', 'personalizedMetNet_iMAT')

Output fields per dataset:

Field Description
OmicData Dataset identifier
healthyMod R × N_control binary matrix (1 = reaction active)
diseaseMod R × N_disease binary matrix
redModelsH Cell array of per-sample COBRA models (control)
redModelsD Cell array of per-sample COBRA models (disease)

Step 5 — Statistical Test on Reaction Activity

Fisher's exact test identifies reactions whose activity differs significantly between disease and control cohorts. Results are used downstream to derive rxnFBS (forward/backward/unchanged labels) for MATADOR scoring.

saveLocation = './';
fisherTestRxn(curatedModel, personalizedMetNet_iMAT, saveLocation)

Output is written to AMS_iMAT_FisherTest.xlsx with one sheet per dataset. Each sheet contains:

Reaction PValue OR ActiveCon (%) ActiveAD (%)

Step 6 — Extract Metabolite Target List (R)

This step runs outside MATLAB. Use prepareMetaboliteTargetList.R to build the metabolite-metabolite interaction network from the Fisher test output, compute betweenness and closeness centrality per network module, and export ranked candidate targets. The input files required to run this script can be downloaded from Zenodo. The current implemenation was performed in R v4.4.1 and the environment R package versions are provided in SRC folder.

The R script produces one .xlsx file per dataset in Metabolite Reports/Metabolite Targets/, named <DatasetName>_BetweennessTopList.xlsx. Refer to the R script documentation for parameters including currency metabolite exclusion lists and the topNPercent selection threshold.


Step 7 — Sample Metabolic Networks

Network sampling defines the disease reference flux distribution (VRefDis) and the rxnFBS vector required by MATADOR. Both average and personalized sampling use the same function, controlled by the mode flag.

changeCobraSolver('gurobi', 'all');

% Average (cohort-level)
averageSampledModels = sampleMetNetworks(curatedModel, personalizedMetNet_iMAT, ...
                                          degData_Average, 'AVERAGE', 10);

% Personalized (patient-level)
personalizedSampledModels = sampleMetNetworks(curatedModel, personalizedMetNet_iMAT, ...
                                               degData_Personalized, 'PERSONALIZED', 10);

The fifth argument (10) controls the number of sampling iterations. Each element of the output struct provides .disModel, .diffFBS, .VRefDis, and .VRefCon fields consumed by averageMATADOR and personalizedMATADOR in the next step.


Step 8 — Run MATADOR

Set a fixed random seed and output directories, then loop over datasets.

rng(1107.2025, 'twister');
targetPath  = 'Metabolite Reports/Metabolite Targets/';
resultsPath = 'Metabolite Reports/Antimetabolite Analysis/';
if ~exist(resultsPath, 'dir'), mkdir(resultsPath); end

Average Analysis

averageMATADOR_Simulation = struct();
for i = 1:numel(averageSampledModels)
    datasetName = averageSampledModels(i).OmicData;
    targetFile  = fullfile(targetPath, [datasetName, '_BetweennessTopList.xlsx']);

    averageMATADOR_Simulation(i).DataSet      = datasetName;
    averageMATADOR_Simulation(i).MetaboliteSims = averageMATADOR( ...
        averageSampledModels(i).disModel,  ...
        targetFile,                        ...
        averageSampledModels(i).diffFBS,   ...
        averageSampledModels(i).VRefDis,   ...
        averageSampledModels(i).VRefCon,   ...
        resultsPath, datasetName);
end

Personalized Analysis

persResultsPath = 'Metabolite Reports/Antimetabolite Analysis Personalized/';
if ~exist(persResultsPath, 'dir'), mkdir(persResultsPath); end

personalizedMATADOR_Simulation = struct();
for i = 1:numel(personalizedSampledModels)
    patientID  = personalizedSampledModels(i).PatientID;
    targetFile = fullfile(targetPath, '_BetweennessTopList.xlsx');

    personalizedMATADOR_Simulation(i).PatientID      = patientID;
    personalizedMATADOR_Simulation(i).MetaboliteSims = personalizedMATADOR( ...
        personalizedSampledModels(i).disModel, ...
        targetFile,                            ...
        personalizedSampledModels(i).diffFBS,  ...
        personalizedSampledModels(i).VRefDis,  ...
        persResultsPath, patientID);
end

save('final_MATADOR_results.mat', 'averageMATADOR_Simulation', 'personalizedMATADOR_Simulation')

Running Graded Sensitivity Analysis

The graded variant sweeps fractional inhibition levels instead of applying a complete block. Use gradedMATADOR_Sensitivity_Analysis.m as the entry point, or call gradedMATADOR directly.

rng(1107.2025, 'twister');
gradeLevels = [0, 0.25, 0.50, 0.75];   % 0%, 25%, 50%, 75% inhibition

gradedMATADOR_Sensitivity = struct();
for i = 1:numel(averageSampledModels)
    datasetName = averageSampledModels(i).OmicData;
    targetFile  = fullfile(targetPath, [datasetName, '.xlsx']);

    gradedMATADOR_Sensitivity(i).DataSet        = datasetName;
    gradedMATADOR_Sensitivity(i).MetaboliteSims = gradedMATADOR( ...
        averageSampledModels(i).disModel,   ...
        targetFile,                         ...
        averageSampledModels(i).diffFBS,    ...
        averageSampledModels(i).VRefDis,    ...
        gradeLevels,                        ...
        resultsPath, datasetName);
end
save('gradedMATADOR_Sensitivity.mat', 'gradedMATADOR_Sensitivity')

Output scores are written to <saveLoc><sampleName>_SensitivityAnalysis.xlsx with columns Metabolite, MATADOR_0, MATADOR_25, MATADOR_50, MATADOR_75.


Function Reference

MATADOR(model, simMetab, KOrxn, rxnFBS, VRefDis)

Core binary perturbation engine. Fully blocks all consuming reactions of one metabolite, solves a MOMA QP, and returns the similarity score to the healthy reference.

Argument Type Description
model struct COBRA model
simMetab string Metabolite name (for logging)
KOrxn cell/char Reaction IDs to block
rxnFBS numeric vector Forward/Backward/Unchanged labels per reaction
VRefDis numeric vector Disease reference flux distribution

Returns antiMetabScore.MATADOR_Score (scalar) and antiMetabScore.fluxMOMA (flux vector).


averageMATADOR(dModel, files, diffFBS, VRefDis, saveLoc, sampleName)

Loops over all metabolites in the target file and calls both MATADOR (MOMA-based) and rMTA_Metabolite in parallel. Writes scores to <sampleName>.xlsx.

Argument Description
dModel Disease-state COBRA model
files Path to the metabolite target .xlsx file
diffFBS Forward/Backward/Unchanged vector
VRefDis Disease reference flux distribution
saveLoc Output directory
sampleName Output file prefix

Output struct fields: aDAT_ScoresTable, RunTime, ConsumingRxnsIndices.


mainGradedMATADOR(model, simMetab, KOrxn, rxnFBS, gradeLevel, VRefDis)

Core graded perturbation engine. Scales reaction bounds by gradeLevel (0 = full block, 1 = unperturbed) instead of zeroing them completely.

Argument Description
gradeLevel Scalar in [0, 1]; fraction of original bounds retained

All other arguments are identical to MATADOR. Returns antiMetab.MATADOR_Score and antiMetab.fluxMATADOR.


gradedMATADOR(dModel, files, diffFBS, VRefDis, gradeLevels, saveLoc, sampleName)

Iterates over metabolites × grade levels, calling mainGradedMATADOR for each combination. Results are assembled into a metabolite × grade score matrix.


generatePersonalizedModels(model, expressionData)

Builds iMAT-based context-specific models for every sample in every dataset. Thresholds are derived from the pooled (control + disease) expression distribution (25th and 75th percentiles) to ensure cross-condition comparability.


iMATCustom(model, expressionRxns, threshold_lb, threshold_ub, ...)

Solves the iMAT MILP to extract a context-specific subnetwork. Reactions with expression above threshold_ub are in the high-confidence active set; those below threshold_lb are preferentially removed. Default MILP time limit is 60 s per sample.


fisherTestRxn(model, personalizedModels, saveLocation)

Runs Fisher's exact test on the binary reaction-activity matrices to identify reactions whose activation status differs significantly between disease and control.


Output Files

File Contents
expressionData.mat Loaded expression struct array
degData_Personalized.mat Per-patient z-score DEG struct array
personalizedMetNet_iMAT.mat iMAT models and binary reaction matrices
AMS_iMAT_FisherTest.xlsx Fisher test p-values and odds ratios per reaction
<dataset>.xlsx (average) Sheets: TScores (MOMA + rMTA scores), RunTime
<dataset>_SensitivityAnalysis.xlsx Graded scores: MATADOR_0 through MATADOR_75
final_MATADOR_results.mat All average and personalized simulation results
gradedMATADOR_Sensitivity.mat All graded simulation results

Interpreting MATADOR Scores

MATADOR scores reflect the cosine-like similarity (computed by MATADOR_TS/aTS) between the flux state after metabolite inhibition and the healthy reference distribution, weighted by rxnFBS.

Score Interpretation
High positive Blocking this metabolite pushes flux towards the healthy state — strong antimetabolite candidate
Near zero Inhibition has minimal effect on network state
High negative Blocking this metabolite pushes flux towards the disease state — an unlikely antimetabolite candidate
-Inf QP infeasible, or flux norm < 1 (degenerate solution); metabolite likely essential to network feasibility under given constraints

For the graded analysis, a monotonically increasing score from grade 0 → 0.75 suggests that even partial inhibition is therapeutically meaningful, making the metabolite a more tractable drug target.


Notes and Troubleshooting

Parallel pool — both generatePersonalizedModels and averageMATADOR use parfor. Open a pool before calling them (parpool(N) where N ≤ available cores). If the Parallel Computing Toolbox is unavailable, replace parfor with for in the source files.

Solver — all QPs and MILPs require Gurobi. Call changeCobraSolver('gurobi', 'all') before any simulation step.

Random seedsrng(2304.2024, 'twister') is set before model generation and rng(1107.2025, 'twister') before MATADOR runs to ensure reproducibility.

Missing reactions in target file — reactions listed in AssociatedConsuming that are absent from the model are silently skipped. If scores are unexpectedly -Inf for many metabolites, verify that the model version used in Steps 3–7 matches the one used to generate the target list in Step 6.

Human-GEM constraints — the default physiological bounds (biomass ≥ 0.0001, oxygen uptake ≤ −0.01, glucose uptake ≤ −0.01) are tuned for a generic human cell context. Adjust these to reflect your experimental system before running iMAT.

Sentinel value 65535 — this value may appear in upstream Fisher test output and should be treated as a missing or overflow marker in downstream analyses.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors