ChaorongC/MESA

MESA

Multimodal Epigenetic Sequencing Analysis (MESA) is a flexible and sensitive method of capturing and integrating multimodal epigenetic information of cfDNA using a single experimental assay.

@ Modified by: Chaorong Chen

@ Modified time: 2023-02-11 02:27:49

For the original MESA paper, please refer to this tutorial: https://rpubs.com/LiYumei/926228.

Dependencies

  • Python >=3.6
  • deepTools
  • bedtools
  • DANPOS2
  • BSMAP
  • UCSC tools
  • Python Package
    • pandas
    • numpy
    • scikit-learn = 0.24.2
    • joblib
    • itertools (part of the Python standard library; no separate installation needed)
    • boruta_py
    • deep-forest

Installation

Clone the repository with git:

git clone https://github.com/ChaorongC/MESA
cd MESA

Or download the repository with wget:

wget https://github.com/ChaorongC/MESA/archive/refs/heads/main.zip
unzip main.zip
cd MESA-main

Usage

The Python script MESA.py in the root directory is the main program for MESA. Its function MESA_single() runs the analysis on a single type of feature, and MESA_integration() combines the results from different types of features and returns the multimodal prediction result.

Example

Check the Jupyter notebook demo.ipynb for a tutorial on how to run MESA.

Parameters

MESA_single(X,
        y,
        estimator,
        classifiers=[],
        cv=5,
        random_state=0,
        min_feature=10,
        n_jobs=-1,
        scoring='roc_auc',
        boruta_top_n_feature=1000)

X : dataframe of shape (n_features, n_samples)

Input samples. A matrix containing features as rows with samples as columns.

y : array-like of shape (n_samples,)

Target values/labels/stages. Usually, we use 0 and 1 for 'normal/negative' and 'cancer/positive' samples.
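
The expected orientation of X is easy to get wrong, because scikit-learn itself uses samples as rows. The sketch below builds a toy input in the layout described above; the feature names, sample names, and values are placeholders and are not taken from the MESA data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 features (rows) measured in 6 samples (columns).
rng = np.random.default_rng(0)
feature_names = ["region_1", "region_2", "region_3", "region_4"]  # placeholder names
sample_names = ["s1", "s2", "s3", "s4", "s5", "s6"]               # placeholder names

X = pd.DataFrame(
    rng.random((len(feature_names), len(sample_names))),
    index=feature_names,
    columns=sample_names,
)

# One label per sample (per column of X): 0 = normal/negative, 1 = cancer/positive.
y = np.array([0, 0, 0, 1, 1, 1])

# scikit-learn estimators expect samples as rows, so the transposed frame
# has shape (n_samples, n_features).
X_samples_as_rows = X.T
```

The length of y should equal the number of columns of X, not the number of rows.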

estimator : estimator object/model implementing ‘fit’

The object used to fit the data. This model evaluates feature subsets in each iteration of sequential backward selection (SBS).

classifiers : list of estimator objects/models implementing 'fit' and 'predict_proba'

The objects used to evaluate on the test set at the end. Each model is trained on the final selected feature subset and then tested on the testing set.
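
As a rough illustration of the two interface requirements, the sketch below checks candidate scikit-learn models for the methods named above. The model choices and the helper has_required_methods are illustrative assumptions, not part of MESA.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Illustrative choices only; any objects with the required methods would do.
estimator = RandomForestClassifier(n_estimators=100, random_state=0)
classifiers = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    LogisticRegression(max_iter=1000),
]


def has_required_methods(model, methods):
    """Hypothetical helper: True if the model exposes every named method."""
    return all(callable(getattr(model, name, None)) for name in methods)


estimator_ok = has_required_methods(estimator, ["fit"])
classifiers_ok = all(
    has_required_methods(clf, ["fit", "predict_proba"]) for clf in classifiers
)

# LinearSVC implements fit but not predict_proba, so it would not qualify
# as an entry in `classifiers`.
linear_svc_ok = has_required_methods(LinearSVC(), ["fit", "predict_proba"])
```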

cv : int, cross-validation generator or an iterable, default=5

(Adopted from sklearn.model_selection.cross_val_score) Determines the cross-validation splitting strategy. Possible inputs for cv are: None, to use the default 5-fold cross-validation; an int, to specify the number of folds in a (Stratified)KFold; a CV splitter; or an iterable yielding (train, test) splits as arrays of indices.
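
Since cv follows the scikit-learn convention, the accepted forms can be tried directly with cross_val_score. The sketch below uses toy data in scikit-learn orientation (samples as rows) and a logistic regression chosen only for illustration; it does not call MESA.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy data for illustration only: 20 samples, 3 features.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 3)) + y[:, None]  # shift the positive class so it is learnable

model = LogisticRegression(max_iter=1000)

# 1) An int: the number of folds in a (Stratified)KFold.
scores_int = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

# 2) A CV splitter object, here with shuffling controlled by random_state.
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_splitter = cross_val_score(model, X, y, cv=splitter, scoring="roc_auc")

# 3) An iterable of (train_indices, test_indices) pairs.
splits = list(splitter.split(X, y))
scores_iterable = cross_val_score(model, X, y, cv=splits, scoring="roc_auc")
```

Forms 2 and 3 use identical splits here, so they should produce the same fold scores.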

random_state : int, RandomState instance or None, default=0

Controls the pseudo random number generation for shuffling the data.

min_feature : int, default=10

The minimum feature-subset size that SBS should consider.
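
To make the role of min_feature concrete, here is a generic sketch of sequential backward selection with a lower bound on the subset size. It is not MESA's implementation: the function name, the toy scorer, and the tie-breaking rule are assumptions, and in practice the score would presumably be a cross-validated metric.

```python
def sequential_backward_selection(features, score_subset, min_feature=10):
    """Greedy backward elimination down to `min_feature` features.

    features: list of feature names.
    score_subset: hypothetical callable mapping a tuple of features to a
        score (higher is better).
    Returns the best-scoring subset seen and its score.
    """
    current = tuple(features)
    best_subset, best_score = current, score_subset(current)
    while len(current) > min_feature:
        # Try dropping each remaining feature and keep the best reduced subset.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        scores = [score_subset(c) for c in candidates]
        top = max(range(len(candidates)), key=scores.__getitem__)
        current = candidates[top]
        # ">=" prefers the smaller subset when scores tie.
        if scores[top] >= best_score:
            best_subset, best_score = current, scores[top]
    return best_subset, best_score


# Toy scorer: only "a" and "b" are informative; every extra feature costs 0.1.
def toy_score(subset):
    informative = len({"a", "b"} & set(subset))
    return informative - 0.1 * (len(subset) - informative)


subset, score = sequential_backward_selection(
    ["a", "b", "c", "d", "e"], toy_score, min_feature=2
)
```

With this toy scorer the search ends at the two informative features. Lowering min_feature lets the search continue to smaller subsets, but the best subset seen is still the one returned.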

n_jobs : int, default=-1

Number of jobs to run in parallel. When evaluating a new feature to add or remove, the cross-validation procedure is parallel over the folds. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

scoring : str or callable, default='roc_auc'

The metric used in the SBS process: a str (see the scikit-learn model evaluation documentation) or a scorer callable object/function with the signature scorer(estimator, X, y) that returns a single value. Compatible with sklearn.model_selection.cross_val_score.
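
A callable scorer with the signature described above might look like the following sketch. The name auc_scorer and the toy data are assumptions for illustration; the example calls scikit-learn only, not MESA.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score


def auc_scorer(estimator, X, y):
    """Scorer with the signature scorer(estimator, X, y) returning one float."""
    positive_class_probability = estimator.predict_proba(X)[:, 1]
    return roc_auc_score(y, positive_class_probability)


# Toy data in scikit-learn orientation (samples as rows), for illustration only.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 3)) + y[:, None]

model = LogisticRegression(max_iter=1000)

# The same cross-validation, once with a metric name and once with the callable.
scores_from_string = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
scores_from_callable = cross_val_score(model, X, y, cv=5, scoring=auc_scorer)
```

A callable is useful when the metric needed is not available under a built-in scoring name.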

boruta_top_n_feature : int, default=1000

Number of features that the Boruta step passes on to SBS. Features are first ranked by Boruta, and the top-ranked ones are then handed to SBS for further selection.
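
The "rank, then keep the top n" step can be sketched with plain NumPy. The example assumes a ranking array shaped like the ranking_ attribute that BorutaPy exposes after fitting (rank 1 is best); the helper top_n_by_rank and its tie handling are illustrative assumptions, not MESA's code.

```python
import numpy as np


def top_n_by_rank(feature_names, ranking, n):
    """Return the n best-ranked feature names (rank 1 is best).

    A stable sort keeps the original order among features with equal rank.
    """
    order = np.argsort(np.asarray(ranking), kind="stable")
    return [feature_names[i] for i in order[:n]]


# Hypothetical ranking for five placeholder features
# (1 = confirmed by Boruta, larger = less important).
feature_names = ["f0", "f1", "f2", "f3", "f4"]
ranking = [3, 1, 2, 1, 4]

selected = top_n_by_rank(feature_names, ranking, n=3)
```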

MESA_integration(X_list, 
                  y, 
                  feature_selected, 
                  classifiers)

X_list : list of dataframes of shape (n_features, n_samples)

Input samples. One matrix per feature type, each containing features as rows with samples as columns.

y : array-like of shape (n_samples,)

Target values/labels/stages. Usually, we use 0 and 1 for 'normal/negative' and 'cancer/positive' samples.

feature_selected : list of tuples (n_samples)

Features selected in each leave-one-out (LOO) iteration (in the same order as X_list).

classifiers : list of estimator objects/models implementing 'fit' and 'predict_proba'

The objects used to evaluate on the test set at the end.
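
Because feature_selected is described per LOO iteration, it may help to see what leave-one-out splitting produces. The sketch below uses scikit-learn's LeaveOneOut on a placeholder array and only illustrates the iteration structure. The exact nesting that MESA_integration() expects is not shown here and is best checked against demo.ipynb.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

n_samples = 5
placeholder = np.zeros((n_samples, 1))  # LeaveOneOut only needs the sample count

held_out = []
train_sizes = []
for train_idx, test_idx in LeaveOneOut().split(placeholder):
    held_out.append(int(test_idx[0]))   # exactly one test sample per iteration
    train_sizes.append(len(train_idx))  # all remaining samples are used for training
```

There is one iteration per sample, which matches the n_samples length given for feature_selected above.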

Authors
