DeepHEM: A Novel Deep Domain-Adversarial Learning Framework for Identifying Human Essential MiRNAs

Overview

DeepHEM is a deep learning framework for identifying human essential miRNAs. This repository contains the source code, data, and experimental results for the paper "DeepHEM: A Novel Deep Domain-Adversarial Learning Framework for Identifying Human Essential MiRNAs". The framework uses domain adaptation to transfer knowledge from a well-studied species (e.g., mouse) to a less-studied one (e.g., human), so that miRNA essentiality can be identified accurately even when labeled data are limited.

Project Structure

The repository is organized into the following main directories:

ablation studies/: Contains code and results for ablation experiments investigating the impact of distribution alignment loss, loss function combination, multi-modal fusion method, and feature representation module.
biological correlation analysis/: Includes scripts for analyzing the biological correlation of prediction results, such as trend boxplots and correlation matrix diagrams.
case study/: Contains code and data for in-depth case studies of specific human miRNAs.
comparative analysis experiment/: Includes implementations of comparison methods (DAN, DDC, TCA) and the proposed DeepHEM method.
data/: Contains raw and processed data used in the experiments.

Requirements

To run the code, you need the following dependencies:

Python 3.7+
PyTorch 1.7+
pandas
numpy
scikit-learn
matplotlib
scipy
openpyxl (Excel I/O)
seaborn (for plots in correlation analysis)

Installation

Clone the repository:

git clone https://github.com/SniperWong/DeepHEM.git
cd DeepHEM

Install the required dependencies:

pip install -r requirements.txt

Data Preparation

1. Dataset Description

The dataset consists of miRNA sequences and features from two species:

Source domain: Mouse (Mus musculus) miRNA data with known essentiality labels
Target domain: Human (Homo sapiens) miRNA data for essentiality identification

2. Data Files

All data files are located in the data/ directory:

source_data.xlsx: Mouse miRNA sequences
mmu_label.xlsx: Essentiality labels for mouse miRNAs
target_data.xlsx: Human miRNA sequences
mmu_feature.xlsx: Inherent features for mouse miRNAs
hsa_feature.xlsx: Inherent features for human miRNAs
mmu_mti_feature.xlsx: miRNA-target gene interaction features for mouse
hsa_mti_feature.xlsx: miRNA-target gene interaction features for human
Conservation score.xlsx: Conservation scores of human miRNAs
DSW score.xlsx: Disease spectrum width score (DSW) of human miRNAs

3. Preprocessing Steps

The preprocessing pipeline includes the following steps (already implemented inside the training scripts):

Read miRNA sequences from Excel files using data_helper.read_mirna_sequences.
- Expected columns: first column is ID, second column is sequence string.
Extract sequence features using 3-mer frequency with sklearn.feature_extraction.text.CountVectorizer (k=3).
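
The two steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `read_sequences` and `kmer_counts` are hypothetical helpers, the toy sequences are made up, and character-level 3-mers (`analyzer="char"`) are an assumption about how CountVectorizer is configured.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def read_sequences(frame):
    """Take IDs from the first column and sequences from the second (by position)."""
    ids = frame.iloc[:, 0].astype(str).tolist()
    seqs = frame.iloc[:, 1].astype(str).str.strip().str.upper().tolist()
    return ids, seqs


def kmer_counts(seqs, k=3):
    """Count overlapping character k-mers in each sequence."""
    # lowercase=False keeps nucleotide letters as they are in the input.
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(k, k), lowercase=False)
    matrix = vectorizer.fit_transform(seqs)
    return matrix.toarray(), vectorizer


# Toy stand-in for a sheet with an ID column and a sequence column.
frame = pd.DataFrame({"ID": ["miR-a", "miR-b"], "Sequence": ["UGAGGUAG", "UAGCUUAU"]})
ids, seqs = read_sequences(frame)
counts, vectorizer = kmer_counts(seqs)
print(counts.shape)  # (number of sequences, number of distinct 3-mers seen)
```

With real data, the frame would come from something like `pd.read_excel("data/source_data.xlsx")` (which needs openpyxl). Whether the sheet has a header row is not stated here, so check that before relying on column positions.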

Model Training

1. Model Architecture

DeepHEM consists of the following key components:

Feature extractor: Transformer-based architecture for extracting representations from miRNA sequences; MLP for inherent and MTI-based features for miRNAs
Classifier: Multi-layer perceptron (MLP) for function prediction
Domain discriminator: Adversarial network for domain adaptation
Feature fusion module: Concatenation for fusing different types of features
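
As a rough sketch of how concatenation fusion, a classifier, and a domain discriminator can fit together in PyTorch, the example below uses a gradient reversal layer, which is one common way to train a domain discriminator adversarially. Whether DeepHEM uses gradient reversal is an assumption, and `GradReverse`, `FusionHeads`, and all layer sizes are hypothetical. The branch encoders (Transformer and MLPs) are left out; the inputs stand for their outputs.

```python
import torch
from torch import nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # No gradient is returned for lam, hence the trailing None.
        return -ctx.lam * grad_output, None


class FusionHeads(nn.Module):
    """Hypothetical heads on top of three already-encoded feature branches."""

    def __init__(self, seq_dim=32, inh_dim=16, mti_dim=16, hidden=32):
        super().__init__()
        fused_dim = seq_dim + inh_dim + mti_dim
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.discriminator = nn.Sequential(
            nn.Linear(fused_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, seq_feat, inh_feat, mti_feat, lam=1.0):
        # Concatenation fusion: join the three representations along the feature axis.
        fused = torch.cat([seq_feat, inh_feat, mti_feat], dim=1)
        class_logit = self.classifier(fused)
        # The discriminator sees the same features, but its gradient is reversed
        # before it reaches the feature extractor.
        domain_logit = self.discriminator(GradReverse.apply(fused, lam))
        return class_logit, domain_logit


heads = FusionHeads()
class_logit, domain_logit = heads(torch.randn(5, 32), torch.randn(5, 16), torch.randn(5, 16))
print(class_logit.shape, domain_logit.shape)
```

The reversal makes the upstream features harder to tell apart by domain while the discriminator itself is still trained to separate mouse from human samples.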

2. Training Configuration

Optimizer: Adam with learning rate 0.001
Epochs: 100
Batch size: 64
Loss functions: Combination of classification loss, MMD (Maximum Mean Discrepancy) loss, and adversarial loss
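
To make the MMD term concrete, here is a small NumPy sketch of a biased estimate of squared MMD with Gaussian kernels. It is an illustration only: the function names, the bandwidths, and the equal-weight average over kernels are assumptions, not the settings used in the DeepHEM scripts. Passing one bandwidth gives a single-kernel variant, and passing several gives a simple multi-kernel variant.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(source, target, sigmas=(1.0,)):
    """Biased estimate of squared MMD, averaged over the given bandwidths."""
    total = 0.0
    for sigma in sigmas:
        k_ss = gaussian_kernel(source, source, sigma).mean()
        k_tt = gaussian_kernel(target, target, sigma).mean()
        k_st = gaussian_kernel(source, target, sigma).mean()
        total += k_ss + k_tt - 2.0 * k_st
    return total / len(sigmas)


rng = np.random.default_rng(0)
source = rng.normal(size=(50, 4))
near = rng.normal(size=(50, 4))          # same distribution as source
far = rng.normal(loc=3.0, size=(50, 4))  # shifted distribution

print(mmd2(source, near), mmd2(source, far))
print(mmd2(source, far, sigmas=(0.5, 1.0, 2.0)))
```

The estimate is close to zero when the two samples come from the same distribution and grows as they drift apart, which is why minimizing it pulls the source and target feature distributions together.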

3. Running procedure

The training scripts run the following steps in order: 1) load and preprocess the data, 2) train the model, 3) save the best weights, 4) generate predictions for all human miRNAs, and 5) compute biological correlations.

Evaluation

1. Evaluation Metrics

The model's performance is evaluated using the following metrics:

Pearson correlation: Measures the linear correlation (with its p-value) between predicted scores and conservation scores
Spearman correlation: Measures rank correlation between predicted scores and conservation scores
Kendall correlation: Non-parametric measure of rank correlation
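
The three metrics can be computed with `scipy.stats`, as in the sketch below. The helper `correlation_report` and the toy numbers are hypothetical. In practice the inputs would be the predicted scores and the matching conservation or DSW scores for the same miRNAs.

```python
from scipy.stats import kendalltau, pearsonr, spearmanr


def correlation_report(predicted, reference):
    """Return (coefficient, p-value) pairs for three correlation measures."""
    report = {}
    for name, func in (("pearson", pearsonr), ("spearman", spearmanr), ("kendall", kendalltau)):
        coefficient, p_value = func(predicted, reference)
        report[name] = (float(coefficient), float(p_value))
    return report


# Toy values only; both lists must be aligned to the same miRNAs.
predicted = [0.10, 0.40, 0.35, 0.80, 0.90]
conservation = [1, 2, 3, 4, 5]
report = correlation_report(predicted, conservation)
for name, (coefficient, p_value) in report.items():
    print(f"{name}: r={coefficient:.3f}, p={p_value:.3g}")
```

Spearman and Kendall depend only on the ranking of the scores, so they are less sensitive than Pearson to the scale of the predictions.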

2. Running Evaluation

After training, the model automatically evaluates its performance on the target domain data. The evaluation results include:

Predictions for all human miRNAs
Correlation analysis with biological scores (DSW and conservation scores)

3. Ablation Studies

To reproduce the ablation studies, navigate to the relevant subfolder of ablation studies/ and run the corresponding scripts. Note: some filenames contain full-width (Chinese) parentheses and spaces, so always wrap paths and filenames in quotes (on Windows and in other shells).

Feature ablation (different feature combinations):

cd "ablation studies/test--feature"
python "main（only seq）.py"
python "main（only MTI）.py"
python "main（only 18）.py"
python "main（seq+MTI）.py"
python "main（seq+18）.py"
python "main（MTI+18） .py"

Fusion method ablation:

cd "ablation studies/test--fusion"
python "main（add）.py"
python "main（cross attention）.py"

Kernel ablation (MMD variants):

cd "ablation studies/test--kernel"
python "main（MK-MMD）.py"
python "main（SK-MMD）.py"

Loss function ablation:

cd "ablation studies/test--loss"
python "main（mmd+cl）.py"
python "main（adv+cl）.py"

Comparative Analysis Experiment

This folder provides implementations and runnable entry points for the comparison methods evaluated alongside DeepHEM.

Directory: comparative analysis experiment/code/
- DAN/ (Deep Adaptation Network baseline)
- DDC/ (Deep Domain Confusion baseline)
- TCA/ (Transfer Component Analysis baseline)

cd "comparative analysis experiment/code/DAN"
python main.py

cd "comparative analysis experiment/code/DDC"
python main.py

cd "comparative analysis experiment/code/TCA"
python main.py

Required inputs (expected in the method directory you run from, or adjust paths in the scripts):

source_data.xlsx, source_label.xlsx, target_data.xlsx
mmu_feature.xlsx, hsa_feature.xlsx (18 handcrafted features)
mmu_feature_pca.xlsx, hsa_feature_pca.xlsx (MTI PCA features)
Biological correlation files for evaluation: hsa_dsw_score_1913.txt, dsw_name_1913.txt, hsa_conserv_score_1913.txt, conservation_name_1913.txt, pred_name_1913.txt

Biological Correlation Analysis

To analyze the biological relevance of the prediction results:

Correlation matrix:

cd "biological correlation analysis/plot correlation matrix"
python "Drawing Correlation Matrix Diagram.py"

Trend boxplots:

cd "biological correlation analysis/draw trend boxplot"
python "Drawing the trend boxplots.py"

Case Study

To run the case study analysis:

cd "case study"
python casestudy.py

Reproducibility

To ensure reproducibility of the results reported in the paper:

For the biological correlation analysis, ablation studies, comparative analysis experiment, and case study, use the specific scripts and data provided in each subdirectory.
Run the DeepHEM scripts with default parameters.
