
# DeepARG-Attention: Interpretable ARG Identification in Metagenomics

License: MIT · Python · PyTorch

> **Note:** This repository serves as the implementation workspace for the undergraduate thesis research proposal "Research on Deep Learning and Attention Mechanism-based Identification and Classification of Antibiotic Resistance Genes in Metagenomics".

## 🧬 Project Overview

Antimicrobial Resistance (AMR) is one of the top global public health threats. Metagenomic sequencing allows Antibiotic Resistance Genes (ARGs) to be detected directly from environmental samples. However, existing alignment-based tools (e.g., BLAST) and early deep learning models struggle with remote homology detection (i.e., matching sequences with low similarity to known references) and offer little interpretability.

This project proposes a CNN-Attention Hybrid Architecture designed to:

- **Surpass SOTA performance** (DeepARG, HMD-ARG) on short metagenomic reads.
- **Visualize active sites** using attention mechanisms to explain *why* a gene is classified as resistant.
- **Handle data imbalance** effectively using advanced loss functions (one candidate is sketched just below).
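
The proposal does not commit to a specific loss function, but focal loss is one common choice for imbalanced multi-class problems. The snippet below is only a minimal PyTorch sketch of that idea, with assumed defaults, not the thesis implementation.

```python
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss: down-weights easy examples so that rare
    ARG classes contribute relatively more to the gradient.
    `alpha` is an optional per-class weight tensor of shape (num_classes,)."""
    log_probs = F.log_softmax(logits, dim=-1)                      # (N, C)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # (N,)
    pt = log_pt.exp()                                              # probability of the true class
    loss = -((1.0 - pt) ** gamma) * log_pt                         # focal modulation
    if alpha is not None:
        loss = alpha[targets] * loss                               # optional class re-weighting
    return loss.mean()
```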

## 🏗️ Proposed Architecture

The model uses a hybrid approach, combining the local feature extraction of Convolutional Neural Networks (CNNs) with the global context awareness of self-attention.

```mermaid
flowchart TD
    A[DNA Sequence / One-hot] --> B[1D-CNN Layers]
    B --> C[ResNet Blocks]
    C --> D[Multi-Head Self-Attention]
    D --> E[Global Pooling]
    E --> F[Fully Connected Layers]
    F --> G[Output: ARG Class]
```

*Figure 1: Conceptual Architecture (Work in Progress)*
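
Purely as an illustration of how these stages could be wired together, here is a minimal PyTorch sketch. The class name `CNNAttentionARG`, the layer sizes, and all hyperparameters are placeholders chosen for the example, not values from the proposal.

```python
import torch.nn as nn

class ResBlock1d(nn.Module):
    """Residual block over 1-D feature maps (channels-first)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))           # skip connection

class CNNAttentionARG(nn.Module):
    """One-hot DNA input (batch, 4, length) -> ARG class logits."""
    def __init__(self, num_classes, channels=128, num_heads=8):
        super().__init__()
        self.cnn = nn.Sequential(                   # local motif extraction
            nn.Conv1d(4, channels, kernel_size=7, padding=3),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            ResBlock1d(channels),
            ResBlock1d(channels),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.head = nn.Sequential(                  # classification head
            nn.Linear(channels, channels),
            nn.ReLU(),
            nn.Linear(channels, num_classes),
        )

    def forward(self, x):                           # x: (batch, 4, length)
        feats = self.cnn(x)                         # (batch, channels, length)
        tokens = feats.transpose(1, 2)              # (batch, length, channels)
        attn_out, attn_weights = self.attn(tokens, tokens, tokens)
        pooled = attn_out.mean(dim=1)               # global average pooling
        return self.head(pooled), attn_weights      # logits plus attention for interpretation
```

Returning the attention weights alongside the logits keeps the interpretability analysis straightforward; the visualization sketch under the Roadmap reuses this output.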

## 🎯 Key Features (Planned)

- **End-to-End Learning:** Direct input from DNA sequences (ACGT) without manual feature engineering (see the encoding sketch after the Tech Stack list).
- **Attention Maps:** Generate visualizations of high-weight regions corresponding to functional protein domains.
- **Optimized for Short Reads:** Robust classification of fragmented sequences (100–150 bp) common in NGS data.
- **Strict Curation:** Training on a rigorously clustered dataset (CD-HIT) to prevent data leakage from homologous sequences.

## 🛠️ Tech Stack

- **Language:** Python 3.9
- **Deep Learning Framework:** PyTorch
- **Data Processing:** Biopython, Pandas, NumPy
- **Bioinformatics Tools:** CD-HIT, DIAMOND, BLAST
- **Visualization:** Matplotlib, Seaborn (for saliency maps)
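
As a hedged illustration of the end-to-end input format, the sketch below one-hot encodes reads into the `(4, length)` layout assumed by the architecture sketch above. The fixed read length of 150 bp, the all-zero treatment of ambiguous bases, and the file name `reads.fasta` are assumptions for the example, not decisions from the proposal.

```python
import numpy as np
from Bio import SeqIO

NUC_TO_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot_encode(seq, length=150):
    """Encode a DNA read as a (4, length) one-hot matrix.
    Reads are truncated or zero-padded to `length`; ambiguous bases
    (e.g. N) are left as all-zero columns."""
    mat = np.zeros((4, length), dtype=np.float32)
    for i, base in enumerate(str(seq).upper()[:length]):
        idx = NUC_TO_IDX.get(base)
        if idx is not None:
            mat[idx, i] = 1.0
    return mat

if __name__ == "__main__":
    # Encode every read in a (hypothetical) FASTA file of short reads.
    reads = [one_hot_encode(rec.seq) for rec in SeqIO.parse("reads.fasta", "fasta")]
    print(f"Encoded {len(reads)} reads")
```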

## 📅 Roadmap

This project follows a 16-week development lifecycle as outlined in the research proposal.

- **Phase 1: Preparation (Weeks 1–3)**
  - Literature review (DeepARG, TRAC, HMD-ARG)
  - Environment setup
- **Phase 2: Data Curation (Weeks 4–6)**
  - Integration of the CARD and UniProt databases
  - Sequence clustering and dataset splitting (train/val/test)
- **Phase 3: Development (Weeks 7–10)**
  - Implementation of the CNN-Attention backbone
  - Hyperparameter tuning using Bayesian optimization
- **Phase 4: Evaluation (Weeks 11–13)**
  - Benchmarking against BLAST and DeepARG
  - Analysis of attention weights for interpretability (see the sketch below)
- **Phase 5: Publication (Weeks 14–16)**
  - Thesis writing and code documentation
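
Purely as an illustration of the Phase 4 interpretability analysis, the sketch below pulls the head-averaged attention weights out of the hypothetical `CNNAttentionARG` model defined above and plots a per-position importance profile. The random input read, the averaging over query positions, and the output file name are assumptions for demonstration only.

```python
import matplotlib.pyplot as plt
import torch

# Assumes the CNNAttentionARG sketch above is defined in (or imported into) this file.
model = CNNAttentionARG(num_classes=16)
model.eval()

# A random 150 bp "read" as a one-hot tensor of shape (1, 4, 150).
x = torch.zeros(1, 4, 150)
x[0, torch.randint(0, 4, (150,)), torch.arange(150)] = 1.0

with torch.no_grad():
    logits, attn = model(x)                   # attn: (1, 150, 150), averaged over heads

# Mean attention each position receives across all query positions,
# one simple way to summarise per-position importance.
importance = attn[0].mean(dim=0).numpy()

plt.figure(figsize=(10, 2))
plt.plot(importance)
plt.xlabel("Position along read")
plt.ylabel("Mean attention weight")
plt.title("Per-position attention profile (candidate functional regions)")
plt.tight_layout()
plt.savefig("attention_profile.png", dpi=150)
```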

## 📚 References

The development of this project is inspired by the following key works:

1. Li, Y., et al. (2021). HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes.
2. Arango-Argoty, G., et al. (2018). DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

- **Author:** Chenhao Guo
- **Supervisor:** Prof. Yu Li
