Utilizing Deep Learning to Uncover Antimicrobial Peptides and Investigating their Associations with Diverse Diseases

As pathogens start to become more resistant to our current antibiotics, we will soon need new antibiotics to fight various forms of bacteria. By 2050, drug-resistant pathogens are expected to be the leading cause of death in the world (O’Neill 2016). One way to create new antibiotics is to identify pep- tides, sequences of amino acids, that have antimicrobial properties. These are commonly known as antimicrobial peptides(AMP). These AMPs are known to regulate inflammation, kill certain types of cancer cells, and fight various in- fections and diseases. In this project, we use an Attention model to classify peptides as AMPs or non-AMPs. We then run this model on a dataset called FINRISK, which contains both DNA and health data on a randomly selected group of Finnish people. Using the output of our model and the health data in FINRISK, we performed a Pearson correlation test between having a spe- cific AMP and having a disease listed in the FINRISK health data. We found an average Pearson correlation coefficient of 0.5456 between having 4 pep- tides and having COPD. We will be testing these peptides in the wet lab in the coming weeks.

Barnacle2 Workflow

Setting up conda environment:

Only required if you do not have a conda environment set up already.

(1) After signing in to Barnacle2, run these commands:

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc

(2) run

vi ~/.bash_profile

then press i to enter insert mode, and paste the following code

if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

hit esc, type":wq" to save and exit file

(3) finally, run

source ~/.bash_profile

Now you have a working conda environment on barnacle2.

Running the project

Initlize environment

To create the environment, run the following command from the root directory of the project.

conda -m venv capstone python==3.8
conda activate capstone

Installation of dependencies

To install dependencies, run the following command from the same terminal

pip install -r requirements.txt
conda install cudatoolkit==11.2.2
conda install cudnn==8.8.0

Building the project stages using `run.py`

To process the data, from the root directory of the project, run python run.py pre-process
- This script unzips the gz file, converts it to a fasta format, does six-frame-translation and gets the orfs, and get the data into Attention model-ready format.
- To make sure that you can use multiprocessing to be time efficient, please run the following command:
```
srun --mem 32g -N 1 -c 32 --time 14:00:00 --pty bash -l
```
- please make sure that you are on base environment for pre-processing

To gain access to gpu, open up a new terminal and login, then run this command

srun --mem 100g -N 1 -c 32 --partition gpu --gres=gpu:1 --time 14:00:00 --pty bash -l

To do prediction and extract the predicted AMPs, from the project root dir, run python run.py prediction
- This divides the input file into 10 files(If the file is larger than 10GB), and run them separately through the model, it outputs a file which contains all the sequences that has been predicted AMPs.
- Please make sure that you are on capstone environment for prediction
The result will be exported into a txt file in the results folder of the root directory of the project
The processed files are very large (~5TB)
pre-process takes about 200 minutes
prediction takes about 180 minutes
Please notice that all of the finrisk data are protected and you will need certain permission to access, please change the paths in the scripts if you wish to run on your account.
To run correlation analysis and export the final results, run python run.py result, this is export the results in the folder Final_results

Reference

Models and part of the scripts are provided by Identification of antimicrobial peptides from the human gut microbiome using deep learning

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final_results

Final_results

Models

Models

results

results

scripts

scripts

.gitattributes

.gitattributes

README.md

README.md

requirements.txt

requirements.txt

run.py

run.py

Repository files navigation

Utilizing Deep Learning to Uncover Antimicrobial Peptides and Investigating their Associations with Diverse Diseases

Barnacle2 Workflow

Setting up conda environment:

Only required if you do not have a conda environment set up already.

Now you have a working conda environment on barnacle2.

Running the project

Initlize environment

Installation of dependencies

Building the project stages using `run.py`

Reference

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
Final_results		Final_results
Models		Models
results		results
scripts		scripts
.gitattributes		.gitattributes
README.md		README.md
requirements.txt		requirements.txt
run.py		run.py

YimingHao0730/project2

Folders and files

Latest commit

History

Repository files navigation

Utilizing Deep Learning to Uncover Antimicrobial Peptides and Investigating their Associations with Diverse Diseases

Barnacle2 Workflow

Setting up conda environment:

Only required if you do not have a conda environment set up already.

Now you have a working conda environment on barnacle2.

Running the project

Initlize environment

Installation of dependencies

Building the project stages using run.py

Reference

About

Resources

Stars

Watchers

Forks

Languages

Building the project stages using `run.py`