Introduction

nGIA is an accurate and fast gene sequence clustering tool. It uses a greedy incremental clustering algorithm: the similarity between sequences is determined by alignment, and filters are applied before the alignment step. All algorithms are implemented on the GPU, which makes nGIA extremely fast.
The oneAPI version is no longer maintained.
Only NVIDIA RTX 30-series or newer GPUs are supported.
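The control flow of greedy incremental clustering with a pre-alignment filter can be sketched on the CPU as follows. This is an illustrative sketch, not nGIA's GPU implementation, and several details are assumptions: sequences are processed longest first, the filter is a hypothetical shared k-mer fraction check (`k` and `min_shared` are made-up parameters), and the "alignment" is a simplified longest-common-subsequence identity measured against the shorter sequence.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct length-k substrings; used only by the (hypothetical) pre-alignment filter."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def lcs_identity(query: str, rep: str) -> float:
    """Stand-in for real alignment: longest common subsequence length
    (match = 1, mismatch = gap = 0) divided by the shorter sequence length."""
    if not query or not rep:
        return 0.0
    prev = [0] * (len(rep) + 1)
    for a in query:
        cur = [0]
        for j, b in enumerate(rep, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / min(len(query), len(rep))


def greedy_cluster(seqs: List[Tuple[str, str]], threshold: float,
                   k: int = 3, min_shared: float = 0.5) -> List[List[str]]:
    """Greedy incremental clustering of (name, sequence) pairs.

    Returns one list of names per cluster; the first name is the representative.
    """
    # Assumption: longest sequences first, so each cluster starts with a long representative.
    order = sorted(seqs, key=lambda s: len(s[1]), reverse=True)
    clusters: List[Tuple[str, Set[str], List[str]]] = []
    for name, seq in order:
        query_kmers = kmers(seq, k)
        for rep_seq, rep_kmers, members in clusters:
            # Filter: skip the expensive alignment if too few k-mers are shared.
            if query_kmers and len(query_kmers & rep_kmers) / len(query_kmers) < min_shared:
                continue
            # Alignment decides membership.
            if lcs_identity(seq, rep_seq) >= threshold:
                members.append(name)
                break
        else:
            # No representative was similar enough: this sequence starts a new cluster.
            clusters.append((seq, query_kmers, [name]))
    return [members for _, _, members in clusters]
```

For example, `greedy_cluster([("s1", "ACGTACGTAC"), ("s2", "ACGTACGTA"), ("s3", "TTTTGGGGCC")], 0.95)` returns `[["s1", "s2"], ["s3"]]`: `s3` shares no 3-mers with `s1`, so the filter rejects it without aligning, while `s2` passes the filter and aligns to `s1` with identity 1.0.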

Install

Install MPICH.

Install MPICH and copy its headers to "/usr/local/include" and its libraries to "/usr/local/lib". On Ubuntu, it can be installed with the package manager:

sudo apt install mpich

Install CUDA and the GPU driver.

It can be downloaded from https://developer.nvidia.com/cuda-downloads.

Compile.

Build makeDB and cluster with make:

cd cuda/makeDB && make && cd -
cd cuda/cluster && make && cd -

Usage

Make a database.

cd cuda/makeDB && ./makeDB -f ../../data/gene.fasta -p ../../data/gene.packed -t 0 && cd -

-f: fasta file (gene or protein sequences)
-p: packed file (generated by makeDB)
-t: data type (0-gene 1-protein)
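When preparing several datasets from a script, the makeDB flags above can be assembled programmatically. The helper below is hypothetical (`makedb_argv` is not part of nGIA); it only builds the same command line shown above and maps a readable data type name onto the documented `-t` values.

```python
from typing import List

# Values of makeDB's -t flag as documented above: 0 = gene, 1 = protein.
DATA_TYPES = {"gene": 0, "protein": 1}


def makedb_argv(fasta: str, packed: str, data_type: str = "gene") -> List[str]:
    """Build the makeDB argument list (meant to be run from inside cuda/makeDB)."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"data_type must be one of {sorted(DATA_TYPES)}")
    return ["./makeDB", "-f", fasta, "-p", packed, "-t", str(DATA_TYPES[data_type])]
```

A list such as `makedb_argv("../../data/gene.fasta", "../../data/gene.packed")` can then be passed to `subprocess.run(..., cwd="cuda/makeDB", check=True)`; passing a list rather than a shell string avoids quoting problems with unusual file names.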

Run clustering.

cd cuda/cluster && mpirun -n 1 ./cluster -p ../../data/gene.packed -r ../../data/result.txt -s 0.95 -m 0 && cd -

-p: packed file (generated by makeDB)
-r: result file (generated by cluster)
-s: similarity threshold (e.g. 0.95)
-m: mode (0-fast 1-precise)
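In the same spirit, a hypothetical helper (`cluster_argv` is not part of nGIA) can build the MPI launch line for the clustering step. The accepted range of `-s` is not documented here; restricting it to (0, 1] is an assumption, and since the example above only uses `mpirun -n 1`, the process count is left at 1 by default.

```python
from typing import List


def cluster_argv(packed: str, result: str, similarity: float,
                 precise: bool = False, nprocs: int = 1) -> List[str]:
    """Build the mpirun + cluster argument list (meant to be run from inside cuda/cluster)."""
    # Assumption: the similarity threshold is a fraction in (0, 1].
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must be in (0, 1]")
    mode = "1" if precise else "0"  # -m: 0 = fast, 1 = precise
    return ["mpirun", "-n", str(nprocs), "./cluster",
            "-p", packed, "-r", result, "-s", str(similarity), "-m", mode]
```

Called with the paths and threshold from the example above, it reproduces that exact command; `precise=True` switches to `-m 1`.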

Result

The generated result file will look like this:

>sequence1
ACGT
  >sequence2
  >sequence3

The first and second lines do not begin with spaces, so together they describe a non-redundant sequence: the first line starts with '>' and gives the sequence name, and the second line gives the sequence content. Lines 3 and 4 begin with spaces, which marks them as redundant sequences that are similar to sequence1. Redundant sequences are listed by name only, without their sequence content.
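A result file in this layout can be turned into a mapping from each non-redundant sequence to its redundant sequences. This parser is a sketch under assumptions beyond the example above: each non-redundant sequence's content fits on one unindented line, and every indented '>' line belongs to the most recent non-redundant sequence.

```python
from typing import Dict, List, Optional


def parse_result(text: str) -> Dict[str, List[str]]:
    """Map each non-redundant sequence name to the names of its redundant sequences."""
    clusters: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        stripped = line.lstrip()
        indented = stripped != line
        if not indented and line.startswith(">"):
            # Unindented name line: a new non-redundant sequence.
            current = line[1:]
            clusters[current] = []
        elif indented and stripped.startswith(">"):
            # Indented name line: a redundant sequence of the current one.
            if current is None:
                raise ValueError("redundant sequence listed before any non-redundant one")
            clusters[current].append(stripped[1:])
        # Unindented lines without '>' are sequence content and are skipped.
    return clusters
```

Applied to the example above, `parse_result(">sequence1\nACGT\n  >sequence2\n  >sequence3\n")` returns `{"sequence1": ["sequence2", "sequence3"]}`.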

Citing

Please cite the following publication if you use nGIA:

Ju Z, Zhang H, Meng J, et al. nGIA: A novel Greedy Incremental Alignment based algorithm for gene sequence clustering[J]. Future Generation Computer Systems, 2022, 136: 221-230.
