C code to find a suitable linear order for a set of proteins

dalmolingroup/seriation

 
 

About this software

This software solves the seriation problem by finding a suitable linear order for a set of proteins. The result is a list of proteins ordered in one dimension such that functionally associated proteins are placed closer together.

Figure 1. Visual representation of the main output produced by this software. (A) Initial state of an adjacency matrix containing 4386 Saccharomyces cerevisiae proteins, with the x-axis randomly ordered. (B) Final state of the same adjacency matrix using the ordered protein list obtained. Each interaction between two proteins is represented by a black dot.
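To make the goal concrete, the following minimal sketch scores a linear order by summing, over all interactions, the distance between the positions of the two interacting proteins. This is only an illustration of what "closer" means. The `Edge` type and the `linear_order_cost` function are hypothetical names, and this toy cost is an assumption, not the energy function implemented in `ordering1D.c`.

```c
#include <stddef.h>

/* One undirected interaction between the proteins with indices a and b. */
typedef struct {
    size_t a;
    size_t b;
} Edge;

/*
 * Toy cost of a linear order: the sum, over all interactions, of the
 * distance between the positions of the two interacting proteins.
 * position[p] is the slot that protein p occupies in the order.
 * ASSUMPTION: illustrative cost only, not the energy used by ordering1D.
 */
long linear_order_cost(const Edge *edges, size_t n_edges, const size_t *position)
{
    long cost = 0;
    for (size_t k = 0; k < n_edges; k++) {
        size_t pa = position[edges[k].a];
        size_t pb = position[edges[k].b];
        cost += (long)(pa > pb ? pa - pb : pb - pa);
    }
    return cost;
}
```

For a chain of interactions 0-1, 1-2, 2-3, an order that follows the chain gives a lower cost than one that scatters the proteins, which matches the idea that a better order places interacting proteins next to each other.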

Authors

The software was developed by Felipe Kuentzer, in collaboration with Douglas G. Ávila, Alexandre Pereira, Gabriel Perrone, Samoel da Silva, Alexandre Amory, and Rita de Almeida.

The version provided here was modified by Clovis Ferreira dos Reis to improve the textual feedback and to fix bugs such as:

  • Duplication of identifiers on the ordering output.
  • Segmentation fault while reading an input file containing many nodes.

Download and compilation

Compilation requires GCC. To compile this software, run the following commands in a shell:

> wget https://github.com/arthurvinx/seriation/archive/master.zip
> unzip master.zip
> cd seriation-master/
> gcc ordering1D.c -o ordering1D -lm

How to use

To run the software, invoke this command in a shell:

> ./ordering1D f=[absolute path to association file]

Running the program without arguments prints the parameter list:

> ./ordering1D

An association file name is necessary! No default!

Parameters list:
        f=Association file
        i=Number of isothermal steps
        m=Number of Monte Carlo steps
        c=Cooling factor
        a=Alpha value
        p=Percentual energy for initial temperature
        s=Random seed

Default parameter values:

i=100
m=2000
c=0.5
a=1.0
p=0.0001
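As an illustration of the `key=value` argument style shown above, here is a minimal sketch of how such arguments could be parsed with the listed defaults. The `Options` struct and `parse_options` function are hypothetical and are not taken from `ordering1D.c`. The sketch also assumes that several parameters can be passed as separate arguments, and it leaves the seed unset when `s=` is absent, because no default seed is listed.

```c
#include <stdlib.h>

/* Hypothetical container for the parameters listed in this README. */
typedef struct {
    const char *file;    /* f= association file (required, no default) */
    long iso_steps;      /* i= number of isothermal steps */
    long mc_steps;       /* m= number of Monte Carlo steps */
    double cooling;      /* c= cooling factor */
    double alpha;        /* a= alpha value */
    double pct_energy;   /* p= percentual energy for initial temperature */
    unsigned long seed;  /* s= random seed (no default listed) */
    int has_seed;        /* 1 if s= was given */
} Options;

/* Returns 0 on success, -1 if an argument is malformed or f= is missing. */
int parse_options(int argc, char **argv, Options *opt)
{
    /* Defaults as documented in this README. */
    opt->file = NULL;
    opt->iso_steps = 100;
    opt->mc_steps = 2000;
    opt->cooling = 0.5;
    opt->alpha = 1.0;
    opt->pct_energy = 0.0001;
    opt->seed = 0;
    opt->has_seed = 0;

    for (int k = 1; k < argc; k++) {
        const char *arg = argv[k];
        if (arg[0] == '\0' || arg[1] != '=')
            return -1;               /* not of the form x=value */
        const char *value = arg + 2;
        switch (arg[0]) {
        case 'f': opt->file = value; break;
        case 'i': opt->iso_steps = strtol(value, NULL, 10); break;
        case 'm': opt->mc_steps = strtol(value, NULL, 10); break;
        case 'c': opt->cooling = strtod(value, NULL); break;
        case 'a': opt->alpha = strtod(value, NULL); break;
        case 'p': opt->pct_energy = strtod(value, NULL); break;
        case 's':
            opt->seed = strtoul(value, NULL, 10);
            opt->has_seed = 1;
            break;
        default:
            return -1;               /* unknown parameter letter */
        }
    }
    return opt->file != NULL ? 0 : -1;
}
```

Under these assumptions, an argument list such as `f=/path/to/associations.txt m=20000` would override only the number of Monte Carlo steps and keep the remaining defaults.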

Input

The input is a text file describing an undirected protein-protein interaction (PPI) network. This repository contains an example file from Escherichia coli. In this example, the nodes are labeled by ENSEMBL Peptide IDs.

Protein-protein interaction network data can be downloaded from STRING. You may choose to download the information with the subscores per channel and tune your filters. The input must be a file with two columns and no header, where each row contains the IDs of two proteins that interact with each other.
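Before running the ordering, it can help to check that a file follows the two-column format described above. The sketch below is one possible check, assuming the columns are separated by whitespace (the README does not state the delimiter) and that IDs are shorter than 256 characters. `parse_pair` and `count_pairs` are hypothetical helpers, not part of this repository.

```c
#include <stdio.h>

#define ID_MAX 256

/*
 * Splits one input line into two whitespace-separated protein IDs.
 * Returns 1 if the line holds exactly two fields, 0 otherwise
 * (for example a blank line, a single ID, or a leftover score column).
 */
int parse_pair(const char *line, char id_a[ID_MAX], char id_b[ID_MAX])
{
    char extra[2];
    int n = sscanf(line, "%255s %255s %1s", id_a, id_b, extra);
    return n == 2;
}

/*
 * Reads a stream line by line and returns the number of well-formed rows.
 * The number of rows that do not match the format is stored in *bad_rows.
 * ASSUMPTION: lines are shorter than 1024 characters.
 */
long count_pairs(FILE *in, long *bad_rows)
{
    char line[1024], a[ID_MAX], b[ID_MAX];
    long good = 0;

    *bad_rows = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_pair(line, a, b))
            good++;
        else
            (*bad_rows)++;
    }
    return good;
}
```

A non-zero `*bad_rows` after calling `count_pairs` on the opened association file suggests that a header line or extra columns (such as STRING scores) still need to be stripped.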

Outputs

Two text files will be saved in the directory of the association file: one with the prefix "energy_", detailing the ordering process, and one with the prefix "ordering_" (this will be your ordered list). The lower the final energy, the better the ordered list. I suggest increasing the number of Monte Carlo steps to 20000 to improve the outputs.

This repository contains an example of the output produced by this software for the Escherichia coli PPI network.

License

The source code is distributed under the terms of the GNU General Public License v3 (GPLv3).

How to cite this software

If you use this software in your research, please cite:

Similar software
