Merged
182 changes: 182 additions & 0 deletions docs/source/lib/EFI/SSN/XgmmlReader.pm.rst
@@ -0,0 +1,182 @@
XgmmlReader.pm
==============

Reference
---------


EFI::SSN::XgmmlReader
=====================



NAME
----

EFI::SSN::XgmmlReader - Perl utility module for extracting network
information from XGMML files



SYNOPSIS
--------

::

    use EFI::SSN::XgmmlReader;

    my $parser = EFI::SSN::XgmmlReader->new(xgmml_file => $ssnFile);
    $parser->parse();

    my $edgelist = $parser->getEdgeList();
    my $indexSeqIdMap = $parser->getIndexSeqIdMap();
    my $idIndexMap = $parser->getIdIndexMap();

    map { print join(" ", @$_), "\n"; } @$edgelist;
    map { print join("\t", $_, $indexSeqIdMap->{$_}), "\n"; } keys %$indexSeqIdMap;
    map { print join("\t", $_, $idIndexMap->{$_}), "\n"; } sort keys %$idIndexMap;



DESCRIPTION
-----------

**EFI::SSN::XgmmlReader** is a Perl module for parsing SSN files in
XGMML (XML) format. The data that is saved includes an edgelist, node
indices, node IDs, and sequence IDs. SSN nodes are assigned a numeric
index in the order in which they appear in the file. Each entry in the
edgelist is a pair of node indices. In addition to node indices, nodes
also have sequence IDs, which are defined by the ``label`` attribute of
an SSN ``node`` element. A node ID may or may not be the same as the
sequence ID: the EFI tools write SSN files in which the ``id`` and
``label`` attributes contain the same value, but XGMML tools such as
Cytoscape may not preserve that and may instead create their own node ID
(stored in the ``id`` attribute).
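
The relationship between node index, node ID, and sequence ID described
above can be sketched with plain Perl data structures. This is only an
illustration, not the module's implementation: the node records are
hypothetical, the index is assumed to start at zero, and the edges are
assumed to reference nodes by their ``id`` attribute.

```perl
use strict;
use warnings;

# Hypothetical node records in file order: [id attribute, label attribute].
# The second record mimics a node whose id was rewritten by another tool.
my @nodes = (
    ["A0A000", "A0A000"],
    ["1234",   "B1B111"],
    ["C2C222", "C2C222"],
);

my (%indexSeqIdMap, %idIndexMap);
my $index = 0;    # assumption: zero-based indices
foreach my $node (@nodes) {
    my ($id, $label) = @$node;
    $indexSeqIdMap{$index} = $label;    # node index -> sequence ID (label)
    $idIndexMap{$id} = $index;          # node ID (id) -> node index
    $index++;
}

# Hypothetical edges given as [source id, target id]; the edgelist stores
# the corresponding node indices instead.
my @edges = (["A0A000", "1234"], ["1234", "C2C222"]);
my @edgelist = map { [ $idIndexMap{$_->[0]}, $idIndexMap{$_->[1]} ] } @edges;

print join(" ", @$_), "\n" foreach @edgelist;
```

Keeping the sequence ID separate from the node ID means the edgelist stays
valid even when another tool has rewritten the ``id`` attribute.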



METHODS
-------



``new(xgmml_file => $ssnFile)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Creates a new **EFI::SSN::XgmmlReader** object.



Parameters
^^^^^^^^^^

``xgmml_file``
    Path to an SSN file in XGMML format (XML).



Returns
^^^^^^^

A new **EFI::SSN::XgmmlReader** object.



Example Usage
^^^^^^^^^^^^^

::

    my $parser = EFI::SSN::XgmmlReader->new(xgmml_file => $ssnFile);


``parse()``
~~~~~~~~~~~

Parses the XGMML file on a per-element basis. This method doesn't create
a DOM; rather it obtains information from each XML element as the file
is being parsed and builds an internal representation of an SSN as a
collection of arrays and hashes.



Example Usage
^^^^^^^^^^^^^

::

    $parser->parse();



``getEdgeList()``
~~~~~~~~~~~~~~~~~

Gets the edgelist, which is a list of edges where each edge is defined
as a pair of node indices.



Returns
^^^^^^^

An array ref with each element being a two-element array ref of the
source and target node indices.



Example Usage
^^^^^^^^^^^^^

::

    my $edgelist = $parser->getEdgeList();
    map { print join(" ", @$_), "\n"; } @$edgelist;
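
Because each edge is a pair of node indices, the edgelist can be combined
with the index-to-sequence-ID map for simple network summaries. The
following sketch counts the number of edges per node; the data is
hypothetical and only mimics the shape of the values returned by
``getEdgeList()`` and ``getIndexSeqIdMap()``.

```perl
use strict;
use warnings;

# Hypothetical data shaped like the getEdgeList() and getIndexSeqIdMap() results.
my $edgelist = [ [0, 1], [1, 2], [3, 4] ];
my $indexSeqIdMap = {
    0 => "A0A000", 1 => "B1B111", 2 => "C2C222", 3 => "D3D333", 4 => "E4E444",
};

# Count how many edges touch each node index.
my %degree;
foreach my $edge (@$edgelist) {
    my ($source, $target) = @$edge;
    $degree{$source}++;
    $degree{$target}++;
}

# Report by sequence ID, in node index order.
foreach my $index (sort { $a <=> $b } keys %degree) {
    print join("\t", $indexSeqIdMap->{$index}, $degree{$index}), "\n";
}
```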



``getIndexSeqIdMap()``
~~~~~~~~~~~~~~~~~~~~~~

Gets the structure that correlates node index to sequence ID.



Returns
^^^^^^^

A hash ref that maps node index to sequence ID (numeric -> string).



Example Usage
^^^^^^^^^^^^^

::

    my $indexSeqIdMap = $parser->getIndexSeqIdMap();
    map { print join("\t", $_, $indexSeqIdMap->{$_}), "\n"; } keys %$indexSeqIdMap;



``getIdIndexMap()``
~~~~~~~~~~~~~~~~~~~

Gets a mapping of node IDs (the ``id`` attribute in an SSN node) to node
index.



Returns
^^^^^^^

A hash ref mapping node ID (string) to node index (numeric).



Example Usage
^^^^^^^^^^^^^

::

    my $idIndexMap = $parser->getIdIndexMap();
    map { print join("\t", $_, $idIndexMap->{$_}), "\n"; } sort keys %$idIndexMap;
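
The two maps can be chained to resolve a node ID to its sequence ID, which
is useful when the ``id`` and ``label`` attributes differ. A minimal
sketch, using hypothetical data shaped like the ``getIdIndexMap()`` and
``getIndexSeqIdMap()`` results; ``nodeIdToSeqId`` is a helper invented for
this example and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical data shaped like the getIdIndexMap() and getIndexSeqIdMap() results.
my $idIndexMap = { "1234" => 1, "A0A000" => 0 };
my $indexSeqIdMap = { 0 => "A0A000", 1 => "B1B111" };

# Hypothetical helper: look up the sequence ID for a node ID.
# Returns undef if the node ID is unknown.
sub nodeIdToSeqId {
    my ($nodeId) = @_;
    my $index = $idIndexMap->{$nodeId};
    return defined $index ? $indexSeqIdMap->{$index} : undef;
}

foreach my $nodeId (sort keys %$idIndexMap) {
    print join("\t", $nodeId, nodeIdToSeqId($nodeId)), "\n";
}
```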
194 changes: 194 additions & 0 deletions docs/source/lib/EFI/SSN/XgmmlReader/IdList.pm.rst
@@ -0,0 +1,194 @@
IdList.pm
=========

Reference
---------


EFI::SSN::XgmmlReader::IdList
=============================



NAME
----

EFI::SSN::XgmmlReader::IdList - Perl utility module for extracting
network and metanode information from XGMML files



SYNOPSIS
--------

::

    use EFI::SSN::XgmmlReader::IdList;

    my $parser = EFI::SSN::XgmmlReader::IdList->new(xgmml_file => $ssnFile);
    $parser->parse();

    my $metanodeType = $parser->getMetanodeType();
    my $metanodeSizes = $parser->getMetanodeSizes();
    my $metanodeMap = $parser->getMetanodes();
    print "Network ID type: $metanodeType\n"; # uniprot, uniref90, uniref50, repnode
    if ($metanodeType ne "uniprot") {
        foreach my $metanode (sort keys %$metanodeMap) {
            # One row per member: metanode ID, metanode size, member sequence ID.
            # The size map is keyed by metanode ID, not by member ID.
            map {
                print join("\t", $metanode,
                                 $metanodeSizes->{$metanode},
                                 $_);
                print "\n";
            } @{ $metanodeMap->{$metanode} };
        }
    }



DESCRIPTION
-----------

**EFI::SSN::XgmmlReader::IdList** is a Perl module for parsing XGMML
(XML format) files. It extends **EFI::SSN::XgmmlReader** by additionally
parsing the information that identifies metanodes in the network;
metanodes are SSN nodes that represent multiple sequences. There are two
types: UniRef and RepNode metanodes. This module also retains
information that maps a metanode ID (sequence ID) to the sequence IDs
that the metanode contains. The metanode ID is correlated to the node
index. **EFI::Annotations** is used to get the list of SSN field names
that represent metanode ID data; these names determine which node
attributes are processed as metanode data. See **EFI::SSN::XgmmlReader**
for the methods for parsing and obtaining network information.



METHODS
-------



``getMetanodeType()``
~~~~~~~~~~~~~~~~~~~~~

Gets the type of the metanodes in the network.



Returns
^^^^^^^

One of ``uniprot``, ``uniref90``, ``uniref50``, or ``repnode``.



Example Usage
^^^^^^^^^^^^^

::

    my $metanodeType = $parser->getMetanodeType();
    print "Network ID type: $metanodeType\n"; # uniprot, uniref90, uniref50, repnode



``getMetanodeSizes()``
~~~~~~~~~~~~~~~~~~~~~~

Gets the sizes of the metanodes in the network.



Returns
^^^^^^^

A hash ref that maps metanode sequence ID to the number of sequences
contained in the metanode. If the network is a UniProt network then this
hash is empty.



Example Usage
^^^^^^^^^^^^^

::

    my $metanodeSizes = $parser->getMetanodeSizes();



``getMetanodes()``
~~~~~~~~~~~~~~~~~~

Gets metanodes from the network.



Returns
^^^^^^^

A hash ref that maps metanode sequence ID (the metanode is the XGMML
node in the SSN) to a list of sequence IDs that the metanode represents.
If the network is a UniProt network then this hash is empty.



Example Usage
^^^^^^^^^^^^^

::

    my $metanodeMap = $parser->getMetanodes();
    foreach my $metanode (sort keys %$metanodeMap) {
        map { print join("\t", $metanode, $_), "\n"; } @{ $metanodeMap->{$metanode} };
    }
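
A common follow-up is to invert the metanode map so that each member
sequence points back to its metanode, and to compare the member counts
with the recorded sizes. The sketch below uses hypothetical data shaped
like the ``getMetanodes()`` and ``getMetanodeSizes()`` results; whether a
metanode's own ID appears in its member list is an assumption made for
this example.

```perl
use strict;
use warnings;

# Hypothetical data shaped like the getMetanodes() and getMetanodeSizes()
# results for a UniRef-style network.
my $metanodeMap = {
    "A0A000" => ["A0A000", "A0A001", "A0A002"],
    "B1B111" => ["B1B111"],
};
my $metanodeSizes = { "A0A000" => 3, "B1B111" => 1 };

# Invert the map: member sequence ID -> metanode ID.
my %memberToMetanode;
foreach my $metanode (keys %$metanodeMap) {
    $memberToMetanode{$_} = $metanode foreach @{ $metanodeMap->{$metanode} };
}

# Collect metanodes whose recorded size differs from the member count.
my @mismatched;
foreach my $metanode (sort keys %$metanodeMap) {
    my $count = scalar @{ $metanodeMap->{$metanode} };
    push @mismatched, $metanode if $metanodeSizes->{$metanode} != $count;
}

print "$_ -> $memberToMetanode{$_}\n" foreach sort keys %memberToMetanode;
print "Size mismatches: ", scalar(@mismatched), "\n";
```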



``getMetadata()``
~~~~~~~~~~~~~~~~~

Gets the metadata (node attributes) that is saved during parsing
(currently the SwissProt description and, when one was included, the
protein sequence). This is primarily used in the
case that the network is UniProt; in that case the EFI database is not
queried to obtain metadata information. If the network is UniRef, then
the database is queried and the SwissProt information from the queries
is used instead of the saved node attribute.



Returns
^^^^^^^

A hash ref with keys being the sequence ID (metanode ID), with each
value being another hash ref with each saved node attribute. Currently
the ``swissprot`` and ``sequence`` hash ref keys are supported. Only
sequence IDs with attribute values are in the hash ref. The ``sequence``
key will only be present if a protein sequence was included; this is
used when unidentified sequences are included in the analysis.

::

    {
        "UNIPROT_ID" => {
            "swissprot" => "Description",
            "sequence" => "ABC"
        },
        "UNIPROT_ID2" => {},
        "UNIPROT_ID3" => {
            "swissprot" => "Description"
        }
    }



Example Usage
^^^^^^^^^^^^^

::

    my $metadata = $parser->getMetadata();
    foreach my $id (keys %$metadata) {
        foreach my $md (keys %{ $metadata->{$id} }) {
            print "$id\t$md\t$metadata->{$id}->{$md}\n";
        }
    }
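
Since the ``sequence`` key is only present for some IDs, code that needs
the protein sequence should test for the key rather than assume it
exists. A minimal sketch, using hypothetical data shaped like the
``getMetadata()`` return value:

```perl
use strict;
use warnings;

# Hypothetical data shaped like the getMetadata() return value.
my $metadata = {
    "UNIPROT_ID"  => { "swissprot" => "Description", "sequence" => "ABC" },
    "UNIPROT_ID3" => { "swissprot" => "Description" },
};

# Keep only the IDs for which a protein sequence was saved.
my @withSequence = grep { exists $metadata->{$_}->{sequence} } sort keys %$metadata;

# Print each such ID with the length of its sequence.
foreach my $id (@withSequence) {
    print join("\t", $id, length($metadata->{$id}->{sequence})), "\n";
}
```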