biojava · sbliven · Jul 22, 2015 · Jul 10, 2014 · Jul 21, 2015 · Jul 21, 2015
diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@ A brief introduction into [BioJava](https://github.com/biojava/biojava).
 
 The goal of this tutorial is to provide an educational introduction into some of the features that are provided by BioJava. 
 
-At the moment this tutorial is still under development. Please check  the [BioJava Cookbook](http://biojava.org/wiki/BioJava:CookBook3.0) for a more comprehensive collection of many examples of what is possible with BioJava and how to do things.
+At the moment this tutorial is still under development. Please check  the [BioJava Cookbook](http://biojava.org/wiki/BioJava:CookBook3.0) for a more comprehensive collection of examples about what is possible with BioJava and how to do things.
 
 ## Index
 
@@ -16,10 +16,9 @@ Book 1: [The Core module](core/README.md), basic working with sequences.
 
 Book 2: [The Alignment module](alignment/README.md), pairwise and multiple alignments of protein sequences.
 
-Book 3: [The Protein Structure modules](structure/README.md), everything related to working with 3D structures.
-
-Book 4: [The Genomics Module](genomics/README.md), working with genomic data
+Book 3: [The Structure modules](structure/README.md), everything related to working with 3D structures.
 
+Book 4: [The Genomics Module](genomics/README.md), working with genomic data.
 
 ## License
 

diff --git a/alignment/README.md b/alignment/README.md
@@ -63,4 +63,4 @@ Navigation:
 
 Prev: [Book 1: The Core module](../core/README.md)
 
-Next: [Book 3: The Protein Structure modules](../structure/README.md)
+Next: [Book 3: The Structure modules](../structure/README.md)
diff --git a/bin/update_index.py b/bin/update_index.py
@@ -110,7 +110,7 @@ def makefooter(self):
             name = p.makename()
             # Get a path to p relative to our own path
             link = os.path.relpath(p.rootlink(),os.path.dirname(self.rootlink()))
-            linkmd.append("[{}]({})".format(name,link))
+            linkmd.append("[{0}]({1})".format(name,link))
             p = p.parent
         linkmd.reverse()
         lines.append("\n| ".join(linkmd))
@@ -123,13 +123,13 @@ def makefooter(self):
                 prev = self.parent.children[pos-1]
                 name = prev.makename()
                 link = os.path.relpath(prev.rootlink(),os.path.dirname(self.rootlink()))
-                lines.append("Prev: [{}]({})".format(name,link))
+                lines.append("Prev: [{0}]({1})".format(name,link))
                 lines.append("")
             if pos < len(self.parent.children)-1:
                 next = self.parent.children[pos+1]
                 name = next.makename()
                 link = os.path.relpath(next.rootlink(),os.path.dirname(self.rootlink()))
-                lines.append("Next: [{}]({})".format(name,link))
+                lines.append("Next: [{0}]({1})".format(name,link))
                 lines.append("")
 
         #lines.append(self.makename()+", "+self.link)
@@ -162,7 +162,7 @@ def __repr__(self):
 
     # Output tree
     def pr(node,indent=""):
-        print "{}{}".format(indent,node.link,node.rootlink())
+        print "{0}{1}".format(indent,node.link,node.rootlink())
         for n in node.children:
             pr(n,indent+"  ")
 

diff --git a/genomics/README.md b/genomics/README.md
@@ -64,4 +64,4 @@ Navigation:
 [Home](../README.md)
 | Book 4: The Genomics Module
 
-Prev: [Book 3: The Protein Structure modules](../structure/README.md)
+Prev: [Book 3: The Structure modules](../structure/README.md)
diff --git a/structure/README.md b/structure/README.md
@@ -1,7 +1,7 @@
-The Protein Structure Modules of BioJava
+The Structure Modules of BioJava
 =====================================================
 
-A tutorial for the protein structure modules of [BioJava](http://www.biojava.org)
+A tutorial for the structure modules of [BioJava](http://www.biojava.org)
 
 ## About
 <table>
@@ -32,35 +32,35 @@ Chapter 1 - Quick [Installation](installation.md)
 
 Chapter 2 - [First Steps](firststeps.md)
 
-Chapter 3 - The [data model](structure-data-model.md) for the representation of macromolecular structures.
+Chapter 3 - The [Structure Data Model](structure-data-model.md), for the representation of macromolecular structures
 
-Chapter 4 - [Local installations](caching.md) of PDB
+Chapter 4 - [Local Installations](caching.md) of PDB
 
 Chapter 5 - The [Chemical Component Dictionary](chemcomp.md)
 
-Chapter 6 - How to [work with mmCIF/PDBx files](mmcif.md)
+Chapter 6 - How to [Work with mmCIF/PDBx Files](mmcif.md)
 
-Chapter 7 - [SEQRES and ATOM records](seqres.md), mapping to Uniprot (SIFTs)
+Chapter 7 - [SEQRES and ATOM Records](seqres.md), mapping to Uniprot (SIFTs)
 
-Chapter 8 - Protein [Structure Alignments](alignment.md)
+Chapter 8 - [Structure Alignments](alignment.md)
 
 Chapter 9 - [Biological Assemblies](bioassembly.md)
 
 Chapter 10 - [External Databases](externaldb.md) like SCOP &amp; CATH
 
 Chapter 11 - [Accessible Surface Areas](asa.md)
 
-Chapter 12 - [Contacts within a chain and between chains](contact-map.md)
+Chapter 12 - [Contacts Within a Chain and between Chains](contact-map.md)
 
-Chapter 13 - Finding all interfaces in crystal: [crystal contacts](crystal-contacts.md)
+Chapter 13 - Finding all Interfaces in Crystal: [Crystal Contacts](crystal-contacts.md)
 
 Chapter 14 - Protein Symmetry
 
 Chapter 15 - Bonds
 
 Chapter 16 - [Special Cases](special.md)
 
-Chapter 17 - [Lists](lists.md) of PDB IDs and PDB [status information](lists.md).
+Chapter 17 - [Lists](lists.md) of PDB IDs and PDB [Status Information](lists.md)
 
 
 ### Author: 
@@ -88,7 +88,7 @@ The content of this tutorial is available under the [CC-BY](http://creativecommo
 
 Navigation:
 [Home](../README.md)
-| Book 3: The Protein Structure modules
+| Book 3: The Structure modules
 
 Prev: [Book 2: The Alignment module](../alignment/README.md)
 

diff --git a/structure/alignment-data-model.md b/structure/alignment-data-model.md
@@ -0,0 +1,229 @@
+Structure Alignment Data Model
+===
+
+## AFPChain Data Model
+
+The `AFPChain` data structure was designed to store pairwise structural
+alignments. The class functions as a bean and contains many variables
+used internally by the alignment algorithms implemented in BioJava.
+
+Some of the important stored variables are:
+* Algorithm Name
+* Optimal Alignment: described later.
+* Optimal RMSD: final and total RMSD value of the alignment.
+* TM-score
+* BlockRotationMatrix: rotation component of the superposition transformation.
+* BlockShiftVector: translation component of the superposition transformation.
+
+BioJava class: [org.biojava.nbio.structure.align.model.AFPChain](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/model/AFPChain.html)
+
+### The Optimal Alignment
+
+The residue equivalencies of the alignment (EQRs) are described in the optimal
+alignment variable, a three-dimensional array of integers indexed as follows:
+
+```java
+  int[][][] optAln = afpChain.getOptAln();
+  int residue = optAln[block][chain][eqr];
+```
+
+* **block**: the blocks divide the alignment into different parts. The 
+division can be due to non-topological rearrangements (e.g. circular 
+permutations) or due to flexible parts (e.g. domain switch). There can 
+be any number of blocks in a structural alignment, defined by the structure 
+alignment algorithm.
+* **chain**: in a pairwise alignment there are only two chains, or structures.
+* **eqr**: EQR stands for equivalent residue position, i.e. the alignment 
+position. There are as many positions (EQRs) in a block as the length of 
+the alignment block, and their number is equal for any of the two chains in 
+the same block.
+
+Each entry (a combination of the three indices described above) stores an
+integer that corresponds to the residue index in the specified chain, i.e.
+the index in the Atom array of the chain. Within a block, the stored
+integers (residues) are always in increasing order.
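The `optAln[block][chain][eqr]` layout described above can be sketched with a plain Java array. This is a minimal illustration and uses no BioJava classes: the class name `OptAlnSketch`, the helper `alignedPairs` and the residue indices are all made up for the example, and it assumes each block array is exactly as long as its number of EQRs.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-array sketch of the optAln[block][chain][eqr] layout (hypothetical, no BioJava). */
final class OptAlnSketch {

    /** Returns the aligned residue pairs as "i-j" strings, block by block. */
    static List<String> alignedPairs(int[][][] optAln) {
        List<String> pairs = new ArrayList<>();
        for (int block = 0; block < optAln.length; block++) {
            // Both chains hold the same number of EQRs within a block.
            int eqrs = optAln[block][0].length;
            for (int eqr = 0; eqr < eqrs; eqr++) {
                int res1 = optAln[block][0][eqr]; // index into the Atom array of chain 1
                int res2 = optAln[block][1][eqr]; // index into the Atom array of chain 2
                pairs.add(res1 + "-" + res2);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up two-block alignment resembling a circular permutation:
        // indices increase inside each block, but not across blocks.
        int[][][] optAln = {
            { {0, 1, 2}, {5, 6, 7} },
            { {5, 6},    {0, 1}    }
        };
        System.out.println(alignedPairs(optAln));
    }
}
```

With a real `AFPChain`, the array would come from `afpChain.getOptAln()` instead of the literal used here.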
+
+### Examples
+
+Some examples of how to get the basic properties of an `AFPChain`:
+
+```java
+  String algorithm = afpChain.getAlgorithmName();          // name of the algorithm that generated the alignment
+  int blockNum = afpChain.getBlockNum();                   // number of blocks
+  double tmScore = afpChain.getTMScore();                  // TM-score
+  double rmsd = afpChain.getTotalRmsdOpt();                // optimal RMSD
+  Matrix rotation = afpChain.getBlockRotationMatrix()[0];  // rotation matrix of the first block
+  Atom shift = afpChain.getBlockShiftVector()[0];          // translation vector of the first block
+```
+
+### Overview
+
+As an overview, the `AFPChain` data model:
+
+* Only supports **pairwise alignments**, i.e. two chains or structures aligned.
+* Can support **flexible alignments** and **non-topological alignments**.
+However, their combination (a flexible alignment with topological rearrangements)
+cannot be represented, because the blocks mean either one or the other.
+* Cannot support **non-sequential alignments**, because the residues are assumed
+to be sequential inside each block, so each EQR would require its own block.
+
+## MultipleAlignment Data Model
+
+Since BioJava 4.1.0, a new data model is available to store structure alignments.
+The `MultipleAlignment` data structure is a general model that supports any of the 
+following properties, and any combination:
+
+* **Multiple structures**: the model is no longer restricted to pairwise alignments.
+* **Non-topological alignments**: such as circular permutations or domain rearrangements.
+* **Flexible alignments**: parts of the alignment with different superposition 
+transformation.
+
+In addition, the data structure is not limited in the number and types of scores
+it can store, because the scores are stored in a key:value fashion, as
+described later.
+
+BioJava class: [org.biojava.nbio.structure.align.multiple.MultipleAlignment](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/MultipleAlignment.html)
+
+### Object Hierarchy
+
+The biggest difference with `AFPChain` is that the `MultipleAlignment` data 
+structure is object oriented.
+The hierarchy of sub-objects is represented below:
+
+<pre>
+MultipleAlignmentEnsemble
+   |
+   MultipleAlignment(s)
+        |
+        BlockSet(s)
+            |
+             Block(s)
+</pre>
+
+* **MultipleAlignmentEnsemble**: the ensemble is the top level of the hierarchy.
+As a top level, it stores information regarding creation properties (algorithm,
+version, creation time, etc.), the structures involved in the alignment (Atoms,
+structure identifiers, etc.) and cached variables (atomic distance matrices). 
+It contains a collection of `MultipleAlignment` that share the same properties 
+stored in the ensemble. This construction allows the storage of alternative 
+alignments inside the same data structure.
+
+* **MultipleAlignment**: the `MultipleAlignment` stores the core information of a 
+multiple structure alignment. It is designed to be the return type of the multiple
+structure alignment algorithms. The object contains a collection of `BlockSet` and 
+it is linked to its parent `MultipleAlignmentEnsemble`.
+
+* **BlockSet**: the `BlockSet` stores a flexible part of a multiple structure 
+alignment. A flexible part needs the residue equivalencies involved, contained in
+a collection of `Block`, and a transformation matrix for every structure that 
+describes the 3D superposition of all structures. It is linked to its parent
+`MultipleAlignment`.
+
+* **Block**: the `Block` stores the aligned positions (equivalent residues) of a 
+`BlockSet` that are in sequentially increasing order. Each `Block` represents a 
+sequential part of a non-topological alignment, if more than one `Block` is present.
+It is linked to its parent `BlockSet`.
+
+### The Optimal Alignment
+
+In the `MultipleAlignment` data structure the aligned residues are stored in a
+double List for every `Block`. The indices of the double List are the following:
+
+```java
+  List<List<Integer>> optAln = block.getAlnRes();
+  Integer residue = optAln.get(chain).get(eqr);
+```
+
+The indices have the same meaning as in the optimal alignment of the `AFPChain`.
+As a reminder:
+
+* **chain**: chain or structure index.
+* **eqr**: EQR stands for equivalent residue position, i.e. the alignment 
+position. There are as many positions (EQRs) in a block as the length of 
+the alignment block, and their number is equal for any of the chains in 
+the same block.
+
+As in `AFPChain`, each entry (combination of the two indices described above) 
+is an Integer that corresponds to the residue index in the specified chain, i.e.
+the index in the Atom array of the chain. Caution has to be taken in the code,
+because a `MultipleAlignment` can contain gaps, which are represented as `null`
+in the List entries.
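The `null` handling mentioned above can be sketched as follows. This is a hypothetical, BioJava-free example: `GapAwareSketch`, `countCoreColumns` and the residue indices are invented for illustration, and the nested list only stands in for the value that `block.getAlnRes()` is described as returning.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of gap-aware iteration over a chain-by-EQR list (hypothetical, no BioJava). */
final class GapAwareSketch {

    /** Counts the alignment columns in which every chain has a residue (no null entry). */
    static int countCoreColumns(List<List<Integer>> alnRes) {
        if (alnRes.isEmpty()) {
            return 0;
        }
        int columns = alnRes.get(0).size(); // every chain has the same number of EQRs
        int core = 0;
        for (int eqr = 0; eqr < columns; eqr++) {
            boolean gapFree = true;
            for (List<Integer> chain : alnRes) {
                if (chain.get(eqr) == null) { // a gap: never unbox without this check
                    gapFree = false;
                    break;
                }
            }
            if (gapFree) {
                core++;
            }
        }
        return core;
    }

    public static void main(String[] args) {
        // Made-up block with three chains and four columns; null marks a gap.
        List<List<Integer>> alnRes = Arrays.asList(
            Arrays.asList(0, 1, 2, 3),
            Arrays.asList(4, null, 5, 6),
            Arrays.asList(null, 1, 2, 3));
        System.out.println(countCoreColumns(alnRes));
    }
}
```

Reading an entry directly into an `int` would throw a `NullPointerException` at a gap, which is why the sketch compares against `null` before using the value.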
+
+### Alignment Scores
+
+All the objects in the hierarchy implement the `ScoresCache` interface.
+This interface allows the storage of any number of scores as key:value pairs.
+The key is a `String` that describes the score and is used to retrieve it later,
+and the value is a double with the calculated score. The interface has only
+two methods: `putScore` and `getScore`.
+
+The following lines of code show how to store and retrieve scores
+on a `MultipleAlignment`:
+
+```java
+  //Put a score into the alignment and get it back
+  alignment.putScore("myRMSD", 1.234);
+  double myRMSD = alignment.getScore("myRMSD");
+
+  BlockSet bs = alignment.getBlockSets().get(0);
+  //The same can be done for BlockSets
+  bs.putScore("bsRMSD", 1.234);
+  double bsRMSD = bs.getScore("bsRMSD");
+```
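The key:value storage behind `putScore` and `getScore` can be pictured as a map from score names to values. The class below is a minimal sketch of that idea, not BioJava's implementation: `SimpleScores` is a hypothetical name, and returning `null` for a score that was never stored is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Map-backed sketch of a key:value score store (hypothetical, not the BioJava class). */
final class SimpleScores {

    private final Map<String, Double> scores = new HashMap<>();

    /** Stores the score under the given name, replacing any previous value. */
    void putScore(String property, double score) {
        scores.put(property, score);
    }

    /** Returns the stored score, or null if nothing was stored under that name. */
    Double getScore(String property) {
        return scores.get(property);
    }

    public static void main(String[] args) {
        SimpleScores alignmentScores = new SimpleScores();
        alignmentScores.putScore("myRMSD", 1.234);
        System.out.println(alignmentScores.getScore("myRMSD"));
        System.out.println(alignmentScores.getScore("neverStored"));
    }
}
```

Because the keys are free-form strings, a new score type needs no change to the data structure, which is the flexibility the section describes.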
+
+### Manipulating Multiple Alignments
+
+Some classes are designed to contain utility methods for manipulating a `MultipleAlignment` object.
+The most important ones are enumerated and briefly described below:
+
+* [MultipleAlignmentScorer](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentScorer.html): contains frequent names for scores and methods to calculate them.
+
+* [MultipleAlignmentTools](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentTools.html): contains helper methods, for example to calculate the sequence alignment, to transform the atom arrays of the structures, or to calculate aligned residue distances between all structures.
+
+* [MultipleAlignmentWriter](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentWriter.html): contains methods to generate different types of String outputs of the alignment, e.g. FASTA, XML, FatCat.
+
+* [MultipleSuperimposer](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleSuperimposer.html): interface for implementations that calculate the structure superpositions of the alignment. Some examples of implementations are the ReferenceSuperimposer (superimposes all the structures to a reference) and the CoreSuperimposer (only uses EQRs present in all structures, without gaps, to superimpose them).
+
+* [MultipleAlignmentXMLParser](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/xml/MultipleAlignmentXMLParser.html): contains a method to create a `MultipleAlignment` object from an XML file representation.
+
+### Overview
+
+As an overview, the `MultipleAlignment` data model:
+
+* Supports any number of aligned structures (**multiple structures**).
+* Can support **flexible alignments** and **non-topological alignments**,
+and any of their combinations (e.g. a flexible alignment with topological
+rearrangements).
+* Cannot support **non-sequential alignments**, because the residues of each
+`Block` are required to be sequential, so each EQR would require its own
+`Block`.
+* Can store **any score** at any of the four object hierarchy levels, making it
+easy to adapt to new requirements and algorithms.
+
+For more examples and information about the `MultipleAlignment` data structure,
+see the demo package of the biojava-structure module or look through the interface
+files, where the javadoc explanations can be found.
+
+## Conversion between Data Models
+
+The conversion from an `AFPChain` to a `MultipleAlignment` is possible through the
+ensemble constructor. An example of how to do it programmatically is below:
+
+```java
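+  AFPChain afpChain;   // result of a pairwise alignment (assumed to be set elsewhere)
+  Atom[] chain1;       // atoms of the first structure (assumed to be set elsewhere)
+  Atom[] chain2;       // atoms of the second structure (assumed to be set elsewhere)
+  boolean flexible = false;
+  MultipleAlignmentEnsemble ensemble = new MultipleAlignmentEnsemble(afpChain, chain1, chain2, flexible);
+  MultipleAlignment converted = ensemble.getMultipleAlignments().get(0);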
+```
+
+There is no method to convert from a `MultipleAlignment` to an `AFPChain`, because
+the former supports any number of structures, whereas the latter supports
+only pairwise alignments. However, the conversion can be done with a few
+lines of code if needed (instantiate a new `AFPChain` and copy one by one the
+properties that can be represented from the `MultipleAlignment`).
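For the residue equivalencies alone, the manual copy mentioned above might look like the sketch below. It works on plain collections and is only an illustration under stated assumptions: `PairwiseToArraySketch` and `toOptAln` are hypothetical names, the input is assumed to hold exactly two chains per block, and columns containing a gap are skipped because an `int` array cannot hold `null`. Scores and transformations would have to be copied separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: two-chain blocks of nullable residue indices to an int[block][chain][eqr] array. */
final class PairwiseToArraySketch {

    static int[][][] toOptAln(List<List<List<Integer>>> blocks) {
        int[][][] optAln = new int[blocks.size()][2][];
        for (int b = 0; b < blocks.size(); b++) {
            List<Integer> chain1 = blocks.get(b).get(0);
            List<Integer> chain2 = blocks.get(b).get(1);
            // Keep only the columns where both chains have a residue.
            List<int[]> pairs = new ArrayList<>();
            for (int eqr = 0; eqr < chain1.size(); eqr++) {
                Integer res1 = chain1.get(eqr);
                Integer res2 = chain2.get(eqr);
                if (res1 != null && res2 != null) {
                    pairs.add(new int[] {res1, res2});
                }
            }
            optAln[b][0] = new int[pairs.size()];
            optAln[b][1] = new int[pairs.size()];
            for (int i = 0; i < pairs.size(); i++) {
                optAln[b][0][i] = pairs.get(i)[0];
                optAln[b][1][i] = pairs.get(i)[1];
            }
        }
        return optAln;
    }

    public static void main(String[] args) {
        // One made-up block with two chains; null marks a gap.
        List<List<List<Integer>>> blocks = Arrays.asList(
            Arrays.asList(
                Arrays.asList(0, 1, null, 3),
                Arrays.asList(5, null, 7, 8)));
        System.out.println(Arrays.deepToString(toOptAln(blocks)));
    }
}
```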
+
+===
+
+Go back to [Chapter 8 : Structure Alignments](alignment.md).