Skip to content
Laurent Jourdren edited this page Mar 26, 2015 · 2 revisions

Writing a SAM filter

WARNING: This documentation is outdated and will soon be updated.

Eoulsan defines a ReadAlignmentsFilter interface that allows to filter SAM alignments.

Using ReadAlignmentsFilter

In the following example, we use the RemoveUnmappedReadAlignmentsFilter that removes SAM alignments not aligned against the genome.


SAMFileReader reader = new SAMFileReader(new File("input.sam"));

// creation of the filter and set its parameters
ReadAlignmentsFilter filter = new RemoveUnmappedReadAlignmentsFilter();
filter.setParameter("remove", "true");

// creation of the ReadAlignmentsFilterBuffer object
ReadAlignmentsFilterBuffer rafb = new ReadAlignmentsFilterBuffer(filter);

List<SAMRecord> records = new ArrayList<SAMRecord>();

// SAM file reading
for (SAMRecord samRecord : reader) {

  // the current alignment is not on the same read as the previous alignment
  if (!rafb.addAlignment(samRecord)) {

    // all the alignments that are on the previous read are filtered...
    records.clear();
    records.addAll(rafb.getFilteredAlignments()); // buffer emptied
  
    // ... and written
    for (SAMRecord r : records) {
      System.out.println(r.getSAMString());
    }
  
    // now the current alignment is added to the buffer
    rafb.addAlignment(samRecord);
  }

}

// treatment of the last alignment of the SAM file
records.clear();
records.addAll(rafb.getFilteredAlignments());

for (SAMRecord r : records) {
System.out.println(r.getSAMString());
}

reader.close();

For users, the treatment of single-end and paired-end reads is the same one, thanks to the addAlignment() function in the ReadAlignmentsFilterBuffer class.

How does the ReadAlignmentsFilterBuffer work

This object is a buffer of alignments from the same read.

There are two major methods in this class:

  • addAlignment() takes a SAMRecord (alignment) in argument and returns a boolean. It is true if the SAMRecord has the same read name as other alignments already stored or if the buffer is empty, and false in other cases.
  • getFilteredAlignments() takes no argument and returns the list of SAMRecord in the buffer that pass the filter.

Combining multiple ReadAlignmentsFilter

Several filters can be executed with nearly the same code as above, using the MultiReadAlignmentsFilter class:


SAMFileReader reader = new SAMFileReader(new File("input.sam"));
  
// creation of a filter list
List<ReadAlignmentsFilter> filterList = new ArrayList<ReadAlignmentsFilter>();
// filters are added to the list
filterList.add(new RemoveUnmappedReadAlignmentsFilter());
filterList.add(new RemoveMultiMatchesReadAlignmentsFilter());
// set filter parameters
filterList.get(0).setParameter("remove", "true");
filterList.get(1).setParameter("remove", "true");

// creation of the MultiReadAlignmentsFilter object
ReadAlignmentsFilter filter = new MultiReadAlignmentsFilter(filterList);

// creation of the ReadAlignmentsFilterBuffer object
ReadAlignmentsFilterBuffer rafb = new ReadAlignmentsFilterBuffer(filter);
List<SAMRecord> records = new ArrayList<SAMRecord>();

// SAM file reading
for (SAMRecord samRecord : reader) {

  // the current alignment is not on the same read as the previous alignment
  if (!rafb.addAlignment(samRecord)) {
    
    // all the alignments that are on the previous read are filtered...
    records.clear();
    records.addAll(rafb.getFilteredAlignments());
    
    // ... and written
    for (SAMRecord r : records) {
      System.out.println(r.getSAMString());
    }
    
    // now the current alignment is added to the buffer
    rafb.addAlignment(samRecord);
  }
}
  
// treatment of the last record
records.clear();
records.addAll(rafb.getFilteredAlignments());

for (SAMRecord r : records) {
  System.out.println(r.getSAMString());
}

reader.close();

Available ReadAlignmentsFilter implementations

The following ReadAlignmentsFilter implementations are available:

  • RemoveUnmappedReadAlignmentsFilter removes all unmapped alignments,
  • RemoveMultiMatchesReadAlignmentsFilter removes all multi-match alignments,
  • QualityReadAlignmentsFilter removes alignments with a bad mapping quality score,
  • KeepOneMatchReadAlignmentsFilter keeps, in case of a multi-match alignments, the first alignment of the read,
  • KeepNumberMatchReadAlignmentsFilter keeps, in case of a multi-match alignments, a given number of alignments of the read,
  • DistanceFromReferenceReadAlignmentsFilter removes alignments which have a distance too long from the reference sequence.

Writing a plug-in for the filtersam step of Eoulsan

It is very easy to write a plug-in for the filtersam step of Eoulsan. In this section we will write a MyQualityReadFilter class that filters on the mapping quality score.

  • First add getName() and getDescription() methods to your new filter:

package com.example;

public class MyQualityReadAlignmentsFilter extends AbstractReadAlignmentsFilter {

  @Override
  public String getName() {

    return "mymappingquality";
  }

  @Override
  public String getDescription() {
    return "My mapping quality threshold ReadAlignmentFilter";
  }

}

  • Then add setParameter method that allow to configure our filter:

private double qualityThreshold = -1.0;

@Override
  public void setParameter(final String key, final String value)
      throws EoulsanException {

    if (key == null || value == null)
      return;

    if ("threshold".equals(key.trim())) {

      try {
        this.qualityThreshold = Double.parseDouble(value.trim());
      } catch (NumberFormatException e) {
        return;
      }

      if (this.qualityThreshold < 0.0)
        throw new EoulsanException("Invalid qualityThreshold: "
            + qualityThreshold);
    } else

      throw new EoulsanException("Unknown parameter for "
          + getName() + " alignment filter: " + key);

  }
  • And an init() method to initialize the plug-in once all the parameters has been set. Here for our example, if no threshold has been set throw an exception.

  @Override
  public void init() {

    if (this.qualityThreshold < 0.0)
      throw new IllegalArgumentException("Mapping quality threshold is not set for "
          + getName() + " alignment filter.");
  }

  • Now we can add the filterReadAlignments() method:

@Override
  public boolean filterReadAlignments(final List<SAMRecord> records) {

  if (records == null)
    return;

  // single-end mode
  if (!records.get(0).getReadPairedFlag()) {
    for (SAMRecord r : records) {

      // storage in 'result' of records that do not pass the quality filter
      if (r.getMappingQuality() < this.qualityThreshold) {
        this.result.add(r);
      }
    }
  }

  // paired-end mode
  else {
    for (int counterRecord = 0; counterRecord < records.size() - 1; counterRecord +=
        2) {

      // storage in 'result' of records that do not pass the quality filter
      if (records.get(counterRecord).getMappingQuality() < this.qualityThreshold
          || records.get(counterRecord + 1).getMappingQuality() < 
          this.qualityThreshold) {

        // records are stored 2 by 2 because of the paired-end mode
        this.result.add(records.get(counterRecord));
        this.result.add(records.get(counterRecord + 1));
      }
    }
  }

  // all records that do not pass the mapping quality filter are removed
  records.removeAll(result);
  result.clear();

}

  • Now our ReadAlignmentsFilter can compile and can be used in a standalone program but not as a filtersam plug-in. To enable our ReadAlignmentsFilter as a plug-in we must register it by adding the full name of the class in the fr.ens.transcriptome.eoulsan.bio.alignmentsfilters.ReadAlignmentsFilter text file in the META-INF/services directory. See the Writing Step Plugin for more information.