Laurent Jourdren edited this page Mar 2, 2016 · 4 revisions

This page presents the organization of the Eoulsan code, which can be useful for Eoulsan plugin developers. It does not describe the internal organization of Eoulsan, such as the workflow engine.

As in any Java project, the Eoulsan code is divided into packages. The Eoulsan packages can be grouped into several categories:

Common packages

  • fr.ens.biologie.genomique.eoulsan: the main Eoulsan package.

  • Main: The main class of Eoulsan; it stores the command line arguments

  • MainHadoop: Main class used when Eoulsan is launched with hadoop jar

  • MainCLI: Main class used in all other cases

  • EoulsanRuntime: Eoulsan runtime class that handles temporary directories and creates input and output streams

  • Globals: Main constants of Eoulsan

  • Settings: Global parameters and configuration of Eoulsan or of the workflow

  • EoulsanLogger: Logger class that stores log messages in the right log file

  • EoulsanError, EoulsanException and EoulsanRuntimeException: Eoulsan exception classes

  • fr.ens.biologie.genomique.eoulsan.util: Utility package

  • FileUtils: This class contains file utility methods

  • ProcessUtils: This class contains utility methods to launch external processes

  • StringsUtils: This class contains String utility methods

  • SystemUtils: This class contains utility methods to get information about the OS

  • Utils: This class contains utility methods to manage collections and check preconditions

  • Version: This class parses version strings

  • fr.ens.biologie.genomique.eoulsan.util.cloud: Utility package dedicated to cloud computing

  • fr.ens.biologie.genomique.eoulsan.util.docker: Utility package dedicated to Docker

  • fr.ens.biologie.genomique.eoulsan.util.hadoop: Utility package dedicated to Hadoop

  • fr.ens.biologie.genomique.eoulsan.util.locker: Utility package that contains lockers that prevent several read mapping tasks from running at the same time in Hadoop mode

  • fr.ens.biologie.genomique.eoulsan.util.r: Utility package dedicated to R
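The Version class of the util package is described only as parsing a version string. To show what such parsing typically involves, here is a minimal Java sketch that assumes a "major[.minor[.revision]][-suffix]" layout. SimpleVersion is a hypothetical class written for illustration. It is not the API of Eoulsan's Version class, whose accepted formats and methods are not described on this page.

```java
// Hypothetical class for illustration only; not Eoulsan's Version API.
// Assumes version strings of the form "major[.minor[.revision]][-suffix]".
final class SimpleVersion implements Comparable<SimpleVersion> {

  final int major;
  final int minor;
  final int revision;
  final String suffix; // e.g. "-beta3", or "" when there is no suffix

  SimpleVersion(int major, int minor, int revision, String suffix) {
    this.major = major;
    this.minor = minor;
    this.revision = revision;
    this.suffix = suffix;
  }

  // Parses strings such as "2", "1.2", "1.2.10" or "2.0-beta3".
  // Missing numeric components default to 0. Malformed numbers throw
  // NumberFormatException; more than three components throw IllegalArgumentException.
  static SimpleVersion parse(String s) {
    String trimmed = s.trim();
    int dash = trimmed.indexOf('-');
    String numbers = dash < 0 ? trimmed : trimmed.substring(0, dash);
    String suffix = dash < 0 ? "" : trimmed.substring(dash);

    String[] parts = numbers.split("\\.");
    if (parts.length > 3) {
      throw new IllegalArgumentException("Too many components in version: " + s);
    }
    int[] values = new int[3];
    for (int i = 0; i < parts.length; i++) {
      values[i] = Integer.parseInt(parts[i]);
    }
    return new SimpleVersion(values[0], values[1], values[2], suffix);
  }

  // Orders versions by their numeric components only; the suffix is ignored.
  @Override
  public int compareTo(SimpleVersion o) {
    int c = Integer.compare(major, o.major);
    if (c != 0) {
      return c;
    }
    c = Integer.compare(minor, o.minor);
    if (c != 0) {
      return c;
    }
    return Integer.compare(revision, o.revision);
  }

  @Override
  public String toString() {
    return major + "." + minor + "." + revision + suffix;
  }
}
```

The comparison is numeric on each component. With this ordering "1.2.10" sorts after "1.2.9", which a plain string comparison would get wrong.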

Bio packages

  • fr.ens.biologie.genomique.eoulsan.bio:
  • Alphabets: Define RNA and DNA alphabets
  • FastqFormat: Enum that handles all the FASTQ formats and the conversions between them
  • GenomeDescription: Stores genome information (chromosome names and sizes, checksum)
  • GenomicArray: Class used by the Java implementation of HTSeq-count
  • GenomicInterval: Genomic interval
  • GFFEntry: GFF3 entry
  • IlluminaReadId: Illumina FASTQ Id parser
  • ReadSequence: FASTQ entry
  • Sequence: FASTQ entry
  • fr.ens.biologie.genomique.eoulsan.bio.readsmappers: Classes to map FASTQ files using mappers (e.g. Bowtie, STAR...)
  • fr.ens.biologie.genomique.eoulsan.bio.io: Classes to read and write bio objects
  • fr.ens.biologie.genomique.eoulsan.bio.io.hadoop: Classes to read and write bio objects in Hadoop map/reduce tasks
  • fr.ens.biologie.genomique.eoulsan.bio.alignmentsfilters: Classes to filter SAM entries
  • fr.ens.biologie.genomique.eoulsan.bio.expressioncounters: Classes to count expression (e.g. HTSeq-count Java implementation)
  • fr.ens.biologie.genomique.eoulsan.bio.readsfilters: Classes to filter FASTQ entries
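FastqFormat is described as handling FASTQ formats and their conversions. A common example of such a conversion is re-encoding quality strings between ASCII offsets: Phred+64 (Illumina 1.3 to 1.7) and Phred+33 (Sanger, Illumina 1.8+). The following Java sketch shows that idea with a hypothetical QualityEncoding enum. It is not Eoulsan's FastqFormat API. Solexa scores use a non-linear scale and are deliberately left out.

```java
// Hypothetical enum for illustration only; not Eoulsan's FastqFormat.
enum QualityEncoding {
  SANGER(33),       // Phred+33 (Sanger, Illumina 1.8+)
  ILLUMINA_1_3(64); // Phred+64 (Illumina 1.3 to 1.7)

  final int offset;

  QualityEncoding(int offset) {
    this.offset = offset;
  }

  // Re-encodes each quality character: the Phred score is decoded with the
  // source offset, then encoded again with the target offset.
  static String convert(String qualities, QualityEncoding from, QualityEncoding to) {
    StringBuilder sb = new StringBuilder(qualities.length());
    for (int i = 0; i < qualities.length(); i++) {
      int phred = qualities.charAt(i) - from.offset;
      int encoded = phred + to.offset;
      // Reject negative scores and results beyond printable ASCII ('~').
      if (phred < 0 || encoded > '~') {
        throw new IllegalArgumentException(
            "Quality '" + qualities.charAt(i) + "' cannot be converted from " + from + " to " + to);
      }
      sb.append((char) encoded);
    }
    return sb.toString();
  }
}
```

For example, an Illumina 1.3 quality of 'h' (Phred 40) becomes 'I' in Sanger encoding.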

Module packages

Modules bundled in Eoulsan are gathered in the fr.ens.biologie.genomique.eoulsan.module.* packages:

  • fr.ens.biologie.genomique.eoulsan.module: This package contains core modules that are not related to a bioinformatics task (e.g. import, design, merger, splitter...)
  • fr.ens.biologie.genomique.eoulsan.module.diffana: Normalization and differential analysis module package
  • fr.ens.biologie.genomique.eoulsan.module.expression: Expression computation module package (e.g. HTSeq-count)
  • fr.ens.biologie.genomique.eoulsan.module.expression.local: Expression computation module package, local mode implementation
  • fr.ens.biologie.genomique.eoulsan.module.expression.hadoop: Expression computation module package, Hadoop mode implementation
  • fr.ens.biologie.genomique.eoulsan.module.fastqc: FastQC module package
  • fr.ens.biologie.genomique.eoulsan.module.generators: Generator module package
  • fr.ens.biologie.genomique.eoulsan.module.mapping: FASTQ mapping package
  • fr.ens.biologie.genomique.eoulsan.module.mapping.local: FASTQ mapping package, local mode implementation
  • fr.ens.biologie.genomique.eoulsan.module.mapping.hadoop: FASTQ mapping package, Hadoop mode implementation

The module classes communicate with the workflow engine using the classes of the fr.ens.biologie.genomique.eoulsan.core package:

  • FileNaming: Class that defines the names of the output files of the workflow
  • InputPort: Class that defines an input port of a module
  • InputPorts: Class that defines all the input ports of a module
  • InputPortsBuilder: Helper class that simplifies the creation of input ports
  • OutputPort: Class that defines an output port of a module
  • OutputPorts: Class that defines all the output ports of a module
  • OutputPortsBuilder: Helper class that simplifies the creation of output ports
  • Module: Interface that defines a module
  • Modules: Class that contains utility methods to print warnings or throw exceptions when a module parameter is deprecated, renamed...
  • Parameter: This class defines a module parameter
  • Step: This interface defines a step (= module + parameters + connected ports)
  • StepConfigurationContext: Step configuration interface used when configuring a module
  • TaskContext: Context created when executing a task
  • TaskStatus: Interface used by a module to communicate with the workflow when executing a task
  • TaskResult: Interface used to define the result of a task
  • Workflow: This interface contains information about the workflow like all the steps of the workflow
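The core classes listed above separate a module, which declares its input and output ports, from a step, which is a module plus its parameters and connected ports. The following Java sketch illustrates only that relationship, using simplified types invented for this example: SketchPort, SketchModule, SketchStep and ToyMapperModule. They are not the Eoulsan interfaces. The real methods of Module, InputPorts, OutputPorts or Step should be taken from the Eoulsan Javadoc. Port sources are written as plain strings of a made-up "step.port" form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the role of InputPort/OutputPort.
final class SketchPort {
  final String name;
  final String format; // e.g. "fastq", "sam"

  SketchPort(String name, String format) {
    this.name = name;
    this.format = format;
  }
}

// Hypothetical stand-in for the role of the Module interface: a module only
// declares what it consumes and produces.
interface SketchModule {
  String getName();

  List<SketchPort> getInputPorts();

  List<SketchPort> getOutputPorts();
}

// Hypothetical stand-in for the role of Step: module + parameters + connected ports.
final class SketchStep {
  final SketchModule module;
  final Map<String, String> parameters = new LinkedHashMap<>();
  final Map<String, String> links = new LinkedHashMap<>(); // input port -> "step.port"

  SketchStep(SketchModule module) {
    this.module = module;
  }

  SketchStep param(String key, String value) {
    parameters.put(key, value);
    return this;
  }

  // Connects one declared input port to an output of another step.
  SketchStep connect(String inputPort, String source) {
    boolean declared = module.getInputPorts().stream().anyMatch(p -> p.name.equals(inputPort));
    if (!declared) {
      throw new IllegalArgumentException("Unknown input port: " + inputPort);
    }
    links.put(inputPort, source);
    return this;
  }

  // Lists the declared input ports that are not connected yet.
  List<String> unconnectedInputs() {
    List<String> result = new ArrayList<>();
    for (SketchPort p : module.getInputPorts()) {
      if (!links.containsKey(p.name)) {
        result.add(p.name);
      }
    }
    return result;
  }
}

// Toy module with two inputs (reads, index) and one SAM output.
final class ToyMapperModule implements SketchModule {
  public String getName() {
    return "toymapper";
  }

  public List<SketchPort> getInputPorts() {
    return List.of(new SketchPort("reads", "fastq"), new SketchPort("index", "mapper_index"));
  }

  public List<SketchPort> getOutputPorts() {
    return Collections.singletonList(new SketchPort("alignments", "sam"));
  }
}
```

In this design the module never knows which step feeds it. Wiring stays in the step, which matches the page's description of a step as a module plus its parameters and connected ports.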

Module classes must not use the classes related to the internal implementation of the workflow engine, which are located in the following packages, because their API may change:

  • fr.ens.biologie.genomique.eoulsan.core.workflow: Internal classes of the workflow engine
  • fr.ens.biologie.genomique.eoulsan.core.schedulers: Eoulsan scheduler classes
  • fr.ens.biologie.genomique.eoulsan.core.schedulers.cluster: Job scheduler classes (e.g. SLURM, TORQUE)

Other useful packages

  • fr.ens.biologie.genomique.eoulsan.translators: This package contains classes that manage additional annotations (e.g. BioMart) and generate XLSX or ODS files with links to websites.
  • fr.ens.biologie.genomique.eoulsan.splitermergers: This package contains classes that split and merge files in some formats. It is very useful when Eoulsan runs in clusterexec mode.
  • fr.ens.biologie.genomique.eoulsan.requirements: This package defines module requirements (e.g. Docker image, executable or Rserve server)
  • fr.ens.biologie.genomique.eoulsan.actions: This package contains all the Eoulsan action classes (e.g. exec, clusterexec, createdesign...)
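The splitter/merger idea can be illustrated on FASTQ data. A large input is cut into chunks without splitting a record, the chunks can then be processed independently (presumably by separate jobs in clusterexec mode), and the chunks are concatenated back in order. The following Java sketch uses a hypothetical FastqChunker class that is not the API of the splitermergers package. It assumes plain 4-line FASTQ records with no wrapped sequence or quality lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the splitermergers API.
// Assumes plain 4-line FASTQ records (header, sequence, "+", qualities).
final class FastqChunker {

  private static final int LINES_PER_RECORD = 4;

  // Splits FASTQ lines into chunks of at most readsPerChunk records,
  // never cutting a record in two.
  static List<List<String>> split(List<String> lines, int readsPerChunk) {
    if (readsPerChunk <= 0) {
      throw new IllegalArgumentException("readsPerChunk must be positive");
    }
    if (lines.size() % LINES_PER_RECORD != 0) {
      throw new IllegalArgumentException("Truncated FASTQ input: " + lines.size() + " lines");
    }
    int linesPerChunk = readsPerChunk * LINES_PER_RECORD;
    List<List<String>> chunks = new ArrayList<>();
    for (int start = 0; start < lines.size(); start += linesPerChunk) {
      int end = Math.min(start + linesPerChunk, lines.size());
      chunks.add(new ArrayList<>(lines.subList(start, end)));
    }
    return chunks;
  }

  // Concatenates the chunks back, in order, into a single list of lines.
  static List<String> merge(List<List<String>> chunks) {
    List<String> merged = new ArrayList<>();
    for (List<String> chunk : chunks) {
      merged.addAll(chunk);
    }
    return merged;
  }
}
```

Splitting on record boundaries is what keeps the merge trivial. For line-based formats such as FASTQ or SAM without headers, merging is just ordered concatenation.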