update documentation related to JSON serialisation of events including dataflow
ens-bwalts committed Sep 14, 2021
1 parent 496c752 commit 16de266
Showing 3 changed files with 38 additions and 2 deletions.
13 changes: 13 additions & 0 deletions docs/advanced_usage/json.rst
@@ -0,0 +1,13 @@
Serialising dataflow events with JSON
=====================================

eHive provides some facilities to support serialisation of events, such as dataflow, as JSON files or streams.

The :ref:`Runnable API <runnable_api_dataflows>` provides a method
``dataflow_output_ids_from_json($filename, $default_branch)`` to read a set of parameters (output IDs)
serialised as JSON from a flat file.

Additionally, eHive uses JSON serialisation to interface Runnables written in guest languages (such as Python)
with Workers. This is handled by, and documented in, ``Bio::EnsEMBL::Hive::GuestProcess``. This could serve
as an example for advanced users wishing to construct infrastructure to transmit events between eHive
and other systems.
2 changes: 1 addition & 1 deletion docs/creating_runnables/runnable_api.rst
@@ -137,4 +137,4 @@ to easily generate events. The method takes two arguments:
the default branch number.
#. The default branch number (defaults to 1).


Use of this is demonstrated in the Runnable :doxehive:`Bio::EnsEMBL::Hive::RunnableDB::SystemCmd`
25 changes: 24 additions & 1 deletion modules/Bio/EnsEMBL/Hive/RunnableDB/SystemCmd.pm
@@ -15,7 +15,19 @@
The command line must be stored in the parameters() as the value corresponding to the 'cmd' key.
Other parameters can be passed in, and the command line can make full use of the parameter substitution mechanism.
=head1 CONFIGURATION EXAMPLE
This Runnable also allows the creation of dataflow using JSON stored in an external file.
Each line of this file contains an optional branch number, followed by a complete JSON serialisation of the parameters (output_id)
appearing on the same single line. For example, a line to direct dataflow on branch 2 might look like:
2 {"parameter_name" : "parameter_value"}
If no branch number is provided, then dataflow of those parameters will occur on the branch number
passed to SystemCmd in the 'dataflow_branch' parameter, if given. Otherwise, it will default to
branch 1 (autoflow).
A sample file is provided at ${EHIVE_ROOT_DIR}/modules/Bio/EnsEMBL/Hive/Examples/SystemCmd/PipeConfig/sample_files/Inject_JSON_Dataflow_example.json
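The line format described above can be illustrated with a short parser. This is only a sketch in Python, not eHive's own implementation: the function name `parse_dataflow_lines` and the regular expression are hypothetical, and it assumes every output_id is a JSON object written on a single line.

```python
import json
import re

# Hypothetical pattern: an optional (possibly negative) branch number and
# whitespace, followed by a JSON object that fills the rest of the line.
LINE_PATTERN = re.compile(r"^\s*(?:(-?\d+)\s+)?(\{.*\})\s*$")


def parse_dataflow_lines(lines, default_branch=1):
    """Return a list of (branch, parameters) tuples, one per non-empty line.

    Lines without a leading branch number fall back to default_branch,
    mirroring the behaviour described for the 'dataflow_branch' parameter.
    """
    events = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"Unrecognised dataflow line: {line!r}")
        branch_text, json_text = match.groups()
        branch = int(branch_text) if branch_text is not None else default_branch
        events.append((branch, json.loads(json_text)))
    return events
```

For instance, `parse_dataflow_lines(['2 {"parameter_name" : "parameter_value"}'])` yields one event on branch 2, while a line holding only the JSON object is assigned to the default branch.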
=head1 CONFIGURATION EXAMPLES
# The following example shows how to configure SystemCmd in a PipeConfig module
# to create a MySQL snapshot of the Hive database before executing a critical operation.
@@ -31,6 +43,17 @@
},
},
# The following example shows how to configure SystemCmd in a PipeConfig module
# to generate dataflow events based on parameters stored as JSON in a file named "some_parameters.json"
{ -logic_name => 'inject_parameters_from_file',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
-parameters => {
'dataflow_file' => 'some_parameters.json',
'cmd' => 'sleep 0', # a command must be provided in the cmd parameter
},
},
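A file such as "some_parameters.json" could be produced by any tool that writes one event per line in the format described in this Runnable's documentation. The following Python sketch shows one possible way to do so. The helper names `format_dataflow_line` and `write_dataflow_file` are hypothetical and are not part of eHive.

```python
import json


def format_dataflow_line(params, branch=None):
    """Serialise one set of parameters as a single line.

    json.dumps without an indent argument emits no line breaks, so the
    whole output_id stays on one line as the format requires.
    """
    json_text = json.dumps(params)
    # Omit the branch number to let SystemCmd apply its default branch.
    return json_text if branch is None else f"{branch} {json_text}"


def write_dataflow_file(path, events):
    """Write (branch, params) pairs to path, one event per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for branch, params in events:
            handle.write(format_dataflow_line(params, branch) + "\n")
```

Calling `write_dataflow_file("some_parameters.json", [(2, {"parameter_name": "parameter_value"})])` would write a single line that starts with the branch number 2.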
=head1 LICENSE
Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute