Merge branch 'version/2.5' into version/2.6
* version/2.5:
  update documentation related to JSON serialisation of events including dataflow
  Create example PipeConfig using SystemCmd to create dataflow from a JSON file
ens-bwalts committed Oct 5, 2021
2 parents e44f628 + 43f849a commit 1c3a4c1
Showing 6 changed files with 131 additions and 2 deletions.
13 changes: 13 additions & 0 deletions docs/advanced_usage/json.rst
@@ -0,0 +1,13 @@
Serialising dataflow events with JSON
=====================================

eHive provides some facilities to support serialisation of events, such as dataflow, as JSON files or streams.

The :ref:`Runnable API <runnable_api_dataflows>` provides a method
``dataflow_output_ids_from_json($filename, $default_branch)`` to read a set of parameters (output IDs)
serialised as JSON from a flat file.

Additionally, eHive uses JSON serialisation to interface Runnables written in guest languages (such as Python)
with Workers. This is handled by, and documented in, ``Bio::EnsEMBL::Hive::GuestProcess``. This could serve
as an example for advanced users wishing to construct infrastructure to transmit events between eHive
and other systems.
2 changes: 1 addition & 1 deletion docs/creating_runnables/runnable_api.rst
@@ -138,4 +138,4 @@ to easily generate events. The method takes two arguments:
the default branch number.
#. The default branch number (defaults to 1).


Use of this is demonstrated in the Runnable :doxehive:`Bio::EnsEMBL::Hive::RunnableDB::SystemCmd`
1 change: 1 addition & 0 deletions docs/index.rst
@@ -74,6 +74,7 @@ User documentation
   advanced_usage/mpi
   advanced_usage/slack
   advanced_usage/continuous_pipelines
   advanced_usage/json

.. toctree::
   :caption: External plugins
@@ -0,0 +1,89 @@
=pod

=head1 NAME

    Bio::EnsEMBL::Hive::Examples::SystemCmd::PipeConfig::InjectJSONDataflow_conf

=head1 SYNOPSIS

    init_pipeline.pl Bio::EnsEMBL::Hive::Examples::SystemCmd::PipeConfig::InjectJSONDataflow_conf --pipeline_url $HIVE_URL

    seed_pipeline.pl -url $HIVE_URL -logic_name perform_cmd -input_id "{'cmd' => 'sleep 0', 'dataflow_file' => './sample_files/Inject_JSON_Dataflow_example.json'}"

    runWorker.pl -url $HIVE_URL

=head1 DESCRIPTION

This is an example of using the SystemCmd runnable to create dataflow events using parameters read from a JSON file.
A sample file, Inject_JSON_Dataflow_example.json, is located in
${EHIVE_ROOT_DIR}/modules/Bio/EnsEMBL/Hive/Examples/SystemCmd/PipeConfig/sample_files/

Each line of this file contains an optional branch number, followed by a complete JSON serialisation of the parameters (output_id)
appearing on the same line. For example, a line to direct dataflow on branch 2 might look like:

    2 {"parameter_name" : "parameter_value"}

If no branch number is provided, then dataflow of those parameters will occur on the branch number
passed to SystemCmd in the 'dataflow_branch' parameter, if given. Otherwise, it will default to
branch 1 (autoflow).

Note that a command must be provided to SystemCmd using the 'cmd' parameter, even if JSON parameter injection
is the only desired behaviour.

=head1 LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Copyright [2016-2021] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.

=head1 CONTACT

Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates

=cut


package Bio::EnsEMBL::Hive::Examples::SystemCmd::PipeConfig::InjectJSONDataflow_conf;

use strict;
use warnings;
use base ('Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf');


sub pipeline_analyses {
    return [
        {   -logic_name => 'perform_cmd',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
            -flow_into  => {
                1 => ['autoflow_test'],
                2 => ['branch2_test'],
                3 => ['branch3_test'],
                4 => ['branch4_test'],
            },
        },
        {   -logic_name => 'autoflow_test',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::Dummy',
        },
        {   -logic_name => 'branch2_test',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::Dummy',
        },
        {   -logic_name => 'branch3_test',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::Dummy',
        },
        {   -logic_name => 'branch4_test',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::Dummy',
        },
    ];
}

1;

@@ -0,0 +1,3 @@
{"addressed_to_one" : "message for one"}
2 {"addressed_to_two" : "message for two"}
4 {"addressed_to_four" : "message for four" , "also_addressed_to_four" : "second message for four"}
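
To make the line format of the sample file concrete, here is a minimal sketch of a parser for it, written in Python purely for illustration. It is not eHive's implementation of ``dataflow_output_ids_from_json``. The helper name ``parse_dataflow_lines`` and the regular expression are assumptions, and the sketch only recognises non-negative branch numbers.

```python
import json
import re

# Hypothetical helper, not part of eHive. It mirrors the documented format:
# an optional branch number, then a JSON object, both on the same line.
LINE_RE = re.compile(r'^\s*(?:(\d+)\s+)?(\{.*\})\s*$')


def parse_dataflow_lines(lines, default_branch=1):
    """Return a list of (branch, output_id) pairs, one per non-empty line."""
    events = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("unrecognised dataflow line: %r" % line)
        branch_text, json_text = match.groups()
        # Fall back to the default branch when the line has no branch prefix.
        branch = int(branch_text) if branch_text is not None else default_branch
        events.append((branch, json.loads(json_text)))
    return events


# The three lines of the sample file shown above.
sample = [
    '{"addressed_to_one" : "message for one"}',
    '2 {"addressed_to_two" : "message for two"}',
    '4 {"addressed_to_four" : "message for four" , "also_addressed_to_four" : "second message for four"}',
]

events = parse_dataflow_lines(sample)
for branch, output_id in events:
    print(branch, output_id)
```

With the default branch left at 1, the first sample line is routed to branch 1 (autoflow), while the other two lines go to the branches they name. Nothing flows to branch 3, which the example PipeConfig wires up but the sample file never addresses.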
25 changes: 24 additions & 1 deletion modules/Bio/EnsEMBL/Hive/RunnableDB/SystemCmd.pm
@@ -15,7 +15,19 @@
The command line must be stored in the parameters() as the value corresponding to the 'cmd' key.
It allows other parameters to be passed in and the parameter substitution mechanism to be used in its full glory.
=head1 CONFIGURATION EXAMPLE
This Runnable also allows the creation of dataflow using JSON stored in an external file.
Each line of this file contains an optional branch number, followed by a complete JSON serialisation of the parameters (output_id)
appearing on the same single line. For example, a line to direct dataflow on branch 2 might look like:
2 {"parameter_name" : "parameter_value"}
If no branch number is provided, then dataflow of those parameters will occur on the branch number
passed to SystemCmd in the 'dataflow_branch' parameter, if given. Otherwise, it will default to
branch 1 (autoflow).
A sample file is provided at ${EHIVE_ROOT_DIR}/modules/Bio/EnsEMBL/Hive/Examples/SystemCmd/PipeConfig/sample_files/Inject_JSON_Dataflow_example.json
=head1 CONFIGURATION EXAMPLES
    # The following example shows how to configure SystemCmd in a PipeConfig module
    # to create a MySQL snapshot of the Hive database before executing a critical operation.
@@ -31,6 +43,17 @@
            },
        },

    # The following example shows how to configure SystemCmd in a PipeConfig module
    # to generate dataflow events based on parameters stored as JSON in a file named "some_parameters.json"
        {   -logic_name => 'inject_parameters_from_file',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
            -parameters => {
                'dataflow_file' => 'some_parameters.json',
                'cmd'           => 'sleep 0',   # a command must be provided in the cmd parameter
            },
        },
=head1 LICENSE
Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute