# flow API Example

The sarracenia.flow class provides built-in accept/reject filtering of messages, downloading over several protocols, retries on failure, and callbacks for customizing processing.

A configuration must be passed as an argument when instantiating a subscriber.
The function _sarracenia.config.no_file_config()_ returns an empty configuration without consulting
any of the sr3 configuration file tree.

Once the configuration has been modified as needed, the subscriber is instantiated and run.

In [1]:
!mkdir /tmp/flow_demo

Make a directory for the files you are going to download.
The root of the directory tree must exist.
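
The `mkdir` command fails if the directory already exists, which is inconvenient when re-running the notebook. A minimal alternative that can be run repeatedly, using only the Python standard library (the variable name `download_root` is chosen here for illustration):

```python
from pathlib import Path

# Root of the download tree; the same path is used in cfg.masks below.
download_root = Path("/tmp/flow_demo")

# parents=True creates any missing intermediate directories;
# exist_ok=True makes the call harmless if the directory is already there.
download_root.mkdir(parents=True, exist_ok=True)

print(download_root.is_dir())
```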

In [4]:
import re
import sarracenia.config
from sarracenia.flow.subscribe import Subscribe
import sarracenia.flowcb

from urllib.parse import urlparse

cfg = sarracenia.config.no_file_config()

cfg.broker = urlparse('amqps://anonymous:anonymous@hpfx.collab.science.gc.ca')
cfg.topicPrefix = [ 'v02', 'post']
cfg.program_name = 'subscribe'
cfg.bindings = [ ('xpublic', ['v02', 'post'], ['*', 'WXO-DD', 'observations', 'swob-ml', '#' ]) ]
cfg.queue_name='q_anonymous.subscriber_test2'
cfg.download=True
cfg.batch=1
cfg.message_count_max=5

# set the instance number for the flow class.
cfg.no=0

# set flow class to put working files in ~/.cache/sr3/subscribe/flow_demo directory.
cfg.pid_filename = sarracenia.config.get_pid_filename( None, cfg.program_name, 'flow_demo', 0)

# accept/reject patterns:
pattern=".*"
#              to_match, write_to_dir, DESTFN, regex_to_match, accept=True,mirror,strip, pstrip,flatten
cfg.masks= [ ( pattern, "/tmp/flow_demo", None, re.compile(pattern), True, False, False, False, '/' ) ]





## starters.
The broker, bindings, and queue_name settings are explained in the moth notebook.

## cfg.download

Whether the flow should download the files corresponding to the messages.
If True, the files are downloaded.

## cfg.batch

Messages are processed in batches. The number of messages retrieved per call to newMessages()
is limited by the _batch_ setting. It is set to 1 here so that you can see each file being downloaded as soon as the corresponding message is received. If left unset, it defaults to 25. The right value is a matter of taste and use case.

## cfg.message_count_max

Normally this setting is left at its default (0), which has no effect on processing.
For demonstration purposes, we use it to limit the number of messages the subscriber will process:
after _message_count_max_ messages have been received, processing stops.
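
The interplay of _batch_ and _message_count_max_ can be pictured with a small stand-alone simulation. This is not sarracenia's implementation: `fake_new_messages` and `run_flow` are hypothetical names, the queue is an in-memory list, and checking the limit after each batch is an assumption of the sketch. It only illustrates that each call returns at most _batch_ messages and that the loop ends once the limit is reached (0 meaning no limit).

```python
from collections import deque


def fake_new_messages(queue, batch):
    """Hypothetical stand-in for newMessages(): return at most `batch` messages."""
    out = []
    while queue and len(out) < batch:
        out.append(queue.popleft())
    return out


def run_flow(messages, batch=25, message_count_max=0):
    """Process messages batch by batch; stop once message_count_max is reached (0 = no limit)."""
    queue = deque(messages)
    processed = []
    while queue:
        processed.extend(fake_new_messages(queue, batch))
        # In this sketch the limit is checked after each batch.
        if message_count_max > 0 and len(processed) >= message_count_max:
            break
    return processed


incoming = [f"msg{i}" for i in range(20)]
print(len(run_flow(incoming, batch=1, message_count_max=5)))  # limit applies
print(len(run_flow(incoming)))                                # defaults: no limit
```

With _batch_ set to 1, as in this notebook, the count is checked after every message, so the run ends exactly at the limit.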


## cfg.masks
Masks are a compiled form of accept/reject directives. The relPath of a message is compared to the regex in the mask.
If the regex matches and accept is True, the message is accepted for further processing.
If the regex matches but accept is False, processing of the message stops (the message is rejected).

Each mask is a tuple. The meaning of the fields can be looked up in the sr3(1) man page.

*  pattern_string,      the input regular expression string, to be compiled by re routines.
*  directory,           where to put the downloaded files (root of the tree, when mirroring).
*  fn,                  transformation to apply to the file name. None is the 99% use case.
*  regex,               compiled regex version of the pattern_string.
*  accept(True/False),  if the pattern matches, accept the message for further processing.
*  mirror(True/False),  when downloading, build a complete tree to mirror the source, or just dump files in directory.
*  strip(True/False),   modify the relPath by stripping entries from the left.
*  pstrip(True/False),  strip entries based on a pattern.
*  flatten(char),       the flatten character; '/' means do not flatten.
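
To make the field order concrete, here is a minimal sketch of building mask tuples and checking a relPath against them. It is an illustration only: `make_mask` and `first_match` are hypothetical helpers, not part of sarracenia, and both the first-match-wins rule and the use of `re.match` are assumptions of the sketch. The `accept_unmatched` fallback mirrors the setting of the same name that appears in the configuration dump of the run.

```python
import re


def make_mask(pattern, directory, accept=True, mirror=False,
              strip=False, pstrip=False, flatten="/", fn=None):
    """Build a 9-field mask tuple in the order listed above (hypothetical helper)."""
    return (pattern, directory, fn, re.compile(pattern),
            accept, mirror, strip, pstrip, flatten)


def first_match(masks, rel_path, accept_unmatched=False):
    """Return (accepted, directory) for rel_path, assuming the first matching mask wins."""
    for mask in masks:
        _pattern, directory, _fn, regex, accept = mask[:5]
        if regex.match(rel_path):
            return accept, (directory if accept else None)
    # No mask matched: fall back to the accept_unmatched setting.
    return accept_unmatched, None


masks = [
    make_mask(r".*CWUS.*", "/tmp/flow_demo", accept=False),  # reject one station
    make_mask(r".*swob\.xml$", "/tmp/flow_demo"),            # accept other swob files
]

for path in (
    "observations/swob-ml/2021-11-01-0353-CWUS-AUTO-minute-swob.xml",
    "observations/swob-ml/2021-11-01-0353-CWWF-AUTO-minute-swob.xml",
    "observations/other/readme.txt",
):
    print(path, first_match(masks, path))
```

Ordering matters under this assumption: placing the reject mask first is what keeps the CWUS file from being caught by the broader accept pattern.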

## cfg.no, cfg.pid_filename

These settings are needed because they would ordinarily be set by the sarracenia.instance class, which is
normally used to launch flows. They are used to set up run-time paths for retry_queues and state files,
so that settings can be remembered between runs if need be.


In [5]:
subscriber = sarracenia.flow.subscribe.Subscribe( cfg )

subscriber.run()

2021-10-31 23:55:22,482 [INFO] sarracenia.flow loadCallbacks plugins to load: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry']
2021-10-31 23:55:22,597 [DEBUG] amqp _on_start Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@hpfx2.collab.science.gc.ca', 'copyright': 'Copyright (C) 2007-2019 Pivotal Software, Inc.', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP 21.3', 'product': 'RabbitMQ', 'version': '3.7.13'}, mechanisms: [b'AMQPLAIN', b'PLAIN'], locales: ['en_US']
2021-10-31 23:55:22,643 [DEBUG] amqp __init__ using channel_id: 1
2021-10-31 23:55:22,659 [DEBUG] amqp _on_open_ok Channel open
2021-10-31 23:55:22,720

_Config__admin=None, _Config__broker=amqps://anonymous@hpfx.collab.science.gc.ca, _Config__post_broker=None, accel_threshold=0,
accept_unmatched=False, attempts=3, auto_delete=False, baseDir=None, baseUrl_relPath=False, batch=1, bind=True,
bindings="...blic', ['v02', 'post'], ['*', 'WXO-DD', 'observations', 'swob-ml', '#'])]", bufsize=1048576, bytes_per_second=None, bytes_ps=0,
cfg_run_dir='.', chmod=0, chmod_dir=509, chmod_log=384, currentDir=None, debug=False, declare=True, declared_exchanges=[], declared_users={},
delete=False, destfn_script=None, directory=None, discard=False, documentRoot=None, download=True, durable=True, env_declared=[], exchange=None,
expire=300, fileEvents={'delete', 'modify', 'link', 'create'}, file_time_limit=5184000.0, filename='WHATFN', fixed_headers={}, flatten='/',
hostdir='fractal', hostname='fractal', housekeeping=30, imports=[], inflight=None, inline=False, inline_encoding='guess', inline_max=4096,
inline_only=False, instances=1, integrity_arbitrary_v

2021-10-31 23:55:23,084 [INFO] sarracenia.flow do_download downloaded ok: /tmp/flow_demo/2021-11-01-0353-CWWF-AUTO-minute-swob.xml
2021-10-31 23:55:23,229 [INFO] sarracenia.flow do_download downloaded ok: /tmp/flow_demo/2021-11-01-0352-CVRA-AUTO-minute-swob.xml
2021-10-31 23:55:23,374 [INFO] sarracenia.flow do_download downloaded ok: /tmp/flow_demo/2021-11-01-0353-CWUS-AUTO-minute-swob.xml
2021-10-31 23:55:23,399 [DEBUG] amqp collect Closed channel #1
2021-10-31 23:55:23,400 [INFO] sarracenia.flowcb.gather.message on_stop closing
2021-10-31 23:55:23,400 [INFO] sarracenia.flow close flow/close completed cleanly
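
To check the result independently of the log, the download directory can be listed from Python. A small sketch using only the standard library; `list_downloads` is a hypothetical helper name, not a sarracenia function:

```python
from pathlib import Path


def list_downloads(directory):
    """Return (name, size_in_bytes) for each regular file in directory, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


for name, size in list_downloads("/tmp/flow_demo"):
    print(f"{size:>8} {name}")
```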


# Conclusion:

The sarracenia.flow class supports an asynchronous mode of operation, and it can be customized with flowcb (flow callback) classes to introduce specific processing at specific times. It is just like invoking a single instance from the command line, except that all configuration is done within Python by setting cfg fields rather than using the configuration language.

What is lost vs. using the command line tool: 

* ability to use the configuration language (slightly simpler than assigning values to the cfg object),
* easy running of multiple instances,
* co-ordinated monitoring of the instances (restarts on failure, and a programmable number of subscribers started per configuration),
* log file management.

The command line tool provides those additional features.