# flow API Example

The [sarracenia.flow class](../Reference/code.rst#module-sarracenia.flow) provides built-in accept/reject filtering for messages, built-in downloading over several protocols, and retries on failure. It also lets you create callbacks to customize processing.

You need to provide a configuration as an argument when instantiating a subscriber.
The _sarracenia.config.no_file_config()_ function returns an empty configuration without consulting
any of the sr3 configuration file tree.

After making the needed modifications to the configuration, the subscriber is instantiated and run.

In [2]:
!mkdir /tmp/flow_demo

Make a directory for the files you are going to download.
The root of the directory tree must exist.
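
If you re-run the notebook, the shell `mkdir` above reports an error because the directory already exists. A minimal sketch of an idempotent alternative using only the Python standard library (the variable name `download_root` is an illustrative choice, not something sarracenia requires):

```python
from pathlib import Path

# Same directory as the shell command above.
# exist_ok=True makes re-running this cell harmless.
download_root = Path("/tmp/flow_demo")
download_root.mkdir(parents=True, exist_ok=True)

print(download_root.is_dir())
```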

In [3]:
import re
import sarracenia.config
from sarracenia.flow.subscribe import Subscribe
import sarracenia.flowcb
import sarracenia.credentials

cfg = sarracenia.config.no_file_config()

cfg.broker = sarracenia.credentials.Credential('amqps://anonymous:anonymous@hpfx.collab.science.gc.ca')
cfg.topicPrefix = [ 'v02', 'post']
cfg.component = 'subscribe'
cfg.config = 'flow_demo'
cfg.action = 'start'
cfg.bindings = [ ('xpublic', ['v02', 'post'], ['*', 'WXO-DD', 'observations', 'swob-ml', '#' ]) ]
cfg.queueName='q_anonymous.subscriber_test2'
cfg.download=True
cfg.batch=1
cfg.messageCountMax=5

# set the instance number for the flow class.
cfg.no=0

# set other settings based on provided ones, so it is ready for use.

cfg.finalize()

# accept/reject patterns:
pattern=".*"
#              to_match, write_to_dir, DESTFN, regex_to_match, accept=True,mirror,strip, pstrip,flatten
cfg.masks= [ ( pattern, "/tmp/flow_demo", None, re.compile(pattern), True, False, False, False, '/' ) ]





## Starters
The broker, bindings, and queueName settings are explained in the moth notebook.

## cfg.download

Whether the flow should download the files corresponding to the messages.
If True, it downloads them.

## cfg.batch

Messages are processed in batches. The number of messages to retrieve per call to newMessages()
is limited by the _batch_ setting. We set it to 1 here so you can see each file being downloaded immediately after the corresponding message is received. You can leave this setting unset, in which case it defaults to 25. The right value is a matter of taste and use case.

## cfg.messageCountMax

Normally we just leave this setting at its default (0), which has no effect on processing.
For demonstration purposes, we limit the number of messages the subscriber will process with this setting.
After _messageCountMax_ messages have been received, processing stops.
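
To make the interaction between _batch_ and _messageCountMax_ concrete, here is a toy loop written with the standard library only. It is a sketch of the idea, not sarracenia's actual implementation: the function `run_batches`, its parameters, and the choice to trim the last batch to the remaining count are all assumptions made for illustration.

```python
from itertools import islice


def run_batches(source, batch=25, message_count_max=0):
    """Toy model: pull up to `batch` items per pass and stop once
    `message_count_max` items have been processed (0 means no limit)."""
    it = iter(source)
    processed = []
    passes = 0
    while True:
        limit = batch
        if message_count_max > 0:
            # Assumption of this sketch: never take more than what is left.
            limit = min(batch, message_count_max - len(processed))
        chunk = list(islice(it, limit))
        if not chunk:
            break  # the source is exhausted
        passes += 1
        processed.extend(chunk)
        if message_count_max > 0 and len(processed) >= message_count_max:
            break  # the limit has been reached
    return processed, passes


# Mirrors the notebook settings: batch=1, messageCountMax=5.
processed, passes = run_batches(range(100), batch=1, message_count_max=5)
print(len(processed), passes)
```

With a batch of 1, each pass handles exactly one item, which is why the log further down shows every download right after its message.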


## cfg.masks
Masks are a compiled form of accept/reject directives. A relPath is compared to the regex in the mask.
If the regex matches, and accept is True, then the message is accepted for further processing.
If the regex matches, but accept is False, then processing of the message is stopped (the message is rejected).

Each mask is a tuple. The meaning of its fields can be looked up in the sr3(1) man page.

*  pattern_string,      the input regular expression string, to be compiled by re routines.
*  directory,           where to put the files downloaded (root of the tree, when mirroring)
*  fn,                  transformation of filename to do. None is the 99% use case.
*  regex,               compiled regex version of the pattern_string
*  accept(True/False),  if pattern matches then accept message for further processing.
*  mirror(True/False),  when downloading build a complete tree to mirror the source, or just dump in directory
*  strip(True/False),   modify the relpath by stripping entries from the left.
*  pstrip(True/False),  strip entries based on pattern
*  flatten(char),       '/' means do not flatten.
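
The accept/reject behaviour described above can be sketched with the standard library alone. This is not sarracenia's code: `make_mask` and `first_match` are hypothetical helpers, and the sketch assumes that masks are tried in order, that the first matching mask decides, and that a path matching no mask falls back to a caller-supplied default.

```python
import re


def make_mask(pattern, directory, accept=True):
    """Build a tuple with the same nine-field layout as listed above."""
    return (pattern, directory, None, re.compile(pattern), accept,
            False, False, False, '/')


def first_match(masks, rel_path, accept_unmatched=False):
    """Return (accepted, directory) for rel_path.

    Assumption: the first mask whose regex matches decides the outcome.
    """
    for mask in masks:
        _pattern, directory, _fn, regex, accept, *_rest = mask
        if regex.match(rel_path):
            return accept, (directory if accept else None)
    return accept_unmatched, None


masks = [
    make_mask(r".*/CVBB/.*", "/tmp/flow_demo", accept=False),  # reject one station
    make_mask(r".*swob-ml.*", "/tmp/flow_demo", accept=True),  # accept the rest
]

rejected = first_match(masks, "20240129/WXO-DD/observations/swob-ml/20240129/CVBB/x.xml")
accepted = first_match(masks, "20240129/WXO-DD/observations/swob-ml/20240129/CYYZ/x.xml")
unmatched = first_match(masks, "20240129/WXO-DD/other/x.txt")
print(rejected, accepted, unmatched)
```

The notebook's own mask uses the pattern `.*` with accept set to True, so under these assumptions every message is accepted and written beneath `/tmp/flow_demo`.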

## cfg.no, cfg.pid_filename

These settings are needed because they would ordinarily be set by the sarracenia.instance class, which is
normally used to launch flows. They allow run-time paths to be set up for retry queues and state files,
which remember settings between runs if need be.


In [4]:
subscriber = sarracenia.flow.subscribe.Subscribe( cfg )

subscriber.run()

2024-01-29 15:00:37,351 [INFO] sarracenia.flow loadCallbacks flowCallback plugins to load: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'log']
2024-01-29 15:00:37,354 [DEBUG] sarracenia.flowcb.retry __init__ sr_retry __init__
2024-01-29 15:00:37,354 [DEBUG] sarracenia.config add_option []0 retry_driver declared as type:<class 'str'> value:disk
2024-01-29 15:00:37,355 [DEBUG] sarracenia.diskqueue __init__  work_retry_00 __init__
2024-01-29 15:00:37,357 [DEBUG] sarracenia.config add_option []0 MemoryMax declared as type:<class 'int'> value:0
2024-01-29 15:00:37,357 [DEBUG] sarracenia.config add_option []0 MemoryBaseLineFile declared as type:<class 'int'> value:100
2024-01-29 15:00:37,358 [DEBUG] sarracenia.config add_option []0 MemoryMultiplier declared as type:<class 'float'> value:3
2024-01-29 15:00:37,359 [DEBUG] sarracenia.config add_option []0 logEvents declared as type:<class 'set'> value:{'after

2024-01-29 15:00:38,025 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 8201.25 ) https://hpfx.collab.science.gc.ca /20240129/WXO-DD/observations/swob-ml/20240129/CVBB/2024-01-29-1743-CVBB-AUTO-minute-swob.xml
2024-01-29 15:00:38,025 [INFO] sarracenia.flow do_download missing destination directories, makedirs: /tmp/flow_demo/20240129/WXO-DD/observations/swob-ml/20240129/CVBB 
2024-01-29 15:00:38,114 [INFO] sarracenia.flowcb.log after_work downloaded ok: /tmp/flow_demo/20240129/WXO-DD/observations/swob-ml/20240129/CVBB/2024-01-29-1743-CVBB-AUTO-minute-swob.xml 
2024-01-29 15:00:38,139 [DEBUG] sarracenia.moth.amqp getNewMessage new msg: {'_format': 'v02', '_deleteOnPost': {'source', '_format', 'exchange', 'subtopic', 'local_offset', 'ack_id'}, 'sundew_extension': 'DMS:WXO_RENAMED_SWOB2:MSC:XML::20240129174356', 'from_cluster': 'DDSR.CMC', 'to_clusters': 'ALL', 'filename': 'msg_ddsr-WXO-DD_8067f0a1a5b4711ab86e481341b26590:DMS:WXO_RENAMED_SWOB2:MSC:XML::20240129174356', 'source':

## Conclusion:

The sarracenia.flow class supports an asynchronous method of operation, and it can be customized with flowcb (flow callback) classes to introduce specific processing at specific times. It is just like invoking a single instance from the command line, except that all configuration is done within Python by setting cfg fields, rather than using the configuration language.

What is lost vs. using the command line tool: 

* ability to use the configuration language (slightly simpler than assigning values to the cfg object) 
* easy running of multiple instances
* co-ordinated monitoring of the instances (restarts on failure, and a programmable number of subscribers started per configuration)
* log file management.

The command line tool provides those additional features.