# Command Line Interface

This notebook introduces Sarracenia v3 usage from the command line (mostly on Linux,
but it should be similar on Windows and Mac; the main difference is the conventions for where
preferences and logs are stored). This is probably the easiest way to work with Sarracenia: you
configure a flow to download files into a directory, and you can then read the directory to
process the files there.


In [1]:
import sarracenia
!mkdir -p ~/.config/sr3/subscribe
!mkdir -p ~/.cache/sr3/log


# Prerequisites
The cell above is just a way to prepare the Jupyter environment on a server: it imports sarracenia
and creates some directories, in case people use API access without first running anything from the command line.
It is not certain that this works in every environment. The basic prerequisite is to have metpx-sr3 installed somehow,
either as a .deb package or using pip (or pip3), and available to the environment used by Jupyter.

The rest of this notebook assumes metpx-sr3 is installed.

# SR3

The command line interface is called _sr3_ (short for Sarracenia version 3). One defines
flows to run using configuration files in a simple _keyword_ _value_ format.
There are example configurations to get you started:

In [1]:
!sr3 list examples

Sample Configurations: (from: /usr/lib/python3/dist-packages/sarracenia/examples )
cpump/cno_trouble_f00.inc        poll/aws-nexrad.conf             
poll/pollingest.conf             poll/pollnoaa.conf               
poll/pollsoapshc.conf            poll/pollusgs.conf               
poll/pulse.conf                  post/WMO_mesh_post.conf          
sarra/wmo_mesh.conf              sender/ec2collab.conf            
sender/pitcher_push.conf         shovel/no_trouble_f00.inc        
subscribe/WMO_Sketch_2mqtt.conf  subscribe/WMO_Sketch_2v3.conf    
subscribe/WMO_mesh_CMC.conf      subscribe/WMO_mesh_Peer.conf     
subscribe/aws-nexrad.conf        subscribe/dd_2mqtt.conf          
subscribe/dd_all.conf            subscribe/dd_amis.conf           
subscribe/dd_aqhi.conf           subscribe/dd_cacn_bulletins.conf 
subscribe/dd_citypage.conf       subscribe/dd_cmml.conf           
subscribe/dd_gdps.conf           subscribe/dd_ping.conf           
subscribe/dd_radar.conf         

There are different kinds of flows: the examples are classified by flow type (poll, post, sarra, sender, shovel...).
A _subscribe_ flow is used by clients to download from a data pump. Let's pick one of those.

In [4]:
!sr3 add subscribe/hpfx_amis.conf

add: 2021-09-09 19:51:17,602 [INFO] sarracenia.sr add copying: /home/peter/Sarracenia/sr3/sarracenia/examples/subscribe/hpfx_amis.conf to /home/peter/.config/sr3/subscribe/hpfx_amis.conf 



The files that are active for you are placed in ~/.config/sr3/<flow_type>/<config_name>. You can browse there
and modify them with an editor if you like. You can also do that with _sr3 edit subscribe/hpfx_amis.conf_. The file looks like this:

    # this is a feed of wmo bulletin (a set called AMIS in the old times)

    broker amqps://hpfx.collab.science.gc.ca/
    exchange xpublic

    # instances: number of downloading processes to run at once.  defaults to 1. Not enough for this case
    instances 5
   
    # expire, in operational use, should be longer than longest expected interruption
    expire 10m

    topic_prefix v02.post
    subtopic *.WXO-DD.bulletins.alphanumeric.#
    mirror false
    directory /tmp/amis/
    accept .*
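
To make the _keyword_ _value_ format concrete, here is a minimal Python sketch that reads such a file into a dictionary. It is an illustration only and not Sarracenia's own parser: the function name `parse_keyword_values` is made up for this example, and it assumes that blank lines and lines starting with `#` are ignored and that an option may appear more than once.

```python
from collections import defaultdict


def parse_keyword_values(text):
    """Parse 'keyword value' lines into {keyword: [value, ...]}.

    Simplified illustration only: this is NOT Sarracenia's parser.
    """
    options = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        # The first token is the keyword; the rest of the line is its value.
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Keep every occurrence, since an option can be repeated.
        options[keyword].append(value)
    return dict(options)


sample = """
# this is a feed of wmo bulletin (a set called AMIS in the old times)
broker amqps://hpfx.collab.science.gc.ca/
exchange xpublic
instances 5
expire 10m
topic_prefix v02.post
subtopic *.WXO-DD.bulletins.alphanumeric.#
mirror false
directory /tmp/amis/
accept .*
"""

options = parse_keyword_values(sample)
print(options["broker"])
print(options["directory"])
```

Values are kept as lists of strings so that repeated options are not lost; interpreting them (for example, turning `10m` into a duration) is left out of this sketch.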

Add the _message_count_max_ option so that the flow doesn't run forever:

In [5]:
!mkdir -p /tmp/amis
!echo message_count_max 10 >>~/.config/sr3/subscribe/hpfx_amis.conf

The root directory where files are to be placed needs to exist before you start.
The above commands are for configuring on a Linux machine; you might need something else on a Mac or Windows.
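
On platforms without `mkdir` and shell redirection, the same two steps could be done from Python. The following is a minimal sketch: `prepare_flow` is a hypothetical helper written for this notebook, not part of Sarracenia, and the paths in the commented-out call are the Linux locations used above, which will differ on other systems.

```python
from pathlib import Path


def prepare_flow(config_file, download_dir, max_messages):
    """Create the download directory and append a message_count_max line.

    Hypothetical helper for illustration; not part of Sarracenia.
    """
    # exist_ok=True avoids an error if the directory is already there.
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    config_path = Path(config_file)
    # Append, like the shell's '>>', so existing settings are kept.
    with config_path.open("a") as f:
        f.write(f"message_count_max {max_messages}\n")
    return config_path


# Linux locations used in this notebook; adjust for other platforms.
# prepare_flow(Path.home() / ".config/sr3/subscribe/hpfx_amis.conf", "/tmp/amis", 10)
```

Like the shell version, calling this twice appends the option twice, so it is best run once per configuration.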

You can then run a flow interactively with the _foreground_ action, and it will end quickly, like so:

In [6]:
!sr3 foreground subscribe/hpfx_amis.conf

2021-09-09 19:55:35,695 [INFO] sarracenia.config fill_missing_options overriding batch for consistency with message_count_max: 10
2021-09-09 19:55:35,800 [INFO] sarracenia.config fill_missing_options overriding batch for consistency with message_count_max: 10
2021-09-09 19:55:35,801 [INFO] sarracenia.flow loadCallbacks plugins to load: ['sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.v2wrapper.V2Wrapper', 'sarracenia.flowcb.gather.message.Message']
2021-09-09 19:55:36,038 [INFO] sarracenia.moth.amqp __getSetup queue declared q_anonymous_subscribe.hpfx_amis.88291804.03896805 (as: amqps://anonymous@hpfx.collab.science.gc.ca/) 
2021-09-09 19:55:36,038 [INFO] sarracenia.moth.amqp __getSetup binding q_anonymous_subscribe.hpfx_amis.88291804.03896805 with v02.post.*.WXO-DD.bulletins.alphanumeric.# to xpublic (as: amqps://anonymous@hpfx.collab.science.gc.ca/)
2021-09-09 19:55:36,071 [INFO] sarracenia.flow run options:
_Config__admin=amqp://bunnymaster@localhost,
_Config__broker="...ps://an

As you can see, it downloaded five files to /tmp/amis.
The _foreground_ action is intended to help with debugging, rather than real operations.
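
The log output above shows the queue being bound with `v02.post.*.WXO-DD.bulletins.alphanumeric.#`, which is the _topic_prefix_ followed by the _subtopic_ from the configuration. In AMQP topic bindings, `*` stands for exactly one dot-separated word and `#` for zero or more words. The sketch below imitates that matching rule in plain Python to show which topics such a binding selects. It is not the broker's implementation, `topic_matches` is a name invented here, and the two sample topics are made up for illustration.

```python
def topic_matches(pattern, topic):
    """Return True if an AMQP-style topic pattern matches a topic.

    '*' matches exactly one word, '#' matches zero or more words.
    Illustration only; the broker does the real matching.
    """
    def match(p, t):
        if not p:
            # Pattern used up: match only if the topic is used up too.
            return not t
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words: try every split point.
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if not t:
            return False
        if head == "*" or head == t[0]:
            return match(rest, t[1:])
        return False

    return match(pattern.split("."), topic.split("."))


binding = "v02.post.*.WXO-DD.bulletins.alphanumeric.#"

# Hypothetical topics, invented for this example.
wanted = "v02.post.20210909.WXO-DD.bulletins.alphanumeric.20210909.SA.CWAO"
unwanted = "v02.post.20210909.WXO-DD.citypage_weather.xml"

print(topic_matches(binding, wanted))
print(topic_matches(binding, unwanted))
```

The recursive form is chosen for readability rather than speed; it is only meant to make the wildcard semantics explicit.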

In [3]:
!sr3 status

2021-09-09 16:17:07,033 [INFO] sarracenia.config fill_missing_options overriding batch for consistency with message_count_max: 10
status: 
Component/Config                         State        Run  Miss   Exp Retry
----------------                         -----        ---  ----   --- -----
subscribe/hpfx_amis                      stopped        0     0     0     0
      total running configs:   0 ( processes: 0 missing: 0 stray: 0 )


There is 1 configuration in your list. You can have hundreds. The columns on the right refer to how many instances you have for each configuration. In the example above, _instances_ is set to 5, so one would expect to see 5 running instances when the flow is running. You can start specific configurations with _sr3 start subscribe/*_ or start all active configurations with _sr3 start_.

!sr3 log subscribe/hpfx_amis.conf

    tail: cannot open '/home/peter/.cache/sr3/log/subscribe_hpfx_amis_01.log' for reading: No such file or directory
    tail: no files remaining
    2021-02-17 01:10:09,252 [ERROR] root run_command subprocess.run failed err=Command '['tail', '-f', '/home/peter/.cache/sr3/log/subscribe_hpfx_amis_01.log']' returned non-zero exit status 1.
    
When running in the background, output needs to go to a log file. As we have only run this configuration file in the foreground, asking to see the log prints an error about the log being missing. This tells you that the logs are in the _~/.cache/sr3/log_ directory. Logs can be monitored in real time with traditional tools such as _tail -f_ or _grep_.
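
Where _grep_ is not available, a few lines of Python can do a similar search. This is a minimal sketch: `find_log_lines` is a hypothetical helper, and the file name pattern `<component>_<config>_<instance>.log` is inferred from the single path in the error message above, so treat it as an assumption.

```python
from pathlib import Path


def find_log_lines(log_dir, component, config, marker="[ERROR]"):
    """Return (file name, line) pairs for log lines containing marker.

    Hypothetical helper; the file naming is an assumption based on
    the path subscribe_hpfx_amis_01.log shown in the error above.
    """
    hits = []
    # Assumed layout: one log file per instance.
    pattern = f"{component}_{config}_*.log"
    for log_file in sorted(Path(log_dir).glob(pattern)):
        with log_file.open(errors="replace") as f:
            for line in f:
                if marker in line:
                    hits.append((log_file.name, line.rstrip("\n")))
    return hits


# Linux log location used in this notebook; adjust for other platforms.
# for name, line in find_log_lines(Path.home() / ".cache/sr3/log", "subscribe", "hpfx_amis"):
#     print(name, line)
```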

_sr3 stop_ does what you expect.

Processes can crash. In the _sr3 status_ output above, if the number of processes in the Run column is less than the number in the Exp (for Expected) column, some instances have crashed. You can repair this (that is, start the missing instances) with:

_sr3 sanity_  -- start missing instances, also kill strays if any found.
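
The Run versus Exp comparison can also be automated. The sketch below parses text laid out like the _sr3 status_ table shown earlier (six whitespace-separated columns) and reports configurations with fewer running instances than expected. The column layout is taken from that one sample and may differ between versions; `configs_missing_instances` and the `subscribe/example_feed` row are invented for illustration.

```python
def configs_missing_instances(status_text):
    """Return {config: (run, expected)} for rows where Run < Exp.

    Assumes rows of: Component/Config State Run Miss Exp Retry,
    as in the sample output in this notebook.
    """
    short = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows have six fields and a name like 'subscribe/hpfx_amis'.
        if len(fields) != 6 or "/" not in fields[0]:
            continue
        name, _state, run, _miss, exp, _retry = fields
        # The header row has words instead of numbers; skip it.
        if not (run.isdigit() and exp.isdigit()):
            continue
        if int(run) < int(exp):
            short[name] = (int(run), int(exp))
    return short


sample_status = """\
status:
Component/Config                         State        Run  Miss   Exp Retry
----------------                         -----        ---  ----   --- -----
subscribe/hpfx_amis                      stopped        0     0     0     0
subscribe/example_feed                   running        3     2     5     0
      total running configs:   0 ( processes: 0 missing: 0 stray: 0 )
"""

# The example_feed row is hypothetical, added to show a mismatch.
print(configs_missing_instances(sample_status))
```

A non-empty result would be a reason to run _sr3 sanity_.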

So that's it, an introduction to running configurations in Sarracenia from the command line.


# Conclusion

If all you want to do is obtain data from a data pump in real time, the easiest way to go is to use the command line interface to control some processes that run all the time and dump files in a certain directory.

It isn't very efficient though. When you have large numbers of files to work with and you want high-speed processing, it is better, in terms of both lower CPU and I/O overhead and processing speed,
to have your own application informed of the arrival of files, rather than scanning a directory.

The easiest way to do that is to add some callbacks to your flows.  We'll cover that next.
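
For comparison, the directory-scanning approach described in this conclusion might look like the following minimal sketch. `scan_for_new_files` is a hypothetical function written for this notebook, and it assumes that file names are never reused. Each call lists the whole directory again, which is the overhead that callbacks avoid.

```python
from pathlib import Path


def scan_for_new_files(directory, already_seen):
    """Return files not seen in earlier scans and record them as seen.

    Hypothetical helper for illustration; assumes names are not reused.
    """
    new_files = []
    # Every scan walks the full listing, even when nothing has changed.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name not in already_seen:
            already_seen.add(path.name)
            new_files.append(path)
    return new_files


# Typical use: call repeatedly, sleeping between scans.
# seen = set()
# for path in scan_for_new_files("/tmp/amis", seen):
#     print("new file:", path)
```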