# Downloading Using the Command Line

This [Jupyter notebook](https://jupyter.org) introduces [Sarracenia version 3](https://metpx.github.io/sarracenia) usage from the command line. The examples are mostly from Linux, but usage should be similar on Windows and Mac; the main difference is the conventions for where preferences and logs are stored. This is probably the easiest way to work with Sarracenia: you configure a flow to download files into a directory, and you read the directory to process the files there.


In [1]:
import sarracenia
!mkdir -p ~/.config/sr3/subscribe
!mkdir -p ~/.cache/sr3/log


## Prerequisites

The cell above is just a way to get Jupyter notebooks working with metpx-sr3 on a server.
It creates the configuration and log directories up front, in case people use API access before anything else has created them. The basic prerequisite is to have metpx-sr3 installed somehow, either as a .deb package or using pip (or pip3), and available to the environment used by Jupyter.

The rest of this notebook assumes [metpx-sr3](https://metpx.github.io/sarracenia) is installed.

## SR3

The command line interface is called [sr3](../Reference/sr3.1.rst) (short for Sarracenia version 3). One defines
flows to run using configuration files in a simple _keyword_ _value_ format.
There are example configurations to get you started:

In [1]:
!sr3 list examples

Sample Configurations: (from: /home/peter/Sarracenia/sr3/sarracenia/examples )
cpump/cno_trouble_f00.inc        flow/amserver.conf               
flow/opg.conf                    flow/poll.inc                    
flow/post.inc                    flow/report.inc                  
flow/sarra.inc                   flow/sender.inc                  
flow/shovel.inc                  flow/subscribe.inc               
flow/watch.inc                   flow/winnow.inc                  
poll/airnow.conf                 poll/aws-nexrad.conf             
poll/copernicus_odata.conf       poll/mail.conf                   
poll/nasa-mls-nrt.conf           poll/nasa_cmr_opendap.conf       
poll/nasa_cmr_other.conf         poll/nasa_cmr_podaac.conf        
poll/noaa.conf                   poll/soapshc.conf                
poll/usgs.conf                   post/WMO_mesh_post.conf          
sarra/wmo_mesh.conf              sender/am_send.conf              
sender/ec2collab.conf            sen

There are different kinds of flows: the examples are classified by flow type (poll, post, sarra, sender, shovel...).
A _subscribe_ flow is used by clients to download from a data pump. Let's pick one of those.

In [2]:
!sr3 add subscribe/hpfx_amis.conf

add: 2024-01-12 15:47:53,081 127062 [INFO] sarracenia.sr add copying: /home/peter/Sarracenia/sr3/sarracenia/examples/subscribe/hpfx_amis.conf to /home/peter/.config/sr3/subscribe/hpfx_amis.conf 



The files that are active for you are placed in ~/.config/sr3/<flow_type>/config_name. You can browse there
and modify them with an editor if you like. You can also do that with _sr3 edit subscribe/hpfx_amis.conf_. The configuration looks like this:

    # this is a feed of wmo bulletin (a set called AMIS in the old times)

    broker amqps://hpfx.collab.science.gc.ca/
    exchange xpublic

    # instances: number of downloading processes to run at once.  defaults to 1. Not enough for this case
    instances 5
   
    # expire, in operational use, should be longer than longest expected interruption
    expire 10m

    topicPrefix v02.post
    subtopic *.WXO-DD.bulletins.alphanumeric.#
    mirror false
    directory /tmp/hpfx_amis/
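
To make the _keyword_ _value_ format concrete, here is a minimal Python sketch that reads such text into pairs. It is not sr3's own parser: the function name `parse_keyword_values` is hypothetical, and the sketch assumes one option per line, with `#` starting a comment only at the beginning of a line (so a trailing `#` in a subtopic stays part of the value).

```python
def parse_keyword_values(text):
    """Return (keyword, value) pairs from keyword/value configuration text."""
    settings = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        # The first word is the keyword, the remainder is the value.
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        settings.append((keyword, value))
    return settings


sample = """
# this is a feed of wmo bulletin (a set called AMIS in the old times)
broker amqps://hpfx.collab.science.gc.ca/
exchange xpublic
instances 5
expire 10m
subtopic *.WXO-DD.bulletins.alphanumeric.#
"""

for keyword, value in parse_keyword_values(sample):
    print(f"{keyword:10} {value}")
```

The pairs are kept in a list rather than a dictionary because some options may reasonably appear more than once in a file.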

The next cell adds _messageCountMax_ to the configuration, so that the flow doesn't run forever.

In [3]:
!mkdir /tmp/hpfx_amis
!echo messageCountMax 10 >>~/.config/sr3/subscribe/hpfx_amis.conf

The root directory where files are to be placed needs to exist before you start.
The commands above configure things on a Linux machine; you might need something else on a Mac or Windows.
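
For machines where `mkdir` and `echo >>` are not available, the same two steps can be sketched in Python, which is already present in the notebook environment. The helper name `prepare_flow` is hypothetical, and the paths in the commented example are the Linux ones used in this notebook; the locations on a Mac or Windows are an assumption you would need to check.

```python
from pathlib import Path


def prepare_flow(download_dir, config_file, option_line):
    """Create the download directory and append one option line to a config file."""
    download_dir = Path(download_dir)
    config_file = Path(config_file)
    # Like `mkdir -p`: create parents as needed, no error if it already exists.
    download_dir.mkdir(parents=True, exist_ok=True)
    existing = config_file.read_text() if config_file.exists() else ""
    # Unlike `echo >>`, skip the append when the exact line is already there,
    # so re-running the cell does not add duplicates.
    if option_line not in existing.splitlines():
        with config_file.open("a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(option_line + "\n")


# Example call with the Linux paths from this notebook (uncomment to apply):
# prepare_flow("/tmp/hpfx_amis",
#              Path.home() / ".config/sr3/subscribe/hpfx_amis.conf",
#              "messageCountMax 10")
```

The sketch expects the directory holding the configuration file to exist already, which is the case once _sr3 add_ has copied the example.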

You can then run a flow interactively with the _foreground_ action, and it will end quickly, like so:

In [4]:
!sr3 foreground subscribe/hpfx_amis.conf

2024-01-12 15:50:03,810 127223 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
.2024-01-12 15:50:04,207 [INFO] 127226 sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
2024-01-12 15:50:04,213 [INFO] 127226 sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
2024-01-12 15:50:04,213 [INFO] 127226 sarracenia.flow loadCallbacks flowCallback plugins to load: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'log']
2024-01-12 15:50:04,217 [INFO] 127226 sarracenia.flowcb.log __init__ subscribe initialized with: logEvents: {'after_work', 'after_accept', 'after_post', 'post', 'on_housekeeping'},  logMessageDump: False
2024-01-12 15:50:04,217 [INFO] 127226 sarracenia.flow run callbacks loaded: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarr




As you can see, it downloaded five files to /tmp/hpfx_amis.
The _foreground_ action is intended to help with debugging, rather than real operations.
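
Once a flow has run, processing the results is a matter of reading the download directory. A minimal sketch, assuming the `/tmp/hpfx_amis` directory configured above (the helper name `list_downloads` is hypothetical):

```python
from pathlib import Path


def list_downloads(directory):
    """Return the regular files under `directory`, oldest first."""
    root = Path(directory)
    if not root.is_dir():
        return []
    # rglob also finds files in sub-directories, in case mirroring is enabled.
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


for path in list_downloads("/tmp/hpfx_amis"):
    print(path.stat().st_size, path.name)
```

If the directory does not exist yet, the helper returns an empty list rather than raising an error.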

In [5]:
!sr3 status

2024-01-12 15:50:20,026 127310 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
status: 
Component/Config                         Processes   Connection        Lag                              Rates                                        
                                         State   Run Retry  msg data   Queued  LagMax LagAvg  Last  %rej     pubsub messages   RxData     TxData     
                                         -----   --- -----  --- ----   ------  ------ ------  ----  ----     ------ --------   ------     ------     
subscribe/hpfx_amis                      stop    0/0     0 100%  16%      0    4.37s    2.21s  8.6s  0.0% 149 Bytes/s   1 msgs/s 371 Bytes/s  0 Bytes/s
      Total Running Configs:   0 ( Processes: 0 missing: 0 stray: 0 )
                     Memory: uss:0 Bytes rss:0 Bytes vms:0 Bytes 
                   CPU Time: User:0.00s System:0.00s 
	   Pub/Sub Received: 1 msgs/s (149 Bytes/s), Sent:  0 ms

There is 1 configuration in your list. You can have hundreds. The _Run_ column under _Processes_ refers to how many instances you have for each configuration. In the example above _instances_ is set to 5, so one would expect to see 5 running instances while it is running. You can start specific configurations with _sr3 start subscribe/*_ or start all active instances with _sr3 start_.

In [6]:
!sr3 log subscribe/hpfx_amis.conf

2024-01-12 15:50:30,719 127315 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
tail: cannot open '/home/peter/.cache/sr3/log/subscribe_hpfx_amis_01.log' for reading: No such file or directory
tail: no files remaining
2024-01-12 15:50:30,722 127315 [CRITICAL] root run_command subprocess.run failed err=Command '['tail', '-f', '/home/peter/.cache/sr3/log/subscribe_hpfx_amis_01.log']' returned non-zero exit status 1.



When running in the background, output needs to go to a log file. As we have only run this configuration file in the foreground, asking to see the log prints an error about the log being missing. The error does tell you that the logs are in the _~/.cache/sr3/log_ directory. Logs can be monitored in real time with traditional tools such as _tail -f_ or _grep_.
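
Where _grep_ is not at hand, the same kind of filtering can be sketched in Python. This assumes log lines carry a bracketed level such as `[INFO]` or `[CRITICAL]`, as in the output shown in this notebook. The glob pattern is an assumption derived from the file name in the error message above, and `lines_with_level` is a hypothetical helper.

```python
from pathlib import Path


def lines_with_level(log_lines, levels=("ERROR", "CRITICAL")):
    """Return the lines that carry one of the given bracketed log levels."""
    tags = tuple(f"[{level}]" for level in levels)
    return [line.rstrip("\n") for line in log_lines
            if any(tag in line for tag in tags)]


# Linux location of the logs, as reported by the error message above.
log_dir = Path.home() / ".cache" / "sr3" / "log"
for log_file in sorted(log_dir.glob("subscribe_hpfx_amis_*.log")):
    with log_file.open(errors="replace") as f:
        for line in lines_with_level(f):
            print(f"{log_file.name}: {line}")
```

If no matching log file exists, as is the case here before the flow has run in the background, the loop simply prints nothing.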

_sr3 stop_ does what you expect.

Processes can crash. In the _sr3 status_ output above, the Run column shows running and expected processes (0/0 here); if fewer are running than expected, then some instances have crashed. You can repair that (just start the missing instances) with:

_sr3 sanity_  -- start missing instances, also kill strays if any found.

So that's it, an introduction to running configurations in Sarracenia from the command line.


## Conclusion

If all you want to do is obtain data from a data pump in real time, the easiest way to go is to use the command line interface to control some processes that run all the time and dump files in a certain directory.

It isn't very efficient though. When you have large numbers of files to work with and you want high-speed processing, it is better, in terms of CPU and I/O overhead as well as processing speed,
to have your own application informed of the arrival of files, rather than scanning a directory.
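
To show where the scanning overhead comes from, here is a minimal sketch of the directory-scanning approach. The helper name `scan_for_new_files` is hypothetical, and the directory is the one configured earlier in this notebook.

```python
from pathlib import Path


def scan_for_new_files(directory, seen):
    """Return files not yet in `seen`, and record them there."""
    new_files = []
    # Every call lists the whole directory, including files already processed.
    for path in sorted(Path(directory).glob("*")):
        if path.is_file() and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


seen = set()
for path in scan_for_new_files("/tmp/hpfx_amis", seen):
    print("new file:", path.name)
```

An application would call this repeatedly, and each call costs a full directory listing no matter how few files are new. Having the application informed of each arrival avoids that cost.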

The easiest way to do that is to add some callbacks to your flows.  We'll cover that next.