# Downloading Using the Command Line - Test

This [Jupyter notebook](https://jupyter.org) introduces [Sarracenia version 3](https://metpx.github.io/sarracenia) usage from the command line. The examples were run on Linux; Windows and Mac should be similar, the main difference being the conventions for where preferences and logs are stored. This is probably the easiest way to work with Sarracenia: you configure a flow to download files into a directory, and then read that directory to process the files.


In [1]:
import sarracenia
!mkdir -p ~/.config/sr3/subscribe
!mkdir -p ~/.cache/sr3/log


## Prerequisites

The cell above imports sarracenia, which confirms that metpx-sr3 is available to the environment used by jupyter.
It also creates some directories that may be missing if the API is used without ever running the command line tool. The basic prerequisite is to have metpx-sr3 installed somehow, either as a .deb package, or using pip (or pip3) in the environment used by jupyter.

The rest of this notebook assumes [metpx-sr3](https://metpx.github.io/sarracenia) is installed.

## SR3

The command line interface is called [sr3](../Reference/sr3.1.rst) (short for Sarracenia version 3). Flows are defined
by configuration files in a simple format: one _keyword_ _value_ pair per line.
There are example configurations to get you started:

In [2]:
!sr3 list examples

Sample Configurations: (from: /net/local/home/shakerm/sr3/sarracenia/examples )
cpump/cno_trouble_f00.inc        flow/amserver.conf               
flow/opg.conf                    flow/poll.inc                    
flow/post.inc                    flow/report.inc                  
flow/sarra.inc                   flow/sender.inc                  
flow/shovel.inc                  flow/subscribe.inc               
flow/watch.inc                   flow/winnow.inc                  
poll/airnow.conf                 poll/aws-nexrad.conf             
poll/copernicus_odata.conf       poll/mail.conf                   
poll/nasa-mls-nrt.conf           poll/nasa_cmr_opendap.conf       
poll/nasa_cmr_other.conf         poll/nasa_cmr_podaac.conf        
poll/noaa.conf                   poll/soapshc.conf                
poll/usgs.conf                   post/WMO_mesh_post.conf          
sarra/wmo_mesh.conf              sender/am_send.conf              
sender/ec2collab.conf            sender/pitcher_p

There are different kinds of flows: the examples are classified by flow type (poll, post, sarra, sender, shovel, etc.)
A _subscribe_ is used by clients to download from a data pump. Let's pick one of those.

In [3]:
!sr3 add subscribe/hpfx_amis.conf

add: 2024-03-06 23:48:56,706 2118966 [INFO] sarracenia.sr add copying: /net/local/home/shakerm/sr3/sarracenia/examples/subscribe/hpfx_amis.conf to /net/local/home/shakerm/.config/sr3/subscribe/hpfx_amis.conf 



The files that are active for you are placed in ~/.config/sr3/\<flow_type>/config_name.  You can browse there 
and modify them with an editor if you like.  You can also do that with  _sr3 edit subscribe/hpfx_amis.conf_.

    # this is a feed of wmo bulletin (a set called AMIS in the old times)

    broker amqps://hpfx.collab.science.gc.ca/
    exchange xpublic

    # instances: number of downloading processes to run at once.  Defaults to 1. Not enough for this case
    instances 5
   
    # expire, in operational use, should be longer than longest expected interruption
    expire 10m

    topicPrefix v02.post
    subtopic *.WXO-DD.bulletins.alphanumeric.#
    mirror false
    directory /tmp/hpfx_amis/
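
To make the _keyword_ _value_ format concrete, here is a minimal sketch of reading such a file from Python. It is not Sarracenia's own parser (which handles much more than this); `parse_sr3_config` is a hypothetical helper written for this notebook, and it assumes that a line is either blank, a full-line comment, or a keyword followed by its value.

```python
def parse_sr3_config(text):
    """Return a list of (keyword, value) pairs from keyword/value text.

    Hypothetical helper for illustration only, not part of sarracenia.
    A list is used rather than a dict because a keyword may appear more than once.
    """
    options = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments. A '#' inside a value
        # (as in the subtopic wildcard) is deliberately left alone.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        options.append((keyword, value))
    return options


sample = """\
# this is a feed of wmo bulletin (a set called AMIS in the old times)
broker amqps://hpfx.collab.science.gc.ca/
exchange xpublic
instances 5
expire 10m
topicPrefix v02.post
subtopic *.WXO-DD.bulletins.alphanumeric.#
mirror false
directory /tmp/hpfx_amis/
"""

options = parse_sr3_config(sample)
for keyword, value in options:
    print(f"{keyword:12} {value}")
```

Note that the trailing `#` in the _subtopic_ value is a wildcard, not a comment, which is why the sketch only treats lines that start with `#` as comments.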

Add the _messageCountMax_ option so that the flow stops after ten messages instead of running forever:

In [4]:
!mkdir -p /tmp/hpfx_amis
!echo messageCountMax 10 >>~/.config/sr3/subscribe/hpfx_amis.conf

The root directory where files are to be placed needs to exist before you start.
The commands above are for a Linux machine; you might need something else on Mac or Windows.
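
For platforms where the shell commands above do not apply, the same two steps can be sketched in Python with the standard library only. `ensure_option` and `prepare` are hypothetical helpers, not part of Sarracenia, and the caller supplies the paths because the configuration location differs between platforms. Unlike the `echo >>` approach, re-running this does not append the option a second time.

```python
from pathlib import Path


def ensure_option(config_path, keyword, value):
    """Append 'keyword value' to config_path unless that keyword is already set.

    Hypothetical helper for illustration. Returns True if a line was appended.
    """
    config_path = Path(config_path)
    text = config_path.read_text() if config_path.exists() else ""
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == keyword:
            return False  # already present, leave the file untouched
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with config_path.open("a") as f:
        f.write(f"{prefix}{keyword} {value}\n")
    return True


def prepare(download_dir, config_path, max_messages=10):
    """Create the download directory and cap the number of messages processed."""
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    return ensure_option(config_path, "messageCountMax", max_messages)
```

On Linux, with the locations used in this notebook, the call would be `prepare("/tmp/hpfx_amis", Path.home() / ".config/sr3/subscribe/hpfx_amis.conf")`.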

You can then run a flow interactively with the _foreground_ action, and it will end quickly, like so:

In [5]:
!sr3 foreground subscribe/hpfx_amis.conf

2024-03-06 23:49:06,570 2118978 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
.2024-03-06 23:49:06,841 [INFO] 2118981 sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
2024-03-06 23:49:06,846 [INFO] 2118981 sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
2024-03-06 23:49:06,846 [INFO] 2118981 sarracenia.flow loadCallbacks flowCallback plugins to load: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'log']
2024-03-06 23:49:06,855 [INFO] 2118981 sarracenia.flowcb.log __init__ subscribe initialized with: logEvents: {'after_post', 'on_housekeeping', 'after_work', 'after_accept'},  logMessageDump: False
2024-03-06 23:49:06,855 [INFO] 2118981 sarracenia.flow run callbacks loaded: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarrac

As you can see, it downloaded five files to /tmp/hpfx_amis.
The _foreground_ action is intended to help with debugging, rather than real operations.
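
To check what arrived, the download directory can be read from Python. The following is a minimal sketch using only the standard library; `list_downloads` is a hypothetical helper, and it assumes the files sit directly in the directory, as they should with _mirror false_ in the configuration above.

```python
from datetime import datetime
from pathlib import Path


def list_downloads(directory):
    """Return (path, size_in_bytes, modified_time) for each regular file, oldest first."""
    entries = []
    for path in Path(directory).iterdir():
        if path.is_file():
            info = path.stat()
            entries.append((path, info.st_size, datetime.fromtimestamp(info.st_mtime)))
    return sorted(entries, key=lambda entry: entry[2])


download_dir = Path("/tmp/hpfx_amis")  # directory from the configuration above
if download_dir.is_dir():
    for path, size, modified in list_downloads(download_dir):
        print(f"{modified:%Y-%m-%d %H:%M:%S} {size:>8} {path.name}")
```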

In [6]:
!sr3 status

2024-03-06 23:49:30,243 2118998 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
status: 
Component/Config                         Processes   Connection        Lag                              Rates                                        
                                         State   Run Retry  msg data   Queued  LagMax LagAvg  Last  %rej     pubsub messages   RxData     TxData     
                                         -----   --- -----  --- ----   ------  ------ ------  ----  ----     ------ --------   ------     ------     
subscribe/hpfx_amis                      stop    0/0          -          -         -     -     -          -        -
      Total Running Configs:   0 ( Processes: 0 missing: 0 stray: 0 )
                     Memory: uss:0 Bytes rss:0 Bytes vms:0 Bytes 
                   CPU Time: User:0.00s System:0.00s 
	   Pub/Sub Received: 0 msgs/s (0 Bytes/s), Sent:  0 msgs/s (0 Bytes/s) Queued: 0 Retry: 0, Mean lag

Above, you can see there is one configuration in your list. You can have hundreds. The _Processes_ columns show the state of each configuration and how many of its instances are running. In the example above, _instances_ is set to 5, so one would expect to see 5 running instances while the flow is running. You can start specific configurations (in this case a subscribe config) with _sr3 start subscribe/\<config>_, or start all active configs from all components (sarra, subscribe, watch, winnow, etc.) with _sr3 start_.

In [7]:
!sr3 log subscribe/hpfx_amis.conf

2024-03-06 23:45:56,401 2118802 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
tail: cannot open '/net/local/home/shakerm/.cache/sr3/log/subscribe_hpfx_amis_01.log' for reading: No such file or directory
tail: no files remaining
2024-03-06 23:45:56,406 2118802 [CRITICAL] root run_command subprocess.run failed err=Command '['tail', '-f', '/net/local/home/shakerm/.cache/sr3/log/subscribe_hpfx_amis_01.log']' returned non-zero exit status 1.



When running in the background, output needs to go to a log file. Since we have only run this configuration in the foreground, asking to see the log prints an error about the log being missing. The error does tell you that the logs are in the _~/.cache/sr3/log_ directory. Logs can be monitored in real-time with traditional tools such as _tail -f_ or _grep_.
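
Where _tail_ is not available, the log directory can also be inspected from Python. This is a minimal sketch: `find_logs` and `last_lines` are hypothetical helpers, and the file name pattern (`<component>_<config>_<number>.log`) is only inferred from the error message above, so treat it as an assumption.

```python
from collections import deque
from pathlib import Path


def find_logs(log_dir, component, config):
    """Return the log files for one configuration, sorted by name."""
    # Pattern inferred from the error message above: subscribe_hpfx_amis_01.log
    return sorted(Path(log_dir).glob(f"{component}_{config}_*.log"))


def last_lines(path, count=10):
    """Return the last `count` lines of a text file without keeping it all in memory."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=count)]


log_dir = Path.home() / ".cache" / "sr3" / "log"  # Linux location
logs = find_logs(log_dir, "subscribe", "hpfx_amis")
if not logs:
    print("no logs yet: the configuration has not been run in the background")
for log in logs:
    print(f"== {log.name}")
    print("\n".join(last_lines(log, 5)))
```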

_sr3 stop_ does what you expect.

Processes can crash. In the _sr3 status_ output above, if the _Run_ column shows fewer running processes than the number of instances expected, then some instances have crashed. You can repair that (just start the missing instances) with:

_sr3 sanity_  -- start missing instances, also kill strays if any found.
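
Outside a notebook, where the `!` shell magic is not available, the same actions can be invoked from Python through the standard _subprocess_ module. This is only a sketch: `sr3_command` and `run_sr3` are hypothetical wrappers, they cover only the actions mentioned in this notebook, and they assume _sr3_ is on the PATH.

```python
import shutil
import subprocess

# Only the actions used in this notebook; sr3 itself may accept others.
KNOWN_ACTIONS = {"add", "edit", "foreground", "list", "log",
                 "sanity", "start", "status", "stop"}


def sr3_command(action, config=None):
    """Build the argument list for an sr3 invocation, e.g. ['sr3', 'status']."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"action not covered by this sketch: {action}")
    command = ["sr3", action]
    if config is not None:
        command.append(config)
    return command


def run_sr3(action, config=None):
    """Run sr3 and return the CompletedProcess, with stdout and stderr captured as text."""
    if shutil.which("sr3") is None:
        raise RuntimeError("sr3 not found on PATH; is metpx-sr3 installed?")
    return subprocess.run(sr3_command(action, config),
                          capture_output=True, text=True, check=False)


if shutil.which("sr3") is not None:
    print(run_sr3("status").stdout)
```

For example, `run_sr3("start", "subscribe/hpfx_amis.conf")` corresponds to _sr3 start subscribe/hpfx_amis.conf_.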

So that's it: an introduction to running configurations in Sarracenia from the command line.


## Conclusion

If all you want to do is obtain data from a data pump in real-time, the easiest way is to use the command line interface to control processes that run all the time and deposit files in a given directory.

It isn't very efficient though. When dealing with a large number of files and aiming for high-speed processing, it’s more efficient to have your own application receive notifications about file arrivals rather than scanning a directory. This approach reduces CPU and I/O overhead while improving processing speed.
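
To see where the overhead of directory scanning comes from, consider this minimal sketch of a polling consumer. `scan_for_new_files` is a hypothetical helper, not part of Sarracenia: every pass lists the whole directory, so the cost of each pass grows with the total number of files, not with the number of new ones.

```python
import os


def scan_for_new_files(directory, seen):
    """Return names of files not seen before, and record them in `seen`."""
    new_files = []
    with os.scandir(directory) as entries:
        for entry in entries:  # every pass visits every entry, old or new
            if entry.is_file() and entry.name not in seen:
                seen.add(entry.name)
                new_files.append(entry.name)
    return sorted(new_files)


seen = set()
download_dir = "/tmp/hpfx_amis"  # directory from the configuration above
if os.path.isdir(download_dir):
    # First pass: everything already downloaded counts as new.
    print(scan_for_new_files(download_dir, seen))
    # Later passes list the whole directory again, even if nothing arrived.
    print(scan_for_new_files(download_dir, seen))
```

A real consumer would call this in a loop with a sleep between passes, which also adds latency between a file's arrival and its processing.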

The easiest way to do that is to add some callbacks to your flows.  We'll cover that next.