# Demo: selection file management based on json files

*Motivation:*
As discussed in RWT, selection files were originally targeted at humans, not at tools.
Yet when managing hundreds to thousands of selection files for replication management, a machine-readable solution is necessary.

*Proposal:*
* use json as a machine readable serialization format for replica selections
* use a json storage backend to easily query the various json entries
* provide a selection file <--> json translation
* use this as a platform to generate selection files on demand for use in synda
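As a rough illustration of the proposed serialization, a selection can be held as a mapping from facet name to a list of values. This is the same shape as the documents the database query in this notebook returns. It round-trips through json with the standard library alone. The facet values are taken from the NCI example further down. The helper names `to_json` and `from_json` are hypothetical.

```python
import json

# A selection as facet -> list of values, the same shape as the documents the
# database query returns; values taken from the NCI example in this notebook.
selection = {
    "project": ["CMIP6"],
    "variable": ["psl", "tas", "uas", "vas", "hus"],
    "frequency": ["6hr"],
    "ensemble": ["r1i1p1"],
}


def to_json(sel: dict) -> str:
    """Serialize a selection; sorted keys keep the files stable and diff-friendly."""
    return json.dumps(sel, indent=2, sort_keys=True)


def from_json(text: str) -> dict:
    """Parse a json string back into a selection dict."""
    return json.loads(text)


text = to_json(selection)
assert from_json(text) == selection  # the round trip is lossless
print(text)
```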

In [1]:
# add manage_rep (in ../util) to the module path so the notebook runs inside the cloned git repo
# later: install via pip
import sys
sys.path.append('../util')
from manage_rep import man_selections
from tinydb import TinyDB, Query

### Example: load selection files and generate json files

* Discussion points: 
   * a simple, generic synda selection file parsing routine is needed 
      * the one in synda is tightly coupled to synda and thus not usable for now
      * 80% of the synda selection file syntax is easily parsable with Python's configparser module (the approach taken here) 
      * either we restrict our replica selections to this 80% subset of synda syntax, or ...
      * some modifications to synda selection files would be helpful (will file these as GitHub issues on synda)
          * support for a configparser-style section header
          * json input support for synda
          * ...
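The configparser approach can be sketched as follows. This is a minimal, stdlib-only version that handles only the simple `facet = v1 v2 ...` subset. It is not the `man_selections` implementation. Instead of adding the section line to every file on disk, it prepends a header in memory. The section name `selection` is an arbitrary assumption.

```python
import configparser

# Hypothetical section name: configparser requires a section header,
# plain synda selection files do not have one.
SECTION = "selection"


def parse_selection(text: str) -> dict:
    """Parse the configparser-compatible subset of a synda selection file.

    Handles one 'facet = v1 v2 ...' entry per line and '#' comment lines;
    other synda syntax is out of scope for this sketch.
    """
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep facet names as written (default lower-cases them)
    # Prepend the header in memory instead of editing every file on disk
    parser.read_string(f"[{SECTION}]\n{text}")
    return {key: value.split() for key, value in parser[SECTION].items()}


sample = """# CMIP6 6-hourly example
project = CMIP6
variable = psl tas uas
frequency = 6hr
"""
print(parse_selection(sample))
# {'project': ['CMIP6'], 'variable': ['psl', 'tas', 'uas'], 'frequency': ['6hr']}
```

Disabling interpolation avoids surprises if a value ever contains `%`. Restricting delimiters to `=` keeps `:` usable inside values.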
      

In [2]:
# all selection files were extended by one line (the section header that configparser requires)
sel_path = '../selection_files/NCI/'
json_path = sel_path  
new_dict = man_selections.read_sel_files(sel_path)
man_selections.write_json(new_dict,json_path)


reading:  ../selection_files/NCI/CMIP6_Bulk_Priority1.txt
reading:  ../selection_files/NCI/CMIP6_Bulk_Priority2.txt
reading:  ../selection_files/NCI/CMIP6_Bulk_Priority3.txt
reading:  ../selection_files/NCI/CMIP6_Bulk_Priority4.txt
reading:  ../selection_files/NCI/CMIP6_Bulk_Priority5.txt
Writing:  ../selection_files/NCI/CMIP6_Bulk_Priority1.json
Writing:  ../selection_files/NCI/CMIP6_Bulk_Priority2.json
Writing:  ../selection_files/NCI/CMIP6_Bulk_Priority3.json
Writing:  ../selection_files/NCI/CMIP6_Bulk_Priority4.json
Writing:  ../selection_files/NCI/CMIP6_Bulk_Priority5.json
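The output suggests that each `.txt` selection file becomes one `.json` file with the same stem. A hypothetical stand-in for `man_selections.write_json` might look like the sketch below. It assumes, without confirmation from the source, that `new_dict` maps file stems to parsed selections.

```python
import json
import tempfile
from pathlib import Path


def write_json_files(selections: dict, out_dir: str) -> list:
    """Write each selection to <out_dir>/<name>.json and return the paths written.

    Hypothetical stand-in for man_selections.write_json; assumes `selections`
    maps a selection file stem (e.g. 'CMIP6_Bulk_Priority1') to its parsed dict.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, selection in selections.items():
        path = out / f"{name}.json"
        print("Writing: ", path)
        path.write_text(json.dumps(selection, indent=2, sort_keys=True))
        written.append(path)
    return written


# Usage in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    paths = write_json_files({"CMIP6_Bulk_Priority5": {"priority": ["5000"]}}, tmp)
    contents = json.loads(paths[0].read_text())
print(contents)  # {'priority': ['5000']}
```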


### Load json files in json database
* a simple lightweight database (TinyDB, stored here as a single json file) should easily scale to the 1000s of json files we expect
* if more complex use cases arise, we can easily switch to e.g. MongoDB or CouchDB
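Whatever backend is used, loading amounts to reading the json files back as one document per file. The sketch below is stdlib only and assumes the `*.json` naming from the output above. Because the notebook writes the json files next to the `.txt` selection files, the glob skips everything else. The resulting list could be handed to TinyDB's `insert_multiple`.

```python
import json
import tempfile
from pathlib import Path


def load_selection_docs(json_dir: str) -> list:
    """Read every *.json file in json_dir, in file-name order, as one document each."""
    return [json.loads(p.read_text()) for p in sorted(Path(json_dir).glob("*.json"))]


# Usage: the .txt selection files sitting in the same directory are ignored
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "CMIP6_Bulk_Priority2.json").write_text(json.dumps({"frequency": ["mon"]}))
    Path(tmp, "CMIP6_Bulk_Priority1.json").write_text(json.dumps({"frequency": ["6hr"]}))
    Path(tmp, "CMIP6_Bulk_Priority1.txt").write_text("project = CMIP6\n")
    loaded = load_selection_docs(tmp)
print(loaded)  # [{'frequency': ['6hr']}, {'frequency': ['mon']}]
```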

Discussion points: 
* it may be useful to add further key/value pairs to better support queries
(this additional information is not emitted as a selection parameter when synda selection files are generated)
* as an example, I added `repl_center` as an additional key to assign selection files to a specific replica center
  (see query below)
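A minimal sketch of the extra-key idea follows. It tags a document with `repl_center` and strips such bookkeeping keys again before a selection file is produced. Two points are assumptions, not the `manage_rep` code: the `EXTRA_KEYS` tuple, and deriving the center from the directory name.

```python
from pathlib import Path

# Keys that are bookkeeping for us, not synda selection parameters (assumption)
EXTRA_KEYS = ("repl_center",)


def with_repl_center(doc: dict, center: str) -> dict:
    """Return a copy of doc tagged with the replica center.

    The value is a one-element list like every other facet, so it is queried
    the same way (repl_center == ['NCI']); the input doc is left unchanged.
    """
    return {**doc, "repl_center": [center]}


def synda_facets(doc: dict) -> dict:
    """Drop the bookkeeping keys before generating a synda selection file."""
    return {k: v for k, v in doc.items() if k not in EXTRA_KEYS}


# Assumption: selection files are stored in one directory per replica center
center = Path("../selection_files/NCI/").name  # 'NCI'
doc = {"project": ["CMIP6"], "priority": ["5000"]}
tagged = with_repl_center(doc, center)
print(tagged)  # {'project': ['CMIP6'], 'priority': ['5000'], 'repl_center': ['NCI']}
assert synda_facets(tagged) == doc
```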

In [3]:
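db = TinyDB("./db.json")
# insert each parsed selection as one document
# (db.json persists between runs, so re-running this cell appends duplicates)
for v in new_dict.values():
    db.insert(v)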

### Query database and select parts needed

* it is simple to select parts and define new aggregations based on database queries
* selection files can then be generated for these new aggregations, or, *better*, synda could be modified to accept json files as input in addition to selection files
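To make the intended semantics explicit, here is a plain-Python equivalent over a list of documents. A query keeps a document only if *all* conditions match, a logical AND. An aggregation unions the facet values of several selections. This only sketches what the TinyDB query is meant to express. The function names and toy documents are illustrative.

```python
def select(docs: list, **conditions) -> list:
    """Return the documents whose facets equal ALL given conditions (logical AND)."""
    return [d for d in docs if all(d.get(k) == v for k, v in conditions.items())]


def aggregate(docs: list) -> dict:
    """Union the values of each facet across docs, keeping first-seen order.

    Caveat: if values within one facet are alternatives and facets are combined,
    the aggregate also matches combinations (e.g. tas at mon) that no input had.
    """
    merged = {}
    for d in docs:
        for key, values in d.items():
            bucket = merged.setdefault(key, [])
            for v in values:
                if v not in bucket:
                    bucket.append(v)
    return merged


# Toy documents (illustrative values only)
toy = [
    {"variable": ["tas"], "frequency": ["6hr"], "repl_center": ["NCI"]},
    {"variable": ["pr"], "frequency": ["mon"], "repl_center": ["NCI"]},
]
print(select(toy, repl_center=["NCI"], frequency=["6hr"]))
# [{'variable': ['tas'], 'frequency': ['6hr'], 'repl_center': ['NCI']}]
print(aggregate(toy))
# {'variable': ['tas', 'pr'], 'frequency': ['6hr', 'mon'], 'repl_center': ['NCI']}
```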

In [4]:
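test = Query()
# combine conditions with & and parenthesize each side; Python's `and` would
# return only the second query, so the repl_center condition would be ignored
docs = db.search((test.repl_center == ['NCI']) & (test.priority == ['5000']))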

In [5]:
for doc in docs:
    print(doc)
my_doc = docs[-1]  # keep the last match for selection file generation

{'project': ['CMIP6'], 'variable': ['psl', 'tas', 'uas', 'vas', 'hus'], 'frequency': ['6hr'], 'experiment': ['1pctCO2', 'piControl', 'historical', 'amip', 'abrupt4xCO2', 'ssp585', 'ssp245'], 'ensemble': ['r1i1p1'], 'priority': ['5000'], 'repl_center': ['NCI']}


### Generate selection files for the selected parts

In [6]:
print(man_selections.gen_sel(my_doc))

#### Replica center: ['NCI'] 
project = CMIP6 
variable = psl tas uas vas hus 
frequency = 6hr 
experiment = 1pctCO2 piControl historical amip abrupt4xCO2 ssp585 ssp245 
ensemble = r1i1p1 
priority = 5000 
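Generation is essentially the inverse of parsing. A hypothetical sketch in the spirit of `man_selections.gen_sel` follows; its exact comment header format differs from the output above. Each facet becomes a `facet = v1 v2 ...` line, and bookkeeping keys such as `repl_center` are written only as `#` comments. The resulting text could then be saved with `Path.write_text` and passed to synda.

```python
def gen_selection_text(doc: dict, extra_keys: tuple = ("repl_center",)) -> str:
    """Render a selection document as synda-style 'facet = v1 v2 ...' lines.

    Hypothetical sketch in the spirit of man_selections.gen_sel: keys listed in
    extra_keys are written as '#' comments, not as selection parameters.
    """
    header = [f"# {k}: {' '.join(doc[k])}" for k in extra_keys if k in doc]
    body = [f"{k} = {' '.join(v)}" for k, v in doc.items() if k not in extra_keys]
    return "\n".join(header + body) + "\n"


doc = {
    "project": ["CMIP6"],
    "variable": ["psl", "tas"],
    "frequency": ["6hr"],
    "priority": ["5000"],
    "repl_center": ["NCI"],
}
print(gen_selection_text(doc), end="")
# # repl_center: NCI
# project = CMIP6
# variable = psl tas
# frequency = 6hr
# priority = 5000
```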

