# Flowcharts for MCCE_Benchmarking:

Note: GitHub markdown apparently cannot render the mermaid graphs despite saying it does.


## Purpose:

To allow benchmarking against the curated, experimental pKa database (v1). **DONE**.  
To allow benchmarking of mcce runs. **TODO**  

---
---

# 03-05-2024
---
# Update post 3/4/24 group meeting on benchmarking
  * Variable name changes:
    - n_active -> n_batch
    - job_setup -> setup_job (in documentation)
  * Added missing scipy requirement (needed for mcce, not mcce_benchmarking per se)


## Completion status:
 * Entry point 'bench_expl_pkas' with sub-commands 'setup_job' and 'launch_job': DONE
 * Entry point 'bench_mcce_runs': 30%

---
## Flowchart for the current entry points (EPs) of `mcce_benchmark`:

```mermaid
%%{init: {"flowchart": {"htmlLabels": false}} }%%
flowchart TD
    exp("`EP:  _**bench_expl_pkas**_
    <br>
    Status: **DONE**
    <br>
    sub-cmd 1: _setup_job_
    sub-cmd 2: _launch_job_`"
    ) -- 1 --> sb1[setup_job]
    exp -- 2 --> sb2[launch_job]
    sb1 -- Input: 'benchmarks_dir' --> dir1[\-benchmarks_dir exists?/]
    dir1 -->|N| ng1[mkdir 'benchmarks_dir']
    dir1 -->|Y| dosb1[Actions performed:
    1. Data setup
    2. Automated  cron scheduling
    ]
    ng1 --> dosb1
    sb2 --> dosb2[Create scheduling via crontab]
    dosb2 -- Monitor % completion --> cmd1[>grep -i 'completed' benchmarks_dir/cron_job_name.log]

    batch("`EP:  _**bench_launchjob**_
    (used in crontab)
    <br>
    Status: **DONE**
    <br>
    Submit ONE batch of size -n_batch
    using sh script`"
    ) -- Pre-requisites:
    A setup as in `bench_expl_pkas setup_job`:
     - Same folder structure
     - 'default_run.sh' or 'job_name.sh' script
        if job_name was used. --> wait(Wait x min)
     wait -->|Monitor % completion| cmd2[>grep -i 'completed' ../benchmark.log]
     cmd2 -->|Repeat until all jobs are completed| batch

    anz("`EP: _**bench_analyze**_
    <br>
    Status: **50% done**
    <br>
    sub-cmd 1: expl_pkas
    sub-cmd 2: mcce_runs`"
    ) -->ok[\Are all the jobs completed?/]
    ok -->|N| nogo1(Check again later!)
    ok -->|Y|asb1[1. expl_pkas]
    ok -->|Y|asb2[2. mcce_runs TODO]
    asb1 -->|Pre-requisites:
    Same folder structure as in
    `bench_expl_pkas setup_job`| dir2[\-benchmarks_dir exists?/]
    dir2 -->|Y|doasb1[Create all_pkas.out
    Output report files
    and figures in
    benchmarks_dir/analysis]
    dir2 -->|N| npgp2(Oops! Typo? Wrong dir?)
    asb2 -->|Pre-requisites:
    completed runs| doasb2["`**TBD**:
    diff of pK.out, sum_crg.out`"]
```
```mermaid
%%{init: {"flowchart": {"htmlLabels": false}} }%%

flowchart LR
subgraph sg0 ["`**Note:**`"]
    o["Because the folder structure and pdbs will be different:"]
    style o fill:#fff,stroke:#f66,stroke-width:1px

    subgraph sg1 ["expl_pKas: pH only, pkadb pdbs & setup"]
    direction LR
      a("`**bench_expl_pkas**`") ==> b["`_bench_analyze_ **expl_pkas**`"]
    end
    subgraph sg2 ["mcce_runs: pH, Eh, different setup..."]
    direction LR
      c("`**bench_mcce_runs**`") ==>d["`_bench_analyze_ **mcce_runs**`"]
    end

end
```
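
The "Monitor % completion" step in the flowchart (`grep -i 'completed'` on the job log) can also be scripted. The following is a minimal sketch, not part of `mcce_benchmark`: the function names `count_completed` and `pct_completed` are hypothetical, and the sketch assumes that each finished job writes exactly one log line containing the word "completed".

```python
from pathlib import Path


def count_completed(log_path: str) -> int:
    """Count log lines containing 'completed', ignoring case (like `grep -ic`)."""
    path = Path(log_path)
    if not path.exists():
        # No log yet: nothing has been reported as completed.
        return 0
    with path.open() as fh:
        return sum(1 for line in fh if "completed" in line.lower())


def pct_completed(log_path: str, n_jobs: int) -> float:
    """Percentage of jobs reported as completed.

    ASSUMPTION: one 'completed' line per finished job; `n_jobs` is the
    total number of jobs in the benchmark set.
    """
    if n_jobs <= 0:
        raise ValueError("n_jobs must be a positive integer")
    return 100.0 * count_completed(log_path) / n_jobs
```

The log path would be the one shown in the flowchart (for example `benchmarks_dir/cron_job_name.log`), and `n_jobs` has to be supplied by the caller, since the log itself does not state the total.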

---
---
# 2-14-2024
---
# Refactoring MCCE benchmark cli ([Issue 16](https://github.com/GunnerLab/MCCE_Benchmarking/issues/16))

## Purpose:
To allow benchmarking against the curated, experimental pKa database (v1) or against MCCE runs (two runs with completed step4).

## Completion status:
The refactoring is not complete yet as it entails creating two different tracks (or entry points), along with automated tests.
* 'experimental_pkas': 80%
* 'mcce_runs': 30%

## What the command line commands would be (w/o renaming main EP):
### Entry point 'experimental_pkas':
 1. Set up the pdbs folder and files for the user. NO GO if `benchmarks_dir` already exists.
 ```
 # No input means `benchmarks_dir` = ./mcce_benchmarks:

 >mccebench experimental_pkas benchmark_setup


 # With input:

 >mccebench experimental_pkas benchmark_setup -benchmarks_dir <different name>


 ```
 2. Analysis (default for `pct_complete` not yet determined):
 ```
 >mccebench experimental_pkas analyze -pct_complete <float> [-benchmarks_dir: optional if default was used in `benchmark_setup`]
 ```
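
The proposed `experimental_pkas` command line can be sketched with `argparse`. This is only an illustration under stated assumptions, not the actual implementation: `build_parser` and `check_setup_dir` are hypothetical names, and `-pct_complete` is made required here only because its default is not yet determined.

```python
import argparse
from pathlib import Path

DEFAULT_DIR = "./mcce_benchmarks"  # default named in the usage text above


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the proposed commands."""
    parser = argparse.ArgumentParser(prog="mccebench")
    entry_points = parser.add_subparsers(dest="entry_point", required=True)

    exp = entry_points.add_parser("experimental_pkas")
    sub = exp.add_subparsers(dest="subcommand", required=True)

    setup = sub.add_parser("benchmark_setup")
    setup.add_argument("-benchmarks_dir", default=DEFAULT_DIR)

    analyze = sub.add_parser("analyze")
    # ASSUMPTION: required, because no default has been chosen yet.
    analyze.add_argument("-pct_complete", type=float, required=True)
    analyze.add_argument("-benchmarks_dir", default=DEFAULT_DIR)
    return parser


def check_setup_dir(benchmarks_dir: str) -> Path:
    """Apply the NO GO rule: refuse to set up over an existing directory."""
    path = Path(benchmarks_dir)
    if path.exists():
        raise FileExistsError(
            f"NO GO: {path} already exists; rename it or choose another -benchmarks_dir"
        )
    return path
```

For instance, `build_parser().parse_args(["experimental_pkas", "benchmark_setup"])` yields a namespace whose `benchmarks_dir` is the default, `./mcce_benchmarks`.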

### Entry point 'mcce_runs':
NOTE: I think there is no need for the -eps parameter, since we would want to compare, say, e8 vs e4 (default); the flowchart is updated accordingly.  
NO GO: the user selects -titr_type = 'eh' with -reference_dir = 'parse.e4'; this reference set is for pH titrations only.
```
 >mccebench mcce_runs -new_calc_dir <new> -reference_dir <ref> -titr_type <ph or eh>


```
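
The NO GO rule for `mcce_runs` could be enforced with a small input check before any analysis starts. The sketch below is an assumption about how this might look: the function `validate_mcce_runs` and the set `PH_ONLY_REFERENCES` are hypothetical, and only 'parse.e4' is listed because it is the only reference set named here.

```python
from pathlib import Path

# ASSUMPTION: reference sets that only hold pH titration data.
PH_ONLY_REFERENCES = {"parse.e4"}


def validate_mcce_runs(new_calc_dir: str, reference_dir: str, titr_type: str) -> str:
    """Return the normalized titration type, or raise on a NO GO combination."""
    titr = titr_type.lower()
    if titr not in ("ph", "eh"):
        raise ValueError(f"-titr_type must be 'ph' or 'eh', got {titr_type!r}")

    # Compare on the directory name so 'some/path/parse.e4' is also caught.
    ref_name = Path(reference_dir).name
    if titr == "eh" and ref_name in PH_ONLY_REFERENCES:
        raise ValueError(
            f"NO GO: reference set {ref_name!r} is for pH titrations; "
            "it cannot be used with -titr_type 'eh'"
        )
    return titr
```

Checking this up front means the user gets the NO GO message immediately rather than after the comparison has started.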

---
## Flowchart for the two entry points:

```mermaid
graph TD
    exp[Entry point: 'experimental_pkas'
    default script: dry prot, 4 steps w/mfe] -->|Input: 'benchmarks_dir'| dir{dir exists?}
    dir -->|N| ng1[STOP:
    Rename existing dir or
    Change 'benchmarks_dir']
    dir -->|Y| sub1{Subcommand choice:
    1. 'benchmark_setup'
    2. 'analyze'}
    sub1 -->| 'benchmark_setup' |do1[Actions performed:
    1. Data setup
    2. Scheduling setup
    3. Launch]
    sub1 -->| 'analyze' | prob1[Problem:
    Are there enough completed runs?
    Implement with a 'percentage' user input?
    => Function needed as initial check
    to obtain the completed entries in
    the book file & launch the analysis
    if % is met.]
    prob1 --> runs1{Enough completed runs?}
    runs1 -->| Y | rpt1[Final outputs:
    * all_pkas file
    * Matched pKas file
    * Residue stats
    * Conformers throughput per step\n using runtimes & conformer counts
    * Plots
    * Anything else?]
    runs1 --> | N | msg1(Try '>experimental_pkas analyze' later)

    mc[Entry point: 'mcce_runs'] -->|Inputs:
    2 completed runs:
    'new_calc_dir', 'reference_dir';
    'titr_type'| ref{Which 'reference_dir'?}
    ref -->|ref dir is e.g. 'parse.e4' from pKaDB
    Applicable only to pH titrations| comp2[Use 'all_pkas.e4' file for comparison]
    comp2 --> mcpka[Analysis outputs:
    * Matched, then diffed pKa values
    * Plot new vs ref for all numeric fields in pK.out]
    ref -->|ref dir is another mcce output dir
    ASSUMED: runs of same prot| rptmc[Analysis outputs:
    * Diffed pK.out
    * Residue stats
    * Plot new vs ref for all numeric fields in pK.out
    * Anything else?]
    mcpka -.-> note[Problem:
    Analysis will depend on the
    contents of 'parse.e4' dir:
    full output, partial
    or just 'all_pkas.e4' file?]
    rptmc -.-> note
```
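
The "Enough completed runs?" decision in the flowchart could be implemented as a small initial check. The sketch below is deliberately independent of the book file format, which is not specified here: it assumes the caller has already parsed the book file into a mapping of entry name to completion flag. The names `completed_pct` and `can_analyze` are hypothetical, and `pct_complete` is assumed to be a percentage between 0 and 100.

```python
from typing import Mapping


def completed_pct(statuses: Mapping[str, bool]) -> float:
    """Percentage of entries flagged as completed (0.0 if there are no entries)."""
    if not statuses:
        return 0.0
    return 100.0 * sum(statuses.values()) / len(statuses)


def can_analyze(statuses: Mapping[str, bool], pct_complete: float) -> bool:
    """Return True if the completed percentage meets the user's threshold.

    ASSUMPTION: `pct_complete` is a percentage in [0, 100], not a fraction.
    """
    if not 0 <= pct_complete <= 100:
        raise ValueError("pct_complete must be between 0 and 100")
    return completed_pct(statuses) >= pct_complete
```

With three of four entries completed, `can_analyze(statuses, 70)` allows the analysis to proceed, while a threshold of 80 would lead to the "try again later" branch.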