/
README.Rmd
178 lines (115 loc) · 8.83 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
---
output:
rmarkdown::github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
[![minimal R version](https://img.shields.io/badge/R%3E%3D-`r as.character(getRversion())`-brightgreen.svg)](https://cran.r-project.org/)
[![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/)
[![Travis-CI Build Status](https://travis-ci.org/mrc-ide/glodide.png?branch=master)](https://travis-ci.org/mrc-ide/glodide)
## Submission of Global Lmic Reports via Orderly To DIDE
This is a working R package for submitting global fits to DIDE cluster.
## Installation
```
git clone https://github.com/mrc-ide/glodide.git
cd glodide
open glodide.Rproj
devtools::install_deps()
```
## Overview
The structure within analysis is as follows:
```
analysis/
|
├── 01_submission / # submission scripts
|
├── data/
│ ├── DO-NOT-EDIT-ANY-FILES-IN-HERE-BY-HAND
│ ├── raw_data/ # orderly bundles provided from bundling in global-lmic-reports-orderly
│ └── derived_data/ # orderly outputs produced from running reports in raw_data
```
## Process for submitting Jobs
This repository is used for submitting orderly reports to the DIDE cluster. Outline of steps:
> Run in `global-lmic-reports-orderly`
1. Execute bundle script in `global-lmic-reports-orderly` repository to produce orderly bundles in `analysis/data/raw_data` in `glodide`
> Run in `glodide`
2. Submit bundles to the DIDE HPC cluster using `analysis/01_submission.R` that uses `d
> Run in `global-lmic-reports-orderly`
3. Check fits are correct using code in `global-lmic-reports-orderly` and resubmit any countries that need to be rerun/run for longer
4. Pull correctly run orderly bundles from `analysis/data/derived_data` into `global-lmic-reports-orderly`
5. Compile `gh-pages` and push to `mrc-ide/global-lmic-reports`
---
The rationale for separating these steps between the two repositories is to ensure that the `global-lmic-reports-orderly`
location is not on the network share, which has file backups set that can cause issues with file locking etc related to the
orderly database. In addition, the use of `didehpc` (currently) requires the working directory when submitting jobs to be
on the network share and so this separation currently seems the best approach.
## Troubleshooting
[1. LaTeX and PDF Compile Issues](#1-latex-and-pdf-compile-issues)
[2. Google Tag Pandoc Issues](#2-google-tag-pandoc-issues)
[3. Context 24 Core Template](#3-context-24-core-template)
[4. Didehpc Errors](#4-didehpc-errors)
[5. Reconcile Errors](#5-reconcile-errors)
[6. Mapping Network Drives](#6-mapping-network-drives)
[7. Updating packages in context](#7-updating-packages-in-context)
[8. Peculiar didehpc failed jobs](#8-peculiar-didehpc-failed-jobs)
---
#### 1. LaTeX and PDF Compile Issues
PDF compilation was initially tricky with the default setup not having pandoc on a network share nor being able to correctly compile the PDF document from orderly runs. As a result, both [TinyTex](https://yihui.org/tinytex/) and pandoc were installed onto the network share at `L:/OJ` (pandoc was then copied over to its location on the network share):
```
tinytex::install_tinytex(dir = "L:/OJ/TinyTex")
installr::install.pandoc()
```
In the `global-lmic-reports-orderly` `lmic_reports_vaccine` run script, we then had to specify the following to get Rmd to compile PDF documents correctly, with much of the guidance for the last 2 lines below coming from the TinyTex [debugging guide](https://yihui.org/tinytex/r/#debugging)
```
# pandoc linking
if(file.exists("L:\\OJ\\pandoc")) {
rmarkdown:::set_pandoc_info("L:\\OJ\\pandoc")
Sys.setenv(RSTUDIO_PANDOC="L:\\OJ\\pandoc")
tinytex::use_tinytex("L:\\OJ\\TinyTex")
tinytex::tlmgr_update()
}
```
<br>
#### 2. Google Tag Pandoc Issues
Each page of the compiled website has a header of html with Google Analytics set up. This was originally included as an html file that was specified to appear before the body in the compiled html files. On the previous Azure server this worked fine, however, after switching to the DIDE cluster, on occasion there would be 404 errors on rendering the html page due to a network error related to fetching the Analytics code. As a result, we swapped to including as plain text the html that the Google Analytics html file was fetching instead. Hopefully, this fixes this issues but if any jobs fail on rendering the html with similar errors then check whether this text has changed or is wrong etc.
#### 3. Context 24 Core Template
When we moved from running the `squire` model to the `nimue` model we ran out of RAM when running the model. 50 draws from the mcmc chain appears to be fine on the DIDE cluster but only through specifying the 24 Core template in the setup of the didhpc context in the cluster submit script. If there any errors are returned suggesting the there is insufficient memory, e.g. "Can't allocate vector of xB...", then recommend either changing the number of trajectories drawn.
#### 4. Didehpc Errors
There are many types of didehpc error that may appear. First steps should be to head to [https://mrc-ide.github.io/didehpc/articles/troubleshooting.html](https://mrc-ide.github.io/didehpc/articles/troubleshooting.html) to read through all the guidance there, which should help identify specific errors.
#### 5. Reconcile Errors
The troubleshoot guide above contains information on how to reconcile job statuses, i.e. checking to see if the job status given by `grp$status()` is correct and matches what is shown at the DIDE cluster end. However, to see the status of all jobs from the DIDE cluster, the following will help (and is similar to what is internally run when reconciling errors using `obj$reconcile(grp$ids)`):
```
dat <- obj$client$status_user("*", obj$config$cluster)
# dat <- dat[which(dat$name %in% grp$ids),] # this shows you all the tasks
dat <- dat[match(grp$ids, dat$name),] # it uses a match call to get the most recent task running with that id
```
#### 6. Mapping Network Drives
To run the cluster submit script, two network drives need to be mapped:
- T: //fi--didef3.dide.ic.ac.uk/tmp
- L: //fi--didenas5/malaria
If these are not mapped when starting the machine (in particular the tmp drive is often not mapped), then see [https://support.microsoft.com/en-us/windows/map-a-network-drive-in-windows-10-29ce55d1-34e3-a7e2-4801-131475f9557d](https://support.microsoft.com/en-us/windows/map-a-network-drive-in-windows-10-29ce55d1-34e3-a7e2-4801-131475f9557d) for instructions. In overview:
File Explorer > This PC > Map Network Drive (in Tabs) > Map using mappings in list above
#### 7. Updating packages in context
If you need to update any packages that are in the context, e.g. if you make changes to `squire` and `nimue` that you pushed to their Github repositories, then to update these in the context use the following in the cluster_submit script:
```
obj <- didehpc::queue_didehpc(ctx, config = config, provision = "lazy")
```
### 8. Peculiar didehpc failed jobs
If there is a string of jobs that fail in a row, i.e. say jobs 80 - 100 all error, and the error is not one that is a clear R error, i.e. the jobs just seem to stop, this is likely a problem with the specific node the jobs are being run on. In which case, rerunning them should work. This error could be due to the node not behaving or because other jobs being run on that node by other users are maybe taking too much memory and causing something strange. If it continues to be an issue, work out the dide_id of the failed task and ask Wes to see if there is something strange with that specific node.
---
## The R package
This repository is organized as an R package. There are a few R functions exported in this package - the majority of the R code is in the analysis directory. The R package structure is here to help manage dependencies, to take advantage of continuous integration, and so we can keep file and data management simple.
To download the package source as you see it on GitHub, for offline browsing, use this line at the shell prompt (assuming you have Git installed on your computer):
```{r eval=FALSE}
git clone https://github.com/mrc-ide/glodide.git
```
Once the download is complete, open the `glodide.Rproj` in RStudio to begin working with the package and compendium files. We will endeavor to keep all package dependencies required listed in the DESCRIPTION. This has the advantage of allowing `devtools::install_dev_deps()` to install the required R packages needed to run the code in this repository
## Licenses
Code: [MIT](http://opensource.org/licenses/MIT) year: `r format(Sys.Date(), "%Y")`, copyright holder: OJ Watson
Data: [CC-0](http://creativecommons.org/publicdomain/zero/1.0/) attribution requested in reuse