# pISA-tree upload to FAIRDOMHub

*Petek Marko*

*National Institute of Biology (NIB)*

*August, 2022*


## 1. Prior to running this notebook 

1. Install the following R libraries, in this order, into the R environment used by Jupyter:
* [devtools](https://cran.r-project.org/web/packages/devtools/index.html)
* [pisar](https://github.com/NIB-SI/pisar)
* [seekr](https://github.com/NIB-SI/seekr)

2. Login to [FAIRDOMHub SEEK](https://fairdomhub.org/login) using your credentials.

3. Create a FAIRDOMHub **_project_** to host the **_Investigation_** to be uploaded. Set the access permission for the project to ```no_access``` and select the correct license.

4. In your pISA-tree **_project_** directory, open ```seekignore.txt``` in a text editor and add the files or directories to be ignored. Also check that the value for the key ```Upload to FAIRDOMHub:``` is set correctly at all pISA-tree levels in the Investigation to be uploaded.

5. In the **_Investigation_** directory, run ```xcheckMetadata.bat``` and inspect the report ```*.md``` file to check for errors and missing values. The report should contain no errors; otherwise the upload will most probably fail.

6. In the **_Investigation_** to be uploaded, create an upload **_Study_** (e.g. ```_S_UPLOAD```) and in this Study create an upload **_Assay_** (e.g. ```_A_FAIRDOMHub-R```). The directory path to the upload Assay should be as follows: ```/_S_UPLOAD/_A_FAIRDOMHub-R/```
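
The installation in step 1 could be scripted roughly as follows. This is a minimal sketch, assuming that `pisar` and `seekr` are installed from the GitHub repositories linked above using `devtools::install_github()`. The `run_install` switch and the `missing_pkgs()` helper are illustrative names and not part of either package.

```r
# devtools comes from CRAN; pisar and seekr come from GitHub (in this order)
cran_pkgs <- c("devtools")
github_repos <- c("NIB-SI/pisar", "NIB-SI/seekr")

# Illustrative helper: return those packages that are not installed yet
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

run_install <- FALSE  # change to TRUE to actually install (needs network access)
if (run_install) {
  for (p in missing_pkgs(cran_pkgs)) install.packages(p)
  for (r in github_repos) devtools::install_github(r)
}
```

With `run_install` left at `FALSE` the cell only defines the names, so it can be run safely before deciding to install anything.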

## 2. Load required libraries

If you get errors here, try reinstalling the libraries and/or their dependencies (```remove.packages()```, ```install.packages("", dependencies = TRUE)```, ```update.packages()```).

In [None]:
library(jsonlite)
library(httr)
library(pisar)
library(seekr)

## 3. Directory initialization

Set the working directory to the output directory of the FAIRDOMHub upload Assay (replace ```<pISA-tree_InvestigationPath>``` with the local path to the Investigation to be uploaded):

```fp = file.path('<pISA-tree_InvestigationPath>', '_S_UPLOAD', '_A_FAIRDOMHub-R', 'output')```

```setwd(fp)```

In [None]:
fp = file.path('.','_p_RNAinVAL', '_I_02_FieldTrials', '_S_UPLOAD', '_A_FAIRDOMHub-R', 'output')
setwd(fp)
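
If the path is mistyped, `setwd()` fails with a fairly generic error. The following sketch wraps the step in a check that names the missing directory. The helper `set_upload_wd()` is hypothetical, and the demonstration uses a throwaway directory tree in `tempdir()` as a stand-in for a real Investigation.

```r
# Hypothetical helper: switch to the upload Assay output directory,
# stopping with a clear message if the path does not exist.
set_upload_wd <- function(investigation_path) {
  fp <- file.path(investigation_path, "_S_UPLOAD", "_A_FAIRDOMHub-R", "output")
  if (!dir.exists(fp)) {
    stop("Upload output directory not found: ", fp)
  }
  setwd(fp)
  invisible(fp)
}

# Demonstration on a temporary directory tree (not a real Investigation)
demo_inv <- file.path(tempdir(), "_I_demo")
dir.create(file.path(demo_inv, "_S_UPLOAD", "_A_FAIRDOMHub-R", "output"),
           recursive = TRUE, showWarnings = FALSE)
old_wd <- getwd()
set_upload_wd(demo_inv)
basename(getwd())
setwd(old_wd)  # restore the previous working directory
```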

Store pISA-tree details in object ```pini```:

```?readMeta``` Read metadata file from the given directory

In [None]:
pini <- pisar::pisa()
names(pini)
str(pini)

## 4. Get paths for the Investigation to be uploaded

Read the ``seekignore.txt`` file, which is located at the project level:

In [None]:
seekignore <- readLines(file.path(.proot, "seekignore.txt"))
seekignore

Get path to the Investigation to be uploaded:

In [None]:
uploadRoot <- pini$I$root
uploadRoot

Prepare and export the list of files to be uploaded. You can open the exported file in a text editor and check whether the list contains directories or files that you do not want to upload. If you find any, add them to ```seekignore.txt``` and rerun this section of the notebook.

In [None]:
forUpload <- seekr::skFilesToUpload(root = uploadRoot, skignore = seekignore)

In [None]:
forUpload

write(forUpload, file = "uploadList.txt")
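
For long lists, a quick summary can be easier to scan than the text file. The sketch below counts files per extension and pulls out one file type for closer inspection. The vector `demoUpload` contains made-up paths and only stands in for `forUpload`.

```r
# Made-up relative paths, standing in for the forUpload vector
demoUpload <- c(
  "_S_01/_A_01/input/reads.fastq.gz",
  "_S_01/_A_01/output/counts.txt",
  "_S_01/_A_01/output/plot.pdf",
  "_S_01/_A_02/output/summary.txt"
)

# Number of files per extension (only the last extension is considered)
ext_counts <- table(tools::file_ext(demoUpload))
ext_counts

# List all files of one type, here everything ending in .txt
txt_files <- grep("\\.txt$", demoUpload, value = TRUE)
txt_files
```

Running the same two calls on `forUpload` gives a compact overview of what is about to be uploaded.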

Calculate the total upload size and list the files to be uploaded in order of decreasing size.

In [None]:
# Full local paths of the files to be uploaded
uploadPaths <- file.path(uploadRoot, forUpload)
cat("Total size for Upload:", round(sum(file.size(uploadPaths), na.rm = TRUE)/(1024*1024), 2), "MB")

# File sizes in MB, one row per file
UploadFileSizes <- data.frame(FileName = forUpload, FileSize = round(file.size(uploadPaths)/(1024*1024), 3))
o <- order(UploadFileSizes$FileSize, decreasing = TRUE)
orderedUploadFileSizes <- UploadFileSizes[o, ]
orderedUploadFileSizes
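
One possible use of such a table is to flag files above a size limit, for example as candidates for `seekignore.txt`. The sketch below illustrates the idea on three temporary files. The limit of 1000 bytes and all file names are arbitrary assumptions for the demonstration.

```r
# Create three temporary files of known size as stand-ins for real data
demo_root <- file.path(tempdir(), "size_demo")
dir.create(demo_root, showWarnings = FALSE)
demo_files <- c("small.txt", "medium.txt", "large.txt")
demo_bytes <- c(10, 2000, 50000)
for (i in seq_along(demo_files)) {
  writeBin(raw(demo_bytes[i]), file.path(demo_root, demo_files[i]))
}

# Same table layout as above, but with sizes in bytes
sizes <- data.frame(
  FileName = demo_files,
  FileSize = file.size(file.path(demo_root, demo_files)),
  stringsAsFactors = FALSE
)

size_limit <- 1000  # arbitrary threshold in bytes
too_big <- sizes[sizes$FileSize > size_limit, ]
too_big <- too_big[order(too_big$FileSize, decreasing = TRUE), ]
too_big
```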

## 5. Prepare FAIRDOMHub connection

Set credentials for connection to FAIRDOMHub API. The values for usr, pwd and myid below are made up. __INSERT YOUR CREDENTIALS INTO THE FOLLOWING CELL__: 

In [None]:
seekr::skReset( all=TRUE )
seekr::skOptions()

#input your own credentials here!
seekr::skSetOption("url", "https://fairdomhub.org/")
seekr::skSetOption("usr", "johndoe")
seekr::skSetOption("pwd", "mypassword")
seekr::skSetOption("myid", "9999")
seekr::skOptions()
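
To avoid storing a password in the notebook itself, the credentials could be read from environment variables instead. This is only a sketch: the variable names `FAIRDOMHUB_USR`, `FAIRDOMHUB_PWD` and `FAIRDOMHUB_MYID` and the helper `read_credentials()` are made up for illustration, and the commented `seekr::skSetOption()` calls simply mirror the ones in the cell above.

```r
# Hypothetical environment variable names; set them outside the notebook,
# e.g. in ~/.Renviron, so that the password is not saved with the notebook.
read_credentials <- function() {
  creds <- list(
    usr  = Sys.getenv("FAIRDOMHUB_USR"),
    pwd  = Sys.getenv("FAIRDOMHUB_PWD"),
    myid = Sys.getenv("FAIRDOMHUB_MYID")
  )
  # Sys.getenv() returns "" for unset variables
  empty <- names(creds)[!nzchar(unlist(creds))]
  if (length(empty) > 0) {
    stop("Missing credentials: ", paste(empty, collapse = ", "))
  }
  creds
}

# Usage with the option names from the cell above (requires seekr):
# creds <- read_credentials()
# for (key in names(creds)) seekr::skSetOption(key, creds[[key]])
```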

Set the root directory for uploading:

In [None]:
seekr::skSetOption("root", uploadRoot)
seekr::skSetLayers(path=".",root=skGetOption("root"))

Check that the programme and project exist on FAIRDOMHub. The last three lines of the output should show the metadata fetched from FAIRDOMHub:

In [None]:
.pname <- "PROJECT NAME"  # copy your FAIRDOMHub project name here!
.prname <- "PROGRAMME NAME"  # copy your FAIRDOMHub programme name here!
.testing <- FALSE

seekr::skFindId("programme", .prname)
seekr::skFindId("project", .pname)
print.simple.list(skOptions("id"))

if(seekr::skFindTitle("projects", seekr::skGetOption("proj"))["title"]!=.pname) {cat("Wrong project name or id. Create project", layers[1],"\n")} else {cat("Project exists:", .pname, "\n")}

Set the log file (with ```append=TRUE``` the existing log is not overwritten if the connection is interrupted and you have to run the upload again):

In [None]:
seekr::skLog("Upload_I_02_FieldTrials", file="FAIRDOM.log", append=TRUE)

## 6. Upload

Create the necessary levels and upload the files sequentially, in the order given in the ```forUpload``` list:

In [None]:
seekr::skUploadFiles(forUpload[1:length(forUpload)], test=FALSE, append=FALSE, verbose=TRUE)

In [None]:
devtools::session_info()

### Troubleshooting

If the upload is interrupted for any reason, check the log file and FAIRDOMHub to find out which file was uploaded last. You can then continue the upload by running the command as follows (__SET lastUploadedPlusOne__):

#### Example
The last file successfully uploaded was number 32 in the ```forUpload``` list.

In [None]:
# IF UPLOAD WAS interrupted, change FALSE to TRUE, input correct number for lastUploadedPlusOne and RUN
if (FALSE) {
  lastUploadedPlusOne <- 33  # Input correct number
  seekr::skUploadFiles(forUpload[lastUploadedPlusOne:length(forUpload)], test=FALSE, append=TRUE, verbose=TRUE)
}
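
Instead of counting positions by hand, the restart index can be derived from the name of the last uploaded file. The following is a sketch on made-up data: `demoUpload` stands in for `forUpload`, and `lastUploadedFile` and `resumeFrom` are illustrative names.

```r
# Made-up upload list and last uploaded file, for illustration only
demoUpload <- c("a/file1.txt", "a/file2.txt", "b/file3.txt", "b/file4.txt")
lastUploadedFile <- "a/file2.txt"

# Position of the last uploaded file in the list
idx <- match(lastUploadedFile, demoUpload)
if (is.na(idx)) stop("File not found in the upload list: ", lastUploadedFile)
resumeFrom <- idx + 1

# Guard against resumeFrom > length(): in R, 5:4 counts down instead of being empty
remaining <- if (resumeFrom <= length(demoUpload)) {
  demoUpload[resumeFrom:length(demoUpload)]
} else {
  character(0)
}
remaining
```

The value of `resumeFrom` computed this way corresponds to `lastUploadedPlusOne` in the cell above.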