# “dx create_cohort” in R
<hr/>
***As-Is Software Disclaimer***

The content in this repository is delivered “As-Is”. Notwithstanding anything to the contrary, DNAnexus will have no warranty, support, liability or other obligations with respect to Materials provided hereunder.

<hr/>

This notebook demonstrates how to use the dx command `create_cohort` to:
* Create a record of type CohortBrowser on the platform through the CLI
* Add filters to the existing filters in an input CohortBrowser record

<a href="https://github.com/dnanexus/OpenBio/blob/master/LICENSE.md">MIT License</a> applies to this notebook.

## Preparing your environment
### Launch spec:

* App name: JupyterLab with Python, R, Stata, ML
* Kernel: R
* Instance type: mem1_ssd1_v2_x2
* Cost: < $0.1
* Runtime: < 2 min
* Data description: The input for this notebook is a v3.0 Dataset or Cohort object ID.
  * Exceptions in the current version of `dx create_cohort`:
    * A CohortBrowser record with "OR" logic on "global_primary_key" is not accepted
    * A CohortBrowser record with an integer-type "global_primary_key" is not accepted

### Install the DNAnexus-supported package, dxpy

In [None]:
# dx create_cohort requires dxpy v0.359.0 or greater.
# A sufficiently recent version of dxpy may already be installed,
# in which case the "pip" install below is unnecessary.
system("pip3 install -U dxpy==0.363.0")
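
Before pinning a version, it can help to check what is already installed. The sketch below compares a version string against the 0.359.0 minimum stated above. It assumes the version is reported in a form such as `dx v0.363.0`; `meets_min_dxpy` is a hypothetical helper name, not part of dxpy.

```r
# Hypothetical helper: TRUE if a version string such as "dx v0.363.0"
# is at or above the minimum required for dx create_cohort.
meets_min_dxpy <- function(version_string, minimum = "0.359.0") {
    # Pull out the first dotted numeric run, e.g. "0.363.0"
    found <- regmatches(version_string, regexpr("[0-9]+(\\.[0-9]+)+", version_string))
    if (length(found) == 0) {
        stop("No version number found in: ", version_string)
    }
    numeric_version(found) >= numeric_version(minimum)
}

# Example with a literal string. In the notebook the input could come from
# system("dx --version", intern = TRUE), assuming that output format.
meets_min_dxpy("dx v0.363.0")
```

If the check returns `FALSE`, the `pip3 install` cell above is needed; otherwise it can be skipped.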

In [None]:
install.packages("readr")
install.packages("jsonlite")

In [None]:
library(dplyr)
library(readr)
library(stringr)
library(jsonlite)

### Print help message

In [None]:
cmd <- "dx create_cohort --help"
system(cmd, intern = TRUE)

### Assign environment variables

In [None]:
# The referenced dataset is not public and is listed here only as an example input.
# Supply a valid project ID and record ID that you have permission to access.

# Set project-id
pid <- "project-GVbq1B80pXYXfbv361XBxZ8F"

# Assign a project qualified dataset record, project-id:record-id
dataset <- paste(pid, "record-FyFPyz0071F54Zjb32vG82Gj", sep = ":")

# Assign a project qualified CohortBrowser, project-id:record-id
cohort <- paste(pid, "record-GZZvx6j0pXYZ3X8X3Fgg54qy", sep = ":")
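
A malformed ID is easier to catch before it reaches `dx`. The following is a minimal sketch of a format check for project-qualified record IDs. The pattern (a `project-` or `record-` prefix followed by 24 alphanumeric characters) is an assumption inferred from the example IDs above, and `is_qualified_record` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE when x looks like
# "project-<24 chars>:record-<24 chars>".
# The 24-character alphanumeric suffix is an assumption based on the example IDs.
is_qualified_record <- function(x) {
    grepl("^project-[A-Za-z0-9]{24}:record-[A-Za-z0-9]{24}$", x)
}

example_pid <- "project-GVbq1B80pXYXfbv361XBxZ8F"
example_dataset <- paste(example_pid, "record-FyFPyz0071F54Zjb32vG82Gj", sep = ":")

# Stops with an error if the string does not match the assumed pattern
stopifnot(is_qualified_record(example_dataset))
```

This only checks the shape of the string; it does not confirm that the record exists or that you have access to it.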

#### Create a working directory for outputs
Skip this step if you want to create the output in the current working directory or have a target location for your output.

In [None]:
working_directory <- "create_cohort_demo_folder"
cmd <- paste("dx mkdir", working_directory, sep = " ")
system(cmd, intern = TRUE)
cmd <- paste("dx cd", working_directory, sep = " ")
system(cmd, intern = TRUE)
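
`system(cmd, intern = TRUE)` reports a non-zero exit status only through a warning and a `status` attribute on the returned vector, so a failed `dx` call can be easy to miss. The sketch below wraps `system()` so that a failure stops execution. `run_cmd` is a hypothetical helper name, and the example uses `echo` rather than `dx` so that it runs anywhere a POSIX shell is available.

```r
# Hypothetical wrapper around system(): returns the captured output and
# stops with a message when the command exits with a non-zero status.
run_cmd <- function(cmd) {
    out <- suppressWarnings(system(cmd, intern = TRUE))
    status <- attr(out, "status")
    if (!is.null(status) && status != 0) {
        stop("Command failed with status ", status, ": ", cmd)
    }
    out
}

# Succeeds and returns the captured standard output
run_cmd("echo ok")
```

The same wrapper could be used in place of `system(cmd, intern = TRUE)` in the cells of this notebook.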

## Run dx create_cohort with dataset record as input

### 1. Create a new CohortBrowser record in the current working directory by passing cohort ID filters as a string
Note: `--verbose` is passed to display additional details of the output record.

In [None]:
cmd <- paste("dx create_cohort --from", dataset, "--cohort-ids 'sample_100_0,sample_100_1' --verbose", sep = " ")
system(cmd, intern = TRUE)

### 2. Create a new CohortBrowser record with user defined filename and location

In [None]:
path <- "/create_cohort_demo_folder/new_cohort_record_1"
cmd <- paste(
    "dx create_cohort",
    path,
    "--from", dataset,
    "--cohort-ids 'sample_100_0,sample_100_1' --verbose",
    sep = " "
)
system(cmd, intern = TRUE)
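
The two commands above differ only in whether an output path is given. As an illustration, the sketch below assembles the command string from an R vector of IDs instead of a hand-typed, comma-separated string. `build_create_cohort_cmd` is a hypothetical helper, the `--from` value in the example is a placeholder, and only flags already shown in this notebook are used.

```r
# Hypothetical helper: assemble a dx create_cohort command string.
# Only flags that appear in this notebook are used.
build_create_cohort_cmd <- function(from, ids, path = NULL, flags = "--verbose") {
    # Join the IDs with commas and single-quote the result for the shell
    id_arg <- shQuote(paste(ids, collapse = ","), type = "sh")
    # A NULL path is dropped by c(), so the record is created in the current folder
    parts <- c("dx create_cohort", path, "--from", from, "--cohort-ids", id_arg, flags)
    paste(parts, collapse = " ")
}

build_create_cohort_cmd(
    from = "project-xxxx:record-yyyy",  # placeholder, not a real ID
    ids = c("sample_100_0", "sample_100_1"),
    path = "/create_cohort_demo_folder/new_cohort_record_1"
)
```

The returned string can then be passed to `system(cmd, intern = TRUE)` as in the cells above.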

### 3. Create a new CohortBrowser record by passing cohort IDs in a file with one ID per line and no header

#### Create a file with cohort IDs

In [None]:
samples <- c("sample_100_0", "sample_100_1")
# writeLines() overwrites the file, so re-running this cell does not duplicate IDs
writeLines(samples, "cohort_ids.txt")
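
Because the file must contain one ID per line and no header, it can be worth reading it back before passing it to `dx create_cohort`. This is a minimal sketch that uses a temporary file so it does not touch `cohort_ids.txt`; the variable names are illustrative only.

```r
# Write IDs to a temporary file, overwriting any previous content,
# then read them back and check the layout.
ids <- c("sample_100_0", "sample_100_1")
id_file <- tempfile(fileext = ".txt")
writeLines(ids, id_file)

read_back <- readLines(id_file)
stopifnot(
    identical(read_back, ids),    # one ID per line, no header
    !any(duplicated(read_back)),  # no repeated IDs
    all(nzchar(read_back))        # no blank lines
)
```

Note that writing with `append = TRUE` adds the same IDs again each time the cell is re-run, which is what the duplicate check would catch.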

#### Run `dx create_cohort`
Note: `--brief` is passed to display only the record ID of the output.

In [None]:
path <- "/create_cohort_demo_folder/new_cohort_record_2"
cmd <- paste(
    "dx create_cohort",
    path,
    "--from", dataset,
    "--cohort-ids-file cohort_ids.txt --brief",
    sep = " "
)
system(cmd, intern = TRUE)

## Run dx create_cohort with CohortBrowser record as input

### 1. Create a new CohortBrowser record with another CohortBrowser record as input.
The filters added through the CLI are combined with the existing filters in the input record.

In [None]:
path <- "/create_cohort_demo_folder/new_cohort_record_3"
cmd <- paste(
    "dx create_cohort",
    path,
    "--from", cohort,
    "--cohort-ids 'sample_100_1,sample_100_10,sample_100_100' --brief",
    sep = " "
)
rid <- system(cmd, intern = TRUE)
print(rid)

### 2. Validate results with `dx extract_dataset`

#### `dx extract_dataset` on the parent dataset record shows 50,000 sample (ID) rows

In [None]:
cmd <- paste(
    "dx extract_dataset",
    dataset,
    "--fields 'phenotype.sample_id'  -o - | wc -l",
    sep = " "
)
system(cmd, intern = TRUE)

#### `dx extract_dataset` on the input CohortBrowser record shows 22,683 sample (ID) rows

In [None]:
cmd <- paste(
    "dx extract_dataset",
    cohort,
    "--fields 'phenotype.sample_id'  -o - | wc -l",
    sep = " "
)
system(cmd, intern = TRUE)
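
The `wc -l` pipelines above return the line count as text, possibly padded with whitespace. The sketch below converts that text to a number of data rows. It assumes the extracted CSV begins with exactly one header line, which is an assumption about the output format; pass `header_lines = 0` if that does not hold. `count_data_rows` is a hypothetical helper name.

```r
# Hypothetical helper: convert `wc -l` output to a count of data rows.
# Assumes the extracted CSV starts with `header_lines` header line(s).
count_data_rows <- function(wc_output, header_lines = 1L) {
    total <- suppressWarnings(as.integer(trimws(wc_output)))
    if (length(total) != 1 || is.na(total)) {
        stop("Unexpected wc output")
    }
    total - header_lines
}

# Example with a literal string standing in for system(cmd, intern = TRUE)
count_data_rows("   4")
```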

#### `dx extract_dataset` on the output CohortBrowser record shows only 3 sample (ID) rows

In [None]:
cmd <- paste(
    "dx extract_dataset",
    paste(pid, rid, sep = ":"),
    "--fields 'phenotype.sample_id'  -o -",
    sep = " "
)

system(cmd, intern = TRUE)
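
To go one step beyond counting rows, the extracted lines can be parsed and compared with the IDs that were requested. The sketch below uses mocked output in place of the real `system()` result, since the actual values depend on the dataset. It assumes the output is CSV with a header line followed by one ID per row.

```r
# Mocked output standing in for system(cmd, intern = TRUE); real output
# depends on the dataset. Assumes a header line followed by one ID per line.
mock_output <- c("phenotype.sample_id", "sample_100_1", "sample_100_10", "sample_100_100")

# Parse the captured lines as CSV text
extracted <- read.csv(text = paste(mock_output, collapse = "\n"), stringsAsFactors = FALSE)
requested <- c("sample_100_1", "sample_100_10", "sample_100_100")

# Every extracted ID should be one of the requested IDs
unexpected <- setdiff(extracted[[1]], requested)
stopifnot(length(unexpected) == 0)
```

When the input is a CohortBrowser record with its own filters, some requested IDs may be absent from the result, so the check runs in one direction only.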