# “dx create_cohort” in Python
<hr/>
***As-Is Software Disclaimer***

The content in this repository is delivered “As-Is”. Notwithstanding anything to the contrary, DNAnexus will have no warranty, support, liability or other obligations with respect to Materials provided hereunder.

<hr/>

This notebook demonstrates how to use the `dx create_cohort` command to:
* Create a record of type CohortBrowser on the platform from the command line
* Add filters to the existing filters of an input CohortBrowser record

<a href="https://github.com/dnanexus/OpenBio/blob/master/LICENSE.md">MIT License</a> applies to this notebook.

## Preparing your environment
### Launch spec:

* App name: JupyterLab with Python, R, Stata, ML
* Kernel: Python
* Instance type: mem1_ssd1_v2_x2
* Cost: < $0.1
* Runtime: < 2 min
* Data description: The input for this notebook is a v3.0 Dataset or Cohort object ID.
  * Limitations in the current version of `dx create_cohort`:
    * A CohortBrowser record with "OR" logic on "global_primary_key" is not accepted as input
    * A CohortBrowser record with an integer-type "global_primary_key" is not accepted as input

### Install the DNAnexus-supported package, dxpy

In [None]:
# dx create_cohort requires dxpy v0.359.0 or later.
# A recent enough version of dxpy may already be installed,
# in which case the pip install below is unnecessary.
!pip3 install -U dxpy==0.363.0
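To confirm that the installed dxpy meets the v0.359.0 minimum without reinstalling, a minimal sketch using the standard library's `importlib.metadata`. The helper names are hypothetical, and the version parser assumes a plain numeric `X.Y.Z` string (pre-release tags are not handled).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum dxpy release stated in this notebook for `dx create_cohort`
MIN_DXPY = "0.359.0"


def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints (numeric parts only)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=MIN_DXPY):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_dxpy_version():
    """Return the installed dxpy version string, or None if dxpy is not installed."""
    try:
        return version("dxpy")
    except PackageNotFoundError:
        return None


current = installed_dxpy_version()
if current is None:
    print("dxpy is not installed")
else:
    print(current, "meets the minimum" if meets_minimum(current) else "is too old")
```

Tuples compare element by element, so `(0, 363, 0) >= (0, 359, 0)` holds while `(0, 358, 9)` does not.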

In [None]:
import subprocess
import dxpy
import json

### Print help message

In [None]:
cmd = ["dx", "create_cohort", "--help"]
help_text = subprocess.check_output(cmd)
print(help_text.decode(), end="\n")
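Every cell in this notebook calls `subprocess.check_output` and then decodes the returned bytes. A minimal sketch of an equivalent helper built on the standard library's `subprocess.run`; the name `run_cmd` is hypothetical and is not part of dxpy. With `check=True`, a non-zero exit raises `subprocess.CalledProcessError`, whose `stderr` attribute holds the tool's error message.

```python
import subprocess
import sys


def run_cmd(cmd):
    """Run a command and return its standard output as text with surrounding whitespace removed.

    capture_output=True collects stdout and stderr, text=True decodes them,
    and check=True raises CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Harmless example; in this notebook you would pass e.g. ["dx", "create_cohort", "--help"]
print(run_cmd([sys.executable, "-c", "print('hello from a subprocess')"]))
```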

### Assign input variables

In [None]:
# The referenced dataset is not public; it is listed here only as an example input.
# Replace it with a valid project ID and record ID that you have permission to access.

# Assign a project-qualified dataset record: project-id:record-id
dataset = "project-G9j1pX00vGPzF2XQ7843k2Jq:record-GYK2zyQ0g1bx86fBp2X8KpjY"

# Assign a project-qualified CohortBrowser record: project-id:record-id
cohort = "project-G9j1pX00vGPzF2XQ7843k2Jq:record-GYXBF3j0vGPv9kxZGBBKVQFq"
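Both inputs must be project-qualified. A minimal sketch that checks the format before any `dx` call, assuming (from the example values above) that each part is a `project-` or `record-` prefix followed by 24 alphanumeric characters. The exact character set DNAnexus uses is not checked, and `split_qualified_id` is a hypothetical helper.

```python
import re

# Assumed shape: "project-<24 letters/digits>:record-<24 letters/digits>"
QUALIFIED_ID = re.compile(r"^(project-[0-9A-Za-z]{24}):(record-[0-9A-Za-z]{24})$")


def split_qualified_id(qualified):
    """Split 'project-...:record-...' into (project_id, record_id), or raise ValueError."""
    match = QUALIFIED_ID.match(qualified)
    if match is None:
        raise ValueError(f"not a project-qualified record ID: {qualified!r}")
    return match.group(1), match.group(2)


example_project, example_record = split_qualified_id(
    "project-G9j1pX00vGPzF2XQ7843k2Jq:record-GYK2zyQ0g1bx86fBp2X8KpjY"
)
print(example_project)
print(example_record)
```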

#### Create a working directory for outputs
Skip this step if you want to create the output in the current working directory or have a target location for your output.

In [None]:
working_directory = "create_cohort_demo_folder"
subprocess.check_output(['dx', 'mkdir', working_directory])
subprocess.check_output(['dx', 'cd', working_directory])
subprocess.check_output(['dx', 'pwd']).decode()

## Run dx create_cohort with dataset record as input

### 1. Create a new CohortBrowser record in the current working directory by passing cohort ID filters as a string
Note: `--verbose` is passed to display additional details about the output record.

In [None]:
cmd = ["dx", "create_cohort", "--from", dataset, "--cohort-ids", "patient_1, patient_2", "--verbose"]
new_record_details = subprocess.check_output(cmd)
print(new_record_details.decode(), end="\n")
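The `--cohort-ids` value in these cells is a single hand-written string. A minimal sketch that assembles the same argument lists from a Python list of IDs. `build_create_cohort_cmd` is a hypothetical helper; it uses only the options that appear in this notebook and does not validate them against the CLI.

```python
def build_create_cohort_cmd(source, cohort_ids, output_path=None, extra_flags=()):
    """Assemble an argument list for `dx create_cohort` (hypothetical helper).

    Uses only options shown in this notebook: an optional positional output
    path, --from, --cohort-ids, and extra flags such as --verbose or --brief.
    IDs are joined with ", " to reproduce the string format used in these cells.
    """
    cmd = ["dx", "create_cohort"]
    if output_path is not None:
        cmd.append(output_path)
    cmd += ["--from", source, "--cohort-ids", ", ".join(cohort_ids)]
    cmd += list(extra_flags)
    return cmd


example_dataset = "project-G9j1pX00vGPzF2XQ7843k2Jq:record-GYK2zyQ0g1bx86fBp2X8KpjY"
print(build_create_cohort_cmd(example_dataset, ["patient_1", "patient_2"], extra_flags=["--verbose"]))
```

The returned list can be passed directly to `subprocess.check_output`; supplying `output_path` reproduces the argument order of the user-defined-location example.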

### 2. Create a new CohortBrowser record with a user-defined name and location

In [None]:
path = "/create_cohort_demo_folder/new_cohort_record_1"
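The path above repeats the folder name created earlier. A sketch that derives it from `working_directory` instead, assuming (as the hard-coded value implies) that the folder sits at the project root. `posixpath` is used because platform folder paths use forward slashes regardless of the local operating system; the variable is redefined here only to keep the snippet self-contained.

```python
import posixpath

# Same value as the working-directory cell earlier in this notebook
working_directory = "create_cohort_demo_folder"

# Join from the project root so the path stays in sync with the folder name
path = posixpath.join("/", working_directory, "new_cohort_record_1")
print(path)  # /create_cohort_demo_folder/new_cohort_record_1
```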

In [None]:
cmd = ["dx", "create_cohort", path, "--from", dataset, "--cohort-ids", "patient_1, patient_2"]
new_record_details = subprocess.check_output(cmd)
print(new_record_details.decode(), end="\n")

### 3. Create a new CohortBrowser record by passing cohort IDs in a file with one ID per line and no header

#### Create a file with cohort IDs

In [None]:
cohort_ids = ['patient_1', 'patient_2']
# Write one ID per line with no header; the with-block closes the file automatically
with open('cohort_ids.txt', 'w') as file:
    for item in cohort_ids:
        file.write(item + "\n")
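A sketch of the same file step with `pathlib`, plus a read-back check before the file is handed to `--cohort-ids-file`. The helper names are hypothetical, and skipping blank lines and rejecting duplicate IDs are choices made by this sketch, not documented requirements of the CLI.

```python
from pathlib import Path


def write_cohort_ids(ids, filename):
    """Write one ID per line with no header."""
    Path(filename).write_text("".join(f"{item}\n" for item in ids))


def read_cohort_ids(filename):
    """Read IDs back, skipping blank lines, and raise ValueError on duplicates."""
    lines = Path(filename).read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate cohort IDs in file")
    return ids


write_cohort_ids(["patient_1", "patient_2"], "cohort_ids.txt")
print(read_cohort_ids("cohort_ids.txt"))  # ['patient_1', 'patient_2']
```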

#### Run `dx create_cohort`
Note: `--brief` is passed so that only the record ID of the output is displayed.

In [None]:
cmd = ["dx", "create_cohort", "new_cohort_record_2", "--from", dataset, "--cohort-ids-file", "cohort_ids.txt", "--brief"]
new_record_details = subprocess.check_output(cmd)
print(new_record_details.decode(), end="\n")

## Run dx create_cohort with CohortBrowser record as input

### 1. Create a new CohortBrowser record with another CohortBrowser record as input
Filters added through the CLI are combined with the existing filters of the input record.
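The validation step reports 5 rows for the input cohort, 3 IDs are requested, and 2 rows come out, which is consistent with the new ID filter being combined with the existing filters using AND logic. Under that assumption, the result is the intersection of the two ID sets. The patient IDs below are hypothetical; the real ones depend on your dataset and cohort.

```python
# Hypothetical IDs for illustration only
parent_ids = {f"patient_{n}" for n in range(1, 11)}  # 10 patients in the dataset
input_cohort_ids = {"patient_1", "patient_3", "patient_4", "patient_6", "patient_8"}  # 5 in the input cohort
cli_ids = {"patient_1", "patient_2", "patient_4"}  # IDs passed with --cohort-ids

# AND-combination (assumed): a patient must pass the cohort's filters and the new ID filter
new_cohort_ids = input_cohort_ids & cli_ids
print(sorted(new_cohort_ids))  # ['patient_1', 'patient_4']
```

Note that `patient_2` is requested but dropped, because it does not pass the input cohort's existing filters.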

In [None]:
cmd = ["dx", "create_cohort", "new_cohort_record_3", "--from", cohort, "--cohort-ids", "patient_1, patient_2, patient_4", "--brief"]
new_record_id = subprocess.check_output(cmd)
print(new_record_id.decode(), end="\n")

### 2. Validate results with `dx extract_dataset`

#### `dx extract_dataset` on the parent dataset record shows 10 rows

In [None]:
cmd = ["dx", "extract_dataset", dataset, "--fields", "patient.patient_id", "-o", "-"]
extract_dataset_1 = subprocess.check_output(cmd)
print(extract_dataset_1.decode(), end="\n")
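To check the row count programmatically rather than by eye, a minimal sketch that parses the `-o -` output with the standard `csv` module. It assumes comma-delimited output with one header row naming the field (`patient.patient_id` here); check the header in your own output first. The helper name and sample data are hypothetical.

```python
import csv
import io


def parse_extract_output(raw, field="patient.patient_id"):
    """Return the values of `field` from CSV output (bytes or str) with a header row."""
    text = raw.decode() if isinstance(raw, bytes) else raw
    reader = csv.DictReader(io.StringIO(text))
    return [row[field] for row in reader]


# Hypothetical output for illustration only
sample = b"patient.patient_id\npatient_1\npatient_2\npatient_3\n"
ids = parse_extract_output(sample)
print(len(ids), ids)  # 3 ['patient_1', 'patient_2', 'patient_3']
```

In this notebook, `len(parse_extract_output(extract_dataset_1))` would then give the row count of the parent dataset.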

#### `dx extract_dataset` on the input CohortBrowser record shows 5 rows

In [None]:
cmd = ["dx", "extract_dataset", cohort, "--fields", "patient.patient_id", "-o", "-"]
extract_dataset_2 = subprocess.check_output(cmd)
print(extract_dataset_2.decode(), end="\n")

#### `dx extract_dataset` on the output CohortBrowser record shows 2 rows

In [None]:
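# --brief printed only the bare record ID (as bytes ending in a newline);
# prefix it with its project ID so extract_dataset receives a project-qualified ID
record_id = new_record_id.decode().strip()
cmd = ["dx", "describe", record_id, "--json"]
describe_output = subprocess.check_output(cmd)
project_id = json.loads(describe_output.decode())["project"]
new_record = project_id + ":" + record_id
print(new_record)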
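The notebook imports `dxpy` but only calls the CLI. A minimal sketch of the same qualification step through `dxpy.describe`, which returns the object's description as a dict that includes a `"project"` key. It assumes an authenticated session in which the bare record ID resolves as it does for `dx describe`. The `describe` parameter is a hypothetical hook added so the function can be exercised offline with a stand-in.

```python
def qualify_record_id(record_id, describe=None):
    """Return 'project-...:record-...' for a bare record ID.

    By default this calls dxpy.describe, which needs an authenticated
    DNAnexus session; pass another callable to run it offline.
    """
    if describe is None:
        import dxpy  # imported lazily so offline use does not require dxpy
        describe = dxpy.describe
    project_id = describe(record_id)["project"]
    return f"{project_id}:{record_id}"


def fake_describe(record_id):
    """Stand-in for dxpy.describe that returns hypothetical data."""
    return {"id": record_id, "project": "project-G9j1pX00vGPzF2XQ7843k2Jq"}


print(qualify_record_id("record-GYXBF3j0vGPv9kxZGBBKVQFq", describe=fake_describe))
```

In the notebook, `qualify_record_id(new_record_id.decode().strip())` would use the real `dxpy.describe`.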

In [None]:
cmd = ["dx", "extract_dataset", new_record, "--fields", "patient.patient_id", "-o", "-"]
extract_dataset_3 = subprocess.check_output(cmd)
print(extract_dataset_3.decode(), end="\n")
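The three extractions can also be checked against each other: every ID in the output cohort should appear in the input cohort, and every ID in the input cohort should appear in the parent dataset. A minimal sketch under the same CSV assumption as before (comma-delimited, one header row naming the field); the helper names and sample strings are hypothetical.

```python
import csv
import io


def ids_from_extract(raw, field="patient.patient_id"):
    """Collect `field` values from comma-delimited extract output with a header row."""
    text = raw.decode() if isinstance(raw, bytes) else raw
    return {row[field] for row in csv.DictReader(io.StringIO(text))}


def cohorts_are_nested(parent_raw, input_raw, output_raw):
    """True if output IDs are a subset of the input cohort, which is a subset of the dataset."""
    parent = ids_from_extract(parent_raw)
    cohort = ids_from_extract(input_raw)
    output = ids_from_extract(output_raw)
    return output <= cohort <= parent


# Hypothetical outputs for illustration only
parent_csv = "patient.patient_id\n" + "".join(f"patient_{n}\n" for n in range(1, 11))
input_csv = "patient.patient_id\npatient_1\npatient_3\npatient_4\npatient_6\npatient_8\n"
output_csv = "patient.patient_id\npatient_1\npatient_4\n"
print(cohorts_are_nested(parent_csv, input_csv, output_csv))  # True
```

With real data, the call would be `cohorts_are_nested(extract_dataset_1, extract_dataset_2, extract_dataset_3)`.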

## Delete the working directory
This step is included for demonstration purposes. Skip it to preserve the output records, or if you did not create the working directory earlier.

In [None]:
subprocess.check_call(["dx", "cd", ".."])
subprocess.check_call(["dx", "rm", "-rf", working_directory])