# Merging Assay Datasets
***
***As-Is Software Disclaimer***

*The content in this repository is delivered “As-Is”. Notwithstanding anything to the contrary, DNAnexus will have no warranty, support, liability or other obligations with respect to Materials provided hereunder.*

*[MIT License](https://github.com/dnanexus/OpenBio/blob/master/LICENSE.md) applies to this notebook.*
***

**Launch spec:**
- App name: JupyterLab with Python, R, Stata, ML, Image Processing
- Kernel: Python 3
- Instance type: mem1_ssd1_v2_x4
- Spark cluster configuration: single node
- Runtime: ~5 min

**Package dependencies:**
- pprint [License](https://docs.python.org/3/license.html?#psf-license)

**Data description:** This notebook takes two inputs: the record ID of a Dataset that has an instance of a GeneticVariantAssay, and the record ID of a Dataset that has an instance of a MolecularExpressionAssay. All data in this notebook is synthetic.


**This notebook shows how to:** Merge Assays from multiple Datasets.
***

## Load packages using `import`

In [1]:
import dxdata
import pprint

## View properties of Datasets

Use `dxdata` to load each Dataset and print its properties. `genopheno_ds` is a Dataset that has a GeneticVariantAssay, and `molexp_ds` is a Dataset that has a MolecularExpressionAssay.

In [None]:
genopheno_ds = dxdata.load_dataset(id="record-G3814k006Fjgk5J74Kx3X1QG")
pprint.pprint(genopheno_ds.__dict__)

In [None]:
molexp_ds = dxdata.load_dataset(id="record-G7Z4Zf80GjvF7QP7Fk6Bz3Fp")
pprint.pprint(molexp_ds.__dict__)
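The raw `__dict__` dumps above can be long. As a minimal sketch, the hypothetical helper below (`summarize_attributes` is not part of dxdata) condenses any object's attributes into one short line each, which makes the two Datasets easier to compare side by side. It uses only the Python standard library and assumes nothing about which attributes a dxdata Dataset exposes.

```python
def summarize_attributes(obj, max_len=60):
    """Map each attribute name of obj to '<type>: <truncated repr>'.

    Hypothetical convenience helper; not a dxdata function.
    """
    summary = {}
    for name, value in sorted(vars(obj).items()):
        text = repr(value)
        if len(text) > max_len:
            # Keep the summary readable by truncating long values.
            text = text[: max_len - 3] + "..."
        summary[name] = f"{type(value).__name__}: {text}"
    return summary


# Possible usage in the notebook:
#   pprint.pprint(summarize_attributes(genopheno_ds))
#   pprint.pprint(summarize_attributes(molexp_ds))
```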

## Create a linking file

If a linking table does not already exist, the Assay Dataset Merger app can create one in a database from a linking file. The cell below prints an example linking file:

In [None]:
%%bash
dx cat file-G7gXx700Gjv1J3kx15P8Q47Q
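If you need to build a linking file yourself, the sketch below shows one way to write a two-column mapping with the standard `csv` module. The layout is an assumption: the column names `sample_id` and `patient_id` are borrowed from the field names passed to the app later in this notebook, and the CSV format, the helper name `write_linking_csv`, and the example IDs are all hypothetical. Check the output of the `dx cat` cell above and the app documentation for the format actually required.

```python
import csv
import io


def write_linking_csv(pairs, handle, sample_col="sample_id", pheno_col="patient_id"):
    """Write (assay sample ID, phenotype ID) pairs as a two-column CSV.

    Assumed layout, for illustration only. Raises ValueError if an assay
    sample ID appears more than once.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([sample_col, pheno_col])
    seen = set()
    for sample_id, pheno_id in pairs:
        if sample_id in seen:
            raise ValueError(f"duplicate assay sample ID: {sample_id}")
        seen.add(sample_id)
        writer.writerow([sample_id, pheno_id])


# Made-up IDs, written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_linking_csv([("sample_1", "patient_1"), ("sample_2", "patient_2")], buffer)
print(buffer.getvalue())
```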

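## Run Assay Dataset Merger

The command below passes the MolecularExpressionAssay Dataset as `source_dataset` and the GeneticVariantAssay Dataset as `target_dataset`, together with the linking file. `-y` skips the confirmation prompt, and `--detach` launches the job as its own root execution rather than as a subjob of the JupyterLab session.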

In [None]:
%%bash
dx run app-assay_dataset_merger \
-isource_dataset=record-G7Z4Zf80GjvF7QP7Fk6Bz3Fp \
-itarget_dataset=record-G3814k006Fjgk5J74Kx3X1QG \
-ilinking_file=file-G7gXx700Gjv1J3kx15P8Q47Q \
-ioutput_dataset_name="merged.dataset" \
-iassay_sample_id_field_name="sample_id" \
-iassay_sample_id_entity_name="expression" \
-ipheno_entity_name="patients" \
-ipheno_sample_id_field_name="patient_id" \
-iassay_entries_per_pheno="one" \
-ilinking_database_name="linking_table" \
-idashboard_template="Global Defaults" \
--detach \
-y

## View properties of the newly merged Dataset

The properties of `merged_ds` show that the merged Dataset has both a GeneticVariantAssay and a MolecularExpressionAssay.

In [None]:
%%bash
# Look up the record ID of the new dataset
dx describe job-GJQ6PV00Gjv02fq0FxGzKG6f --json | jq .output
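`dx run` returns as soon as the job is submitted, so the job's output is empty until it finishes. As a rough Python alternative to the `jq` lookup above, the sketch below parses the JSON from `dx describe --json` and collects any record IDs found in DNAnexus links in the job output. The helper names are hypothetical, and because the app's output field names are not shown in this notebook, the code walks the whole output rather than assuming a particular key.

```python
import json
import subprocess


def describe_job(job_id):
    """Return the parsed JSON from `dx describe <job_id> --json`."""
    result = subprocess.run(
        ["dx", "describe", job_id, "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def collect_record_ids(value):
    """Recursively collect record IDs from DNAnexus links in a job output."""
    found = []
    if isinstance(value, dict):
        link = value.get("$dnanexus_link")
        # A link is either a bare ID string or a dict with an "id" key.
        if isinstance(link, dict):
            link = link.get("id")
        if isinstance(link, str) and link.startswith("record-"):
            found.append(link)
        else:
            for child in value.values():
                found.extend(collect_record_ids(child))
    elif isinstance(value, list):
        for child in value:
            found.extend(collect_record_ids(child))
    return found


# Possible usage once the job has finished:
#   desc = describe_job("job-GJQ6PV00Gjv02fq0FxGzKG6f")
#   if desc.get("state") == "done":
#       print(collect_record_ids(desc.get("output")))
```

An unfinished job has no output yet, in which case `collect_record_ids` returns an empty list.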

In [None]:
merged_ds = dxdata.load_dataset(id="record-GJQ6X2Q0Y96k865PFv00kJZ0")
pprint.pprint(merged_ds.__dict__)
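Rather than scanning the printed properties by eye, you can search them for the terms you expect. The sketch below uses a hypothetical helper, `find_terms`, that reports whether each term occurs anywhere in the `repr` of an object's attributes. Whether the literal strings `GeneticVariantAssay` and `MolecularExpressionAssay` appear in the Dataset's properties is an assumption, so adjust the search terms to match what the cell above actually prints.

```python
def find_terms(obj, terms):
    """Map each term to True if it occurs (case-insensitively) in obj's attributes.

    Hypothetical helper; a plain text search, not a dxdata API.
    """
    text = repr(vars(obj)).lower()
    return {term: term.lower() in text for term in terms}


# Possible usage in the notebook (search terms are assumptions):
#   find_terms(merged_ds, ["GeneticVariantAssay", "MolecularExpressionAssay"])
```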