# **BEDHOST Demo**

This demo shows how to process BED files, generate statistics and plots for them with the R package GenomicDistributions, and explore the results through the REST API for the bedstat and bedbuncher pipelines.

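The general workflow has three parts: inserting BED file statistics into Elasticsearch with bedstat, creating bedsets with bedbuncher, and running a local instance of bedhost to explore the results.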


## First part of the tutorial (insert BED file statistics into Elasticsearch)


### 1) Create a PEP describing the BED files to process

To get started, we'll need a PEP ([Portable Encapsulated Project](https://pepkit.github.io/)). A PEP consists of 1) a sample annotation sheet (`.csv`) that contains information about the samples in a project and 2) a project configuration file (`config.yaml`) that points to the sample annotation sheet. The config file also has other components, such as derived attributes, which in this case point to the BED files to be processed. The following example config file uses the derived attributes `output_file_path` and `yaml_file` to point to the `.bed.gz` files and their respective metadata.

In [15]:
cat demo_config.yaml

metadata:
  sample_table: demo_annotation_sheet.csv
  output_dir: $HOME/Desktop/bedstat/bedhost_demo_files_justBED/bedstat_pipeline_results 
  pipeline_interfaces: ../../pipeline_interface.yaml

constant_attributes: 
  output_file_path: "source"
  yaml_file: "source2"
  protocol: "bedstat"

derived_attributes: [output_file_path, yaml_file]
data_sources:
  source: "../{file_name}" 
  source2: "$HOME/Desktop/bedstat/bedhost_demo_files_justBED/bedstat_pipeline_results/submission/{sample_name}.yaml"
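
To make the role of `derived_attributes` concrete, here is a minimal Python sketch of how the templates in `data_sources` could be expanded for one sample. It illustrates the idea only and is not the PEP implementation. The sample row (`bed1`, `bed1.bed.gz`) is hypothetical, since the annotation sheet is not shown in this demo, and `resolve_derived` is an illustrative helper name.

```python
# Conceptual sketch of derived-attribute expansion (not the actual PEP code).

# Templates copied from the data_sources section of demo_config.yaml.
DATA_SOURCES = {
    "source": "../{file_name}",
    "source2": (
        "$HOME/Desktop/bedstat/bedhost_demo_files_justBED/"
        "bedstat_pipeline_results/submission/{sample_name}.yaml"
    ),
}
DERIVED_ATTRIBUTES = ["output_file_path", "yaml_file"]


def resolve_derived(sample, derived_attributes, data_sources):
    """Return a copy of `sample` with each derived attribute expanded."""
    resolved = dict(sample)
    for attr in derived_attributes:
        # The attribute initially holds a key into data_sources ...
        template = data_sources[sample[attr]]
        # ... and is replaced by the template filled with the sample's values.
        resolved[attr] = template.format(**sample)
    return resolved


# Hypothetical sample row; the last two values come from constant_attributes.
sample = {
    "sample_name": "bed1",
    "file_name": "bed1.bed.gz",
    "output_file_path": "source",
    "yaml_file": "source2",
}

resolved = resolve_derived(sample, DERIVED_ATTRIBUTES, DATA_SOURCES)
print(resolved["output_file_path"])
print(resolved["yaml_file"])
```

In this sketch, environment variables such as `$HOME` are left unexpanded; only the `{...}` placeholders are filled in.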

### 2) Download the BEDBASE configuration manager (bbconf)

[bbconf](https://github.com/databio/bbconf) implements convenience methods for interacting with the database backend, which in this case is a local Elasticsearch cluster. For this demo, we'll use the dev version of `bbconf`, which can be downloaded as follows:

In [None]:
git clone -b dev git@github.com:databio/bbconf

To use bbconf, we'll need to create a minimal configuration file in YAML format. The path to this configuration file can be stored in the environment variable `$BEDBASE`:

In [29]:
cat $BEDBASE

path:
  pipelines_output: $LABROOT/resources/regions/bedstat_output

database:
  host: localhost
  bed_index: bed_index
  bedset_index: bedset_index

server:
  host: 0.0.0.0
  port: 8000


### 3) Run the bedstat pipeline on the demo PEP

[bedstat](https://github.com/databio/bedstat) is a pypiper pipeline that generates statistics and plots of BED files. For more detailed information about the pipeline and how to set up a local Elasticsearch cluster to insert and query files, see the [bedstat README](https://github.com/databio/bedstat/blob/master/README.md).

To run [bedstat](https://github.com/databio/bedstat) and the other pipelines required in this demo, we will rely on the pipeline submission engine [looper](http://looper.databio.org/en/latest/). For detailed instructions on how to link a project to a pipeline, see the [looper documentation](http://looper.databio.org/en/latest/linking-a-pipeline/). If the pipeline is run in an HPC environment where Docker is not available, we recommend using the `--no-db-commit` flag, which calculates statistics and generates plots but does not insert this information into the local Elasticsearch cluster.

In [27]:
cd ~/Desktop/bedstat
looper run bedhost_demo_files_justBED/bedhost_demo_refPEP/demo_config.yaml --no-db-commit --compute local 

Once we have generated plots and statistics, we can insert them into our local Elasticsearch cluster by running the bedstat pipeline with the `--just-db-commit` flag:

In [26]:
cd ~/Desktop/bedstat
looper run bedhost_demo_files_justBED/bedhost_demo_refPEP/demo_config.yaml --just-db-commit --compute local 

After the previous steps have been executed, our BED files should be available for query on our local Elasticsearch cluster. Files can be queried using the `bedbuncher` pipeline described in the next section.
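
Before moving on, it can help to confirm that documents actually reached the index. The Python sketch below uses Elasticsearch's `_count` endpoint. It assumes the cluster listens on the default HTTP port 9200 on `localhost` and that the index is named `bed_index`, as in the bedbase configuration shown earlier. The helper names `count_url` and `count_documents` are illustrative and not part of bbconf or bedstat.

```python
import json
from urllib.request import urlopen


def count_url(index, host="localhost", port=9200):
    """Build the URL of the Elasticsearch _count endpoint for an index."""
    # ASSUMPTION: 9200 is the Elasticsearch default port; adjust if needed.
    return f"http://{host}:{port}/{index}/_count"


def count_documents(index, host="localhost", port=9200, timeout=5):
    """Return the number of documents in `index` (needs a running cluster)."""
    with urlopen(count_url(index, host, port), timeout=timeout) as response:
        # The _count response is a JSON object with a "count" field.
        return json.load(response)["count"]
```

With the cluster running, calling `count_documents("bed_index")` returns the number of documents in the index; a non-zero value suggests that the `--just-db-commit` run inserted records.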


## Second part of the tutorial (use bedbuncher to create bedsets)

### 1) Create a new PEP describing the bedset name and specific JSON query

[bedbuncher](https://github.com/databio/bedbuncher) is a pipeline designed to create bedsets (sets of BED files retrieved from bedbase). To create a bedset, we need an additional PEP that describes the query, along with attributes such as the name assigned to the newly created bedset. The configuration file should specify the path to the JSON query file. The annotation sheet and configuration file should have the following structure:

In [7]:
cd ~/Desktop/bedbuncher/project
cat bedset_query.csv

sample_name,bedset_name,JSONquery_name,bbconfig_name,JSONquery_path,output_folder_path
bedset1,test_bedset_igd,test_query,bbconfig,source1,source2


In [10]:
cd ~/Desktop/bedbuncher/project
cat cfg.yaml

metadata:
  sample_table: bedset_query.csv
  output_dir: . 
  pipeline_interfaces: ../pipeline_interface.yaml 

derived_attributes: [JSONquery_path]
data_sources:
  source1: ~/Desktop/bedbuncher/tests/{JSONquery_name}.json
 
constant_attributes:
  protocol: "bedbuncher"
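
The contents of the JSON query file are not shown in this demo. As a rough illustration, the Python sketch below writes a file containing an Elasticsearch `match_all` query, which matches every document in an index. Whether bedbuncher expects exactly this shape (for example, with or without a top-level `"query"` key) is an assumption to check against the bedbuncher documentation. The file is written to a temporary directory rather than to `tests/`, so nothing in the project is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Elasticsearch query DSL: match every document in the index.
# ASSUMPTION: the exact shape bedbuncher expects is not confirmed here.
query = {"match_all": {}}

# Write to a temporary directory so no real project file is overwritten.
query_path = Path(tempfile.mkdtemp()) / "test_query.json"
query_path.write_text(json.dumps(query, indent=2))

# Read the file back to confirm that it holds valid JSON.
loaded = json.loads(query_path.read_text())
print(query_path.name, loaded)
```

The file name matches the `JSONquery_name` value (`test_query`) in the annotation sheet, which the `source1` template turns into `{JSONquery_name}.json`.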

### 2) Run the bedbuncher pipeline with looper

To create a bedset, we simply create a PEP as shown above and run the bedbuncher pipeline using looper:

In [None]:
cd ~/Desktop/bedbuncher
looper run project/cfg.yaml --compute local

## Third part of the tutorial (run local instance of bedhost)

The last part of the tutorial consists of running a local instance of [bedhost](https://github.com/databio/bedhost/tree/master), a REST API for the outputs produced by bedstat and bedbuncher, in order to explore and download output files. To access the API, we'll need to download the dev branch of the GitHub repository as follows:

In [None]:
git clone -b dev git@github.com:databio/bedhost

Then we run the following command, making sure to point to the bedbase configuration file described earlier:

In [None]:
bedhost serve -c path/to/config

If we have stored the path to the bedbase config in the environment variable `$BEDBASE` (suggested), it's not necessary to specify the path to start bedhost:

In [None]:
bedhost serve 
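
The two ways of supplying the configuration (an explicit `-c` path or the `$BEDBASE` variable) can be sketched in Python as follows. This illustrates the idea and is not bedhost's actual code: `select_config` is a hypothetical helper, the paths are placeholders, and giving the explicit path precedence over `$BEDBASE` is an assumption.

```python
import os


def select_config(cli_path=None, environ=None):
    """Pick the bedbase config path: explicit argument first, then $BEDBASE."""
    environ = os.environ if environ is None else environ
    # ASSUMPTION: an explicit path takes precedence over the variable.
    path = cli_path or environ.get("BEDBASE")
    if not path:
        raise ValueError("No config given: pass a path or set $BEDBASE")
    # Expand ~ and variables such as $HOME using the process environment.
    return os.path.expanduser(os.path.expandvars(path))


# Example with an explicit environment mapping (paths are placeholders).
env = {"BEDBASE": "/data/bedbase/config.yaml"}
print(select_config(environ=env))
print(select_config("other/config.yaml", env))
```

Passing the environment as an argument keeps the helper easy to test without changing the real `$BEDBASE` value.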