# BEDBASE workflow tutorial

This demo shows how to process BED files, generate statistics and plots for them (computed with the R package GenomicDistributions), and serve the output of the bedstat and bedbuncher pipelines through the `bedhost` REST API.

Notes:

- If you have not already done so, we recommend starting this Jupyter notebook with sudo permissions, since steps such as installing `docker` or running an elasticsearch `docker` container will not execute otherwise. This can be done with `sudo jupyter notebook --allow-root`.

 
 

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Create-a-tutorial-directory-and-download-demo-files" data-toc-modified-id="Create-a-tutorial-directory-and-download-demo-files-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Create a tutorial directory and download demo files</a></span></li><li><span><a href="#Generate-statistics-and-plots-of-BED-files-using-BEDSTAT" data-toc-modified-id="Generate-statistics-and-plots-of-BED-files-using-BEDSTAT-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Generate statistics and plots of BED files using BEDSTAT</a></span><ul class="toc-item"><li><span><a href="#Create-a-PEP-describing-the-BED-files-to-process" data-toc-modified-id="Create-a-PEP-describing-the-BED-files-to-process-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Create a PEP describing the BED files to process</a></span></li><li><span><a href="#Download-bedstat-and-the-Bedbase-configuration-manager-(bbconf)" data-toc-modified-id="Download-bedstat-and-the-Bedbase-configuration-manager-(bbconf)-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Download bedstat and the Bedbase configuration manager (bbconf)</a></span></li><li><span><a href="#Inititiate-a-local-elasticsearch-cluster" data-toc-modified-id="Inititiate-a-local-elasticsearch-cluster-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Inititiate a local elasticsearch cluster</a></span></li><li><span><a href="#Run-bedstat--on-the-demo-PEP" data-toc-modified-id="Run-bedstat--on-the-demo-PEP-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Run bedstat  on the demo PEP</a></span></li></ul></li><li><span><a href="#Create-bedsets-using-BEDBUNCHER" data-toc-modified-id="Create-bedsets-using-BEDBUNCHER-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create bedsets using BEDBUNCHER</a></span><ul class="toc-item"><li><span><a href="#Create-a-new-PEP-describing-the-bedset-name-and-specific-JSON-query" 
data-toc-modified-id="Create-a-new-PEP-describing-the-bedset-name-and-specific-JSON-query-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Create a new PEP describing the bedset name and specific JSON query</a></span></li><li><span><a href="#Download-bedbuncher--and-install-CML-dependencies" data-toc-modified-id="Download-bedbuncher--and-install-CML-dependencies-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Download bedbuncher  and install CML dependencies</a></span></li><li><span><a href="#Run-bedbuncher-using-Looper" data-toc-modified-id="Run-bedbuncher-using-Looper-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Run bedbuncher using Looper</a></span></li></ul></li><li><span><a href="#Run-local-instance-of-the-bedhost-API" data-toc-modified-id="Run-local-instance-of-the-bedhost-API-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Run local instance of the bedhost API</a></span></li></ul></div>

## Create a tutorial directory and download demo files 
We need to create a directory where we'll store the bedbase pipelines and the files to be processed. We'll also create an environment variable that points to the tutorial directory (this variable is needed in section 3 of the tutorial).

In [1]:
cd $HOME

In [2]:
mkdir bedbase_tutorial
cd bedbase_tutorial
export BEDBASEtutorial="$HOME/bedbase_tutorial"
#source ~/.bashrc

The files needed for this tutorial can be downloaded with the following commands:

In [3]:
wget http://big.databio.org/example_data/bedbase_demo/bedbase_demo_files_justBED/bedbase_BEDfiles.tar.gz     
wget http://big.databio.org/example_data/bedbase_demo/bedbase_demo_files_justBED/Configuration_files.tar.gz

--2020-03-30 12:19:08--  http://big.databio.org/example_data/bedbase_demo/bedbase_demo_files_justBED/bedbase_BEDfiles.tar.gz
Resolving big.databio.org (big.databio.org)... 128.143.245.181
Connecting to big.databio.org (big.databio.org)|128.143.245.181|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60245813 (57M) [application/octet-stream]
Saving to: ‘bedbase_BEDfiles.tar.gz’


2020-03-30 12:19:13 (11.6 MB/s) - ‘bedbase_BEDfiles.tar.gz’ saved [60245813/60245813]

--2020-03-30 12:19:13--  http://big.databio.org/example_data/bedbase_demo/bedbase_demo_files_justBED/Configuration_files.tar.gz
Resolving big.databio.org (big.databio.org)... 128.143.245.181
Connecting to big.databio.org (big.databio.org)|128.143.245.181|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1418 (1.4K) [application/octet-stream]
Saving to: ‘Configuration_files.tar.gz’


2020-03-30 12:19:13 (139 MB/s) - ‘Configuration_files.tar.gz’ saved [1418/1418]



The downloaded files are compressed so we'll need to untar them:

In [4]:
tar -zxvf bedbase_BEDfiles.tar.gz
tar -zxvf Configuration_files.tar.gz

bedbase_BEDfiles/
bedbase_BEDfiles/GSE105977_ENCFF449EZT_optimal_idr_thresholded_peaks_hg19.bed.gz
bedbase_BEDfiles/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE105587_ENCFF413ANK_peaks_hg19.bed.gz
bedbase_BEDfiles/GSM2423312_ENCFF155HVK_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE105977_ENCFF617QGK_optimal_idr_thresholded_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE91663_ENCFF316ASR_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSM2423313_ENCFF722AOG_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE105587_ENCFF809OOE_conservative_idr_thresholded_peaks_hg19.bed.gz
bedbase_BEDfiles/GSM2827349_ENCFF196DNQ_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE91663_ENCFF553KIK_optimal_idr_thresholded_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSE105977_ENCFF634NTU_peaks_hg19.bed.gz
bedbase_BEDfiles/GSE105977_ENCFF937CGY_peaks_GRCh38.bed.gz
bedbase_BEDfiles/GSM2827350_ENCFF928JXU_peaks_GRCh38.bed.gz
bedb
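As an optional check that is not part of the original notebook, the contents of an archive can be listed without extracting it. The sketch below uses Python's standard `tarfile` module; the helper name `list_bed_files` is hypothetical.

```python
import os
import tarfile


def list_bed_files(archive_path):
    """Return the sorted names of the .bed.gz members in a tar archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".bed.gz")
        )


if __name__ == "__main__":
    # Archive name taken from the wget step above.
    archive = "bedbase_BEDfiles.tar.gz"
    if os.path.exists(archive):
        for name in list_bed_files(archive):
            print(name)
```

Run from the tutorial directory, this prints one line per compressed BED file found in the archive.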

## Generate statistics and plots of BED files using BEDSTAT


### Create a PEP describing the BED files to process

To get started, we'll need a PEP ([Portable Encapsulated Project](https://pepkit.github.io/)). A PEP consists of 1) an annotation sheet (.csv) that contains information about the samples in a project and 2) a project config.yaml file that points to the sample annotation sheet. The config file also has other components, such as derived attributes, which in this case point to the BED files to be processed. The following is an example of a config file that uses the derived attributes `output_file_path` and `yaml_file` to point to the `.bed.gz` files and their respective metadata.

In [5]:
cat Configuration_files/bedbase_demo_PEPs/bedstat_config.yaml

metadata:
  sample_table: bedstat_annotation_sheet.csv
  output_dir: bedstat/bedstat_pipeline_logs 
  pipeline_interfaces: ../bedstat/pipeline_interface.yaml

constant_attributes: 
  output_file_path: "source"
  yaml_file: "source2"
  protocol: "bedstat"

derived_attributes: [output_file_path, yaml_file]
data_sources:
  source: "bedbase_BEDfiles/{file_name}" 
  source2: "bedstat/bedstat_pipeline_logs/submission/{sample_name}.yaml"


### Download bedstat and the Bedbase configuration manager (bbconf)

[bedstat](https://github.com/databio/bedstat) is a [pypiper](http://code.databio.org/pypiper/) pipeline that generates statistics and plots of BED files. Additionally, [bedstat](https://github.com/databio/bedstat) relies on [bbconf](https://github.com/databio/bbconf), the `bedbase` configuration manager, which implements convenience methods for interacting with an elasticsearch database, where our files' metadata will be placed. For this demo, we'll use the dev version of `bbconf`, which can be downloaded as follows:

In [6]:
git clone git@github.com:databio/bedstat
pip install git+https://github.com/databio/bbconf.git@dev --user

# Install Python dependencies
pip install piper --user

# Install R dependencies
# (the script lives inside the cloned bedstat repository)
Rscript bedstat/scripts/installRdeps.R

Cloning into 'bedstat'...
remote: Enumerating objects: 165, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (92/92), done.
remote: Total 362 (delta 81), reused 106 (delta 43), pack-reused 197
Receiving objects: 100% (362/362), 57.94 KiB | 2.23 MiB/s, done.
Resolving deltas: 100% (155/155), done.
Cloning into 'bbconf'...
remote: Enumerating objects: 251, done.
remote: Counting objects: 100% (251/251), done.
remote: Compressing objects: 100% (178/178), done.
remote: Total 251 (delta 148), reused 154 (delta 61), pack-reused 0
Receiving objects: 100% (251/251), 42.52 KiB | 1.93 MiB/s, done.
Resolving deltas: 100% (148/148), done.
You should consider upgrading via the 'pip install --upgrade pip' command.
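After installation, a quick way to confirm that the Python dependencies are visible to the interpreter is to look them up by import name. This is an optional sketch rather than a step from the original notebook; the helper `missing_modules` is hypothetical, and the import names (`bbconf`, and `pypiper` for the `piper` package) are assumptions based on the package names used above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names: bbconf -> "bbconf", piper (PyPI) -> "pypiper".
missing = missing_modules(["bbconf", "pypiper"])
print("missing:", missing if missing else "none")
```

An empty result means both packages can be imported by the Python interpreter that runs the check, which should be the same one that will run the pipeline.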


We'll need to create a directory where we can store the stats and plots generated by `bedstat`. Additionally, we'll create a directory where we can store log and metadata files that we'll need later on.

In [7]:
mkdir bedstat/bedstat_output
mkdir bedstat/bedstat_pipeline_logs

In order to use bbconf, we'll need to create a minimal configuration.yaml file. The path to this configuration file can be stored in the environment variable `$BEDBASE`.

In [8]:
cat Configuration_files/bedbase_configuration.yaml

path:
  pipelines_output: $BEDBASEtutorial/bedstat/bedstat_output

database:
  host: localhost
  bed_index: bed_index
  bedset_index: bedset_index

server:
  host: 0.0.0.0
  port: 8000


### Inititiate a local elasticsearch cluster

In addition to generating statistics and plots, [bedstat](https://github.com/databio/bedstat) inserts JSON-formatted metadata into an [elasticsearch](https://www.elastic.co/elasticsearch/) database, which will later be used to search for and extract files and information about them. This step may have to be performed outside the notebook, since these commands ask for a sudo password.

In [9]:
# If docker is not already installed, you can install it with the following commands
# (make sure you have sudo permissions).
# On Debian/Ubuntu the package is named docker.io; docker-engine is an obsolete name.

sudo apt-get update
sudo apt-get install docker.io -y

# Create a persistent volume to house elastic search data
sudo docker volume create es-data

# Run the docker container for elasticsearch
sudo docker run -p 9200:9200 -p 9300:9300 -v es-data:/usr/share/elasticsearch/data -e "xpack.ml.enabled=false" \
  -e "discovery.type=single-node" elasticsearch:7.5.1

[sudo] password for jev4xy: 
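Once the container is running, it helps to confirm that the cluster answers on port 9200 before launching the pipeline. The sketch below is an optional addition that queries the cluster's root endpoint, which returns a JSON document including the version number; the function names are hypothetical, and the host and port are assumed from the `docker run` command above.

```python
import json
import urllib.error
import urllib.request


def parse_es_version(raw_body):
    """Extract the version number from the JSON served at the cluster root."""
    info = json.loads(raw_body)
    return info["version"]["number"]


def elasticsearch_version(host="localhost", port=9200, timeout=2.0):
    """Return the version string if the cluster answers, else None."""
    # Bypass any HTTP proxy settings: the cluster is expected on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}/", timeout=timeout) as resp:
            return parse_es_version(resp.read())
    except (urllib.error.URLError, OSError, ValueError, KeyError, TypeError):
        return None


version = elasticsearch_version()
print("elasticsearch:", version if version else "not reachable")
```

If the check reports that the cluster is not reachable, the container may still be starting up, so it is worth retrying after a few seconds.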


### Run bedstat  on the demo PEP
To run [bedstat](https://github.com/databio/bedstat) and the other pipelines required in this demo, we will rely on the pipeline submission engine [looper](http://looper.databio.org/en/latest/), which can be installed as follows:

In [None]:
pip install --user loopercli

To establish a modular connection between a project and a pipeline, we'll need a [pipeline interface](http://looper.databio.org/en/latest/linking-a-pipeline/) file, which tells looper how to run the pipeline. If `bedstat` is being run in an HPC environment where docker is not available, we recommend running the pipeline with the `--no-db-commit` flag, which calculates statistics and generates plots but does not insert this information into the local elasticsearch cluster. Once the plots and statistics have been generated, they can be inserted into the local elasticsearch cluster by running `bedstat` with the `--just-db-commit` flag. If your data lives in a local environment, as is the case in this tutorial, neither flag is required; the two cells below nevertheless run the steps separately, one flag at a time:

In [11]:
#looper run bedbase_demo_PEPs/bedstat_config.yaml --no-db-commit --compute local --limit 1 -R

looper run Configuration_files/bedbase_demo_PEPs/bedstat_config.yaml --bedbase-config Configuration_files/bedbase_configuration.yaml \
--no-db-commit --compute local -R

Command: run (Looper version: 0.12.4)
Reading sample table: '/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/bedstat_annotation_sheet.csv'
Activating compute package 'local'
Finding pipelines for protocol(s): bedstat
Known protocols: bedstat
'/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedstat/pipeline/bedstat.py' appears to attempt to run on import; does it lack a conditional on '__main__'? Using base type: Sample
## [1 of 15] bedbase_demo_db1 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db1.sub
Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db1.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:23:45
### Pipeline run code and environment:

*              Command:  `/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_P

  - in 'y': chrCHR_HG107_PATCH, chrCHR_HG126_PATCH, chrCHR_HG1311_PATCH, chrCHR_HG1342_HG2282_PATCH, chrCHR_HG1362_PATCH, chrCHR_HG142_HG150_NOVEL_TEST, chrCHR_HG151_NOVEL_TEST, chrCHR_HG1832_PATCH, chrCHR_HG2021_PATCH, chrCHR_HG2023_PATCH, chrCHR_HG2030_PATCH, chrCHR_HG2058_PATCH, chrCHR_HG2063_PATCH, chrCHR_HG2066_PATCH, chrCHR_HG2072_PATCH, chrCHR_HG2095_PATCH, chrCHR_HG2104_PATCH, chrCHR_HG2116_PATCH, chrCHR_HG2191_PATCH, chrCHR_HG2213_PATCH, chrCHR_HG2217_PATCH, chrCHR_HG2232_PATCH, chrCHR_HG2233_PATCH, chrCHR_HG2235_PATCH, chrCHR_HG2239_PATCH, chrCHR_HG2247_PATCH, chrCHR_HG2288_HG2289_PATCH, chrCHR_HG2290_PATCH, chrCHR_HG2291_PATCH, chrCHR_HG2334_PATCH, chrCHR_HG26_PATCH, ch [... truncated]
4: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrUn_GL000224v1, chr17_GL000205v2_random, chrUn_GL000219v1, chrUn_GL000195v1, chrUn_GL000218v1, chr22_KI270733v1_random, chr1_KI270706v1_random, chrUn_GL000220v1, chrUn_GL000216v2

2: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrUn_GL000224v1, chrUn_KI270466v1, chrUn_KI270467v1
  - in 'y': chrCHR_HG107_PATCH, chrCHR_HG126_PATCH, chrCHR_HG1311_PATCH, chrCHR_HG1342_HG2282_PATCH, chrCHR_HG1362_PATCH, chrCHR_HG142_HG150_NOVEL_TEST, chrCHR_HG151_NOVEL_TEST, chrCHR_HG1832_PATCH, chrCHR_HG2021_PATCH, chrCHR_HG2023_PATCH, chrCHR_HG2030_PATCH, chrCHR_HG2058_PATCH, chrCHR_HG2063_PATCH, chrCHR_HG2066_PATCH, chrCHR_HG2072_PATCH, chrCHR_HG2095_PATCH, chrCHR_HG2104_PATCH, chrCHR_HG2116_PATCH, chrCHR_HG2191_PATCH, chrCHR_HG2213_PATCH, chrCHR_HG2217_PATCH, chrCHR_HG2232_PATCH, chrCHR_HG2233_PATCH, chrCHR_HG2235_PATCH, chrCHR_HG2239_PATCH, chrCHR_HG2247_PATCH, chrCHR_HG2288_HG2289_PATCH, chrCHR_HG2290_PATCH, chrCHR_HG2291_PATCH, chrCHR_HG2334_PATCH, chrCHR_HG26_PATCH, chrCHR_HG986_PATCH, chrCHR_HSCHR10_1_CTG1, chrCHR_HSCHR10_1_CTG2, chrCHR_HSCHR10_1_CTG4, chrCHR_HSCHR11_1_CTG1_2, chrCHR_HSCHR11_1_CTG5, chrCHR_HS

Finding overlaps...
Setting regionIDs...
jExpr: .N
Combining...
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/a6a08126cb6f4b1953ba0ec8675df85a/GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38_chrombins"
Loading required namespace: BSgenome.Hsapiens.UCSC.hg38.masked
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/a6a08126cb6f4b1953ba0ec8675df85a/GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38_gccontent"
promoterCore :	found 31
promoterProx :	found 59
exon :	found 156
intron :	found 1595
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/a6a08126cb6f4b1953ba0ec8675df85a/GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38_partitions"
1: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrUn_KI270587v1
  - in 'y': chrCHR_HG107_PATCH, chrCHR_HG126_PATCH, chrCHR_HG1311_PATCH, chrCHR_HG1342_HG2282_PATCH, chr

<pre>
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

*      Pipeline branch:  * master
*        Pipeline date:  2020-03-18 10:30:43 -0400

### Arguments passed to pipeline:

*     `bedbase_config`:  `Configuration_files/bedbase_configuration.yaml`
*            `bedfile`:  `bedbase_BEDfiles/GSE91663_ENCFF316ASR_peaks_GRCh38.bed.gz`
*        `config_file`:  `bedstat.yaml`
*              `cores`:  `1`
*              `dirty`:  `False`
*       `force_follow`:  `False`
*    `genome_assembly`:  `hg38`
*              `input`:  `None`
*             `input2`:  `None`
*     `just_db_commit`:  `False`
*             `logdev`:  `False`
*                `mem`:  `4000`
*          `new_start`:  `False`
*       `no_db_commit`:  `True`
*      `output_parent`:  `bedstat/bedstat_pipeline_logs/results_pipeline`
*            `recover`:  `True`
*        `sample_name`:  `None`
*        `sample_yaml`:  `bedstat/bedstat_pipeline_logs/submission/bedbase_demo_db5.yaml`
*             `silent`:  `False`
*   `single_or_paired`:  `single`
*           `testmode`:  `False

*         Peak memory (this run):  0.4614 GB
*        Pipeline completed time: 2020-03-30 12:25:23
## [6 of 15] bedbase_demo_db6 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db6.sub
Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db6.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:25:23
### Pipeline run code and environment:

*              Command:  `/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedstat/pipeline/bedstat.py --bedfile bedbase_BEDfiles/GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38.bed.gz --genome hg38 --sample-yaml bedstat/bedstat_pipeline_logs/submission/bedbase_demo_db6.yaml -O bedstat/bedstat_pipeline_logs/results_pipeline --bedbase-config Configuration_files/bedbase_configuration.yaml --no-db-commit -R`
*         Compute host:  cp

4: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr1_KI270713v1_random, chr1_KI270714v1_random, chr1_KI270706v1_random, chr17_GL000205v2_random, chrUn_KI270744v1
  - in 'y': chrCHR_HG107_PATCH, chrCHR_HG126_PATCH, chrCHR_HG1311_PATCH, chrCHR_HG1342_HG2282_PATCH, chrCHR_HG1362_PATCH, chrCHR_HG142_HG150_NOVEL_TEST, chrCHR_HG151_NOVEL_TEST, chrCHR_HG1832_PATCH, chrCHR_HG2021_PATCH, chrCHR_HG2023_PATCH, chrCHR_HG2030_PATCH, chrCHR_HG2058_PATCH, chrCHR_HG2063_PATCH, chrCHR_HG2066_PATCH, chrCHR_HG2072_PATCH, chrCHR_HG2095_PATCH, chrCHR_HG2104_PATCH, chrCHR_HG2116_PATCH, chrCHR_HG2191_PATCH, chrCHR_HG2213_PATCH, chrCHR_HG2217_PATCH, chrCHR_HG2232_PATCH, chrCHR_HG2233_PATCH, chrCHR_HG2235_PATCH, chrCHR_HG2239_PATCH, chrCHR_HG2247_PATCH, chrCHR_HG2288_HG2289_PATCH, chrCHR_HG2290_PATCH, chrCHR_HG2291_PATCH, chrCHR_HG2334_PATCH, chrCHR_HG26_PATCH, chrCHR_HG986_PATCH, chrCHR_HSCHR10_1_CTG1, chrCHR_HSCHR10_1_CTG2, chrCHR_HSCHR10_1_CT

  - in 'y': chrCHR_HG107_PATCH, chrCHR_HG126_PATCH, chrCHR_HG1311_PATCH, chrCHR_HG1342_HG2282_PATCH, chrCHR_HG1362_PATCH, chrCHR_HG142_HG150_NOVEL_TEST, chrCHR_HG151_NOVEL_TEST, chrCHR_HG1832_PATCH, chrCHR_HG2021_PATCH, chrCHR_HG2023_PATCH, chrCHR_HG2030_PATCH, chrCHR_HG2058_PATCH, chrCHR_HG2063_PATCH, chrCHR_HG2066_PATCH, chrCHR_HG2072_PATCH, chrCHR_HG2095_PATCH, chrCHR_HG2104_PATCH, chrCHR_HG2116_PATCH, chrCHR_HG2191_PATCH, chrCHR_HG2213_PATCH, chrCHR_HG2217_PATCH, chrCHR_HG2232_PATCH, chrCHR_HG2233_PATCH, chrCHR_HG2235_PATCH, chrCHR_HG2239_PATCH, chrCHR_HG2247_PATCH, chrCHR_HG2288_HG2289_PATCH, chrCHR_HG2290_PATCH, chrCHR_HG2291_PATCH, chrCHR_HG2334_PATCH, chrCHR_HG26_PATCH, chrCHR_HG986_PATCH, chrCHR_HSCHR10_1_CTG1, chrCHR_HSCHR10_ [... truncated]
3: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrUn_GL000219v1, chr1_KI270711v1_random, chrUn_KI270744v1, chr1_KI270714v1_random, chr1_KI270713v1_random, chrUn_KI270742v1

promoterCore :	found 6459
promoterProx :	found 13322
exon :	found 29119
intron :	found 129565
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/02fd518818560c28ed20ed98f4c291bd/GSM2423312_ENCFF155HVK_peaks_GRCh38_partitions"
1: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr1_KI270713v1_random, chr1_KI270714v1_random, chr17_GL000205v2_random, chrUn_GL000219v1, chrUn_KI270742v1, chrUn_KI270744v1, chr1_KI270711v1_random, chr1_KI270706v1_random, chr22_KI270731v1_random, chrUn_GL000195v1, chr14_GL000194v1_random, chrUn_KI270442v1, chr17_KI270729v1_random, chr1_KI270707v1_random, chr22_KI270736v1_random, chr1_KI270709v1_random, chr22_KI270733v1_random, chr4_GL000008v2_random, chr16_KI270728v1_random, chr9_KI270719v1_random, chr22_KI270732v1_random, chr14_GL000009v2_random, chrUn_KI270745v1, chr14_GL000225v1_random, chrUn_KI270330v1, chrUn_GL000220v1, chr22_KI270737v1_random, chrUn_KI270751v1, chrUn_

    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/33d4328fe4ff3a472edff81bf8f5d566/GSM2423313_ENCFF722AOG_

*              `cores`:  `1`
*              `dirty`:  `False`
*       `force_follow`:  `False`
*    `genome_assembly`:  `hg38`
*              `input`:  `None`
*             `input2`:  `None`
*     `just_db_commit`:  `False`
*             `logdev`:  `False`
*                `mem`:  `4000`
*          `new_start`:  `False`
*       `no_db_commit`:  `True`
*      `output_parent`:  `bedstat/bedstat_pipeline_logs/results_pipeline`
*            `recover`:  `True`
*        `sample_name`:  `None`
*        `sample_yaml`:  `bedstat/bedstat_pipeline_logs/submission/bedbase_demo_db10.yaml`
*             `silent`:  `False`
*   `single_or_paired`:  `single`
*           `testmode`:  `False`
*          `verbosity`:  `None`

----------------------------------------

Target to produce: `/home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/2ffb2cedd14f5f1fae7cb765a66d82a3/GSM2827349_ENCFF196DNQ_peaks_GRCh38.json`  

> `Rscript /home/jev4xy/Desktop/bedbase_tutorial/bedstat/tools/regionstat.R --bedfi

Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db11.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:27:13
### Pipeline run code and environment:

*              Command:  `/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedstat/pipeline/bedstat.py --bedfile bedbase_BEDfiles/GSM2827350_ENCFF928JXU_peaks_GRCh38.bed.gz --genome hg38 --sample-yaml bedstat/bedstat_pipeline_logs/submission/bedbase_demo_db11.yaml -O bedstat/bedstat_pipeline_logs/results_pipeline --bedbase-config Configuration_files/bedbase_configuration.yaml --no-db-commit -R`
*         Compute host:  cphg-51ksmr2
*          Working dir:  /home/jev4xy/Desktop/bedbase_tutorial
*            Outfolder:  /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/3e67ac88348d8b816a8ca50ab94eeade/
*  Pipeline started at:   (03-30 12:27:13) elapsed: 0.0 _TIME_

### Version log:

*       Python version:  3.6.8
*          Pypiper dir:  `/home/jev4xy

    self._triage_error(SubprocessError(msg), nofail)
  File "/home/jev4xy/.local/lib/python3.6/site-packages/pypiper/manager.py", line 2131, in _triage_error
    self.fail_pipeline(e)
  File "/home/jev4xy/.local/lib/python3.6/site-packages/pypiper/manager.py", line 1660, in fail_pipeline
    raise exc
pypiper.exceptions.SubprocessError: Subprocess returned nonzero result. Check above output for details
## [12 of 15] bedbase_demo_db12 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db12.sub
Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db12.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:27:30
### Pipeline run code and environment:

*              Command:  `/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedstat/pipeline/bedstat.py --bedfile bedbase_BEDfiles/GSE105587_

*        `sample_name`:  `None`
*        `sample_yaml`:  `bedstat/bedstat_pipeline_logs/submission/bedbase_demo_db13.yaml`
*             `silent`:  `False`
*   `single_or_paired`:  `single`
*           `testmode`:  `False`
*          `verbosity`:  `None`

----------------------------------------

Target to produce: `/home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/9dc6f420639e0a265f3f179b6b42713a/GSE105587_ENCFF809OOE_conservative_idr_thresholded_peaks_hg19.json`  

> `Rscript /home/jev4xy/Desktop/bedbase_tutorial/bedstat/tools/regionstat.R --bedfile=bedbase_BEDfiles/GSE105587_ENCFF809OOE_conservative_idr_thresholded_peaks_hg19.bed.gz --fileId=GSE105587_ENCFF809OOE_conservative_idr_thresholded_peaks_hg19 --outputfolder=/home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/9dc6f420639e0a265f3f179b6b42713a --genome=hg19 --digest=9dc6f420639e0a265f3f179b6b42713a` (31518)
<pre>
Loading required package: GenomicRanges
Loading required package: stats4
Loading required pack

Loading required namespace: BSgenome.Hsapiens.UCSC.hg19.masked
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/e577eb947b5c791b30df969f0564324b/GSE105977_ENCFF449EZT_optimal_idr_thresholded_peaks_hg19_gccontent"
promoterCore :	found 98
promoterProx :	found 184
exon :	found 314
intron :	found 3012
[1] "Plotting: /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/e577eb947b5c791b30df969f0564324b/GSE105977_ENCFF449EZT_optimal_idr_thresholded_peaks_hg19_partitions"
</pre>
Command completed. Elapsed time: 0:00:10. Running peak memory: 0.378GB.  
  PID: 31572;	Command: Rscript;	Return code: 0;	Memory used: 0.378GB


### Pipeline completed. Epilogue
*        Elapsed time (this run):  0:00:10
*  Total elapsed time (all runs):  0:00:10
*         Peak memory (this run):  0.3778 GB
*        Pipeline completed time: 2020-03-30 12:28:19
## [15 of 15] bedbase_demo_db15 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4x

In [12]:
#looper run bedbase_demo_PEPs/bedstat_config.yaml  --just-db-commit --compute local -R

looper run Configuration_files/bedbase_demo_PEPs/bedstat_config.yaml --bedbase-config Configuration_files/bedbase_configuration.yaml \
--just-db-commit --compute local -R

Command: run (Looper version: 0.12.4)
Reading sample table: '/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/bedstat_annotation_sheet.csv'
Activating compute package 'local'
Finding pipelines for protocol(s): bedstat
Known protocols: bedstat
'/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedstat/pipeline/bedstat.py' appears to attempt to run on import; does it lack a conditional on '__main__'? Using base type: Sample
## [1 of 15] bedbase_demo_db1 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db1.sub
Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db1.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:30:18
Established connection with Elasticsearch: localhost
'id' metadata not available
'md5sum' metadata not available
'plots' metadata not avai

## [6 of 15] bedbase_demo_db6 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db6.sub
Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db6.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:30:20
Established connection with Elasticsearch: localhost
'id' metadata not available
'md5sum' metadata not available
'plots' metadata not available
'bedfile_path' metadata not available
Data: {'id': ['GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38'], 'gc_content': [0.507], 'regions_no': [17110], 'mean_absolute_TSS_dist': [51414986.6069], 'md5sum': ['9cd65cf4f07b83af35770c4a098fd4c6'], 'plots': [{'name': 'tssdist', 'caption': 'Region-TSS distance distribution'}, {'name': 'chrombins', 'caption': 'Regions distribution over chromosomes'}, {'name': 'gccontent', 'caption': 'GC content'}, {'name': 'partitions',

## [11 of 15] bedbase_demo_db11 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db11.sub
Job script (n=1; 0.00 Gb): bedstat/bedstat_pipeline_logs/submission/bedstat_bedbase_demo_db11.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:30:22
Established connection with Elasticsearch: localhost
Traceback (most recent call last):
  File "/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedstat/pipeline/bedstat.py", line 59, in <module>
    with open(json_file_path, 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/3e67ac88348d8b816a8ca50ab94eeade/GSM2827350_ENCFF928JXU_peaks_GRCh38.json'
## [12 of 15] bedbase_demo_db12 (bedstat)
Submission settings lack memory specification
Writing script to /home/jev4xy/Desktop/be
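The `FileNotFoundError` in the log above appears when the `--just-db-commit` step looks for a statistics JSON file that the earlier run did not produce. A small pre-check can list such cases before committing. This is a sketch under the assumption, taken from the paths in the log, that each BED file gets its own folder `bedstat_output/<digest>/` containing a `<file id>.json`; the helper `digests_missing_json` is hypothetical.

```python
from pathlib import Path


def digests_missing_json(output_dir):
    """Return names of per-file output folders that contain no .json file."""
    root = Path(output_dir)
    return sorted(
        folder.name
        for folder in root.iterdir()
        if folder.is_dir() and not any(folder.glob("*.json"))
    )


if __name__ == "__main__":
    # Output location from bedbase_configuration.yaml, relative to the tutorial directory.
    output_dir = Path("bedstat/bedstat_output")
    if output_dir.is_dir():
        for digest in digests_missing_json(output_dir):
            print("no statistics JSON for", digest)
```

Any folder reported here corresponds to a sample whose statistics step should be rerun before its metadata is inserted into elasticsearch.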

After the previous steps have been executed, our BED files should be available for querying on our local elasticsearch cluster. Files can be queried using the `bedbuncher` pipeline described in the section below.


## Create bedsets using BEDBUNCHER

### Create a new PEP describing the bedset name and specific JSON query  
[bedbuncher](https://github.com/databio/bedbuncher) is a pipeline designed to create bedsets (sets of BED files retrieved from bedbase), with their respective statistics and additional outputs such as a `PEP` and an `iGD` database. In order to run `bedbuncher`, we will need to design an additional PEP describing the query as well as attributes such as the name assigned to the newly created bedset. This configuration file should point to the `JSON` file describing the query to find files of interest. The configuration file should have the following structure:

In [13]:
cat Configuration_files/bedbase_demo_PEPs/bedbuncher_query.csv

sample_name,bedset_name,JSONquery_name,bbconfig_name,JSONquery_path,output_folder_path
bedset1,bedbase_demo_bedset,test_query,bedbase_configuration,source1,source2
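
The `JSONquery_name` column in the table above names the JSON file that holds the query. As a minimal sketch of what such a file could contain, the snippet below writes a `match_all` query, the same query clause that appears in the `bbconf` log output later in this tutorial. The file name `test_query.json`, the temporary directory, and the assumption that `bedbuncher` accepts a bare query clause are all illustrative; check the `bedbuncher` documentation for the exact format it expects.

```shell
# Hypothetical location: a throwaway directory, so no tutorial files are touched.
query_dir="$(mktemp -d)"

# Assumed file name, derived from the JSONquery_name value 'test_query'.
cat > "$query_dir/test_query.json" <<'EOF'
{
  "match_all": {}
}
EOF

# Show the query file that was written.
cat "$query_dir/test_query.json"
```

A `match_all` clause selects every document in an index, so it is a convenient starting point before narrowing the query to specific attributes.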


In [14]:
cat Configuration_files/bedbase_demo_PEPs/bedbuncher_config.yaml

###  Download bedbuncher  and install CML dependencies

To download the `bedbuncher` pipeline, clone the repository from GitHub. Though not required, we'll also create a directory to store the pipeline logs.

In [15]:
git clone git@github.com:databio/bedbuncher
mkdir bedbuncher/bedbuncher_pipeline_logs

Cloning into 'bedbuncher'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 254 (delta 32), reused 38 (delta 15), pack-reused 196
Receiving objects: 100% (254/254), 57.71 KiB | 1.86 MiB/s, done.
Resolving deltas: 100% (140/140), done.


One of the features of `bedbuncher` is the creation of an [iGD](https://github.com/databio/iGD) database from the files in the bedset. `iGD` can be installed by cloning the repository from GitHub, running `make` to build the binary, and adding the binary's location to the `$PATH` environment variable.

In [16]:
git clone git@github.com:databio/iGD
cd iGD
make
cd ..

# Add the iGD bin directory to PATH (this may need to be done before starting the tutorial), for example:
export PATH="$BEDBASEtutorial/iGD/bin:$PATH"

Cloning into 'iGD'...
remote: Enumerating objects: 634, done.
remote: Counting objects: 100% (634/634), done.
remote: Compressing objects: 100% (312/312), done.
remote: Total 1001 (delta 323), reused 626 (delta 320), pack-reused 367
Receiving objects: 100% (1001/1001), 854.44 KiB | 6.83 MiB/s, done.
Resolving deltas: 100% (619/619), done.
mkdir -p obj
mkdir -p bin
cc -c -g -O2 -lz -lm src/igd_base.c -o obj/igd_base.o 
src/igd_base.c: In function ‘get_fileinfo’:
     fgets(buf, 1024, fp);//head line
     ^~~~~~~~~~~~~~~~~~~~
     fgets(buf, 1024, fp);   //header
     ^~~~~~~~~~~~~~~~~~~~
src/igd_base.c: In function ‘get_igdinfo’:
     fread(&iGD->nbp, sizeof(int32_t), 1, fp);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     fread(&iGD->gType, sizeof(int32_t), 1, fp);
     ^~~~~~~~~~~~~~~~~~

     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/igd_search.c: In function ‘getMap_v’:
     fread(gData, sizeof(gdata_t)*tmpi, 1, fP);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc -c -g -O2 -lz -lm src/igd.c -o obj/igd.o 
cc -o bin/igd obj/igd_base.o obj/igd_create.o obj/igd_search.o obj/igd.o -g -O2 -lz -lm
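
Before moving on, it can help to confirm that the freshly built binary is actually reachable through `$PATH`. The following is a minimal sketch; the helper name `check_on_path` is hypothetical and not part of iGD or bedbuncher.

```shell
# Hypothetical helper: report whether a command can be found through PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# 'igd' is the binary name produced by the make step above (bin/igd).
check_on_path igd || echo "Add the iGD bin directory to PATH before running bedbuncher"
```

If the check fails, re-run the `export PATH=...` line in the same shell session that will launch the pipeline, since the change does not carry over to other sessions.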


### Run bedbuncher using Looper 

Once we have cloned the `bedbuncher` repository, set up our local Elasticsearch cluster and built the `iGD` binary, we can run `bedbuncher`, passing the location of the `bedbase` configuration file to the `--bedbase-config` argument. Note: if the path to the `bedbase` configuration file has been stored in the `$BEDBASE` environment variable, it's not necessary to pass the `--bedbase-config` argument.

In [17]:
looper run  Configuration_files/bedbase_demo_PEPs/bedbuncher_config.yaml  --bedbase-config Configuration_files/bedbase_configuration.yaml \
--compute local -R

Command: run (Looper version: 0.12.4)
Reading sample table: '/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/bedbuncher_query.csv'
Activating compute package 'local'
Finding pipelines for protocol(s): bedbuncher
Known protocols: bedbuncher
'/home/jev4xy/Desktop/bedbase_tutorial/Configuration_files/bedbase_demo_PEPs/../../bedbuncher/bedbuncher.py' appears to attempt to run on import; does it lack a conditional on '__main__'? Using base type: Sample
## [1 of 1] bedset1 (bedbuncher)
> Note (missing optional attribute): 'bedbuncher' requests sample attribute 'bbconfig_path' for option '--bedbase-config'
Writing script to /home/jev4xy/Desktop/bedbuncher/bedbuncher_pipeline_logs/submission/bedbuncher_bedset1.sub
Job script (n=1; 0.00 Gb): ../bedbuncher/bedbuncher_pipeline_logs/submission/bedbuncher_bedset1.sub
Compute node: cphg-51ksmr2
Start time: 2020-03-30 12:36:29
### Pipeline run code and environment:

*              Command:  `/home/jev4xy/Desktop/b

Creating PEP TAR archive: bedbase_demo_bedset_PEP.tar.gz
/home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/4b67b56dcbc2e13d161be7f8cf52d68b/bedbase_demo_bedset_PEP
/home/jev4xy/Desktop/bedbase_tutorial/bedstat/bedstat_output/4b67b56dcbc2e13d161be7f8cf52d68b/bedbase_demo_bedset_PEP/bedbase_demo_bedset_annotation_sheet.csv
{'GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38': '78c0e4753d04b238fc07e4ebe5a02984', 'GSE105977_ENCFF617QGK_optimal_idr_thresholded_peaks_GRCh38': 'fdd94ac0787599d564b07193e4ec41fd', 'GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38': 'a6a08126cb6f4b1953ba0ec8675df85a', 'GSE105977_ENCFF937CGY_peaks_GRCh38': 'a78493a2b314afe9f6635c4883f0d44b', 'GSE91663_ENCFF316ASR_peaks_GRCh38': '50e19bd44174bb286aa28ae2a15e7b8f', 'GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38': '9cd65cf4f07b83af35770c4a098fd4c6', 'GSE91663_ENCFF553KIK_optimal_idr_thresholded_peaks_GRCh38': 'a5af5857bfbc3bfc8fea09cb90e67a16', 'GSM2423312_ENC
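
The note about `$BEDBASE` in this section can be sketched as a small wrapper that adds `--bedbase-config` only when the environment variable is not set. The function name `build_looper_cmd` is hypothetical; it only prints the command so that it can be inspected before being run, and it assumes the same `looper run` arguments used in the cell above.

```shell
# Hypothetical helper: print the looper command, adding --bedbase-config
# only when the BEDBASE environment variable is empty or unset.
build_looper_cmd() {
  pep="$1"
  bbconfig="$2"
  if [ -n "${BEDBASE:-}" ]; then
    echo "looper run $pep --compute local -R"
  else
    echo "looper run $pep --bedbase-config $bbconfig --compute local -R"
  fi
}

# Paths are the ones used earlier in this tutorial.
build_looper_cmd Configuration_files/bedbase_demo_PEPs/bedbuncher_config.yaml \
  Configuration_files/bedbase_configuration.yaml
```

Printing the command first makes it easy to see which form will be used in the current shell session.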

##  Run local instance of the bedhost API

The last part of the tutorial consists of running a local instance of [bedhost](https://github.com/databio/bedhost/tree/master) (a REST API for the outputs produced by bedstat and bedbuncher) in order to explore plots and statistics and to download pipeline outputs. To run `bedhost`, we can clone the GitHub repository and install the package with pip as follows:

In [18]:
git clone git@github.com:databio/bedhost
pip install bedhost/. --user

Cloning into 'bedhost'...
remote: Enumerating objects: 140, done.
remote: Counting objects: 100% (140/140), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 651 (delta 93), reused 90 (delta 45), pack-reused 511
Receiving objects: 100% (651/651), 214.07 KiB | 2.93 MiB/s, done.
Resolving deltas: 100% (426/426), done.
Processing ./bedhost
Building wheels for collected packages: bedhost
  Building wheel for bedhost (setup.py) ... done
  Created wheel for bedhost: filename=bedhost-0.0.1-cp36-none-any.whl size=59901 sha256=1f4bad3ac3dc8656c097feea1f754cb601d7a9d7cda5ee590ba7a41a50d23469
  Stored in directory: /tmp/pip-ephem-wheel-cache-rnta2z2o/wheels/0d/13/b6/f9f990b04e991dfbb802fbdb6628b11149fedfb88a6916dfe0
Successfully built bedhost
Installing collected packages: bedhost
  Found existing installation: bedhost 0.0.1
    Uninstalling bedhost-0.0.1:
      Successfully uninstalled bedhost-0.0.1
Successfully installed bedhost-0.0.1
You should consid

To start bedhost, run the following command, passing the location of the `bedbase` config file to the `-c` flag.

In [None]:
bedhost serve -c $BEDBASEtutorial/Configuration_files/bedbase_configuration.yaml


DEBU 2020-03-30 12:38:54,338 | bedhost:est:263 > Configured logger 'bedhost' using logmuse v0.2.5 
DEBU 12:38:54 | bbconf:est:263 > Configured logger 'bbconf' using logmuse v0.2.5 
INFO 12:38:54 | bbconf:bbconf:58 > Established connection with Elasticsearch: localhost 
DEBU 12:38:54 | bbconf:bbconf:59 > Elasticsearch info:
{'name': '1ec537ca3e87', 'cluster_name': 'docker-cluster', 'cluster_uuid': 'PamppPmESrKNFL1hqlo6gA', 'version': {'number': '7.5.1', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96', 'build_date': '2019-12-16T22:57:37.835892Z', 'build_snapshot': False, 'lucene_version': '8.3.0', 'minimum_wire_compatibility_version': '6.8.0', 'minimum_index_compatibility_version': '6.0.0-beta1'}, 'tagline': 'You Know, for Search'} 
INFO 2020-03-30 12:38:54,344 | bedhost:main:254 > running bedhost app 
INFO:     Started server process [32265]
INFO:     Waiting for application startup.
INFO:   

DEBU 12:42:08 | bbconf:bbconf:85 > Searching index: bedsets
Query: {'match_all': {}} 
INFO:     HEAD http://localhost:9200/bedsets [status:200 request:0.002s]
INFO:     GET http://localhost:9200/_cat/count/bedsets?format=json [status:200 request:0.002s]
INFO:     GET http://localhost:9200/bedsets/_search?size=1 [status:200 request:0.003s]
INFO:     127.0.0.1:39154 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:39154 - "GET /serve_rules HTTP/1.1" 200 OK
DEBU 2020-03-30 12:42:08,397 | bedhost:main:186 > Received query: {'elastic': {'current': None}} 
DEBU 2020-03-30 12:42:08,397 | bedhost:main:190 > Serving current result 
INFO:     127.0.0.1:39156 - "POST /bedfiles_filter_result?html=True HTTP/1.1" 200 OK
INFO:     HEAD http://localhost:9200/bedsets [status:200 request:0.004s]
DEBU 12:42:12 | bbconf:bbconf:85 > Searching index: bedsets
Query: {'match': {'md5sum': '4b67b56dcbc

If we have stored the path to the bedbase config in the environment variable `$BEDBASE` (suggested), it's not necessary to pass the `-c` flag.

In [None]:
bedhost serve 

The `bedhost` API can be opened at [http://0.0.0.0:8000](http://0.0.0.0:8000). We can now explore the plots and statistics generated by the `bedstat` and `bedbuncher` pipelines.
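
Because `bedhost serve` takes a moment to start, a quick way to confirm that the server is up is to poll it from a second terminal. The sketch below is an assumption-laden example: the helper `wait_for_url` is hypothetical, it requires `curl` to be installed, and it assumes the server is reachable at `http://localhost:8000` (port 8000 is the one shown above).

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_url() {
  url="$1"
  attempts="${2:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl return a non-zero status on HTTP errors as well.
    if curl --silent --fail --output /dev/null "$url"; then
      echo "$url is responding"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$url did not respond after $attempts attempts" >&2
  return 1
}

# Assumed address of the local bedhost instance.
wait_for_url http://localhost:8000 3 || echo "Start the server with 'bedhost serve' first"
```

The same helper can be pointed at `http://localhost:9200` to check that the Elasticsearch container is reachable before running the pipelines.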