# Running Genomics on Bacalhau


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bacalhau-project/examples/blob/main/Genomics/BIDS/index.ipynb)
[![Open In Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/bacalhau-project/examples/HEAD?labpath=miscellaneous/Genomics/index.ipynb)

# Introduction

Kipoi (pronounced: kípi; from the Greek κήποι: gardens) is an API and a repository of ready-to-use trained models for genomics. It currently contains 2201 different models, covering canonical predictive tasks in transcriptional and post-transcriptional gene regulation. Kipoi's API is implemented as a Python package (github.com/kipoi/kipoi) and is also accessible from the command line.

**Setting Up Docker**

In this step you will create a `Dockerfile` to create your Docker deployment. The `Dockerfile` is a text document that contains the commands used to assemble the image.

First, create the `Dockerfile`.

Next, add your desired configuration to the `Dockerfile`. These commands specify how the image will be built, and what extra requirements will be included.

Dockerfile


```
FROM kipoi/kipoi-veff2:py37

RUN kipoi_veff2_predict ./examples/input/test.vcf ./examples/input/test.fa ./output.tsv -m "DeepSEA/predict" -s "diff" -s "logit"
```


We use the `kipoi/kipoi-veff2:py37` Docker image and run a sample prediction command while building the container, so that the models and weights are downloaded into the image. This is needed because Bacalhau doesn't support networking, so downloading models and weights while the job runs isn't possible.

Build the container


```
docker build -t <hub-user>/<repo-name>:<tag> .
```


Replace the placeholders as follows:

`<hub-user>`: your Docker Hub username. If you don't have a Docker Hub account, [follow these instructions to create one](https://docs.docker.com/docker-id/) and use the username of the account you created.

`<repo-name>`: the name of the container. You can name it anything you want.

`<tag>`: optional, but you can use the `latest` tag.
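The way the three placeholders combine into one image reference can be sketched in a few lines of Python. This is only an illustration: `image_ref` is a hypothetical helper, and `alice` and `kipoi-veff2` are made-up values, not real accounts or repositories.

```python
def image_ref(hub_user: str, repo_name: str, tag: str = "latest") -> str:
    """Compose a Docker image reference of the form <hub-user>/<repo-name>:<tag>."""
    for label, value in (("hub_user", hub_user), ("repo_name", repo_name), ("tag", tag)):
        # Reject empty parts and whitespace, which would break the command line.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{label} must be non-empty and contain no whitespace")
    return f"{hub_user}/{repo_name}:{tag}"


# "alice" and "kipoi-veff2" are placeholder values chosen for this sketch.
ref = image_ref("alice", "kipoi-veff2", "py37")
print("docker build -t", ref, ".")
print("docker push", ref)
```

Leaving out the tag argument falls back to `latest`, which mirrors what Docker does when no tag is given.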

After you have built the container, the next step is to test it locally and then push it to Docker Hub.

Now you can push this repository to the registry designated by its name or tag.


```
docker push <hub-user>/<repo-name>:<tag>
```


After the image has been pushed to Docker Hub, we can use the container for running on Bacalhau.

Running the container on Bacalhau


```
bacalhau docker run \
jsacex/kipoi-veff2:py37 \
-- kipoi_veff2_predict ./examples/input/test.vcf ./examples/input/test.fa ../outputs/output.tsv -m "DeepSEA/predict" -s "diff" -s "logit"
```
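Everything after `--` is passed to the container as a list of separate arguments, and the shell removes the double quotes before that happens. As a rough illustration, Python's `shlex.split` follows POSIX-style splitting rules and yields the same ten items that appear under `Entrypoint` in the `bacalhau describe` output later in this walkthrough. This only approximates what the shell does and is not part of Bacalhau itself.

```python
import shlex

# The command that follows "--" in the bacalhau invocation above.
command = (
    'kipoi_veff2_predict ./examples/input/test.vcf ./examples/input/test.fa '
    '../outputs/output.tsv -m "DeepSEA/predict" -s "diff" -s "logit"'
)

# shlex.split applies POSIX-like quoting rules, so the quotes are dropped.
args = shlex.split(command)
for arg in args:
    print("-", arg)
```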

Installing Bacalhau

In [None]:
%%bash
curl -sL https://get.bacalhau.org/install.sh | bash

Your system is linux_amd64
No BACALHAU detected. Installing fresh BACALHAU CLI...
Getting the latest BACALHAU CLI...
Installing v0.2.5 BACALHAU CLI...
Downloading https://github.com/filecoin-project/bacalhau/releases/download/v0.2.5/bacalhau_v0.2.5_linux_amd64.tar.gz ...
Downloading sig file https://github.com/filecoin-project/bacalhau/releases/download/v0.2.5/bacalhau_v0.2.5_linux_amd64.tar.gz.signature.sha256 ...
Verified OK
Extracting tarball ...
NOT verifying Bin
bacalhau installed into /usr/local/bin successfully.
Client Version: v0.2.5
Server Version: v0.2.5


In [None]:
%%bash
echo $(bacalhau docker run --id-only --wait --wait-timeout-secs 1000 jsacex/kipoi-veff2:py37 -- kipoi_veff2_predict ./examples/input/test.vcf ./examples/input/test.fa ../outputs/output.tsv -m "DeepSEA/predict" -s "diff" -s "logit") > job_id.txt
cat job_id.txt

cf10a68c-9fb7-41fa-991b-a736cbf6277f



Running the commands will output a UUID (like `54506541-4eb9-45f4-a0b1-ea0aecd34b3e`). This is the ID of the job that was created. You can check the status of the job with the following command:
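If job submission fails, `job_id.txt` may hold something other than an ID. A small check, sketched below, can confirm that the file contains a well-formed UUID before it is passed to later commands. `parse_job_id` is a hypothetical helper written for this example, not a Bacalhau function.

```python
import uuid
from pathlib import Path


def parse_job_id(text: str) -> str:
    """Return the job ID if `text` is a canonical UUID, else raise ValueError."""
    candidate = text.strip()
    # uuid.UUID raises ValueError on malformed input.
    parsed = uuid.UUID(candidate)
    # uuid.UUID also accepts braces and missing hyphens; require the usual form.
    if str(parsed) != candidate.lower():
        raise ValueError(f"not a canonical UUID: {candidate!r}")
    return candidate


job_file = Path("job_id.txt")  # written by the submission cell above
if job_file.exists():
    print("job id:", parse_job_id(job_file.read_text()))
```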


In [None]:
%%bash
bacalhau list --id-filter $(cat job_id.txt)

 CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED
 11:01:31  cf10a68c  Docker jsacex/kipoi-...  Completed            /ipfs/QmU3EV213QSHeK...



When the `STATE` column says `Completed`, the job is done and we can get the results.

To find out more information about your job, run the following command:

In [None]:
%%bash
bacalhau describe $(cat job_id.txt)

JobAPIVersion: ""
ID: cf10a68c-9fb7-41fa-991b-a736cbf6277f
RequesterNodeID: QmXaXu9N5GNetatsvwnTfQqNtSeKAD6uCmarbh3LMRYAcF
ClientID: e240d70997da88352a83933a08156dab66fdabb04f55e8b94f78fc81e5347c54
Spec:
    Engine: 2
    Verifier: 1
    Publisher: 4
    Docker:
        Image: jsacex/kipoi-veff2:py37
        Entrypoint:
            - kipoi_veff2_predict
            - ./examples/input/test.vcf
            - ./examples/input/test.fa
            - ../outputs/output.tsv
            - -m
            - DeepSEA/predict
            - -s
            - diff
            - -s
            - logit
    outputs:
        - Engine: 1
          Name: outputs
          path: /outputs
    Sharding:
        BatchSize: 1
        GlobPatternBasePath: /inputs
Deal:
    Concurrency: 1
CreatedAt: 2022-10-02T11:01:31.607336716Z
JobState:
    Nodes:
        QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3:
            Shards:
                0:
                    NodeId: QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1

No error is reported and the state of our job is complete, which means we can download the results. First, we create a temporary directory to save our results.

In [None]:
%%bash
mkdir results

To download the results of your job, run the following command:

In [None]:
%%bash
bacalhau get  $(cat job_id.txt)  --output-dir results

11:03:34.094 | INF bacalhau/get.go:67 > Fetching results of job 'cf10a68c-9fb7-41fa-991b-a736cbf6277f'...
2022/10/02 11:03:35 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
11:03:45.277 | INF ipfs/downloader.go:115 > Found 1 result shards, downloading to temporary folder.
11:09:55.538 | INF ipfs/downloader.go:195 > Combining shard from output volume 'outputs' to final location: '/content/results'


After the download has finished, you should see the following contents in the `results` directory:

In [None]:
%%bash
ls results/

shards	stderr	stdout	volumes
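The same check can be done from Python, which is convenient in a script that should stop early if the download was incomplete. This is a minimal sketch: the four expected names are taken from the listing above, and `missing_entries` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Entry names taken from the `ls results/` listing above.
EXPECTED_ENTRIES = ("shards", "stderr", "stdout", "volumes")


def missing_entries(results_dir: str) -> List[str]:
    """Return the expected entries that are absent from `results_dir`."""
    root = Path(results_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = missing_entries("results")
print("missing entries:", missing or "none")
```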


# Viewing the Outputs

In [None]:
%%bash
cat results/volumes/outputs/output.tsv | head -n 10

#CHROM	POS	ID	REF	ALT	DeepSEA/predict/8988T_DNase_None/diff	DeepSEA/predict/AoSMC_DNase_None/diff	DeepSEA/predict/Chorion_DNase_None/diff	DeepSEA/predict/CLL_DNase_None/diff	DeepSEA/predict/Fibrobl_DNase_None/diff	DeepSEA/predict/FibroP_DNase_None/diff	DeepSEA/predict/Gliobla_DNase_None/diff	DeepSEA/predict/GM12891_DNase_None/diff	DeepSEA/predict/GM12892_DNase_None/diff	DeepSEA/predict/GM18507_DNase_None/diff	DeepSEA/predict/GM19238_DNase_None/diff	DeepSEA/predict/GM19239_DNase_None/diff	DeepSEA/predict/GM19240_DNase_None/diff	DeepSEA/predict/H9ES_DNase_None/diff	DeepSEA/predict/HeLa-S3_DNase_IFNa4h/diff	DeepSEA/predict/Hepatocytes_DNase_None/diff	DeepSEA/predict/HPDE6-E6E7_DNase_None/diff	DeepSEA/predict/HSMM_emb_DNase_None/diff	DeepSEA/predict/HTR8svn_DNase_None/diff	DeepSEA/predict/Huh-7.5_DNase_None/diff	DeepSEA/predict/Huh-7_DNase_None/diff	DeepSEA/predict/iPS_DNase_None/diff	DeepSEA/predict/Ishikawa_DNase_Estradiol_100nM_1hr/diff	DeepSEA/predict/Ishikawa_DNase_4OHTAM_20nM_72hr/di
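The header is wide because every score column is named `<model>/<track>/<score>`, for example `DeepSEA/predict/8988T_DNase_None/diff`. A short Python sketch can split those names and count the columns per scoring function. It assumes that the five leading columns are the variant identifiers shown above and that track names contain no slash; `split_score_column` and `summarize_header` are hypothetical helpers written for this example.

```python
import csv
from collections import Counter
from pathlib import Path

# Variant identifier columns, as seen at the start of the header above.
ID_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT"]


def split_score_column(name):
    """Split '<model>/<track>/<score>' into (model, track, score).

    The model name contains a slash itself ('DeepSEA/predict'),
    so the split is done from the right.
    """
    model, track, score = name.rsplit("/", 2)
    return model, track, score


def summarize_header(header):
    """Count score columns per scoring function (e.g. 'diff')."""
    score_columns = [c for c in header if c not in ID_COLUMNS]
    return Counter(split_score_column(c)[2] for c in score_columns)


tsv_path = Path("results/volumes/outputs/output.tsv")
if tsv_path.exists():
    with tsv_path.open(newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    print(summarize_header(header))
```

Since the job was run with both `-s "diff"` and `-s "logit"`, the file presumably holds one column per track for each of the two scoring functions, although the truncated output above only shows `diff` columns.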

In [None]:
%%bash
bacalhau describe $(cat job_id.txt) --spec > job.yaml

In [None]:
%%bash
cat job.yaml