# OSDF Examples

What if we don't want to download the data and the software container up front? For example:

- the data is big and we don't want to store it locally
- we're part of a project and want everyone to use the same central container

It turns out we can fetch both directly from OSDF (the Open Science Data Federation)!

In [9]:
cd ~/tutorial-fastqc


## Exploring the Data

The data and a `.sif` container image are hosted in this OSDF "bucket":

In [1]:
OSDF_LOCATION="osdf:///ospool/uc-shared/public/osg-training/tutorial-fastqc"

We can use the pelican client to view what files are available: 

In [2]:
pelican object ls ${OSDF_LOCATION}


In [None]:
pelican object ls ${OSDF_LOCATION}/data

In [None]:
pelican object ls ${OSDF_LOCATION}/sif

We could use `pelican object get` to fetch any of the objects to explore them locally, but instead, let's use them in jobs. 
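
For a quick local look, a minimal sketch of fetching a single object might be the following. It assumes the `pelican` client is installed and on your `PATH` (the guard skips the download otherwise). The `sample` and `object_url` variable names are only for illustration.

```shell
# Base location from the tutorial
OSDF_LOCATION="osdf:///ospool/uc-shared/public/osg-training/tutorial-fastqc"

# Illustrative variable names (not part of the tutorial files)
sample="SRR2584863_1.trim.sub.fastq"
object_url="${OSDF_LOCATION}/data/${sample}"

if command -v pelican >/dev/null 2>&1; then
    # Download the object into the current directory
    pelican object get "${object_url}" "./${sample}"
else
    echo "pelican client not found; skipping download of ${object_url}" >&2
fi
```

If the client is missing, the script only reports which object it would have fetched, so it is safe to run anywhere.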

## Using Objects from OSDF in Jobs

To use the data and `.sif` file from OSDF in a job, put their OSDF URLs in the submit file, as shown here:

In [6]:
cat alt-submit/osdf-fastqc.submit

# HTCondor Submit File: osdf-fastqc.submit
# Submit a fastqc job using data/container from OSDF

# reference data from OSDF instead of local copies
OSDF_LOCATION=osdf:///ospool/uc-shared/public/osg-training/tutorial-fastqc

# Provide our executable and arguments
executable = fastqc.sh
arguments = SRR2584863_1.trim.sub.fastq

# Provide the container for our software
universe    = container
container_image = $(OSDF_LOCATION)/sif/fastqc.sif

# List files that need to be transferred to the job
transfer_input_files = $(OSDF_LOCATION)/data/SRR2584863_1.trim.sub.fastq
should_transfer_files = YES

# Tell HTCondor to transfer output to our results/ directory
transfer_output_files = SRR2584863_1.trim.sub_fastqc.html
transfer_output_remaps = "SRR2584863_1.trim.sub_fastqc.html = results/SRR2584863_1.trim.sub_fastqc.html"

# Track job information
log = logs/SRR2584863_1.fastqc.log
output = logs/SRR2584863_1.fastqc.out
error = logs/SRR2584863_1.fastqc.err

# Resource Requests
request_cpus = 1
reques

Note: instead of writing out the whole URL wherever we need it, we define a submit-file variable, `OSDF_LOCATION`, once and reference it as `$(OSDF_LOCATION)`.

In [7]:
condor_submit alt-submit/osdf-fastqc.submit


In [None]:
condor_q
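
Once the job is in the queue, one way to check on its result is sketched below. It assumes the HTCondor command-line tools are available and that the job writes to the log and output paths named in the submit file above. The 600-second limit and the variable names are arbitrary choices for illustration.

```shell
# Paths taken from the submit file above
log_file="logs/SRR2584863_1.fastqc.log"
expected_output="results/SRR2584863_1.trim.sub_fastqc.html"

if command -v condor_wait >/dev/null 2>&1 && [ -f "${log_file}" ]; then
    # Block until the job recorded in the log finishes, or 600 seconds pass
    condor_wait -wait 600 "${log_file}"
fi

# Report whether the remapped FastQC report has arrived
if [ -f "${expected_output}" ]; then
    status="report ready: ${expected_output}"
else
    status="report not found yet: ${expected_output}"
fi
echo "${status}"
```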

## Multiple Jobs

To run multiple jobs, we could use Pelican to generate the list of samples: 

In [None]:
pelican object ls ${OSDF_LOCATION}/data > alt-submit/samples.txt
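
Depending on how the listing is formatted, it may need a little cleanup before it works as a sample list. The sketch below assumes one object per line, possibly with a leading path, and strips both the path and the `.fastq` extension so that each line is a bare sample name. The listing here is simulated and the second file name is hypothetical. In practice the input would come from `pelican object ls`.

```shell
# Simulated listing (assumption: one object per line; second name is hypothetical)
listing='SRR2584863_1.trim.sub.fastq
/ospool/uc-shared/public/osg-training/tutorial-fastqc/data/SRR2584863_2.trim.sub.fastq'

# Illustrative output location; the tutorial uses alt-submit/samples.txt
samples_file="$(mktemp)"

# Drop any leading path, then drop the .fastq extension; '>' overwrites on rerun
printf '%s\n' "${listing}" | sed -e 's#.*/##' -e 's/\.fastq$//' > "${samples_file}"

cat "${samples_file}"
```

Writing with `>` rather than `>>` keeps reruns from appending duplicate entries, which would otherwise queue duplicate jobs.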

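And then edit the submit file:
* Change the queue statement to iterate through the list of samples:
    
    `queue sample from alt-submit/samples.txt`
* Replace all references to a specific sample file with the variable from the queue statement:
    
    `transfer_input_files = $(OSDF_LOCATION)/data/$(sample).fastq`

Note that `$(sample).fastq` assumes each line of `samples.txt` holds the sample name without its `.fastq` extension. If the list contains full file names, use `$(sample)` on its own.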
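
The second edit can be scripted rather than done by hand. As a rough sketch, the `sed` command below swaps the hard-coded sample name for `$(sample)` in a few lines copied from the submit file above. It assumes `sample` holds the name without the `.fastq` extension, and the `template` and `edited` variables are only for illustration.

```shell
# A few lines copied from osdf-fastqc.submit (single quotes keep $(...) literal)
template='arguments = SRR2584863_1.trim.sub.fastq
transfer_input_files = $(OSDF_LOCATION)/data/SRR2584863_1.trim.sub.fastq
log = logs/SRR2584863_1.fastqc.log'

# Replace the full sample name first, then the shorter prefix used in log names
edited="$(printf '%s\n' "${template}" \
    | sed -e 's/SRR2584863_1\.trim\.sub/$(sample)/g' \
          -e 's/SRR2584863_1/$(sample)/g')"

printf '%s\n' "${edited}"
```

To apply the same substitution to the real file, the `sed` expressions could be run on `alt-submit/osdf-fastqc.submit` with the result written to a new submit file. The queue statement still needs to be updated separately.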