In [None]:
WORKDIR=$HOME/pearc24-pelican-jobs
echo $WORKDIR

## Recap: Fetching Data with Pelican

The following commands download a test data file in FASTQ format (a common sequence data format) from the Open Science Data Federation (OSDF).

In [None]:
cd $WORKDIR/data

OSDF=pelican://osg-htc.org
OBJ_PATH=ospool/uc-shared/public/osg-training/tutorial-fastqc/test.fastq

In [None]:
pelican object get $OSDF/$OBJ_PATH test.fastq

The following command should display the beginning of a genomic sequence file: 

In [None]:
head test.fastq
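
A FASTQ record normally spans four lines: a header starting with `@`, the sequence, a `+` separator, and a quality string. As a quick sanity check on the download, a minimal sketch along these lines may help. The helper names `fastq_record_count` and `fastq_header_ok` are hypothetical (they are not part of the tutorial files), and the count assumes the common unwrapped layout of exactly four lines per record.

```shell
# Hypothetical helper (not part of the tutorial files): count FASTQ records,
# assuming the common unwrapped layout of exactly four lines per record.
fastq_record_count() {
    local file="$1"
    local lines
    lines=$(wc -l < "$file")
    echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only if the first line looks like a FASTQ header.
fastq_header_ok() {
    head -n 1 "$1" | grep -q '^@'
}

# Run the checks only if the downloaded file is present in this directory.
if [ -f test.fastq ]; then
    fastq_header_ok test.fastq && echo "records: $(fastq_record_count test.fastq)"
fi
```

If the header check fails, nothing is printed, which is a hint that the download did not produce a FASTQ file.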

## Sample Job Submission

In [None]:
cd $WORKDIR/sample

Look at the contents of the HTCondor job submit file below. There should be some familiar elements (resource requests, where to save stdout/stderr/log files, what commands to run) and some potentially new elements (transferring files). 

In [None]:
cat sample.submit

In [None]:
condor_submit sample.submit

In [None]:
condor_q
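
Rather than re-running `condor_q` by hand until the job leaves the queue, the check can be wrapped in a small polling loop. The sketch below is one possible approach, not part of the tutorial: `wait_until_no_output` is a hypothetical helper that re-runs any command until it prints nothing. It relies on `condor_q -af ClusterId` printing one line per queued job and no lines when the queue is empty.

```shell
# Hypothetical helper: poll until the given command prints no output.
# Arguments: poll interval in seconds, followed by the command to run.
wait_until_no_output() {
    local interval="$1"; shift
    while [ -n "$("$@")" ]; do
        sleep "$interval"
    done
}

# Usage on an access point (commented out so the sketch runs anywhere):
# wait_until_no_output 30 condor_q -af ClusterId
```

A 30-second interval is an arbitrary choice; a longer interval puts less load on the scheduler.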

In [None]:
cat job*.output

In [None]:
cat output*.txt

## Job Submission with Pelican

### One Job Fetching a Container and Data File

In [None]:
cd $WORKDIR/fastqc

In [None]:
ls -lh

We are now going to submit a slightly more complex job. This job will fetch both the `test.fastq` file from the OSDF that we downloaded earlier and a container with the FastQC bioinformatics program.

In [None]:
grep "pelican" single-fastqc.submit

The job itself will run the FastQC program on the fetched data file and produce a visualization, which will be written back to the `results` folder.

In [None]:
cat single-fastqc.submit

In [None]:
condor_submit single-fastqc.submit

In [None]:
condor_q

In [None]:
ls results/

One of the commands in the job script is `ls`, so the standard output file should show that `test.fastq` was downloaded.

In [None]:
cat logs/*.out
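
With more than one output file, scanning the `cat` output by eye gets tedious. As a sketch, the check can be narrowed with `grep`. The helper name `files_mentioning` is hypothetical, and the loop assumes the job's stdout files live in `logs/` with a `.out` suffix, as in the command above.

```shell
# Hypothetical helper: list which of the given files mention a file name.
# grep -l prints only the names of matching files; -F treats the name literally.
files_mentioning() {
    local name="$1"; shift
    grep -l -F -- "$name" "$@"
}

# Run only if the job has written stdout files into logs/.
for f in logs/*.out; do
    if [ -e "$f" ]; then
        files_mentioning test.fastq "$f" || echo "$f does not mention test.fastq"
    fi
done
```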

### Multiple Jobs Fetching a Single Container and Unique Data Files

In [None]:
cd $WORKDIR/fastqc

Because the Pelican object links can be quite long, it's helpful to use intermediate variables in the submit file. 

In [None]:
grep "OBJ_LOC" many-fastqc.submit
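
HTCondor submit files define a macro as `NAME = value` and expand it with `$(NAME)`. The shell analog, sketched below, mirrors the `OSDF` and object-path variables used in the download example earlier. The helper `object_url` and the names `sampleA.fastq` and `sampleB.fastq` are placeholders for illustration, not objects known to exist in the federation.

```shell
# Base URL and directory taken from the download example earlier in this notebook.
OSDF=pelican://osg-htc.org
OBJ_DIR=ospool/uc-shared/public/osg-training/tutorial-fastqc

# Hypothetical helper: build the full Pelican URL for one object name.
object_url() {
    echo "$OSDF/$OBJ_DIR/$1"
}

# sampleA.fastq and sampleB.fastq are placeholder names, not real objects.
for name in sampleA.fastq sampleB.fastq; do
    object_url "$name"
done
```

Keeping the shared prefix in one variable means only the object name changes from job to job.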

Finally, we'll run the same FastQC analysis on multiple data files, each again fetched from the OSDF.

In [None]:
cat many-fastqc.submit

In [None]:
condor_submit many-fastqc.submit

In [None]:
condor_q

In [None]:
ls results/

In [None]:
cat logs/*.out