This is the code repository for bjorn, a suite of tools for processing SARS-CoV-2 sequences to support large-scale genomic surveillance. It relies on external tools such as pangolin, UShER, and GNU parallel.


cd bjorn
docker build -t 'bjorn_container' .


Launching the container

docker run -v {data_dir}:/data -v {temp_dir}:/temp -it bjorn_container

Running bjorn on a provision of new sequences. See example config file

./ {config.json} {provision.xz} [/data] [/temp]

(Existing sequence db in datadir will be auto-detected according to config.)

Processing a data provision from GISAID's jsonl format to tsv

cat {provision.xz} | ./ {provision_decoder} {provision_parser} {treeinfo_dir} {tempdir} {work_groups} {workers_per_group} > {provision.tsv}

Identifying changed records

./ {old_records.tsv} {new_records.tsv} {deletes_out.tsv} {insertions_out.tsv} {tempdir}
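
To illustrate what this step produces, here is a minimal sketch of a line-level diff between two record files using standard `sort` and `comm`. It is not bjorn's implementation; the sample records and the two-column layout are made up for the example, and only the output file names mirror the placeholders above.

```shell
# Illustrative only: NOT bjorn's implementation. Sample data is hypothetical.
set -eu

workdir="$(mktemp -d)"

# Two tiny record sets: s2 is removed, s3 is changed, s4 is added.
printf 's1\tACGT\ns2\tAAAA\ns3\tCCCC\n' > "$workdir/old_records.tsv"
printf 's1\tACGT\ns3\tCCCG\ns4\tGGGG\n' > "$workdir/new_records.tsv"

# comm needs sorted input; LC_ALL=C keeps the ordering byte-wise and consistent.
LC_ALL=C sort "$workdir/old_records.tsv" > "$workdir/old.sorted"
LC_ALL=C sort "$workdir/new_records.tsv" > "$workdir/new.sorted"

# Lines only in the old file are deletes; lines only in the new file are insertions.
LC_ALL=C comm -23 "$workdir/old.sorted" "$workdir/new.sorted" > "$workdir/deletes_out.tsv"
LC_ALL=C comm -13 "$workdir/old.sorted" "$workdir/new.sorted" > "$workdir/insertions_out.tsv"

cat "$workdir/deletes_out.tsv" "$workdir/insertions_out.tsv"
```

In this sketch a changed record (s3) shows up twice: its old version in the deletes file and its new version in the insertions file.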

Analyzing sequences (alignment and mutation- and lineage-calling)

./ {provision.tsv} {workers} {subworkers} {blocksize} {treeinfo_dir} {geoinfo_dir} > {analysed_sequences.tsv}

Exporting to's jsonl format

parallel -j{workers} --block {blocksize} --pipepart "./ -i /dev/stdin -o /dev/stdout -u {unknown_value} -g {geoinfo_dir}" :::: {analysed_sequences.tsv} | gzip -c > {out.jsonl.gz}
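
After an export, it can be useful to check the gzipped jsonl file before handing it on. The sketch below uses only standard `gzip` and `wc`; the file it checks is a stand-in created for the example, and its single `id` field is an assumption, not the real export schema.

```shell
# Illustrative only: the stand-in file and its "id" field are hypothetical.
set -eu

workdir="$(mktemp -d)"

# Stand-in for {out.jsonl.gz}: one JSON object per line, gzip-compressed.
printf '{"id":"a"}\n{"id":"b"}\n' | gzip -c > "$workdir/out.jsonl.gz"

# gzip -t exits non-zero if the archive is truncated or corrupt.
gzip -t "$workdir/out.jsonl.gz"

# jsonl holds one record per line, so the line count is the record count.
record_count="$(gzip -dc "$workdir/out.jsonl.gz" | wc -l | tr -d ' ')"
echo "records: $record_count"
```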