
pyspark_blsspeller

PySpark implementation of the BLSSpeller algorithm.

Usage

usage: main.py [-h] --input INPUT --output OUTPUT --bindir BINDIR [--bls_thresholds BLS_THRESHOLDS] [--alphabet ALPHABET] [--degen DEGEN] [--min_len MIN_LEN] [--max_len MAX_LEN] [--conf_cutoff CONF_CUTOFF] [--fc_cutoff FC_CUTOFF] [--limit LIMIT] [--resume [RESUME]] [--streaming [STREAMING]] [--keep_tmps [KEEP_TMPS]] [--trigger TRIGGER] [--alignment_option ALIGNMENT_OPTION]

BLSSpeller configuration.

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT
  --output OUTPUT
  --bindir BINDIR
  --bls_thresholds BLS_THRESHOLDS
  --alphabet ALPHABET
  --degen DEGEN
  --min_len MIN_LEN
  --max_len MAX_LEN
  --conf_cutoff CONF_CUTOFF
  --fc_cutoff FC_CUTOFF
  --limit LIMIT         Limit the number of input files that will be processed
  --resume [RESUME]     Skip iteration and reduction when these output folders are present
  --streaming [STREAMING]
                        Reduce motifs as soon as a process is finished iterating
  --keep_tmps [KEEP_TMPS]
                        Keep the temporary iterated motifs, which otherwise get removed when reducing via streaming
  --trigger TRIGGER     Interval for checking new files when streaming. Can also be used to balance iteration and reduction.
  --alignment_option ALIGNMENT_OPTION
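
A minimal invocation could look like the following (all paths and parameter values here are placeholders, not defaults from the repository):

python main.py \
    --input data/families/ \
    --output output/ \
    --bindir motifIterator/build/ \
    --min_len 6 --max_len 13 \
    --streaming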

Local

conda env update -f environment.yml
conda activate pyspark_blsspeller
python setup.py bdist_wheel --universal
bash example_spark-submit.sh
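
The example_spark-submit.sh script wraps the actual spark-submit call. As a rough sketch, such a call can look like this (the master URL and wheel filename are assumptions; check the script itself for the real values):

spark-submit \
    --master local[*] \
    --py-files dist/pyspark_blsspeller-0.1-py2.py3-none-any.whl \
    main.py --input data/families/ --output output/ --bindir motifIterator/build/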

HPC

Note that the first use of conda requires running conda init bash.
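
For a first-time setup that amounts to something like this (exec bash simply reloads the shell so that conda activate works afterwards):

module load Miniconda3/4.9.2
conda init bash
exec bash   # reload the shell to pick up the conda initialization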

module swap cluster/swalot
module load Miniconda3/4.9.2
conda env update -f environment.yml
conda activate pyspark_blsspeller
python setup.py bdist_wheel --universal

Compile motifIterator

TODO

module load intel/2019b gtest/1.10.0-GCCcore-8.3.0 Arrow/0.17.1-fosscuda-2020b
mkdir -p motifIterator/build && cd motifIterator/build   # out-of-source build dir in motifIterator
cmake ..
make -j$(nproc)   # parallel build; make -j0 is rejected by GNU make

Test on login node

The Spark module also loads an Arrow module, which changes LD_LIBRARY_PATH. The PySpark code expects this variable to be set in the environment; a quick check is shown after the commands below.

module load Spark/3.1.1-fosscuda-2020b
bash hpc_example_spark-submit.sh
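
To verify that the Arrow libraries ended up on LD_LIBRARY_PATH after loading the module:

echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i arrow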

Execute on cluster nodes

Note that hod can only run with Python 2, so deactivate the conda Python 3 environment first with conda deactivate.

module load hod
hod batch -n 1 --info --label pyspark_test_1 --workdir . --hodconf hod.conf --script hpc_example_spark-submit.sh -l 1 -m e
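
Once submitted, the job can be followed with hod's standard subcommands (a sketch; see the hod documentation for details):

hod list    # show the label, job id and state of submitted clusters
hod clean   # remove leftover info of jobs that are no longer running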