[MRG] adding snakemake `--profile` #33

mr-eyes · 2022-01-23T11:47:51Z

Resolves #32

bluegenes · 2022-03-02T19:09:02Z

thanks Mo! It would be really great to provide an example snakemake rule where you set time, partition, etc. within the snakemake rule, so folks can see how that happens :)

I have some examples over here if you want to just swipe: http://bluegenes.github.io/hpc-snakemake-tips/
e.g. -

rule quality_trim:
    input: 
        reads="rnaseq/raw_data/{sample}.fq.gz",
        adapters="TruSeq2-SE.fa",
    output: "rnaseq/quality/{sample}.qc.fq.gz"
    threads: 1
    resources:
        mem_mb=1000,
        runtime=10
    shell:
        """
        trimmomatic SE {input.reads} {output} \
        ILLUMINACLIP:{input.adapters}:2:0:15 \
        LEADING:2 TRAILING:2 SLIDINGWINDOW:4:2 MINLEN:25    
        """
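In Snakemake's generic `--cluster` mode (Snakemake 7 and earlier), the profile's submit command can refer to per-rule values through placeholders such as `{threads}` and `{resources.mem_mb}`, which are filled in for each job. The minimal Python sketch below mimics that substitution with `str.format`. The `sbatch` template string and the helper name `render_submit_command` are hypothetical, not this PR's profile, and `SimpleNamespace` stands in for Snakemake's job object. Slurm reads a bare `--mem` value as megabytes and a bare `--time` value as minutes, which is why `mem_mb` and `runtime` map onto those flags directly.

```python
from types import SimpleNamespace

# Hypothetical submit template in the style of a profile's `cluster:` entry.
SBATCH_TEMPLATE = (
    "sbatch --cpus-per-task={threads} "
    "--mem={resources.mem_mb} "
    "--time={resources.runtime}"
)


def render_submit_command(threads, **resources):
    """Fill the template for one job, the way str.format would."""
    # SimpleNamespace allows attribute access in the template, e.g. {resources.mem_mb}.
    res = SimpleNamespace(**resources)
    return SBATCH_TEMPLATE.format(threads=threads, resources=res)


# Values taken from the quality_trim rule above.
print(render_submit_command(1, mem_mb=1000, runtime=10))
# -> sbatch --cpus-per-task=1 --mem=1000 --time=10
```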

bluegenes · 2022-03-02T19:44:13Z

One more thought -- I see the default jobs is 100 and default partition is med2 -- can we change these to follow our recommended queue usage?

Options: default to low2 and keep default jobs at 100, or keep default jobs <= 30 on med2.
Alternatively (or in addition), you can add resources: [cpus=30, mem_mb=350000] to cap the total CPU and memory allocated at once. The one caveat is that we don't need these limits on low2 or bml, so they may be annoying to have in the cluster profile when running on those queues.

SichongP · 2022-03-02T23:08:14Z

A little trick that worked for me is using cpus_med2 and cpus_bmm to separate resource use on different partitions. Then I only set resource limits for the med2 and bmm partitions, using resources: [cpus_med2=30, cpus_bmm=30]. This way, snakemake will limit resource usage on the medium-priority partitions but won't restrict usage of the low-priority partitions.

Of course, you will have to set cpus_med2 or cpus_bmm in the resources keyword of each rule instead of the default parameter cpus.
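Why this works can be illustrated with a simplified admission check. This is a sketch under an assumption: a job is started only if, for every resource that has a top-level limit, the amount already in use plus the job's request stays within that limit, while resources without a limit are never capped. Snakemake's actual scheduler is more involved, and the helper names `can_start` and `schedule` are hypothetical.

```python
def can_start(job, in_use, limits):
    """Return True if `job` fits under every limit in `limits`.

    All arguments are dicts mapping resource name -> integer amount.
    Resources that do not appear in `limits` are never restricted.
    """
    for name, cap in limits.items():
        if in_use.get(name, 0) + job.get(name, 0) > cap:
            return False
    return True


def schedule(jobs, limits):
    """Greedily start jobs in order; return the indices of jobs started now."""
    in_use = {}
    started = []
    for i, job in enumerate(jobs):
        if can_start(job, in_use, limits):
            started.append(i)
            for name, amount in job.items():
                in_use[name] = in_use.get(name, 0) + amount
    return started


# Limits only on the medium-priority partitions: resources: [cpus_med2=30, cpus_bmm=30]
limits = {"cpus_med2": 30, "cpus_bmm": 30}
med2_jobs = [{"cpus_med2": 10} for _ in range(5)]  # 50 CPUs requested on med2
low2_jobs = [{"cpus": 10} for _ in range(5)]       # 50 CPUs requested via plain `cpus`

print(schedule(med2_jobs, limits))  # -> [0, 1, 2]
print(schedule(low2_jobs, limits))  # -> [0, 1, 2, 3, 4]
```

Only three med2 jobs fit under the 30-CPU cap, while the jobs that request plain `cpus` all start, because no limit is defined for that key.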

As a bonus, you can use this function to automate which partition snakemake should submit your job to:

def getPartition(wildcards, resources):
    # Determine the partition for each rule based on the resources it requests.
    # Rules that request partition-specific CPUs go to that partition.
    if 'cpus_bmm' in resources.keys() and int(resources['cpus_bmm']) > 0:
        return 'bmm'
    if 'cpus_med2' in resources.keys() and int(resources['cpus_med2']) > 0:
        return 'med2'
    # Otherwise, use the big-memory partition when memory per CPU exceeds 4000 MB.
    if int(resources['mem_mb']) / int(resources['cpus']) > 4000:
        return 'bml'
    return 'low2'

And then in the rule definition:

...
params: partition=getPartition
...

In my profile, I set the following default resources:

default-resources: [cpus_bmm=0, cpus_med2=0, cpus=1, mem_mb_bmm=0, mem_mb_med2=0, mem_mb=2000, time_min=120, node=1, task=1, download=0]

mr-eyes · 2022-03-03T06:02:40Z

One more thought -- I see the default jobs is 100 and default partition is med2 -- can we change these to follow our recommended queue usage?

options: default low2 to keep default jobs at 100, or default jobs <= 30 on med2. alternatively (or in addition), you can add resources: [cpus=30, mem_mb=350000] to limit cpu and memory allocation. The one caveat is that we don't need these limits for low2 or bml, so they may be annoying to have in the cluster profile when running on those queues.

Thanks, @bluegenes, for the suggestions. I have changed the default partition. I don't think setting the default mem_mb to 350 GB is a good idea, because that would consume a lot of memory across all the jobs running with the default parameters. The same goes for the CPUs. What do you think?

mr-eyes · 2022-03-03T06:06:12Z

A little trick that worked for me is using cpus_med2 and cpus_bmm to separate resource use on different partitions.

That's a cool workaround, thanks for sharing! I think controlling the default parameters for each partition separately can also work using Python functions with the partition name as input.
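One possible reading of this suggestion, as a sketch only: a Python helper (the name `partition_resources` and the set of limited partitions are hypothetical) that takes a partition name plus a generic CPU/memory request and returns the resource keys to use. The plain `cpus`/`mem_mb` keys are always set so a submit command can use them; for the limited partitions, the request is also mirrored under SichongP's partition-specific keys, so that a top-level limit such as `cpus_med2=30` counts only jobs sent there.

```python
# Partitions whose usage is capped via partition-specific keys (assumption).
LIMITED_PARTITIONS = ("med2", "bmm")


def partition_resources(partition, cpus=1, mem_mb=2000):
    """Return a resources dict for a job submitted to `partition`."""
    # Plain keys are always present so a submit template can use them directly.
    res = {"cpus": cpus, "mem_mb": mem_mb}
    if partition in LIMITED_PARTITIONS:
        # Mirror the request under partition-specific keys, so a top-level
        # limit on e.g. cpus_med2 only counts jobs sent to this partition.
        res[f"cpus_{partition}"] = cpus
        res[f"mem_mb_{partition}"] = mem_mb
    return res


print(partition_resources("med2", cpus=4, mem_mb=8000))
# -> {'cpus': 4, 'mem_mb': 8000, 'cpus_med2': 4, 'mem_mb_med2': 8000}
print(partition_resources("low2", cpus=4))
# -> {'cpus': 4, 'mem_mb': 2000}
```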

bluegenes · 2022-03-03T16:50:16Z

I don't think setting the default mem_mb to 350GB is a good idea because that will consume a lot of memory for the total running job on default parameters. Same with the cpu. What do you think?

As I've used it, resources at the top level doesn't actually allocate that memory (or CPU, etc.); it just limits the total amount you can allocate at once. The resources within each rule do request that particular amount of memory, etc., as does default-resources, which fills in resources for any rule that is missing one of the default resource parameters.
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
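The default-resources behavior described here can be sketched as a dictionary merge. This is a minimal illustration assuming plain dicts, with the hypothetical helper name `effective_resources`: defaults fill in only the keys a rule leaves unset, and a value set in the rule always wins. A top-level limit never enters this merge, which is why it does not allocate anything by itself.

```python
def effective_resources(rule_resources, default_resources):
    """Merge defaults with a rule's own resources; rule values take precedence."""
    merged = dict(default_resources)  # copy, so the defaults are not mutated
    merged.update(rule_resources)     # per-rule values override the defaults
    return merged


# Defaults in the style of SichongP's profile; rule values from quality_trim.
defaults = {"cpus": 1, "mem_mb": 2000, "time_min": 120}
rule = {"mem_mb": 1000, "runtime": 10}

print(effective_resources(rule, defaults))
# -> {'cpus': 1, 'mem_mb': 1000, 'time_min': 120, 'runtime': 10}
```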

That's a cool workaround, thanks for sharing! I think controlling the default parameters for each partition separately can also work using Python functions with the partition name as input.

This sounds like an excellent workaround. If we can set limits for the med and high partitions by default and no limits for low, that would be really helpful. Of course, for rare cases (deadlines, huge jobs, etc.), users can override the limits by setting different ones on the command line, e.g. --resources mem_mb=XX.

ctb · 2022-03-04T15:01:16Z

this is all Greek to me. Maybe we need (or could use) a lab meeting tutorial/demo on cool farm/snakemake hacks...

bluegenes · 2022-03-05T15:08:25Z

this is all Greek to me. Maybe we need (or could use) a lab meeting tutorial/demo on cool farm/snakemake hacks...

😂 I ran an ILLO on farm/snakemake (w/profiles and resource limitation hacks!) back in Aug 2020, but we could do another/up-to-date one? @mr-eyes, interested in doing this with me? Partition-specific allocation using this profile is already making my life better! @SichongP, I would also love your feedback on what we come up with if you have time, in case you have more/different tricks you use.

Back when profiles were newer, the hard part was figuring out how to introduce them without leaving folks behind who are newer to snakemake. But now I think profile setup is something we should just help everyone do as soon as possible, since it makes so many things easier (and doesn't add much complication, aside from setup).

ILLO from 8/24/2020 - http://bluegenes.github.io/hpc-snakemake-tips/
My practices have changed a little since then, but not a ton. I think for the next one, I would start with profiles and assume snakemake conda environment management :)

mr-eyes · 2022-03-05T17:34:07Z

@mr-eyes, interested in doing this with me?

Sure!

snakemake_profiles.md

Nice! Thanks, Tessa! Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

mr-eyes added 3 commits January 23, 2022 13:45

Create snakemake_profiles.md

f572333

Update snakemake_profiles.md

a139137

Update snakemake_profiles.md

d163c2d

mr-eyes changed the title ~~adding snakemake --profile~~ [MRG] adding snakemake --profile Jan 23, 2022

mr-eyes added 2 commits January 24, 2022 10:49

Update README.md

b3bc6f0

Update README.md

c0bcfea

Example + custom profile

18622fa

bluegenes reviewed May 17, 2022

snakemake_profiles.md

Update snakemake_profiles.md

ffe66e4

