F-Seq, trimmer choices, peak count #17

vreuter · 2017-07-14T02:57:40Z

Aims to address two of the items in #6
Repo-internal version of #16

nsheff

looking good. a few questions.

nsheff · 2017-07-14T14:10:10Z

pipelines/ATACseq.py

+	peak_output_file = os.path.join(peak_folder,  args.sample_name + "_peaks.narrowPeak")
+	peak_input_file = shift_bed
+
+	if args.peak_caller not in PEAK_CALLERS:


Let's put the failures at the beginning, so it doesn't get partway through and then fail (fail early).

nsheff · 2017-07-14T14:18:40Z

pipelines/ATACseq.py

+	if args.peak_caller not in PEAK_CALLERS:
+		raise ValueError("Peak caller not in {}: '{}'".format(PEAK_CALLERS, args.peak_caller))
+
+	if args.peak_caller == "fseq":


having trouble following the fseq section -- what is fseq_optnames for? it looks like it's trying to grab params right out of param and not, for example, param.fseq?

whoa, good catch! yes, it needs to change to param.fseq

That variable lets the options be specified in the config (ATACseq.yaml) either by the true one- or two-character option names that fseq uses, or by more human-readable names for what they represent. Probably not an issue for someone super familiar with fseq, but my thinking is that it's nice for others not to have to memorize those or constantly refer back to the documentation to see what represents what if they want to make a change.
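A minimal sketch of accepting either spelling, assuming a simple alias table. Only `l` (feature length) and `of` (output format) come from this PR's ATACseq.yaml; the long-form aliases and the `normalize_fseq_options` helper are hypothetical and not the PR's actual `fseq_optnames` code.

```python
# Hypothetical alias table: human-readable config key -> fseq's own option name.
# "l" and "of" appear in ATACseq.yaml; the long names are illustrative only.
FSEQ_OPTNAMES = {
    "feature_length": "l",
    "output_format": "of",
}


def normalize_fseq_options(fseq_section):
    """Map each config key to fseq's short option name, accepting either form."""
    known_short = set(FSEQ_OPTNAMES.values())
    normalized = {}
    for key, value in fseq_section.items():
        if key in known_short:
            optname = key
        elif key in FSEQ_OPTNAMES:
            optname = FSEQ_OPTNAMES[key]
        else:
            raise ValueError("Unrecognized fseq option: '{}'".format(key))
        # Guard against the same option being given under both spellings.
        if optname in normalized:
            raise ValueError("fseq option '{}' given more than once".format(optname))
        normalized[optname] = value
    return normalized


print(normalize_fseq_options({"of": "npf", "feature_length": 600}))  # {'of': 'npf', 'l': 600}
```

Rejecting a duplicate is a design choice in this sketch: with two accepted spellings, silently letting one override the other would be easy to miss.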

nsheff · 2017-07-14T14:22:28Z

pipelines/ATACseq.py

 import pypiper


+PEAK_CALLERS = ["fseq", "macs2"]
+TRIMMERS = ["trimmomatic", "pyadapt", "skewer"]


this is great, a perfect use of these variables. Nice!

nsheff · 2017-07-14T14:22:50Z

pipelines/ATACseq.py

 						help='Reference peak set for calculating FRIP')

-	parser.add_argument('--pyadapt', action="store_true",
-						help="Use pyadapter_trim for trimming? [Default: False]")
+	parser.add_argument("--peak-caller", dest="peak_caller",


good call to do this on peak callers as well!

nsheff · 2017-07-14T14:24:06Z

pipelines/ATACseq.py

-		cmd = cmds
-
-	else:  # default to trimmomatic
+	if args.trimmer not in TRIMMERS:


we can move this failure up, too.

I'm just going to get rid of these, I think--they'll always pass as long as neither args.trimmer nor args.peak_caller is modified within main, since argparse will catch any invalid specs for these via choices. I get paranoid with multi-branch conditional blocks in which there's a default, but I think it's fine to let it go here ;D
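A sketch of letting argparse do the validation via `choices`, using the PEAK_CALLERS and TRIMMERS lists from this PR. The default values here are assumptions (trimmomatic was the previous fallback; macs2 is assumed for the peak caller), and `build_parser` is a hypothetical stand-in for the pipeline's real parser setup.

```python
import argparse

# Choice lists as defined near the top of pipelines/ATACseq.py in this PR.
PEAK_CALLERS = ["fseq", "macs2"]
TRIMMERS = ["trimmomatic", "pyadapt", "skewer"]


def build_parser():
    parser = argparse.ArgumentParser(description="ATACseq option sketch")
    # argparse exits with an error for any value outside `choices`,
    # so nothing invalid ever reaches main(); defaults are assumed.
    parser.add_argument("--trimmer", dest="trimmer",
                        default="trimmomatic", choices=TRIMMERS)
    parser.add_argument("--peak-caller", dest="peak_caller",
                        default="macs2", choices=PEAK_CALLERS)
    return parser


args = build_parser().parse_args(["--trimmer", "skewer", "--peak-caller", "fseq"])
print(args.trimmer, args.peak_caller)  # skewer fseq
```

Passing `--trimmer cutadapt` would make `parse_args` print a usage error and raise `SystemExit`, which also satisfies the fail-early goal.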

nsheff · 2017-07-14T14:25:41Z

pipelines/ATACseq.py

+		base = os.path.join(tools.scripts_dir, "pyadapter_trim.py")
+		flags = {"-a": local_input_files[0], "-b": local_input_files[1], "-o": out_fastq_pre}
+		flag_text = " ".join(["{} {}".format(flag, value) for flag, value in flags])
+		cmd = "{} {} -u".format(base, flag_text)


one advantage of the odd += method employed previously is that it's really easy to comment out individual options. that's the rationale there... we can discuss

what about:

flags = [ ("-a", local_input_files[0]), ("-b", local_input_files[1]) ... ]

Then we'd get both benefits -- easy to comment out the individual options, and any typing errors would still be in Python syntax rather than logical/part of the actual command.

Also, it was good that you drew attention to this--right now the iteration should be over flags.items() in the pyadapt section in the flag_text comprehension; I accidentally wrote the skewer and pyadapt sections with different builtins 😱
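A sketch of the list-of-tuples version of the pyadapt command. Here `scripts_dir` is a plain parameter standing in for `tools.scripts_dir`, and `pyadapt_command` is a hypothetical wrapper for illustration.

```python
import os


def pyadapt_command(scripts_dir, local_input_files, out_fastq_pre):
    base = os.path.join(scripts_dir, "pyadapter_trim.py")
    # Ordered (flag, value) pairs: any single option can be disabled
    # by commenting out its line, and a typo stays a Python syntax error.
    flags = [
        ("-a", local_input_files[0]),
        ("-b", local_input_files[1]),
        ("-o", out_fastq_pre),
    ]
    # Tuples unpack directly; a dict would need flags.items() here.
    flag_text = " ".join("{} {}".format(flag, value) for flag, value in flags)
    return "{} {} -u".format(base, flag_text)


print(pyadapt_command("/tools/scripts", ["R1.fq.gz", "R2.fq.gz"], "out/sample"))
```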


nsheff · 2017-07-14T16:13:21Z

pipelines/ATACseq.py

-		cmd1 += " -o {0}".format(out_fastq_pre)
-		cmd1 += " {0}".format(local_input_files[0])
-
-		if args.paired_end:


local_input_files[1] if args.paired_end else None

Was looking for this and then remembered -- I took out these conditionals because args.paired_end is always set to True at the start of main

but that is there because we are trying to accept either paired or not. right now it's paired only but this will change (hopefully soon). so if you take it out where it's already set up, we will have to add it back in.
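A sketch of keeping the paired/single distinction with a conditional expression, so single-end support can return without re-adding checks. `SimpleNamespace` stands in for the parsed argparse namespace, and `trimmer_inputs` is a hypothetical helper.

```python
from types import SimpleNamespace


def trimmer_inputs(args, local_input_files):
    read1 = local_input_files[0]
    # Read 2 exists only for paired-end data; for single-end input the list
    # has one element, and the conditional never indexes past it.
    read2 = local_input_files[1] if args.paired_end else None
    return read1, read2


print(trimmer_inputs(SimpleNamespace(paired_end=True), ["R1.fq", "R2.fq"]))  # ('R1.fq', 'R2.fq')
print(trimmer_inputs(SimpleNamespace(paired_end=False), ["R1.fq"]))          # ('R1.fq', None)
```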

Ah OK well I already changed all instances of this so I'll have to revert those

My bad, I should've picked up on what was going on there

no worries, I don't think we had much in there, it was mostly relics from previous stuff...

nsheff · 2017-07-18T21:10:51Z

pipelines/ATACseq.py


 	parser.add_argument("--prealignments", default=[], type=str, nargs="+",
 						help="Space-delimited list of reference genomes to align to before primary alignment.")

+	# F-seq as peak caller
+	parser.add_argument("--fragment-size", type=int,


we usually pass parameters for specific tools through the pipeline yaml file rather than as command-line arguments; these can become quite numerous and would clutter the command-line space. They're also usually modulated at the project level (not the sample level), so they're best kept in the config file
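A sketch of turning a tool's config section into flags instead of adding CLI options. The YAML loading step is omitted: `param` is a plain dict standing in for the parsed ATACseq.yaml, its layout is an assumption, and `section_to_flags` is hypothetical. The values `q: 0.01` and `shift: 0` are the ones visible in this PR's config excerpt.

```python
# Stand-in for the parsed pipeline config; the section layout is assumed.
param = {
    "macs2": {"q": 0.01, "shift": 0},
}


def section_to_flags(section):
    """Turn one tool's config section into command-line flag chunks."""
    chunks = []
    for name, value in section.items():
        # One-letter keys get a single dash, longer keys a double dash.
        # This rule fits MACS2 but not fseq (e.g. its two-letter -of option).
        prefix = "-" if len(name) == 1 else "--"
        chunks.append("{}{} {}".format(prefix, name, value))
    return chunks


print(" ".join(section_to_flags(param["macs2"])))  # -q 0.01 --shift 0
```

Changing a threshold for a whole project then means editing one config value, with no new argument on the command line.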

Ah OK yeah good thinking. I added this when I was playing around with getting fseq working as the peak caller on the test_project, but I'll remove it now. Totally agree with trying to control the expansion of the option/parameter universe.

nsheff · 2017-07-18T21:11:36Z

pipelines/ATACseq.py

 	if not args.input:
 		parser.print_help()
 		raise SystemExit

 	return args


+
+def build_command(chunks):


nsheff · 2017-07-18T21:16:18Z

pipelines/ATACseq.py

+		# Rename the logfile.
+		#skewer_filename_pairs.append(("{}-trimmed.log".format(out_fastq_pre), trimLog))
+
+		trim_cmd_base = tools.skewer #+ " --quiet"


I would propose merging cmd_base with cmd_options so there's only cmd_chunks from the beginning... I'm not sure I see the need to define them separately...
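A sketch of the single-list approach: the executable is just the first chunk, so there's no separate base to keep in sync with the options. All values are hypothetical stand-ins (e.g. "skewer" for `tools.skewer`), and the `-m`/`-o` skewer flags are assumed here for illustration.

```python
# Hypothetical values standing in for pipeline state.
trim_tool = "skewer"
skewer_mode = "pe"
out_fastq_pre = "sample"
skewer_input_files = ["R1.fq.gz", "R2.fq.gz"]

# One list from the start: base command and options live together.
trim_cmd_chunks = [
    trim_tool,
    "-m {}".format(skewer_mode),
    "-o {}".format(out_fastq_pre),
    # "--quiet",  # individual options can still be commented out
] + skewer_input_files

trim_cmd = " ".join(trim_cmd_chunks)
print(trim_cmd)  # skewer -m pe -o sample R1.fq.gz R2.fq.gz
```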

nsheff · 2017-07-18T21:16:43Z

pipelines/ATACseq.py

 	if args.single_or_paired == "paired":
 		args.paired_end = True
 	else:
 		args.paired_end = True

 	# Initialize
 	outfolder = os.path.abspath(os.path.join(args.output_parent, args.sample_name))
-	pm = pypiper.PipelineManager(name="ATACseq", outfolder=outfolder, args=args, version=__version__)
+	pm = pypiper.PipelineManager(name="ATACseq", outfolder=outfolder, args=args, version=__version__, strict_config=True)


can we think of a way to do this without introducing a reliance on a pypiper dev upgrade? What needs the attribute dict behavior change?

Not super easily, but wouldn't be a giant lift either. I'd rather go ahead with the upgrade, though, since the open PR there is tiny and the way this is written now shouldn't need to change much--if at all--once pypiper and looper share the same AttributeDict. What about a minor pypiper release so that the dependency is still on master?

ok, possible -- but what is the reason for the change? what new thing is using this? I have not run into this before, what is it that needs it?

also I really don't want to make a pipeline developer put another argument into the PipelineManager. I'd rather make this the default if we really must change pypiper

nsheff · 2017-07-18T21:17:49Z

pipelines/ATACseq.py

-		if not args.paired_end:
-			cmd2 = "mv {0} {1}".format(out_fastq_pre + "-trimmed.fastq", trimmed_fastq)
-			cmds.append(cmd2)
+			skewer_mode = "pe"


why not format this command construction in the same way as the others? I think we should be consistent and construct all commands the same way if we can

I made the others a bit more consistent; this one I went a different way so that the sections are better defined and grouped together. That is, there's only one conditional check on args.paired_end, and the mode setting, input files, and rename targets are all handled together, based on that single check. If they're split apart, there's more clutter from multiple checks on paired_end.

Regardless, I'm glad you drew attention here; I'd omitted the skewer_input_files from the command.

what happened to the "pe" if args.paired_end else None concept? would that help here?

can't remember the exact way you suggested...

That would be one of the spots where the args.paired_end would be checked. Then it'd be checked separately to determine the input filenames and how to rename them.
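A sketch of the single-check layout described above, where one branch on `paired_end` decides the mode, the inputs, and the rename targets together. The helper name is hypothetical, the `-trimmed.fastq` name comes from the removed code, and the `-trimmed-pair1/2.fastq` names are assumed skewer output names.

```python
def skewer_plan(paired_end, local_input_files, out_fastq_pre,
                trimmed_fastq, trimmed_fastq_R2=None):
    # One check on paired_end settles mode, inputs, and renames together.
    if paired_end:
        mode = "pe"
        inputs = local_input_files[:2]
        renames = [
            ("{}-trimmed-pair1.fastq".format(out_fastq_pre), trimmed_fastq),
            ("{}-trimmed-pair2.fastq".format(out_fastq_pre), trimmed_fastq_R2),
        ]
    else:
        mode = None  # assumption: omit the mode flag and use the tool default
        inputs = local_input_files[:1]
        renames = [("{}-trimmed.fastq".format(out_fastq_pre), trimmed_fastq)]
    return mode, inputs, renames


print(skewer_plan(False, ["R1.fq"], "sample", "sample_trimmed.fq"))
```

The trade-off versus per-chunk conditional expressions: one branch keeps related decisions side by side, at the cost of a little repetition between the two arms.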

nsheff · 2017-07-18T21:18:56Z

pipelines/ATACseq.py


-	parser.add_argument('--skewer', action="store_true",
-						help="Use skewer for trimming? [Default: False]")
+	parser.add_argument("--skip-tss", dest="skip_tss", action="store_true",


why would someone want to skip-tss from the cli?

You've mentioned a desire to allow someone to run the pipeline outside of looper; I think more hypothetical users will feel comfortable toggling behavior from the command line than by editing a configuration file.

I agree on the fragment-size front since that's much more internal to the behavior, but this feels more amenable to opt-in/-out.

agreed, this is a command-line thing -- I just don't think it needs to be configurable at all, not CLI vs config file

In other words, TSS enrichment should not be optional. this gets into issue #11

You & @rcorces certainly know better than I do here, so I'll remove this.


nsheff · 2017-07-19T16:26:41Z

pipelines/ATACseq.py

+	# TODO: determine if it's safe to handle this requirement with argparse.
+	# It may be that communication between pypiper and a pipeline via
+	# the pipeline interface (and/or looper), and how the partial argument
+	# parsing is handled, make this more favorable.
 	if not args.input:


yeah -- args.input comes from parser = pypiper.add_pypiper_args(parser, all_args = True) -- this is standard in all our pipelines. open to other suggestions but this works at least

nsheff · 2017-07-19T16:33:09Z

pipelines/ATACseq.py

-	# filter peaks in blacklist 
+	else:
+		# MACS2
+		macs_cmd_chunks = [


I like this. it looks great!

nsheff · 2017-07-19T16:34:48Z

pipelines/ATACseq.yaml

+    q: 0.01
+    shift: 0
+  fseq:
+    of: npf    # narrowPeak as output format


crossed my mind; if you really wanted to hard-code a second backup default, you could add it here in comments; so:

l: 600 # feature length; default: 600

I think it's unnecessary personally, people can just add it in here if they change the original and want a record. but this is the place I would do it if I were going to

After pondering a bit more, I'm OK with leaving them off. I'm thinking about it like this:
"Keep our baseline configuration file as simple as possible; not providing the defaults here will still effect the same behavior from fseq. If a user really wants to fiddle with fseq, he/she should go read about the tool itself. Assuming the defaults have been suitably determined, I'd rather not invite tinkering when that may disrupt fseq in such a way that leads a user to mistakenly believe that there's a problem with the ATACseq pipeline."

nsheff · 2017-07-19T16:37:08Z

Ok, looks fine to me if you're happy with this. So we don't forget: eventually we want build_command in pypiper, but for now having it here is fine.

vreuter · 2017-07-19T16:41:42Z

Yeah definitely, that'll tidy this up a bit and make the command builder more broadly accessible.

vreuter · 2017-07-19T16:41:57Z

OK, going ahead and merging...

vreuter added 5 commits July 13, 2017 00:08

trimmer choice; tad of tidying; ignore JetBrains stuff

cddb9c3

first pass at fseq option as peak caller

3b02d30

Merge branch 'dev' of github.com:databio/ATACseq into cli-opts

7f2bf5b

avoid option conflicts and organize the config file

82e7c28

remove unused function stub

d2a7ca4

nsheff requested changes Jul 14, 2017


nsheff reviewed Jul 14, 2017


vreuter added 4 commits July 14, 2017 10:46

allow argparse to validate parameterization

be4ebae

fseq parameters are from fseq section

b8be6f1

use list-of-tuples, not dict, to store flags; facilitate a-la-carte comment-out

ec585fb

better fseq naming

6d6619d

nsheff reviewed Jul 14, 2017


vreuter added 15 commits July 14, 2017 12:28

simpler handling of the fseq options

d7e906c

first pass at use of command builder for the fseq portion

75a63f3

clearer section/stage flower boxing

e34f425

more robust initial checks for command builder, tests

79d61ba

add test reqs file

97fb244

trailing newlines for when the files are cat'd

e7484fe

add option to skip TSS enrichment, and invert the conditional

15b784d

better explanation of TSS skip cases

4b590a5

restore args.paired_end; better variable naming; better control flow

45c5f16

use the command builder

3330dce

make TSS enrichment script executable; tidy imports

693a922

work around loose config params

81105c9

handle output format

fe2c02d

figure out TypeError

fd608a3

clarity

ee1bc4a

peak counting with NGSTk

4ed9c81

vreuter changed the title from "fseq option for peak calling, and choice model for trimmer options" to "F-Seq, trimmer choices, peak count" Jul 18, 2017

account for read-less chromosomes

4a85129

nsheff reviewed Jul 18, 2017


vreuter added 9 commits July 18, 2017 17:35

incorporate the skewer input files; more consistent command construction

edbb306

always do TSS

bba5786

explain fseq failure; it needs special handling since it's Java

429ac23

fseq parameters in config; consistent command chunks; assume params are present; missing/extra commas

df7d5bd

command builder handlers null singletons; cleaner command construction

4bc5c73

remove dependency on open pypiper PR

955dd40

remove fseq defaults from pipeline itself

bc502e0

null singleton command builder tests; BDD-esque names

8230ccf

add null mixins test

1129e3c

nsheff reviewed Jul 19, 2017


nsheff reviewed Jul 19, 2017


nsheff approved these changes Jul 19, 2017


vreuter merged commit 106106d into dev Jul 19, 2017

This was referenced Jul 19, 2017

Incorporate command builder function databio/pypiper#42 (Open)

TSS enrichment failure doesn't halt the pipeline? #19 (Closed)

vreuter deleted the cli-opts branch July 20, 2017 20:00

rcorces mentioned this pull request Jul 21, 2017

Release 0.4 #21 (Merged)

