added new subcommand to merge TFBS in a subset of regions #268

mohobein · 2024-05-06T12:28:00Z

No description provided.

hschult · 2024-06-05T07:50:21Z

tobias/parsers.py

+	required.add_argument("--bindetect", "--TFBS", "--input", type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
+	required.add_argument("--regions", help="Path to the query regions bed file.", type=os.path.abspath, dest="regions")


Suggested change

-	required.add_argument("--bindetect", "--TFBS", "--input", type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
-	required.add_argument("--regions", help="Path to the query regions bed file.", type=os.path.abspath, dest="regions")
+	required.add_argument("--bindetect", "--TFBS", "--input", required=True, type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
+	required.add_argument("--regions", required=True, help="Path to the query regions bed file.", type=os.path.abspath, dest="regions")

You have to add required=True; otherwise the parameters are considered optional, even though they sit in the 'Required arguments' group.
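
To illustrate the point: the title of an argument group is only a label in the help output, so argparse still treats an option in a group called 'Required arguments' as optional unless required=True is set. A minimal sketch, using a hypothetical stand-alone parser (build_parser and the prog name are made up for this example) rather than the one in tobias/parsers.py:

```python
import argparse


def build_parser(enforce):
    """Build a toy parser; `enforce` toggles required=True on --regions."""
    parser = argparse.ArgumentParser(prog="submerge-demo")  # hypothetical name
    required = parser.add_argument_group("Required arguments")
    required.add_argument("--regions", dest="regions", required=enforce,
                          help="Path to the query regions bed file.")
    return parser


# Without required=True a missing option silently becomes None.
lenient = build_parser(enforce=False).parse_args([])
print(lenient.regions)

# With required=True argparse reports a usage error and exits instead.
try:
    build_parser(enforce=True).parse_args([])
    missing_rejected = False
except SystemExit:
    missing_rejected = True
print(missing_rejected)
```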

hschult · 2024-06-05T07:50:55Z

tobias/parsers.py

+
+	#Required arguments
+	required = parser.add_argument_group('Required arguments')
+	required.add_argument("--bindetect", "--TFBS", "--input", type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")


Why did you add three different names for this parameter? This could be confusing for some users. Consider choosing one name plus an abbreviation, e.g. --input and -i.

hschult · 2024-06-05T08:03:49Z

tobias/parsers.py

+
+	#Optional arguments
+	optional = parser.add_argument_group('Optional arguments')
+	optional.add_argument( "--output", default='./merged_TFBS_subset.xlsx', help="Path for output file. If file name ends with .bed, no header column will be added.", type=os.path.abspath, dest="output")


Use a plain-text format as the default output, for example tsv or bed.

Also, add the default value to the help message. See the parameters of other functions, match their style and check if other parameters should have a default in the help description.
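
For reference, argparse can insert the default into the help text through the %(default)s placeholder, which keeps the message and the actual value in sync. A sketch with a hypothetical parser and a hypothetical tsv default; the style already used by the other parsers in the project may differ and should take precedence:

```python
import argparse

parser = argparse.ArgumentParser(prog="submerge-demo")  # hypothetical name
optional = parser.add_argument_group("Optional arguments")
optional.add_argument(
    "--output",
    dest="output",
    default="merged_TFBS_subset.tsv",  # hypothetical plain-text default
    help="Path for output file (default: %(default)s).",
)

# argparse substitutes the placeholder when the help text is rendered.
help_text = parser.format_help()
print(help_text)
```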

hschult · 2024-06-05T08:04:17Z

tobias/parsers.py

+	optional.add_argument("--TFs", help="Path to the file containing the list of TFs to subset.", type=os.path.abspath, dest="TF", default=None)
+	optional = add_logger_args(optional)
+
+	return(parser)


Suggested change

-	return(parser)
+	return(parser)

The only difference is the trailing newline: Git expects every file to end with a newline character.

hschult · 2024-06-05T08:33:04Z

tobias/tools/submerge.py

+        parser.print_help()
+        sys.exit()
+
+    run_submerge(args.tfbs, args.regions, args.TF, args.output, args.verbosity)


Why not pass the whole args object?

Suggested change

-    run_submerge(args.tfbs, args.regions, args.TF, args.output, args.verbosity)
+    run_submerge(args)

hschult · 2024-06-05T09:17:49Z

tobias/tools/submerge.py

+        command = f"bedtools intersect -a {regions} -b {headless_file} -wa -wb"
+        intersection = subprocess.check_output(command, shell=True).decode("utf-8")
+        all_intersections += intersection


Use pybedtools here instead of building a bedtools command and running it through the shell.
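
A rough sketch of what the same intersection could look like with pybedtools, assuming the package and the bedtools binary are installed. The function name intersect_regions and its parameters are hypothetical; BedTool.intersect forwards keyword arguments such as wa and wb to the corresponding bedtools flags:

```python
def intersect_regions(regions_path, tfbs_path):
    """Return the `bedtools intersect -wa -wb` result as a string.

    Hypothetical helper; pybedtools is imported inside the function so the
    dependency is only needed when the function is actually called.
    """
    import pybedtools

    regions = pybedtools.BedTool(regions_path)
    tfbs = pybedtools.BedTool(tfbs_path)
    # wa=True / wb=True correspond to the -wa / -wb command-line flags.
    result = regions.intersect(tfbs, wa=True, wb=True)
    return str(result)
```

This avoids assembling a command string and running it with shell=True, so paths containing spaces or shell metacharacters need no quoting.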

hschult · 2024-06-05T09:18:36Z

tobias/tools/submerge.py

+        if args.output.endswith(".bed"):
+            pass
+        else:


Suggested change

-        if args.output.endswith(".bed"):
-            pass
-        else:
+        if not args.output.endswith(".bed"):

hschult · 2024-06-05T09:22:15Z

tobias/tools/submerge.py

+    logger.end()
+
+
+def main():


The main function is not really needed. Since it contains so little, you could just move its code into the if __name__ == "__main__": block. However, this comes down to preference; consider this a comment and keep it as you like.

hschult · 2024-06-05T09:28:25Z

tobias/tools/submerge.py

+    # remove 'chr' str from all chr columns TODO not optimal as contigs may be named differently, i.e. 'contig' or 'chrIV'
+    df["query chr"] = df["query chr"].str.replace("chr", "")
+    df["TFBS_chr"] = df["TFBS_chr"].str.replace("chr", "")


If you do this for sorting purposes alone, consider using the key parameter of the sort call instead, so the chromosome names themselves stay unchanged.
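
A minimal sketch of that idea with pandas, whose DataFrame.sort_values accepts a key callable. The helpers chrom_key and sort_by_chrom and the toy data are hypothetical; the ordering rule (numeric names first, everything else alphabetically) is an assumption, not necessarily what the PR intends:

```python
import re

import pandas as pd


def chrom_key(name):
    """Numeric chromosomes first (in numeric order), then the rest alphabetically."""
    bare = re.sub(r"^chr", "", str(name))
    if bare.isdigit():
        return (0, int(bare), "")
    return (1, 0, bare)


def sort_by_chrom(df, column):
    """Sort df by a chromosome column without modifying its values."""
    ordered = sorted(df[column].unique(), key=chrom_key)
    rank = {name: position for position, name in enumerate(ordered)}
    # `key` receives the column as a Series and must return a Series.
    return df.sort_values(column, key=lambda col: col.map(rank), kind="mergesort")


df = pd.DataFrame({"TFBS_chr": ["chr10", "chr2", "chrX", "chr1"]})
sorted_df = sort_by_chrom(df, "TFBS_chr")
print(sorted_df["TFBS_chr"].tolist())
```

Names that do not follow the chr<number> pattern, such as 'chrIV' or 'contig', are kept as they are and simply sorted after the numeric ones.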

hschult · 2024-06-05T09:34:08Z

tobias/tools/submerge.py

+    df.to_csv(args.output, sep="\t", index=False)
+
+    if args.output.endswith(".xlsx"):
+        df = pd.read_csv(args.output, sep="\t")
+        df.to_excel(args.output, index=False)


You are writing the output file, then reading it back in, and then overwriting it as an Excel file? What? Please explain :D
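
If the round trip turns out to serve no purpose, one possible alternative is to pick the writer once, based on the file extension. A sketch under that assumption; write_table is a hypothetical helper, and the rule that .bed output gets no header is taken from the --output help text:

```python
import pandas as pd


def write_table(df, path):
    """Write df exactly once, choosing the format from the file extension."""
    if path.endswith(".xlsx"):
        # Needs an Excel engine such as openpyxl to be installed.
        df.to_excel(path, index=False)
    else:
        # Tab-separated text; .bed files get no header line.
        df.to_csv(path, sep="\t", index=False, header=not path.endswith(".bed"))
```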

added new subcommand to merge TFBS in a subset of regions

3b0aaec

mohobein requested a review from hschult May 6, 2024 12:28

hschult requested changes Jun 5, 2024
