Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added new subcommand to merge TFBS in a subset of regions #268

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

mohobein
Copy link
Collaborator

@mohobein mohobein commented May 6, 2024

No description provided.

@mohobein mohobein requested a review from hschult May 6, 2024 12:28
Comment on lines +568 to +569
required.add_argument("--bindetect", "--TFBS", "--input", type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
required.add_argument("--regions", help="Path to the query regions bed file.", type=os.path.abspath, dest="regions")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
required.add_argument("--bindetect", "--TFBS", "--input", type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
required.add_argument("--regions", help="Path to the query regions bed file.", type=os.path.abspath, dest="regions")
required.add_argument("--bindetect", "--TFBS", "--input", required=True, type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
required.add_argument("--regions", required=True, help="Path to the query regions bed file.", type=os.path.abspath, dest="regions")

You have to add required=True otherwise parameters are considered optional.


#Required arguments
required = parser.add_argument_group('Required arguments')
required.add_argument("--bindetect", "--TFBS", "--input", type=os.path.abspath, dest="tfbs", help="Path to the output directory of BINDetect containing all TFBS files.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add three different names for this parameter? This could be confusing for some. Consider choosing one and an abbreviation e.g. input and i


#Optional arguments
optional = parser.add_argument_group('Optional arguments')
optional.add_argument( "--output", default='./merged_TFBS_subset.xlsx', help="Path for output file. If file name ends with .bed, no header column will be added.", type=os.path.abspath, dest="output")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Use a plain txt file as default output for example tsv or bed.
  • Also, add the default value to the help message. See the parameters of other functions, match their style and check if other parameters should have a default in the help description.

optional.add_argument("--TFs", help="Path to the file containing the list of TFs to subset.", type=os.path.abspath, dest="TF", default=None)
optional = add_logger_args(optional)

return(parser)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return(parser)
return(parser)

Git wants to have an empty line at the end of each file.

parser.print_help()
sys.exit()

run_submerge(args.tfbs, args.regions, args.TF, args.output, args.verbosity)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Suggested change
run_submerge(args.tfbs, args.regions, args.TF, args.output, args.verbosity)
run_submerge(args)

Comment on lines +87 to +89
command = f"bedtools intersect -a {regions} -b {headless_file} -wa -wb"
intersection = subprocess.check_output(command, shell=True).decode("utf-8")
all_intersections += intersection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use pybedtools

Comment on lines +105 to +107
if args.output.endswith(".bed"):
pass
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if args.output.endswith(".bed"):
pass
else:
if not args.output.endswith(".bed"):

logger.end()


def main():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main function is not really needed. Since it is not much you could just add the code to the "main-if". However, this comes down to preference consider this a comment and keep it as you like.

Comment on lines +120 to +122
# remove 'chr' str from all chr columns TODO not optimal as contigs may be named differently, i.e. 'contig' or 'chrIV'
df["query chr"] = df["query chr"].str.replace("chr", "")
df["TFBS_chr"] = df["TFBS_chr"].str.replace("chr", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do this for sorting purposes alone consider using the key parameter instead.

Comment on lines +137 to +141
df.to_csv(args.output, sep="\t", index=False)

if args.output.endswith(".xlsx"):
df = pd.read_csv(args.output, sep="\t")
df.to_excel(args.output, index=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are writing the output file then you read it back in and then you overwrite it as an excel file? What? Please explain :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants