Skip comments in BED files #13

Closed
edmundmiller opened this issue Feb 28, 2022 · 3 comments · Fixed by #14
Labels
enhancement (New feature or request), formats (format specification or parsing)

Comments

@edmundmiller

Expected Behavior

The BED file reader should respect comments. I'm not sure whether comments are in the BED spec or whether they just won't be supported, but I'm filing this issue for others to reference.

Current Behavior

When trying to read a file with comments (prefixed with #, see the included example file), this error is thrown:

LoadError: ArgumentError: malformed file

Possible Solution / Implementation

Allow comments

Steps to Reproduce (for bugs)

Sorry, GH issues don't support BED files, apparently.

# HOMER Peaks
# Peak finding parameters:
# tag directory = GM_tagdir
#
# total peaks = 158781
# peak size = 153
# peaks found using tags on both strands
# minimum distance between peaks = 306
# fragment length = 152
# genome size = 2000000000
# Total tags = 322230773.0
# Total tags in peaks = 93989466.0
# Approximate IP efficiency = 29.17%
# tags per bp = 0.114786
# expected tags per peak = 17.562
# maximum tags considered per bp = 16.0
# effective number of tags used for normalization = 10000000.0
# Peaks have been centered at maximum tag pile-up
# FDR rate threshold = 0.001000000
# FDR effective poisson threshold = 1.591138e-05
# FDR tag threshold = 38.0
# number of putative peaks = 682969
#
# size of region used for local filtering = 10000
# Fold over local region required = 4.00
# Poisson p-value over local region required = 1.00e-04
# Putative peaks filtered by local signal = 523066
#
# Maximum fold under expected unique positions for tags = 2.00
# Putative peaks filtered for being too clonal = 1122
#
# cmd = findPeaks GM_tagdir -style factor -o GM.peaks.txt
#
# Column Headers:
#PeakID	chr	start	end	strand	Normalized Tag Count	focus ratio	findPeaks Score	Fold Change vs Local	p-value vs Local	Clonal Fold Change
chr21	8401346	8401499	chr21-3	1	+
chr21	8445578	8445731	chr21-1	1	+
chr21	8218308	8218461	chr21-2	1	+

for interval in open(BED.Reader, "test.bed")
    print(BED.chrom(interval))
end

Context

Your Environment

  • Package Version used:
  • Julia Version used:
  • Operating System and version (desktop or mobile):
  • Link to your project:
@kescobo
Member

kescobo commented Feb 28, 2022

Looks like using # to denote a header is common, but not part of the official spec.

🙄 Sigh.

Just earlier today there was some conversation on Slack about something similar in FASTX.jl. So many of these biological formats have poorly defined specs that get extended or modified in various ways that people then expect to work. Supporting all of the variants while maintaining the strict validation that we'd like to have is challenging, both conceptually and in terms of code complexity and maintenance.

I don't know exactly what we should target support-wise for .bed, but I think minimally including track, browser, and # header lines might be workable and worth doing.
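For illustration only, here is a rough sketch of how the three kinds of non-record lines mentioned above (track, browser, and # lines) could be told apart from interval records. The helper name `is_bed_header_line` is hypothetical and is not part of BED.jl; it also assumes that such lines start at the first column, with no leading whitespace.

```julia
# Hypothetical helper, not part of BED.jl: returns true for lines that carry
# metadata (track, browser, or # lines) rather than an interval record.
# Assumption: these lines begin at column one, with no leading whitespace.
function is_bed_header_line(line::AbstractString)
    # "track" and "browser" must be whole words, so a chromosome that happens
    # to be named "track1" is still treated as a record.
    return occursin(r"^(#|track(\s|$)|browser(\s|$))", line)
end
```

With this sketch, the `# HOMER Peaks` lines in the example file would be classified as header lines, while the `chr21` rows would be left for the record parser.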

kescobo added the enhancement and formats labels Feb 28, 2022
@edmundmiller
Author

edmundmiller commented Mar 31, 2022

@CiaranOMara
Member

CiaranOMara commented Apr 2, 2022

GA4GH BED v1.0: A formal standard sets ground rules for genomic features

The article references https://github.com/samtools/hts-specs/blob/master/BEDv1.pdf, and section 1.4.2 therein states that comment lines may occur anywhere within the BED file.
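Since comment lines may appear anywhere in the file, a possible stopgap on the user side is to drop them before the data reaches the parser. The following is a minimal sketch using only Julia's Base library; `strip_bed_comments` is a hypothetical name, not a BED.jl function, and it copies the whole input into memory, so it may not suit very large files.

```julia
# Hypothetical workaround, not part of BED.jl: copy a BED stream into an
# in-memory buffer, skipping every line that starts with '#', wherever it occurs.
function strip_bed_comments(input::IO)
    buffer = IOBuffer()
    for line in eachline(input)
        startswith(line, "#") && continue  # comment line: skip it
        println(buffer, line)              # record line: keep it
    end
    seekstart(buffer)  # rewind so a reader starts at the first kept line
    return buffer
end
```

Assuming `BED.Reader` accepts an `IO` object, the returned buffer could be passed to it in place of the file, for example via `open(strip_bed_comments, "test.bed")`.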
