Error when parsing Study or Assay files with repeated header names (as Term Source REF) #1

Closed
agbeltran opened this issue Jun 20, 2012 · 1 comment

Comments

@agbeltran
Contributor

Error when parsing Study or Assay files with repeated header names (as Term Source REF)
The code builds a named tuple to store the attributes that span multiple columns, and named tuples don't allow duplicate field names.
Output for a few datasets is below.
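For reference, the underlying restriction can be reproduced without any ISA-Tab files. This is only a minimal sketch: the field names are illustrative, the underscore normalization is an assumption based on the 'Term_Source_REF' name in the error message, and `try_namedtuple` is a hypothetical helper, not part of bcbio.

```python
import collections


def try_namedtuple(field_names):
    """Return (Attrs, None) on success or (None, message) on failure.

    Hypothetical helper for illustration; not part of bcbio.isatab.
    """
    try:
        return collections.namedtuple("Attrs", field_names), None
    except ValueError as err:
        return None, str(err)


# Illustrative header names with spaces replaced by underscores
# (an assumption about how the parser normalizes them).
unique_names = ["Term_Source_REF", "Term_Accession_Number"]
repeated_names = unique_names + ["Term_Source_REF"]

ok_type, ok_error = try_namedtuple(unique_names)
bad_type, bad_error = try_namedtuple(repeated_names)

print(ok_type._fields)
print(bad_error)
```

The second call fails in the same way as the tracebacks below, because `collections.namedtuple` rejects any repeated field name.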

Error with Yox1 data

Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from bcbio import isatab
>>> rec = isatab.parse("/Users/agbeltran/workspace/datasets/Yox1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 192, in parse
["Raw Data File"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 248, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 259, in namedtuple
ValueError: Encountered duplicate field name: 'Term_Source_REF'

Error with BII-S-6

>>> rec = isatab.parse("/Users/agbeltran/workspace/datasets/BII-S-6")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 185, in parse
["Sample Name", "Comment[ENA_SAMPLE]"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 248, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 259, in namedtuple
ValueError: Encountered duplicate field name: 'Term_Source_REF'

Error with mtbls2

>>> rec = isatab.parse("/Users/agbeltran/workspace/datasets/mtbls2")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 192, in parse
["Raw Data File"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 248, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 259, in namedtuple
ValueError: Encountered duplicate field name: 'Term_Source_REF'

chapmanb added a commit that referenced this issue Jun 21, 2012
…iles to be used for organizing. Closes issue #1
@chapmanb
Member

Alejandra;
Thanks for the report on this issue. I checked in some fixes that handle this: I was missing 'Parameter Value' when collapsing the header into sections, so those columns were folded into one huge section that ended up with multiple Term Source REF columns, which caused the duplicate field name error. I also added a test for BII-S-6 to cover this node type.
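To illustrate the failure mode described above, here is a rough sketch of how a missing section prefix can produce repeated names. It is not the actual parser code: `group_header`, `has_duplicates`, the prefix tuples, and the example header are all hypothetical, and real ISA-Tab files contain more column types than shown here.

```python
# Hypothetical list of column prefixes that start a new attribute section.
SECTION_STARTS = ("Characteristics", "Factor Value", "Parameter Value", "Comment")


def group_header(header, starts=SECTION_STARTS):
    """Group column indexes into sections.

    A column whose name begins with one of `starts` opens a new section;
    any other column (for example Term Source REF) joins the current one.
    """
    groups = []
    for index, name in enumerate(header):
        if name.startswith(starts) or not groups:
            groups.append([index])
        else:
            groups[-1].append(index)
    return groups


def has_duplicates(header, group):
    """True if any column name occurs more than once within a section."""
    names = [header[i] for i in group]
    return len(names) != len(set(names))


# Made-up header for illustration only.
header = [
    "Characteristics[organism]", "Term Source REF",
    "Parameter Value[instrument]", "Term Source REF",
]

with_param = group_header(header)
# Same grouping, but with 'Parameter Value' left out of the prefix list.
without_param = group_header(header, starts=("Characteristics", "Factor Value", "Comment"))

print(with_param, [has_duplicates(header, g) for g in with_param])
print(without_param, [has_duplicates(header, g) for g in without_param])
```

With 'Parameter Value' in the prefix list, each section holds a single Term Source REF column. Without it, all four columns land in one section that contains the name twice, which is the condition a named tuple cannot represent.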
