KeyError: 'alignment_id' #151

Closed
sacdallago opened this issue Mar 24, 2018 · 9 comments

@sacdallago
Member

This problem occurs on stable in the compare stage and causes later stages to crash.

AFAIK it happens when no PDB structure is found to compare against. The complete stack trace follows.

[cd174@login04 BB22OPb]$ cat Effector.failed
Traceback (most recent call last):
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'alignment_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/evcouplings/utils/pipeline.py", line 454, in execute_wrapped
    outcfg = execute(**config)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/evcouplings/utils/pipeline.py", line 174, in execute
    outcfg = runner(**incfg)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/evcouplings/compare/protocol.py", line 1034, in run
    return PROTOCOLS[kwargs["protocol"]](**kwargs)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/evcouplings/compare/protocol.py", line 499, in standard
    "prefix": aux_prefix,
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/evcouplings/compare/protocol.py", line 101, in _identify_structures
    **kwargs
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/evcouplings/compare/sifts.py", line 772, in by_alignment
    mapping_df, on=hit_columns
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/frame.py", line 5370, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 57, in merge
    validate=validate)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 565, in __init__
    self.join_names) = self._get_merge_keys()
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 837, in _get_merge_keys
    left_keys.append(left[lk]._values)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_stable/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'alignment_id'

In the config for this run, the PDB alignment method is set as follows:

  pdb_alignment_method: jackhmmer

@aggreen should have fixed this on develop, but because of the circular dependency it's hard to test, so I would keep this open until the fix is confirmed. Also, for reference: I remember @aggreen saying yesterday that it's odd this also happened with jackhmmer.

@aggreen
Contributor

aggreen commented Mar 24, 2018

I have verified that this is fixed on develop for both jackhmmer and hmmsearch options.

Jackhmmer: I tested this using the config file @sacdallago sent this morning, which was yielding the error using the jackhmmer option.

Hmmsearch: tested earlier this week with Bobby

Not sure if we should close this issue as it is still something that is broken on the master branch and stable environment on o2.

@sacdallago
Member Author

sacdallago commented Mar 25, 2018

No; if a fix is merged and tested in dev, we can close :)

And thanks @aggreen

aggreen reopened this Mar 29, 2018

@aggreen
Contributor

aggreen commented Mar 29, 2018

This issue is fixed in the master branch of my development environment (or at least, it does not occur there). However, it still occurs in evcouplings_develop on o2. I have checked the following:

  1. There are no differences between my master branch and the evcouplings develop branch on GitHub.

  2. The same config file produces no error in my environment (agg_evcomplex) and an error in the evcouplings_develop environment.

diff ../../../../software/anaconda_o2/envs/evcouplings_develop/lib/python3.6/site-packages/evcouplings/compare/sifts.py ../EVcouplings/evcouplings/compare/sifts.py 

diff ../../../../software/anaconda_o2/envs/evcouplings_develop/lib/python3.6/site-packages/evcouplings/align/tools.py ../EVcouplings/evcouplings/align/tools.py 

return no differences between the two files (my environment package vs development environment package).

(evcouplings_develop) ag300@login02:/n/groups/marks/users/agreen/dev/tests$ which evcouplings
/n/groups/marks/software/anaconda_o2/envs/evcouplings_develop/bin/evcouplings
  3. My orchestra environment is up to date with my master on GitHub.

So, I'm really at a loss as to what could be causing this problem, and as of now I can't say that the development branch is really working. I'm attaching a culprit config file if anyone would like to try to reproduce this issue.

Are we sure that the evcouplings_develop is really using the evcouplings found in the site packages folder? That's the only other possibility I can think of.
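One way to answer that question from inside the environment is to ask Python where it would load the package from. This is a minimal sketch; `module_origin` is a hypothetical helper name, and `json` stands in for `evcouplings` only so that the snippet runs anywhere:

```python
import importlib.util


def module_origin(name):
    """Return the file path a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# In the environment in question one would pass "evcouplings" here;
# "json" is used only so that this sketch runs without evcouplings installed.
origin = module_origin("json")
print(origin)
```

If the printed path does not point into the environment's site-packages folder, a different copy of the package is being imported.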

ptch1_loop1_b0.1_config_dev.txt

@thomashopf
Contributor

@aggreen I reproduced the same problem on o2 with the evcouplings_develop environment. I am a bit confused though... I don't see any recent commits of yours in here: https://github.com/debbiemarkslab/EVcouplings/commits/develop/evcouplings/compare/sifts.py

Could you point me to the commit where you fixed the issue?

@aggreen
Contributor

aggreen commented Mar 30, 2018

I believe this was fixed by the January 2nd commit, c2a166f.
By creating that raw focus alignment file with the query sequence, we guaranteed that hmmsearch would always return at least the query sequence as a hit.

Can you reproduce the lack of a problem with my environment, agg_evcomplex? This may be a new error which I can't reproduce for whatever reason, but it seems like the same issue as was fixed before.

@thomashopf
Contributor

thomashopf commented Mar 30, 2018

Seems to be some sort of pandas version issue... the by_alignment() call worked successfully on my local machine with pandas 0.19 until I upgraded to 0.22; now I get the same problem as on evcouplings_develop, which is on 0.22. Your agg_evcomplex environment is on 0.20.3...

I'll have a look now to see if I can track down why this suddenly breaks with the latest version of pandas.
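Comparing environments like this comes down to printing the installed pandas version in each one. A minimal sketch, assuming Python 3.8 or later for `importlib.metadata` (on the Python 3.6 environments in this thread, `pandas.__version__` gives the same information); `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this once per environment and compare the output.
print("pandas:", installed_version("pandas"))
```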

@thomashopf
Contributor

thomashopf commented Mar 30, 2018

This is a groupby aggregation gone wrong: the dataframe is empty because no structures were found, the index columns are lost during aggregation because the dataframe is empty, and they are then missing in the subsequent merge. Allegedly, this issue pandas-dev/pandas#8093 has been fixed in pandas 0.22, but I find it odd that this is the very release that first caused the problem for us. Anyway, it's fixed now and I'm merging into develop in a second.
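The failure mode described here can be guarded against independently of the pandas version. The following is a minimal sketch, not the fix that was actually merged into develop: `merge_hits` is a hypothetical helper, and every column name other than `alignment_id` is made up for illustration.

```python
import pandas as pd


def merge_hits(hits, mapping_df, on):
    """Merge hits with the mapping table, tolerating an empty hits table.

    Hypothetical helper: if `hits` is empty or has lost any of the key
    columns in `on`, return an empty frame with the expected columns
    instead of calling merge, which would fail on the missing key.
    """
    missing = [col for col in on if col not in hits.columns]
    if hits.empty or missing:
        # Expected columns: those of hits, then the keys, then the mapping columns
        columns = list(hits.columns)
        for col in list(on) + list(mapping_df.columns):
            if col not in columns:
                columns.append(col)
        return pd.DataFrame(columns=columns)

    return hits.merge(mapping_df, on=on)


# Made-up mapping table for illustration only
mapping = pd.DataFrame({"alignment_id": ["a1"], "pdb_id": ["1abc"]})

# Empty hits table with no columns at all, as after the failed aggregation
empty_result = merge_hits(pd.DataFrame(), mapping, on=["alignment_id"])

# Regular case: one hit that matches the mapping
hits = pd.DataFrame({"alignment_id": ["a1"], "score": [42.0]})
full_result = merge_hits(hits, mapping, on=["alignment_id"])
```

Returning an empty frame with the expected columns lets downstream code treat "no structures found" as an ordinary, empty result rather than an exception.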

@thomashopf
Contributor

Should be working now on any pandas version (I haven't updated the evcouplings_develop environment yet). Closing for now, please reopen if this resurfaces ever again.

@aggreen
Contributor

aggreen commented Mar 30, 2018 via email
