Problem merging tables with overlapping broadcast relationships #38

Open
smmaurer opened this issue Jun 16, 2018 · 0 comments
I'm having trouble merging sets of tables with overlapping broadcast relationships.

For example, these combinations run:

  • broadcasts from A -> B and A -> C, merge tables A, B, C
  • broadcasts from A -> B and B -> C, merge tables A, B, C

But this combination raises an error:

  • broadcasts from A -> B, A -> C, B -> C, merge tables A, B, C
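What sets the failing combination apart is that table A can reach table C by two routes: directly, and through B. The following is a minimal sketch in plain Python of a check for that condition. It does not use or describe Orca's internals; the helper `path_counts` and the edge-list representation are hypothetical, and the broadcasts are assumed to contain no cycles.

```python
from collections import Counter

def path_counts(broadcasts, target):
    """Count the distinct broadcast paths from each table to `target`.

    `broadcasts` is a list of (cast, onto) pairs. This is a hypothetical
    helper for illustration and assumes the broadcasts form no cycles.
    """
    counts = Counter()

    def walk(onto):
        for cast, dest in broadcasts:
            if dest == onto:
                counts[cast] += 1
                walk(cast)  # follow tables that are broadcast onto `cast`

    walk(target)
    return counts

# The chained case that runs, and the overlapping case that raises
chained = [('a', 'b'), ('b', 'c')]
overlapping = [('a', 'b'), ('a', 'c'), ('b', 'c')]

chained_counts = path_counts(chained, 'c')
overlapping_counts = path_counts(overlapping, 'c')

# Tables with more than one path to the target make the merge ambiguous
ambiguous = sorted(t for t, n in overlapping_counts.items() if n > 1)
```

With the overlapping broadcasts, `ambiguous` contains only `'a'`, the same table named in the `KeyError` reported below. Whether that is the actual cause inside `merge_tables` is not established here.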

This came up in real-world use (https://github.com/ual/urbansim_parcel_bayarea/issues/11), but here's a stand-alone demonstration that you can paste into a Python script:

import orca
import pandas as pd

a = pd.DataFrame({'ix': [1,2], 'val_a': ['a1','a2']})
b = pd.DataFrame({'ix': [1,2], 'val_b': ['b1','b2'], 'a': [1,2]})
c = pd.DataFrame({'ix': [1,2], 'val_c': ['c1','c2'], 'a': [1,2], 'b': [1,2]})

orca.add_table('a', a.set_index('ix'))
orca.add_table('b', b.set_index('ix'))
orca.add_table('c', c.set_index('ix'))

orca.broadcast(cast='a', onto='b', cast_index=True, onto_on='a')
orca.broadcast(cast='b', onto='c', cast_index=True, onto_on='b')

df = orca.merge_tables(target='c', tables=['c', 'b', 'a'])  # runs without error

orca.broadcast(cast='a', onto='c', cast_index=True, onto_on='a')

df = orca.merge_tables(target='c', tables=['c', 'b', 'a'])  # throws error

Here is the error:

Twin-Clouds-iMac:Desktop maurer$ python test.py
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    df = orca.merge_tables(target='c', tables=['c', 'b', 'a'])  # throws error
  File "/Users/maurer/Dropbox/Git-imac/udst/orca/orca/orca.py", line 1799, in merge_tables
    cast_table = frames[cast]
KeyError: 'a'

This is a bug, right? I can see how it's a potentially ambiguous merge, but if we just resolve it in a consistent way it seems like a supportable use case. Overlapping broadcasts are helpful if you want to do different merge combinations at different times with maximum efficiency.
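As an illustration of what one consistent resolution could look like, here is a sketch in plain pandas that performs the same three-table merge by hand, without Orca. The rule it applies (follow the chain A -> B -> C and ignore the direct A -> C link) is only an assumption chosen for the example, not Orca's behavior or a proposed fix. The variable names `b_with_a` and `merged` are hypothetical.

```python
import pandas as pd

# Same tables as in the demonstration script, indexed by 'ix'
a = pd.DataFrame({'ix': [1, 2], 'val_a': ['a1', 'a2']}).set_index('ix')
b = pd.DataFrame({'ix': [1, 2], 'val_b': ['b1', 'b2'],
                  'a': [1, 2]}).set_index('ix')
c = pd.DataFrame({'ix': [1, 2], 'val_c': ['c1', 'c2'],
                  'a': [1, 2], 'b': [1, 2]}).set_index('ix')

# Step 1: cast a onto b, matching b's 'a' column to a's index
b_with_a = b.merge(a, left_on='a', right_index=True)

# Step 2: cast the result onto c, matching c's 'b' column to b's index.
# b's own 'a' column is dropped first so it does not collide with c's 'a'.
merged = c.merge(b_with_a.drop(columns='a'), left_on='b', right_index=True)
```

Here both routes agree because `c['a']` matches the `a` value reached through `b`. When they disagree, the chosen rule decides which `val_a` ends up in the result, which is the ambiguity mentioned above.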

I don't see an obvious source for the error, but will dig into it more when I have a chance.

I'm running Orca 1.5.1 and pandas 0.22.0.
