-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loss of DF multi-index when joining a DF without multi-index #429
Comments
I believe it is a bug, but can you please provide a self-containing example for easier reproduction? |
@zverok sorry for the delay, will get to this today. |
OK here we go: [1] pry(#<Nemawashi::Analytics::Transforms::Update>)> @source.data
=> #<Daru::DataFrame(4x7)>
unit testA_ft1 testA_ws1 testB_ft1 testB_ws1 testC_ft1 testC_ws1
1 1 nil 0.0 nil 0.0 nil 1.0
2 2 0.0 nil 1.0 nil 0.0 nil
3 3 nil 0.0 nil 0.0 nil 1.0
4 4 0.0 nil 1.0 nil 0.0 nil
[2] pry(#<Nemawashi::Analytics::Transforms::Update>)> dataobj
=> #<Daru::DataFrame(4x4)>
unit testA_ws1 testD_ws1 testE_ws1
1 1 1.0 0.0 1.0
2 2 0.0 1.0 0.0
3 3 1.0 1.0 0.0
4 4 0.0 1.0 0.0
[3] pry(#<Nemawashi::Analytics::Transforms::Update>)> @source.data.index
=> #<Daru::MultiIndex(4x1)>
1
2
3
4
[4] pry(#<Nemawashi::Analytics::Transforms::Update>)> dataobj.index
=> #<Daru::MultiIndex(4x1)>
1
2
3
4
[5] pry(#<Nemawashi::Analytics::Transforms::Update>)> updated_dataframe = @source.data.join(dataobj, how: :left, on: match_columns)
=> #<Daru::DataFrame(4x10)>
testA_ft1 testA_ws1_ testB_ft1 testB_ws1 testC_ft1 testC_ws1 unit testA_ws1_ testD_ws1 testE_ws1
0 nil 0.0 nil 0.0 nil 1.0 1 1.0 0.0 1.0
1 0.0 nil 1.0 nil 0.0 nil 2 0.0 1.0 0.0
2 nil 0.0 nil 0.0 nil 1.0 3 1.0 1.0 0.0
3 0.0 nil 1.0 nil 0.0 nil 4 0.0 1.0 0.0
[6] pry(#<Nemawashi::Analytics::Transforms::Update>)> updated_dataframe.index
=> #<Daru::Index(4): {0, 1, 2, 3}> In reality, it seems like any join between a multiindex DF and any type of indexed DF results in a Daru::Index DF. I believe this is related to the merge issue. |
Please check if it works on current |
@zverok I am still seeing the join method turn 2 DFs with multiindex into a single index but my example perhaps is not the best due to the fact that it looks like daru is actually 'optimizing' away the multi-index. [1] pry(#<Nemawashi::Analytics::Transforms::Update>)> @source.data.index.to_a
=> [[1], [2], [3], [4]]
[2] pry(#<Nemawashi::Analytics::Transforms::Update>)> dataobj.index.to_a
=> [[1], [2], [3], [4]]
[3] pry(#<Nemawashi::Analytics::Transforms::Update>)>
60: binding.pry
61: updated_dataframe = @source.data.join(dataobj, how: :left, on: match_columns)
=> 62: binding.pry
[2] pry(#<Nemawashi::Analytics::Transforms::Update>)> updated_dataframe.index.to_a
=> [0, 1, 2, 3] Is this intended behavior? I will try to find a better example where the pre-join index cannot be flattened to be the same as the post-join index. |
Well, in fact, currently:
I have a suspicion that the most reasonable behavior would be "make join columns an index", but not sure it should/would/can work (see "duplicate index discussion" here ;)). Anyways, I am pretty sure your initial request (preserving index of both dataframes on join) can't be performed. Why do you believe otherwise? |
@zverok if two dataframes have the same index shouldn't the index be preserved? |
But why it should? |
@zverok so let's say my multi-index is [:work_week, :factory] and i pull yield data from one database and I pull volume forecast data from another database that uses the same multi-index of [:work_week, :factory] and want them to be joined. why should I lose the matching multi-index on the subsequently created dataframe? |
By which column you join them, then? |
those 2 columns. |
So, you join them on index? |
Hi,
I am seeing a multi-index DF lose its multi-index (goes to 0,1,2,3...) when doing a left join with a DF without a multi-index. Is that the expected behavior?
thx
The text was updated successfully, but these errors were encountered: