Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Excel Input eats incoming fields, but still passes the metadata for them downstream (as if they still exist) #3221

Closed
usbrandon opened this issue Sep 12, 2023 · 8 comments · Fixed by #3447
Assignees
Labels
bug P2 Default Priority Transforms
Milestone

Comments

@usbrandon
Copy link
Contributor

Apache Hop version?

2.5-GA

Java version?

11

Operating system

Windows

What happened?

I wish the Excel Input had an option to pass incoming rows downstream, but it currently prevents incoming rows from passing through, but it does still forget to update the metadata and remove their definitions from the stream. Other steps like Select still think the fields exist in the stream, but as you can see from the preview on Excel, it only knows of the 16 fields configured from running that transform, not the 3 that came in before it. Those first three are still picked up by the Select transform downstream (to the right).

image

Issue Priority

Priority: 2

Issue Component

Component: Transforms

@mattcasters
Copy link
Contributor

Can you attach a sample @usbrandon? My eyes aren't that great anymore these days ;-)

@bjackson-ep
Copy link

excel-different-layout-solution.zip
image

There are 3 fields that go into the Excel Input Transform. The Excel file has 3 fields to output itself.
Notice when you check the dummy step output shows 3 fields only from the excel file. However, if you "Show Output Fields" it shows 6 (the three from the data grid, and the three from the Excel file). It would be great to have the option of letting those incoming fields pass through, or if that is not possible, to at least remove them from the stream that comes out of the Excel File Input. As is, if you have a Select Transform, it will see A, B, C, from the datagrid and the others from the excel file, but will mess up and throw errors because somehow it does not have real access to A, B, C.

Generally I like the option to either let incoming fields flow through or not.
This case is a bug because it looks like the transform did not intend to let those incoming fields flow out, but let the metadata for them flow out anyway.

@bjackson-ep
Copy link

My included example is 2.6-GA

@sramazzina sramazzina self-assigned this Nov 20, 2023
@sramazzina
Copy link
Contributor

Ciao @usbrandon. The standard behavior of this transform is to stop the incoming flow to go through the Excel Input. Overall, the best option in my opinion is to fix the discrepancy you have in your visualization. I mean, what you see in the "Show output fields" is wrong.

Apart from this, adding the flag you are asking for is trivial but, in case the user set it, we must decide what to do. IMO the way to go is cartesian product between the input rows and Excel file rows. I think we don't have any other options.

@usbrandon
Copy link
Contributor Author

Hello @sramazzina! Thank you for taking a look. I agree with you on both fronts. By default the transform should not emit the incoming fields or metadata. If there were a checkbox to allow them to flow though, you are right make them appear for each row of the output. That is consistent with other steps.

Happy to write the checkbox thing as a feature request. The incoming field metadata escaping definitely is a bug and screws up downstream steps by listing fields that are not present in the stream from before the Excel transform.

Thanks for whatever you can do to fix it. I very much appreciate it.

@sramazzina
Copy link
Contributor

Ciao @usbrandon thanks for your answer. The fix is almost done I was just waiting your confirmation to complete it. I will let you know when it will be ready so that you can test it. Have a nice day ;-)

sramazzina added a commit to sramazzina/hop that referenced this issue Nov 22, 2023
…e metadata for them downstream (as if they still exist)
@sramazzina
Copy link
Contributor

Hi @usbrandon, after a careful evaluation I decided to proceed differently and just fix the issue that let unwanted metadata move along.

About the idea of having a flag that gives the ability to pass-through incoming rows, well I decided to postpone that request and ask you to open a feature request because is not that easy to be done and requires careful evaluation. Moreover, the same behavior can be obtained very easily by using a cartesian product the way it is shown below.

image

Attached the sample I shown you in the picture

3221.zip

@sramazzina sramazzina mentioned this issue Nov 22, 2023
5 tasks
@usbrandon
Copy link
Contributor Author

usbrandon commented Nov 22, 2023 via email

usbrandon added a commit that referenced this issue Dec 28, 2023
@hansva hansva added this to the 2.8 milestone Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug P2 Default Priority Transforms
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants