refactoring org.apache.hadoop.mapred usage against org.apache.hadoop.mapreduce #87

Closed
daniel-sudz opened this issue Mar 3, 2022 · 8 comments

Comments

@daniel-sudz
Collaborator

Something that was noticed when trying to use newer versions of Cascading in other projects such as Scalding (for example https://github.com/twitter/scalding/pull/1586/files, although that PR was abandoned) is the awkwardness of working across the old API, org.apache.hadoop.mapred, and the new API, org.apache.hadoop.mapreduce, and of casting between them.

would cascading accept contributions to incrementally refactor parts against the new API?

@cwensel
Owner

cwensel commented Mar 3, 2022

You can try to submit something for review, but historically you cannot mix and match API calls without instability/failure.

Moving to the new API (mr2) would be welcome, but that would be under a 5.0 release, and the last time I tried to do it I gave up after a week of effort.

@cwensel
Owner

cwensel commented Mar 3, 2022

Can you summarize the issue that showed up in the pr?

If it's about parquet, note that they dropped Cascading from the project, and I have an intermediate branch I'm trying to restore it in, but I can't get the test to work.

@daniel-sudz
Collaborator Author

daniel-sudz commented Mar 3, 2022

It's nothing unsolvable, but I keep having to use wide types that support, for instance, both JobConf and Configuration, and then cast between the two to get at the different APIs. I understand that in this particular case you can cast directly from Configuration to JobConf, so it's not a big deal.

That's what I've noticed so far. I'm sure more things will come up over time, and as you said, in the general case mixing the new and old APIs isn't supported with any compatibility guarantees, so this is more of a precaution.
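
For reference, a minimal sketch of the JobConf/Configuration relationship mentioned above. In Hadoop, JobConf extends Configuration and has a JobConf(Configuration) constructor, so passing a JobConf where a Configuration is expected needs no cast, while casting the other way only succeeds when the object really is a JobConf. The classes below are simplified stand-ins (not the Hadoop classes) so the snippet runs on its own, and `asJobConf` is a hypothetical helper name, not something from Cascading or Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfCastSketch {
    // Simplified stand-in for org.apache.hadoop.conf.Configuration (assumption, not the real class).
    static class Configuration {
        protected final Map<String, String> props = new HashMap<>();

        Configuration() {}

        // Copy constructor: the new instance gets its own copy of the settings.
        Configuration(Configuration other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    // Simplified stand-in for org.apache.hadoop.mapred.JobConf, which extends Configuration.
    static class JobConf extends Configuration {
        JobConf() {}

        JobConf(Configuration conf) {
            super(conf);
        }
    }

    // Hypothetical helper: reuse the object if it already is a JobConf,
    // otherwise copy its settings into a new JobConf instead of risking a ClassCastException.
    static JobConf asJobConf(Configuration conf) {
        if (conf instanceof JobConf) {
            return (JobConf) conf;
        }
        return new JobConf(conf);
    }

    public static void main(String[] args) {
        Configuration plain = new Configuration();
        plain.set("example.key", "example.value");

        JobConf fromPlain = asJobConf(plain);   // copied, because plain is not a JobConf
        Configuration upcast = fromPlain;       // JobConf -> Configuration needs no cast
        JobConf same = asJobConf(upcast);       // cast path, same object comes back

        System.out.println(fromPlain.get("example.key"));
        System.out.println(same == fromPlain);
    }
}
```

Note that the copy path produces an independent object, so later changes to the original Configuration are not visible through the new JobConf; whether that matters depends on the call site.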

@daniel-sudz
Collaborator Author

Agree that this is very difficult. I've spent today poking around and it has a lot of moving parts. Instead of doing it all at once, this would need to be incrementally adopted because the changes are just too big otherwise.

@cwensel
Owner

cwensel commented Mar 3, 2022

> Agree that this is very difficult. I've spent today poking around and it has a lot of moving parts. Instead of doing it all at once, this would need to be incrementally adopted because the changes are just too big otherwise.

this is probably why I gave up, there was no middle ground. but that was years ago, I only remember being really mad when I threw in the towel.

> it's nothing unsolvable but keep having to have wide types that support for instance both JobConf and Configuration and then casting between the both of them to get different API.

So Cascading 3 moved to using Configuration and away from JobConf, in general. If there is a spot where the API needs to accept Configuration to prevent a cast, let's call it out (we can always make a new API and let the old one wrap it with a cast).

Unfortunately, this move/change is exactly why adoption of Cascading 3 was limited; gaining Tez support wasn't sufficient.
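
A rough sketch of the "new API, old one wraps it with a cast" idea from the previous paragraph, under stated assumptions: the classes are simplified stand-ins rather than the Hadoop ones, and `describeSource` and the `input.path` key are hypothetical names used only for illustration, not actual Cascading methods or properties.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfApiSketch {
    // Simplified stand-in for org.apache.hadoop.conf.Configuration (assumption, not the real class).
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    // Simplified stand-in for org.apache.hadoop.mapred.JobConf, which extends Configuration.
    static class JobConf extends Configuration {
    }

    // Hypothetical new entry point: only requires the general Configuration type.
    static String describeSource(Configuration conf) {
        return "source=" + conf.get("input.path");
    }

    // Hypothetical legacy entry point kept for callers that hold a JobConf.
    // The explicit upcast selects the Configuration overload, so there is no recursion.
    @Deprecated
    static String describeSource(JobConf jobConf) {
        return describeSource((Configuration) jobConf);
    }

    public static void main(String[] args) {
        JobConf legacy = new JobConf();
        legacy.set("input.path", "/tmp/in");

        Configuration general = new Configuration();
        general.set("input.path", "/tmp/in");

        // Both call styles end up in the same implementation.
        System.out.println(describeSource(legacy));
        System.out.println(describeSource(general));
    }
}
```

The point of this shape is that existing JobConf callers keep compiling while new code only needs a Configuration, so no caller has to downcast.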

@cwensel
Owner

cwensel commented Mar 3, 2022

@daniel-sudz so is it safe to say we should not release 4.5 until we have Scalding running on it?

I think this is reasonable and buys me time to get Parquet and a couple other features up into 4.1/4.5.

@daniel-sudz
Collaborator Author

works for me @cwensel

@daniel-sudz
Collaborator Author

Closing this for now; using Configuration everywhere seems to work well with Cascading, from what I've tested.
