Modin integration #22
Comments
Comment by devin-petersohn Hi @skrawcz, I'm happy to help support this idea.
What is required from the Modin side for an integration?
Comment by skrawcz In my ideal world, we could enable people to maintain https://github.com/stitchfix/hamilton/blob/main/examples/hello_world/my_functions.py without touching it. Reasoning:
idea 1
Add this to the top of the file, if people want: from hamilton.augment import pandas as pd And if
idea 2
We would require users to "hard code" modin, i.e. replace the pandas import in https://github.com/stitchfix/hamilton/blob/main/examples/hello_world/my_functions.py with modin.
idea 3
Is there some other Python duck-typing way (this might be considered hacky) that "driver" or "graph adapter" code would own?
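To make idea 1 concrete, here is a minimal sketch of what a shim behind such an import could do: return the first importable module from a list of candidates. The helper name `load_first_available` is hypothetical and not part of Hamilton; the `hamilton.augment` module named above is only a proposal in this thread.

```python
import importlib
from types import ModuleType
from typing import Sequence


def load_first_available(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that can be imported.

    Hypothetical helper for illustration only; not a Hamilton API.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError(f"none of {list(candidates)} could be imported")


# A shim module could then expose the result under the usual alias, e.g.
#   pandas = load_first_available(["modin.pandas", "pandas"])
# so that user code only changes its import line.
```

With this shape, user functions keep calling `pd.Series(...)` and friends unchanged; only the import line decides which backend they get.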
Comment by elijahbenizzy Either one could be OK I think. That said, I want to try something like the following, in the driver:
So long as it's the first thing executed (big IF, but it should be), then this should work...
Comment by skrawcz
Hmm, that could work. Will have to prototype and see how the ergonomics feel.
Comment by skrawcz
Just noting here that we could use https://docs.python.org/3/library/importlib.html#checking-if-a-module-can-be-imported or something like that to check whether modin is installed.
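A minimal sketch of that check, following the `importlib.util.find_spec` recipe from the linked docs. The helper names `is_installed` and `choose_backend` are hypothetical, made up for this example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found, without importing it."""
    # Pass a top-level name (e.g. "modin"): for dotted names, find_spec
    # imports the parent package and raises if that parent is missing.
    return importlib.util.find_spec(module_name) is not None


def choose_backend(preferred: str, fallback: str) -> str:
    """Return `preferred` if it is installed, otherwise `fallback`."""
    return preferred if is_installed(preferred) else fallback


# Example: choose_backend("modin", "pandas") yields "modin" only when
# modin is installed in the current environment.
```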
Comment by devin-petersohn Thanks @skrawcz and @elijahbenizzy! Is there some way we can help to support this on the Modin side? Typically the approach I have taken with Modin is that the choice should be the user's, and that users should be aware that Modin is being used. We make it easy not only to move to Modin, but also to move back to pandas from Modin if you choose. Replacing
Comment by skrawcz Thanks @devin-petersohn, comments inline:
It wouldn't happen on import of hamilton, no. It would be part of a script/flow of execution:
from hamilton import driver, switch_modin_for_pandas, switch_pandas_for_modin
# do switch here
switch_modin_for_pandas()
# have to import after doing the switch
import func_module
dr = driver.Driver({}, func_module)
df = dr.execute(['a', 'b', ...])
# switch it back
switch_pandas_for_modin()
save_df(df)
The idea is that they opt-in to this, but ideally they don't have to change any of their logic to do so.
Potentially. It would enable us to depend on it, rather than having to write and maintain it ourselves.
Comment by devin-petersohn OK, this should actually be pretty straightforward as an opt-in utility. I think we can probably start with a hacky approach in Hamilton and then merge it into Modin as it matures as a configuration (i.e.
Comment by skrawcz @devin-petersohn sounds good. Do you happen to have that hacky incantation?
Comment by skrawcz @devin-petersohn okay, so we can't play with sys.modules, because modin still requires access to pandas. So you'd have to provide a means to do it. Created modin-project/modin#4488 to track. Otherwise I am going to prototype idea 1 and see how that feels.
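For context, a generic sketch of the kind of `sys.modules` aliasing being ruled out here. The `alias_module` helper is hypothetical, and this is not necessarily the exact snippet that was tried in the thread. It also hints at why the trick breaks for modin: once `pandas` is aliased to another module, any later `import pandas`, including one that modin itself needs, resolves to the alias rather than to real pandas.

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def alias_module(alias: str, target: str):
    """Temporarily make `import <alias>` resolve to the module `target`.

    Hypothetical helper for illustration; not a Hamilton or Modin API.
    """
    missing = object()
    previous = sys.modules.get(alias, missing)
    # The import system consults sys.modules first, so this entry wins.
    sys.modules[alias] = importlib.import_module(target)
    try:
        yield
    finally:
        # Restore whatever was registered under `alias` before.
        if previous is missing:
            sys.modules.pop(alias, None)
        else:
            sys.modules[alias] = previous
```

Modules that were imported before the alias was installed keep their existing references, which is why the switch would have to run before importing the user's function module.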
Issue by skrawcz
Thursday Mar 10, 2022 at 19:09 GMT
Originally opened as stitchfix/hamilton#85
Is your feature request related to a problem? Please describe.
Modin - https://github.com/modin-project/modin - also enables scaling pandas computation. Since we have ray, dask, and koalas, why not add Modin?
Describe the solution you'd like
Modin requires a replacement of the pandas import in user code to work.
We would need to think about how to do this:
Additional context
N/A