Skip to content

Conversation

@jabraham17
Copy link
Contributor

@jabraham17 jabraham17 commented Jan 8, 2025

Adds prototypical support for calling an arbitrary python function on a distributed Arkouda array.

For example:

import arkouda as ak
ak.connect()
arr = ak.array([1,2,3])
res = ak.apply(arr, lambda x: x+1)
# res is now [2, 3, 4]

This PR uses cloudpickle to pickle arbitrary python code. The pickle is shipped to the server as base64, where it is depickled and executed.

To use this module, there are new requirements for Arkouda.

  • The arkouda_server must be built with the same version of python that it will be run with.
  • The arkouda_server's python version and the client's python version must be the same major/minor version.
  • cloudpickle must be available to the arkouda_server, not just the client
  • Any dependencies used within the body of the applied function must be available to the arkouda_server

@jabraham17 jabraham17 marked this pull request as draft January 8, 2025 00:11
@jabraham17 jabraham17 marked this pull request as ready for review January 18, 2025 02:17
@jabraham17 jabraham17 force-pushed the add-prototype-apply branch 2 times, most recently from 8a58f08 to 6df58b5 Compare January 21, 2025 16:40
@jabraham17
Copy link
Contributor Author

The CI failures appear unrelated to the contents of this PR

@jabraham17 jabraham17 requested a review from ajpotts January 24, 2025 16:19
Copy link
Contributor

@e-kayrakli e-kayrakli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jabraham17 -- thanks for this PR. I left some comments.
@ajpotts -- I see that Jade has some questions for you in the PR, could you take a look?

@e-kayrakli
Copy link
Contributor

Some libraries seem to not play with being cloudpickled multiple times. For example, using numpy inside a function applied to a pdarray is fine. However, if running from a jupter notebook and the apply is run from a cell twice in a row, the server will segfault

Do you have an idea how to mitigate this? It sounds like the second time you cloudpickle a numpy function, it is not playing well with the server, is that correct? Could the server cache functions? Arguably, that will not prevent the client from resending them, but they can be ignored. Or ideally, client can maintain a state, in which the fact that a given function is cached by the server can be stored.

@jabraham17
Copy link
Contributor Author

Do you have an idea how to mitigate this? It sounds like the second time you cloudpickle a numpy function, it is not playing well with the server, is that correct? Could the server cache functions? Arguably, that will not prevent the client from resending them, but they can be ignored. Or ideally, client can maintain a state, in which the fact that a given function is cached by the server can be stored.

Its not just the use of the same function, its about the use of a module in a function. This snippet replicates the issue

import arkouda as ak
import numpy as np

ak.connect()

n = 100_000
arr = ak.randint(0, 10, n)

def foo(x):
    np
    return 2.0
res = ak.apply(arr, foo, "float64")
print(res)

def bar(x):
    np
    return 1.0
res = ak.apply(arr, bar, "float64")
print(res)

@jabraham17
Copy link
Contributor Author

I have tracked down the crash to an issue with numpy and multiple interpreters: numpy/numpy#28271. This error should not crash the server (only throw an exception), so there is something on the Chapel Python interop side to fix. But this will just make a nicer error message, it will not fix the problem. For that, we either need to reusue the interpreter for each locale or numpy needs to be fixed.

@jabraham17
Copy link
Contributor Author

This error should not crash the server (only throw an exception), so there is something on the Chapel Python interop side to fix

This is a general Chapel bug, there is nothing that can be done within the Python module to fix it

Copy link
Contributor

@ajpotts ajpotts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this so much!

Unfortunately I don't think we can merge it in until we can reduce the potential for server crashes.

Copy link
Contributor

@ajpotts ajpotts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for all the hard work on this one.

add test case

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add test to ini

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add server msg

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add client code

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add support for arbitrary python code

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add more tests

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add python interop flags

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add compat modules

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add cloudpickle

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

use non-generic function

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add stubs

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add missing dependency

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

remove unused import

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

format code nicely

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

fix test

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

skip tests if not supported

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

ignore mypy errors for cloudpickle

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add missing cloudpickle deps

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

apply reviewer feedback

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

update server modules

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add ge-24 compat modules

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

encode python version and reuse interpreters

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

fix line length

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

fix compat

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

resolve limitation

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

commit changes to ge-24

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

expose python version to server config

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

fix typo

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

wrap line

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

improve tests

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

update docs

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

update test to use get_array_ranks

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

add isPythonModuleSupported

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

adjust problem size

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

update test skipif

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>

fix flake errors

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
@ajpotts ajpotts added this pull request to the merge queue Feb 13, 2025
Merged via the queue into Bears-R-Us:master with commit 5437236 Feb 13, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants