Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Liu 367 #235

Merged
merged 87 commits into from
Mar 18, 2024
Merged

Liu 367 #235

merged 87 commits into from
Mar 18, 2024

Conversation

awicenec
Copy link
Contributor

@awicenec awicenec commented Jul 5, 2023

Thus PR covers the full application of named ports for application components, rather than just Pyfunc components. It is now possible to connect any component with a port having the same name as one of the application arguments or component parameters and the engine will connect the value with the input or output. The component will still need to read or write. The best example is the updated version of dlg.apps.simple.SleepApp. There is now also a test related to that, based on a new example graph. In addition also the HelloWorld_Universe graph has been updated to run without the addition of the example components and has also been added to the tests, because there was no test for the gather component at all and the gather component broke with the first version of this update. We now have a few simple functions in dlg.apps.simple_functions, which are mostly used for testing, but are also useful to show how things are being done when using functions.
An InputFiredAppDROP will now first run it's internal _run() method and then execute the run() method provided by the App component implementation. The checking and population of the named ports is in that _run() method and thus applies to any component derived from the BarrierAppDROP(InputFiredAppDROP).

rtobar and others added 30 commits May 23, 2023 13:04
Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Calling random.randint(1e6) is deprecated, as 1e6 is a float, not an
integer. Likewise, threading.Event.isSet is deprecated in favour of
is_set, which we were using in most places anyway.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
There is no need to re-compile these regular expressions each time we
create a drop, so this should save us some memory and CPU.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
This object is a simple dataclass that holds the information that
DropProxy objects need to perform their duties. This is the *static*
information -- the DropProxy still requires a RPCClient to actually
interact with a remote RPC server.

This little utility class encapsulates some of the behavior that was
previously found in the dynlib module, where we setup proxies for
newly-created processes. While this isn't a great change in itself, it
prepares the codebase for a much bigger change: the introduction of
subprocesses for the effective execution of drop apps.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Drops stored a reference to the session that contained them in the
_dlg_session attribute. This attribute was then later used to extract
the ID of such session via *its* sessionID attribute. The existence of
this ID is then seen in a number of places across the code, causing a
number of effects (setting environment variables, logging the session
ID, etc).

A closer inspection to the code revealed that the sessionID attribute
was the only attribute ever read from the drop._dlg_session attribute.
Thus, storing the full session object is unnecessary. While during
normal usage this doesn't matter much, there is a negative effect on
serialisation of drops, which cannot be achieved because the Session
object bound to one of its attributes isn't serialisable (it holds not
only a lock, but also a reference to the Node Manager, which contains
open file descriptors, thread pools, and more).

This commit removes the internal _dlg_session attribute from the
AbstractDROP class, and replaces it with a _dlg_session_id. To make
things easier overall we default its value to an empty string, both when
the drops are created directly (e.g., MyDrop()) and when they are
constructed from the graph_loader module.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Similarly to how each drop doesn't need full access to its session, but
only needs to know about the session ID, full access to the RPC server
hosting the drop isn't necessary either, only its endpoint. The latter
is used to create drop proxies on newly spawned processes running app
drops so they can contact their inputs/outputs. This is currently needed
by the dynlib module, which does such spawning, but we want to move to a
multiprocessing world where most (if not all) app drops execute in
separate processes.

This commit removes the _rpc_server attribute injected by the Session
object into each drop, and referencing the full NodeManager, and
replaces it with a simpler _rpc_endpoint that simply contains the
(host,port) tuple needed to contact the RPC server.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
awicenec and others added 29 commits November 7, 2023 15:37
By extracting this into an ABC we can change the implementation (by
default it will occur synchronously, but in the context of a
NodeManager, it will execute on a ThreadPool). This sets the stage for
an implementation that utilises Processes for true parrallelisation.

If a drop is asynchronously executed a seperate, daemon thread is
created to wait for execution to finish. This job shouldn't be run on
the same pool as the DropRunner as it is easy to deadlock by running out
of executors in the implementing pool (e.g. a ThreadPool with
max_executors=1, async_execute() is submitted, takes up the single
Thread prior to the actual run() method being submitted to the pool. In
this case run() will never execute as there is never a free thread and
async_execute() blocks forever). Note that the daemon thread will not be
terminated until the process exits, so this causes a memory leak and
will need to be addressed in future.
So each drop will run on seperate processes for true parallelism. Mirror
the ThreadDropRunner tests to ensure the functionality is the same
across both implementations.
As we can't rely on an external service for unit tests. Update the NGAS
host to one that will be maintained for manual tests when required.
Looks like a later version of pydantic is stricter about whether values
are required. A `Union[int, None]` without an initialised value of
`None` is no longer valid. To fix this, we simply initialise with
`None`.
@awicenec awicenec merged commit bc75eef into master Mar 18, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants