Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[not-for-merge] Towards statically type-checking parsl #1676

Draft
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

benclifford
Copy link
Collaborator

@benclifford benclifford commented May 10, 2020

This is a branch i've been working on since starting on parsl, initially to understand how static type checking would help with parsl, and how parsl worked from a type perspective. I've made it into a draft PR in case anyone else is interested in looking at it.

This branch is not intended for merge to master - instead, I have been keeping master regularly merged into this branch, and when I've discovered interesting fixes to make, I've usually pushed those changes across to master.

This branch drops support for versions of python < 3.8 in order to make unconstrained use of newer type system features.

@benclifford benclifford changed the title Towards statically type-checking pasl Towards statically type-checking parsl Jul 7, 2020
@benclifford benclifford changed the title Towards statically type-checking parsl [not-for-merge] Towards statically type-checking parsl Sep 16, 2021
@benclifford benclifford force-pushed the benc-mypy branch 2 times, most recently from 73bb376 to cc26640 Compare February 1, 2023 23:03
@benclifford benclifford force-pushed the benc-mypy branch 2 times, most recently from 3d60f35 to 849a834 Compare April 4, 2023 13:53
@benclifford benclifford force-pushed the benc-mypy branch 4 times, most recently from 0e026c7 to 0b5924a Compare April 24, 2023 11:06
@benclifford benclifford force-pushed the benc-mypy branch 3 times, most recently from 5f0eac1 to f1163da Compare November 1, 2023 14:50
This is broken in two ways:

* Hard-coding IP addresses is bad internet citizenship, especially into
address blocks that are not owned by the Parsl project. If name service
lookup fails, it is bad practice to fall back to a hard-coded address.

* The hard-coded IP address goes into a variable that is then not used
anywhere, so the hard-coded IP address functionality doesn't actually
work.
periodically

might help with 3.12 typechecking with paramspecs? (not needed for master because paramspec is too new... 3.10)
The codepath would be used when scale_in is called with:

  force=False
  max_idletime=None

and would pick blocks from the (weirdly sorted - see issue #2120) list of
blocks which are currently idle.

This PR removes that unused codepath, merging the choice of
forced/non-forced scale-in and idle time specification into a single
parameter that indicates "do not scale in blocks that have not been
idle this long".
…nto a simple strategy test

a different PR will introduce a variation that tests htex_auto_scale functionality
…g idle, not from the whole executor being idle)

this is quite awkward ot test. what can we do?

the current test has min_blocks 2, which means if we launch a single task (pretty much all we can), we'll not be able to distinguish min_blocks scaling from single task scaling. so a separate test perhaps with min_blocks 0?
…at least one manager registered

how to test this? make an htex with init_blocks=min_blocks=1 block and a worker command that does something like sleep forever (it needs to not fail, because failed blocks wont' get scaled in). and then shutdown without any tasks. in the old behaviour that unregistered block will not be scaled in, i think, but in the new behaviour it should be.
* add retcode

* use "job" not "task"

in parsl: "job" refers to a batch system job / aka a block,
and "task" refers to a user-supplied piece of code submitted
to the DFK that is run by an executor
remove typechecker note that typeguard appears to have evolved beyond, although in a way that makes me uncomfortable?

make default_staging a sequence which can help guard against mutating it
…r interface) only for printing a status line. this is awkward to statically typecheck, so remove it - and its awkward to typecheck because it's awkward to have this in the executor interface, or even in the block provider executor interface?

an alternative is to have a HasConnectedWorkers interface type that gives the connected_workers attribute, but that's a lot of lines of code for something that is probably misabstracted to strategy anyway

The htex connected workers info is useful, but that should be displayed elsewhere

the now-unused/reused connected_workers property is made back into a method not a property, because it makes an RPC callout (so subject to hangs from a broken interchange), which is unclear when reading connected_workers as a property.

TODO: put that connected workers logging into htex itself, rather than in strategy - probably into the interchange log?
there's already a place in the interchange log - there's a debug message with manager counts every 10ms - could make that less frequent and at INFO level, rather tha nevery 10ms?
serialise many times a few different structures

deserialise many times a few different structures

potentially do some serious stats (maybe theres a library to do that like criterion in haskell already? google for python microbenchmarking...)

when deserializing the same list object, it's 10x slower...

in practice, one thing I'd expect that caching to be relevant is repeated deserialization of function objects in htex executions (rather than caching of values - although that would sometimes be true too...)

a more nuanced approach might allowed caching on the deserialisation side only for known-whitelisted types (such as function - but not sure about partial?) - which is the 'callable' distinction differently (and perhaps better?)

however, as these tmings are don in the 20ns range, this may not really matter too much for htex...
The user-facing DFK add_executors call will raise a ConfigurationError,
but the internally used Strategy.add_executors call performs this check only
as an assertion, as it is not user facing.

This is follow-on work from PR2712, which removes duplicate
adding of executors.
This should get more type checking to happen inside the test
suite.

It reveals a number of mypy failures that come from broken
(disabled) tests and which need fixing. Perhaps it could drive
actually fixing some of those tests?
if running many htexes in sequence in a process, this results in a buildup of htex threads churning away

i have some suspicion that this causes some hangs in CI... but also, I don't want these threads running in a many-sequential-DFK environment for general threads vs multiprocessing reliability

this patch was written on the assumption that the timeout parameter caused the checking loop to repeatedy iterate so as to eventually discover is_alive is set to false; however, turns out that parameter is not implemented (see PR #2673) so I'll have to implement this in some different way
TODO: this breaks: the port hasn't been chosen by this time - it happens at startup
this PR relies on earlier PRs which fix broken behaviour of ignore_for_cache vs AUTO_LOG,
which was fixed in a few previous PRs. (TODO: numbers here)
i should check these asserts to see if they can disappear
This should get more type checking to happen inside the test
suite.

It reveals a number of mypy failures that come from broken
(disabled) tests and which need fixing. Perhaps it could drive
actually fixing some of those tests?
This is awkward to typecheck, because "None" is still a valid return code: it is a job id

TODO: this patch should become a documentation patch that changes the docstring only to say that None is deprecated, and cross-references the relevant github issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant