
Add initial implementations of Transformation, AlchemicalNetwork, Protocol; stubs for Executor components #13

Merged
merged 46 commits into main on May 25, 2022

Conversation

dotsdl
Collaborator

@dotsdl dotsdl commented Mar 31, 2022

Adds the following components:

  • Transformation
  • Protocol
  • AlchemicalNetwork

Also adds stubs for Executor components; these were necessary to reason about Protocol structure.

Checklist

  • add AlchemicalNetwork.__init__ with state held in networkx graph
  • add initial set of unit tests

@pep8speaks

pep8speaks commented Mar 31, 2022

Hello @dotsdl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 66:80: E501 line too long (88 > 79 characters)

Line 57:80: E501 line too long (83 > 79 characters)

Line 175:80: E501 line too long (86 > 79 characters)

Line 18:80: E501 line too long (86 > 79 characters)
Line 32:80: E501 line too long (80 > 79 characters)
Line 50:80: E501 line too long (83 > 79 characters)

Line 32:80: E501 line too long (86 > 79 characters)
Line 46:80: E501 line too long (86 > 79 characters)
Line 51:80: E501 line too long (88 > 79 characters)

Line 33:80: E501 line too long (83 > 79 characters)
Line 50:80: E501 line too long (85 > 79 characters)
Line 60:80: E501 line too long (86 > 79 characters)

Line 11:1: E266 too many leading '#' for block comment
Line 49:1: E266 too many leading '#' for block comment
Line 62:1: E266 too many leading '#' for block comment
Line 117:80: E501 line too long (82 > 79 characters)
Line 122:80: E501 line too long (81 > 79 characters)
Line 125:1: E266 too many leading '#' for block comment
Line 131:80: E501 line too long (84 > 79 characters)

Line 74:80: E501 line too long (84 > 79 characters)

Line 27:80: E501 line too long (92 > 79 characters)
Line 37:80: E501 line too long (83 > 79 characters)
Line 39:80: E501 line too long (83 > 79 characters)
Line 46:80: E501 line too long (92 > 79 characters)
Line 60:80: E501 line too long (84 > 79 characters)

Line 140:80: E501 line too long (86 > 79 characters)

Line 23:80: E501 line too long (85 > 79 characters)
Line 27:80: E501 line too long (84 > 79 characters)
Line 47:80: E501 line too long (88 > 79 characters)
Line 50:80: E501 line too long (84 > 79 characters)
Line 95:80: E501 line too long (84 > 79 characters)

Line 9:80: E501 line too long (80 > 79 characters)
Line 160:80: E501 line too long (86 > 79 characters)

Comment last updated at 2022-05-25 02:15:44 UTC

@dotsdl
Collaborator Author

dotsdl commented Mar 31, 2022

So, we have objects for nodes and edges, but now we need to decide how state is structured in AlchemicalNetwork.
We have some options here.

  1. Don't store state as networkx graph at all; simply hold on to all nodes and edges fed to AlchemicalNetwork as objects already (e.g. Transformation, ChemicalSystem). Only yield a networkx.MultiDiGraph on-demand as a method call.
  2. Store all state as networkx graph; on AlchemicalNetwork init, convert all ChemicalSystem nodes and Transformation edges into a MultiDiGraph representation. Can use ChemicalSystems directly as nodes, but Transformations will have to be dangled off edges in the graph as attributes.
  3. Both 1 and 2, with duplicated state. Tolerable so long as the object is immutable, but may present pain in serialization/deserialization roundtripping.
    • effectively works as a cache, with data in two forms most convenient for different purposes.
    • may also be the worst of both worlds
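Option 1 (plus the caching that shades it into option 3) might look roughly like the following sketch. The class internals here are illustrative only, not the eventual gufe implementation:

```python
# Sketch: hold nodes and edges as immutable collections; materialize a
# networkx.MultiDiGraph lazily and cache it. Transformation/AlchemicalNetwork
# names follow the PR, but these bodies are hypothetical simplifications.
import networkx as nx


class Transformation:
    def __init__(self, initial, final):
        self.initial = initial
        self.final = final


class AlchemicalNetwork:
    def __init__(self, edges, nodes=()):
        self._edges = frozenset(edges)
        # nodes are the union of explicit nodes and edge endpoints
        self._nodes = frozenset(nodes) | {
            n for e in self._edges for n in (e.initial, e.final)
        }
        self._graph = None  # cache, built on first access

    @property
    def graph(self):
        if self._graph is None:
            g = nx.MultiDiGraph()
            g.add_nodes_from(self._nodes)
            for edge in self._edges:
                # dangle the Transformation off the edge as an attribute
                g.add_edge(edge.initial, edge.final, object=edge)
            self._graph = nx.freeze(g)  # frozen: mutation raises
        return self._graph


network = AlchemicalNetwork([Transformation("A", "B"), Transformation("B", "C")])
print(sorted(network.graph.nodes))      # ['A', 'B', 'C']
print(network.graph.number_of_edges())  # 2
```

Because the graph is only a derived view, the object itself can stay immutable and hashable, and the cached graph is built at most once.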

@dwhswenson
Member

dwhswenson commented Mar 31, 2022

  • Don't store state as networkx graph at all; simply hold on to all nodes and edges fed to AlchemicalNetwork as objects already (e.g. Transformation, ChemicalSystem). Only yield a networkx.MultiDiGraph on-demand as a method call.

This is basically what we did for the OpenFE ligand network. The advantage is that we can force things to be immutable, and we return a frozen NetworkX graph. It also gives us access to custom serialization for our objects (EDIT: by serializing our nodes and edges to strings, building those into a NetworkX graph, and then serializing that with NetworkX.)

@dwhswenson
Member

Actually, I guess we sort-of do 3; but only because we cache the nx graph after first creation.

@dotsdl
Collaborator Author

dotsdl commented Apr 19, 2022

Working notes from today, with some questions I could use some help resolving:

  1. name or identifier for user-settable identifier?
    • want this the same across all objects, yes?
  2. should ChemicalSystem and Transformation have tags, key-value pairs?
    • basically, easy queryables for e.g. subgraph generation?
  3. for serialization, reasonable to decree that all objects must have a .to_dict method?
    • is there a "gold standard" serialization approach we should identify? GraphML better than JSON? Something else?
    • need a registry for Component, Protocol subclasses to allow roundtripping for non-pickle formats
  4. Although there may be conceptual value in separating Transformation from Protocol, not really any clear practical value
    • consider collapsing these abstractions together
      • subclasses of Transformation, NonTransformation are our protocols
      • openfe already does this

@dwhswenson
Member

  • is there a "gold standard" serialization approach we should identify? GraphML better than JSON? Something else?

On this one point (too late at night for a more complete response, and this is before even reviewing the code in detail), I would say the following:

GraphML is a good serialization format for graphs. Its benefits are wide support and the ability to recognize a repeated node as the same object, which isn't guaranteed in naïve JSON representations. It also allows arbitrary node/edge attributes. (Personally, not a fan of XML, but GraphML is well supported by other programs.)

GraphML attributes can be stored as arbitrary strings. My recommendation would be for those strings to be JSON, which serializes node/edge information well. It becomes a slightly odd mix of XML and JSON, but most programming languages have good tools for each, so client code can work freely and serialize objects in this format.
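The hybrid described above can be sketched with networkx; the node and edge payloads here are made-up illustrations, not gufe objects:

```python
# Sketch: GraphML carries the graph topology, while node/edge attributes are
# JSON strings. A repeated node id ("benzene") stays a single node through the
# roundtrip, unlike a naive nested-JSON dump of edges.
import io
import json

import networkx as nx

g = nx.MultiDiGraph()
g.add_node("benzene", data=json.dumps({"components": ["ligand", "solvent"]}))
g.add_node("toluene", data=json.dumps({"components": ["ligand", "solvent"]}))
g.add_edge("benzene", "toluene", data=json.dumps({"protocol": "relative-fe"}))

# serialize to GraphML (XML) and back
buf = io.BytesIO()
nx.write_graphml(g, buf)
buf.seek(0)
g2 = nx.read_graphml(buf)

# the JSON payload survives the XML roundtrip intact
print(json.loads(g2.nodes["benzene"]["data"])["components"])
```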

@dotsdl
Collaborator Author

dotsdl commented Apr 19, 2022

  • Don't store state as networkx graph at all; simply hold on to all nodes and edges fed to AlchemicalNetwork as objects already (e.g. Transformation, ChemicalSystem). Only yield a networkx.MultiDiGraph on-demand as a method call.

This is basically what we did for the OpenFE ligand network. The advantage is that we can force things to be immutable, and we return a frozen NetworkX graph. It also gives us access to custom serialization for our objects (EDIT: by serializing our nodes and edges to strings, building those into a NetworkX graph, and then serializing that with NetworkX.)

Adopted your approach; agree it is quite good. 😁

from openff.units import unit


class SystemSettings(BaseModel):
Collaborator

It will be critical to sync up with the Open Force Field Initiative folks to come up with a self-consistent division between:

  1. Choices that impact the fully interacting potential energy function (which OpenFF is aiming to fully encode in the force field definition)
  2. Choices that impact the free energy difference computed for a transformation (atom mapping, alchemical modification functional form, thermodynamic state at which the calculation is performed)
  3. Choices that impact the efficiency of the calculation (integrators, barostats, collision rates, alchemical protocols, alchemical methods, alchemical intermediates, hydrogen mass)

Right now, you're mixing several of these together. This is highly problematic, and will lead directly to an inability to fulfill the primary mission of OpenFE (enabling free energy calculations with OpenFF force fields) because OpenFF is encoding all options that impact the fully interacting system potential function within the force field definition. Therefore, enabling options that impact the potential energy definition (such as nonbonded_method, nonbonded_cutoff, constraints, rigid_water) will mean you are failing to actually perform free energy calculations with the specified OpenFF force fields.

I'd be happy to help clarify these issues, but I think we need a design document that clearly separates these items.

Contributor

Yep, we had this discussion internally and had a good argument over whether e.g. the cutoff is separable from the force field; there's room for improvement here.

While we need to support openff force fields, I'm not sure we can support them at the exclusion of other force fields, so there might have to be variations on these settings where the parameters are stated separately (such as is happening here).

Collaborator

There is nothing in my argument that says you should support OpenFF to the exclusion of other force fields.
My only point is that you must support OpenFF force fields, which means that you will need to bundle all of the parameters that impact the potential energy for a fully interacting system together, separated from other parameters that impact simulation efficiency.

One way to do this would be to recognize that an OpenFF force field will bundle all of the parameters that impact the potential energy of a fully-interacting system into a single object. For other force fields, you can use a container object that similarly bundles everything together, explicitly specifying all the parameters left undefined by the choice of force field. These objects can then be used interchangeably to specify which "complete" force field should be used, and the choices of auxiliary parameters for traditional force fields like GAFF can subsequently be optimized to define best practices.
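One way to sketch the container idea described above. All class names and parameter choices here are hypothetical illustrations, not an agreed design:

```python
# Sketch: a single "complete potential specification" object. An OpenFF force
# field fixes these choices itself; a traditional force field (e.g. GAFF)
# needs them supplied explicitly. Either way, the object is used the same way.
from dataclasses import dataclass


@dataclass(frozen=True)
class CompleteForceField:
    """Everything that defines the fully interacting potential."""
    force_field: str
    nonbonded_method: str
    nonbonded_cutoff_nm: float
    constraints: str


# OpenFF encodes these choices in the force field definition itself...
openff = CompleteForceField("openff-2.0.0", "PME", 0.9, "none")
# ...while GAFF leaves them undefined, so the container supplies them
gaff = CompleteForceField("gaff-2.11", "PME", 1.0, "h-bonds")

for ff in (openff, gaff):
    print(ff.force_field, ff.nonbonded_cutoff_nm)
```

Keeping this object separate from efficiency settings (integrators, barostats, hydrogen mass) is the separation argued for in this thread.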

Collaborator

Yep, we had this discussion internally and had a good argument over whether e.g. the cutoff is free from the force field etc, there's room for improvement here.

Might I also suggest that internal discussions about how to represent force fields should likely also include a representative from OpenFF? There seems to be a great deal of accumulated knowledge within OpenFF that is not making its way into these discussions at the moment.

Contributor

There is nothing in my argument that says you should support OpenFF to the exclusion of other force fields. My only point is that you must support OpenFF force fields, which means that you will need to bundle all of the parameters that impact the potential energy for a fully interacting system together, separated from other parameters that impact simulation efficiency.

Yep sorry misread this a bit at first.

Collaborator

Another great cross-cutting discussion to have might be "which force fields should be supported?" and "how can we unify force field support across simulation engines?". There are a number of tools that have been developed to band-aid this situation for now, like simtk.openmm.app.ForceField, openmmforcefields.generators.SystemGenerator, OpenFF Interchange, and now espaloma. If we know what you'd like to support, there are likely things that can be done to make this easy so that OpenFE need not support it alone.



class TopologySettings(BaseModel):
"""Settings for creating Topologies for each component
Collaborator

Can this be more clearly defined? I'm not clear on what this class is supposed to represent from the brief docstring.

Comment on lines 26 to 27
class Config:
arbitrary_types_allowed = True
Contributor

Just a note: this is required by pydantic to allow arbitrary types. In the future I think we could write validators for openff.units objects, register these with pydantic somehow, and not need this setting.

solvent_model = 'tip3p'


class AlchemicalSettings(BaseModel):
Collaborator

It looks like this is a dumping ground for random perses alchemical configuration parameters. Perhaps we could meet specifically to discuss how to come up with a better object model to more flexibly represent alchemical protocols, alchemical intermediates, and the like? We've been meaning to do this for a while, and I think it can be done flexibly, extensibly, and elegantly. I fear that just dumping everything into an object like this with attributes that will need to be supported is going to be problematic in the medium- and long terms.



class EngineSettings(BaseModel):
"""OpenMM MD engine settings
Collaborator

This is general MM engine settings, not OpenMM specific?

return v


class BarostatSettings(BaseModel):
Collaborator

I think this is a problematic design, since it convolves parameters that define the alchemical state and hence impact the free energy (pressure) with parameters that simply control the efficiency of sampling (frequency).

I highly recommend instead following something like what openmmtools does and defining a ThermodynamicState that encodes the information about thermodynamic parameters that will impact the thermodynamic ensemble sampled (and hence the free energy)---such as temperature and optionally pressure, pH, etc.

This should be separated from settings that simply impact the efficiency of the sampling.
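The recommended separation might look like this minimal sketch, with names loosely modeled on openmmtools' ThermodynamicState but simplified and hypothetical:

```python
# Sketch: ensemble-defining parameters live in one frozen object; parameters
# that only affect sampling efficiency live in another. Both class bodies are
# illustrative, not an actual gufe or openmmtools API.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThermodynamicState:
    """Parameters that define the sampled ensemble, and hence the free energy."""
    temperature_K: float
    pressure_atm: Optional[float] = None  # None -> NVT; set -> NPT


@dataclass(frozen=True)
class SamplingSettings:
    """Parameters that only affect efficiency, not the asymptotic result."""
    timestep_fs: float = 2.0
    barostat_frequency: int = 25
    collision_rate_per_ps: float = 1.0


state = ThermodynamicState(temperature_K=298.15, pressure_atm=1.0)
settings = SamplingSettings()
print(state.pressure_atm is not None)  # True -> NPT ensemble
```

With this split, SamplingSettings can be tuned (or auto-tuned) freely without changing which free energy is being computed.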

Contributor

Starting with a point of curiosity - I realise that in the limit of infinite sampling the MC frequency has no impact on the thermodynamic state of the system, but given we are likely running short simulations, and often non-equilibrium ones, does the change in relaxation time not affect the sampled free energy?

A thought on the splitting of parameters - I can understand & support the idea of separating out the "things that affect the ensemble", but at what limit can we differentiate between algorithmic details that can have an impact and broad ensemble parameters? (e.g. the choice of gamma_ln for Langevin integrators or the splitting sequence, etc...)

Collaborator
@jchodera jchodera Apr 20, 2022

There will be many things that impact the finite-time behavior of simulations---up to and including the exact choice of random numbers you generate. But it simply isn't sensible to convolve (1) parameters that impact the infinite-time limit of the calculation (thermodynamic state, potential function) with (2) parameters that impact the rate at which bias and variance (and hence total error to the asymptotic limit) diminish. There are very good reasons for this:

First, it's possible to independently optimize and tune (or even auto-tune adaptively on the fly) the simulation strategy to improve the rate of convergence to the asymptotic limit, and hence make the calculation more efficient. That makes for a very useful separation of effort, since work can be carried out independently on optimizing the efficiency of protocols, alchemical methods, engines, etc.

Second, if you attempt to optimize solely for accuracy of finite-time calculations to experiment directly, you will never get anywhere. Any choice you make (force field, calculation strategy, collision rate) will impact these finite-time calculations in uncontrollable and unknowable ways. We will never make any progress in improving either the force field, the protocol, or the calculation strategy because there is absolutely no way to separate any of these effects. Progress becomes completely impossible because all modifications are a random walk through calculation space.

arbitrary_types_allowed = True

timestep = 2 * unit.femtosecond
temperature = 298.15 * unit.kelvin
Collaborator

As with BarostatSettings, this convolutes thermodynamic parameters that impact the free energy of interest (temperature) with a mess of settings that impact the efficiency of sampling and stability of sampling. These should be separated, grouping temperature, pressure, etc. into the thermodynamic state information.

Collaborator

I'd also highly recommend considering we adopt some flexible definition of how the sampling is done, perhaps in a hierarchical manner: It's most important to specify first (1) what calculation do you want to perform (openff-2.0.0 at 300 K and 1 atm), then (2) how you computed it, at a high level (e.g. replica exchange, nonequilibrium switching), then (3) how exactly it is performed can be left to best practices defaults, auto-tuned, or specified explicitly if this specification can actually be implemented by the engine. But I imagine a great deal of work will be trying to automate best practices and auto-tuning for each calculation type.

More broadly, in openmmtools, for the lowest level (3), we have embraced a general MCMC framework that allows us to compose or chain together MCMC samplers that sample from the same alchemical state in order to efficiently and flexibly sample from within an alchemical state. This is a much more general design than specifying all the details of a single integrator, since a calculation may want to compose compatible types of sampling strategies.

Comment on lines 31 to 36
def __init__(
self,
*,
settings: ProtocolSettings = None
):
"""
Contributor

If the two ChemicalSystems aren't here, then this class becomes a factory for simulations. I think being able to serialize the factory separately is a good idea for extending a network with the same protocol (else where is the factory stored).

In which case, I think run() isn't run, but instead prepare(Transformation(initial, final, mapping)) -> PreparedProtocol and PreparedProtocol has the run() and is_complete() etc.

edit: I think this is also what John is saying on the run method below.
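The factory split suggested here might look like the following sketch. All names mirror the discussion rather than the gufe API, and the method bodies are placeholders:

```python
# Sketch: Protocol holds only settings and acts as a simulation factory;
# prepare() binds it to concrete end states, yielding an executable object
# that carries run() and is_complete().
from dataclasses import dataclass, field


@dataclass
class Protocol:
    settings: dict

    def prepare(self, initial, final, mapping=None):
        return PreparedProtocol(self, initial, final, mapping)


@dataclass
class PreparedProtocol:
    protocol: Protocol
    initial: str
    final: str
    mapping: object = None
    _done: bool = field(default=False, init=False)

    def run(self):
        # placeholder for actual execution of the bound transformation
        self._done = True
        return {"initial": self.initial, "final": self.final}

    def is_complete(self):
        return self._done


prepared = Protocol({"n_steps": 1000}).prepare("benzene", "toluene")
result = prepared.run()
print(prepared.is_complete())  # True
```

This keeps the factory independently serializable, so extending a network with the same protocol doesn't require copying bound state.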

Collaborator Author

Indeed, here are my notes when I separated things out this way:

  1. Benefits of separating Transformation from Protocol
    • can have a single Protocol object instance serving many Transformations
      • can save on memory for very large networks, serialization cost
      • may be worthwhile making them function as singletons when the hash is identical
    • only two types of Transformation: Transformation and NonTransformation
      • we allow for a zoo of Protocols
    • can execute a Protocol in isolation, without any graph/network machinery at all
      • define ChemicalSystems, AtomMapping if appropriate, settings, and fire it off with

        Protocol(settings).run(start, end, mapping)
        
      • very similar to sklearn-style estimator classes

      • should we adopt a similar approach to sklearn for results?

Collaborator Author

Will have to think some more on Protocol structure to accommodate @jchodera's points here. I think you're right that a Protocol with an initial, final, and mapping at this point is a job that's ready to be executed, and it is what would need to be serializable to be shipped to a worker process, local or remote.

PreparedProtocol.run could have an async=True kwarg that proceeds with execution in a separate thread or process, with its methods able to yield result status from file outputs or stream buffers to the main thread/process. We'll need to determine what operations we support (stop, start, extend?).
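One possible shape for that idea. Note that async is a reserved word in Python, so this hypothetical sketch uses a block=False kwarg instead; PreparedProtocol here is illustrative, not the gufe API:

```python
# Sketch: run() can hand execution to a background thread and return a handle
# the main thread can poll for status or results.
import concurrent.futures
import time


class PreparedProtocol:
    def __init__(self):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._future = None

    def _execute(self):
        time.sleep(0.01)  # stand-in for the actual simulation work
        return {"dG": -1.0}

    def run(self, block=True):
        self._future = self._pool.submit(self._execute)
        if block:
            return self._future.result()
        return self._future  # caller can poll or wait on this handle

    def is_complete(self):
        return self._future is not None and self._future.done()


pp = PreparedProtocol()
fut = pp.run(block=False)
print(fut.result())      # blocks until the worker finishes
print(pp.is_complete())  # True
```

Stop/extend operations would then become methods on the handle rather than on the factory.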

...

@abc.abstractmethod
def run(self,
Collaborator

If this is intended to be the actual class that can be run() to accumulate data from which a free energy difference can be estimated, it would be useful to structure this class so that it can meet the needs we've defined:

  • It must be serializable so we can store it to disk, transmit it over the wire, or restore it later
  • It must enable us to assess how complete the calculation is, rather than just check if it is fully complete
  • It must enable us to assess the current free energy estimate and the estimates of statistical fluctuation if we want to be able to adaptively run calculations
  • It must enable us to extend the calculation

I honestly don't think this structure makes sense as-is, though. You probably need to generate some kind of simulation object that is constructed from asking to perform a particular kind of transformation that would then provide an API for all of the above queries.

@richardjgowers
Contributor

@dotsdl I think adding the settings definition in here is maybe a step too far and/or removed from the API for Protocol. We probably need to hash out the taxonomy of settings (which @jchodera has started) with a wider audience in something like a Google doc, and then try to encode this in nested pydantic structures later.

@dotsdl
Collaborator Author

dotsdl commented May 21, 2022

Ready to go @richardjgowers! Got tests passing again after DAG restructuring from yesterday.

@IAlibay IAlibay self-requested a review May 23, 2022 16:44
@dotsdl
Collaborator Author

dotsdl commented May 23, 2022

@IAlibay blackified changes in the PR against main using darker --revision main gufe.


Attributes
----------
initial : ChemicalSystem
Contributor

didn't we converge on stateA and stateB for these?

Collaborator Author

I'm not married to initial and final, but I believe I'm going from @jchodera's suggestion from earlier on this PR.

Contributor

We'll keep A/B and have a great renaming once we're happy with the objects. I think we're still determining how many objects/layers there are right now.



# we subclass `Transformation` here for typing simplicity
class NonTransformation(Transformation):
Contributor

So throwing in my 2 cents here: I don't really like the Transformation and NonTransformation naming (especially the latter); it's strangely non-intuitive (especially if you have a NonTransformation sitting among transformations, I can honestly see users sitting there being confused).

Is there an alternative we could use here? The concept of AlchemicalPaths (through states) or just calling them Experiments? Even being more descriptive with AlchemicalTransform(ation) would probably ease some of the likely pain points users are going to feel here.

Contributor

Experiment sounds like a bench thing.

A Transformation is the application of a Protocol to specific chemical systems. It is also used as the edge in alchemical graphs.

You could call it ParametrisedProtocol but I feel like we've already got too many names revolving around Protocol.

I don't hate AlchemicalTransformation

Contributor

Experiment sounds like a bench thing.

That's big bench biasing you to feel inferior about your computational work 🙃

You could call it ParametrisedProtocol but I feel like we've already got too many names revolving around Protocol

Yeah, it ends up getting confusing (I would expect ParameterisedProtocol to be a child of a Protocol class, etc.)

Maybe something to throw out for a quick look at today's meeting?

Collaborator Author

Agree NonTransformation is a bit obtuse. I think it's also totally descriptive without stacking in any additional, unintended meaning. One could even imagine it as a lambda protocol that changes a system to itself, effectively just doing dynamics and no alchemistry.

I'm fine with Transformation being renamed to AlchemicalTransformation, which goes with our AlchemicalNetwork.

Contributor

I'll kick this down the road to the "great renaming" event we can have once we've got all the objects.

...


class TestNonTransformation:
Contributor

From a cursory look, these two classes seem like they could be squashed into one base class, with ntnf renamed to something declared as a class attribute?

Collaborator Author

Yes, there is some potential for consolidation, but there are some differences in test_equality.

completed: Set[ProtocolUnit] = set()

while len(completed) != len(self._graph.nodes):
for pu in graph.nodes:
Contributor

Pedantic, but for the sake of sanity can we make pu a bit more explicitly named here? Someone is eventually going to mess up when looking at pu and pur.

Contributor

reworked this loop to be much simpler

self._kwargs = kwargs
self._name = name

def __hash__(self):
Contributor

Should there be an __eq__ to deduplicate units? (or maybe deduplication isn't happening at this level?)

Collaborator Author

Deduplication has to happen within the Scheduler with the way we've defined state for these objects. A ProtocolUnit should be deduplicated on the basis of its settings and the inputs to its execute method, so it's not possible to do all of that in __eq__.
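A minimal sketch of scheduler-level deduplication keyed on a unit's inputs. ProtocolUnit and the key scheme here are hypothetical simplifications of what the Scheduler would do:

```python
# Sketch: a unit's identity is derived from its inputs, so two units built
# from equal inputs collapse to one at scheduling time, without relying on
# ProtocolUnit.__eq__.
import json


class ProtocolUnit:
    def __init__(self, **inputs):
        self._inputs = inputs

    def key(self):
        # stable identity from sorted inputs; real inputs would need a
        # canonical serialization scheme, not plain JSON
        return json.dumps(self._inputs, sort_keys=True)


def deduplicate(units):
    seen = {}
    for unit in units:
        seen.setdefault(unit.key(), unit)  # first occurrence wins
    return list(seen.values())


units = [ProtocolUnit(n_steps=100), ProtocolUnit(n_steps=100),
         ProtocolUnit(n_steps=200)]
print(len(deduplicate(units)))  # 2
```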

@richardjgowers
Contributor

@dotsdl I pushed some stuff up

graph = self._graph.copy(as_view=False)

# iterate in DAG order
for unit in reversed(list(nx.topological_sort(self._graph))):
Contributor

So we're having to reverse the iterator NetworkX provides for DAGs. Not a drama, but it might be worth revisiting whether we've got our DAG definition the right way round.

Member

I've always found that the DAG order in NetworkX is backwards. For example, if you're recreating a tree of objects that depend on other objects, you need to create dependents first. That requires reversing the nx iterator.
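A small stdlib illustration of why the reversal is needed, using graphlib (whose static_order, like networkx's topological_sort, emits predecessors first). The node names are made up:

```python
# Sketch: TopologicalSorter expects a node -> dependencies mapping. If the
# graph instead stores node -> dependents, the same API sees a flipped graph,
# so execution order requires reversing the output, as with the nx iterator.
from graphlib import TopologicalSorter

# "B requires A, C requires B": mapping of node -> its dependencies
requires = {"C": {"B"}, "B": {"A"}, "A": set()}
order = list(TopologicalSorter(requires).static_order())
print(order)  # ['A', 'B', 'C'] -- dependencies come out first

# same chain stored as node -> dependents: the order comes out backwards,
# so we reverse it to recover execution order
dependents = {"A": {"B"}, "B": {"C"}, "C": set()}
order2 = list(reversed(list(TopologicalSorter(dependents).static_order())))
print(order2)  # ['A', 'B', 'C'] again
```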

Collaborator Author

Apologies @richardjgowers; from our discussion Friday we had decided to encode DAGs with ProtocolUnits pointing to their dependents, but traversal appeared easier going the other direction (ProtocolUnits pointing to their dependencies).

I should have mentioned in a comment here that I reversed it in the tests, but I think you noticed this.

Contributor

Either is fine, you're either saying "A leads to B" or "B requires A", I just don't want to be going against the grain with how everyone else seems to phrase it. Can leave this as-is, but if we keep finding counter examples we might have to switch polarity.

Contributor
@richardjgowers richardjgowers left a comment

I've got an off by one error on the dependencies in one of the tests. I think I'll change the test to be more meaningful (sum of integers) so it'll be easier to debug.

@dotsdl
Collaborator Author

dotsdl commented May 25, 2022

I've got an off by one error on the dependencies in one of the tests. I think I'll change the test to be more meaningful (sum of integers) so it'll be easier to debug.

The problem appears to be that nx.descendants gives all reachable nodes of the given node, not just its nearest neighbors. We only want immediate dependencies included in dependency_results, but for FinishUnit we end up getting all previous ProtocolUnits in there.

We don't want *all* dependencies, recursively.
@dotsdl
Collaborator Author

dotsdl commented May 25, 2022

Fixed tests with change to use only immediate successors instead of all descendants.
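The fix can be illustrated on a toy DAG: nx.descendants returns every reachable node, while immediate dependencies are just the direct successors. Node names here are made up:

```python
# Sketch: with edges pointing from a unit to its dependencies, descendants()
# walks the whole subgraph, while successors() gives only direct dependencies.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("finish", "b"), ("b", "a")])  # edges point at dependencies

print(sorted(nx.descendants(g, "finish")))  # ['a', 'b'] -- everything reachable
print(sorted(g.successors("finish")))       # ['b']      -- immediate only
```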

Contributor
@richardjgowers richardjgowers left a comment

Ok looks good enough to start trying it out now, I think it'll be easier to see how it fits once we try it on. @IAlibay want to merge?

Contributor
@IAlibay IAlibay left a comment

lgtm!

@richardjgowers please go ahead with the merge, I'm on the move right now.

@richardjgowers richardjgowers merged commit 5330b14 into main May 25, 2022
@richardjgowers richardjgowers deleted the transformations branch May 25, 2022 07:51
@dotsdl
Collaborator Author

dotsdl commented May 25, 2022

Thanks all! Very excited to build on this!
