Implement a new scheduler algorithm for determining job input candidates #3602

Closed
vito opened this issue Mar 27, 2019 · 41 comments

@vito (Member) commented Mar 27, 2019

Tentative issue to track known problems with the current algorithm, and any progress/thoughts on a new algorithm to replace it.

Problems with the current algorithm:

  • It loads up all versions and build inputs and outputs for the entire pipeline, meaning memory use and network transfer from the DB will increase as this data set grows. This is also slow as it involves a bunch of joins across a ton of data. See #2624, #3487, #3426.
    • We are also careful to cache this so it's only loaded when the data changes, but that can only help so much - periodic jobs or resources will invalidate the cache anyway.
  • The algorithm itself is fairly brutal; it's hella optimized but prone to worst-case scenarios where it churns and uses a ton of CPU. And it's difficult to instrument without slowing it down.

An experimental new algorithm lives on the algorithm-experiment branch.

@vito (Member, Author) commented Apr 2, 2019

This is going to be dependent on #413.

The new algorithm uses build history as the source of truth, i.e. the latest build for a given job becomes the first candidate to consider for downstream builds. This is different from the old algorithm, which went through versions, newest-to-oldest, and tried to find matching sets.

This approach has a ton of benefits, but breaks expectations for the "pin old version -> re-trigger job -> un-pin old version" flow, as the new build's older version will now be the first candidate to consider.
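
To make the shift concrete, here's a rough sketch of the kind of query it implies for an upstream job named in a passed constraint - essentially the same shape as the query profiled later in this thread. Treat it as illustrative, not the actual implementation:

SELECT b.id
FROM builds b
WHERE b.job_id = $1           -- an upstream job from the passed constraint
  AND b.status = 'succeeded'
ORDER BY b.id DESC;           -- newest builds are the first candidates considered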

@vito (Member, Author) commented Apr 3, 2019

Quick update: while working version: every support into this new algorithm, we've landed on an approach that should make 'resource causality' (#1538) much cheaper to query for. We'll be tracking the history of which builds from which jobs "fed into" each build, so 'causality' can now be answered by simply following that flow through the new build_pipes table and collecting the input/output versions along the way. Pretty neat!
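
As an illustration, causality over the new build_pipes table can then be a plain graph walk. A minimal sketch, assuming build_pipes links builds via from_build_id/to_build_id columns:

WITH RECURSIVE downstream AS (
  SELECT to_build_id AS build_id
  FROM build_pipes
  WHERE from_build_id = $1        -- the build whose downstream flow we want
  UNION
  SELECT p.to_build_id
  FROM build_pipes p
  JOIN downstream d ON d.build_id = p.from_build_id
)
SELECT build_id FROM downstream;  -- join to build inputs/outputs to collect versions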

@clarafu (Contributor) commented Apr 17, 2019

Update: We are removing the independent build inputs table, which is used by the preparation step (to show resolved inputs while a build is pending). The old scheduler resolved inputs twice: once for the independent build inputs, where each input is resolved independently, and once for the full set that becomes the next build inputs. That was fine before, because all versions were loaded up front and both resolutions happened in memory. With the new algorithm we query the database while computing inputs, so resolving twice would double the workload.

After some discussion, we decided to have input resolution construct a list of input candidates and save either a version or an error for every input. The algorithm will always try to compute every input, and once all inputs are resolved it will decide whether to schedule a new pending build based on whether every input has a valid version as its candidate. If any input candidate has an error, no new pending build will be scheduled. The error state on the candidate will be shown to the user, along with information about what went wrong (e.g. the pinned version is missing, or the input is vouched for by some of the jobs in its passed constraint but not all of them).

@vito (Member, Author) commented May 3, 2019

We're going to make a semantic change to get steps with passed constraints: the name of the get step will now refer to the name of the output from the referenced (passed) jobs. This way, in situations where the upstream jobs have multiple occurrences of the same resource (e.g. version and final-version), the name of the get step clearly disambiguates between them.

This is a backwards-incompatible change, so we'll be bumping to 6.0, which we probably would be doing regardless as the new algorithm's passed constraint behavior is already subtly different - this just makes it clearer, because now they explicitly refer to job output sets.

To make the migration path easier, we can add static validation so that if you say get: foo, passed: [a, b, c], we verify that the jobs a, b, and c all have an output named foo. This would catch scenarios where an input or output was renamed in the upstream jobs while the downstream job still referred to it by the resource name.

@vito (Member, Author) commented May 6, 2019

Update for #3602 (comment): rather than attempting all inputs, saving the partial inputs, and collecting all the errors, we're going to have the algorithm return just the first error. It's costly and difficult to present useful information beyond the first error, so we'll return the first one with as much detail as possible instead.

@clarafu (Contributor) commented May 8, 2019

We decided to save the partial inputs and the resolve error into the next_build_inputs table, because that lets the pending-build preparation view show the state of each input rather than just an overall resolved/not-resolved state. We will still return the first resolve error and set a static error on all later inputs that we have not yet attempted to resolve. Later we can show a different state in the UI (other than the current blocking error state) for the inputs we have not gotten to yet.
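
To illustrate the shape being described (this is a sketch only; the real schema and migration may differ), each row would hold either a resolved version or an error for one input of a job:

-- illustrative sketch, not the actual schema
CREATE TABLE next_build_inputs (
  job_id integer NOT NULL,
  input_name text NOT NULL,
  version_md5 text,      -- set when the input resolved to a version
  resolve_error text,    -- set when resolution failed for this input
  PRIMARY KEY (job_id, input_name),
  CHECK (version_md5 IS NOT NULL OR resolve_error IS NOT NULL)
);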

TODOS:

  • Return the first error we hit in the algorithm and save it into the resolve_error column on the jobs table
    • Use resolve_error in preparation
    • Write more tests for error cases?
  • Show a different state for when the resolve error is "have not yet reached/attempted to resolve this input"
  • Implement pinned with passed constraints within the algorithm
  • Write better error messages (e.g. use job names instead of job IDs)

Later Todos:

  • Optimize pinned with passed constraints
  • Reimplement LoadVersionsDB for the get versions db API endpoint
  • Are there any migrations we need to do when dropping the independent build inputs table?
  • Make AlgorithmInput a pointer in the InputResult object?
  • Make build outputs use input name instead of resource id
  • After optimizing pinned with passed constraints, if the pinned version is not in the DB we no longer expose that error to the user through the preparation view; it's only exposed in the web logs. Is that OK?

After testing, things to optimize for:

  • Add indexes to build pipes
  • The first-occurrence query runs a lot... try to optimize it
  • Fetching all the builds at once in the algorithm returns a ton of rows; maybe we can lazy-load them?
@vito vito added this to the v6.0.0 milestone May 10, 2019
clarafu added a commit that referenced this issue May 15, 2019
Independent build inputs are removed and replaced with partially filled
out next build inputs. Next build inputs can also contain resolve
errors, which are shown to users to explain why an input cannot be
resolved. The algorithm also stops at the first input that it is
unable to resolve. The inputmapper and transformer packages were removed
and condensed into the algorithm.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Joshua Winters <jwinters@pivotal.io>
@clarafu (Contributor) commented May 16, 2019

Added a BuildStatusPreparationSkipped status that can be returned by the build preparation endpoint to represent an input that has not yet been fully resolved because an earlier input hit an error. This state will need to be consumed by the web UI, maybe showing the input with greyed-out text and no spinning icon?

@vito (Member, Author) commented May 21, 2019

Going back on my word for #3602 (comment), at least for now. I think having get: x refer to named outputs introduces a few uncomfortable situations.

  1. In general, it makes jobs more tightly coupled. When I say get: x, resource: y, passed: [a, b, c] I now have to make sure a, b, and c all use x as their name for y. While this encourages consistency throughout the pipeline, it's also more to keep in your head, and potentially leads to changing jobs b and c just because a had to rename it (for whatever reason).
  2. It makes it impossible to rename an input in a downstream job. get with passed is effectively "stuck" at whatever name was used at the earliest point in the pipeline. There's no "get output x of job a renaming it as z" syntax - we would have to add it.
  3. This would be a backwards-incompatible change, which is fine (that's what semver is for), but it has a migration path that is somewhat difficult to reason about. You would basically have to audit all the names for all get steps, because now they have implications outside of the scope of the job. Jobs are currently designed to be fairly isolated and independent, so any trend away from this is cause for concern.

For now, I think we should just stick to today's behavior, which is to always pick the latest version for each resource.

/cc @pivotal-jwinters

pivotal-jwinters pushed a commit that referenced this issue May 30, 2019
By resolving the pinned version to IDs earlier, we can run those queries
only once before the algorithm rather than having to run the query
multiple times if we put it in the algorithm.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Rui Yang <ryang@pivotal.io>
pivotal-jwinters added a commit that referenced this issue May 30, 2019
[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Joshua Winters <jwinters@pivotal.io>
xtremerui pushed a commit that referenced this issue May 31, 2019
[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Alex Suraci <suraci.alex@gmail.com>
clarafu added a commit that referenced this issue Jun 5, 2019
[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Alex Suraci <suraci.alex@gmail.com>
@vito (Member, Author) commented Jun 9, 2019

This looks to be a valuable index to add:

CREATE INDEX deleteme ON builds (job_id, id DESC) where status = 'succeeded';

Here's how the EXPLAIN ANALYZE looks with/without the index:

concourse@127:atc> EXPLAIN ANALYZE SELECT b.id FROM builds b WHERE b.job_id = 5 AND b.status = 'succeeded' ORDER BY b.id DESC;                                                             
+----------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                       |
|----------------------------------------------------------------------------------------------------------------------------------|
| Index Only Scan using deleteme on builds b  (cost=0.43..4598.27 rows=4685 width=4) (actual time=0.051..10.511 rows=5184 loops=1) |
|   Index Cond: (job_id = 5)                                                                                                       |
|   Heap Fetches: 1099                                                                                                             |
| Planning Time: 0.160 ms                                                                                                          |
| Execution Time: 10.891 ms                                                                                                        |
+----------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 0.088s
concourse@127:atc> DROP INDEX deleteme;                                                                                                                                                    
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DROP INDEX
Time: 0.063s
concourse@127:atc> EXPLAIN ANALYZE SELECT b.id FROM builds b WHERE b.job_id = 5 AND b.status = 'succeeded' ORDER BY b.id DESC;                                                             
+------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                         |
|------------------------------------------------------------------------------------------------------------------------------------|
| Sort  (cost=16976.91..16988.63 rows=4685 width=4) (actual time=65.040..65.571 rows=5184 loops=1)                                   |
|   Sort Key: id DESC                                                                                                                |
|   Sort Method: quicksort  Memory: 436kB                                                                                            |
|   ->  Bitmap Heap Scan on builds b  (cost=185.63..16691.27 rows=4685 width=4) (actual time=2.725..62.929 rows=5184 loops=1)        |
|         Recheck Cond: (job_id = 5)                                                                                                 |
|         Filter: (status = 'succeeded'::build_status)                                                                               |
|         Rows Removed by Filter: 196                                                                                                |
|         Heap Blocks: exact=5792                                                                                                    |
|         ->  Bitmap Index Scan on builds_job_id  (cost=0.00..184.46 rows=4804 width=0) (actual time=1.629..1.629 rows=6416 loops=1) |
|               Index Cond: (job_id = 5)                                                                                             |
| Planning Time: 0.242 ms                                                                                                            |
| Execution Time: 65.851 ms                                                                                                          |
+------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 0.124s

At face value it's a minor improvement but it really adds up. Prior to adding this index to our testing environment the web node backend connection pool was maxed out at 32, Postgres was showing high memory usage, etc. Adding the index seems to have improved things pretty dramatically. Unfortunately I can't figure out how to share our dashboard publicly, but here's a link capturing the time range for posterity:

https://app.datadoghq.com/dashboard/e2x-kuz-6md/concourse?screenId=e2x-kuz-6md&screenName=concourse&from_ts=1560047880818&live=false&tile_size=m&to_ts=1560048650000&tpl_var_environment=concourse-algorithm

Here's a screenshot for now:

[screenshot: Datadog dashboard over the linked time range]

A few things to note:

  • Per-job scheduling time is now lower than ever.
  • The connection pool became unstuck, though it continues to flirt with the 32 cap every now and then.
  • The worker container count climbed prior to adding the index - this tends to happen when the connection pool is full as GC can't run.
  • DB memory usage went down significantly.
  • Queries and DB CPU usage went up, but this seems to just be a result of the connection pool un-clogging; it's in the same ballpark it was in before.
@vito (Member, Author) commented Jun 10, 2019

Another thing that may help: currently we query both build_resource_config_version_inputs and build_resource_config_version_outputs all the time to find versions that were produced by builds. More specifically, we care about succeeded builds for a particular job.

What if we had a single table that made this querying easier, by containing all inputs + outputs, only for succeeded builds, and had a job_id column directly on the table so we don't have to join? We could insert into it whenever a build succeeds. A materialized view is tempting here but it'd probably be refreshed way too often. Might be better to just denormalize it.
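
A minimal sketch of what such a denormalized table could look like; the names, columns, and population query below are assumptions, not a settled design:

CREATE TABLE successful_build_outputs (
  build_id integer NOT NULL,
  job_id integer NOT NULL,      -- stored directly so no join to builds is needed
  resource_id integer NOT NULL,
  version_md5 text NOT NULL
);

CREATE INDEX successful_build_outputs_job_id_idx
  ON successful_build_outputs (job_id);

-- populated whenever a build transitions to succeeded, e.g. by copying its
-- recorded outputs (and similarly its inputs):
INSERT INTO successful_build_outputs (build_id, job_id, resource_id, version_md5)
SELECT o.build_id, b.job_id, o.resource_id, o.version_md5
FROM build_resource_config_version_outputs o
JOIN builds b ON b.id = o.build_id
WHERE o.build_id = $1;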

@pivotal-jwinters (Contributor) commented Jun 10, 2019

What if we just started adding implicit outputs to build_resource_config_version_outputs? That would make it so that we only ever have to query the outputs table for everything.

And as far as I can tell (outside of the algorithm) the only thing we use the outputs table for is build.Resources which is used by web.

EDIT: Actually on second thought keeping explicit outputs separate seems nicer. I wonder if we can add some better indexing to help us here as well. It just seems like we'd be duplicating a lot of data if we denormalize this table.

@vito (Member, Author) commented Jun 10, 2019

> EDIT: Actually on second thought keeping explicit outputs separate seems nicer. I wonder if we can add some better indexing to help us here as well. It just seems like we'd be duplicating a lot of data if we denormalize this table.

Yeah we've actually had problems in the past from putting implicit outputs in there. We used to, but it made it hard to distinguish between them on e.g. the resource page ("inputs to" / "outputs of").

I agree that duplicating the data into this table could be problematic. We'll spike on it and see what the disk usage ends up being. If it ends up being a lot of disk usage but a huge win in terms of scheduling performance we may want to keep iterating on it and find ways to lower the disk usage. The only column I'm really worried about is version_md5 - maybe we could find an alternate approach to that?

An intermediate step could be to just add job_id to the inputs/outputs tables. We'd still need to join to only match succeeded builds though. :/
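
For that intermediate step, the query shape would look roughly like this (job_id here is the hypothetical column added to the outputs table); the join to builds remains, but only to filter on status:

SELECT o.resource_id, o.version_md5
FROM build_resource_config_version_outputs o
JOIN builds b ON b.id = o.build_id
WHERE o.job_id = $1              -- hypothetical denormalized column
  AND b.status = 'succeeded';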

@jchesterpivotal (Contributor) commented Jul 24, 2019

If and when we get to using partitioned tables, the v11 docs say that you can apply an index which itself be partitioned. I expect that most build scheduling is looking at recent records, so partitioning over time should further reduce the search cost.
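
A minimal sketch of what that could look like on PostgreSQL 11+, using hypothetical table and partition names (the real builds table has many more columns):

CREATE TABLE builds_by_time (
  id integer NOT NULL,
  job_id integer NOT NULL,
  status text NOT NULL,
  start_time timestamptz NOT NULL
) PARTITION BY RANGE (start_time);

CREATE TABLE builds_2019_q3 PARTITION OF builds_by_time
  FOR VALUES FROM ('2019-07-01') TO ('2019-10-01');

-- creating the index on the parent automatically creates it on each partition,
-- so scans over recent builds touch only the newest, smallest partitions
CREATE INDEX ON builds_by_time (job_id, id DESC) WHERE status = 'succeeded';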

@clarafu (Contributor) commented Jul 25, 2019

We added metrics to each of the garbage collection components (resource caches, builds, containers, volumes, etc.) to see how long each takes to complete and how the metrics look when an autovacuum starts. The graph below shows the time it takes each GC component to run; the vertical line is approximately where the autovacuum on the builds table started.

[screenshot: GC component run durations; vertical line marks the start of the builds-table autovacuum]

You can see that all the lines slowly climb near the black vertical line, meaning every component takes longer to run during the autovacuum. The autovacuum on the builds table took around 1300 seconds, and once it ended the GC run times dropped back to normal (you can see the sudden drop on the graph). We presume that when an autovacuum runs against a big table (in this case the builds table) it slows down all other queries to the DB, possibly because the autovacuum consumes a lot of DB CPU. Since the garbage collector runs each component in series and container and volume GC run last, everything ahead of them has to finish slowly before the container GC gets a turn, which might be why workers periodically hit max containers.

The fix we tried was to parallelize the GC components so that each runs on its own interval with its own dedicated connection pool. That way they never compete with each other, and container GC no longer has to wait for every other component to finish. After deploying this change and observing an autovacuum kick off on a large table, we saw that worker container counts did not spike at all.

Now that the max-containers issue is fixed, we also want to look into keeping queries from being affected by autovacuum runs. Right now we still see spikes in many of our metrics.

[screenshots: six metric graphs covering the same time range]

Each of these graphs shows the same time range, and all of them have two distinct spikes except for the worker container count. We might want to look into indexes we can remove or tune on large tables to speed up the autovacuums, or Postgres configuration we can adjust so that autovacuums don't affect the DB as much.
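
For reference, per-table autovacuum storage parameters are one of the knobs available here; a hedged example (the values are illustrative, not recommendations):

ALTER TABLE builds SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead tuples instead of the 20% default
  autovacuum_vacuum_cost_delay = 2        -- throttle vacuum I/O so it competes less with foreground queries
);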

clarafu added a commit that referenced this issue Jul 25, 2019
Refactored the creation of all the different GC components.

Fixed a bug in the algorithm where, if there are two inputs with the
same resource id and version, the version candidate would be overwritten
twice. This caused the restore value to be set to the same value rather
than nil, so restoring didn't properly reset the candidate to empty.
Also added the feature of preferring outputs over inputs: outputs are
always ordered before inputs, so the candidate is always set to the
output version (when they have the same resource id).

Added to the algorithm so that 'pinned version not found' errors are
saved to the next build inputs table and appear in the preparation.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
@vito vito added this to In progress in Algorithm v3 Jul 30, 2019
clarafu added a commit that referenced this issue Aug 1, 2019
The skipped column in the next build inputs was removed because we now
only save either a version or an error in that table. The table will
either contain a fully resolved set of versions for all of a job's
inputs, or errors for some inputs with no rows populated for the rest.
We used to fill out the table with partially resolved inputs when there
were errors (the versions we were able to partially resolve, plus
skipped inputs that were never reached within the same resolve call).
That was removed in favor of only populating errors when the resolve
fails, because we don't expect to show partially resolved versions to
users, so the feature is useless for now and only added complexity to
the algorithm.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Alex Suraci <suraci.alex@gmail.com>
clarafu added a commit that referenced this issue Aug 1, 2019
Split off the algorithm into different files representing different
components or objects within the algorithm. The flow is currently that
it starts with the algorithm compute, then it grabs new resolvers from
the resolver file and uses those resolvers (which might be the pinned,
individual or group resolvers) to resolve all the inputs. After it
finishes resolving, it uses the input mapper to map the candidate
versions into an input mapping object.

Also did minor cleanups everywhere to get tests to pass, fixed the flake
within the first occurrence tests and moved the gc Collector interface
to the atccmd package. This is because we no longer have an aggregate
object so we can move the interface to where it is actually used.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
clarafu added a commit that referenced this issue Aug 1, 2019
We are testing out a jsonb column for the successful build outputs
table to replace having multiple rows for each build input/output.
Instead we collect all of a build's outputs and inputs into a map of
resource IDs to slices of versions, and store that map as JSON in the
jsonb column. We use jsonb so that we can index a field of the JSON
column and use the Postgres feature of querying for specific fields
within it.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
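
A rough sketch of the jsonb shape described in the commit above, replacing the row-per-version layout sketched earlier in the thread; the table/column names and JSON layout are assumptions:

CREATE TABLE successful_build_outputs (
  build_id integer PRIMARY KEY,
  job_id integer NOT NULL,
  outputs jsonb NOT NULL  -- e.g. {"42": ["md5-of-v1", "md5-of-v2"], "57": ["md5-of-v3"]}
);

CREATE INDEX successful_build_outputs_outputs_idx
  ON successful_build_outputs USING gin (outputs);

-- builds of a job that produced a particular version of resource 42
SELECT build_id
FROM successful_build_outputs
WHERE job_id = $1
  AND outputs @> '{"42": ["md5-of-v1"]}';
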
clarafu added a commit that referenced this issue Aug 1, 2019
Add back the functionality to use the get versions db API endpoint so we
can easily grab information from users about the current state of their
database.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
clarafu added a commit that referenced this issue Aug 2, 2019
Updates the GIN index on the successful build outputs table to turn
fastupdate off. Postgres will now update the index each time a row is
inserted into the table; with fastupdate on, it would build up a pending
list of inserted rows and apply that list to the index as a batch. That
feature is great when the indexed table doesn't get new rows often, but
the successful build outputs table is constantly being inserted into, so
it actually made performance slower.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
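
In SQL terms the change above is roughly the following, with an assumed index name:

ALTER INDEX successful_build_outputs_outputs_idx SET (fastupdate = off);

-- or equivalently at creation time:
CREATE INDEX successful_build_outputs_outputs_idx
  ON successful_build_outputs USING gin (outputs) WITH (fastupdate = off);
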
clarafu added a commit that referenced this issue Aug 12, 2019
[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
clarafu added a commit that referenced this issue Aug 13, 2019
This will remove all the first_occurrence yellow arrows for all past
builds. For most people, it will be ok to remove these indicators
because they don't care about the previous builds. For the people that
do care, we are thinking of adding a separate batch command you can run
to fill in that data. This will be an expensive operation so we thought
it would be best to have it be optional (since it is purely used for
cosmetic purposes).

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
clarafu added a commit that referenced this issue Aug 13, 2019
Before, the scheduled state on the builds object was used to determine
whether a build is ready to be used by the algorithm (all of its build
inputs have been moved from next build inputs to the build inputs
table). After our fix for the max-in-flight bug, the scheduled boolean
now indicates whether the build has passed the max-in-flight check. We
still need a boolean that tells us when a build is ready to be used by
the algorithm (it has inputs/build pipes in the db), so I added a new
boolean called inputs ready. There might be a better way to restructure
the buildstarter that avoids needing two different boolean states, but
this way is more explicit about how far the build has gotten through the
buildstarter code, which is important.

[#3602]

Signed-off-by: Clara Fu <cfu@pivotal.io>
Co-authored-by: Alex Suraci <suraci.alex@gmail.com>
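
In migration terms this amounts to a new flag on builds; a hedged sketch (the actual column name and default may differ):

ALTER TABLE builds ADD COLUMN inputs_ready boolean NOT NULL DEFAULT false;
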
@vito vito moved this from In progress to Done in Algorithm v3 Aug 15, 2019
@clarafu (Contributor) commented Sep 30, 2019

Something to keep an eye on: we don't order the versions saved into the successful_build_outputs table when the same resource appears more than once within the same build. As a result, a user may see a different version used to satisfy a passed constraint in their downstream jobs than they would if we always ordered from latest to oldest (but ordering would introduce an extra join to the resource versions table and add a lot of overhead to the queries).

We can see if anybody complains. :D

@jchesterpivotal (Contributor) commented Sep 30, 2019

Could you elaborate on the last one? On its face I think folks would be surprised to see reordering.

@clarafu (Contributor) commented Sep 30, 2019

@jchesterpivotal It would have to be a situation where someone has a job that has two inputs that are the same resource but different versions. For example,

jobs:
- name: example
  plan:
  - get: input-1
    resource: resource-1
  - get: input-2
    resource: resource-1
    version: {some: ref}

If this job were used as a passed constraint for resource-1 in a downstream job, either the latest version or the pinned version could end up being used as the input version for that downstream job.

@vito maybe you can have a say in this, since ordering would create more overhead on the queries.

@vito (Member, Author) commented Sep 30, 2019

It feels like a bit of an edge case, and one where I'm not super convinced that picking the newest version is technically the 'right thing', either. Just imagining a scenario where someone is pinning input-2 to an older version than input-1 until one day when they pin to a newer version than input-1.

@vito vito moved this from Done to End goals in Algorithm v3 Nov 19, 2019
@vito vito moved this from End goals to Done in Algorithm v3 Nov 19, 2019
@vito vito closed this Nov 19, 2019
Projects: Algorithm v3 (Done)
7 participants