
Time resource shouldn't share version history #2908

Closed
vito opened this issue Dec 4, 2018 · 12 comments · Fixed by #3133
Labels: enhancement, release/documented, resiliency

vito (Member) commented Dec 4, 2018

What challenge are you facing?

With #2386 we identified that the time resource would be adversely affected by a shared history, because it'll fire off a ton of builds at once if everyone configures the same interval.

What would make this better?

In concourse/rfcs#1 we're planning to make this feature (shared history) opt-out, by allowing the resource's info script to include some configuration that disables it.

We don't want to do all of concourse/rfcs#1 just yet, so for now let's just hack in some way for v1 resources to opt-out.

Implementation note discussed with @clarafu: this could work by generating a random hash for resources that have opted out and using it to produce a unique resource config for each such resource. Everything else would keep working as-is: versions remain tied to resource configs, and we'd just have different resource configs.
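A minimal sketch of that idea, assuming resource configs are deduplicated by a hash of their source (the table and column names here are illustrative, not the actual schema): mixing a per-resource random nonce into the hash means no other resource can ever resolve to the same resource config, so its version history is effectively private.

```sql
-- Illustrative only: give an opted-out resource a one-off source hash so it
-- can never share a resource_configs row (and thus never share versions).
-- gen_random_uuid() requires the pgcrypto extension on older Postgres.
UPDATE resource_configs
SET source_hash = md5(source_hash || gen_random_uuid()::text)
WHERE id = (SELECT resource_config_id FROM resources WHERE resources.id = $1);
```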

clarafu (Contributor) commented Jan 2, 2019

After lots of discussion, we have decided to change the database schema in a few ways that differ from the original plan. For this version, we will no longer use the /info script to determine whether or not a resource should have shared version history; instead, the resource_metadata.json file (which already exists) within each resource type's tarball will return a unique_version_history field alongside the existing fields (such as privileged). We might still implement /info for the Resources V2 epic, but we didn't want to worry about that for now because it might involve more architectural changes. One thing to note is that the worker version will need to be bumped to make sure that all the workers register with the latest change.
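For illustration, the time resource's resource_metadata.json might then look something like this (privileged already exists per the comment above; any other fields shown are assumptions):

```json
{
  "type": "time",
  "privileged": false,
  "unique_version_history": true
}
```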

We will then save the boolean value into a unique_version_history column in the base_resource_types table. This introduces a little weirdness: whichever worker registers last sets the unique_version_history value.

A unique_versions_resource_id column will be added to the resource_configs table to mark resource configs that have unique version history. If unique_versions_resource_id is NULL, resources that point to that resource config share version history. If it is not NULL, that resource config belongs exclusively to the resource identified by unique_versions_resource_id (meaning unique versions).

TODO:

  • add a migration for the two new columns (unique_versions_resource_id, a foreign key to the resources table's id column with ON DELETE CASCADE, and the unique_version_history boolean), plus a unique constraint on (source, unique_versions_resource_id) for the resource_configs table
  • populate unique_version_history from the WorkerResourceType (from resource_metadata.json)
  • in findOrCreate in resource_configs.go, use unique_version_history to determine whether versions are shared, creating a new resource config per resource id when they are not
  • add the option to specify unique_version_history for a resource in the pipeline config
  • add the unique_version_history field to the time resource's resource_metadata.json
  • bump the worker version
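The migration in the first bullet might look roughly like this (a sketch; the column names come from the comment above, but the types and the constraint name are assumptions):

```sql
ALTER TABLE base_resource_types
  ADD COLUMN unique_version_history boolean NOT NULL DEFAULT false;

ALTER TABLE resource_configs
  ADD COLUMN unique_versions_resource_id integer
    REFERENCES resources (id) ON DELETE CASCADE;

-- at most one resource config per (source, owning resource) pair
ALTER TABLE resource_configs
  ADD CONSTRAINT resource_configs_source_unique_versions_resource_id_key
    UNIQUE (source, unique_versions_resource_id);
```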

vito added this to the Resources v2 milestone Jan 7, 2019
clarafu (Contributor) commented Jan 7, 2019

The feature was implemented with the unique_version_history property on the resource config, but the issue asks for it to be on the resource_type. This should be easy: when we construct the resource config for a resource that uses a uniquely configured resource_type, we can construct the resource config descriptor with its uniqueness value taken from the resourceTypes object.

topherbullock (Member) commented

Here's how it works now:

  • base resources can be marked as having unique (non-global) version histories
  • specific resource types in users' pipelines can be marked as having unique_version_history (similar to privileged)

TODO:

  • look into moving the uniqueness of the version history down to the resource_config_versions table; currently it lives on the resource_config, which forces us to pass it around everywhere we deal with versions.

vito (Member, Author) commented Jan 9, 2019

@clarafu and I chatted about moving uniqueness down to the versions table rather than on the resource config. This would make it easier to feature-flag this (#3013) without causing an increase in check containers (caused by everyone having unique resource configs).

We initially proposed just adding a resource_id column to resource_config_versions, but thinking about it again we would need to be careful to know when to use unique vs. when to use global versions in the various queries that run against the resource_config_versions table. For example, if a resource is unique, the radar and scheduler will need to always look for resource_id = ?, but if not, it'll always need to look for resource_id IS NULL. The trouble is the scheduler doesn't inherently know whether it's unique; that's determined by the radar.
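Concretely, the two lookup shapes would differ something like this (a sketch, assuming versions carry a nullable resource_id column as proposed):

```sql
-- unique history: scope version lookups to the owning resource
SELECT version FROM resource_config_versions
WHERE resource_config_id = $1 AND resource_id = $2;

-- global history: shared rows have no owner
SELECT version FROM resource_config_versions
WHERE resource_config_id = $1 AND resource_id IS NULL;
```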

We didn't have this problem when the uniqueness was on resource_configs because the resource pointed directly to its already-unique resource config.

Maybe we need that join table after all? 🤔 Then the resource would just point to its entry in the join table, and versions would point back to it.

resource_version_ownership:

| id  | resource_config_id | resource_id |
| 123 | 456                | 789         | # for unique
| 101 | 112                | NULL        | # for global
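As DDL, the proposed join table might look like this (a sketch; the column names come from the example above, while types and cascade behavior are assumptions):

```sql
CREATE TABLE resource_version_ownership (
  id serial PRIMARY KEY,
  resource_config_id integer NOT NULL
    REFERENCES resource_configs (id) ON DELETE CASCADE,
  resource_id integer
    REFERENCES resources (id) ON DELETE CASCADE  -- NULL means global history
);
```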

vito (Member, Author) commented Jan 9, 2019

idk

vito (Member, Author) commented Jan 9, 2019

The join table kind of scares me because we would need a guarantee that there's only one NULL entry for a given resource config. And I don't think UNIQUE constraints work on NULL values; Postgres treats NULLs as distinct from each other, so a plain UNIQUE constraint would allow duplicate NULL rows. (We would have to make sure.)

topherbullock (Member) commented Jan 9, 2019

Would we need the null values for each global resource config? Could we check that table before defaulting to global versions for a given resource config?

vito (Member, Author) commented Jan 9, 2019

@topherbullock Yeah, the thinking was that each resource config would have an entry with NULL for resource_id for the global versions. But there should only ever be one, so we'd need upsert semantics.

Not sure what you mean by defaulting to global versions
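In Postgres, given a partial unique index covering the global (resource_id IS NULL) rows, the upsert could be sketched like this (illustrative only; the index predicate in the conflict target must match the partial index):

```sql
-- requires: CREATE UNIQUE INDEX ... ON resource_version_ownership
--           (resource_config_id) WHERE resource_id IS NULL;
INSERT INTO resource_version_ownership (resource_config_id, resource_id)
VALUES ($1, NULL)
ON CONFLICT (resource_config_id) WHERE resource_id IS NULL DO NOTHING;
```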

vito (Member, Author) commented Jan 9, 2019

And I don't think UNIQUE constraints work on NULL values.

Looks like this could be done with partial indexes. I think something like this would work:

CREATE UNIQUE INDEX resource_version_ownership_global 
ON resource_version_ownership (resource_config_id)
WHERE resource_id IS NULL;

CREATE UNIQUE INDEX resource_version_ownership_resource_id 
ON resource_version_ownership (resource_config_id, resource_id)
WHERE resource_id IS NOT NULL;

clarafu (Contributor) commented Jan 10, 2019

A summary of where this issue is going now: we will be adding a join table named resource_version_ownership, which has three columns: id, resource_id, and resource_config_id. It will have a row for every resource config that is shared between resources and a row for every resource config that is unique to a single resource. An example of the table would look like this:

resource_version_ownership:

| id  | resource_config_id | resource_id |
| 123 | 456                | 789         | # for unique
| 101 | 112                | NULL        | # for global

This join table will allow the creation of resource configs to depend directly on a pipeline resource. It will be populated when radar calls SetResourceConfig. Resource config versions will then point to resource_version_ownership ids instead of resource_config ids, because versions will depend on the ownership of each resource config.

Resource type versioning will also be affected, because we currently save custom resource types' version history (we treat resource types the same as resources, in that both have a resource config and resource config versions). But it does not make much sense for resource types to have unique version history; it would lead to a very weird situation like this:

resource_types:
- name: some-type
  type: docker
  source: ...
  unique_version_history: true

- name: some-other-type
  type: some-type
  source: ...

Since users can't even see resource types' version history (and probably don't know it exists), resource types will always default to shared.

TODO:

  • add migration for creating table
  • figure out how to consolidate unique between base resource type and custom types
  • in SetResourceConfig, select whether an ownership row already exists; if it does and its resource_config_id differs from the current one, delete it and then insert. If it does not exist, insert it.
  • make resource_config_versions point to resource_version_ownership_id instead of resource_config_id (changes to resource_config.go (SaveVersions, FindVersion, etc.), pipeline.go (LoadVersionsDB), etc.)
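The SetResourceConfig bullet above could be sketched as follows (illustrative SQL only; the real implementation would live in Go and run inside a transaction):

```sql
BEGIN;
-- drop a stale ownership row if the resource now resolves to a different config
DELETE FROM resource_version_ownership
WHERE resource_id = $1 AND resource_config_id <> $2;

-- insert the current ownership row if it isn't already there
INSERT INTO resource_version_ownership (resource_config_id, resource_id)
VALUES ($2, $1)
ON CONFLICT DO NOTHING;
COMMIT;
```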

jchesterpivotal (Contributor) commented Jan 18, 2019

My considered view is that NULLs are satan, or at least have satan on speed dial. There's no way to prove that a NULL wasn't inserted by buggy code and no way to repair the data if that's the case. Plus they cause wonkiness and edge cases in all sorts of queries and query plans.

(Disclaimer: Based on just passing by, I don't have deep context etc etc)

topherbullock (Member) commented

@jchesterpivotal I think as long as the schema ensures that combinations of values (assuming NULL is a "value") always amount to a sensible state, using the indexes @vito mentioned above and constraints where necessary, we'll maintain the integrity of the data and mitigate most of your (very valid) concerns.

We embrace NULL for GC of containers and volumes, where a foreign key constraint acts as a reference counter for a row, and when the set of reference fields is NULL we know it can be removed.

I'm satisfied so long as that NULL is useful, and the guard rails are there in the schema; but the devil is often in the details.

vito added a commit that referenced this issue Jan 29, 2019
Finish up tasks before v5.0.0
vito added the release/documented label (Documentation and release notes have been updated.) and removed the release/no-impact label (This is an issue that never affected released versions, i.e. a regression caught prior to shipping.) Feb 19, 2019