Failed to migrate database for v5.4.0 (Up_1561558376) #4139

Open
dpb587 opened this issue Jul 20, 2019 · 1 comment

Comments

@dpb587
Contributor

commented Jul 20, 2019

Hi - I'm not sure if this is worth even raising as an issue. I think it was probably caused by an old bug, but somehow it left some data hiding in the database which is now being surfaced by one of the recent migrations. Even if it's a rare edge case, I figured I could at least document it in an issue for potential searches. Feel free to close if you think no changes are needed.

Bug Report

After my Concourse was upgraded to 5.4.0 (from 5.3.0), my web process was not starting up successfully. In the error logs I found:

$ tail /var/vcap/sys/log/web/web.stderr.log
...
failed to migrate database: 1 error occurred:
	* Migration 'Up_1561558376' failed: pq: column "type" contains null values

From that I was able to find 1561558376_add_type_to_resources.up.go where type seemed to be getting backfilled.

Manually looking through my database I found several entries which looked like they could cause a failure due to empty config and missing type:

$ /var/vcap/packages/postgres-9.6.4/bin/psql concourse
# select pipelines.name pipeline, resources.name resource, resources.id, resources.config from resources join pipelines on pipelines.id = resources.pipeline_id where length(resources.config) < 16;
       pipeline       |   resource    | id  | config
----------------------+---------------+-----+--------
 openvpn-bosh-release | rc-release    | 206 | {}
 openvpn-bosh-release | rc-repo       | 205 | {}
 openvpn-bosh-release | mastered-repo | 216 | {}
(3 rows)
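
To illustrate why rows like these could trip the migration, here is a minimal sketch. It assumes the backfill reads a `type` key out of each resource's JSON config; the real migration may work differently. The function name and the row with ID 101 are hypothetical; IDs 205, 206 and 216 mirror the empty configs found above.

```python
import json


def type_from_config(config_text):
    """Return the "type" value stored in a JSON config string, or None if absent."""
    try:
        config = json.loads(config_text)
    except (TypeError, ValueError):
        # Unparseable or missing config: nothing to backfill from.
        return None
    if not isinstance(config, dict):
        return None
    return config.get("type")


# Hypothetical sample rows as (id, config) pairs.
rows = [
    (101, '{"type": "git", "source": {}}'),
    (205, "{}"),
    (206, "{}"),
    (216, "{}"),
]

# IDs whose backfilled type would stay NULL, which a NOT NULL constraint rejects.
missing = [rid for rid, cfg in rows if type_from_config(cfg) is None]
print(missing)
```

Under that assumption, any row whose config is `{}` ends up without a type, which would match the "contains null values" error.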

From that I tried tracking down when those resources were configured (because they're not active resources in the pipeline). Unfortunately, those resource names don't ever appear in the repository's entire history, so it was likely something that happened during development.

So then I tried checking versioned_resources to see if they ever ended up tracking any versions:

# select * from versioned_resources where resource_id in (205, 206, 216);
1412 | {"ref":"3ae2895d556195c80f999eb1c40657b991d80d77"} | [{"Name":"commit","Value":"3ae2895d556195c80f999eb1c40657b991d80d77"},{"Name":"author","Value":"Danny Berger"},{"Name":"author_date","Value":"2016-04-30 00:02:01 -0700"},{"Name":"message","Value":"wip\n"}] | git  | t       |         205 |           2
1413 | {"path":"rc/release/openvpn-0.0.0-dev.13.tgz"}     | [{"Name":"filename","Value":"openvpn-0.0.0-dev.13.tgz"},{"Name":"url","Value":"https://s3.amazonaws.com/example...bucket/rc/release/openvpn-0.0.0-dev.13.tgz"}]                                               | s3   | t       |         206 |           2
...
(6 rows)

From that I was able to see the metadata git date and also cross-reference the s3 object metadata to see the timestamps were from late April 2016. It seems like it happened when I was first developing the repository's pipeline before I committed in May 2016.

From that, I checked back in my environment configuration history and it looks like I was running Concourse 0.75.0 at that time.

Given all that, I assume it was caused by a probably-old bug, but it seems suspicious that those records have been quietly sitting there unreferenced for three years?

Steps to Reproduce

Unfortunately, not sure if it's still reproducible, per se. I would theorize there was probably a since-fixed Concourse bug which allowed some incomplete database records to be created before updating a pipeline finished, and when updating it failed, the records were abandoned.

Expected Results

Successful database migrations; and probably no unnecessary ghost resources records that my pipelines are not using.

Actual Results

Migration failure.

Additional Context

I needed to get my Concourse up and running again, so I went ahead and tried to resolve it (in a possibly not-recommended way). I stopped the database to make a filesystem backup, and then I manually removed those three ghost resource records, letting the delete cascade to the others. After that I was able to start everything successfully.
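
The cleanup above relies on deletes cascading from `resources` to dependent rows. Here is a minimal sketch of that behaviour, using an in-memory SQLite database in place of PostgreSQL and a simplified, assumed schema. Only the table names and the IDs 205, 206, 216, 1412 and 1413 come from this report; resource 300 and version 2000 are made up to show that unrelated rows survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT, config TEXT);
CREATE TABLE versioned_resources (
    id INTEGER PRIMARY KEY,
    resource_id INTEGER REFERENCES resources(id) ON DELETE CASCADE
);
INSERT INTO resources VALUES
    (205, 'rc-repo', '{}'),
    (206, 'rc-release', '{}'),
    (216, 'mastered-repo', '{}'),
    (300, 'active-repo', '{"type": "git"}');
INSERT INTO versioned_resources VALUES (1412, 205), (1413, 206), (2000, 300);
""")

ghost_ids = (205, 206, 216)
with conn:  # commits on success, rolls back on error
    conn.execute("DELETE FROM resources WHERE id IN (?, ?, ?)", ghost_ids)

# Only the version belonging to the untouched resource should remain.
remaining = conn.execute(
    "SELECT id, resource_id FROM versioned_resources ORDER BY id"
).fetchall()
print(remaining)
```

Running the delete inside a transaction, and only after taking a backup as described above, keeps the cleanup reversible if the wrong IDs are picked.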

I have the database backup for a little while longer in case there are any other details I can extract to help you investigate or patch.

Before the upgrade, that pipeline was green, and I don't recall seeing any weird visual quirks in the UI that would have suggested the ghost resources.

As always, thanks for Concourse. It has been awesome watching it evolve over the years :)

Version Info

  • Concourse version: 5.4.0
  • Deployment type: BOSH
  • Infrastructure/IaaS: AWS
  • Browser (if applicable): n/a
  • Did this used to work? yes

@dpb587 dpb587 added the bug label Jul 20, 2019

@vito

Member

commented Jul 29, 2019

Hmm, yeah it looks like it chokes on old inactive resources from before we stored the config data?

We could just make it default to an empty string. Kinda hacky, but at least we don't have to invent a value. The row is already inactive, and for it to become active it'll become valid at the same time.
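
A rough sketch of the empty-string default idea, not the actual migration code. It uses an in-memory SQLite table in place of PostgreSQL, and it assumes the type is read from a `type` key in the JSON config. The helper name `backfill_type` and the row with ID 101 are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, config TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO resources (id, config) VALUES (?, ?)",
    [(101, '{"type": "git"}'), (205, "{}"), (206, "{}"), (216, "{}")],
)


def backfill_type(config_text, default=""):
    """Return the config's "type" string, or the default when it is missing."""
    try:
        config = json.loads(config_text)
    except (TypeError, ValueError):
        return default
    value = config.get("type") if isinstance(config, dict) else None
    return value if isinstance(value, str) else default


rows = conn.execute("SELECT id, config FROM resources").fetchall()
with conn:  # one transaction for the whole backfill
    conn.executemany(
        "UPDATE resources SET type = ? WHERE id = ?",
        [(backfill_type(cfg), rid) for rid, cfg in rows],
    )

# With the default applied, no row is left NULL for a NOT NULL constraint to reject.
null_count = conn.execute(
    "SELECT COUNT(*) FROM resources WHERE type IS NULL"
).fetchone()[0]
types = dict(conn.execute("SELECT id, type FROM resources"))
print(null_count, types)
```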

@vito vito added this to To do in Core side-road Jul 31, 2019
