Allow configuring a separate DB for build logs #5306
Replies: 8 comments 1 reply
-
This is something which is important for our case as well. We have more than 50 teams and over 300 pipelines (and growing). This is a potential risk.
-
Is it feasible for you to use the existing syslog functionality, coupled with a more aggressive build pruning strategy? Then you could have a brief retention period for "roughly current" builds where they're visible in Concourse itself - including the important case of builds which are currently running - backed by another system, where logs can be retained for longer. Perhaps that's syslog feeding into Filebeat then Elasticsearch, or it's syslog to fluentd to S3, or whatever you fancy.

Postgres is honestly not the most natural choice for log storage, but it does have the advantage that it's already there in any Concourse installation - as opposed to an additional dependency for operators to set up and manage.

If I imagine a second Postgres for just the logs, I can see how that would work at a code level, and it would have a few benefits. One is that log reaping could happen without necessarily degrading performance on the "main" DB. (Currently, if you have a really huge build log from a runaway process, it can be tricky to delete, since Postgres doesn't love multi-million-row changes.) But otherwise I am quite wary of this idea, since it's a fair amount of extra complexity, all to store logs in basically the same way.

Personally, I think one thing that could be handy is the ability to read logs back in from wherever it is that syslog put them. Right now, old builds are tombstoned, but perhaps we could instead arrange for them to gain a "cold storage URL". When you tried to view such a build, Concourse would fetch the logs from that URL over HTTP. That said, this would involve a few extra bits and pieces of configuration.
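For reference, the syslog-drain approach above is driven by configuration on the web node. A minimal sketch, assuming the flag names used by recent Concourse versions (check the docs for your release; the endpoint and hostname values here are placeholders):

```shell
# Hypothetical web-node environment for draining build events to an
# external syslog receiver (e.g. one fronting Filebeat/Elasticsearch):
CONCOURSE_SYSLOG_ADDRESS=logs.example.com:6514   # placeholder receiver
CONCOURSE_SYSLOG_TRANSPORT=tls                   # tcp, udp, or tls
CONCOURSE_SYSLOG_DRAIN_INTERVAL=30s              # how often to drain
CONCOURSE_SYSLOG_HOSTNAME=concourse-web-1        # hostname tag on messages
```

The "aggressive build pruning" half of the suggestion would then come from per-job `build_log_retention` settings in pipeline configs, keeping the Postgres footprint small while the drained copies live elsewhere.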
-
Based on the discussion and Alex's input, I tried to implement a POC in my fork to store pipeline logs in an Elasticsearch cluster, and it works seamlessly. The idea is to provide an optional external storage solution, probably an Elasticsearch cluster, for persisting pipeline event logs instead of storing them in Postgres.
-
Support for a pluggable event processor looks like a great value add.
-
@deepakmohanakrishnan1984 if you're referring to my commit here: 1d3172d - just FYI, it's very much a WIP and just something I'm experimenting with, so I wouldn't expect it any time soon. That said, I'll keep playing around with it when I get the chance, and hopefully it'll materialize into something useful!
-
Yes, I was referring to your commit and the vision you have for chaining event processors. I will look into it more deeply shortly and see if I can contribute by fixing some tests or writing a terminal event processor for storing events in Elasticsearch (which I am super interested in).
-
That sounds cool! I'm going to convert this issue to a discussion for now, so hopefully we can get some more visibility on this and ideas from the rest of the community!
-
I've added an RFC with a proposal for how to approach this: concourse/rfcs#53. If anyone has the chance to read it through, I'd be happy to hear any feedback!
-
What challenge are you facing?
As more and more pipelines are onboarded, I see build logs as a potential risk. Some pipelines require retaining build logs for a very long period.
What would make this better?
I'm wondering if Concourse could allow a separate database for storing build logs, which would have at least two benefits:
Are you interested in implementing this yourself?
Back-end changes are OK, but I'd like to avoid touching UI code.