whitelisting firehose messages by org #57

Closed · bonzofenix opened this issue Jul 23, 2015 · 13 comments

@bonzofenix
Member

We are currently facing the following situation:

  • Stress tests on some apps can take down our ELK cluster.
  • We want logsearch and l4cf to consume only the logs of operations apps that live in specific CF orgs.

For this reason we would like to have l4cf only parsing logs from whitelisted orgs.
@mrdavidlaing, @malston I would like your input on this.
Where do you think we should implement this? The options I have thought of are:

  • Firehose-to-syslog
  • Cloudfoundry-ingestor
  • log_parser
  • Firehose
@mrdavidlaing
Member

Stress tests on some apps can take down our ELK cluster.

Could you provide some specifics? How much load are you throwing at the apps, and how many logs per second are they producing through the firehose?

@mrdavidlaing
Member

take down our ELK cluster.

Also, which bits of the ELK cluster fail?

@bonzofenix
Member Author

We are currently persisting around 36 GB of data per day on average. Our ES cluster has 4 shards distributed across 8 persistent ES nodes, and we see roughly 20k entries every 5 seconds. We think the last crash was due to too many writes and a corrupted shard in the ES cluster. We are working on putting the disks on different datastores to avoid disk IO limitations, if there are any; we do not have stats on this, so we are not sure whether it will help at all. We have seen the system crash in different situations and we scale accordingly.

  • Redis not being able to start because the AOF file was too big to load into memory, so it kept restarting. -> Fixed by adding parsers to keep up with the load and deleting the AOF file.
  • ES heap at 90% when a shard was corrupted and the cluster was partially offline. -> Fixed by deleting the shard.

Those are the ones I can recall.

@bonzofenix
Member Author

Long term we are thinking of having 2 deployments of Logsearch+l4cf per environment. We want one with predictable growth and longer log retention for operations, which we can monitor and rely on. The other deployment will be available to our CF users, and we do not necessarily mind if it goes down.

@shinji62
Contributor

What about adding a broker like Kafka and offloading the load to it?
In firehose-to-syslog we already filter by message type, so maybe filtering there could make sense.

@simonjohansson
Contributor

@shinji62 Do you mean being able to filter in firehose-to-syslog on arbitrary fields in events?

@bonzofenix
Member Author

@shinji62 can you point me to the filtering that you mentioned? @simonjohansson looking at the firehose-to-syslog code, it should be fairly easy to implement; I just do not know whether such a PR would be accepted.

@simonjohansson
Contributor

@bonzofenix sure, but I doubt firehose-to-syslog is the right place for this, as it should just be a forwarder of the general messages you want sent to an aggregation tool.

@mrdavidlaing
Member

Given the failure scenarios, I think it's best to try to filter the logs before they hit the queue.

Currently, the logs flow like this:

firehose --> ingestor_cloudfoundry-firehose_ctl / firehose-to-syslog --> ingestor_cloudfoundry-firehose_ctl / logstash --> queue

We could make the logstash part of the ingestor_cloudfoundry-firehose job configurable with extra logstash config.

I think this would be a good place to use Logstash's drop filter.

I propose we update the ingestor_cloudfoundry-firehose job to support the filters property like the upstream ingestor_syslog / logstash_ingestor.filters job, which would then allow you to configure deployment-specific drop rules via your deploy manifest:

properties:
  ingestor_cloudfoundry-firehose:
    filters: |
      if [loglevel] == "debug" {
        drop { }
      }
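
For the org whitelist this issue asks about, the same mechanism could drop everything that does not come from an approved org. This is only a sketch: it assumes the parsed events carry an org-name field (written here as cf_org_name, with placeholder org names), which depends on how your firehose-to-syslog / parsing rules tag events:

properties:
  ingestor_cloudfoundry-firehose:
    filters: |
      # drop any event whose org is not on the whitelist (field name is an assumption)
      if [cf_org_name] not in ["ops-org-a", "ops-org-b"] {
        drop { }
      }

Logstash conditionals support "not in" against a list, so the whitelist itself can live directly in the deploy manifest.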

@shinji62
Contributor

@simonjohansson Yeah, you are right, firehose-to-syslog should be as simple as possible.
So maybe @mrdavidlaing's idea makes more sense.

@MaheshRudrachar

@mrdavidlaing in the same context, I have another use case:
whitelist only the logs for a given app name and org name.

At present, I am in the process of implementing and customizing the ELK stack, where a user should have the flexibility to bind an app to ELK so that only that app's logs are pushed from Doppler through the firehose-to-syslog plugin. What would be your approach to addressing this use case? Your input is very much appreciated.

Thanks
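
For an app-plus-org whitelist like the one described above, the same filters property could carry a combined condition. Again just a sketch: the cf_org_name / cf_app_name field names and the example values are assumptions and depend on how events are tagged in your deployment:

      # keep only the bound app in the whitelisted org; drop everything else
      if [cf_org_name] != "my-ops-org" or [cf_app_name] != "my-bound-app" {
        drop { }
      }

If apps are bound dynamically, the list of allowed org/app pairs would need to be templated into this snippet when the deploy manifest is generated.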

@hannayurkevich
Collaborator

Hi guys,

Is this still an issue?

@hannayurkevich
Collaborator

Closing this issue. Created enhancement #173 because the filtering feature can be quite useful.
