Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki integration #2 #2306

Merged
merged 74 commits into from
Nov 22, 2023
Merged

Loki integration #2 #2306

merged 74 commits into from
Nov 22, 2023

Conversation

lperdereau
Copy link
Contributor

@lperdereau lperdereau commented Jun 24, 2023

Refer to : Loki integration #1574

  • Code updated
  • All old tests pass
  • Work with latest Loki version

@github-actions
Copy link

@lperdereau: There are no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.

  • /kind feature
  • /kind enhancement
  • /kind fix
  • /kind chore
  • /kind dependencies
Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

@github-actions
Copy link

@lperdereau: There are no area labels on this PR. You can add as many areas as you see fit.

  • /area agent
  • /area local-api
  • /area cscli
  • /area security
  • /area configuration
Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

@lperdereau
Copy link
Contributor Author

/kind feature
/area agent

@codecov
Copy link

codecov bot commented Jul 3, 2023

Codecov Report

Attention: 195 lines in your changes are missing coverage. Please review.

Comparison is base (947b247) 56.87% compared to head (695a104) 56.97%.

Files Patch % Lines
...on/modules/loki/internal/lokiclient/loki_client.go 51.62% 93 Missing and 11 partials ⚠️
pkg/acquisition/modules/loki/loki.go 72.06% 56 Missing and 13 partials ⚠️
pkg/acquisition/modules/loki/entry.go 0.00% 13 Missing ⚠️
...uisition/modules/loki/internal/lokiclient/types.go 53.84% 4 Missing and 2 partials ⚠️
pkg/acquisition/modules/loki/timestamp.go 82.35% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2306      +/-   ##
==========================================
+ Coverage   56.87%   56.97%   +0.09%     
==========================================
  Files         190      195       +5     
  Lines       26169    26675     +506     
==========================================
+ Hits        14884    15197     +313     
- Misses       9729     9897     +168     
- Partials     1556     1581      +25     
Flag Coverage Δ
bats 37.78% <0.00%> (-0.77%) ⬇️
unit-linux 55.45% <61.46%> (+0.23%) ⬆️
unit-windows 51.80% <46.40%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lperdereau lperdereau force-pushed the loki branch 2 times, most recently from 7eb48f1 to c6a2c27 Compare July 3, 2023 20:48
@mmoscher
Copy link

Interesting! Any news on this? What's missing and/or where is help needed?

@lperdereau
Copy link
Contributor Author

lperdereau commented Aug 11, 2023

Interesting! Any news on this? What's missing and/or where is help needed?

@mmoscher no news, I'm trying to keep working on it if you want to contribute we can try to list what's missing.
This is an existing branch that I've updated and increased testing by adding Loki to docker compose tests.

@pschiffe
Copy link

Looking forward to this feature, hopefully you'll be able to finish it 🤞

@lperdereau
Copy link
Contributor Author

I hope this feature can be review by a maintainer. Just few comment about the feature and keys to complete this PR.

@lperdereau
Copy link
Contributor Author

Thank you @LaurenceJJones for CI pipeline.
I've just patch the build error caused by :

package loki_test
import "github.com/crowdsecurity/go-cs-lib/pkg/cstest"

Updated into:

package loki_test
import "github.com/crowdsecurity/go-cs-lib/cstest"

@buixor
Copy link
Contributor

buixor commented Oct 3, 2023

Hey,

Sorry for the delay, having a look at it now and in the coming days :)
Thanks a lot,

Copy link
Contributor

@buixor buixor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

Here goes a quick review. It looks overall good to me; I have a few suggestions. Please let me know what you think. If we agree, it can go in the next release.

Thanks for your contribution 👍

Self note : When creating the associated documentation, we need to make it loud & clear that the user can "miss" some log lines depending on his config and the volume of logs he wants to ingest.

pkg/acquisition/modules/loki/loki.go Show resolved Hide resolved
pkg/acquisition/modules/loki/loki.go Outdated Show resolved Hide resolved
pkg/acquisition/modules/loki/loki.go Outdated Show resolved Hide resolved
}

func (lc *LokiClient) tailLogs(conn *websocket.Conn, c chan *LokiResponse, ctx context.Context) error {
tick := time.NewTicker(100 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't read "fast enough" from loki, we will lose messages (as far as I understand). While we can track it with the metrics and log messages, having a dynamic ticker might help. I suggest the following approach:

  • we can start with a 50ms ticker
  • if we don't get a message when reading, we can increase the ticker duration (ie. double it, with 100ms being the upper bound)
  • on the other hand, if we miss messages, we lower the ticker duration (ie. divide it by two, and go as low as 5ms)

I'm not sure about the upper/lower bound we should pick with the ticker as I don't have XP with Loki, but I hope you get the idea 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any information about this choice made by the previous developer.
I thought it might be a websocket connection keepalive.

I will say: If we don't read "fast enough" from loki, we will lose connection.

I did a search on grafana which allows the user to stream Loki's logs live (feature documentation).
Here the codebase grafana/grafana

We find the same idea

        ticker := time.NewTicker(time.Second * 60) //.Step)
	defer ticker.Stop()

	for {
		select {
		case <-done:
			logger.Info("Socket done")
			return nil
		case <-ctx.Done():
			logger.Info("Stop streaming (context canceled)")
			return nil
		case t := <-ticker.C:
			count++
			logger.Error("Loki websocket ping?", "time", t, "count", count)
		}
	}

I agree with we can have a dynamic ticker.

I'm not sure about this ticker, but I still think it's a connection keeper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dig into grafana code base, and I understand that they have an UI who allow stream aquisition for user via a proxy API who tail loki logs.
This ticker and and the above code (in grafana repository) is used to keep connection between user's browser and Grafana Proxy API.
In this way we don't need this for our usage. I simplified the implementation.

@buixor buixor mentioned this pull request Oct 3, 2023
7 tasks
Copy link
Contributor Author

@lperdereau lperdereau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @buixor for the review and comment.
It's really good to see activity on this 😃

pkg/acquisition/modules/loki/loki.go Outdated Show resolved Hide resolved
}

func (lc *LokiClient) tailLogs(conn *websocket.Conn, c chan *LokiResponse, ctx context.Context) error {
tick := time.NewTicker(100 * time.Millisecond)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any information about this choice made by the previous developer.
I thought it might be a websocket connection keepalive.

I will say: If we don't read "fast enough" from loki, we will lose connection.

I did a search on grafana which allows the user to stream Loki's logs live (feature documentation).
Here the codebase grafana/grafana

We find the same idea

        ticker := time.NewTicker(time.Second * 60) //.Step)
	defer ticker.Stop()

	for {
		select {
		case <-done:
			logger.Info("Socket done")
			return nil
		case <-ctx.Done():
			logger.Info("Stop streaming (context canceled)")
			return nil
		case t := <-ticker.C:
			count++
			logger.Error("Loki websocket ping?", "time", t, "count", count)
		}
	}

I agree with we can have a dynamic ticker.

I'm not sure about this ticker, but I still think it's a connection keeper.

@buixor
Copy link
Contributor

buixor commented Oct 16, 2023

Hey thanks,

Having a look ! Hopefully this time I won't keep you waiting ;)

@buixor
Copy link
Contributor

buixor commented Oct 17, 2023

Hello @lperdereau !

I made a few minor changes in the code (bubbling on errors, not capitalizing errors and a few logging clarifications).
However during our internal testing we realized that the tail approach can lead to some missed messages. An easy way to reproduce it is like using ab (concurrency of 10 with few thousand messages: ab -n 5000 -c 10 localhost/ ) on a web-server plugged to loki: Loki tail is going to miss a significant part of the messages.

While investigating, we found out that, according to this issue, The tail endpoint is not built with the intention of providing all logs for a stream 😞

We are experimenting with the query approach that another guy recommends. I will let you know if the PoC is conclusive.

Stay tuned, you can drop by on discord if you want to chat about it!

@buixor
Copy link
Contributor

buixor commented Oct 17, 2023

Added a PR on your PR : lperdereau#1

image

@lperdereau
Copy link
Contributor Author

Hello @lperdereau !

I made a few minor changes in the code (bubbling on errors, not capitalizing errors and a few logging clarifications). However during our internal testing we realized that the tail approach can lead to some missed messages. An easy way to reproduce it is like using ab (concurrency of 10 with few thousand messages: ab -n 5000 -c 10 localhost/ ) on a web-server plugged to loki: Loki tail is going to miss a significant part of the messages.

While investigating, we found out that, according to this issue, The tail endpoint is not built with the intention of providing all logs for a stream 😞

We are experimenting with the query approach that another guy recommends. I will let you know if the PoC is conclusive.

Stay tuned, you can drop by on discord if you want to chat about it!

Oh right, I'll check that out, i didn't notice this, no problem for a query acquisition mode.

Copy link
Contributor

@buixor buixor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @lperdereau for the contribution. Tests and code is fine for me. It shall ship in 1.5.6 🥳

@buixor buixor merged commit 92f923c into crowdsecurity:master Nov 22, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants