Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix line skipping issue in receive_lines method #4491

Closed
wants to merge 0 commits into from

Conversation

yugeeklab
Copy link
Contributor

@yugeeklab yugeeklab commented May 10, 2024

Which issue(s) this PR fixes:
Fixes #4494

What this PR does / why we need it:
Before this patch, long lines could cause breakdowns in fluentd, potentially posing a vulnerability. With this patch, max_line_size will be integrated into the FIFO, enabling the system to skip lines exceeding the maximum size before executing receive_lines.

Docs Changes:

Release Note:

@daipom daipom self-requested a review May 13, 2024 02:54
@daipom
Copy link
Contributor

daipom commented May 13, 2024

@yugeeklab Thanks for this fix!
CI is currently unstable because of #4487. We will fix it. Sorry for the trouble.

I see the intent of this fix as follows.

  • In the current implementation, large lines that would eventually be discarded in receive_lines are temporarily held in IOHandler's @lines.
  • This is a waste of memory.
  • This PR resolves the waste.

Surely, such a fix would allow us to limit memory consumption by the max_line_size setting to some extent!

This PR would be effective to some extent, however I believe the problem of memory consumption will remain.
It would be possible that FIFO's @buffer becomes unlimitedly large if the @eol does not appear in the data.

Are these my understandings correct?

@yugeeklab yugeeklab marked this pull request as ready for review May 13, 2024 09:18
@yugeeklab
Copy link
Contributor Author

yugeeklab commented May 13, 2024

Hi, @daipom

I've just published an issue #4491 for more information.

This PR would be effective to some extent, however I believe the problem of memory consumption will remain.
It would be possible that FIFO's @buffer becomes unlimitedly large if the @EOL does not appear in the data.

When max_line_size isn't set, FIFO's @buffer can grow indefinitely. Or if max_line_size has large value, FIFO's buffer will be limited, but there's still a possibility of fluentd experiencing slowdowns.

Summary:

as-is: max_line_size helps you avoid buffer overflow configuring via buffer section.
to-be: max_line_size helps prevent buffer overflow by configuring the buffer section and also ensures FIFO's buffer size remains limited.

If you have any suggestions, such as the fifo_buffer_size parameter or any other ideas, please feel free to discuss them with me.

Thank you for your review!

@daipom
Copy link
Contributor

daipom commented May 13, 2024

@yugeeklab

I've just published an issue #4491 for more information.

Thanks so much!

as-is: max_line_size helps you avoid buffer overflow configuring via buffer section. to-be: max_line_size helps prevent buffer overflow by configuring the buffer section and also ensures FIFO's buffer size remains limited.

Now I understand!
The following understanding was not correct.

This PR would be effective to some extent, however I believe the problem of memory consumption will remain.
It would be possible that FIFO's @buffer becomes unlimitedly large if the @eol does not appear in the data.

This fix clears FIFO's @buffer when read_lines.
So, this fix ensures FIFO's buffer size remains limited.

If you have any suggestions, such as the fifo_buffer_size parameter or any other ideas, please feel free to discuss them with me.

Thanks!
Basically, it seems to be a very good idea to limit the FIFO's buffer.

@yugeeklab
Copy link
Contributor Author

Basically, it seems to be a very good idea to limit the FIFO's buffer.

Thank you for your comment!! @daipom

Please let me know if there's any feedback on my code or idea. I'll review and accept your feedback as soon as possible.

Thank you.

@daipom
Copy link
Contributor

daipom commented May 17, 2024

About CI failures, although #4493 has been resolved, we still need to resolve #4487.

@daipom
Copy link
Contributor

daipom commented May 17, 2024

@yugeeklab The CI issue has been resolved. Sorry for the trouble.
Could you please rebase this branch on the latest master?

@yugeeklab yugeeklab force-pushed the master branch 2 times, most recently from 7082a95 to 8463d57 Compare May 19, 2024 08:38
@yugeeklab
Copy link
Contributor Author

Hi, @daipom

Rebase is done!!

Thank you for your review!!

@daipom
Copy link
Contributor

daipom commented May 27, 2024

Sorry for waiting.
I will review this soon.

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yugeeklab Thanks for this fix!
This fix basically looks good to me!
I've commented on some minor details (about the following), please check!

  • Keeping the same debug log as before
  • Improving codes
  • Improving tests

lib/fluent/plugin/in_tail.rb Outdated Show resolved Hide resolved
lib/fluent/plugin/in_tail.rb Outdated Show resolved Hide resolved
lib/fluent/plugin/in_tail.rb Outdated Show resolved Hide resolved
test/plugin/test_in_tail.rb Outdated Show resolved Hide resolved
test/plugin/in_tail/test_fifo.rb Outdated Show resolved Hide resolved
test/plugin/in_tail/test_io_handler.rb Outdated Show resolved Hide resolved
@daipom
Copy link
Contributor

daipom commented Jun 4, 2024

@yugeeklab
Thanks for updating.
I'm checking the CI failures.

@daipom
Copy link
Contributor

daipom commented Jun 4, 2024

Current CI failures have nothing to do with this PR.
Sorry for the trouble again.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2024

The CI issue has been resolved.
So, could you please rebase this to the latest master?
Sorry for the trouble again.

@yugeeklab yugeeklab force-pushed the master branch 2 times, most recently from b7f5859 to 1c5c571 Compare June 9, 2024 08:17
@yugeeklab
Copy link
Contributor Author

Hi, @daipom

Rebase is done.
I also added a commit to resolve the following issue.
Please review it if you don't mind.

Thank you.

@yugeeklab yugeeklab force-pushed the master branch 2 times, most recently from 0d45c3b to bf73efe Compare June 9, 2024 23:48
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!
The following point is a remaining concern.

#4491 (comment)

I saw bf73efe and realized that there is a problem that needs to be solved about the management of pos

We should not update pos like this commit (bf73efe).
We should only update pos at points where recovery is possible.
This means that we should not update pos until we can be sure that @lines has been successfully handled by @receive_lines.
If updating pos like this commit, some data may be lost when BufferOverflowError occurs or when Fluentd is forced to stop.
(see #4491 (comment))

So, we need to consider how to manage pos correctly for this feature.
It needs to be able to continue processing correctly even if Fluentd is forced to stop.
Repeating process to skip long lines would be acceptable.
Data loss or sending corrupted data would be unacceptable.

For this feature, we need to take care of @was_long_line in particular.
We need to make sure that the restart of Fluentd does not cause a subsequent incomplete log to be sent.

@daipom
Copy link
Contributor

daipom commented Jun 12, 2024

We should not update pos like this commit (bf73efe).
...
If updating pos like this commit, some data may be lost when BufferOverflowError occurs or when Fluentd is forced to stop.

Oh, sorry, it was wrong.
The @lines.empty? condition will prevent it (probably...).

if @lines.empty? && has_skipped_line
@watcher.pe.update_pos(io.pos - @fifo.bytesize)
end

So, we need to consider only the following points.

For this feature, we need to take care of @was_long_line in particular.
We need to make sure that the restart of Fluentd does not cause a subsequent incomplete log to be sent.

@daipom
Copy link
Contributor

daipom commented Jun 12, 2024

So, we need to consider only the following points.

For this feature, we need to take care of @was_long_line in particular.
We need to make sure that the restart of Fluentd does not cause a subsequent incomplete log to be sent.

I think we should change FIFO#bytesize.

def bytesize
@buffer.bytesize
end

It is used in the following pos logic:

if @lines.empty? && has_skipped_line
@watcher.pe.update_pos(io.pos - @fifo.bytesize)
end
unless @lines.empty?
if @receive_lines.call(@lines, @watcher)
@watcher.pe.update_pos(io.pos - @fifo.bytesize)
@lines.clear
else
read_more = false
end
end

def open
io = Fluent::FileWrapper.open(@path)
io.seek(@watcher.pe.read_pos + @fifo.bytesize)

The bytesize should be the uncommitted byte size that FIFO is still handling.
It does not equal the size of the buffer of FIFO anymore because FIFO can clear the buffer to skip the long line.

In the following case (max_line_size 12), very long line not finished yet will be cleared from the buffer soon.

short line\n # To be committed to the pos
very long line not finished yet # Not to be committed to the pos until the `@eol` occurs.

However, that data size should be considered for pos handling.
Since the line is not finished yet, the pos update should be done up to the end of short line\n.
(When Fluentd restarts, Fluentd should continue the process from the end of short line\n.)
Also, the reopening pos should be from the end of very long line not finished yet (especially for the case open_on_every_update).

For this, FIFO#bytesize should be the uncommitted byte size that FIFO is still handling, not the real buffer size of FIFO.

@daipom
Copy link
Contributor

daipom commented Jun 12, 2024

@yugeeklab I have fixed the remaining points and pushed them to my tmp branch (the following 3 commits).
Could you please check them?
If there is no problem, I will push these commits to this PR.
If you have any concerns or ideas, please let me know.

The main point is to resolve the issue that is tested on the 'discards a subsequent data in a long line even if restarting occurs between' test in fix to commit the correct pos to continue processing correctly.
This test would fail in the current branch.

@yugeeklab
Copy link
Contributor Author

yugeeklab commented Jun 14, 2024

Hi @daipom

So, Here is the summary of your code

AS-IS:
When a restart occurs while reading a long line, because it doesn't record the position properly, a long line can be recognized as a short line.

T0-BE:
When a restart occurs while reading a long line and the position is recorded properly, the long line is recognized as a long line.

It looks good to me!!

Thank you!!!

@daipom
Copy link
Contributor

daipom commented Jun 14, 2024

@yugeeklab Yes! Thanks for checking it!
I will push them.

@daipom
Copy link
Contributor

daipom commented Jun 14, 2024

@yugeeklab Sorry, I failed to push. Something wrong happens...
I'm fixing it. Please wait...

@daipom
Copy link
Contributor

daipom commented Jun 14, 2024

@yugeeklab Sorry for the trouble.
Could you please run the following command to recover your origin/master branch?
(I wrongly pushed the current master to your origin/master. Sorry for the trouble.)

git push -f

@yugeeklab
Copy link
Contributor Author

Hi @daipom

I recover my origin/master!!

Should I also reopen Pull Request?

Thank you.

@daipom
Copy link
Contributor

daipom commented Jun 17, 2024

Should I also reopen Pull Request?

Of course! Thanks for reopening it!
Sorry for the trouble.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

in_tail plugin can cause breakdowns in fluentd
2 participants