Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FLUME-3149 reduce cpu cost for taildir file source while still maintaining reliability by using posFile in memory channel #159

Open
wants to merge 9 commits into
base: trunk
Choose a base branch
from

Conversation

zyfo2
Copy link

@zyfo2 zyfo2 commented Aug 26, 2017

File channel tracks transferred events and use transnational mechanism to make transfer recoverable. However, it increases CPU cost due to frequent system calls like write, read, etc. The Cpu cost could be very high if the transfer rate is high.
In contrast, Memory channel has no such issue which requires only about 10% of CPU cost in the same environment but it's not recovered if the system is down accidentally.
For sources like taildir, I propose we could write position file in memory channel to achieve reliability and reduce CPU cost.
After testing on my own production environment, CPU usage dropped from 13% to 3% and still maintain reliability. (Transfer rate: 1Mb/s , kafka sink, file channel -> memory channel with pos file)

Copy link
Member

@bessbd bessbd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the changes seem to be fine. Can you please provide a way of testing this change? Ie. "run this command to see the performance gain".

@@ -1182,7 +1182,6 @@ headers.<filegroupName>.<headerKey> -- Header value
byteOffsetHeader false Whether to add the byte offset of a tailed line to a header called 'byteoffset'.
skipToEnd false Whether to skip the position to EOF in the case of files not written on the position file.
idleTimeout 120000 Time (ms) to close inactive files. If the closed file is appended new lines to, this source will automatically re-open it.
writePosInterval 3000 Interval time (ms) to write the last position of each file on the position file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to remove this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's better to keep it to be compatible with other channels. I'll add it back and modify the user guide.

@@ -64,4 +64,7 @@ public String toString() {
return "[Event headers = " + headers + ", body.length = " + bodyLen + " ]";
}

public void notifySource() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add javadoc to clarify usage?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@@ -64,4 +64,7 @@ public String toString() {
return "[Event headers = " + headers + ", body.length = " + bodyLen + " ]";
}

public void notifySource() {}

public void commit() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add javadoc to clarify usage?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@@ -104,16 +107,15 @@ public synchronized void start() {
} catch (IOException e) {
throw new FlumeException("Error instantiating ReliableTaildirEventReader", e);
}
for (Entry<Long, TailFile> entry : reader.getTailFiles().entrySet()) {
inodePositionMap.put(entry.getKey() + "", entry.getValue().getPos() + "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the + ""-s needed? As per https://stackoverflow.com/a/1572724/5323166 , String.valueOf is recommended. Please, let me know if the intent was different.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. I'll fix it.

@zyfo2
Copy link
Author

zyfo2 commented Oct 30, 2017

I think this change is essentially reaching reliability in MemoryChannel when using taildirSource. So the performance diff is essentially the diff of MemoryChannel and FileChannel. In my tests it saves more than 90% percent of CPU but it's no easy way of comparing it by a simple command.
Perhaps I should add some tests to verify the reliability when using tailDirSource and MemoryChannel?

@asfgit
Copy link

asfgit commented Aug 17, 2018

Can one of the admins verify this patch?

waidr pushed a commit to waidr/flume that referenced this pull request Jul 24, 2019
stream-diagnostics:stable v2.3.4 for fix streamQos startTime
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants