Skip to content

Increase PMD batch & buffer sizes#481

Closed
movence wants to merge 7 commits intoaws:masterfrom
movence:increase-pmd-batch-size
Closed

Increase PMD batch & buffer sizes#481
movence wants to merge 7 commits intoaws:masterfrom
movence:increase-pmd-batch-size

Conversation

@movence
Copy link
Copy Markdown
Contributor

@movence movence commented Jun 8, 2022

Description of the issue

The limit on datum size with PutMetricData has been increased to 1000, and CWA should be aligned with that change in order to send more datums per API call.

Description of changes

Increase the default datum size to 1000.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Running the existing integration test

Requirements

Before commit the code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make linter

@movence movence requested a review from a team as a code owner June 8, 2022 18:49
@movence movence removed the request for review from a team June 8, 2022 18:50
Comment thread plugins/outputs/cloudwatch/cloudwatch.go
@khanhntd
Copy link
Copy Markdown
Contributor

khanhntd commented Jun 8, 2022

Another thing to notice would be the unit test for CWAgent is failing with different OS and its does not relate to our usual flakiness test.

--- FAIL: TestWrite (3.01s)
    cloudwatch_test.go:368: 
        	Error Trace:	cloudwatch_test.go:368
        	Error:      	Not equal: 
        	            	expected: 2
        	            	actual  : 1
        	Test:       	TestWrite
        	Messages:   	Expected number of calls (2) does not match the actual number of calls (1).
    cloudwatch_test.go:368: 
        	Error Trace:	cloudwatch_test.go:368
        	Error:      	Should be true
        	Test:       	TestWrite
FAIL
coverage: 65.7% of statements

Therefore, we would need to deep dive before moving on.
One side note from me is the coverage percentage is rather small. Are there any missing tests we would need to cover?

@SaxyPandaBear
Copy link
Copy Markdown
Contributor

Is the change already deployed on the backend for this? Is there a way for us to manually/automatically integration test just to be sure it works and there isn't any other bottleneck or limiter?

@movence
Copy link
Copy Markdown
Contributor Author

movence commented Jun 9, 2022

Another thing to notice would be the unit test for CWAgent is failing with different OS and its does not relate to our usual flakiness test.

I don't believe it's tied to any specific OSes but a minor failed test case which should have been fixed.

coverage percentage is rather small

Are you suggesting the overall code coverage should be bumped up?

@khanhntd
Copy link
Copy Markdown
Contributor

khanhntd commented Jun 9, 2022

I don't believe it's tied to any specific OSes but a minor failed test case which should have been fixed.

Yes it does not tied to any specific OS. Good job on fixing the test 👍 . However, the GitHub blocks running unit-test workflow until resolve the conflict

Are you suggesting the overall code coverage should be bumped up?

Yes. However, it's a no blocker from me. If we confident the existing unit tests has covered enough daily use case then it's fine by me.

Another non-block from me would be adding integration test , same as @SaxyPandaBear idea. For the idea, Seth has done similar to this (https://github.com/aws/amazon-cloudwatch-agent/tree/master/integration/test/metrics_number_dimension)

@SaxyPandaBear
Copy link
Copy Markdown
Contributor

What coverage are you referring to @khanhntd? The change is tiny. Are there specific test cases that you think are missing? Otherwise I don't see what you really think we can improve on test coverage for a one variable change. The only possible thing I could see is integration testing, like what was done for the max dimensions change, but that's not unit test coverage.

…h-size

# Conflicts:
#	plugins/outputs/cloudwatch/cloudwatch_test.go
@khanhntd
Copy link
Copy Markdown
Contributor

khanhntd commented Jun 9, 2022

What coverage are you referring to @khanhntd? The change is tiny. Are there specific test cases that you think are missing? Otherwise I don't see what you really think we can improve on test coverage for a one variable change. The only possible thing I could see is integration testing, like what was done for the max dimensions change, but that's not unit test coverage.

Yes. That's why I mentioned its a non-blocker from me consider only the variables change and the test bumping coverage can be done in another PR 👍.

adam-mateen
adam-mateen previously approved these changes Jun 9, 2022
Copy link
Copy Markdown
Contributor

@adam-mateen adam-mateen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

I see existing unit test - TestIsFull() tests batch.isFull().
Unit tests do not make actual calls using the actual CloudWatch backend though.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 10, 2022

Codecov Report

Merging #481 (2d1a2b3) into master (e7a20ef) will increase coverage by 0.12%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #481      +/-   ##
==========================================
+ Coverage   56.87%   57.00%   +0.12%     
==========================================
  Files         363      373      +10     
  Lines       16937    17638     +701     
==========================================
+ Hits         9633    10054     +421     
- Misses       6753     6993     +240     
- Partials      551      591      +40     
Impacted Files Coverage Δ
plugins/outputs/cloudwatch/cloudwatch.go 75.00% <ø> (+1.62%) ⬆️
translator/cmdutil/userutil_darwin.go
plugins/inputs/logfile/tail/tail_windows.go 100.00% <0.00%> (ø)
...gins/inputs/win_perf_counters/win_perf_counters.go 68.15% <0.00%> (ø)
plugins/inputs/logfile/tmpfile_windows.go 40.00% <0.00%> (ø)
...windows_event_log/wineventlog/wineventlogrecord.go 59.09% <0.00%> (ø)
...puts/logfile/tail/file_deleting_checker_windows.go 36.36% <0.00%> (ø)
...s/inputs/windows_event_log/wineventlog/sys_call.go 47.36% <0.00%> (ø)
...gins/inputs/windows_event_log/wineventlog/utils.go 69.60% <0.00%> (ø)
plugins/inputs/win_perf_counters/pdh.go 65.47% <0.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7a20ef...2d1a2b3. Read the comment docs.

@khanhntd khanhntd added this to the 1.247354.0 milestone Jun 10, 2022
SaxyPandaBear
SaxyPandaBear previously approved these changes Jun 11, 2022
Copy link
Copy Markdown
Contributor

@SaxyPandaBear SaxyPandaBear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@SaxyPandaBear SaxyPandaBear dismissed their stale review June 16, 2022 13:44

holding on to this

Comment thread plugins/outputs/cloudwatch/cloudwatch.go
@movence movence changed the title Increase PMD batch size to 1000 per API call Increase PMD batch & buffer sizes Jul 6, 2022
@SaxyPandaBear
Copy link
Copy Markdown
Contributor

Is the backend ready for this or are we still waiting?

Copy link
Copy Markdown
Contributor

@khanhntd khanhntd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

defaultMaxDatumsPerCall = 1000 // PutMetricData only supports up to 1000 data metrics per call by default
defaultMaxValuesPerDatum = 150 // By default only these number of values can be inserted into the value list
bottomLinePayloadSizeToPublish = 200000 // Leave 9600B for the last datum buffer. 200KB payload size, 5:1 compression ratio estimate.
bottomLinePayloadSizeToPublish = 500000 // Leave 2400B for the last datum buffer. 500KB payload size, 5:1 compression ratio estimate.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should sync up with internal: bottomLinePayloadSizeToPublish = 999 000 (0.999 MB) and delete 5:1 compression ratio estimate

Copy link
Copy Markdown
Contributor

@adam-mateen adam-mateen Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm the API limit is actually 1 MiB, as in 1024 * 1024 Bytes?
If so then this limit could be raised to 1023 KiB (1023 * 1024 Bytes).
I don't think it is critical since a payload of 1,000 metric datums is unlikely to reach 1023 KiB.
May as well be as accurate as possible, right?

@SaxyPandaBear
Copy link
Copy Markdown
Contributor

Done in #523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants