[Bug fix]Connections are not deleted properly in Flow Exporter when they are not exported #2516

Merged
merged 1 commit into from
Aug 4, 2021

Conversation

srikartati
Member

@srikartati srikartati commented Jul 31, 2021

If flows are not exported to any collector, we do not clear the connections
in the connection store. This is applicable to both conntrack and deny
connections. This will increase the memory usage in the Antrea agent linearly.

Added goroutines that clean up connections that have been stale for more than 5 minutes (the stale timeout)
since the last export. We also return an error if activeTimeout or idleTimeout is greater than the stale connection timeout (a constant 5-minute value).

@srikartati srikartati changed the title from "Bug fix in Flow Exporter w.r.t the deletion of connections" to "[Bug fix]Connections are not deleted properly in Flow Exporter when they are not exported" Jul 31, 2021
@codecov-commenter

codecov-commenter commented Jul 31, 2021

Codecov Report

Merging #2516 (a0cbb1f) into main (028393d) will increase coverage by 5.19%.
The diff coverage is 69.56%.

@@            Coverage Diff             @@
##             main    #2516      +/-   ##
==========================================
+ Coverage   59.78%   64.98%   +5.19%     
==========================================
  Files         284      281       -3     
  Lines       22265    25501    +3236     
==========================================
+ Hits        13312    16571    +3259     
+ Misses       7535     7387     -148     
- Partials     1418     1543     +125     
Flag Coverage Δ
e2e-tests 55.84% <47.82%> (?)
kind-e2e-tests 47.10% <73.17%> (+0.22%) ⬆️
unit-tests 42.18% <24.13%> (-0.05%) ⬇️

Impacted Files Coverage Δ
pkg/agent/flowexporter/flowrecords/flow_records.go 78.94% <33.33%> (-4.08%) ⬇️
.../flowexporter/connections/conntrack_connections.go 82.44% <81.81%> (+2.27%) ⬆️
...agent/flowexporter/connections/deny_connections.go 82.75% <87.50%> (-0.17%) ⬇️
pkg/agent/flowexporter/connections/connections.go 76.92% <100.00%> (+1.92%) ⬆️
pkg/controller/egress/ipallocator/allocator.go 67.82% <0.00%> (-15.16%) ⬇️
pkg/controller/networkpolicy/endpoint_querier.go 77.64% <0.00%> (-13.79%) ⬇️
pkg/apis/controlplane/v1beta1/conversion.go 72.44% <0.00%> (-11.89%) ⬇️
pkg/legacyapis/core/v1alpha2/register.go 69.23% <0.00%> (-10.77%) ⬇️
pkg/controller/egress/controller.go 76.76% <0.00%> (-10.44%) ⬇️
pkg/apis/stats/register.go 71.42% <0.00%> (-10.39%) ⬇️
... and 272 more

cmd/antrea-agent/options.go (outdated, resolved)
pkg/agent/flowexporter/flowrecords/flow_records.go (outdated, resolved)
@srikartati srikartati force-pushed the fix_exporter_issue branch 3 times, most recently from fb9eb3b to 5d66f06 Compare August 3, 2021 18:33
Comment on lines 264 to 269
if o.idleFlowTimeout > connections.StaleConnectionTimeout {
	connections.StaleConnectionTimeout = 2 * o.idleFlowTimeout
}
Contributor

There is a bit of an asymmetry here: if idleFlowTimeout is 3m and activeFlowTimeout is 2m, the StaleConnectionTimeout will be set to 4m. A symmetric solution would set it to 2*max(idleFlowTimeout, activeFlowTimeout), or 5 minutes, whichever is larger.

Member Author

Yes, I saw that. I assumed idleFlowTimeout to be always smaller than activeFlowTimeout, but I see that we do not have any check in the config options to make sure of that.

I will add a combined check to resolve that.

pkg/agent/flowexporter/flowrecords/flow_records.go (outdated, resolved)
@srikartati srikartati force-pushed the fix_exporter_issue branch 2 times, most recently from eec9202 to 9629ec0 Compare August 3, 2021 20:11
antoninbas
antoninbas previously approved these changes Aug 3, 2021
Contributor

@antoninbas antoninbas left a comment

LGTM

)

var (
StaleConnectionTimeout = 5 * time.Minute
Contributor

it may be a better pattern to have const DefaultStaleConnectionTimeout = 5 * time.Minute and pass the actual value as a parameter to the connection store when instantiating it, but I'll let you make that call

Member Author

Following the same pattern as the active and idle timeouts makes sense. I changed it. PTAL.

@srikartati
Member Author

@antoninbas I would like to cherry-pick this onto the release-1.2 branch once it is merged, as it is a critical bug. Thanks.

@srikartati
Member Author

/test-e2e

If flows are not exported to any collector, we do not clear the connections
in the connection store. This is applicable to both conntrack and deny
connections. This will increase the memory usage unnecessarily in Antrea agent.

Signed-off-by: Srikar Tati <stati@vmware.com>
@srikartati
Member Author

/test-all /test-ipv6-e2e /test-ipv6-only-e2e

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. area/flow-visibility/exporter Issues or PRs related to the Flow Exporter functions in the Agent labels Aug 4, 2021
@srikartati
Member Author

@antoninbas This is ready to be merged. Dual stack e2e tests failed because of an unrelated test:
FAIL: TestBatchCreatePods (45.80s)

Labels
area/flow-visibility/exporter Issues or PRs related to the Flow Exporter functions in the Agent kind/bug Categorizes issue or PR as related to a bug.

4 participants