Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR to merge feature/flow-aggregator with master #1671

Merged
merged 1 commit into from
Dec 23, 2020

Conversation

srikartati
Copy link
Member

No description provided.

@codecov-io
Copy link

codecov-io commented Dec 17, 2020

Codecov Report

Merging #1671 (7cb2e24) into master (9d3d10b) will decrease coverage by 1.86%.
The diff coverage is 58.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1671      +/-   ##
==========================================
- Coverage   63.31%   61.45%   -1.87%     
==========================================
  Files         170      184      +14     
  Lines       14250    15781    +1531     
==========================================
+ Hits         9023     9698     +675     
- Misses       4292     5050     +758     
- Partials      935     1033      +98     
Flag Coverage Δ
kind-e2e-tests 53.16% <51.85%> (-2.23%) ⬇️
unit-tests 40.42% <36.59%> (-0.85%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
.../agent/apiserver/handlers/networkpolicy/handler.go 58.33% <ø> (ø)
...gent/controller/noderoute/node_route_controller.go 45.83% <0.00%> (-0.64%) ⬇️
pkg/agent/stats/collector.go 97.72% <ø> (ø)
pkg/agent/types/networkpolicy.go 83.33% <ø> (ø)
pkg/antctl/antctl.go 100.00% <ø> (ø)
pkg/antctl/transform/addressgroup/transform.go 0.00% <0.00%> (ø)
pkg/antctl/transform/controllerinfo/transform.go 0.00% <ø> (ø)
pkg/antctl/transform/version/transform.go 44.82% <ø> (ø)
...kg/controller/networkpolicy/antreanetworkpolicy.go 84.69% <ø> (-0.19%) ⬇️
... and 112 more

@srikartati srikartati changed the title [not for review]PR to validate tests after rebasing feature/flow-aggregator with master PR to merge feature/flow-aggregator with master Dec 19, 2020
@srikartati
Copy link
Member Author

/test-all

build/yamls/base/conf/antrea-agent.conf Show resolved Hide resolved
Comment on lines 153 to 162
if net.ParseIP(ipStr).To4() == nil {
ipStr = fmt.Sprintf("[%s]", ipStr)
isIPv6 = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused by this change, first because of the comment at the beginning of the function:

	// TODO: remove hardcoding to IPv4 after flow aggregator supports IPv6
	isIPv6 := false

but also because of ipStr := ipfixCollectorIP.ipv4.String(), which would imply that ipStr is an IPv4 address

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes did not realize the comment is there. Yes there is a bug. I tried to carry over the existing v6 code on master, but missed a line. Will fix.

Comment on lines 141 to 149
// InterNodeFlows tests the case, where Pods are deployed on different Nodes
// and their flow information is exported as IPFIX flow records.
t.Run("InterNodeFlows", func(t *testing.T) {
// In worker nodes, Flow Exporter on Antrea Agent is not able to resolve
// the DNS name of Flow Aggregator Service. It is flaky depending on the
// underlay connectivity in the Kind cluster. Please note that CoreDNS Pods
// are deployed on master node because of connectivity issues between kube-apiserver
// and worker nodes.
skipIfProviderIs(t, "kind", "DNS resolution of Flow Aggregator Service is flaky on worker nodes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that you are only skipping inter-node tests (InterNodeFlows and RemoteServiceAccess). Given that I am more inclined to think that this is another instance of issue #897?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happened with AntreaPolicy enabled case too whose tunnel is vxlan, so not sure if that is the root cause.

}
return strings.Contains(collectorOutput, srcIP) && strings.Contains(collectorOutput, dstIP), nil
})
assert.NoError(t, err, "IPFIX collector did not receive expected number of data records and timed out.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be require.NoError? If we don't receive the data records, should be bother with the validation after?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

@@ -6,7 +6,7 @@ function usage() {
--prometheus Deploy Prometheus service to scrape metrics from Antrea Agents and Controllers
--flow-collector Provide the IPFIX flow collector address to collect the flows from the Flow Aggregator service
It should be given in the format IP:port:proto. Example: 192.168.1.100:4739:udp
Please note that with this option we deploy the Flow Aggregator service along with the Antrea daemonset."
Please note that with this option we deploy the Flow Aggregator service along with the Antrea."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Please note that with this option we deploy the Flow Aggregator service along with the Antrea."
Please note that with this option we deploy the Flow Aggregator Service along with Antrea."

Comment on lines 435 to 434
if err = data.moveDeploymentToSpecificNode(flowAggregatorDeployment, flowAggregatorNamespace, masterNodeName()); err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right way to go:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second point is a good suggestion. Let me quickly try that out instead of skipping.

@srikartati
Copy link
Member Author

/test-all

@srikartati
Copy link
Member Author

/test-all
/test-ipv6-e2e
/test-ipv6-only-e2e
/test-ipv6-conformance
/test-ipv6-only-conformance
/test-ipv6-networkpolicy
/test-ipv6-only-networkpolicy

@srikartati srikartati force-pushed the feature/flow-aggregator branch 4 times, most recently from 0fc44eb to 73caee1 Compare December 22, 2020 19:28
@srikartati
Copy link
Member Author

/test-e2e

- name: Build flow-aggregator Docker image
run: make flow-aggregator-ubuntu
- name: Push flow-aggregator Docker image to registry
# Will remove the feature/flow-aggregator branch later
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stale comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment on lines +82 to +83
# TODO: Create path to two image artifacts when uploading artifacts, so multiple
# artifacts can be downloaded at once.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand this comment: do you mean bundle both images in a single artifact?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at this: https://github.com/actions/download-artifact#download-all-artifacts

They still seem to be different artifacts.

Comment on lines 288 to 297
VERSION="$CLUSTER" DOCKER_REGISTRY="${DOCKER_REGISTRY}" make flow-aggregator-ubuntu && break
done
cd ci/jenkins

if [ "$?" -ne "0" ]; then
echo "=== Antrea Image build failed ==="
echo "=== Antrea Image and Flow Aggregator Image build failed ==="
exit 1
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this code so I think we need @lzhecheng to review. Specifically I am unsure why we try to build the images twice. Also it seems that the [ "$?" -ne "0" ] test would apply to the cd command (last command run), not the image build.

Finally, maybe the error message should be "=== Antrea Image or Flow Aggregator Image build failed ==="

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just compile once as a workaround? I added some code for that. Earlier code did not work. Will see if new code succeeds.

for image in "${COMMON_IMAGES_LIST[@]}"; do
docker pull $image
done
if $coverage; then
manifest_args="$manifest_args --coverage"
COMMON_IMAGES_LIST+=("antrea/antrea-ubuntu-coverage:latest")
COMMON_IMAGES_LIST+=("antrea/antrea-ubuntu-coverage:latest" "antrea/flow-aggregator:latest")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? we don't have a different image for coverage (for flow-aggregator)

"projects.registry.vmware.com/antrea/flow-aggregator:latest" should be added unconditionally to the COMMON_IMAGES_LIST

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of adding coverage, but not super familar with all components involving in it so gone for a simpler route of using the direct image considering time. Should be taken up as TODO.. hope that is ok.

In generate_manifest_flow_aggregator.sh, for kind environments I always use antrea/flow-aggregator:latest. Even with this change should "projects.registry.vmware.com/antrea/flow-aggregator:latest" be added always? I have done that but just curious on why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you misunderstood me - there is no need for antrea/flow-aggregator:latest here, this image is not used by any test

so projects.registry.vmware.com/antrea/flow-aggregator:latest should always be added, antrea/flow-aggregator:latest should never be added. All yaml manifests and tools should reference projects.registry.vmware.com/antrea/flow-aggregator:latest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only time antrea/flow-aggregator:latest is used is when pushing to dockerhub

isIPv6 := false
if err := data.setupLogDirectoryForTest(tb.Name()); err != nil {
tb.Errorf("Error creating logs directory '%s': %v", data.logsDirForTestCase, err)
if err := testData.setupLogDirectoryForTest(tb.Name()); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea for later: this should just call setupTest instead of duplicating the code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes added in above TODO

}

func (data *TestData) mutateFlowAggregatorConfigMap(ipfixCollector string) error {
// get flow aggregator config map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a very useful comment, that's the name of the function

Comment on lines 156 to 157
tb.Logf("Applying flow aggregator YAML with ipfix collector address: %s:%s:tcp", ipStr, ipfixCollectorPort)
faClusterIP, err := testData.deployFlowAggregator(fmt.Sprintf("%s:%s:tcp", ipStr, ipfixCollectorPort))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tb.Logf("Applying flow aggregator YAML with ipfix collector address: %s:%s:tcp", ipStr, ipfixCollectorPort)
faClusterIP, err := testData.deployFlowAggregator(fmt.Sprintf("%s:%s:tcp", ipStr, ipfixCollectorPort))
ipfixCollectorAddr := fmt.Sprintf("%s:%s:tcp", ipStr, ipfixCollectorPort)
tb.Logf("Applying flow aggregator YAML with ipfix collector address '%s'", ipfixCollectorAddr)
faClusterIP, err := testData.deployFlowAggregator(ipfixCollectorAddr)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 161 to 162
tb.Logf("Deploying flow exporter with collector address: %s:%s:tcp", faClusterIP, ipfixCollectorPort)
if err = testData.deployAntreaFlowExporter(fmt.Sprintf("%s:%s:tcp", faClusterIP, ipfixCollectorPort)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same suggestion as above

// TODO: remove this limitation after flow aggregator supports IPv6
skipIfIPv6Cluster(t)
skipIfNotIPv4Cluster(t)
skipIfProviderIs(t, "remote", "This test is not yet supported in jenkins e2e runs.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jenkins is using the "vagrant" provider apparently, as discussed offline

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

if err := data.deployAntreaFlowExporter(fmt.Sprintf("%s:%s:tcp", ipStr, ipfixCollectorPort)); err != nil {
return data, err, isIPv6
tb.Logf("Deploying flow exporter with collector address: %s:%s:tcp", faClusterIP, ipfixCollectorPort)
if err = testData.deployAntreaFlowExporter(fmt.Sprintf("%s:%s:tcp", faClusterIP, ipfixCollectorPort)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should only use the ClusterIP on Kind (since apparently there are DNS issue, which should be explained in a comment). On Jenkins we should use the default (DNS path), since this is what users will use by default and it's better to test for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thought of doing this later once everything runs perfectly on jenkins. Added it now.

@srikartati srikartati force-pushed the feature/flow-aggregator branch 8 times, most recently from 2f301d8 to a4df2e5 Compare December 22, 2020 23:29
- Add the build system, scripts for manifest generation and corresponding
  workflow changes for Flow Aggregator.
- The main logic implementation of the flow aggregator that uses the go-ipfix
  library v0.4.2 with required unit tests.
- Agent side changes in Flow Exporter for integration with Flow Aggregator using
  DNS name resolution.
- Add e2e tests for flow aggregator and remove flow exporter tests.

Co-authored-by: dyongming@vmware.com
Co-authored-by: zyiou@vmware.com
Co-authored-by: stati@vmware.com
@srikartati
Copy link
Member Author

/test-e2e

@@ -244,6 +244,20 @@ function setup_cluster() {
fi
}

function copy_image {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, it is good to have a copy_image function to make deliver_antrea clean. However, it confuses me that this function doesn't include judgement for "Coverage" while "centos-7" is included. I think these 2 parameters should be treated equally. So one parameter IP is enough for this function.
Could you please explain why you design like this? I just woke up and maybe my mind is not clear enough...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzhecheng you make a good point. If the tests don't pass and we need to amend this PR, we will change this function as well as per your suggestion. If the tests do pass however, I suggest we merge this in order to make the release, and @srikartati can take as an action item to improve the function implementation. Does that work for you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"coverage" is used in deliver_antrea (L328-L333). @antoninbas identified a bug with image name when coverage is not true; it should be projects.registry.vmware.com and not docker.io. Therefore we considered 4 args.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzhecheng is wondering if we can reduce the number of parameters by accessing $COVERAGE directly in the function implementation. However, thinking more about this, I think it would make the function implementation dependent on which image we are pushing (antrea or flow-aggregator), which is probably not great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can improve CI after this PR is merged. I fully understand the pain to re-run all tests...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas sorry I didn't get your point. I think all logic of the loop in deliver_antrea will be in copy_image. And flow-aggregator image always gets delivered, right? "centos-7" and "coverage" take care of the original logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was to have a function that support uploading any image when invoked, rather than a function that takes care of uploading all images at once in a single invocation. With your proposal, won't we need to duplicate logic inside of the function for the antrea image and for the flow-aggregator image?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I get your point. Yes, you are right. Such function is the one needed and I just try to move as much as logic to other functions to make deliver_antrea clean.
Ideally, we can put a copy_some_images in deliver_antrea and copy_image in copy_some_images but it is unnecessary in our case.

@srikartati
Copy link
Member Author

/test-conformance
/test-networkpolicy
/test-ipv6-e2e

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas antoninbas merged commit ba78f3a into antrea-io:master Dec 23, 2020
antoninbas pushed a commit that referenced this pull request Dec 23, 2020
- Add the build system, scripts for manifest generation and corresponding
  workflow changes for Flow Aggregator.
- The main logic implementation of the flow aggregator that uses the go-ipfix
  library v0.4.2 with required unit tests.
- Agent side changes in Flow Exporter for integration with Flow Aggregator using
  DNS name resolution.
- Add e2e tests for flow aggregator and remove flow exporter tests.

Co-authored-by: dyongming@vmware.com
Co-authored-by: zyiou@vmware.com
Co-authored-by: stati@vmware.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants