ci:[CNI] Add windows CNIv1 datapath test #2016

Merged
merged 7 commits into from
Jun 21, 2023

@jpayne3506 jpayne3506 commented Jun 14, 2023

Reason for Change:

Improves test coverage for Windows CNIv1 clusters.

Issue Fixed:

Requirements:

Notes:

@jpayne3506 jpayne3506 added cni Related to CNI. ci Infra or tooling. labels Jun 14, 2023
@jpayne3506 jpayne3506 self-assigned this Jun 15, 2023
@jpayne3506
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@jpayne3506 jpayne3506 marked this pull request as ready for review June 16, 2023 20:09
@jpayne3506 jpayne3506 requested a review from a team as a code owner June 16, 2023 20:09
@jpayne3506 jpayne3506 enabled auto-merge (squash) June 19, 2023 17:06
test/internal/k8sutils/utils_parse.go (resolved)
test/integration/datapath/datapath-win_test.go (outdated, resolved)
test/integration/datapath/datapath-win_test.go (outdated, resolved)
t.Log("Get Nodes")
nodes, err := k8sutils.GetNodeListByLabelSelector(ctx, clientset, nodeLabelSelector)
if err != nil {
require.NoError(t, err, "could not get k8s node list: %v", err)
Contributor

Why require.NoError here but t.Fatal in other places?

Contributor Author

Left them in when testing and forgot to change. Will change to require.NoError for better reporting.

@@ -156,7 +169,7 @@ func MustCreateNamespace(ctx context.Context, clienset *kubernetes.Clientset, na
},
}, metav1.CreateOptions{})

- if !apierrors.IsAlreadyExists(err) {
+ if apierrors.IsAlreadyExists(err) {
Contributor

IsAlreadyExists(err) returns true only when err reports that the resource already exists, so this check bypasses all other errors. What we want:

  1. If the resource already exists or the error is nil, return nil.
  2. Otherwise, return the actual error.

Contributor Author

Okay, should we rename the function from MustCreateNamespace to NamespaceMustExist, to capture that we want to either create the namespace or use the existing one?

!IsAlreadyExists(err) will never catch anything either, since err would have to be nil for it to report false, and then line 173 would be a false positive.

Contributor Author

@jpayne3506 jpayne3506 Jun 19, 2023

I can add additional error checking to capture errors other than metav1.StatusReasonAlreadyExists.

Contributor

I think it would be better to add the additional error checking. Thanks!

test/internal/datapath/datapath-win.go (outdated, resolved)
t.Log("Successfully created customer windows pods")
} else {
// Checks namespace already exists from previous attempt
t.Log("Namespace already exists")
Contributor

Do we want to check whether the pods exist as well?

Contributor Author

We should. No reason not to.

test/internal/datapath/datapath-win.go (outdated, resolved)
return nil
}

func invokeWebRequestPassedWindows(output string) error {
Contributor

This is not invoking a web request; it is checking the result. So we can rename the function accordingly.

Contributor Author

Changed.

test/internal/datapath/datapath-win.go (outdated, resolved)
@jpayne3506 jpayne3506 force-pushed the jpayne3506/datapathWin branch 2 times, most recently from c9d6a22 to 1478aa4 Compare June 20, 2023 01:07
if err != nil {
return errors.Wrap(err, fmt.Sprintf("Getting pod %s failed with %v", firstPod.Name, err))
}
logrus.Infof("First pod: %v", firstPod.Name)
Contributor

Can we also print this pod's IP addresses here (IPv4 and IPv6)? That would make the source and destination IPs clearer.
I see you print the IP address of the second pod below:
logrus.Infof("Second pod: %v %v", secondPod.Name, secondPod.Status.PodIP)
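
A small sketch of what logging both address families for a pod could look like. It assumes the pod's addresses are available as a slice of strings (on a real `corev1.Pod` they would come from `Status.PodIPs`). The helper `splitPodIPs`, the pod name, and the sample addresses are hypothetical, and only the standard library is used.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitPodIPs separates address strings into IPv4 and IPv6 lists.
// Strings that do not parse as IP addresses are skipped.
func splitPodIPs(ips []string) (v4, v6 []string) {
	for _, ip := range ips {
		addr, err := netip.ParseAddr(ip)
		if err != nil {
			continue
		}
		if addr.Is4() {
			v4 = append(v4, ip)
		} else {
			v6 = append(v6, ip)
		}
	}
	return v4, v6
}

func main() {
	// Hypothetical dual-stack pod addresses.
	v4, v6 := splitPodIPs([]string{"10.240.0.12", "fd00::1a"})
	fmt.Printf("First pod: %v IPv4: %v IPv6: %v\n", "win-pod-1", v4, v6)
}
```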

test/internal/datapath/datapath_win.go (resolved)
@jpayne3506 jpayne3506 force-pushed the jpayne3506/datapathWin branch 2 times, most recently from b8eb182 to f567793 Compare June 20, 2023 18:40
@jpayne3506 jpayne3506 requested a review from vipul-21 June 20, 2023 18:40
spec:
containers:
- name: windows-container
image: mcr.microsoft.com/dotnet/framework/samples:aspnetapp
Contributor

Sorry, I missed this in the initial review. Can we use a smaller image? This one is around 1.5 GB.
mcr.microsoft.com/windows/nanoserver:ltsc2022 is around 100 MB, but I am not sure it supports all the commands you want to execute.

@jpayne3506 jpayne3506 force-pushed the jpayne3506/datapathWin branch 7 times, most recently from 11cdc4f to 761d570 Compare June 21, 2023 00:42
@jpayne3506 jpayne3506 merged commit 390977d into Azure:master Jun 21, 2023
@jpayne3506 jpayne3506 deleted the jpayne3506/datapathWin branch June 21, 2023 17:27
jpayne3506 added a commit that referenced this pull request Sep 12, 2023
* ci: Transfer files

* test: Working Datapath Test

* test: apierror Tests

* style: Datapath Package

* test: Deployment timing

* fix: Error check

* fix: Lint

(cherry picked from commit 390977d)
jpayne3506 added a commit that referenced this pull request Sep 17, 2023
jpayne3506 added a commit that referenced this pull request Sep 18, 2023
jpayne3506 added a commit that referenced this pull request Sep 22, 2023
* build azure-vnet-telemetry and azure-vnet-ipam in dropgz-test (#1846)

build azure-vnet-telemetry and azure-vnet-ipam in dropgz-test for parity with release image

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
(cherry picked from commit f619259)

* ci: disable kube-proxy for test clusters (#1965)

* disable kube-proxy for byocni cluster creation

* test config mapping

* shell pwd

* use CURDIR

* check current directory

* test with repo root dir

* test azp format

* test azp format

* test azp format

* change e2e steps to remove kube proxy

* fix load test update args

* fix ns and rg in update

* update ciliume2e

* fix kubectl cmd in load test

* adding new targets for no kube proxy

* remove cluster update

* update overlay e2e

* test behavior of load test

* test grep for azure-cns

* look for container deployment

* testing

* restart node variable check

* update if condition

* add skip node case

---------

Co-authored-by: tamilmani1989 <tamanoha@microsoft.com>
(cherry picked from commit 024819d)

* CI: [CNI] Replace the bash scripts for CNI load testing with golang test cases (#2003)

CI:[CNI] Replace the bash scripts with the golang test cases
(cherry picked from commit 008ae45)

* ci: [CNI] Move Nightly Cilium Pipeline test to ACN (#1963)

* CNS to be able to generate dualstack overaly CNI conflist (#1981)

* fix: Eliminating duplicate lines

* ci: Add update permission for ciliumidentity

* fix: Parameterize Image Registry

add retry to nnc update during scaledown (#1970)

* add retry to nnc update during scaledown

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

* test for panic in pool monitor

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

---------

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

fix: reserve 0th IP as gateway for overlay on Windows (#1968)

* fix: reserve 0th IP as gateway for overlay on Windows

* fix: allow gateway to be updated

ci: windows profile container image (#1988)

Always use 0 for NC version in Overlay (#1979)

always use 0 for NC version in overlay

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

[Vnet Scale - CNS]: Flattening CIDR ranges for Node NNC to a list (#1921)

* Read secondary CIDRs from VnetScale NNC

* fix comment

* update comment

* For VnetScale mode, Use 1st IP for def gateway instead of 0th for windows

* fix/add import

* address pr comments

* add comments

* address pr comments

* wrap error

* fix typo

* fix UT

fix: [NPM] check if policy exists in case of nil pointer (#1974)

fix: check for nil first

ci: disable kube-proxy for test clusters (#1965)

* disable kube-proxy for byocni cluster creation

* test config mapping

* shell pwd

* use CURDIR

* check current directory

* test with repo root dir

* test azp format

* test azp format

* test azp format

* change e2e steps to remove kube proxy

* fix load test update args

* fix ns and rg in update

* update ciliume2e

* fix kubectl cmd in load test

* adding new targets for no kube proxy

* remove cluster update

* update overlay e2e

* test behavior of load test

* test grep for azure-cns

* look for container deployment

* testing

* restart node variable check

* update if condition

* add skip node case

---------

Co-authored-by: tamilmani1989 <tamanoha@microsoft.com>

perf: [WIN-NPM] fast bootup (#1900)

* wip

* wip2

* use other apply DP func

* address comment about if statement

* finish bootup for both DPs

* fix lint

* fix lint 2

* fix lint 3

* longer UT timeout and add missing UTs for apply in background

tool: [NPM] script to clean up iptable chains (#1978)

tool: script to clean up NPM iptable chains

feat: [WIN-NPM] metrics for latencies and failures (#1959)

* implement metrics

* add npm prefix

* rename windows files

* metrics pkg UTs

* allow reinitializing prometheus metrics

* fix: hns wrapper should not throw error for empty SetPolicy values

* test: metric UTs in dataplane

* fix: record list endpoint latency always

* remove flaky UT

* feat: metric for max ipset members

* fix lint

* fix lint 2

* fix build

* fix lint 3

* simplify conditionals and protect against maxMembers becoming negative

* remove bottom 4 histogram buckets. start at 16 ms

* reset metrics for ipset UTs

* style: don't check for windows dp in *_windows.go files

* build: remove unused import

* test: reset windows metrics in UT

Remove SSH port 22 rule from aks-engine clusters (#1983)

ci: change overlaye2e stage to cilium-overlay (#1997)

* renaming overlaye2e for cilium

* update display names for stages

Initial getHomeAZ 404 changes (#1994)

* initial getHomeAZ 404 changes

* treat 404 as success

* address comments

CNS to be able to generate dualstack overaly CNI conflist (#1981)

fix: Parameterize Image Registry

add retry to nnc update during scaledown (#1970)

* add retry to nnc update during scaledown

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

* test for panic in pool monitor

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

---------

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

fix: reserve 0th IP as gateway for overlay on Windows (#1968)

* fix: reserve 0th IP as gateway for overlay on Windows

* fix: allow gateway to be updated

ci: windows profile container image (#1988)

Always use 0 for NC version in Overlay (#1979)

always use 0 for NC version in overlay

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

[Vnet Scale - CNS]: Flattening CIDR ranges for Node NNC to a list (#1921)

* Read secondary CIDRs from VnetScale NNC

* fix comment

* update comment

* For VnetScale mode, Use 1st IP for def gateway instead of 0th for windows

* fix/add import

* address pr comments

* add comments

* address pr comments

* wrap error

* fix typo

* fix UT

fix: [NPM] check if policy exists in case of nil pointer (#1974)

fix: check for nil first

ci: disable kube-proxy for test clusters (#1965)

* disable kube-proxy for byocni cluster creation

* test config mapping

* shell pwd

* use CURDIR

* check current directory

* test with repo root dir

* test azp format

* test azp format

* test azp format

* change e2e steps to remove kube proxy

* fix load test update args

* fix ns and rg in update

* update ciliume2e

* fix kubectl cmd in load test

* adding new targets for no kube proxy

* remove cluster update

* update overlay e2e

* test behavior of load test

* test grep for azure-cns

* look for container deployment

* testing

* restart node variable check

* update if condition

* add skip node case

---------

Co-authored-by: tamilmani1989 <tamanoha@microsoft.com>

perf: [WIN-NPM] fast bootup (#1900)

* wip

* wip2

* use other apply DP func

* address comment about if statement

* finish bootup for both DPs

* fix lint

* fix lint 2

* fix lint 3

* longer UT timeout and add missing UTs for apply in background

tool: [NPM] script to clean up iptable chains (#1978)

tool: script to clean up NPM iptable chains

feat: [WIN-NPM] metrics for latencies and failures (#1959)

* implement metrics

* add npm prefix

* rename windows files

* metrics pkg UTs

* allow reinitializing prometheus metrics

* fix: hns wrapper should not throw error for empty SetPolicy values

* test: metric UTs in dataplane

* fix: record list endpoint latency always

* remove flaky UT

* feat: metric for max ipset members

* fix lint

* fix lint 2

* fix build

* fix lint 3

* simplify conditionals and protect against maxMembers becoming negative

* remove bottom 4 histogram buckets. start at 16 ms

* reset metrics for ipset UTs

* style: don't check for windows dp in *_windows.go files

* build: remove unused import

* test: reset windows metrics in UT

Remove SSH port 22 rule from aks-engine clusters (#1983)

ci: change overlaye2e stage to cilium-overlay (#1997)

* renaming overlaye2e for cilium

* update display names for stages

Initial getHomeAZ 404 changes (#1994)

* initial getHomeAZ 404 changes

* treat 404 as success

* address comments

CNS to be able to generate dualstack overaly CNI conflist (#1981)

* fix: File Directory

* style: Comments

* Addressing Comments

---------

Co-authored-by: Paul Johnston <35265851+pjohnst5@users.noreply.github.com>
(cherry picked from commit 1514d95)

* ci:[CNI] Add windows CNIv1 datapath test (#2016)

* ci: Transfer files

* test: Working Datapath Test

* test: apierror Tests

* style: Datapath Package

* test: Deployment timing

* fix: Error check

* fix: Lint

(cherry picked from commit 390977d)

* fix: [CNI] CNI load test failing due to namespace already created (#2031)

fix: CNI load test failing due to namespace already created
(cherry picked from commit c10900e)

* ci:[CNI] Windows cniv1 load test pipeline (#2024)

CI:[CNI] Windows cniv1 load test pipeline
(cherry picked from commit e45ad21)

* ci: [CNI] Adding aks cluster creation steps for k8s e2e test (#2052)

* ci: [CNI] Adding aks cluster creation steps for k8s e2e test

* Add  validate step to the pipeline

* Adding the telemetry config to the cluster

(cherry picked from commit 846e508)

* ci:[CNI] Replace AKS-Engine Tests with k8s conformance tests (#2062)

* Initial Commit

* Add attempts to prevent flakyness

* Add taint for windows tests

* Add k8s e2e tests

* Testing vmSizes

* Artifact k8se2e binary

* Remove NPM E2E

* Add testing and increase processes

* Addressing comments

(cherry picked from commit 451c691)

* CI: Removing AKS engine related code (#2089)

(cherry picked from commit b45c2c7)

* feat: [dropgz] Dropgz for windows (#2075)

* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)

* ci: Update dns tests for k8s conformance (#2104)

Update dns tests for k8s v1.26

(cherry picked from commit bbf2fd4)

* ci: adding cni package as a trigger (#2108)

(cherry picked from commit e6a8ea6)

* ci: add packages for submodule trigger (#2154)

(cherry picked from commit 4aecfd6)

* set mellanox reg key (#1768)

(cherry picked from commit fa2de6d)

---------

Co-authored-by: Evan Baker <rbtr@users.noreply.github.com>
Co-authored-by: Camryn Lee <31013536+camrynl@users.noreply.github.com>
Co-authored-by: Vipul Singh <vipul21sept@gmail.com>
Co-authored-by: Rajvi <107083915+rajvinar@users.noreply.github.com>
Labels
ci Infra or tooling. cni Related to CNI.

3 participants