cks: use HttpsURLConnection for checking api server #4639

Merged
merged 2 commits into from Mar 2, 2021

shwstppr
Contributor

@shwstppr shwstppr commented Feb 2, 2021

Description

Probable fix for #4146, #4637

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
@shwstppr
Contributor Author

shwstppr commented Feb 2, 2021

@blueorangutan package

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2629

@rohityadavcloud rohityadavcloud added this to the 4.14.1.0 milestone Feb 2, 2021
@rohityadavcloud
Member

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@@ -218,7 +221,10 @@ public static boolean isKubernetesClusterServerRunning(final KubernetesCluster k
         boolean k8sApiServerSetup = false;
         while (System.currentTimeMillis() < timeoutTime) {
             try {
-                String versionOutput = IOUtils.toString(new URL(String.format("https://%s:%d/version", ipAddress, port)), StringUtils.getPreferredCharset());
+                URL url = new URL(String.format("https://%s:%d/version", ipAddress, port));
+                HttpsURLConnection con = (HttpsURLConnection)url.openConnection();
Member

should we configure it to ignore ssl cert validation?

Contributor

and get more static analysis reports as issues?
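For readers following the question above, here is a minimal sketch of what "ignore ssl cert validation" would look like with `HttpsURLConnection`. It is not the code merged in this PR. The class name `ApiServerProbe` and its methods are hypothetical; only the `https://%s:%d/version` URL pattern comes from the diff. A trust-all `X509TrustManager` disables certificate checks entirely, which is insecure and is exactly the kind of thing static analysis tools flag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, for illustration only.
final class ApiServerProbe {

    // Same URL pattern as in the diff above.
    static String versionUrl(String ipAddress, int port) {
        return String.format("https://%s:%d/version", ipAddress, port);
    }

    // Builds a socket factory that accepts ANY server certificate (insecure).
    static SSLSocketFactory trustAllSocketFactory() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        } };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context.getSocketFactory();
    }

    // Fetches /version without certificate or hostname validation.
    static String fetchVersion(String ipAddress, int port, int timeoutMs)
            throws IOException, GeneralSecurityException {
        URL url = new URL(versionUrl(ipAddress, port));
        HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
        con.setSSLSocketFactory(trustAllSocketFactory());
        con.setHostnameVerifier((hostname, session) -> true); // skips hostname check
        con.setConnectTimeout(timeoutMs);
        con.setReadTimeout(timeoutMs);
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            con.disconnect();
        }
    }
}
```

The settings are applied per connection rather than through `HttpsURLConnection.setDefaultSSLSocketFactory`, so other HTTPS calls in the same JVM keep normal validation. A safer alternative, if the cluster CA certificate is available, would be to load it into a `KeyStore` and build the `SSLContext` from that instead.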

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
@shwstppr
Contributor Author

shwstppr commented Feb 2, 2021

@blueorangutan package

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2637

@nxsbi

nxsbi commented Feb 2, 2021

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2637

If I want to test this, how do I do so? You will have to give instructions to obtain/install the packages in my environment.

@blueorangutan

Trillian test result (tid-3471)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38279 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4639-t3471-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 82 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_07_deploy_kubernetes_ha_cluster | Failure | 3612.86 | test_kubernetes_clusters.py |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 0.05 | test_kubernetes_clusters.py |
| test_09_delete_kubernetes_ha_cluster | Failure | 0.05 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 112.05 | test_kubernetes_clusters.py |

@shwstppr
Contributor Author

shwstppr commented Feb 3, 2021

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2637

If I want to test this, how do I do so? You will have to give instructions to obtain/install the packages in my environment.

@nxsbi do you want to test the fix in 4.15 or 4.14 or it doesn't matter?

@nxsbi

nxsbi commented Feb 3, 2021

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2637

If I want to test this, how do I do so? You will have to give instructions to obtain/install the packages in my environment.

@nxsbi do you want to test the fix in 4.15 or 4.14 or it doesn't matter?

@shwstppr I want to test in 4.15.

@shwstppr
Contributor Author

shwstppr commented Feb 3, 2021

okay @nxsbi, I'll share the 4.15 packages with the fix soon

@shwstppr
Contributor Author

shwstppr commented Feb 3, 2021

@nxsbi please find the built packages here: http://download.cloudstack.org/testing/is-4637/
They are built against current 4.15 branch.

@blueorangutan

Trillian test result (tid-3482)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40661 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4639-t3482-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 82 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_deploy_kubernetes_cluster | Failure | 3605.90 | test_kubernetes_clusters.py |
| test_02_invalid_upgrade_kubernetes_cluster | Failure | 3622.01 | test_kubernetes_clusters.py |
| test_03_deploy_and_upgrade_kubernetes_cluster | Failure | 0.08 | test_kubernetes_clusters.py |
| test_04_deploy_and_scale_kubernetes_cluster | Failure | 0.08 | test_kubernetes_clusters.py |
| test_05_delete_kubernetes_cluster | Failure | 0.08 | test_kubernetes_clusters.py |
| test_07_deploy_kubernetes_ha_cluster | Failure | 0.07 | test_kubernetes_clusters.py |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 0.07 | test_kubernetes_clusters.py |
| test_09_delete_kubernetes_ha_cluster | Failure | 0.07 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 107.84 | test_kubernetes_clusters.py |

@nxsbi

nxsbi commented Feb 4, 2021

Tried to do a quick deployment with the new packages --
It failed again... Below are relevant lines from Management Server log.
Are there any additional log files I can look into or additional checks I can do? If yes, please advise..

2021-02-03 17:44:42,964 INFO [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-1:ctx-0078ab8c job-12195 ctx-65cc2bf7) (logid:4ac48e69) Waiting for Kubernetes cluster : k8test master node VMs to be accessible
2021-02-03 17:44:52,965 ERROR [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-1:ctx-0078ab8c job-12195 ctx-65cc2bf7) (logid:4ac48e69) Failed to setup Kubernetes cluster : k8test3 in usable state as unable to access master node VMs of the cluster
2021-02-03 17:44:59,052 INFO [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-1:ctx-0078ab8c job-12195 ctx-65cc2bf7) (logid:4ac48e69) Detached Kubernetes binaries from VM : k8test3-master in the Kubernetes cluster : k8test3
2021-02-03 17:45:02,995 INFO [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-1:ctx-0078ab8c job-12195 ctx-65cc2bf7) (logid:4ac48e69) Detached Kubernetes binaries from VM : k8test3-node-1 in the Kubernetes cluster : k8test3
2021-02-03 17:45:02,998 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-0078ab8c job-12195) (logid:4ac48e69) Complete async job-12195, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to setup Kubernetes cluster : k8test3 in usable state as unable to access master node VMs of the cluster"}

NOTE: I can log in to the master node from the command line with an SSH key, and I can see that the cluster is actually functional; however, it does not report so in the UI. (I SSH in as the user core, and then sudo su to root.) If I run the commands as core, I get access denied, as all files are owned by root, with no access for other users or groups.

image

@nxsbi

nxsbi commented Feb 4, 2021

@shwstppr - Tested inside Master node, with curl just to see if the certificate issue persists -- Yes it does
Note this is with the new package build you provided

image

@shwstppr
Contributor Author

shwstppr commented Feb 4, 2021

@nxsbi can you please share the output of:
cat /var/log/cloudstack/management/management-server.log | grep -i 4ac48e69

@nxsbi

nxsbi commented Feb 4, 2021

@shwstppr

cat /var/log/cloudstack/management/management-server.log | grep -i 4ac48e69

Here it is. I have cleansed the log of IPs, passwords, keys, etc., so you may see XXXXX. For IPs, I only changed the management LAN IPs and external IPs. You will see them as 192.168.100.XXX, where XXX is the actual last octet of the correct IP, or as 172.100.100.100.
mgmt-4ac48e69cleansed.log

@rohityadavcloud rohityadavcloud modified the milestones: 4.14.1.0, 4.15.1.0 Feb 4, 2021
@nxsbi

nxsbi commented Feb 4, 2021

@shwstppr
The management server is able to curl -k the public IP of the VR, but it throws an error when using plain curl (without -k).

image

@shwstppr
Contributor Author

shwstppr commented Feb 4, 2021

@nxsbi the service is not able to connect over SSH using the VR public IP and the forwarded port (which should be 2222). It is failing here:

// Plain TCP connect to the forwarded SSH port, with a 10-second connect timeout
try (Socket socket = new Socket()) {
    socket.connect(new InetSocketAddress(ipAddress, port), 10000);
    masterVmRunning = true;
} catch (IOException e) {
    if (LOGGER.isInfoEnabled()) {
        LOGGER.info(String.format("Waiting for Kubernetes cluster : %s master node VMs to be accessible", kubernetesCluster.getName()));
    }
}

Are you able to manually do SSH from the management server?
ssh -i /root/.ssh/id_rsa.cloud -p 2222 core@public_ip_of_network

Firewall and port forwarding rules should be provisioned automatically by the service in the cluster's network.
The service uses SSH over ports 2222 to 2222+n. SSH to worker nodes is done only during a k8s version upgrade.
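To reproduce the same reachability check outside the management server, a standalone sketch along these lines may help. It mirrors the `Socket.connect` call in the snippet above, wrapped in a retry loop with a deadline. The class `PortProbe`, its method names, and the timeout values are hypothetical and not taken from the CloudStack code base.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, for illustration only.
final class PortProbe {

    // Returns true if a TCP connection to host:port succeeds within connectTimeoutMs.
    static boolean isPortOpen(String host, int port, int connectTimeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    // Retries the connect until it succeeds or timeoutMs has elapsed.
    static boolean waitForPort(String host, int port, long timeoutMs,
                               int connectTimeoutMs, long retryDelayMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (isPortOpen(host, port, connectTimeoutMs)) {
                return true;
            }
            Thread.sleep(retryDelayMs);
        }
        return false;
    }
}
```

For example, `PortProbe.waitForPort(publicIp, 2222, 60000, 10000, 5000)` would poll the forwarded SSH port for up to a minute, where `publicIp` stands for the network's public IP. If this returns false from the management server while manual SSH from another host works, the firewall or port forwarding rules are the first thing to check.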

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@shwstppr shwstppr marked this pull request as ready for review February 22, 2021 10:58
@shwstppr shwstppr closed this Feb 22, 2021
@shwstppr shwstppr reopened this Feb 22, 2021
@shwstppr shwstppr closed this Feb 23, 2021
@shwstppr shwstppr reopened this Feb 23, 2021
@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2803

@rohityadavcloud
Member

LGTM

@rohityadavcloud
Member

@shwstppr is this good to go?

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@shwstppr
Contributor Author

@

@shwstppr shwstppr closed this Feb 25, 2021
@shwstppr shwstppr reopened this Feb 25, 2021
@shwstppr
Contributor Author

shwstppr commented Feb 25, 2021

@weizhouapache would it be possible for you to test this and see if it fixes the curl -L issue you highlighted in #4146?

@shwstppr
Contributor Author

@blueorangutan package

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✖centos8 ✔debian. JID-2833

@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2839

@rohityadavcloud
Member

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3625)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36155 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4639-t3625-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_affinity_groups.py
Intermittent failure detected: /marvin/tests/smoke/test_nic.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Smoke tests completed. 84 look OK, 2 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_nic | Error | 49.50 | test_nic.py |
| test_01_migrate_VM_and_root_volume | Error | 60.96 | test_vm_life_cycle.py |
| test_02_migrate_VM_with_two_data_disks | Error | 50.01 | test_vm_life_cycle.py |

Development

Successfully merging this pull request may close these issues.

Kubernetes cluster creation Error - Kubernetes cluster kubeconfig not available currently in Isolated Network
6 participants