ExternalIntegration register function periodically fails #2129

Closed
hokiegeek2 opened this issue Feb 9, 2023 · 0 comments · Fixed by #2130

Describe the bug
When metrics collection is enabled with the following CLI argument, the Kubernetes API call that registers the Arkouda and Arkouda metrics endpoints as Kubernetes services intermittently returns a 403 error:

--ServerDaemon.daemonTypes=ServerDaemonType.INTEGRATION,ServerDaemonType.METRICS

The error was observed on one specific cluster when Arkouda was launched on bare metal or via Slurm with CHPL_COMM=gasnet and CHPL_COMM_SUBSTRATE=udp. The concern is that this error can occur at random.

To Reproduce
Start Arkouda with the following command:

export ARKOUDA_VERSION=2023.01.11
export NUMBER_OF_LOCALES=3

/opt/arkouda-$ARKOUDA_VERSION/arkouda_server -nl $NUMBER_OF_LOCALES \
                   --ExternalIntegration.systemType=SystemType.KUBERNETES \
                   --ServerDaemon.daemonTypes=ServerDaemonType.INTEGRATION,ServerDaemonType.METRICS \
                   --memTrack=true 

Expected behavior
Arkouda should start up normally and register both the Arkouda client-server and Arkouda metrics ports as Kubernetes services (example logging below):

2023-02-08:19:04:01 [ExternalIntegration] registerAsExternalService Line 446 DEBUG [Chapel] Registered endpoint via payload {"kind": "Endpoints","apiVersion": "v1", "metadata": {"name": "arkouda-external"}, "subsets": [{"addresses": [{"ip": "192.168.1.22"}],"ports": [{"port": 5555, "protocol": "TCP"}]}]} and endpointUrl https://192.168.1.25:6443/api/v1/namespaces/arkouda/endpoints
2023-02-08:19:04:01 [ExternalIntegration] registerWithExternalSystem Line 520 DEBUG [Chapel] Registered Arkouda with Kubernetes

...

2023-02-08:19:04:01 [ExternalIntegration] registerAsExternalService Line 446 DEBUG [Chapel] Registered endpoint via payload {"kind": "Endpoints","apiVersion": "v1", "metadata": {"name": "arkouda-external-metrics"}, "subsets": [{"addresses": [{"ip": "192.168.1.22"}],"ports": [{"port": 5556, "protocol": "TCP"}]}]} and endpointUrl https://192.168.1.25:6443/api/v1/namespaces/arkouda/endpoints
2023-02-08:19:04:01 [ExternalIntegration] registerWithExternalSystem Line 520 DEBUG [Chapel] Registered Arkouda with Kubernetes
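
For anyone who wants to replay the registration request by hand while debugging, the payloads and endpoint URL in the log lines above can be rebuilt with a short Python sketch. This is not Arkouda's implementation (which is in Chapel): the function names `build_endpoints_payload` and `endpoints_url` are hypothetical, and the IP addresses, ports, and namespace are simply the values from the example logs.

```python
import json


def build_endpoints_payload(name: str, ip: str, port: int) -> dict:
    """Build a v1 Endpoints object shaped like the payload in the logs.

    Hypothetical helper for illustration only, not part of Arkouda.
    """
    return {
        "kind": "Endpoints",
        "apiVersion": "v1",
        "metadata": {"name": name},
        "subsets": [
            {
                "addresses": [{"ip": ip}],
                "ports": [{"port": port, "protocol": "TCP"}],
            }
        ],
    }


def endpoints_url(api_server: str, namespace: str) -> str:
    """Return the namespaced endpoints URL, matching endpointUrl in the logs."""
    return f"{api_server.rstrip('/')}/api/v1/namespaces/{namespace}/endpoints"


if __name__ == "__main__":
    # Values below are copied from the example log output in this issue.
    url = endpoints_url("https://192.168.1.25:6443", "arkouda")
    for name, port in (
        ("arkouda-external", 5555),
        ("arkouda-external-metrics", 5556),
    ):
        payload = build_endpoints_payload(name, "192.168.1.22", port)
        print(url)
        print(json.dumps(payload))
```

The printed JSON can be sent to the printed URL with any HTTP client and the cluster's credentials to check whether the API server accepts or rejects the same request outside of Arkouda.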

Error Message
403

Is this a Blocking Issue
Yes. It prevents metrics-enabled Arkouda from running.

Additional context
This bug has only been observed for bare-metal and Slurm-launched Arkouda, not for Arkouda-on-Kubernetes.

@hokiegeek2 hokiegeek2 added the bug Something isn't working label Feb 9, 2023
@hokiegeek2 hokiegeek2 self-assigned this Feb 9, 2023
hokiegeek2 added a commit to hokiegeek2/arkouda that referenced this issue Feb 9, 2023
stress-tess pushed a commit that referenced this issue Feb 9, 2023
…Arkouda (#2130)

* fix for periodic 403 error for integration and metrics-enabled Arkouda #2129

* merge in changes from upstream master branch #2129