1.19.2 error while sniffing nodes #852

Closed · resoglas opened this issue Sep 7, 2020 · 15 comments

resoglas commented Sep 7, 2020

error while sniffing nodes

Description

A fresh installation of fusionauth-app:1.19.2 with a fresh PostgreSQL 11 database and a fresh Elasticsearch 7.8.1 cluster (7.6.2 was also tried) fails.

Affects versions

  • FusionAuth 1.19.0-1.19.2 with Elasticsearch 7.6.2 and 7.8.1 clusters and PostgreSQL 11

Steps to reproduce

  1. kubectl apply -f https://download.elastic.co/downloads/eck/1.2.1/all-in-one.yaml
  2. Deploy an example https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-elasticsearch.html with 3 nodes (see the sketch after this list)
  3. Deploy the FusionAuth app
  4. Watch the logs of the FusionAuth app: it successfully connects to PostgreSQL and the ES cluster and finishes kickstarting, then after a short time throws the error below
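
For step 2, a minimal 3-node cluster resource along the lines of the guide's quickstart might look like the following sketch (the cluster name and namespace are assumptions, chosen to match the elasticsearch-es-http.elasticsearch service referenced in the configuration below):

# Sketch of step 2: a 3-node ECK cluster. Name and namespace are assumed.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elasticsearch
spec:
  version: 7.8.1
  nodeSets:
    - name: default
      count: 3
      config:
        node.store.allow_mmap: false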

Expected behavior

FusionAuth connects to the Elasticsearch cluster and node sniffing completes without errors.

Configuration

fusionauth-deploy.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: fusionauth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fusionauth-cert
  namespace: fusionauth
spec:
  secretName: fusionauth-self-signed-tls
  dnsNames:
    - fusionauth.fusionauth.svc.cluster.local
    - fusionauth.fusionauth
    - fusionauth
  issuerRef:
    name: ca-issuer
    kind: ClusterIssuer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fusionauth-cm
  namespace: fusionauth
data:
  fusionauth.properties: |-
    database.url=jdbc:postgresql://{{postgresql_host}}:{{postgresql_port}}/{{postgresql_database}}
    database.username={{postgresql_user}}
    database.password={{postgresql_password}}

    search.type=elasticsearch
    search.servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200

    fusionauth-app.management-port=9010
    fusionauth-app.http-port=9011
    fusionauth-app.https-port=9013
    fusionauth-app.ajp-port=9019
    fusionauth-app.memory=512M
    fusionauth-app.cookie-same-site-policy=Lax
    fusionauth-app.runtime-mode=production
---
kind: Service
apiVersion: v1
metadata:
  namespace: fusionauth
  name: fusionauth-client
  labels:
    app: fusionauth
    type: ClusterIP
spec:
  type: ClusterIP
  ports:
    - port: 443
      targetPort: 9013
      protocol: TCP
      name: https
    - port: 80
      targetPort: 9011
      protocol: TCP
      name: http
  selector:
    app: fusionauth
---
apiVersion: v1
kind: Secret
metadata:
  name: fusionauth-kickstart
  namespace: fusionauth
data:
  kickstart.init: {{kickstart_init}}
  kickstart.application: {{kickstart_application}}
  kickstart.admin: {{kickstart_admin}}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fusionauth
  namespace: fusionauth
spec:
  selector:
    matchLabels:
      app: "fusionauth"
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      name: fusionauth
      namespace: fusionauth
      labels:
        app: fusionauth
    spec:
      volumes:
        - name: ca-crt
          secret:
            secretName: fusionauth-self-signed-tls
            items:
              - key: ca.crt
                path: ca.crt
        - name: openjdk-security
          emptyDir: {}
        - name: fusionauth-config
          emptyDir: {}
        - name: config-volume
          configMap:
            name: fusionauth-cm
            optional: false
            items:
              - key: fusionauth.properties
                path: fusionauth.properties
        - name: kickstart-volume
          secret:
            secretName: fusionauth-kickstart
            items:
              - key: kickstart.init
                path: kickstart.json
              - key: kickstart.application
                path: requests/application-ihp.json
              - key: kickstart.admin
                path: requests/user-admin.json
      initContainers:
        - name: fusionauth-config
          image: {{image_fusionauth}}
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: ca-crt
              mountPath: /tmp/certs
            - name: openjdk-security
              mountPath: /tmp/security
            - name: fusionauth-config
              mountPath: /tmp/fa-config-merged
            - name: config-volume
              mountPath: /tmp/fa-config
          command:
            - sh
            - -c
            - keytool -importcert -noprompt -keystore /opt/openjdk/lib/security/cacerts -storepass changeit -file /tmp/certs/ca.crt;
              cp -R /opt/openjdk/lib/security/. /tmp/security/;
              cp -R /usr/local/fusionauth/config/. /tmp/fa-config-merged/;
              rm /tmp/fa-config-merged/fusionauth.properties;
              cp /tmp/fa-config/fusionauth.properties /tmp/fa-config-merged/fusionauth.properties
      containers:
        - name: fusionauth
          image: {{image_fusionauth}}
          volumeMounts:
            - name: openjdk-security
              mountPath: /opt/openjdk/lib/security
            - name: kickstart-volume
              mountPath: /usr/local/fusionauth/kickstart
            - name: fusionauth-config
              mountPath: /usr/local/fusionauth/config
          ports:
            - containerPort: 9011
              name: http
            - containerPort: 9013
              name: https
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: FUSIONAUTH_APP_URL
              value: http://$(POD_IP):9011
            - name: FUSIONAUTH_KICKSTART
              value: /usr/local/fusionauth/kickstart/kickstart.json
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /
              port: http
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /
              port: http
          startupProbe:
            httpGet:
              path: /
              port: http
            failureThreshold: 20
            periodSeconds: 10

Platform

  • Kubernetes

Additional context

org.elasticsearch.client.sniff.Sniffer run
SEVERE: error while sniffing nodes
org.apache.http.ConnectionClosedException: Connection is closed
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.sniff.ElasticsearchNodesSniffer.sniff(ElasticsearchNodesSniffer.java:105)
	at org.elasticsearch.client.sniff.Sniffer.sniff(Sniffer.java:209)
	at org.elasticsearch.client.sniff.Sniffer$Task.run(Sniffer.java:139)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: org.apache.http.ConnectionClosedException: Connection is closed
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput(HttpAsyncRequestExecutor.java:356)
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:261)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
	... 1 more
robotdan (Member) commented Sep 8, 2020

Thanks for the issue @resoglas - we've seen that error a few times as well. In your case, is it just noise in the logs, or is it causing the connection to Elasticsearch from FusionAuth to fail altogether?

robotdan self-assigned this Sep 8, 2020
robotdan added the triage label Sep 8, 2020
resoglas (Author) commented Sep 8, 2020

It seems that the connection fails altogether. The Users list gives an error:

(screenshot: Users list error, 2020-09-08)

robotdan (Member) commented Sep 9, 2020

Thanks @resoglas.

When testing with Docker locally and the new "sniffer" config, I had to tell Elasticsearch how to publish its host and port so it didn't use the internal Docker IP and port. Otherwise, the client would connect over my specified connection, and then the node would tell the client about its IP address, which was not visible to FusionAuth.

For example, old start command:

docker run -p 9021:9200 -e 'discovery.type=single-node' docker.elastic.co/elasticsearch/elasticsearch:7.6.1

New command, adding -e 'http.publish_host=localhost' and -e 'http.publish_port=9021':

docker run -p 9021:9200 -e 'discovery.type=single-node' -e 'http.publish_host=localhost' -e 'http.publish_port=9021' docker.elastic.co/elasticsearch/elasticsearch:7.6.1

When running in Docker Compose, I didn't seem to need this when using the bridge network, which makes sense, I suppose. I don't know for sure how this translates to K8s.
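
For reference, translating the same publish settings to ECK might look like the following nodeSet sketch (hypothetical; it mirrors the http.publish_host change resoglas describes later in this thread, and assumes POD_NAME is exposed to the container via the downward API):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elasticsearch
spec:
  version: 7.8.1
  nodeSets:
    - name: default
      count: 3
      config:
        # Publish each node's stable headless-service DNS name instead of its pod IP,
        # so a sniffing client outside the pod network can reach the discovered nodes.
        http.publish_host: ${POD_NAME}.elasticsearch-es-default.elasticsearch.svc.cluster.local
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              env:
                # Expose the pod name so elasticsearch.yml can substitute ${POD_NAME}.
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name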

resoglas (Author) commented
Thanks @robotdan. I've dug a bit deeper and found the following message repeating over and over in the ES logs: received plaintext http traffic on an https channel. Those "plaintext messages" are coming from the FusionAuth pod, although, as you can see in my configuration, I use search.servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200. Could it be that the "sniffer" is ignoring https://?

ceefour commented Sep 12, 2020

Also getting this after upgrading from 1.17.3 to 1.19.3; see #857.

robotdan (Member) commented
I think this is due to the publish addresses of Elasticsearch. Here is a good article on the issue: https://www.elastic.co/blog/elasticsearch-sniffing-best-practices-what-when-why-how

See the "But we can fix that" section.

So if that fixes the issue, we can document this much better, or perhaps look into making this new Sniffer configuration optional, as it doesn't play nicely with Docker.

resoglas (Author) commented Sep 14, 2020

So I have changed http.publish_host to ${POD_NAME}.elasticsearch-es-default.elasticsearch.svc.cluster.local and tried connecting to the ES cluster from another Pod using the following code:

'use strict'

const { Client } = require('@elastic/elasticsearch')
const { URL } = require('url')
const fs = require('fs')

const client = new Client({
    node: {
        url: new URL('https://user:password@elasticsearch-es-http.elasticsearch.svc.cluster.local:9200'),
    },
    ssl: {
        // Trust the cluster's self-signed CA.
        ca: fs.readFileSync('../app/ca.crt')
    },
    // Sniff the cluster topology on startup and then every second.
    sniffOnStart: true,
    sniffInterval: 1000,
})

// Log the discovered nodes on every sniff.
client.on('sniff', (err, result) => {
    console.log(result.body.nodes)
})

I got a successful sniff response containing 3 nodes:

{
  LrIsZybiRQuDi4ab_a0QeQ: {
    name: 'elasticsearch-es-default-1',
    transport_address: '10.2.2.34:9300',
    host: '10.2.2.34',
    ip: '10.2.2.34',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'ml.max_open_jobs': '20',
      'xpack.installed': 'true',
      'transform.node': 'true'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.2.34:9200',
      max_content_length_in_bytes: ...
    }
  },
  rXYyTgCJSQmlHwbZ32257A: {
    name: 'elasticsearch-es-default-0',
    transport_address: '10.2.0.213:9300',
    host: '10.2.0.213',
    ip: '10.2.0.213',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'xpack.installed': 'true',
      'transform.node': 'true',
      'ml.max_open_jobs': '20'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-0.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.0.213:9200',
      max_content_length_in_bytes: ...
    }
  },
  zVUnh9VuRYy4mwb8RbDzDQ: {
    name: 'elasticsearch-es-default-2',
    transport_address: '10.2.0.212:9300',
    host: '10.2.0.212',
    ip: '10.2.0.212',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'ml.max_open_jobs': '20',
      'xpack.installed': 'true',
      'transform.node': 'true'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-2.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.0.212:9200',
      max_content_length_in_bytes: ...
    }
  }
}

And I am able to curl --insecure https://username:password@elasticsearch-es-default-2.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200 successfully from within the same Pod.

FusionAuth still seems to fail with the same error though... Is there something else I am missing, or are there perhaps more detailed logs I could look at? Thanks!

P. S. Maybe this elastic/cloud-on-k8s#3182 is somewhat related

robotdan (Member) commented
> P. S. Maybe this elastic/cloud-on-k8s#3182 is somewhat related

Yes, thanks for the link - that looks to be the same issue for sure.

resoglas (Author) commented
FusionAuth v1.18.8 seems to have no problem at all sniffing the ES cluster using the following config (which is basically the same as with v1.19+):

    database.url=jdbc:postgresql://{{postgresql_host}}:{{postgresql_port}}/{{postgresql_database}}
    database.username={{postgresql_user}}
    database.password={{postgresql_password}}

    fusionauth-app.search-engine-type=elasticsearch
    fusionauth-app.search-servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200

    fusionauth-app.management-port=9010
    fusionauth-app.http-port=9011
    fusionauth-app.https-port=9013
    fusionauth-app.ajp-port=9019
    fusionauth-app.memory=512M
    fusionauth-app.additional-java-args=
    fusionauth-app.cookie-same-site-policy=Lax
    fusionauth.runtime-mode=production

Elasticsearch v7.8.1 nodes immediately respond with a successful index creation/update message for fusionauth_user.

The question being: did FusionAuth versions prior to 1.19 not sniff for ES cluster nodes?

resoglas (Author) commented
I have disabled the TLS configuration in the ES cluster just to verify that this is not a network error, and now I am getting:

Sep 18, 2020 1:43:21 PM org.elasticsearch.client.sniff.Sniffer run
SEVERE: error while sniffing nodes
org.elasticsearch.client.ResponseException: method [GET], host [http://elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200], URI [/_nodes/http?timeout=1000ms], status line [HTTP/1.1 401 Unauthorized]
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/_nodes/http?timeout=1000ms]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/_nodes/http?timeout=1000ms]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
	at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
	at org.elasticsearch.client.sniff.ElasticsearchNodesSniffer.sniff(ElasticsearchNodesSniffer.java:105)
	at org.elasticsearch.client.sniff.Sniffer.sniff(Sniffer.java:209)
	at org.elasticsearch.client.sniff.Sniffer$Task.run(Sniffer.java:139)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)

I now strongly believe that the sniffer is missing the appropriate configuration for the scheme (HTTP or HTTPS) and for username/password authentication.
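
For context, the Java low-level REST client's node sniffer defaults to HTTP and only reuses authentication that is configured on the RestClient instance itself. A minimal sketch of a fully configured sniffer follows (this is not FusionAuth's actual code; the host, user, and password are placeholders):

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.ElasticsearchNodesSniffer;
import org.elasticsearch.client.sniff.NodesSniffer;
import org.elasticsearch.client.sniff.Sniffer;

public class SnifferConfigExample {
    public static void main(String[] args) {
        // Credentials set on the RestClient apply to sniff requests as well;
        // credentials embedded only in the URL would not carry over.
        BasicCredentialsProvider credentials = new BasicCredentialsProvider();
        credentials.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("user", "password"));

        RestClient restClient = RestClient.builder(
                        new HttpHost("elasticsearch-es-http.elasticsearch", 9200, "https"))
                .setHttpClientConfigCallback(b -> b.setDefaultCredentialsProvider(credentials))
                .build();

        // The nodes sniffer defaults to Scheme.HTTP; without HTTPS here, discovered
        // nodes are contacted over plaintext even when the seed URL is https://.
        NodesSniffer nodesSniffer = new ElasticsearchNodesSniffer(
                restClient,
                ElasticsearchNodesSniffer.DEFAULT_SNIFF_REQUEST_TIMEOUT,
                ElasticsearchNodesSniffer.Scheme.HTTPS);

        Sniffer sniffer = Sniffer.builder(restClient)
                .setNodesSniffer(nodesSniffer)
                .build();
    }
}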

robotdan (Member) commented
> The question being: did FusionAuth versions prior to 1.19 not sniff for ES cluster nodes?

This is new in version 1.19.x.

> I now strongly believe that the sniffer is missing the appropriate configuration for the scheme (HTTP or HTTPS) and for username/password authentication.

Interesting, we can take a look at this.

robotdan (Member) commented Sep 22, 2020

The sniffer config takes the REST client, which we have already configured with credentials, so it seems this should be OK. We'll have to try to recreate it.

Maybe related: elastic/kibana#42224

robotdan (Member) commented
In 1.19.8 (#893) the sniffer is off by default. This should resolve the issue for you.

Please re-open if you encounter an error with the sniffer disabled.

ceefour commented Jan 12, 2021

Thank you @robotdan :)

robotdan (Member) commented Feb 4, 2022

Closing, please re-open if this is still an issue.

robotdan closed this as completed Feb 4, 2022