[BUG] OTLP endpoints are not available #13495

Closed
kaniak274 opened this issue Sep 14, 2022 · 4 comments
Labels
component/otlp (PRs and issues related to OTLP ingest), [deprecated] team/agent-platform

Comments

@kaniak274

Hello! I found a very strange issue that looks like a bug. Here is the problem:

Agent Environment

I'm using the latest version of the agent (from 2 days ago) with docker-compose:

d-agent:
    image: datadog/agent:latest
    ports:
      - "4317:4317"
      - "4318:4318"
    environment:
     - DD_API_KEY=""
     - DD_SITE=datadoghq.eu
     - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
     - DD_APM_ENABLED=true
     - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT="0.0.0.0:4317"
     - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT="0.0.0.0:4318"
    volumes:
     - /var/run/docker.sock:/var/run/docker.sock
     - /proc/:/host/proc/:ro
     - /sys/fs/cgroup:/host/sys/fs/cgroup:ro

I also tried other images, such as 7.35.0 and 7.31.0, but it did not work on either of them.

Describe what happened:

I tried the OTLP ingest following the provided documentation. The problem is that neither port 4317 nor port 4318 is up. It does not look like the problem is in my configuration, unless I need to do something that is not described in the docs. I also searched for the 4317 and 4318 ports, but with no success.

I started debugging a little bit and:

I opened a shell in the agent container and ran the lsof command inside it:

agent     81459 root    3u  IPv4 271604      0t0  TCP 127.0.0.1:5001 (LISTEN)
agent     81459 root    9u  IPv4 267814      0t0  TCP 127.0.0.1:5000 (LISTEN)
agent     81459 root   12u  IPv4 268716      0t0  TCP 127.0.0.1:32840->127.0.0.1:5001 (ESTABLISHED)
agent     81459 root   13u  IPv4 266826      0t0  TCP 127.0.0.1:32842->127.0.0.1:5001 (ESTABLISHED)
agent     81459 root   14u  IPv4 270646      0t0  TCP 127.0.0.1:5001->127.0.0.1:32840 (ESTABLISHED)
agent     81459 root   15u  IPv4 270647      0t0  TCP 127.0.0.1:5001->127.0.0.1:32842 (ESTABLISHED)
agent     81459 root   17u  IPv6 269700      0t0  UDP *:8125
agent     81459 root   18u  IPv4 267028      0t0  TCP 127.0.0.1:5001->127.0.0.1:32882 (ESTABLISHED)
agent     81459 root   19u  IPv4 270230      0t0  TCP 127.0.0.1:5001->127.0.0.1:32904 (ESTABLISHED)
agent     81459 root   21u  IPv4 273533      0t0  TCP 172.18.0.4:40906->34.107.172.23:443 (ESTABLISHED)
agent     81459 root   22u  IPv4 268744      0t0  TCP 172.18.0.4:45614->172.18.0.2:6379 (ESTABLISHED)
process-a 81460 root    7u  IPv4 269714      0t0  TCP 127.0.0.1:32882->127.0.0.1:5001 (ESTABLISHED)
process-a 81460 root   10u  IPv4 272488      0t0  UDP 127.0.0.1:44847->127.0.0.1:8125
process-a 81460 root   11u  IPv4 270659      0t0  TCP 127.0.0.1:6062 (LISTEN)
process-a 81460 root   13u  IPv4 271622      0t0  TCP 127.0.0.1:6162 (LISTEN)
process-a 81460 root   14u  IPv4 267040      0t0  TCP 172.18.0.4:45146->34.117.218.227:443 (ESTABLISHED)
trace-age 81462 root    7u  IPv4 272504      0t0  UDP 127.0.0.1:33628->127.0.0.1:8125
trace-age 81462 root    8u  IPv4 272505      0t0  TCP 127.0.0.1:32904->127.0.0.1:5001 (ESTABLISHED)
trace-age 81462 root    9u  IPv6 271635      0t0  TCP *:8126 (LISTEN)
trace-age 81462 root   10u  IPv6 271636      0t0  TCP *:5003 (LISTEN)

As you can see, nothing is listening on 4317 or 4318.
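For anyone who wants to repeat this check without lsof, a minimal sketch of a TCP probe follows. `is_port_open` is a hypothetical helper, not part of the agent. It assumes the probe runs somewhere that shares the agent's network namespace; run from the Docker host, a published port may accept connections through Docker's own port forwarding even when nothing is listening inside the container.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open("127.0.0.1", 4317)` and `is_port_open("127.0.0.1", 4318)` would then show whether either OTLP receiver is accepting connections.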

Then I used Google and I found this issue. I started using 5003 port and it worked (the trace was visible in the Datadog UI) but that's probably not what I should do given that it's not described in the documentation ;).

Next, I searched the source code and found this commit, which removes the old variables. I searched the latest main for DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT & DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT, but they are never used outside the tests.

Describe what you expected:

I expect that the agent will run something on these ports ;)

Steps to reproduce the issue:

Run the datadog-agent container.

Additional environment details (Operating System, Cloud provider, etc):

Docker, Linux

Thanks in advance for the help!

@kaniak274 kaniak274 changed the title from "[BUG] OTLP endpoints are not" to "[BUG] OTLP endpoints are not available" Sep 14, 2022
@mx-psi
Member

mx-psi commented Sep 19, 2022

Thanks for reporting this @kaniak274! This took me a while to understand but I believe I see what happened now.

With this Docker compose file I am able to reproduce your issue:

Faulty Docker compose file:
version: "3.9"
services:
  d-agent:
    image: datadog/agent:latest
    ports:
      - "4317:4317"
      - "4318:4318"
    environment:
     - DD_API_KEY=""
     - DD_SITE=datadoghq.eu
     - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
     - DD_APM_ENABLED=true
     - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT="0.0.0.0:4317"
     - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT="0.0.0.0:4318"
    volumes:
     - /var/run/docker.sock:/var/run/docker.sock
     - /proc/:/host/proc/:ro
     - /sys/fs/cgroup:/host/sys/fs/cgroup:ro

The main OTLP endpoint fails to start, and I can see the following error in the logs:

CORE | ERROR | (pkg/otlp/collector.go:181 in func1) | Error running the OTLP pipeline: cannot start receivers: listen tcp: lookup tcp/4317": Servname not supported for ai_socktype

EDIT: It's easier to see this in the status output. If I run docker exec <container name> agent status, I can see at the very end:

====
OTLP
====

  Status: Enabled
  Collector status: Closed
  Error: Error running the OTLP pipeline: cannot start receivers: listen tcp: lookup tcp/4317": Servname not supported for ai_socktype
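If this check needs to be scripted, the error line can be pulled out of the captured status text. This is only a sketch: `otlp_error` is a hypothetical helper, and it assumes the status output keeps the layout shown above (a section title between two lines of `=` characters, followed by indented `Key: value` lines).

```python
from typing import Optional


def otlp_error(status_text: str) -> Optional[str]:
    """Return the text of the Error line in the OTLP section, if any."""
    in_otlp = False
    rules_seen = 0
    for line in status_text.splitlines():
        stripped = line.strip()
        if not in_otlp:
            # Wait for the section title.
            in_otlp = stripped == "OTLP"
            continue
        if stripped and set(stripped) == {"="}:
            # The first rule closes the OTLP header; a second one would
            # open the next section, so stop there.
            rules_seen += 1
            if rules_seen > 1:
                break
        elif stripped.startswith("Error:"):
            return stripped[len("Error:"):].strip()
    return None
```

A `None` result means the OTLP section reported no error, or that the section was not present at all.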

Everything else is as you describe: port 5003 is available, while the ports defined in the configuration are not. By removing the quotes, I am able to make it work:

Working Docker compose file:
version: "3.9"
services:
  d-agent:
    image: datadog/agent:latest
    ports:
      - "4317:4317"
      - "4318:4318"
    environment:
     - DD_API_KEY=""
     - DD_SITE=datadoghq.eu
     - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
     - DD_APM_ENABLED=true
     - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317  # <-- No quotes!
     - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318  # <-- No quotes!
    volumes:
     - /var/run/docker.sock:/var/run/docker.sock
     - /proc/:/host/proc/:ro
     - /sys/fs/cgroup:/host/sys/fs/cgroup:ro

The problem is that Docker compose will pass whatever is to the right of the equals sign to the application, so we got 4317" as the port (notice the quote) which made the setup fail.
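To make the effect concrete, here is a simplified model of what happens to the value. It is not Docker Compose's or the agent's actual parsing code; `parse_env_entry` and `split_host_port` are hypothetical helpers that only mimic "split on the first equals sign" and "split on the last colon".

```python
from typing import Tuple


def parse_env_entry(entry: str) -> Tuple[str, str]:
    """Split a list-style KEY=VALUE entry on the first '='; quotes stay in the value."""
    key, _, value = entry.partition("=")
    return key, value


def split_host_port(endpoint: str) -> Tuple[str, str]:
    """Split host:port on the last ':' without any validation."""
    host, _, port = endpoint.rpartition(":")
    return host, port


quoted = 'DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT="0.0.0.0:4317"'
unquoted = "DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317"

for entry in (quoted, unquoted):
    _, value = parse_env_entry(entry)
    host, port = split_host_port(value)
    # A usable numeric port must consist of digits only.
    print(repr(host), repr(port), port.isdigit())
# prints:
# '"0.0.0.0' '4317"' False
# '0.0.0.0' '4317' True
```

In the quoted case the port component ends with a stray quote, which matches the `tcp/4317"` in the error message above.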

Could you confirm if this solves the issue for you?

@kaniak274
Author

I will check. Thank you!

@mx-psi mx-psi added the component/otlp PRs and issues related to OTLP ingest label Sep 19, 2022
@kaniak274
Author

Ahh. You're right, I missed that! Thank you very much for the help!

@mx-psi
Member

mx-psi commented Sep 26, 2022

Awesome, I am glad that helped :)
