Error while processing transaction: error while sending transaction #1546

Closed
urosgruber opened this issue Mar 30, 2018 · 37 comments

@urosgruber
Contributor

I was able to build the agent from master. After running ./bin/agent/agent start -c ./bin/agent/dist/ I'm getting the following odd error message:

2018-03-30 19:57:38 CEST | ERROR | (worker.go:135 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post https://6-0-0-app.agent.datadoghq.com/api/v1/check_run?api_key=*************************54d58: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

There is also no connection visible on the dashboard. One other thing is the version: the agent reports
Agent 6.0.0 - Commit: - Serialization version: but master is on 6.1.1.

Can anyone help debug what is going on?

@arbll
Member

arbll commented Mar 30, 2018

Hi @urosgruber!

The version is set at compile time by the build script, using the last git tag. 6.0.0 probably corresponds to the default value for the version:

var agentVersionDefault = "6.0.0"
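
For context, here is a minimal sketch of how a Go binary can end up reporting a placeholder version: a package-level variable is overwritten at link time, and a default is used when nothing was injected. Only agentVersionDefault comes from the line above; injectedVersion, effectiveVersion and the ldflags target are hypothetical names for illustration, not the agent's actual build code.

```go
package main

import "fmt"

// agentVersionDefault mirrors the default value quoted above.
var agentVersionDefault = "6.0.0"

// injectedVersion is a hypothetical variable. A build script could set it with:
//   go build -ldflags "-X main.injectedVersion=$(git describe --tags)"
var injectedVersion string

// effectiveVersion returns the injected version, or the default when the
// linker flag was not supplied (for example, when no git tag was found).
func effectiveVersion(injected string) string {
	if injected == "" {
		return agentVersionDefault
	}
	return injected
}

func main() {
	fmt.Println("Agent", effectiveVersion(injectedVersion))
}
```

Built without the linker flag, this sketch falls back to the default, which would be consistent with a build that could not read the git tag.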

First thing off the top of my head: did you clone the repository or download a zip?

If you did clone the repo, could you tell me how you compiled the agent?

@arbll arbll self-assigned this Mar 30, 2018
@urosgruber
Contributor Author

urosgruber commented Mar 30, 2018

I cloned it and then

invoke deps
invoke agent.build --build-include=log,process

I also tried the zip version, but it complained about something related to git-rev, so I assumed cloning was the right way.

@urosgruber
Contributor Author

Anything new here? I can run some debugging if it helps.

@truthbk
Member

truthbk commented May 15, 2018

@urosgruber as far as the log message you're seeing goes, it looks like some kind of networking issue prevented the transactions (payloads) from being pushed to Datadog at certain times. Once a transaction fails, the forwarder re-enqueues it and attempts it again later.
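
The re-enqueue behaviour can be pictured with a rough sketch like the one below. It is a simplified model, not the forwarder's code: the transaction type, the drain function, errSimulated and the maxAttempts limit are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated stands in for a network error such as a client timeout.
var errSimulated = errors.New("request canceled while waiting for connection")

// transaction is a hypothetical stand-in for one payload to deliver.
type transaction struct {
	name     string
	attempts int
}

// drain sends every queued transaction. A failed one is put back at the end
// of the queue and retried later, up to maxAttempts per transaction.
// It returns the names of delivered and dropped transactions.
func drain(queue []*transaction, send func(*transaction) error, maxAttempts int) (delivered, dropped []string) {
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		t.attempts++
		if err := send(t); err == nil {
			delivered = append(delivered, t.name)
		} else if t.attempts < maxAttempts {
			queue = append(queue, t) // re-enqueue for a later attempt
		} else {
			dropped = append(dropped, t.name)
		}
	}
	return delivered, dropped
}

func main() {
	// Simulated network: the first two sends fail, later ones succeed.
	calls := 0
	send := func(t *transaction) error {
		calls++
		if calls <= 2 {
			return errSimulated
		}
		return nil
	}
	queue := []*transaction{{name: "check_run"}, {name: "series"}}
	delivered, dropped := drain(queue, send, 3)
	fmt.Println("delivered:", delivered, "dropped:", dropped)
}
```

In this model an intermittent failure only delays delivery, which is why the question of whether the errors are intermittent matters.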

Are the errors in the log intermittent?

Could you please submit a flare to the support team so we can take a look at the logs and try to decide if this is indeed just a networking issue?

Thanks!

@urosgruber
Contributor Author

@truthbk I've managed to build the agent manually, but on the latest 6.2. The agent connects successfully and data is also visible on the DD dashboard. I'm not sure whether the issue was fixed in the latest version or whether it was just the develop build that caused this problem. I'll check whether a 6.2 develop build has a similar issue.

@urosgruber
Contributor Author

@arbll I think the error is now gone with 6.2.x. Nothing was changed except that I pulled from git and built with the same command.

@karthikdialpad

I am facing the same issue in a Kubernetes environment; the error message is below.
I am using version 6.10.1 of the datadog-agent image.

{"log":"[ AGENT ] 2019-05-09 11:46:47 UTC | ERROR | (pkg/forwarder/worker.go:142 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post https://6-10-1-app.agent.datadoghq.com/api/v1/check_run?api_key=****************}: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\n","stream":"stdout","time":"2019-05-09T11:46:47.771863625Z"}

I tried to send a flare, but I am getting this error:
Error: Post https://6-10-1-flare.agent.datadoghq.com/support/flare?api_key=***********************}: dial tcp: lookup 6-10-1-flare.agent.datadoghq.com: Temporary failure in name resolution

@Le0exxx

Le0exxx commented Aug 2, 2019

I have the same error as well. I am using the latest Docker image, launched with plain docker as shown below:

docker run -d --name dd-agent -v /var/run/docker.sock:/var/run/docker.sock:ro -v /proc/:/host/proc/:ro -v /cgroup/:/host/sys/fs/cgroup:ro -e DD_API_KEY=xxxxx datadog/agent:latest

2019-08-02 02:37:50 UTC | CORE | ERROR | (pkg/forwarder/worker.go:142 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post https://6-13-0-app.agent.datadoghq.com/intake/?api_key=*************************: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

@joaquin386

joaquin386 commented Aug 5, 2019

I am trying to deploy datadog/agent:latest-jmx to AWS ECS Fargate and keep getting the following errors:

@ocervell

ocervell commented Oct 2, 2020

Getting the same issue on Google Kubernetes Engine (version 7.21.0). Any idea what might be the cause?

@ameyapanse

ameyapanse commented Nov 18, 2020

@joaquin386 @ocervell Were you able to fix this? I'm getting the same errors on ECS (EC2).

Edit: This was fixed. I was setting env vars for a proxy, hence the timeouts.
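
For anyone who wants to rule out the same cause, a small sketch that lists proxy-related environment variables visible to a process. The generic names are the conventional ones honoured by many HTTP clients; the DD_PROXY_* names are assumed from Datadog's proxy documentation and should be double-checked there. proxyVars and setProxyVars are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// proxyVars lists environment variables that commonly route HTTP clients
// through a proxy. The DD_PROXY_* entries are an assumption (see lead-in).
var proxyVars = []string{
	"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
	"http_proxy", "https_proxy", "no_proxy",
	"DD_PROXY_HTTP", "DD_PROXY_HTTPS", "DD_PROXY_NO_PROXY",
}

// setProxyVars returns "NAME=value" for every proxy variable that lookup
// reports as set to a non-blank value.
func setProxyVars(lookup func(string) (string, bool)) []string {
	var found []string
	for _, name := range proxyVars {
		if v, ok := lookup(name); ok && strings.TrimSpace(v) != "" {
			found = append(found, name+"="+v)
		}
	}
	return found
}

func main() {
	found := setProxyVars(os.LookupEnv)
	if len(found) == 0 {
		fmt.Println("no proxy variables set")
		return
	}
	for _, kv := range found {
		fmt.Println(kv)
	}
}
```

It has to run in the same environment as the agent (for example inside the container) to be meaningful, and the values may contain credentials, so the output should not be pasted publicly as-is.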

@thiagolsfortunato

Any news?

@windingroad100hf

windingroad100hf commented May 17, 2021

@ocervell We're seeing this in our Kubernetes clusters as well. Were you able to fix this? We get the following error, which originates in worker.go, not domain_forwarder.go. If other users are seeing similar logs, then it looks like this may be a problem with the retry routine itself (or at least with how users have it configured).

2021-05-17 17:55:25 UTC | PROCESS | ERROR | (pkg/forwarder/worker.go:174 in process) | Too many errors for endpoint 'https://process.datadoghq.com/api/v1/container': retrying later
2021-05-17 17:55:36 UTC | PROCESS | ERROR | (pkg/forwarder/domain_forwarder.go:133 in retryTransactions) | Dropped 3 transactions in this retry attempt: 0 for exceeding the retry queue payloads size limit of 15728640, 3 because the workers are too busy
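
The second log line names two distinct drop reasons. The sketch below is a simplified model of those two reasons only, not the agent's retry algorithm: retryPass and its parameters are hypothetical, and the only value taken from the log is the limit of 15728640 bytes, which is 15 MiB.

```go
package main

import "fmt"

// payloadLimit matches the limit printed in the log above: 15 MiB.
const payloadLimit = 15 * 1024 * 1024 // 15728640 bytes

// retryPass tries to hand each payload to the workers channel. A payload is
// dropped when accepting it would push the accepted total above limit bytes,
// or when the channel is full (the workers are too busy).
// It returns the number of drops for each reason.
func retryPass(payloads [][]byte, workers chan<- []byte, limit int) (droppedSize, droppedBusy int) {
	total := 0
	for _, p := range payloads {
		if total+len(p) > limit {
			droppedSize++
			continue
		}
		select {
		case workers <- p:
			total += len(p)
		default:
			droppedBusy++
		}
	}
	return droppedSize, droppedBusy
}

func main() {
	// Room for one pending payload, and nothing is reading from the channel.
	workers := make(chan []byte, 1)
	payloads := [][]byte{
		make([]byte, 100), make([]byte, 100), make([]byte, 100), make([]byte, 100),
	}
	size, busy := retryPass(payloads, workers, payloadLimit)
	fmt.Printf("Dropped %d transactions: %d for exceeding the size limit of %d, %d because the workers are too busy\n",
		size+busy, size, payloadLimit, busy)
}
```

With four small payloads and a single free slot, this model reports 0 size drops and 3 busy drops. That pattern would point at workers stuck on slow or failing sends rather than at oversized payloads.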

@huy-hoang-mox

I also face the same issue:
2021-05-11 22:51:00 UTC | PROCESS | ERROR | (pkg/forwarder/worker.go:178 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post "https://orchestrator.datadoghq.com/api/v1/orchestrator": EOF

@PHameete

Same issue here. Datadog agent version 7.28.1 on AWS EKS 1.19.8

Error while processing transaction: error while sending transaction, rescheduling it: Post "https://orchestrator.datadoghq.com/api/v1/orchestrator": EOF

@sdy15

sdy15 commented Jul 15, 2021

I am facing the same issue with Datadog agent version 7.29.0. I sent a flare to support.

UTC | CORE | ERROR | (pkg/forwarder/worker.go:179 in process) | Too many errors for endpoint 'https://7-29-0-app.agent.datadoghq.com/api/v1/series?api_key=***************************': retrying later

@ianhundere

ianhundere commented Oct 4, 2021

same issue here:
2021-10-04 20:39:52 UTC | PROCESS | ERROR | (pkg/forwarder/worker.go:183 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post "https://orchestrator.datadoghq.com/api/v1/orchestrator": EOF

@TwistedLogic

Same issue here:

2021-10-04 16:35:33 UTC | SAGENT | ERROR | SyncForwarder.sendHTTPTransactions final attempt: error while sending transaction, rescheduling it: Post "https://6-0-0-app.agent.datadoghq.eu/api/v1/check_run?api_key=***************************77453": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

2021-10-05 00:36:29 UTC | SAGENT | ERROR | SyncForwarder.sendHTTPTransactions final attempt: error while sending transaction, rescheduling it: Post "https://6-0-0-app.agent.datadoghq.eu/api/beta/sketches?api_key=***************************77453": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I'm using the Datadog Lambda Extension, installed via the Datadog Serverless Plugin, and seeing a few errors like those.

@davidlbyrne

Same issue here. I think something is wrong with their API and they don't want to surface it.

2021-10-04T14:37:20.259-07:00
2021-10-04 21:37:20 UTC | SAGENT | ERROR | SyncForwarder.sendHTTPTransactions final attempt: error while sending transaction, rescheduling it: Post "https://6-0-0-app.agent.datadoghq.com/api/v1/series?api_key=***************************5ccc2": dial tcp 3.233.149.59:443: i/o timeout

@RicHincapie

RicHincapie commented Oct 15, 2021

PROCESS | ERROR | (pkg/forwarder/worker.go:179 in process) | Too many errors for endpoint 'https://process.datadoghq.com/api/v1/container': retrying later

Error while processing transaction: error while sending transaction, rescheduling it: Post "https://7-31-0-app.agent.datadoghq.com/intake/?api_key=***************************bf62f": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

This is my error. I am using iptables rules at the DOCKER-USER chain level to firewall my exposed containers. I believe the problem is related to that, because the packets are probably not reaching the Datadog server.

The thing is, ifconfig | grep -i dropped shows that no packets have been dropped on eth0 or docker0.

Which ports need to be open for outbound packets?
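
One way to test the firewall theory from inside the container is a plain TCP dial to the endpoints in the log lines. The sketch below assumes port 443 only because the URLs above are https; any further ports the agent may need are not stated in this thread and would have to come from Datadog's network documentation. checkTCP is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// checkTCP reports whether a TCP connection to addr ("host:port") can be
// opened within timeout. It only proves reachability, not a working TLS
// session or a valid API key.
func checkTCP(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	// Hosts copied from the log lines above; 443 is implied by https.
	targets := []string{
		"7-31-0-app.agent.datadoghq.com:443",
		"process.datadoghq.com:443",
	}
	failed := false
	for _, t := range targets {
		if err := checkTCP(t, 5*time.Second); err != nil {
			fmt.Printf("FAIL %s: %v\n", t, err)
			failed = true
		} else {
			fmt.Printf("OK   %s\n", t)
		}
	}
	if failed {
		os.Exit(1)
	}
}
```

If the dial fails from the container but succeeds from the host, the DOCKER-USER rules are a likely suspect. Interface drop counters such as those shown by ifconfig do not necessarily reflect packets rejected by iptables.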

@kaypeter87

kaypeter87 commented Nov 22, 2021

We are also running into an issue with the sidecar container for the datadog-agent and getting the following errors on our ECS cluster:

2021-11-22 18:26:28 UTC | CORE | ERROR | (pkg/forwarder/worker.go:179 in process) | Too many errors for endpoint 'https://*-**-*-app.agent.datadoghq.com/api/v1/check_run?api_key=***************************': retrying later

Are there any updates to this issue? It is causing confusion with our logs in ECS.

@Suchit8

Suchit8 commented Dec 8, 2021

A number of my Datadog agents are working fine, but a few new installations and the agents that were restarted recently are facing similar issues. Error logs attached.

2021-12-08 04:33:18 EST | CORE | ERROR | (pkg/forwarder/transaction/transaction.go:108 in func4) | TLS Handshake failure: net/http: TLS handshake timeout
2021-12-08 04:33:18 EST | CORE | ERROR | (pkg/forwarder/worker.go:183 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post "https://*-**-*-app.agent.datadoghq.com/intake/?api_key=******************************": net/http: TLS handshake timeout
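
The log lines posted so far share the same prefix but end in different underlying errors. As a rough triage aid, the sketch below sorts a line into a failure class by substring matching. The class names and the classify function are made up for illustration; the substrings are copied from the messages quoted in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps an agent error line to a rough failure class, based on
// substrings that appear in the log lines quoted in this thread.
func classify(line string) string {
	switch {
	case strings.Contains(line, "Temporary failure in name resolution"):
		return "dns"
	case strings.Contains(line, "TLS handshake timeout"):
		return "tls-timeout"
	case strings.Contains(line, "dial tcp") && strings.Contains(line, "i/o timeout"):
		return "connect-timeout"
	case strings.Contains(line, "Client.Timeout exceeded"):
		return "request-timeout"
	case strings.HasSuffix(strings.TrimSpace(line), ": EOF"):
		return "connection-closed"
	case strings.Contains(line, "Too many errors for endpoint"):
		return "endpoint-blocked"
	default:
		return "unknown"
	}
}

func main() {
	samples := []string{
		"dial tcp: lookup 6-10-1-flare.agent.datadoghq.com: Temporary failure in name resolution",
		"net/http: TLS handshake timeout",
		"dial tcp 3.233.149.59:443: i/o timeout",
		"context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
		`Post "https://orchestrator.datadoghq.com/api/v1/orchestrator": EOF`,
	}
	for _, s := range samples {
		fmt.Printf("%-18s %s\n", classify(s), s)
	}
}
```

Each class points at a different layer (DNS, TCP connect, TLS, slow response, server-side close), so reports with the same leading message may well have unrelated causes.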

@djmitche
Contributor

It looks like this issue has become a bit of a random collection of connection issues (and some build issues).

In general, DataDog support is best suited to handle these situations: they can look at the circumstances, examine a flare, and make recommendations for the most successful configurations. I'll close this issue up now.

@PHameete

@djmitche I've frequently contacted support about such issues, but after the initial generic reply they do not get back to me.

@ianhundere

ianhundere commented Jan 5, 2022

"In general, DataDog support is best suited to handle these situations: they can look at the circumstances, examine a flare, and make recommendations for the most successful configurations. I'll close this issue up now."

reaching out now :)

specifically for the following two errors:
CLUSTER | ERROR | (pkg/forwarder/worker.go:179 in process) | Too many errors for endpoint 'https://orchestrator.datadoghq.com/api/v1/orchestrator': retrying later

CLUSTER | ERROR | (pkg/forwarder/worker.go:183 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post "https://orchestrator.datadoghq.com/api/v1/orchestrator": EOF

edit: i provided logs to them via flare...will update when i hear back.

@ianhundere

@djmitche and the response

"Thank you for verifying. I have confirmed with our engineers that this is a known issue that is being tracked. I will mark this as closed for now, but feel free to reach back out to Support if you have any questions on this. Thank you for helping us to improve our product!"

so maybe this issue should be opened back up, considering it's a known issue being tracked internally?

@djmitche
Contributor

djmitche commented Jan 7, 2022

As I mentioned, this has become a bit of a "collector" issue for a bunch of different problems with similar error messages. I think the internal tracking will be sufficient, and there's no need to re-open this issue.

@ianhundere

ah, thanks for clarifying. we'll ignore these logs for now.

@ianhundere

@djmitche could you provide some open issues that mirror these logs?

@vsaldivia

image

Check the API KEY!!

@ajithkumar999

ajithkumar999 commented Jul 6, 2022

I am using the Datadog Lambda extension to forward my AWS Lambda logs directly to Datadog, and I am getting the same error.
I used AWS SAM to install the Datadog Lambda extension on my Lambdas.

yyyy-MM-dd HH:mm:ss UTC | DD_EXTENSION | ERROR | SyncForwarder.sendHTTPTransactions final attempt: error while sending transaction, rescheduling it: Post "https://6-0-0-app.agent.datadoghq.com/api/*/*?api_key=***************************6c232": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

2022-07-01 13:24:32 UTC | DD_EXTENSION | ERROR | telemetry.Proxy: http: proxy error: context canceled

Can someone please help me with this?

@denbon05

denbon05 commented Nov 8, 2022

image
image
Still error

@nmahavad

image image Still error

i am following this ..

@rossigee

rossigee commented Sep 10, 2023

"Too many errors for endpoint... retrying later" is perhaps one of the worst error messages I've come across in recent times. It is useless and confusing, and it causes massive issue report threads like this.

The required action here seems pretty clear to me: the Datadog engineering team needs to fix the error handling to provide a bit more information. That would save the Datadog support team, me, and all the other users in this and related threads a whole lot of time and headaches!

@rossigee

So, it seems the error indicates that a 'circuit breaker' has considered your target host to be on a blacklist/blockedList... 🤔

	// Run the endpoint through our blockedEndpoints circuit breaker
	target := t.GetTarget()
	if w.blockedList.isBlock(target) {
		requeue()
		log.Errorf("Too many errors for endpoint '%s': retrying later", target)
	} else if err := t.Process(ctx, w.Client); err != nil {
		w.blockedList.close(target)
		requeue()
		log.Errorf("Error while processing transaction: %v", err)
	} else {
		w.blockedList.recover(target)
	}
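
To make the pattern concrete, here is a minimal circuit-breaker sketch with the same three method names as the excerpt above (isBlock, close, recover). Everything inside it is an assumption for illustration: the doubling backoff, the maxErrors cap, the fields and the fakeClock helper are not taken from the agent's source.

```go
package main

import (
	"fmt"
	"time"
)

// maxErrors is an illustrative cap so the backoff stops growing.
const maxErrors = 6

// blockedEndpoints is a simplified, hypothetical per-endpoint circuit breaker.
type blockedEndpoints struct {
	errors map[string]int       // consecutive error count per endpoint
	until  map[string]time.Time // endpoint is blocked until this instant
	base   time.Duration        // backoff after the first error
	now    func() time.Time     // injectable clock
}

func newBlockedEndpoints(base time.Duration, now func() time.Time) *blockedEndpoints {
	return &blockedEndpoints{
		errors: map[string]int{},
		until:  map[string]time.Time{},
		base:   base,
		now:    now,
	}
}

// close records a failure and blocks the endpoint for base * 2^(errors-1).
func (b *blockedEndpoints) close(target string) {
	if b.errors[target] < maxErrors {
		b.errors[target]++
	}
	wait := b.base << uint(b.errors[target]-1)
	b.until[target] = b.now().Add(wait)
}

// recover records a success and lowers the error count; at zero the
// endpoint is forgotten entirely.
func (b *blockedEndpoints) recover(target string) {
	if b.errors[target] > 0 {
		b.errors[target]--
	}
	if b.errors[target] == 0 {
		delete(b.errors, target)
		delete(b.until, target)
	}
}

// isBlock reports whether the endpoint is still inside its backoff window.
func (b *blockedEndpoints) isBlock(target string) bool {
	until, ok := b.until[target]
	return ok && b.now().Before(until)
}

// fakeClock lets the example advance time without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := &fakeClock{}
	b := newBlockedEndpoints(time.Second, clk.now)
	target := "https://example.invalid/api/v1/series"

	b.close(target) // one failed send
	fmt.Println("blocked right after a failure:", b.isBlock(target))
	clk.advance(time.Second)
	fmt.Println("blocked after the backoff elapsed:", b.isBlock(target))
}
```

Under this model, "Too many errors for endpoint" would be a follow-on symptom: the informative line is the earlier "Error while processing transaction" entry that tripped the breaker.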

@rossigee

(sorry for the noise, I just realised I posted the above comments on an unrelated issue report! 🤦)
