Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build failures on nightly #14

Closed
rfay opened this issue Aug 14, 2023 · 11 comments
Closed

Build failures on nightly #14

rfay opened this issue Aug 14, 2023 · 11 comments

Comments

@rfay
Copy link
Member

rfay commented Aug 14, 2023

@julienloizelet would you mind taking a look and seeing if you can sort out why nightlies are failing from directory? I'm not sure why the run doesn't give us a bit more info. Seems to fail on ddev restart, https://github.com/ddev/ddev-mongo/actions/runs/5853744817/job/15875363058

@julienloizelet
Copy link
Collaborator

Hi @rfay ,
I'm surprised that I didn't receive any alert from GitHub for this failed job: I subscribed for "all activities" notification, so I expected to be notified. Maybe there is another setting somewhere.

I compared some other add-on tests and I saw that we are using here ddev restart >/dev/null 2>&1 for the directory test.

For all other add-on I looked at, this is ddev restart >/dev/null (or even just ddev restart).

I guess this is probably not the root cause but I modified into ddev restart >/dev/null, and I'll keep an eye out.

Maybe we could add temporary some verbosity to debug by adding a secret ACTIONS_STEP_DEBUG (true) like explained here: https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging#enabling-step-debug-logging (only the owner can do that if I understand correctly)

@rfay
Copy link
Member Author

rfay commented Aug 15, 2023

I'm surprised that I didn't receive any alert from GitHub for this failed job: I subscribed for "all activities" notification, so I expected to be notified. Maybe there is another setting somewhere.

I've had this problem with repositories as well, and haven't sorted it out. Repos created? by someone else don't send me their failures. I guess maybe it's something we could open an issue about. Certainly baffles me.

You can debug this on Github hardware by going to https://github.com/ddev/ddev-mongo/actions/workflows/tests.yml and clicking the button in the upper right "Run Workflow" and check "Debug with tmate". It will let you ssh into the machine and see what actually happens. I guess there's inadequate docs on that approach.

image

@rfay
Copy link
Member Author

rfay commented Aug 15, 2023

However, as we see, the test ran OK for you and the nightly worked last night :( I guess we just need more output from the test to understand what's happened.

@julienloizelet
Copy link
Collaborator

julienloizelet commented Aug 16, 2023

Yes, it fails only sometimes.

To debug this, I added another tmate session step that will be available only in case of failure.

I also increase the frequency of the cron so that failure should happen more frequently.

If I'm here during a failed test, I will debug with tmate and try some ddev logs or whatever.

[EDIT]: damned, I was able to enter in a failed workflow : https://github.com/ddev/ddev-mongo/actions/runs/5873818472/job/15927617320 but, as the bats teardown is deleting all the project (ddev delete -Oy ${PROJNAME} >/dev/null 2>&1), I was not able to run some ddev logs command in the project.

Do you have any idea of what we should looking as the project and files have been removed by the teardown method ?

Maybe I can just create a test-ci.bats that will not remove anything in the teardown if we detect a failed test. (not sure how to do that...)

@rfay
Copy link
Member Author

rfay commented Aug 16, 2023

Just remove the teardown for now I'd say. It properly tears everything down at the start, so it doesn't do any harm. Thanks for chasing this!

@julienloizelet
Copy link
Collaborator

Removing the teardown gives some progress but, as the following test (install from release) is run after a failed one, some content are wiped and we are loosing information.

By the way, I saw the original error was (https://github.com/ddev/ddev-mongo/actions/runs/5875221214/job/15931158670#step:11:14) :

Could not check for updates. This is most often caused by a networking issue.
# ssh-agent container is running: If you want to add authentication to the ssh-agent container, run 'ddev auth ssh' to enable your keys.
# Failed to restart testmongo: container(s) failed to become healthy before their configured timeout or in 120 seconds. This may be just a problem with the healthcheck and not a functional problem. (container 'mongo-express' exited, please use 'ddev logs -s mongo-express' to find out why it failed)

And when I went in the tmate session I ran ddev logs -s mongo-express and it returns:

Waiting for mongo:27017...
/docker-entrypoint.sh: connect: Connection refused
/docker-entrypoint.sh: line 14: /dev/tcp/mongo/27017: Connection refused
Wed Aug 16 06:07:52 UTC 2023 retrying to connect to mongo:27017 (2/5)
Welcome to mongo-express

But this is maybe the log after the second test...

Finally, I created a specific debug action with only the "install from directory" test and will run it manually to see if I can understand more.

@julienloizelet
Copy link
Collaborator

With the new specific debug action,
we can see that failed is coming from mongo-express that tried 5 times to connect without success:

(result of ddev logs -s mongo-express --tail 500 below)

Waiting for mongo:27017... /docker-entrypoint.sh: connect: Connection refused /docker-entrypoint.sh: line 14: /dev/tcp/mongo/27017: Connection refused Wed Aug 16 06:23:40 UTC 2023 retrying to connect to mongo:27017 (2/5) /docker-entrypoin
t.sh: connect: Connection refused /docker-entrypoint.sh: line 14: /dev/tcp/mongo/27017: Connection refused Wed Aug 16 06:23:41 UTC 2023 retrying to connect to mongo:27017 (3/5) /docker-entrypoint.sh: connect: Connection refused /docker-en
trypoint.sh: line 14: /dev/tcp/mongo/27017: Connection refused Wed Aug 16 06:23:42 UTC 2023 retrying to connect to mongo:27017 (4/5) /docker-entrypoint.sh: connect: Connection refused /docker-entrypoint.sh: line 14: /dev/tcp/mongo/27017: 
Connection refused Wed Aug 16 06:23:43 UTC 2023 retrying to connect to mongo:27017 (5/5) /docker-entrypoint.sh: connect: Connection refused /docker-entrypoint.sh: line 14: /dev/tcp/mongo/27017: Connection refused Welcome to mongo-express 
------------------------ Mongo Express server listening at http://0.0.0.0:8081 ESC[31mServer is open to allow connections from anyone (0.0.0.0)ESC[39m ESC[31mbasicAuth credentials are "admin:pass", it is recommended you change this in you
r config.js!ESC[39m /node_modules/mongodb/lib/server.js:265 process.nextTick(function() { throw err; }) ^ Error [MongoError]: failed to connect to server [mongo:27017] on first connect at Pool.<anonymous> (/node_modules/mongodb-core/lib/t
opologies/server.js:326:35) at Pool.emit (events.js:314:20) at Connection.<anonymous> (/node_modules/mongodb-core/lib/connection/pool.js:270:12) at Object.onceWrapper (events.js:421:26) at Connection.emit (events.js:314:20) at Socket.<ano
nymous> (/node_modules/mongodb-core/lib/connection/connection.js:175:49) at Object.onceWrapper (events.js:421:26) at Socket.emit (events.js:314:20) at emitErrorNT (internal/streams/destroy.js:92:8) at emitErrorAndCloseNT (internal/streams
/destroy.js:60:3)

No idea why but this is a beginning.

@julienloizelet
Copy link
Collaborator

Maybe related: mongo-express/mongo-express-docker#39

@julienloizelet
Copy link
Collaborator

julienloizelet commented Aug 16, 2023

Hi @rfay ,

it seems that adding entrypoint: [sh, -c, "sleep 5s && tini -- /docker-entrypoint.sh mongo-express"] as a workaround make it work (found in the above related issue).

Will let test running for some days to see if it is ok.

It seems that for some reason, mongo-express sometimes tries to connect when the mongo container isn't ready.
This is why the solution is to add 5 seconds before trying to connect.

@rfay
Copy link
Member Author

rfay commented Aug 16, 2023

Wow, nice work.

I'll bet that it has to do with healthcheck. ddev-mongo has one,

healthcheck:
test: ["CMD-SHELL", "mongo --eval 'db.runCommand(\"ping\").ok' localhost:27017/test --quiet"]
timeout: 60s
- so it won't become "ready" until it actually is.

But mongo-express has a depends-on which should actually do the job, it should wait for mongo. There are periodic problems in docker-compose with depends-on and there have been some things going on there lately, https://github.com/docker/compose/releases

I guess a wait in start is not the biggest problem for a gui element here. Congrats on the great work on this.

@julienloizelet
Copy link
Collaborator

Solved by 6651175

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants