
FreeBSD CI-builds are failing #20

Closed
josteink opened this issue Apr 29, 2015 · 20 comments

@josteink
Member

As can be seen via Jenkins: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_freebsd_debug/

Looking into the first failing build, we have this log:

Cloning the remote Git repository
Cloning repository git://github.com/dotnet/coreclr.git
 > git init /mnt/tmpdrive/j/workspace/dotnet_coreclr_freebsd_debug # timeout=10
ERROR: Error cloning remote repo 'origin'
ERROR: Error cloning remote repo 'origin'

Failure to clone the repository could hint at the disks being full. But subsequent builds are failing for other reasons. The latest build gets to 16%, then suddenly Jenkins seems to disconnect and call it a day, so obviously something else is going on too.
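
(For reference, a quick way to check the disk-full theory on the node itself; the workspace path below is taken from the log above, and these are just generic commands, not something the CI already runs:)

# Is the filesystem backing the workspace full?
df -h /mnt/tmpdrive/j/workspace

# Any recent I/O errors from the temporary drive?
dmesg | tail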

@mmitche Looking at the build log, it seems you're already on this? :)

@mmitche mmitche self-assigned this Apr 29, 2015
@mmitche
Member

mmitche commented Apr 29, 2015

Yeah, I was looking at this yesterday. I went to create a new node, so I captured the old one and created two new versions. Either it wasn't really working that well before and we just got lucky, or something happened in the deprovisioning and capture, because the temporary drive stopped working within a few minutes and started giving IO errors.

@mmitche
Member

mmitche commented Apr 30, 2015

I've updated to the latest version of waagent, which has correct provisioning for the temporary disk. Should be up and running quite soon.
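
For reference, the temporary ("resource") disk handling is driven by a few settings in /etc/waagent.conf. A rough sketch of the relevant lines, assuming defaults similar to the Linux agent; the exact filesystem and mount point used on this FreeBSD image may differ:

# /etc/waagent.conf (excerpt) -- illustrative values, not this node's actual config
ResourceDisk.Format=y            # let waagent partition and format the temporary disk
ResourceDisk.Filesystem=ufs      # ufs on FreeBSD; ext4 is the usual Linux default
ResourceDisk.MountPoint=/mnt/resource
ResourceDisk.EnableSwap=n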

@josteink
Member Author

josteink commented May 1, 2015

Looking at the latest builds, they still don't seem entirely stable.

Odd that they would chug along nicely to begin with and then suddenly stop working...

@mmitche
Member

mmitche commented May 1, 2015

@josteink I think we might actually be good here. The last few failures on the 30th were real failures, but @ajensenwaud fixed this last night. Woohoo!

@josteink
Member Author

josteink commented May 1, 2015

Ooh right. Nice of him then :)

@mmitche
Member

mmitche commented May 1, 2015

Actually, looks like you're right about the debug one. It appears that for whatever reason, it didn't update its workspace directory when I updated the value in Jenkins. It was still using the primary drive, which has very little space. I'll figure out what's going on.
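
For context, the workspace location is controlled in two places in Jenkins, and both have to point at the big temporary drive for this to work. A rough sketch using the paths from the log above (illustrative only, not this node's actual configuration):

<!-- Node definition: the remote FS root decides where workspaces go by default -->
<slave>
  <remoteFS>/mnt/tmpdrive/j</remoteFS>
</slave>

<!-- Job config.xml: an explicit custom workspace overrides the node default -->
<project>
  <customWorkspace>/mnt/tmpdrive/j/workspace/dotnet_coreclr_freebsd_debug</customWorkspace>
</project>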

@josteink
Member Author

josteink commented May 2, 2015

Have you made any changes since last time? The builds are all looking good now. Feel free to close this issue at your own discretion.

@mmitche
Member

mmitche commented May 2, 2015

We are all good now, and the build will fail on FreeBSD failures!

@mmitche mmitche closed this as completed May 2, 2015
@ghost

ghost commented May 22, 2015

@mmitche, @josteink, now that dotnet/coreclr#1030 is merged, should the tests be enabled for FreeBSD as well?

Related: https://github.com/dotnet/coreclr/issues/1010.

@mmitche
Member

mmitche commented May 22, 2015

@jasonwilliams200OK Yep will do so today

@josteink
Member Author

Currently 4 of the tests are failing. Will enabling the tests on the CI-build cause all PRs from now on to turn red until we get those fixed?

If so, I would advise delaying enabling the tests until we have the 4 failing tests fixed.

(My 2 Norwegian øre)

@mmitche
Member

mmitche commented May 22, 2015

@josteink Yes, we should have them all passing before we enable them in the CI.

@ghost

ghost commented May 22, 2015

👍
@joshfree, US 💰 for you! 😉

@josteink
Member Author

@mmitche I looked slightly deeper into this, and it may be that enabling the tests is safe-ish.

Can you take a look at the findings and weigh in with your opinion?

https://github.com/dotnet/coreclr/issues/1010#issuecomment-104756763

@mmitche
Member

mmitche commented May 22, 2015

@josteink I got two failures on an exploratory run:

http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_freebsd_debug/lastFailedBuild/console

The following test(s) failed:
threading/WaitForSingleObject/WFSOExMutexTest/paltest_waitforsingleobject_wfsoexmutextest. Exit code: 1
threading/WaitForSingleObject/WFSOExThreadTest/paltest_waitforsingleobject_wfsoexthreadtest. Exit code: 1

PAL Test Results:
Passed: 819
Failed: 2

@josteink
Member Author

OK. If so, I still advise against enabling them. Was worth a shot, though. :)

@ghost

ghost commented May 23, 2015

How about we exclude these two tests for the FreeBSD target, similar to what these guys have done: dotnet/llilc#598? I mean, only 0.2% of the tests are failing. I think it is a bad idea to hold back the majority.
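
One lightweight way to do that (sketched with hypothetical file names; the actual CI scripts may expose a different mechanism) is to keep a per-platform exclusion list and filter the test list before the run:

# freebsd.exclude and all_pal_tests.txt are hypothetical names
cat > freebsd.exclude <<'EOF'
threading/WaitForSingleObject/WFSOExMutexTest/paltest_waitforsingleobject_wfsoexmutextest
threading/WaitForSingleObject/WFSOExThreadTest/paltest_waitforsingleobject_wfsoexthreadtest
EOF

# Drop the excluded tests from the full list before invoking the runner
grep -v -x -f freebsd.exclude all_pal_tests.txt > freebsd_pal_tests.txt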

@josteink
Member Author

We're currently investigating the cause of those tests performing badly in #1044, but if that investigation turns up inconclusive or without any resolution, I'm inclined to agree with @jasonwilliams200OK.

@mmitche
Member

mmitche commented May 26, 2015

@jasonwilliams200OK Yeah that seems reasonable. If it seems like the values in the tests are unreasonable (they were probably chosen somewhat arbitrarily in the ancient past) then we could also alter them.
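
To make the timer-value concern concrete, here is a minimal sketch of the pattern these tests rely on, written with POSIX primitives rather than the actual PAL test code (the 500 ms timeout and the 400-700 ms tolerance are hypothetical numbers, chosen only to illustrate why such tests are sensitive to scheduling and clock behaviour on a loaded VM):

/* Build on FreeBSD/Linux: cc -o timedwait timedwait.c -lpthread */
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000L + (b->tv_nsec - a->tv_nsec) / 1000000L;
}

int main(void)
{
    sem_t sem;
    sem_init(&sem, 0, 0);                  /* never posted, so the wait must time out */

    struct timespec start, deadline, end;
    clock_gettime(CLOCK_REALTIME, &start);

    deadline = start;
    deadline.tv_nsec += 500 * 1000000L;    /* ask for a 500 ms timeout */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = sem_timedwait(&sem, &deadline);
    clock_gettime(CLOCK_REALTIME, &end);

    long ms = elapsed_ms(&start, &end);
    /* Hypothetical tolerance: accept 400-700 ms for a 500 ms request. */
    if (rc == -1 && errno == ETIMEDOUT && ms >= 400 && ms <= 700) {
        printf("PASS: timed out after %ld ms\n", ms);
        return 0;
    }
    printf("FAIL: rc=%d errno=%d elapsed=%ld ms\n", rc, errno, ms);
    return 1;
}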

@ghost

ghost commented May 26, 2015

@mmitche, 👍

A very valuable and interesting discussion on this subject is going on at https://github.com/dotnet/coreclr/issues/1044 between @josteink and @bsdjhb (FreeBSD kernel dev). It seems like there is more to it than just the timer values needing adjustment. Hopefully, the root issue will be identified soon.
