Helix CI failure #27782

Closed
BruceForstall opened this issue Nov 9, 2019 · 4 comments

@BruceForstall BruceForstall commented Nov 9, 2019

https://dev.azure.com/dnceng/public/_build/results?buildId=419280

"Run Test Pri1 R2R Windows_NT arm checked" job

Message:

##[Warning 1]
There was a failure in sending the provision message: A timeout occurred while sending request to the remote provider. 

in log:

F:\workspace.1\_work\1\s\.packages\microsoft.dotnet.helix.sdk\5.0.0-beta.19556.10\tools\Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(59,5): error : The response contained an invalid status code 404 Not Found [F:\workspace.1\_work\1\s\tests\src\helixpublishwitharcade.proj]
##[error].packages\microsoft.dotnet.helix.sdk\5.0.0-beta.19556.10\tools\Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(59,5): error : (NETCORE_ENGINEERING_TELEMETRY=Helix) The response contained an invalid status code 404 Not Found
F:\workspace.1\_work\1\s\.packages\microsoft.dotnet.helix.sdk\5.0.0-beta.19556.10\tools\azure-pipelines\AzurePipelines.MultiQueue.targets(30,5): error : Value cannot be null. (Parameter 'source') [F:\workspace.1\_work\1\s\tests\src\helixpublishwitharcade.proj]
##[error].packages\microsoft.dotnet.helix.sdk\5.0.0-beta.19556.10\tools\azure-pipelines\AzurePipelines.MultiQueue.targets(30,5): error : (NETCORE_ENGINEERING_TELEMETRY=Helix) Value cannot be null. (Parameter 'source')

@BruceForstall BruceForstall commented Nov 9, 2019

@MattGal


@MattGal MattGal commented Nov 9, 2019

I have been summoned. Taking a peek


@MattGal MattGal commented Nov 9, 2019

This is an interesting one. The 404 is because no console log actually got created.

Log: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-coreclr-refs-heads-master-ca5199f186bb4e3188/JIT.Generics/bf415cfd-89f8-483b-91d0-195403ded657.log?sv=2018-03-28&sr=c&sig=b%2Fq%2Fp0kU5uSdNkE4iTyK7DDUH%2BLGW0e95%2B4US%2FaM79I%3D&se=2019-11-17T22%3A24%3A04Z&sp=rl

Basically the machine downloads this blob (more than once)
https://helixde107v0xdeko0k025g8.blob.core.windows.net/helix-job-cd48e688-8deb-45e5-8496-6d890c91e2885f65750917946dda9/7404f0d4-dea4-4937-9f70-dfed91394fb1.zip

and gets this exception:

2019-11-07 16:39:43,315: ERROR: executor(206): _download_and_unpack: Exception "File is not a zip file" seen downloading: 

Traceback (most recent call last):
  File "C:\dotnetbuild\scripts\helix\executor.py", line 196, in _download_and_unpack
    self._unpack(partial_file_path, contents_root, is_workitem_payload)
  File "C:\dotnetbuild\scripts\helix\executor.py", line 245, in _unpack
    unzipped_file_paths = self._get_unpacked_file_paths(file_path, contents_root)
  File "C:\dotnetbuild\scripts\helix\executor.py", line 454, in _get_unpacked_file_paths
    with zipfile.ZipFile(file_path) as zfile:
  File "C:\Python\lib\zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "C:\Python\lib\zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

(The subsequent null ref is fixed in later Arcade versions and isn't super important here)

The fact that this works at all is amazing; the machines on windows.10.arm64.open are running Python 2 scripts and have had no updates or interventions beyond rebooting for almost a year. My guess, considering that this zip file is NOT a bad zip file, is over-zealous Windows Defender interacting with the Python 2 zip APIs.

Regardless, if it keeps happening we can take the affected machine(s) offline, but there's not much use in trying to fix this until we reimage the windows.10.arm64.open machines (which I believe we will start next Tuesday), as they're very crusty and need reimaging, Python 3, and newer Helix scripts.
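
For anyone chasing a similar "File is not a zip file" report where the uploaded blob itself is fine, one way to narrow it down is to inspect the payload on disk before handing it to zipfile. Below is a minimal Python 3 sketch. `diagnose_payload` and its return labels are hypothetical names for illustration and are not part of the Helix scripts; the affected machines ran Python 2, where the exception is spelled `BadZipfile` rather than `BadZipFile`.

```python
import os
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"   # first bytes of a non-empty zip archive
ZIP_EMPTY_ARCHIVE = b"PK\x05\x06"  # first bytes of an archive with no members


def diagnose_payload(path):
    """Return a short label describing why `path` may fail to open as a zip.

    Hypothetical helper: the labels are illustrative, not a Helix convention.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path, "rb") as f:
            magic = f.read(4)
    except OSError:
        # For example, another process (such as a scanner) is holding the file.
        return "unreadable"
    if magic not in (ZIP_LOCAL_HEADER, ZIP_EMPTY_ARCHIVE):
        return "not-zip"
    if not zipfile.is_zipfile(path):
        # Starts like a zip but has no end-of-central-directory record,
        # which is what a partially written or truncated download looks like.
        return "truncated"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first one with a bad CRC.
        bad_member = zf.testzip()
    return "ok" if bad_member is None else "corrupt-member"
```

Logging this label (plus the file size) next to the exception would at least distinguish a partial or locked file on the machine from a genuinely bad upload. It would not confirm the Windows Defender theory by itself.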


@BruceForstall BruceForstall commented Nov 9, 2019

@MattGal ok, I'll close this then.
