This repository has been archived by the owner on Jan 24, 2023. It is now read-only.

Documentation, operating system edition for windows nodes. Changes? #88

Open
bremnes opened this issue Nov 19, 2017 · 23 comments

@bremnes

bremnes commented Nov 19, 2017

Is this a request for help?: Yes, documentation


Context: We created a new Kubernetes cluster/environment. The same Docker images that work in our current (old) environment don't work in the new one and give us the following error:

Error response from daemon: container my-test-container encountered an error during CreateContainer: failure in a Windows system call: The operating system of the container does not match the operating system of the host ...

The images were built on the aspnetcore:2.0 base image for Windows (Nano Server).

When we created our old cluster ~45 days ago, we got regular Windows Server 2016 Datacenter virtual machines as agent nodes. We confirmed this by remoting into them now and seeing the full desktop experience. Remoting into the agents in the new cluster gives us only a command prompt, so they may be Server Core.

Is there any documentation or update log that explains what has changed behind the scenes in ACS? Can we specify the OS image when creating a new cluster via the portal, CLI, or ARM template? Is this related to the 1709/Fall Creators Update?

Old nodes:
OS Name: Microsoft Windows Server 2016 Datacenter
OS Version: 10.0.14393 N/A Build 14393

New nodes:
OS Name: Microsoft Windows Server Datacenter
OS Version: 10.0.16299 N/A Build 16299
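
The build number is what distinguishes the old agents from the new ones. As a minimal sketch, the `OS Version` line printed by `systeminfo` could be parsed as shown here. The function names and the release table are illustrative assumptions, and the table covers only the two builds reported in this issue.

```python
import re

# Only the two builds mentioned in this issue; not an exhaustive table.
KNOWN_BUILDS = {
    14393: "Windows Server 2016",
    16299: "Windows Server, version 1709",
}


def parse_os_build(os_version_line):
    """Return the build number from a systeminfo 'OS Version' line, or None."""
    match = re.search(r"Build\s+(\d+)", os_version_line)
    return int(match.group(1)) if match else None


def describe_build(build):
    """Map a build number to a release name if it is one of the known builds."""
    return KNOWN_BUILDS.get(build, "unknown build")


if __name__ == "__main__":
    for line in (
        "OS Version: 10.0.14393 N/A Build 14393",
        "OS Version: 10.0.16299 N/A Build 16299",
    ):
        build = parse_os_build(line)
        print(build, describe_build(build))
```

Comparing the parsed build of an agent node with the build an image was created for is a quick way to spot the kind of host/container mismatch described above.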

@JackQuincy
Contributor

@JiangtianLi can you speak to this?

@bremnes
Author

bremnes commented Nov 20, 2017

I did some more testing today and found that 1709 images work, but images not based on 1709 crash with the error shown above. So even images like microsoft/aci-helloworld:windows no longer work.

I would like to be able to choose the OS/edition of the node pool(s). As it is now, we don't really get stable infrastructure if the configuration suddenly varies wildly from environment to environment depending on when you happened to create the cluster. Now, for instance, we have to make sure that everybody has the Fall Creators Update, in addition to changing all the Dockerfiles. As for the VSTS hosted build agent, I have no idea whether it supports 1709 layers, so that might be another thing we have to adjust for in the wake of this.

The remoting experience I had matches this blog post, so the agent OS/edition has certainly changed:
https://blogs.msdn.microsoft.com/freddyk/2017/11/01/1709-and-nav-on-docker/

@bremnes
Author

bremnes commented Nov 21, 2017

Attempting to change the base image to microsoft/aspnetcore:2.0-nanoserver-1709 in one of our Dockerfiles didn't work: we now get an error about access being denied to a file (the same error as when running the base image itself, see below).

To narrow it down, I created a fresh standalone VM from the Marketplace, based on the "Windows Server, version 1709 with Containers" template, to compare with one of the Kubernetes agents. I remoted into both and ran plain docker commands to see what could be wrong.

According to docker version, and the OS Name and OS Version from systeminfo | findstr /C:"OS", they appear to have the same configuration.

On Standalone VM these images work:
docker run microsoft/aspnetcore:2.0-nanoserver-1709
docker run microsoft/iis:windowsservercore-1709

Doesn't work, but from what I understand this is expected given the 1709 update:
docker run microsoft/aspnetcore:2.0

Error response from daemon: container 64e00888b063e10f59841fb3ff68a321199a6e4cb6f73a224b7b9dd2b3340208 encountered an error during CreateContainer: failure in a Windows system call: The operating system of the container does not match the operating system of the host.

Kubernetes Windows node works:
docker run microsoft/iis:windowsservercore-1709

Doesn't work:
docker run microsoft/aspnetcore:2.0-nanoserver-1709

docker: failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\eecc1639c6223893c5fef33bcc29aae8f969ed7acd1b3f45f51a765d3ba494fd\UtilityVM\Files\Windows\System32\diagtrack.dll: Access is denied.

docker run microsoft/aspnetcore:2.0

(same as for Standalone)
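
Two distinct failures show up in the runs above: the host/container OS mismatch and the "Access is denied" failure while registering a layer. A rough sketch for telling them apart in collected logs follows. The function name and the category labels are hypothetical, and the matching is plain substring search on the messages quoted in this issue.

```python
def classify_docker_error(message):
    """Roughly bucket a Docker error message into the two failures seen here."""
    # Seen when running a non-1709 image on a 1709 host.
    if "does not match the operating system of the host" in message:
        return "os-mismatch"
    # Seen when pulling the nanoserver-1709 based image on the Kubernetes node.
    if "failed to register layer" in message and "Access is denied" in message:
        return "layer-access-denied"
    return "other"


if __name__ == "__main__":
    sample = (
        "failure in a Windows system call: The operating system of the "
        "container does not match the operating system of the host."
    )
    print(classify_docker_error(sample))
```

Only the first category is the expected consequence of the 1709 switch; the second is the one that looks like a bug.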

Am I missing something here? If anybody could tell me where I've been making an error it would be great - either when creating the cluster or making a wrong assumption in the debugging session shown above. Alternatively if anybody is able to confirm and/or reproduce the bug. As it is now we aren't able to use ACS Kubernetes, which is quite unfortunate in my opinion.

(This question was originally about documentation. As Windows containers are in preview, it's an implicit contract that things might change. But it would be nice if large changes, like changing the agent nodes' operating system edition, were documented somehow. Given the things mentioned in this comment, I think there might be a bug.)

@JiangtianLi
Contributor

@bremnes You are right. ACS Engine has switched to RS3 Windows, which uses 1709 and Server Core. The documentation is not yet fully up to date and I am working on that. In general, the only change needed is to use container images with a 1709 tag, which requires the current workload to be built/upgraded/refactored on a 1709 image.

As for microsoft/aspnetcore:2.0-nanoserver-1709, I can run docker run -it microsoft/aspnetcore:2.0-nanoserver-1709 cmd on a 1.8.2 k8s Windows cluster without issue. Which version of k8s are you using?
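
As a hedged illustration of the "use a 1709 tag" advice, the sketch below flags image references whose tag does not follow the `1709` naming seen in this thread. It is only a naming heuristic: tags are free-form labels, so this cannot prove which OS build an image actually contains. The function names are hypothetical.

```python
def tag_of(image):
    """Return the tag part of an image reference, defaulting to 'latest'."""
    name, sep, tag = image.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a registry port.
    if not sep or "/" in tag:
        return "latest"
    return tag


def looks_like_1709_image(image):
    """Heuristic: the tag is '1709' or ends with '-1709'."""
    tag = tag_of(image)
    return tag == "1709" or tag.endswith("-1709")


if __name__ == "__main__":
    for image in (
        "microsoft/aspnetcore:2.0-nanoserver-1709",
        "microsoft/iis:windowsservercore-1709",
        "microsoft/aspnetcore:2.0",
    ):
        print(image, looks_like_1709_image(image))
```

A check like this could run over the Dockerfiles' base images before deploying to the new nodes, as a first pass rather than a guarantee.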

@bremnes
Author

bremnes commented Nov 21, 2017

@JiangtianLi From the cluster created through ARM template:

Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.9", GitCommit:"19fe91923d584c30bd6db5c5a21e9f0d5f742de8", GitTreeState:"clean", BuildDate:"2017-10-19T16:55:06Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
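
For comparing clusters, the `GitVersion` field can be pulled out of a `Server Version` line like the one above. This is a minimal sketch with a hypothetical function name; it assumes the `version.Info{...}` text format shown here and plain `vMAJOR.MINOR.PATCH` versions.

```python
import re


def parse_git_version(version_info):
    """Return (major, minor, patch) from a version.Info string, or None."""
    match = re.search(r'GitVersion:"v(\d+)\.(\d+)\.(\d+)', version_info)
    if not match:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    line = 'Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.9"}'
    print(parse_git_version(line))
```

Tuples compare element by element, so the result can be compared directly against another cluster's version, for example `(1, 8, 2)`.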

The same goes for another cluster that was created through the Portal. It doesn't seem possible to select the version in the Portal wizard, or is it?

Just to make sure we are on the same page, this is ACS and not ACS-engine. How about your 1.8.2 cluster? I'm not sure how ACS and ACS-engine are synced on the k8s version, or whether they are "gated" to the ACS project.

@JiangtianLi
Contributor

@bremnes I am using ACS-Engine, so I can choose the version. The Portal uses the default version. I will need to try ACS.

@bremnes
Author

bremnes commented Nov 21, 2017

@JiangtianLi Just FYI, I opened another issue related to the creation of the ACS cluster: #89. I'm not sure whether that could have had some impact on the (mis)configuration of the agent nodes in this issue.

@JiangtianLi
Contributor

@bremnes I created a Windows k8s cluster from the Azure portal in UKWest. I deployed a workload with the yaml from https://raw.githubusercontent.com/JiangtianLi/Examples/master/windows/basic/simpleweb.yaml and the container was created successfully. Then I used the microsoft/aspnetcore:2.0-nanoserver-1709 image and got the same error as you. I also tried
docker run -it microsoft/aspnetcore:2.0-nanoserver-1709 cmd
and it failed.
It appears that the microsoft/aspnetcore:2.0-nanoserver-1709 container image has some problem on RS3.

@JiangtianLi
Contributor

@PatrickLang Is this a known issue?

@PatrickLang

PatrickLang commented Nov 28, 2017

@JiangtianLi Can you help clarify? I'm confused because you said it was working in #88 (comment)

Did you change something and now you're getting this error (copied from @bremnes ) ?

Doesn't work:
docker run microsoft/aspnetcore:2.0-nanoserver-1709

docker: failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\eecc1639c6223893c5fef33bcc29aae8f969ed7acd1b3f45f51a765d3ba494fd\UtilityVM\Files\Windows\System32\diagtrack.dll: Access is denied.

@JiangtianLi
Contributor

JiangtianLi commented Nov 28, 2017

@PatrickLang Yes, docker run microsoft/aspnetcore:2.0-nanoserver-1709 worked before on the Windows node created by ACS-Engine. But on the new ACS Windows node I just created, it didn't work and fails with the same error that @bremnes reported. The differences I can think of:

  1. With ACS-Engine I used D2_V3, while with ACS I used D2_V2.
  2. With ACS-Engine I used k8s version 1.8.2, while ACS used the default 1.7.9.
  3. The ACS-Engine cluster had been running for a few days and I may have run a few workloads and pulled a few other images on it, while the ACS cluster was brand new and I had only run the simpleweb workload and hadn't pulled other images.

I'll need to re-create the ACS-Engine cluster with the same parameters, but my guess is that the container image has some conflict with the customized setup on the k8s Windows node.

@JiangtianLi
Contributor

@PatrickLang I created another ACS cluster because I deleted the previous cluster 2 hours ago. However, docker run -it microsoft/aspnetcore:2.0-nanoserver-1709 cmd succeeded this time. It seems the issue is random. I'll keep the current cluster and create another one to see if it repros.

@JiangtianLi
Contributor

@PatrickLang I created another ACS cluster with the same VM, but I couldn't repro the issue. The image appears to have been updated recently:
microsoft/aspnetcore 2.0-nanoserver-1709 8a080e5ebae7 16 hours ago

@bremnes Can you repro the issue now?

@bremnes
Author

bremnes commented Nov 29, 2017

@JiangtianLi I just tried a new cluster in UK West and was able to reproduce the diagtrack.dll error. ACS Kubernetes cluster with 1 master and 1 windows agent (DS2_V2_Standard).

@ghost

ghost commented Nov 30, 2017

I just created a Kubernetes 1.8.2 cluster in westeurope with acs-engine v0.9.4.
It turns out that our microsoft/windowsservercore-based containers, which used to work on Kubernetes 1.6, now fail to deploy on the new Kubernetes 1.8.2 Windows nodes with the same error message as above:

The operating system of the container does not match the operating system of the host

While checking, I found out that Kubernetes has an issue finding out what Docker version is used on the Windows nodes:
look here

I'm worried by the "docker://Unknown" mentions.

Also note that Windows Kernel versions were unknown in Kubernetes 1.6.6 without any issue.

@JiangtianLi
Contributor

@odauby with acs-engine v0.9.4, the Windows node uses RS3 and is compatible with microsoft/windowsservercore:1709 container image (https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility).

@bremnes
Author

bremnes commented Dec 4, 2017

@JiangtianLi @PatrickLang, have you made any progress on the nanoserver-1709 error? If it's random I can keep trying until I get lucky, but none of the 7-8 clusters I've tried so far has worked.

@JiangtianLi
Contributor

@bremnes On my side, the error does not repro consistently, so I will need to get hold of a failed cluster. Does this repro with other nanoserver images? Is it possible for you to share the cluster or collect a trace for us?
@PatrickLang Do you know which HCS trace or other data should be collected?

@bremnes
Author

bremnes commented Dec 5, 2017

@JiangtianLi I just tested by creating 6 clusters, and they all fail when pulling the nanoserver image, with the diagtrack.dll error. See this gist for the script I used.

We created a support ticket through Azure (#117112717221074), and your colleague confirmed that he was able to reproduce it as well. It might be easier to go through him, but if you want, you can have all the cluster information for one of the clusters we just created (we deleted the rest). Just let me know where you want the information.

@JiangtianLi
Contributor

@bremnes As discussed in another thread, this issue appears to be a race condition when pulling two nanoserver images at the same time. I have looped in the container folks to work on a fix.
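
If the failure really is a race between concurrent pulls, one way to test that hypothesis when pulling images manually on a node would be to serialize the pulls. The sketch below is an assumption-laden illustration, not a confirmed workaround: `pull_serialized` is a hypothetical helper, the lock only covers pulls started from the same Python process, and it has no effect on pulls that the kubelet starts on its own.

```python
import subprocess
import threading

# Process-wide lock: only one pull started through this helper runs at a time.
_pull_lock = threading.Lock()


def pull_serialized(image, run=subprocess.run):
    """Run 'docker pull <image>' while holding the lock.

    The 'run' parameter exists so a stand-in can be injected for testing;
    by default the real docker CLI is invoked and a failure raises
    subprocess.CalledProcessError.
    """
    with _pull_lock:
        return run(["docker", "pull", image], check=True)


if __name__ == "__main__":
    images = [
        "microsoft/aspnetcore:2.0-nanoserver-1709",
        "microsoft/iis:windowsservercore-1709",
    ]
    threads = [threading.Thread(target=pull_serialized, args=(i,)) for i in images]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

If the error disappears when pulls are serialized and comes back when they run concurrently, that would support the race-condition explanation.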

@PatrickLang

PatrickLang commented Mar 15, 2018

Darren's still working on a fix and will be sending a PR to moby/moby.

@jakkaj
Member

jakkaj commented Apr 18, 2018

Hi all - is there any update on this? We're having this issue as of today.
jakkaj/aspnanottest is a public Docker Hub image that reproduces it. The cluster was created with only one Windows node (no Linux nodes).

acs-engine v0.15.2

kubelet, 51742k8s9000 Failed to pull image "jakkaj/aspnanottest": rpc error: code = Unknown desc = failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\8be6cb4949d1271d5145ca143874a1ef1254dbd3394af7723bd56f64fd5a791f\UtilityVM\Files\Windows\System32\NetSetupApi.dll: Access is denied.

@PatrickLang

@jakkaj The access denied issue is tracked in moby/moby#36092. A fix for Docker-EE is in the works and expected within a few weeks.
