Unable to download container image from private repository #904

Closed
smokedlinq opened this issue Jun 15, 2017 · 46 comments

@smokedlinq

I have a SF (Microsoft.Azure.ServiceFabric.WindowsServer.5.6.220.9494) running the ClusterConfig.Unsecure.DevCluster configuration.

Docker was installed previously and was running these same images. Before installing SF, I removed all running containers and cleaned up images and private repository logins.

I am trying to deploy an application pointing to my private repo, e.g. myrepo.azurecr.io/sf/myapp

I am able to use the repository credentials specified in the manifest to login to the repo from docker cli.

When SF tries to deploy the container it states:

Error event: SourceId='System.Hosting', Property='Download:1.0:1.0'.
There was an error during download.Failed to download container image myrepo.azurecr.io/sf/myapp

In the admin log I see this sequence of events:

End(BeginDownloadAndActivate): Error=HostingDeploymentInProgress, VersionedServiceTypeId={MyAppType_App10:MyAppPkg:MyAppType,1.0:1.0:131420102687941111}, ActivationContext=551bf757-1e64-45ca-9812-b99f3875df69, ServicePackagePublicActivationId=d87665d6-1421-4c9d-8d36-73442c5d7b80, SequenceNumber=185
DownloadContainerImages returned 0xd00000e5
Failed to import docker image error 0xd00000e5.
EndSendRequest for image history Error 0xd00000e5
DownloadContainerImages returned 0xd00000e5
Failed to import docker image error 0xd00000e5.
EndSendRequest for image history Error 0xd00000e5
80b671e9b3c9184bbd86d2f150c58135:131419843158391101:131419843669317074 failed to send message AddInstance to node 5101db1125ead8d47d6f93321d3eb754:131419843160891129 with error FABRIC_E_TIMEOUT

ServiceManifest

<EntryPoint>
  <ContainerHost>
    <ImageName>myrepo.azurecr.io/sf/myapp</ImageName>
  </ContainerHost>
</EntryPoint>

ApplicationManifest

<Policies>
  <ContainerHostPolicies CodePackageRef="Code">
    <RepositoryCredentials AccountName="myrepo" Password="mysecret" PasswordEncrypted="false" />
  </ContainerHostPolicies>
</Policies>

Also, it appears that dockerd is not running, though I saw in a previous debug log that the docker process manager started dockerd successfully but that it then exited with error code 1, which the log said was OK. I haven't seen this happen again in the debug log as of yet.

One other thing to note: the image is rather large, at about 9 GB.

Update: I installed docker on another host, logged into the private repo, pulled the image, and then installed SF. I was able to deploy the same manifests and run the container successfully. If the image does not exist in docker before SF tries to pull it, the deployment fails with the errors above.

dockerd process is not running successfully if the image doesn't exist

Update: On my local development workstation running Windows Server 2016 with the local dev SF deployment, I wiped the images with docker rmi $(docker images -q), and deploying the application then caused docker to download the image from my private repo. I followed the same process on the broken 2016 container host. The main difference is that the broken server runs Server Core, so there is no GUI, whereas the working docker host is my 2016 development workstation, so it's local to Visual Studio.

Is my core container host broken, unsupported, or what? At this point I am mostly interested in how to find out what's wrong with it, so that if it's something I did, I don't do it again.

@RajeetN

RajeetN commented Jun 16, 2017

Server core host should be fine and work. Is this host deployed on Azure or local onebox? My suspicion is that image download is timing out, but I could be wrong and would need traces to diagnose the issue further.

@smokedlinq
Author

Local one box. If you can point me to docs on enabling tracing I'll gladly do that and provide it.

@smokedlinq
Author

Just for fun, I deployed another Windows Server 2016 Core server with docker running on it and installed Service Fabric with the same configuration as my full GUI version, and the dockerd process isn't getting started.

PS C:\> docker -H localhost:2375 images
error during connect: Get http://localhost:2375/v1.26/images/json: dial tcp [::1]:2375: connectex: No connection could be made because the target machine actively refused it.
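
The connection-refused error above suggests that nothing is listening on the port where dockerd is expected. A minimal sketch of the same check in Python, assuming the daemon should be on localhost:2375 as in the command above. The function name is made up for illustration and is not part of Docker or Service Fabric tooling:

```python
import socket


def docker_endpoint_reachable(host="localhost", port=2375, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    This only checks that the port is open; it does not verify that the
    listener is actually dockerd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if docker_endpoint_reachable() else "not reachable"
    print(f"Docker endpoint localhost:2375 is {state}")
```

A "not reachable" result here would match the symptom in this comment; it says nothing about why dockerd failed to start.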

@felschr

felschr commented Jun 26, 2017

I get the same "Failed to download container" error but on an Azure cluster (I tried Windows preview and Ubuntu).
I'm using the CI/CD integration to deploy my ASP.NET Core application with multiple services. Locally I am able to run the services using docker-compose from within Visual Studio.
When I try to run it in my local SF cluster I get a similar error:

Error event: SourceId='System.Hosting', Property='Download:1.0:1.0'.
There was an error during download.Container deployment is not supported on the node.

I have the latest preview installed though.

@RajeetN

RajeetN commented Jun 26, 2017

Are you using Server2016DataCenter-WithContainers image? The error you are seeing indicates we are not able to find docker on the node.

@felschr

felschr commented Jun 27, 2017

@RajeetN No, I only have the preview SF SDK & Docker for Windows installed on my local machine.
I didn't know I needed a special OS. This explains why it doesn't work locally.
The problem on Azure remains, though, where I used Server2016DataCenter-WithContainers.

@mani-ramaswamy

Can you please retry with the 5.7 bits (non-preview) and tell us what you find? With Server2016DataCenter-WithContainers, it should work.

@felschr

felschr commented Sep 18, 2017

I've retried this with Azure Service Fabric 5.7 with Linux.
Can I run Linux containers on a Windows Service Fabric cluster? If so, I'll try that, too.

@RajeetN

RajeetN commented Sep 18, 2017

No, you cannot run Linux containers on Windows Service Fabric clusters today.

@felschr

felschr commented Sep 18, 2017

Ok, in that case I'm still blocked by this issue.

@prasadker

Can you RDP into your machine and check if Docker is running?
Also, you mentioned Docker compose. Can you paste the compose file here? Service Fabric supports only a few compose directives for now.

@felschr

felschr commented Sep 18, 2017

Here are my compose files:

docker-compose.yml

services:
  activities:
    image: '[mydomain].azurecr.io/activities@[SHA]'
  calendar:
    image: '[mydomain].azurecr.io/calendar@[SHA]'
  usermanagement:
    image: '[mydomain].azurecr.io/usermanagement@[SHA]'
version: '2.0'

docker-compose.override.yml

version: '2'

services:
  usermanagement:
    environment:
      - ASPNETCORE_ENVIRONMENT=Development
    ports:
      - "8001:80"

  activities:
    environment:
      - ASPNETCORE_ENVIRONMENT=Development
    ports:
      - "8002:80"

  calendar:
    environment:
      - ASPNETCORE_ENVIRONMENT=Development
    ports:
      - "8003:80"

Well, it's a Linux cluster, so I guess I cannot RDP into it, but I'll see if I can access it via SSH somehow.

@prasadker

The directives look OK, but SF only supports docker compose version 3, so I don't know whether it is rejecting the file when it checks the version - https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-docker-compose#supported-compose-directives

Yes please SSH into the node using the cluster FQDN and port 3389 for node 0, 3390 for node 1 and so on.
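
The port scheme described above (3389 for node 0, 3390 for node 1, and so on) can be sketched as a small helper. This is only an illustration of the mapping; the function names, user name, and FQDN are hypothetical placeholders, not values from this thread:

```python
SSH_BASE_PORT = 3389  # port for node 0, per the comment above


def ssh_port_for_node(node_index):
    """Map a zero-based node index to its SSH port (3389, 3390, ...)."""
    if node_index < 0:
        raise ValueError("node index must be non-negative")
    return SSH_BASE_PORT + node_index


def ssh_command(user, cluster_fqdn, node_index):
    """Build the ssh command line for the given node."""
    return f"ssh {user}@{cluster_fqdn} -p {ssh_port_for_node(node_index)}"


if __name__ == "__main__":
    # "azureuser" and the FQDN below are hypothetical placeholders.
    print(ssh_command("azureuser", "mycluster.example.com", 1))
```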

@felschr

felschr commented Sep 20, 2017

When I change the versions to 3 I get the same error.

@prasadker Thanks for the quick info on how to access the nodes via SSH.
Here are the results of docker ps on the node:

felix_schroeter@LinuxNode000000:~$ docker ps
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.30/containers/json: dial unix /var/run/docker.sock: connect: permission denied
felix_schroeter@LinuxNode000000:~$ sudo su
root@LinuxNode000000:/home/felix_schroeter# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
root@LinuxNode000000:/home/felix_schroeter#

And here is the result of ps -aux | grep "[d]ocker":

root@LinuxNode000000:/home/felix_schroeter# ps -aux | grep "[d]ocker"
root       6811  0.0  0.3 275856  7068 ?        Ssl  Sep19   0:42 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc
root     105012  0.1  1.7 699400 34704 pts/7    Ssl+ Sep18   3:02 /usr/bin/dockerd -H localhost:2375 -H unix:///var/run/docker.sock --pidfile /mnt/sfroot/sfdocker.pid

@felschr

felschr commented Sep 20, 2017

Is docker-compose supposed to be installed?

root@LinuxNode000000:/home/felix_schroeter# docker-compose ps
The program 'docker-compose' is currently not installed. You can install it by typing:
apt install docker-compose

@mani-ramaswamy

If you are operating against a SF cluster, you shouldn't need that. You will be using instructions @ https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-docker-compose.

Since many issues have been discussed, to clarify exactly where you are blocked - you are trying to run (Linux) containers on a (Linux) SF cluster in Azure, correct?

@felschr

felschr commented Sep 21, 2017

@mani-ramaswamy Yes, Linux containers on a Linux Service Fabric cluster.
I was using the official Service Fabric docker-compose VSTS release task to deploy.

It's an ASP.NET Core project created with Docker support via Visual Studio.
I already got it running on a Linux Docker swarm cluster on Azure.

@mani-ramaswamy

Can you try connecting to the cluster via CLI, and then running the install script below?

sfctl cluster select --endpoint http://:19080
sfctl application upload --path --show-progress
sfctl application provision --application-type-build-path
sfctl application create --app-name fabric:/appName --app-type --app-version

And if you don't have a default service in the app manifest, then create the service as well.
sfctl service create --name fabric:/appName/serviceName --service-type --stateless --instance-count 1 --app-id --singleton-scheme

If you share your manifest files, I can try it on my cluster and see what's going on. Are you able to get it to work with a public docker hub image?

@mani-ramaswamy

And to install sfctl, use the instructions @ https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cli.

@ericmaia

ericmaia commented Oct 5, 2017

I have an issue that may be related. I am getting the "failed to download" error when I try to deploy a SF Container project to my local cluster. I have noticed that when the local cluster starts up, the Docker engine dies. If Service Fabric is trying to use Docker to pull and install the image, then it would fail. I have already posted an issue in Docker For Windows (docker/for-win#1175) but would it be helpful for me to add more details here, or open a new issue for Service Fabric?

Edit: never mind, I just got a comment on that issue saying that SF does not support containers on Windows 10.

@Marusyk

Marusyk commented Jan 10, 2018

So, is it possible to run a Linux container in Service Fabric on Windows? I'm getting:

There was an error during download.Failed to download container image
my image

<ContainerHost>
  <ImageName>rabbitmq:3.6-management</ImageName>
</ContainerHost>

@mani-ramaswamy

No, to run Linux containers, you need a SF Linux cluster at this time.

@Marusyk

Marusyk commented Jan 10, 2018

@mani-ramaswamy OK, thanks. After switching Docker to Windows containers, I'm also getting

There was an error during download.Failed to download container image
my image micdenny/rabbitmq-windows

https://hub.docker.com/r/micdenny/rabbitmq-windows/

What is the problem?

@mani-ramaswamy

It could be timing out. Can you try the instructions @ https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-get-started-containers#configure-container-image-download-time, increase the timeout, and report back?

@mani-ramaswamy

Please also ensure that you are using docker EE (and not the CE variant)

@Marusyk

Marusyk commented Jan 11, 2018

@mani-ramaswamy Thanks for the response.

By default, the Service Fabric runtime allocates a time of 20 minutes to download and extract container images

I tried very small containers and got the error immediately after start.

Please also ensure that you are using Docker EE (and not the CE variant)

I'm using Docker CE on my dev machine.
The documentation says:

Prerequisites

  • Install Docker CE for Windows so that you can run containers on Windows 10.

Docker EE for Windows requires Windows Server 2016 or later. I'm not developing on Windows Server.

@mani-ramaswamy

Service Fabric cannot run Windows containers on Windows 10 locally at present. This will be fixed in an upcoming release.

@muskanaul

@mani-ramaswamy Any idea on how soon it will be fixed?

@mani-ramaswamy

We're presently testing internal builds with this - the next minor version update of SF (6.2) will have this fixed.

@rn-3

rn-3 commented Mar 30, 2018

Getting the same error with SF running on a local cluster (6.1.467.9494).
The host is a VM running Windows Server 2016 and the guest container is Windows Server Core.
The error is reported immediately, so this is definitely not a timeout issue.

@mani-ramaswamy

To re-confirm: is the VM running Windows Server 2016 Datacenter with containers, and is the guest container Windows Server 2016 Datacenter Server Core? The reason I ask is that Windows containers aren't compatible across releases. Thus, Windows Server 2016 containers aren't compatible with Windows Server version 1709 hosts, and vice versa.

@harahma @RajeetN
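
The compatibility rule described above can be sketched as a build-number comparison. This is a simplified illustration, assuming version strings of the form major.minor.build.revision; the function names are made up, the revision numbers in the example are illustrative, and how the version strings are obtained from the host and the image is left out:

```python
def windows_build(version):
    """Extract the build number from a string such as '10.0.14393.2189'."""
    parts = version.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected Windows version string: {version!r}")
    return int(parts[2])


def builds_match(host_version, image_version):
    """True when the host and the container image share the same Windows build."""
    return windows_build(host_version) == windows_build(image_version)


if __name__ == "__main__":
    # Build 14393 is Windows Server 2016; build 16299 is version 1709.
    print(builds_match("10.0.14393.2189", "10.0.14393.2125"))  # True
    print(builds_match("10.0.16299.371", "10.0.14393.2125"))   # False
```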

@mani-ramaswamy

BTW, Docker for Windows (CE) isn't supported, if you were using that. You need EE on the Server.

@rn-3

rn-3 commented Mar 30, 2018 via email

@rn-3

rn-3 commented Mar 30, 2018 via email

@rn-3

rn-3 commented Apr 18, 2018

@mani-ramaswamy @RajeetN
Any update on this issue?

@RajeetN

RajeetN commented Apr 18, 2018

@rn-3, do you have a version tag specified for your image?

@rn-3

rn-3 commented Apr 18, 2018

microsoft/windowsservercore:latest

@RajeetN

RajeetN commented Apr 18, 2018

Alright, so this is not about downloading from a private repository, which is what the original issue was about? Could you please share the traces from your machine? If this is a cluster deployed on Azure, please send us the cluster resource name and region.

@rn-3

rn-3 commented Apr 18, 2018

I'm getting the same error - 'Failed to download container image hub.docker.com/r//'
I'm using a local cluster on a VM. I'm seeing these errors in the Service Fabric Explorer. Where/how do I get traces?

@rn-3

rn-3 commented Apr 18, 2018

I tried deploying again using the docker-compose file, and it seems like the containers have been created and started...
Initially, I tried using the Visual Studio solution and project template...

@ddobric

ddobric commented Jun 8, 2018

I'm also getting the same error when using a local registry with a local dev cluster.
For example:

<CodePackage Name="Code" Version="1.0.0">
  <EntryPoint>
    <!-- Follow this link for more information about deploying Windows containers to Service Fabric: https://aka.ms/sfguestcontainers -->
    <ContainerHost>
      <ImageName>mydockerapp:v1</ImageName>
    </ContainerHost>
  </EntryPoint>
  <!-- Pass environment variables to your container: -->
  <!--
  <EnvironmentVariables>
    <EnvironmentVariable Name="VariableName" Value="VariableValue"/>
  </EnvironmentVariables>
  -->
</CodePackage>

@artisticcheese

This must be an SF issue. I deployed this to 3 different SF servers and only one of them shows this issue. All are Windows 2016 LTSC and the image is exactly the same, so it cannot be a version mismatch.

@mikkelhegn

@harahma

@harahma

harahma commented Jun 21, 2018

@aryamsft Can you take a look at it?

@artisticcheese

@harahma This is related to the issue in the other thread, where a bunch of files are created in a temporary folder on that server: microsoft/service-fabric-issues#1122

@Shrirang97

Can we now deploy containers on a Windows 10 1909 machine?

@gkhanna79 gkhanna79 transferred this issue from microsoft/service-fabric-issues Apr 29, 2020