lease-shell breaks with remote server returned 404 once provider service gets restarted (.manifest.deployments track breaks as well) #87

Open
andy108369 opened this issue Apr 11, 2023 · 5 comments
Labels: P2, repo/provider (Akash provider-services repo issues)

Contributor

andy108369 commented Apr 11, 2023

lease-shell breaks with remote server returned 404 once provider service gets restarted.

.manifest.deployments track breaks as well.

Tracked internally at https://github.com/ovrclk/engineering/issues/538

This issue appeared in akash 0.16.4 through provider-services 0.2.1.

Reverting commit akash-network/node@1ab8ee6 resolves the issue.

It looks like the ctx is not updated with the active leases upon provider restart, which IsActive needs in order to work.


This commit might also be related to manifest.deployments now reporting 0 (or that could be related to the mainnet4 upgrade [provider-services 0.1.0]):

$ curl -sk https://provider.provider-2.prod.ewr1.akash.pub:8443/status | jq '.manifest.deployments'
0

$ curl -sk https://provider.provider-2.prod.ewr1.akash.pub:8443/status | jq '.cluster.inventory.active | length'
60

$ curl -sk https://provider.provider-2.prod.ewr1.akash.pub:8443/status | jq '.cluster.leases'
60
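The three queries above can be folded into a single check. This is a minimal sketch, not part of provider-services: check_counts and provider_status_check are hypothetical helper names, and the only assumption about the /status payload is the two jq paths already used above. It flags the symptom described here, where manifest.deployments is 0 while cluster.leases is not.

```shell
#!/bin/sh
# Sketch only: flags the symptom from this issue, where the provider reports
# zero manifest deployments while still reporting active leases.

# check_counts MANIFEST_DEPLOYMENTS CLUSTER_LEASES   (hypothetical helper)
# Returns 0 when the counts look sane, 1 on the mismatch, 2 on bad input.
check_counts() {
  deployments=$1
  leases=$2
  # Reject empty or non-numeric values (for example "null" from jq).
  for n in "$deployments" "$leases"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "UNKNOWN: non-numeric counts '$deployments' '$leases'"
        return 2
        ;;
    esac
  done
  if [ "$deployments" -eq 0 ] && [ "$leases" -gt 0 ]; then
    echo "MISMATCH: manifest.deployments=0 but cluster.leases=$leases"
    return 1
  fi
  echo "OK: manifest.deployments=$deployments cluster.leases=$leases"
}

# provider_status_check BASE_URL   (hypothetical helper, needs curl and jq)
# Fetches BASE_URL/status once and compares the two counters.
provider_status_check() {
  status=$(curl -sk "$1/status") || return 2
  check_counts \
    "$(printf '%s' "$status" | jq '.manifest.deployments')" \
    "$(printf '%s' "$status" | jq '.cluster.leases')"
}

# Usage:
#   provider_status_check https://provider.provider-2.prod.ewr1.akash.pub:8443
```

A non-zero exit status makes the check easy to wire into monitoring, but a mismatch only indicates the counters disagree; it does not by itself prove that lease-shell is broken for a given lease.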

Update: 23 Jan 2023

Akash Provider reports:

@andy108369 andy108369 added repo/provider Akash provider-services repo issues P2 labels Apr 11, 2023
@troian troian removed their assignment Apr 11, 2023
@troian troian added P2 and removed P2 labels Apr 11, 2023
Contributor Author

andy108369 commented Oct 12, 2023

Workarounds

One can add an OpenSSH server and one's public keys to the deployment to keep permanent SSH access to it.

For an Ubuntu-based image

Make sure to set your public SSH key in SSH_PUBKEY

    image: ubuntu:22.04
    env:
      - 'SSH_PUBKEY=ssh-rsa AAAAB3NzaC1yc...'
    command:
      - "sh"
      - "-c"
    args:
      - 'apt-get update;
      apt-get install -y --no-install-recommends -- ssh;
      mkdir -p -m0755 /run/sshd;
      mkdir -m700 ~/.ssh;
      echo "$SSH_PUBKEY" | tee ~/.ssh/authorized_keys;
      chmod 0600 ~/.ssh/authorized_keys;
      exec /usr/sbin/sshd -D'
    expose:
      # HTTP/HTTPS port
      - port: 80
        as: 80
        to:
          - global: true
      # SSH port
      - port: 22
        as: 22
        to:
          - global: true

Ollama + SSHD example

https://gist.githubusercontent.com/andy108369/b633153179e08cae4115957a2d294643/raw/888e0b9ccb713d81c3e05d23a1e533323bc2a080/ollama-ssh.yaml

For an Alpine-based image

Make sure to set your public SSH key in SSH_PUBKEY

    image: alpine:3.18.4
    env:
      - 'SSH_PUBKEY=ssh-rsa AAAAB3NzaC1yc...'
    command:
      - "sh"
      - "-c"
    args:
      - 'apk update;
      apk add openssh-server;
      ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N "";
      ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N "";
      mkdir -m700 ~/.ssh;
      echo "$SSH_PUBKEY" | tee ~/.ssh/authorized_keys;
      chmod 0600 ~/.ssh/authorized_keys;
      exec /usr/sbin/sshd -D'
    expose:
      # HTTP/HTTPS port
      - port: 80
        as: 80
        to:
          - global: true
      # SSH port
      - port: 22
        as: 22
        to:
          - global: true

To combine the sshd daemon with running the app(s), one can start the apps in the background one by one and then exec sshd last:

      app1 &
      app2 &
      exec /usr/sbin/sshd -D'
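The same pattern can be wrapped in a small helper if the command list grows. This is only a sketch: run_with_foreground is a hypothetical name, and app1 and app2 are the placeholders from the snippet above. It starts every background command, then execs the foreground command, mirroring the final exec /usr/sbin/sshd -D.

```shell
#!/bin/sh
# run_with_foreground FOREGROUND_CMD [BACKGROUND_CMD...]   (hypothetical helper)
# Starts each BACKGROUND_CMD in the background, then execs FOREGROUND_CMD via
# "sh -c", so the calling shell is replaced and never returns.
run_with_foreground() {
  if [ "$#" -lt 1 ]; then
    echo "usage: run_with_foreground FOREGROUND_CMD [BACKGROUND_CMD...]" >&2
    return 2
  fi
  foreground=$1
  shift
  for cmd in "$@"; do
    sh -c "$cmd" &
  done
  exec sh -c "$foreground"
}

# Usage (app1 and app2 are placeholders):
#   run_with_foreground '/usr/sbin/sshd -D' 'app1' 'app2'
```

With this layout the container lives as long as sshd does, so a background app that exits will not restart on its own; whether that is acceptable depends on the deployment.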

To figure out what one has to run (and how) in a specific image:

docker pull <image>
docker image history <image> --no-trunc --format '{{.CreatedBy}}' | grep -E '^WORKDIR|^ENTRYPOINT|^CMD|^USER'
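Images built with the classic Docker builder record metadata instructions in the history as /bin/sh -c #(nop)  CMD [...], which the anchored grep above does not match. A hedged variant that strips that prefix first is sketched below; image_runtime_hints is a hypothetical helper name, and it reads the CreatedBy lines from standard input.

```shell
#!/bin/sh
# image_runtime_hints   (hypothetical helper)
# Reads "docker image history" CreatedBy lines on stdin and prints only the
# WORKDIR, ENTRYPOINT, CMD and USER instructions. The "/bin/sh -c #(nop) "
# prefix written by the classic builder is stripped before matching.
image_runtime_hints() {
  sed -E 's|^/bin/sh -c #\(nop\) +||' \
    | grep -E '^(WORKDIR|ENTRYPOINT|CMD|USER)'
}

# Usage:
#   docker pull <image>
#   docker image history <image> --no-trunc --format '{{.CreatedBy}}' | image_runtime_hints
```

The history is listed newest layer first, so the first ENTRYPOINT or CMD line printed is normally the one in effect.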


SGC41 commented Dec 11, 2023

It would be nice to have a fix for this...
A lot of customers have a bad experience because of it.


anilmurty commented Jan 14, 2024

Added this to the "Up Next" list on the product/eng roadmap: https://github.com/orgs/akash-network/projects/5/views/1


rekpero commented Jan 26, 2024

Hey team, fixing this issue quickly would really help us out at Spheron. We've got a bunch of users struggling to connect shell for their keys or to check status, and it's becoming a bit of a headache. Could we get this sorted out as soon as possible? We're more than happy to give it a test run even before it goes live on the main provider code. Thanks a bunch for jumping on this quickly!

@brewsterdrinkwater
Contributor

April 2nd, 2024

  • This will be addressed via the gRPC migration.

Development

No branches or pull requests

7 participants