This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

Containerized ALS terminates itself unexpectedly #598

Closed
har7an opened this issue Aug 23, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@har7an

har7an commented Aug 23, 2023

Summary

TL;DR: When using ALS, packaged into a container, with neovim, the client is attached to the current buffer briefly but terminates itself shortly afterwards with an exit code of 1.

Here is the Containerfile I use to build the container:

FROM registry.fedoraproject.org/fedora:38

RUN dnf install -y --nodocs -x nodejs-full-i18n \
    nodejs-npm python3-ansible-lint yamllint ansible git-core && \
    npm install -g @ansible/ansible-language-server && \
    rm -rf /var/cache/dnf

ENTRYPOINT [ "/usr/local/bin/ansible-language-server" ]

nvim then attaches to the container with the --stdio parameter. I'm calling the container through the following shell wrapper:

#!/usr/bin/env bash
set -euo pipefail

TTY="-it"
[[ -t 1 ]] || TTY="-i"

exec podman run --rm "$TTY" \
    --security-opt label=disable \
    --network none \
    --log-driver none \
    --detach-keys "" \
    -v "$HOME:$HOME" \
    -w "$PWD" \
    localhost/ansible-language-server:latest "$@"

I have tried various parameter combinations for podman, but none of them had much of an effect. I also tried building the container above with specific versions of ALS starting from 1.1.0, but all show the same behavior. When I remove --rm from the container wrapper, the container remains on my PC and I can see the log (which shows pretty much exactly what's inside the nvim logs below). I also see that the container exits with an exit code of 1, but I see no reason for that. The last message written to stdout is always the "rpc.receive" with a very long array "data" full of numbers.

I'm pretty sure the combination of running LSP as a podman container from a toolbx container isn't the problem here because I do the same thing for other LSPs as well (in particular, Python, Lua and R) and they work just fine. I also ruled out my nvim configuration as the culprit by using a "minimal" config supplied by the nvim-lspconfig project, which showed the same behavior.

Since I don't use VSCode I attached the nvim trace log below instead. I hope this helps! I'll happily provide additional info if required.

Extension version

N/A

VS Code version

N/A

Ansible Version

$ podman run --rm -it --entrypoint ansible localhost/ansible-language-server:latest --version
ansible [core 2.14.8]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.4 (main, Jun  7 2023, 00:00:00) [GCC 13.1.1 20230511 (Red Hat 13.1.1-2)] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True

OS / Environment

  • Host OS: Fedora CoreOS 38.20230722.3.0
  • Environment: Fedora 38 toolbx container
  • podman: 4.5.1
  • ansible-lint: 6.13.1
  • yamllint: 1.32.0

Relevant log output

My log is too long, attaching as file instead.
har7an added the bug and new labels on Aug 23, 2023
@har7an
Author

har7an commented Aug 23, 2023

Here is the log:
als-error.log

@dcunited001

I've found that it's pretty hard to ensure the local paths on a project match the remote paths in the container's bind-mount. So I end up with a bunch of timeouts, where the LSP server in the container is being asked to find a file outside its file system.

So my startup looks like this:

docker run --rm -i \
--mount type=bind,src=/home/me/src/ansible-role-gitlab,dst=/root/project \
dc/lsp-docker \
ansible-language-server --stdio

And I'm getting events like this:

(:timed-out :textDocument/hover :id 7 :params
            (:textDocument
             (:uri "file:///home/me/src/ansible-role-gitlab/tasks/main.yml")
             :position
             (:line 1 :character 85)))

It's really not worth it to try to connect to ALS running in a container, unless vim has something like docker-tramp.el. Tramp allows you to transparently work with remote files as though they're local. Docker-tramp allows you to start a container then have emacs start processes on the container as though everything is local.

My notes are here, but they're a bit scattered: Emacs: Using lsp-docker from eglot.

I was getting that error at several points. Basically it means the container process is shutting down immediately. This could be for a variety of reasons. For me, it happened when I rebuilt the container, but the workdir didn't exist.

ssbarnea removed the new label on Sep 20, 2023
@vRoussel

vRoussel commented Mar 4, 2024

I have the same issue. Did you end up finding a solution, @har7an?

@har7an
Author

har7an commented Mar 4, 2024

> I've found that it's pretty hard to ensure the local paths on a project match the remote paths in the container's bind-mount. So I end up with a bunch of timeouts, where the LSP server in the container is being asked to find a file outside its file system.

That's only true if you mount host paths under different directories inside your container. I generally launch containers with the git repo of the current file bind-mounted (and keeping the PWD of the parent process), or, if there is no git repo, fall back to either mounting the PWD or all of HOME, depending on how I expect to use it. But whatever I do: I keep the paths intact (i.e. something like -v "$PWD:$PWD" -w "$PWD"). So what you describe is not an issue for me and I don't need something like emacs tramp either.
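The path-preserving approach described above can be sketched roughly as follows. This is an illustration, not my actual wrapper: `mount_root` is a hypothetical helper name, and the fallback to all of HOME is left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch: choose the directory to bind-mount at an identical path.

# Prints the top level of the enclosing git repository, or the current
# directory when not inside a repository (or when git is unavailable).
mount_root() {
    git rev-parse --show-toplevel 2>/dev/null || pwd
}

root=$(mount_root)
# Using the same path on both sides of the colon keeps the file:// URIs
# sent by the editor valid inside the container.
printf '%s\n' -v "$root:$root" -w "$PWD"
```

The printed arguments are the ones that would be handed to podman run; because host and container paths are identical, the server never gets asked for a file outside its file system.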

In the meantime, I have found the error but forgot to report back (thanks for the bump @vRoussel). It appears that the Ansible LSP is built on top of a TypeScript LSP template offered by Microsoft. This template (and apparently all LSPs derived from it) has a very annoying property: it receives the PID of whoever started it as a startup parameter and, if it cannot find that PID itself, kills itself shortly afterwards (see neovim/neovim#14504 for a discussion in neovim).
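The kind of check described above can be illustrated in a few lines of shell. This is only a sketch of the idea, not the server's actual code: `parent_alive` is a hypothetical helper, and `kill -0` stands in for whatever liveness test the server really performs.

```shell
#!/usr/bin/env bash
# Sketch of a parent-PID liveness check (hypothetical helper, not ALS code).

# Returns 0 if a process with the given PID exists in the current PID
# namespace and can be signalled by the current user, non-zero otherwise.
# `kill -0` delivers no signal; it only performs the existence/permission check.
parent_alive() {
    kill -0 "$1" 2>/dev/null
}

# The current shell is certainly alive.
if parent_alive "$$"; then
    echo "pid $$ is visible"
fi

# The PID of a process that has already exited is (barring PID reuse) gone.
sleep 0 &
dead_pid=$!
wait "$dead_pid"
if ! parent_alive "$dead_pid"; then
    echo "pid $dead_pid is not visible"
fi
```

A server that runs a check like this against a PID it cannot see would conclude that its parent has died and shut itself down.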

Since the process runs in a container, it has, by default, no access to the host's PID namespace. So it doesn't find the PID and stops right there. There are (at least) two solutions to this problem:

Break sandboxing

As the name implies, this shouldn't be your favored option, but it will get you going if all else fails: Run the LSP container with --pid=host. This will expose the whole PID namespace of the host to the container, so the LSP finds the PID it was passed on startup and lives happily ever after.
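Applied to the wrapper from the original report, this amounts to one extra flag. The sketch below only prints the resulting arguments instead of running podman, so it can be inspected anywhere; `podman_args` is a hypothetical helper name, and the TTY handling and --detach-keys option from the original wrapper are left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch: the wrapper's podman arguments with the host PID namespace shared.

# Prints one argument per line; "$@" is passed through to the language server.
podman_args() {
    printf '%s\n' \
        run --rm -i \
        --pid=host \
        --security-opt label=disable \
        --network none \
        --log-driver none \
        -v "$HOME:$HOME" \
        -w "$PWD" \
        localhost/ansible-language-server:latest "$@"
}

# Show what would be executed for `--stdio`.
podman_args --stdio
```

To use it for real, add --pid=host to the exec podman run line of the original wrapper rather than going through this helper.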

Override the processId parameter

This keeps PID namespace isolation intact and works by overriding the startup parameters passed to the Ansible LSP. In short, set the processId parameter to whatever your language's equivalent of the JSON "null" (~) is.

In my case, being a nvim user with nvim-lspconfig, here's my LSP config:

Note: When copy-pasting this, keep in mind that on_attach is a table defined elsewhere in my case.

require("lspconfig").ansiblels.setup({
    on_attach = on_attach,
    enabled = true,
    -- See: https://github.com/neovim/neovim/issues/14504#issuecomment-833940045
    before_init = function(init_params, _)
        init_params.processId = vim.NIL
    end,
})
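For clients other than nvim, it may help to see what the resulting request looks like on the wire. The sketch below frames an LSP initialize request whose processId is JSON null; `lsp_frame` is a hypothetical helper, and the request body is a minimal assumption (a real client sends many more capabilities).

```shell
#!/usr/bin/env bash
# Sketch: frame an LSP `initialize` request whose processId is JSON null.

# Wraps a JSON body in the LSP base-protocol header: Content-Length in bytes,
# a blank line, then the body itself.
lsp_frame() {
    body=$1
    # wc -c counts bytes; tr strips the padding some wc implementations add.
    len=$(printf '%s' "$body" | wc -c | tr -d ' ')
    printf 'Content-Length: %s\r\n\r\n%s' "$len" "$body"
}

# Minimal initialize request (assumed shape) with processId set to null.
init_request='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":null,"rootUri":null,"capabilities":{}}}'

lsp_frame "$init_request"
```

Piping this output into the container wrapper with --stdio is one way to poke at the server without an editor, though the server can be expected to exit once its stdin closes.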

Hope this helps!

har7an closed this as completed on Mar 4, 2024