Issues whilst trying to start a "fresh" server. #162

Open
eclipsek20 opened this issue Dec 30, 2021 · 4 comments

Comments

@eclipsek20

eclipsek20 commented Dec 30, 2021

Hello. I get this error and it seems that I can't fix it:

2021 Dec 30 (03:01:56 PST)
app/models/minecraft/node.rb:52:in `block in pid'

Error getting Minecraft pid: MCSW API network exception: Failed to open TCP connection to 46.101.196.60:5000 (Connection refused - connect(2) for "46.101.196.60" port 5000)

2021 Dec 30 (03:01:55 PST)
app/models/minecraft.rb:57:in `properties'

Error getting Minecraft properties: MCSW API network exception: Failed to open TCP connection to 46.101.196.60:5000 (Connection refused - connect(2) for "46.101.196.60" port 5000)

I am running fra1/c-4 with paper 1.18.1
On a side note, why is it connecting to port 5000 and not 22?

@eclipsek20 changed the title from "Issues whilst trying to create a server." to "Issues whilst trying to start a "fresh" server." on Dec 30, 2021
@eclipsek20
Author

#143 Seems like other people have had this issue, and it resolved itself automatically. If that is the case, the root cause should still be investigated, since this may cause damage in the long term.

@eclipsek20
Author

Welp this seems dead

@kenny-evitt

@eclipsek20 This has happened to me before too (I think), and like for you, it also "resolved itself automatically".

I'm pretty sure the maintainer would like to investigate and determine the root cause, but that's hard and they probably don't have the time or energy to spare – especially given the other open issues they'd probably also like to investigate (and new features that would probably be more fun to work on too).

I don't think this would likely cause any significant damage long-term. There should be snapshots of the server (DigitalOcean 'droplet') saved in your DigitalOcean account.

The kind of 'infrastructure orchestration' that Gamocosm does is (or can be) tricky. Handling Minecraft too (and multiple versions, and mods) is much harder still. It's pretty reasonable to expect some hiccups occasionally. (I haven't used it in several months but everything worked first try for me earlier today – that's pretty amazing actually!)

In case someone else (maybe me) wants to look at this, it looks like the code mentioned in the errors might just need to be retried?

The port 5000 is for the Minecraft Server Wrapper, a separate API service (but also part of the larger Gamocosm project/service) that interacts directly with Minecraft. Maybe that wrapper service is just (sometimes) a little slow starting up but the code that throws this error is only trying to connect one time? It looks like the default timeout is just four (4) seconds.

So the port 5000 isn't for connecting to an SSH server. (But I think the default SSH port for the droplets isn't 22 either, for a (minor) security boost.)

@eclipsek20 If you don't mind, would you edit your original comment and format the error info as a code block please? That'd make it a little easier to read for anyone that ends up working on this.

@Raekye
Member

Raekye commented Aug 28, 2022

Hello, dropping in for my sporadic maintenance of Gamocosm 😅 (Of course, you may have long moved on, but I think these issues serve as good information for others/history)

On the issue

Firstly, as @kenny-evitt mentions, port 5000 is for the Minecraft server wrapper, Gamocosm's lightweight process on users' servers/droplets to provide an HTTP API for basic functions - so that, for example, you can cleanly stop a Minecraft server from the control panel.

In your case, for example, Gamocosm seems to have successfully built the Digital Ocean server/droplet. However, the actual Minecraft server process isn't started automatically (* 1); there is an HTTP API endpoint for that, which under normal circumstances Gamocosm can connect to.

So why can't Gamocosm connect here? Again, as @kenny-evitt mentions, it's quite difficult to investigate the root cause for something that interacts with as much infrastructure as Gamocosm. Not only is it non-deterministic, but all things considered, with all the (arguably relatively few) users of Gamocosm, this seems quite rare (it may be consistent at some time for some user like you, but if I just sit down and start creating servers, I haven't seen the issue myself (* 2)).

Note that the "create/resume" server process on Gamocosm looks like this:

  1. Call Digital Ocean API to create a droplet.
  2. Start a worker to continuously query the Digital Ocean API for when the droplet has finished being created on Digital Ocean's side.
  3. Start a worker to try (and retry) to establish an SSH connection with the droplet (* 3).
  4. Start a worker to SSH and install/update stuff.
  5. Start a worker to start the Minecraft server process via the Minecraft server wrapper API.

Going by the messages we have, Gamocosm has finished step 4 and tried to do step 5. However, for whatever reasons, it cannot connect to the Minecraft server wrapper API. Again, as @kenny-evitt notes, it should be harmless from a data-safety point of view. On top of what he says about there being snapshots of the server, the "only" issue is that the Minecraft server can't be started. In the worst case scenario, you can always access the droplet directly, via SSH or Digital Ocean's control panel, to recover your data (if this isn't possible, there is something very wrong with Digital Ocean, far out of Gamocosm's scope).

(* 1): You may wonder why Gamocosm doesn't set things up so that the Minecraft server starts automatically with the Digital Ocean droplet. This is primarily so that the Minecraft server can be stopped cleanly by writing "stop" to its standard input. I believe sending the signal SIGINT or SIGTERM triggers the same shutdown process, but unless that has since been added, Minecraft doesn't document that shutting down via a signal is "clean".
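As a rough illustration of the "write stop to standard input" approach, here is a minimal Ruby sketch. It is not the wrapper's actual code: the helper names `start_server` and `stop_server` are hypothetical, and it assumes the child process exits by itself once it reads "stop".

```ruby
require "open3"

# Hypothetical helper: start a child process whose stdin is a pipe we control.
# stdout and stderr are merged into a single stream.
def start_server(*command)
  stdin, output, wait_thread = Open3.popen2e(*command)
  { stdin: stdin, output: output, thread: wait_thread }
end

# Hypothetical helper: request a clean shutdown by writing "stop" to the
# child's standard input, then wait for it to exit on its own.
# Returns the exit status and everything the child printed.
def stop_server(server)
  server[:stdin].puts("stop")
  server[:stdin].close
  log = server[:output].read      # blocks until the child closes its output
  status = server[:thread].value  # Process::Status of the finished child
  server[:output].close
  [status, log]
end

# Usage (the command line is an assumption, adjust to your setup):
#   server = start_server("java", "-jar", "minecraft_server.jar", "nogui")
#   status, log = stop_server(server)
```

A long-running wrapper would need to drain the output pipe continuously rather than only at shutdown, otherwise the child can block once the pipe buffer fills up.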

(* 2): I have seen this error message a long time ago. In my experience in the past, Digital Ocean droplets could be finicky with their network connectivity for up to several minutes after Digital Ocean reports that they are ready. Nowadays, I only occasionally run into an issue SSHing into a new droplet. And in your case, Gamocosm should have already long SSHed into and done stuff with the droplet, so network connectivity definitely should be there.

(* 3): As mentioned previously, there used to be issues with network connectivity even after Digital Ocean says the droplet is ready.

On debugging the issue

If anyone encounters this issue, is comfortable SSHing, and is willing to try to debug it, I would ask for the following things:

  • SSH into the server.
  • Run ps aux | grep mcsw. Hopefully you can notice the Minecraft server wrapper process.
  • If the wrapper is running, try curl localhost:5000. You should get a JSON response with the wrapper version.
    • Presumably this should succeed... not exactly sure off the top of my head what should be tested if this fails, but it would be good to know as a stepping stone.
  • If the wrapper is not running, check the output of journalctl -u mcsw (hit capital G to scroll to the bottom).
    • Presumably, there will be some message about why it has failed to start. This would explain why Gamocosm can't connect to it.
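For anyone who would rather check from another machine first, the sketch below probes the wrapper port and reports whether the connection was accepted, refused, or timed out. It is only a rough complement to the steps above: `probe_port` is a hypothetical helper, and the usual reading of "refused" (the host is reachable but nothing is listening on that port) is a general TCP rule of thumb, not something confirmed for this issue.

```ruby
require "socket"

# Hypothetical helper: classify the state of a TCP port.
#   :open        - something accepted the connection
#   :refused     - host answered, but nothing is listening (or it was rejected)
#   :timeout     - no answer within the timeout
#   :unreachable - no route to the host, or the name did not resolve
def probe_port(host, port, timeout: 4)
  Socket.tcp(host, port, connect_timeout: timeout) { :open }
rescue Errno::ECONNREFUSED
  :refused
rescue Errno::ETIMEDOUT
  :timeout
rescue Errno::EHOSTUNREACH, Errno::ENETUNREACH, SocketError
  :unreachable
end

# Usage (replace the address with your droplet's IP):
#   puts probe_port("203.0.113.10", 5000)
```

If this reports `:refused`, the `journalctl -u mcsw` step is probably the more informative one, since it suggests the wrapper is not running at all.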

On fixing the issue

Again, as @kenny-evitt has noted, the default timeout for communicating with the Minecraft server wrapper is just 4 seconds. It has been set low, since, for example, when you click "pause" on Gamocosm's control panel, Gamocosm makes its HTTP request to the wrapper API synchronously. So loading the next Gamocosm page would hang until the request is finished, and intuitively requests shouldn't take more than 4 seconds unless something else is wrong. So I limited it to 4 seconds.

Currently, Gamocosm just tries once to start the Minecraft server via the wrapper API after finishing SSH setup. Given that the issue is "harmless" (just annoying/inconvenient), and that my best (though not very good/thorough) guess is ~random connectivity issues, I believe the best fix is to have step 5 above repeat a few seconds later if it fails. That should be able to be done Soon(tm).
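A minimal sketch of what "repeat a few seconds later if it fails" could look like in Ruby follows. This is not Gamocosm's actual code: `with_retries` and `wrapper_get` are hypothetical names, the attempt count and delay are arbitrary, and the request path is a placeholder. Only port 5000 and the 4-second timeout come from the discussion above.

```ruby
require "net/http"

# Errors that suggest the wrapper is not (yet) reachable. Net::HTTP re-raises
# a refused connection as Errno::ECONNREFUSED with a "Failed to open TCP
# connection" message, like the one in the original report.
RETRYABLE_ERRORS = [
  Errno::ECONNREFUSED,
  Errno::EHOSTUNREACH,
  Net::OpenTimeout,
  Net::ReadTimeout
].freeze

# Hypothetical helper: run the block, retrying up to `attempts` times with a
# fixed `delay` (in seconds) between tries. The last failure is re-raised.
def with_retries(attempts: 5, delay: 3, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *RETRYABLE_ERRORS
    raise if tries >= attempts
    sleeper.call(delay)
    retry
  end
end

# Hypothetical wrapper call with short timeouts, so a single attempt never
# hangs for long.
def wrapper_get(host, path = "/", port: 5000, timeout: 4)
  http = Net::HTTP.new(host, port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get(path)
end

# Usage (replace the address with your droplet's IP):
#   response = with_retries { wrapper_get("203.0.113.10") }
```

Keeping the per-request timeout short and retrying in a background worker, rather than raising the timeout, would leave the synchronous control panel requests as responsive as they are now.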

Other future work

I'm not a fan of the Minecraft server wrapper in the sense it's another piece of infrastructure that's also a point of failure (as evidenced here). Last time I was working on Gamocosm (December of last year), I started looking into more proper/full-fledged control panels. At least with those 3rd party control panels, they'd (presumably) be better maintained.

On this past hiatus

I've given many excuses before about me not actively maintaining Gamocosm 😅 Most recently, I was last working on Gamocosm at the end of 2021. I was cleaning up a lot of stuff, and at some point I tried containerizing the Gamocosm deployment, which also had the potential of simplifying CI and even development environments. However, after I had nearly everything set up, and after a lot of pain and struggling, I found out that it's not (practically) possible to start rootless podman containers with systemd on boot -.- That kinda hard-tilted me off Gamocosm, and also put it in an awkward state because my config was in between the old and new setup (without and with containers).

Yesterday and today I've gone back and fixed my config, fully porting the setup to use containers. In the end, the containers (puma web server, sidekiq server/workers, database, redis * 2) just need to be run as root. I accepted that while it's conceptually nicer to be able to run these containers as unprivileged users, in practice, short of a vulnerability in Linux's containerization, containers are arguably better isolated than running aforementioned processes as unprivileged users (but not in a container). This is because a rogue process in a container can really only access the minimal things inside the container, whereas a rogue, unprivileged process running directly on the host can still read many system files and interact with other processes.

Other notes

Once again, as @kenny-evitt notes, Gamocosm changes the SSH port from the SSH default of 22, to 4022. Fundamentally, this doesn't improve security; it may only protect weak setups from simple attacks. (And arguably, with SSH on port 22, users will likely see more rogue login attempts, which may scare those who are unfamiliar.)

And also many thanks to @kenny-evitt (and everyone else) for being understanding of my very slow responses/maintenance.
