add alternative light weight ruby backend without dependencies #10

Open · wants to merge 3 commits into base: master

Conversation

@pachacamac commented Sep 4, 2019

Hey there @bilde2910, thanks for building Hauk! Really like the idea of having a self-hosted, real-time location sharing service! However, since I'm only using it rarely and don't have PHP & Memcached on my server, I felt like quickly hacking together a drop-in replacement backend that consists of a single Ruby file and has almost no dependencies (except http://sinatrarb.com/). It stores sessions in memory and saves/loads them to/from a JSON file when the server gets restarted.

It runs standalone with or without SSL, or behind an existing web server such as nginx.

So I was wondering if you would like to merge this in somehow.

Cheers!

@bilde2910 (Owner) commented Sep 4, 2019

Hi @pachacamac, thanks a lot for your contribution! I've been meaning to figure out a way to avoid using PHP as a backend, or at least have an alternative to it that scales better. PHP is easy to work with and has wide compatibility, but it's not without issues, so your Ruby implementation is a welcome one.

I took a brief look at the code and it looks good, but I have a few suggestions/recommendations.

The original PHP code generates a random session ID and has a separate function to convert that ID to a link ID, which is used in the public share URL. I see you chose to generate nicer, readable IDs for the link ID, which is fine, but would it be possible to avoid sessionID == linkID? The link ID could either be generated from a hash of the session ID (e.g. hash(sessionID) == linkID), or the two IDs could be generated separately and connected via a map/dictionary (e.g. sessionID = randHexString(); linkID = niceLinkString(); sessions[sessionID] = linkID). This is mostly a security precaution: if the two IDs are the same, or if the session ID can easily be derived from the link ID, it's possible to hijack a share by taking the public link ID and using it as the session ID in a malicious client, which can then submit wrong location updates to the shared URL. If the IDs are separate, this cannot happen. Location data could be stored in the sessions array using the link ID as a key. The session ID is only used when new locations are pushed, the share is stopped, or other administrative tasks are performed. Storing these changes at the right place in the sessions array requires the corresponding link ID, which can be derived via either the hash or the session->link ID dictionary, depending on which approach is chosen.
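To illustrate the second approach, it could look roughly like this (the names here are purely illustrative and not taken from either backend):

```ruby
require 'securerandom'

# hypothetical stand-in for the friendly IDs the Ruby backend already generates
def nice_link_string
  Array.new(3) { %w[amber birch cedar delta ember fjord].sample }.join('-')
end

sessions = {}   # secret session ID => public link ID
shares   = {}   # public link ID    => data for that share

session_id = SecureRandom.hex(16)   # secret, only handed to the sharing client
link_id    = nice_link_string       # readable, used in the public share URL

sessions[session_id] = link_id
shares[link_id]      = { locations: [] }

# a location push authenticates with the secret session ID,
# but the data is stored under the corresponding link ID:
shares[sessions[session_id]][:locations] << [59.91, 10.75]
```

Viewers only ever see link_id, so knowing the share URL gives them no way to push fake locations.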

Additionally, some minor things:

  • Can the config be moved to a separate JSON file, which is just read on startup?

  • Since we have two backends now, I will rename the existing backend folder to "backend-php". Can you rename your backend2 folder to "backend-ruby"?

Finally, I'd like to ask what your thoughts are on dropping the sessions.json file and keeping all sessions entirely in memory. The reason I use memcached in the PHP backend is that not only are shares meant to be temporary, but updates to the backend could change the structure of the data that is stored there; if the data is transient, there is less chance that a version conflict could cause issues with the session data, and it also makes it easier to completely flush all data. If you think keeping sessions on disk while not running is better, could there be an option to disable that behavior, e.g. if the file name in the config is an empty string, data is kept only in memory and never read from or written to file?
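For example, the config could look something like this (field names are only a suggestion):

```json
{
  "port": 8080,
  "session_file": ""
}
```

where an empty "session_file" means the backend keeps everything in memory and never touches the disk.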

I appreciate your contribution and look forward to hearing your thoughts.

@pachacamac (Author) commented Sep 4, 2019

Hey @bilde2910, thanks for the nice feedback! Let me go through your points one by one:

> alternative to it that scales better.

I'm not so sure this version scales better. It would be interesting to do some load testing and compare. Usually WEBrick (the built-in Ruby web server) isn't the best choice for scaling, but I wanted to keep dependencies at a minimum so that it's an alternative that requires minimal setup and can just be dropped in quickly when setting up PHP/Memcached seems like a hassle.

> session ID vs friendly ID

Makes sense. At first I thought this would be prevented by the password, but I get what you mean. Fixed that by creating a session ID out of a friendly ID by hashing it together with a secret salt. See lines 110 and 124.
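Conceptually it boils down to something like this (simplified, not the literal lines from the file):

```ruby
require 'digest'
require 'securerandom'

# server-side secret; without it, the public friendly ID alone
# cannot be turned into a valid session ID
SALT = SecureRandom.hex(32)

def session_id_for(friendly_id)
  Digest::SHA256.hexdigest(friendly_id + SALT)
end
```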

> Can the config be moved to a separate JSON file, which is just read on startup?

Sure can and now is.

> Renaming backends

Did that, and also updated the Dockerfile.

> Dropping sessions

I added a new config option that makes it possible to select whether sessions should be stored or not. Furthermore, there is a before hook that takes care of session expiration on every request to emulate memcached's session expiration feature.
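The hook is essentially just this (simplified; names like SESSIONS and :expire are placeholders here):

```ruby
require 'sinatra'

SESSIONS = {}  # link ID => session data, including an :expire timestamp

# runs before every request and drops timed-out shares,
# emulating memcached's TTL-based expiry
before do
  now = Time.now.to_i
  SESSIONS.delete_if { |_id, s| s[:expire] && s[:expire] < now }
end
```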

PS: Right now I copied your frontend files to serve them from the Ruby backend. Might be better if we move the frontend into its own folder outside of the backend so that the same frontend code is used by both backends?

@bilde2910 (Owner) commented Sep 5, 2019

Thanks for the quick patches!

> I'm not so sure this version scales better.

Consider an example where there is one active share, with one other person watching that share in real-time. The sharing interval is set to 1 second, meaning we get 2 HTTP requests per second; one from each user.

The problem with PHP is that, as far as I know, PHP does not cache anything it reads from the file system by default. PHP has an opcache extension that does some form of caching, but I'm not entirely sure how it works, and it is disabled by default. If my understanding is correct, that means that for every request, all the scripts and include files are read from disk. Each request requires four files to be parsed (post.php/fetch.php, inc.php, config.php, memcache.php/memcached.php). At 2 req/s, that is 8 file reads per second. If you have group shares, or multiple people watching, that I/O load starts increasing pretty quickly. It would be interesting to run some load testing against the PHP backend and monitor IOPS and I/O performance, particularly if Hauk is run from a traditional HDD.

Edit, Sept 10: I did load testing that would indicate that I am mistaken regarding the above paragraph. Results here.

Your Ruby implementation starts and runs its own web server, and when a request is received from a client, it is handled entirely in memory; no disk reads or writes are involved. That should mean that your implementation is mostly CPU-bound, the cap for which I'd assume is generally much higher than I/O load.

PHP is admittedly a poor choice of backend for this project. The ideal solution would be some kind of solid backend (a native one in Rust, for example) with WebSocket support. With WebSockets, each client watching the map would maintain a single connection instead of firing a new request every second, which also cuts down on data usage quite dramatically. I want to make such a backend, but the only language I'm skilled enough in to pull that off is Java (which honestly doesn't seem like a very good alternative). I've never worked with Rust before, but it would be nice if I could find some examples to base it on and then try my best at implementing a solid backend. That would probably make your Ruby backend obsolete, though.

> Might be better if we move the frontend into its own folder outside of the backend so that the same frontend code is used by both backends?

Definitely agree on this one. The frontend code could live in its own frontend folder in the repository root. It would need some small changes to the Dockerfile, but should be quite straightforward.


The rest of your changes look good. Aside from testing your backend, there is one remaining problem that I would like to resolve if this is merged. There is currently another issue open whose resolution will result in some significant changes to the backend: when #7 is implemented, the backend must be capable of handling multi-user shares. The protocol and associated backend code for that change are not finalized yet, but once it's merged, it would need to be covered in the Ruby implementation as well. I'm new to Ruby, so I might need your help to implement that part once it's done.

@pachacamac (Author) commented Sep 5, 2019

> That should mean that your implementation is mostly CPU-bound, the cap for which I'd assume is generally much higher than I/O load.

Good point, that's for sure. But you could serve PHP from an in-memory file system to achieve roughly the same thing if you really wanted to. Another option would be something like this.

I was actually thinking about WebSockets but wanted to stay 100% compatible with your Android app (I suck at Android development). If you want to build WebSocket support into the Android app, I'm more than happy to help you build a WebSocket backend in Ruby or JavaScript for it. It's really not hard. You can see roughly how this would look e.g. here. I've been wanting to look into Rust but haven't had the time yet. As far as I know, one advantage of Rust is that you can compile completely self-contained binaries without dependencies.
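Just to give a rough idea, a bare-bones Rack app using the faye-websocket gem would look something like this (only a sketch, not wired into Hauk's sessions at all):

```ruby
require 'faye/websocket'
require 'json'

# run under a Rack server that faye-websocket supports (Thin, Puma, ...)
App = lambda do |env|
  if Faye::WebSocket.websocket?(env)
    ws = Faye::WebSocket.new(env)

    # in a real backend we'd push location updates to the viewer here
    # instead of the browser polling fetch.php every second
    ws.on :message do |event|
      ws.send({ ok: true, echo: event.data }.to_json)
    end

    ws.on :close do |_event|
      ws = nil
    end

    ws.rack_response
  else
    [200, { 'Content-Type' => 'text/plain' }, ['Hauk WS endpoint']]
  end
end
```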

But regarding the "old" backend, I'm also happy to help you make the needed changes to get #7 working in the Ruby code once the spec is fixed.

@bilde2910 (Owner) commented Sep 5, 2019

That sounds good. The first priority is to make the in-browser map client and associated JavaScript work with WebSockets. The call to fetch.php could, in addition to the current JSON data, return a JSON field wsURL that is either a URL or null, depending on whether the server prefers using WebSockets to communicate. If e.g. wsURL == "./ws/fetch", the browser establishes a WebSocket connection to the relative path ./ws/fetch and receives updates every X seconds from the server as JSON data. If wsURL == null, the JavaScript will instead send an HTTP request every X seconds to get the latest update, like it does presently. That will let the same frontend code work with both the PHP backend and a Ruby backend that supports WS.
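In other words, the only addition to the existing fetch response would be something like (shown here in isolation):

```json
{ "wsURL": "./ws/fetch" }
```

for a WS-capable backend, or

```json
{ "wsURL": null }
```

for one that only supports polling.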

As for the Android app, I need to give it more consideration first. It's possible that I can return a WS URL as part of the create.php handshake, but I need to do more research into how WS works on Android, how to implement it, whether it can be done without a lot of libraries, and how to e.g. gracefully handle disconnects.

@bilde2910 (Owner) commented Sep 10, 2019

I've completed the backend work for #7, renamed the backend folder, split the static assets out into frontend, and updated the Dockerfile on my side. You may want to rebase your fork onto the updated master branch of this repo, so you have the updated backend and structure to work on top of.

As for the changes to this patch: the structure has been reworked a bit. I suppose the most important part is the changes to the API. From now on, all responses must return an X-Hauk-Version: 1.1 header (the version number will change in future releases). Beyond that, please check the API code for a better explanation than what I am able to provide here.
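In Sinatra I assume that could be as simple as something like this (untested on my end):

```ruby
require 'sinatra'

# attach the protocol version header to every response
before do
  headers 'X-Hauk-Version' => '1.1'
end
```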

The internal data structure has also changed significantly. I can provide an in-depth explanation here if you wish.

@bilde2910 (Owner) commented Sep 10, 2019

I did some load testing on the PHP backend. Running the httpd on localhost and flooding it at just over 700 req/s, the server had no problems keeping up apart from the CPU maxing out, though this could also be due to starting a lot of simultaneous HTTP requests. I/O and IOPS metrics seemed to indicate that the only disk activity going on was Apache writing to its log file. There was no measurable disk read activity during the test.

To verify this, I ran the same flood against a server running both its OS and Hauk from traditional, spinning rust HDDs, using the Hauk Docker image. There was an initial spike of disk read activity, but this settled down to 0 B/s within a few seconds. So there is definitely caching going on, which is good.

It seems I was mistaken about PHP not caching and reading a lot of files per second. This thread is getting some attention from elsewhere on the web, so I'll update my previous comment with the results of this test.
