Agnostic S3 Integration #1192

Closed
Gorian opened this Issue Dec 21, 2018 · 28 comments

3 participants

Gorian commented Dec 21, 2018

Describe the feature you'd like
I'd like an "agnostic" S3 implementation

Describe the benefits this feature would bring to BookStack users
Currently, S3 support is limited to Amazon only, as far as I can tell.

Additional context
There are other S3 implementations besides Amazon. As it stands, my options for storage with bookstack are, as follows:

  • File (local only)
  • FTP (old, complicated to make highly available, etc)
  • "s3" (Amazon only, meaning it's limited ONLY to paying a cloud provider)
  • Rackspace (rackspace only, again limited to paying a cloud provider)

I want a better solution for storing uploads than local storage or FTP. As it happens, I am running a full Ceph cluster with 3+ nodes and multiple TBs of redundant storage, complete with a rados gateway: 3 separate instances, load balanced by a highly available 3-node traefik cluster, so that my rados gateway (and thus its S3 API) is always up and available. I have created a user and a bucket, and they are available via the S3 API (http://docs.ceph.com/docs/mimic/radosgw/s3/). Yet no matter what I do, going so far as working with a friend to add some custom PHP code, BookStack still insists on using amazonaws.com for S3...

armouredking commented Dec 23, 2018

It looks like Bookstack is using a fairly barebones S3 plugin - league/flysystem-aws-s3-v3. Looking at the GitHub here and here for details.

It seems that there are other options that can be added (endpoint?) to provide the functionality like Rocket.Chat does as mentioned in #1195 and as you desire. Specifically with Minio, Issue #139 and using other endpoints Issue #127; see the post 3 posts down.

I haven't been able to fully test this yet myself but I plan to try soonish.

armouredking commented Dec 23, 2018

So, I have to go to bed as work calls early tomorrow for me, but I've been able to determine a few things:

Manually establishing an "endpoint" parameter in config/filesystems.php will change the Laravel error messages. With this parameter defined, the URL changes to "$bucketname.$endpointaddress", which if you're using SSL will fail due to CN unmatched. Not using SSL, with Minio's case, it still fails as the endpoint should have a directory tail (like a subfolder) but even explicitly defining it in $endpointaddress causes it to be dropped off. And if you define a bucket, it tries to add the bucket as if it's an Amazon S3 address (like a subdomain). And if you leave $bucketname=$null or $false, it fails to perform because it's expecting a bucket name.

More to the point though, looking through ImageService.php in the Bookstack app itself, it still seems like with S3 defined it's hardcoded looking for the Amazon address (see lines 419-441). The Flysystem docs, under "Compatible Storage Protocols", seem to indicate the endpoint setup should work and at least from other bug reports already noted in the post above, people are getting the functionality. They just don't include their configs or patches and I'm terrible at coding.
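For readers following along, the two addressing styles described above can be sketched roughly as follows. This is only an illustration in Python, not the SDK's actual code: `object_url` and the `s3.example.test` endpoint are hypothetical names, and the sketch deliberately ignores any path component on the endpoint.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL an S3-style client would request for one object."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path style: the host is unchanged and the bucket is the first path segment.
        netloc = parts.netloc
        path = f"/{bucket}/{key}"
    else:
        # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
        netloc = f"{bucket}.{parts.netloc}"
        path = f"/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


endpoint = "https://s3.example.test"  # hypothetical endpoint
print(object_url(endpoint, "bookstack", "uploads/images/a.png", path_style=False))
# https://bookstack.s3.example.test/uploads/images/a.png
print(object_url(endpoint, "bookstack", "uploads/images/a.png", path_style=True))
# https://s3.example.test/bookstack/uploads/images/a.png
```

In the virtual-hosted form the bucket is part of the hostname, so a certificate issued only for the endpoint host would not match it, which fits the CN mismatch noted above.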

Author

Gorian commented Dec 23, 2018

Thanks for looking into this @armouredking! I'm not much of a developer either. I'm a systems engineer who can do enough coding in non-scripting languages to get into trouble :P

armouredking commented Dec 23, 2018

If you want to try building something yourself, you could clone the github repo of Bookstack and exchange the league/flysystem plugin for coraxster/flysystem since it seems to support pathing and endpoints. Not sure if there are other requirements or patches that would be necessary for CEPH.

You'd need to make the changes in filesystems.php and probably present a different driver than S3. I'm also unsure whether you'd have to make changes to ImageService.php as well, since it's hardcoded. That's a bit much work for me, so I'll probably hold off on using Bookstack again. I wanted to use Minio since I'm trying to migrate over to Docker/Swarm/Kubernetes (using the linuxserver/bookstack deployment), and having all the data somewhere else is more appealing than keeping it in the docker volumes. Not really interested in using Amazon's actual service though.

Author

Gorian commented Dec 26, 2018

Yeah, exactly. What would you recommend besides BookStack though? It has all the features I'm looking for EXCEPT for S3 integration. And yeah, I don't have to pay for S3 either. I have 7TBs of local redundant storage for a reason, no need to pay someone else for it (approximately $164.74/month for equivalent storage from Amazon, which is not really in my budget).

armouredking commented Dec 28, 2018

I've been clubbing a group of tools, which is always what you have to do when you settle. I left Bookstack originally due to some SQL issues (which the author has corrected to my knowledge, not knocking the project as it's single dev) and have been keeping an eye on it waiting for it to mature a bit. I took all my Bookstack content and put it into a WP install with some Envato plugins. WP is annoying as well, and for the majority of the content I'd prefer it in an app like Bookstack.

I have nearly 100TB of local storage using Joyent ZFS, so I understand where you're coming from. Tape libraries are a requirement, not an advisement, and uploading online is impossible on a single-user budget. Not even regarding using lol American internet speeds.

Member

ssddanbrown commented Dec 28, 2018

Hi Both,
Thanks for giving BookStack a try and thanks @armouredking for looking into this.

Just spent a couple of hours looking at this myself and managed to get BookStack going with minio. I'll list the steps below and hopefully the same will apply for ceph. Some changes to the BookStack config/filesystems.php file are needed, so watch out, as these changes may cause trouble or be discarded when updating. Once someone confirms this works for them I'll make these changes officially configurable via the .env file.

Minio Steps

As used in examples below, my minio instance base URL is http://172.21.0.2:9000. Where you see this url change it to be your own base URL.

  1. Create a bucket for your content. In these steps my bucket will be called bookstack.
  2. Edit your BookStack .env file and set the following:
STORAGE_TYPE=s3
STORAGE_S3_KEY=<MINIO_ACCESS_KEY>
STORAGE_S3_SECRET=<MINIO_SECRET_KEY>
STORAGE_S3_BUCKET=bookstack #(Bucket Name)
STORAGE_URL=http://172.21.0.2:9000/bookstack #(Minio public URL + '/' + bucket name)
  3. Edit the BookStack file config/filesystems.php. Find the section starting with 's3' => [ and replace it with the block below, changing the endpoint value to your base minio instance URL:
        's3' => [
            'driver' => 's3',
            'key'    => env('STORAGE_S3_KEY', 'your-key'),
            'secret' => env('STORAGE_S3_SECRET', 'your-secret'),
            'region' => env('STORAGE_S3_REGION', 'your-region'),
            'bucket' => env('STORAGE_S3_BUCKET', 'your-bucket'),
            'endpoint' => 'http://172.21.0.2:9000',
            'use_path_style_endpoint' => true,
        ],
  4. Since Minio does not support Object ACLs, you'll need to set a policy on the bucket to make uploaded images publicly fetchable. This can be applied just to images, since file uploads (page attachments) have controlled access thanks to being less frequent. The path prefix for images should be uploads/images/. Set this to read-only, then save the policy.

image

  5. Test uploading images, saving drawings & saving attachments.
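As a rough guide to what the read-only policy from the steps above might look like when written out by hand, here is a minimal sketch that builds an S3-style policy document in Python. The helper name `read_only_prefix_policy` and the `Sid` value are hypothetical, and the policy the Minio UI actually generates may contain extra statements; whether another S3-compatible service accepts the same document is not confirmed in this thread.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build a policy allowing anonymous reads of objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadImages",  # hypothetical label
                "Effect": "Allow",
                "Principal": {"AWS": ["*"]},  # anyone, including anonymous users
                "Action": ["s3:GetObject"],  # read objects only, no listing or writes
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


policy = read_only_prefix_policy("bookstack", "uploads/images/")
print(json.dumps(policy, indent=2))
```

Limiting the resource to the `uploads/images/` prefix leaves attachments outside the public policy, in line with the note in the steps above.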

As said above, please confirm if this works for you and I'll make the config changes officially and properly configurable. The hard-coded S3 URLs mentioned above should not matter; they are only used as a backup when a STORAGE_URL is not set.
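The fallback behaviour described here can be pictured with a short sketch. This is an illustration in Python, not BookStack's actual PHP: `public_base_url` and `image_url` are hypothetical helpers, and the exact shape of the Amazon fallback address is an assumption.

```python
def public_base_url(storage_url: str, storage_type: str, bucket: str) -> str:
    """Return the base URL used to build public links to uploaded files."""
    if storage_url:
        # An explicit STORAGE_URL always wins, so Minio/Ceph links are used.
        return storage_url.rstrip("/")
    if storage_type == "s3":
        # Fallback only: an Amazon-style address (exact shape is an assumption).
        return f"https://{bucket}.s3.amazonaws.com"
    # Any other storage type: links stay relative to the application.
    return ""


def image_url(base: str, path: str) -> str:
    """Join the base URL and an object path with exactly one slash."""
    return f"{base}/{path.lstrip('/')}"


base = public_base_url("http://172.21.0.2:9000/bookstack", "s3", "bookstack")
print(image_url(base, "uploads/images/a.png"))
# http://172.21.0.2:9000/bookstack/uploads/images/a.png
```

Under this reading, the amazonaws.com address only appears in generated links when STORAGE_URL is left empty.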

Author

Gorian commented Dec 28, 2018

@ssddanbrown Thanks for looking into this! I will test this out and see if it works with Ceph :)

Author

Gorian commented Dec 28, 2018

@ssddanbrown @armouredking This seems to work for Ceph using the rados gateway and S3 API for images and drawings. However, it seems to have issues uploading attachments, both when selected via the file browser and when dragged and dropped. I've tested with .pdf, .doc, and .jpg. Attempting to upload an attachment completely breaks BookStack when running via artisan: the server stops responding and the process has to be killed manually, as ^C in artisan doesn't kill it. Running via nginx+php-fpm doesn't crash, but it just hangs at upload indefinitely.

I downloaded https://cyberduck.io/ and used it to connect to the same Ceph S3 endpoint with the same credentials, and it uploaded the same files just fine, so I'm inclined to think it's a BookStack issue, though I could be wrong. I scoured all the logs I could find, but I couldn't find any issues, error messages or anything else.

armouredking commented Dec 29, 2018

@ssddanbrown Not having much luck with a Minio implementation.

Following direction, I can get it working using IP addresses and port numbers. However, it fails whenever name resolution is involved. There are no entries in the Laravel log, and the NGINX/PHP logs are also bereft. Step by step:

Edit .env file as noted above, using ADDR:PORT syntax.
Edit the filesystems.php file, again using the endpoint parameter in an ADDR:PORT syntax.
Edit Minio bucket to have ACL as noted.

At this point, I can browse the Bookstack app (using ADDR:PORT) and upload pictures. I can even see them in the image selector. Additionally, using the Minio dashboard I can see that the folder has been created, the image is available, and thumbnail folders are also created. Excellent.

Now, going back into the .env file and the filesystems.php file, I change the ADDR:PORT syntax to HTTPS://DOMAIN.NAME (there are no ports, due to reverse proxy via HaProxy).

I return to the website, this time at domain.name and what do you know - the pictures I uploaded are still visible. Not unexpected: after all, they didn't go anywhere and the only thing that has changed is the link address. Excellent.

Here's where trouble starts. Going back to image upload, the first problem encountered is that the image no longer shows up in the picker. I get a blank UI, as if no pictures (including the visible uploaded ones that can be seen on a page) exist. Attempting to upload a new picture appears to function but eventually will time out. When I say "appears to function" I mean I can select a picture, tell it to upload, watch the little progress bar go to 100%, and then it stops. Using inspector/console in Edge and Firefox, I see the following error occur:
Unhandled promise rejection Error: "Request failed with status code 504"
exports https://domain.name/dist/app.js?version=v0.24.3:51:102989
exports https://domain.name/dist/app.js?version=v0.24.3:57:1314
exports https://domain.name/dist/app.js?version=v0.24.3:51:102003

These do not appear to actually correlate to a specific file, so I'm assuming there's some backend magic going on to make a conglomerate file and I'm assuming lines 51 and 57 might tell me something about what's trying to happen but if those are placement ids those are long lines.

Refreshing the page will now display the following:

503 Service Unavailable
No server is available to handle this request.

HaProxy stats also agree:
archives | docker_bookstack | DOWN | 665 | 1 | 1 | 665
archives | BACKEND          | DOWN | 665 | 0 | 0 | 867

If I wait about 3-5 minutes, the server will come back (no reboot required). HaProxy stats will show the service returning with DOWN (1/2) and then back UP. The part that is confusing me though, is that the uploaded image is available in the bucket. As can be seen below, there are two images: the one done via ADDR:PORT (everything worked as expected) and the one uploaded via DOMAIN.NAME (where the server/docker app ingloriously ceases to respond for a time with no error message outside of the 504 reported in Inspector):

image

I'm at a loss as to how to proceed from here, seeing as there are no error messages outside of what inspector tells me. This works for other apps (Rocket.Chat) which goes through the same reverse proxies, so I don't know where it's being hung up.

Unrelated side curiosity -- it's not Dec 12th, and the clocks don't think it is. Why did the upload folder show 12-Dec? Is that just a "make it appear in order" thing?

Author

Gorian commented Jan 7, 2019

any update on this?

Member

ssddanbrown commented Jan 7, 2019

@Gorian I'm afraid no update from my side.
I'd love to help solve this but these are both fairly complex setups that are a little hard to replicate to find the two unique issues above.

Author

Gorian commented Jan 7, 2019

Fair enough, I understand that. I'm not really a php dev or I'd try to do more to help :(

If you set up minio or something and use the above setup, does file attachment uploading work for you? That's the only thing broken for me; images and drawings work fine.

Member

ssddanbrown commented Jan 7, 2019

@Gorian Yeah, pretty sure I tested attachments.

Does the attachment file itself end up in storage or does it not even upload?
Also, are you using a docker image or something for ceph? Just wondering how to get a similar instance up quickly without digging too far into the world of ceph.

Author

Gorian commented Jan 8, 2019

@ssddanbrown it doesn't upload, it just... freezes. If I'm running via artisan, it actually crashes and I have to manually kill the php instances. If I run via nginx, it just sits there at "uploading" indefinitely.

I'm not :/ You might be able to find some sort of testing install for ceph? I'll look around when I get a chance. (Un)fortunately, this is my homelabprod, so I'm actually running a 4 node baremetal install of ceph.

Member

ssddanbrown commented Jan 8, 2019

Spent a couple of hours tonight trying to get ceph set up on a virtual machine but could never get the gateway going. Probably because I was trying to install too much on a single host.

Just had a go with ceph-nano as an alternative and that got up and running right away. BookStack worked fine, images and attachments, by following the same steps as using minio.

Even tested by serving BookStack via artisan. This is an odd one.

armouredking commented Jan 11, 2019

@Gorian @ssddanbrown Seems like it's behaving the same way Minio is for me when I define a domain name instead of an address:port combination. The promise rejection error is the one I can always duplicate by watching the debug/developer window in the browser, but unfortunately no logs anywhere provide any amplifying information even when you tell the loggers to scream.

Member

ssddanbrown commented Jan 11, 2019

Seems like it's behaving the same way Minio is for me when I define a domain name instead of an address:port combination.

@armouredking Is this just for the STORAGE_URL value and filesystems.s3.endpoint value? If so I'll whack up a ceph-nano or minio instance on a VPS and give it a go with a full hostname.

armouredking commented Jan 12, 2019

Yes. At least, those are the only two definitions I'm aware of. If Storage_URL and Endpoint are a domain name (with or without ports), I can't get it to work. If they're an IP address:port combination instead, it has no issues. I wanted initially to blame this on the network / proxy since it's the common link between, but again - this setup I have works with other apps (Rocket.Chat is running in an Ubuntu snap behind HaProxy which talks to the Minio docker, also via HaProxy), and with HaProxy in debug log mode it doesn't give me anything during the aborted image upload.

So then my next thought is, I know Bookstack - specifically Laravel - has had issues with trusting proxies in the past, but the APP_URL parameter is supposed to fix that right? Are there other trust issues at work here? That was the best I could cobble together with Googlefu regarding the promise error.

         |-Bookstack
HaProxy -|
         |-Minio

Bookstack and Minio are both running in Docker containers on the same machine. The address:port combo works either by using the defined docker networks and ports or by using their passthroughs, ie "dockerhost:passthroughport". Also, as noted before, if you do all this stuff with address:port setup and then switch it over to the external domain names (and define APP_URL for the proxy) you can still view the pictures on the website after you do this. The problem is just that the image picker window will not be able to see the pictures (it just shows dark grey lines that you can click on which will tell you image details like name and such, and deleting sort of works but doesn't update the image picking screen when you delete - the gray line will still be there even though the image is removed from the database, and you can't see the image ever) and you're not able to upload new ones (because of the promise error/bookstack stalls out).

I've tried using the console to see if any errors pop up on the image picker screen because they're not visible but it gives me nothing. No missing resources, no inaccessible links, nothing. The picker screen loads but the images don't.

Author

Gorian commented Jan 12, 2019

Ah, that's something I didn't think about. I am using a DNS name only, no port. I have my s3 gateway sitting behind a load balancer, so IP:PORT isn't an option.

Member

ssddanbrown commented Jan 13, 2019

Thanks for the extra info @armouredking.

I've just done a test of running minio on an external server (With a set domain/host name) then tested BookStack use of it over the internet. All worked fine as it did locally.

I guess sticking everything behind reverse proxies is the next thing to test.

Author

Gorian commented Jan 15, 2019

@ssddanbrown I rolled back to a snapshot of a fresh install, re-added the edits, and everything is now working great with ceph. Idk what went wrong there, maybe some user error on my end or something lol :P
I did make sure to use a port, per @armouredking's comment. Attachments, images, and drawings all work now

Member

ssddanbrown commented Jan 15, 2019

@Gorian That's awesome! I'll mark the updates to the config/filesystems.php file for the next release so the required changes can be stored in the .env file and I'll update the documentation with some info for others that need this.

@armouredking Just want to check, are you still looking to resolve your issues? I know you mentioned you had moved away from BookStack. Happy to keep assisting if you are still looking to resolve them (I tried to set up a proxy with BookStack & minio the other night but had docker issues, so will have to try again); otherwise I'll focus on other bits.

Author

Gorian commented Jan 16, 2019

@ssddanbrown You rock man! Thank you so much for working to resolve this!

armouredking commented Jan 17, 2019

Been sick, haven't had much opportunity to experiment. I haven't used BSA in a while, I moved off a bit ago. Was attempting to return to it now that it's more mature.

@ssddanbrown No, I cannot resolve the issue. The promise error remains when talking across proxies.

Just for S&G, I attempted to redo everything from scratch including the DB and the proxy. Still no dice. Also verified this is happening when only Minio is proxied, so I think without any other leads I have to put the blame on Laravel proxies (again). I think the response is getting mangled or silently dropped somehow, though I have no idea why. Again, this setup works in other apps, it just doesn't work with Bookstack.

Latest test environment:
Docker running HaProxy, Minio, Bookstack. Minio has an external address via HaProxy, Bookstack does not. No APP_URL is defined. I have attempted endpoints of https://domain.name and https://domain.name:443. Same problems as mentioned above. The image DOES appear in Minio (so the upload occurs), but Bookstack does nothing.

bsa1
Picture uploads, but does not go to selector. Cannot be selected. Bar goes from 0 and cancel upload to Remove with no bar, but does not indicate progress. I was lazy and did not wait for the promise error to actually show up for this picture (it takes a few minutes).
bsa2
Picture can be seen in Minio, and other folders are created as well. So some logic somewhere is working, for whatever reason the message does not appear to get back to Bookstack that it's done?
bsa3
As can be seen here, the database also includes it.

Previous testing environments:
HaProxy running on OPNSense (firewall). Minio and Bookstack running on Docker. Minio and Bookstack proxied externally through HaProxy.
Working: Minio and Bookstack are defined as addr:port. This only allows traffic on localnet though.
Not working: Anything else. Files uploaded while using addr:port are still visible but they cannot be added or deleted.

For the sake of completeness I also tested using the "local" driver and as expected images and files upload and work inside of BookStack. I can actually select and manipulate them using the image picker.
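One small observation on the two endpoint forms tried in the latest test environment above (https://domain.name and https://domain.name:443): both should resolve to the same connection target, so identical results between them are expected. A minimal sketch in Python, where `connection_target` and `DEFAULT_PORTS` are hypothetical names used only for illustration:

```python
from urllib.parse import urlsplit

# Standard default ports for the two schemes an S3 endpoint normally uses.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(endpoint: str) -> tuple:
    """Return (scheme, host, port) that a client would connect to."""
    parts = urlsplit(endpoint)
    scheme = parts.scheme.lower()
    # Use the explicit port if one is given, otherwise the scheme's default.
    port = parts.port or DEFAULT_PORTS[scheme]
    return (scheme, parts.hostname, port)


print(connection_target("https://domain.name"))      # ('https', 'domain.name', 443)
print(connection_target("https://domain.name:443"))  # ('https', 'domain.name', 443)
print(connection_target("http://172.21.0.2:9000"))   # ('http', '172.21.0.2', 9000)
```

What does differ between the working and failing setups is the host and the network path (direct addr:port versus through the proxy), which is where the remaining investigation in this thread points.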

Author

Gorian commented Jan 17, 2019

@armouredking can you get to the URL of an uploaded file from your browser? Can you post your .env config?

armouredking commented Jan 17, 2019

Yes, as noted way back:

Now, going back into the .env file and the filesystems.php file, I change the ADDR:PORT syntax to HTTPS://DOMAIN.NAME (there are no ports, due to reverse proxy via HaProxy).

I return to the website, this time at domain.name and what do you know - the pictures I uploaded are still visible. Not unexpected: after all, they didn't go anywhere and the only thing that has changed is the link address. Excellent.

Here's where trouble starts. Going back to image upload, the first problem encountered is that the image no longer shows up in the picker. I get a blank UI, as if no pictures (including the visible uploaded ones that can be seen on a page) exist. Attempting to upload a new picture appears to function but eventually will time out. When I say "appears to function" I mean I can select a picture, tell it to upload, watch the little progress bar go to 100%, and then it stops. Using inspector/console in Edge and Firefox, I see the following error occur:

My .env is virgin and looks exactly like @ssddanbrown since, well, it's virgin. I have to copy and paste what @ssddanbrown put into this thread or use the documentation / .env.sample files since the provided .env only includes database, app_url, and mail settings. With my initial testing I used the more full featured documentation examples for the environment file since they're well commented but the current test instance is just the provided defaults (if you provide no environment variables to the Docker image the LSIO version gives you a barebones .env in the /config volume to fill out yourself) with copy and paste in from here.

I'm pretty well convinced at this point, based on Googlefu regarding promise errors and the behaviour I'm witnessing, that Bookstack is not communicating properly across the proxy. Clearly Bookstack is telling Minio to upload a file, Minio is doing it, but the message that Minio has done it never seems to make it back to Bookstack. Indeed, I'd argue that Bookstack is sending messages out but never getting a return it acknowledges or accepts. Host headers could be a culprit, but I'm not seeing that via HaProxy's debugging log. Elements will work following that, and one workaround to verify interoperability I've already done is editing the database directly. Of course, this is a horrible idea to attempt in production, it just shows from a backend point of view that Minio and SQL are not the problem. From there it's a game of cat and mouse.

ssddanbrown added a commit that referenced this issue Jan 20, 2019

Member

ssddanbrown commented Jan 20, 2019

As of the next release (due today) you'll be able to set the S3 'endpoint' by setting a STORAGE_S3_ENDPOINT option in your .env file. Setting this will automatically enable the 'use_path_style_endpoint' option.

Will update the documentation here with details of setting this.
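The relationship between the two options can be sketched as follows. This is a Python illustration of the described behaviour, not the released PHP config: `s3_disk_config` is a hypothetical helper that mirrors the keys of the 's3' array shown earlier in the thread, and the empty-string region default is an assumption.

```python
def s3_disk_config(env: dict) -> dict:
    """Derive S3 disk settings from .env-style values."""
    endpoint = env.get("STORAGE_S3_ENDPOINT")  # None when the option is not set
    return {
        "driver": "s3",
        "key": env.get("STORAGE_S3_KEY", "your-key"),
        "secret": env.get("STORAGE_S3_SECRET", "your-secret"),
        "region": env.get("STORAGE_S3_REGION", ""),  # default is an assumption
        "bucket": env.get("STORAGE_S3_BUCKET", "your-bucket"),
        "endpoint": endpoint,
        # Path-style addressing is switched on only when a custom endpoint is given.
        "use_path_style_endpoint": endpoint is not None,
    }


custom = s3_disk_config({"STORAGE_S3_ENDPOINT": "http://172.21.0.2:9000"})
print(custom["use_path_style_endpoint"])              # True
print(s3_disk_config({})["use_path_style_endpoint"])  # False
```

With no endpoint set, nothing changes for existing Amazon S3 users, which is presumably why the option is tied to the endpoint rather than exposed separately.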

I'll close this since we've confirmed in this thread that this implementation will be compatible with other s3 API services. I'll also close #1195 since we've confirmed minio to work with BookStack in some manner. @armouredking I've opened up #1230 to focus on the minio-behind-proxy issue.
