Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cloudflare R2 file storage as an optional feature #465

Merged
merged 11 commits into from
Jul 19, 2024

Conversation

jlvallelonga
Copy link
Contributor

Adds Cloudflare R2 as option for storing files with Active Storage.

Task from here: https://github.com/AllYourBot/hostedgpt/wiki/Draft-project-plan

Changing ActiveStorage config from Postgres to Cloudflare's S3 alternative. Regardless of what we learn from our load test, we need to get these big files out of the database)

Starting this as an optional change, but happy to modify this to make it a requirement or change it to be the default if we want.

@krschacht
Copy link
Contributor

I’ll leave comments inline on the PR, but just to comment on the overall direction. I think it’s absolutely right to implement this as a configurable option via feature flag, but keep the default to be postgresql. For anyone running the app just for themselves, postgres is simpler setup. This S3-like option is really just for larger deployments.

@@ -10,6 +10,8 @@ services:
POSTGRES_PASSWORD: secret
volumes:
- hostedgpt_pgdata:/var/lib/postgresql/data
ports:
- 5433:5432
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? The docker compose has been working for me without this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary, but useful if you want to connect to the postgres instance that's running in docker from your local machine via tools like postico. Happy to remove it if you want. I was just using it to look at my database

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that makes sense. I think I used to have a couple ports exposed and then someone submitted a PR to remove them saying: it’s exposing ports which are conflicting with my local services and those ports arent being used.

So i think it helps ppl for us to expose less

access_key_id: <%= ENV["CLOUDFLARE_ACCESS_KEY_ID"] || Rails.application.credentials.dig(:cloudflare, :access_key_id) %>
secret_access_key: <%= ENV["CLOUDFLARE_SECRET_ACCESS_KEY"] || Rails.application.credentials.dig(:cloudflare, :secret_access_key) %>
region: auto
bucket: <%= ENV["CLOUDFLARE_BUCKET"] || Rails.application.credentials.dig(:cloudflare, :bucket) %>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these should be Setting. and then the “ENV || *.credential” happens inside options.yml

@@ -36,9 +36,10 @@
# Disable request forgery protection in test environment.
config.action_controller.allow_forgery_protection = false

# Store uploaded files on the local file system in a temporary directory.
# Store uploaded files in the db for testing
Copy link
Contributor

@krschacht krschacht Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just delete this comment, especially when it’s already out of sync with storage.yml. :) People can just check storage.yml if they want to know how test is storing files.

I’m definitely in the camp of “comment very little unless it’s really needed to clarify because inevitably the comments will get out of sync from the code.” This is one of those comments that was generated by rails with the original “rails new” and I haven’t gone through the codebase and removed those generated comments which have become irrelevant.

But actually, aside from this comment, don’t you want to remove active_storage.service from this file entirely since it’s now within application.rb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah agreed with the comments. If the code explains itself without too much trouble then no need for further explanation with comments.

For this file I purposely left it here thinking that we should override anything that would be set in the application.rb config file here since I don't think we want to upload anything to cloud storage during tests. At least that's my initial thought. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we don't want to upload to cloud storage during tests. But since the default is postgres, we shouldn't need to override this for test and instead can use the postgres default in the test environment. I think it's a good thing that tests will use the postgres adapter because if we ever inadvertently break that adapter we'll notice failing tests.

config.active_storage.service = :test
config.active_storage.variant_processor = :mini_magick # vips is better but does not work on Mac Apple Silicon so using this in dev & test
# vips is better but does not work on Mac Apple Silicon so using this in dev & test
config.active_storage.variant_processor = :mini_magick
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s worth reverting this. Even though it makes the line really long, which is a bit annoying, the few times I do add comments I try to bias towards having them on the actual line they’re explaining rather than their own line, only because it slightly decreases the frequency that someone updates a line without reading the comment.

I feel like I stated that in a confusing way. All I’m trying to say is: if the comment is on its own line, it’s more likely to get out of sync. So let’s keep it connected to the mini_magick thing even though it makes the line really long.

README.md Outdated Show resolved Hide resolved
if ENV["CLOUDFLARE_STORAGE_FEATURE"] == "true"
config.active_storage.service = :cloudflare
else
config.active_storage.service = :local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename “local” to “database”, for clarity. It looks like I changed the backend for “local” within storage.yml, since it was previously referring to local file system, but failed to give it a more logical name. oops. :)

@@ -44,6 +44,7 @@ gem "tiktoken_ruby", "~> 0.0.6"
gem "solid_queue", "~> 0.2.1"
gem "name_of_person"
gem "actioncable-enhanced-postgresql-adapter" # longer paylaods w/ postgresql actioncable
gem "aws-sdk-s3", require: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not familiar with require: false, I just looked it up and it sounds like when this is false the you need to explicitly set “require …” in the file that uses it. Is that the intent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. Maybe rails requires it explicitly internally. Just going based on the guides here: https://guides.rubyonrails.org/active_storage_overview.html#s3-service-amazon-s3-and-s3-compatible-apis

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting, maybe keep it then until you run into problems and then you'll either solve the problem by doing an explicit "require ..." or removing this flag.

Gemfile options is one thing I always mean to go deep on and truly understand, but I've still not done it yet. So I get confused myself on the various version specifications (or lack there of) and other options. Now that I know about "require false" I could probably switch a few gems to use that since there are some I only use in a single file within the app.

@@ -33,6 +33,13 @@ class Application < Rails::Application
config.time_zone = "Central Time (US & Canada)"
config.eager_load_paths << Rails.root.join("lib")

# Active Storage
if ENV["CLOUDFLARE_STORAGE_FEATURE"] == "true"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't use Feature. here because application hasn't finished starting yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh, shoot. Hmm. I think then still add it to Feature and options.yml, just in case other parts of the app need to check it, and then here just repeat the “ENV || application…” which you put in the options.yml

jlvallelonga and others added 2 commits July 17, 2024 07:58
Co-authored-by: Keith Schacht <krschacht@gmail.com>
@@ -1,9 +1,27 @@
# TODO: remove test, local, and local_public options from this file
# figure out how to migrate existing records to handle that change
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ran into some issues with existing database records when I changed local to database so I left the existing test, local, and local_public here for now and added this TODO. I don't plan to leave this here, but will come back tomorrow and finish. I can either revert the name changes or do a database migration to try to update existing data with the new names. I would definitely want someone with more test data to try the migration if that's the route we take though since I have almost nothing in my db at this point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. And for trying things out in development, it really helps to load the fixture data into your dev database. I used to have the seeds file configured to do this but it caused some issues in production. This is an old commit back when it was still working:

https://github.com/AllYourBot/hostedgpt/blob/f66ee3b8d2bac67acfe7f8a174f29930d7176af6/db/seeds.rb

I think this should work as is, although it may be missing a few more recent database tables so check schema.rb and maybe add the missing items to the array. Note: the array is ordered so tables referenced by other tables are loaded first.

Copy link
Contributor

@krschacht krschacht Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, after you update this file, you run bin/rails db:seed to load it. If that doesn't work, you maybe have to run bin/rails db:drop; bin/rails db:prepare. I vaguely remember having some issues with referential integrity.

So, if that still doesn't work, then maybe copy and paste the referential integrity hack from this version of the seeds file:
https://github.com/AllYourBot/hostedgpt/blob/3de36588ca922a22edd232c70fe27967576e78ad/db/seeds.rb

I really probably should have preserved this "load fixtures" trick but moved it into a rake task. If you get this working, maybe when you're done with this PR we tee up another PR that turns this into a clean rake task. The only reason I removed this is because it was causing some issues with the various production hosting environments due to stricter postgres permissions, but this is quite helpful in development.

@jlvallelonga
Copy link
Contributor Author

I think this is all good, but I'm working on running the seeds to verify. I was able to create some files on the main branch and then checkout this branch, see it breaking, and then migrate to see it working again. If we're ok with that then feel free to merge, but otherwise I'll try to have the seeds working before I sleep tonight or tomorrow

@krschacht
Copy link
Contributor

krschacht commented Jul 18, 2024 via email

@krschacht
Copy link
Contributor

@jlvallelonga I gave this one more read through before merging in. I didn't love the special-casing of this one feature being ENV only (and needing to explain that) so I figured out how to make Feature... work within application.rb. You can see it in this commit, if you're curious: 8873ede (#465)

While I was in there, I did a close read-through of all the optional features we cover within the README and I did copy editing of that to try and streamline it a bit. While doing this, I decided to remove any mention of the rails credentials file. We'll still support it, but I'll consider that an advanced thing for people who already know how rails credentials work. For everyone else, it's easiest and more standard to simply set ENV.

@krschacht krschacht changed the title Cloudflare R2 file storage Add Cloudflare R2 file storage as an optional feature Jul 19, 2024
@krschacht krschacht merged commit 8c821e9 into main Jul 19, 2024
6 checks passed
@krschacht krschacht deleted the feature/cloudflare_r2_object_storage branch July 19, 2024 12:53
@jlvallelonga
Copy link
Contributor Author

@jlvallelonga I gave this one more read through before merging in. I didn't love the special-casing of this one feature being ENV only (and needing to explain that) so I figured out how to make Feature... work within application.rb. You can see it in this commit, if you're curious: 8873ede (#465)

While I was in there, I did a close read-through of all the optional features we cover within the README and I did copy editing of that to try and streamline it a bit. While doing this, I decided to remove any mention of the rails credentials file. We'll still support it, but I'll consider that an advanced thing for people who already know how rails credentials work. For everyone else, it's easiest and more standard to simply set ENV.

nice! definitely more clean that way

@krschacht
Copy link
Contributor

First week, first PR merged in 🍾 🎉 It’s always nice to get something full cycle. Now we’re off!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants