fix: Return GCS signed URL instead of direct URL #2143

Open
wants to merge 5 commits into
base: main

Conversation

markperfect
Contributor

The Chainlit data layer stores files in Google Cloud Storage. However, an error occurs when a user attempts to access those files from a private GCS bucket.

The issue stems from the sync_upload_file method, which returns the direct public URL f"https://storage.googleapis.com/{self.bucket.name}/{object_key}". A direct URL will return a 403 Forbidden error for non-publicly accessible buckets.

  1. This PR modifies the sync_upload_file method to return a signed URL instead of a direct URL. Now, the application consistently uses authenticated signed URLs throughout, which will work with private buckets.
  2. This PR creates unit tests for GCS.py, which didn't have any previously.
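
The change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: the function name `upload_and_sign`, the 15-minute expiry, and the returned dictionary shape are assumptions. The calls `bucket.blob`, `blob.upload_from_string`, and `blob.generate_signed_url` are from the google-cloud-storage client; a `MagicMock` stands in for the bucket so the sketch runs without GCS credentials, in the same spirit as the unit tests mentioned in item 2.

```python
from datetime import timedelta
from unittest.mock import MagicMock


def upload_and_sign(bucket, object_key, data, mime="application/octet-stream",
                    expires=timedelta(minutes=15)):
    """Upload bytes and return a short-lived signed URL (hypothetical helper)."""
    blob = bucket.blob(object_key)
    blob.upload_from_string(data, content_type=mime)
    # A V4 signed URL works for private buckets, unlike the direct
    # https://storage.googleapis.com/<bucket>/<key> form.
    signed_url = blob.generate_signed_url(
        version="v4", expiration=expires, method="GET"
    )
    return {"object_key": object_key, "url": signed_url}


# Stand-in bucket (assumption: real code would pass a google.cloud.storage Bucket).
fake_bucket = MagicMock()
fake_blob = fake_bucket.blob.return_value
fake_blob.generate_signed_url.return_value = "https://example.invalid/signed"

result = upload_and_sign(fake_bucket, "threads/abc/file.txt", b"hello", "text/plain")
```

The object key is returned alongside the URL so that callers can persist the key and discard the URL once it expires.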

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. backend Pertains to the Python backend. security labels May 1, 2025
@willydouhard
Collaborator

The signed URL is supposed to be short lived. That's why we store the object key and sign URLs on the fly when we get a thread for instance.

@markperfect
Contributor Author

Yes, exactly. We want a short-lived URL for anything that is private for security purposes.

This PR only updates the sync_upload_file method to return a signed URL immediately after upload, so that initial access (e.g., right after upload) works even for private buckets. For all subsequent access, we are still storing the object key and generating signed URLs on the fly, as before. This change prevents 403 errors during the initial file upload flow, as seen here:

[image: screenshot of the 403 error]
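
For the "subsequent access" path mentioned above, where only the object key is stored and URLs are signed on the fly, a rough sketch might look like this. The helper name `sign_elements`, the element dictionary shape, and the 15-minute expiry are hypothetical and not taken from the Chainlit codebase; `generate_signed_url` is the google-cloud-storage call, mocked here so the example runs offline.

```python
from datetime import timedelta
from unittest.mock import MagicMock


def sign_elements(bucket, elements, expires=timedelta(minutes=15)):
    """Return copies of stored elements with a fresh signed URL attached."""
    signed = []
    for element in elements:
        blob = bucket.blob(element["object_key"])
        url = blob.generate_signed_url(version="v4", expiration=expires, method="GET")
        # Copy the record so the stored form keeps only the object key.
        signed.append({**element, "url": url})
    return signed


# Stand-in bucket: each blob "signs" by echoing its key (assumption for the demo).
fake_bucket = MagicMock()
fake_bucket.blob.side_effect = lambda key: MagicMock(
    generate_signed_url=MagicMock(
        return_value=f"https://example.invalid/{key}?sig=fake"
    )
)

stored = [{"object_key": "a.txt"}, {"object_key": "b.png"}]
thread_elements = sign_elements(fake_bucket, stored)
```

Because the signed URL is produced at read time and never written back, an expired link is simply replaced the next time the thread is loaded.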

Contributor

@hayescode hayescode left a comment

There seem to be a lot of unrelated changes here. Also, GCS support has existed for a year and private access is common, so I wonder how anybody had this working before?

@markperfect
Contributor Author

markperfect commented Jul 1, 2025

The Chainlit codebase and LLMs in general are an area of active development. A lot of work is still needed to make Chainlit robust in production, as demonstrated by the lack of comprehensive test coverage.

Since there were only a couple of active maintainers on the repo, many features have been neglected due to prioritization and capacity. Now that we have additional maintainers, we can hopefully speed up the development cycle for the package that we collectively value.

From what I can tell, the Chainlit data layer was released 6 months ago, and there haven’t been many major contributions since then. My intuition also tells me that GCS is less likely to be used than either AWS or Azure.

@markperfect markperfect requested a review from hayescode July 2, 2025 20:57
3 participants