Write Session Data to Redis #57562
Conversation
The first step in our effort to transfer user session data from cookies to a centralized data store is to provision that data store in CloudFormation, so we can begin writing data to it. We have quite a few options for exactly how to provision it, but broadly speaking we can choose either Serverless or On-Demand nodes, and if we choose On-Demand we must also choose both the size and quantity of nodes.

According to Google Analytics, we've had 28M users over the last 90 days and 84M over the last year, which works out to between 45 and 60 million users on average over an arbitrary 200-day period (200 days being our current session duration configuration). At ~300 bytes per session on average, that's 13.5-18 GB of data we'd ideally like to store. Storing 18 GB of data in Serverless ElastiCache would cost ~$1620/month; even 13 GB would still be ~$1170/month. Alternatively, we could use three of the smallest memory-optimized cache nodes (cache.r7g.large) and get 39.21 GB of storage for ~$480/month, or just two for 26.14 GB of storage at ~$320/month. An even more targeted solution would be a larger number of smaller nodes: 6 cache.t4g.medium nodes would let us store 18.5 GB for ~$280/month.

Throughput is less of an issue for us than storage. According to CloudWatch, we average ~800M requests per week, 15M/hour, and under 4,200/second. At an average of 300 bytes stored per session and one GET plus one PUT per request, that's roughly 2.5 MB/s (about 20 Megabits per second) of needed network throughput, well under the 12.5 Gbps promised by cache.r7g.large nodes, or ~$25/month in ECPUs for a Serverless model.

This PR proposes we provision a single cluster consisting of three cache.r7g.large nodes, with encryption, automatic failover, and multi-zone availability all enabled. That should give us plenty of room for growth without too much unnecessary expense. The next step in our broader effort to enable this feature is to begin writing data to Redis so we can get a more accurate picture of actual usage, at which point we can revisit this decision.
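For reference, a quick sketch of the arithmetic behind those estimates (the ~$0.125/GB-hour Serverless rate is back-derived from the monthly figures above, not quoted from AWS pricing):

```ruby
# Back-of-the-envelope sizing, using the figures cited above.
BYTES_PER_SESSION = 300
HOURS_PER_MONTH   = 720

# Storage: 45-60M sessions at ~300 bytes each.
[45_000_000, 60_000_000].map { |n| n * BYTES_PER_SESSION / 1e9 }
# => [13.5, 18.0] GB

# Serverless ElastiCache at the ~$0.125/GB-hour rate implied above:
[13, 18].map { |gb| gb * 0.125 * HOURS_PER_MONTH }
# => [$1170, $1620] per month

# Throughput: <4,200 req/s, each incurring one ~300-byte GET and one PUT.
4_200 * 2 * BYTES_PER_SESSION * 8 # => ~20_160_000 bits/s, i.e. ~20 Mbit/s
```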
Notably, however, we do not yet begin relying on Redis for *reading* session data. We want an accurate evaluation of what real-world usage of this new Redis cluster will look like, and we also want the benefit of some load testing in advance of an actual switchover. To that end, this PR proposes that we begin writing all session data to Redis as a side effect of using the existing cookie-based store. This new functionality is gated behind a DCDO flag and wrapped in error-handling logic to minimize the chances of this experiment affecting the user experience.

Follow-up to #57558
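To make that concrete, here is a rough sketch of what the gated, error-wrapped mirroring could look like. The DCDO flag name comes from this PR; the store subclass, key scheme, `CDO.session_store_server` accessor, and Honeybadger reporting are illustrative assumptions, not the exact implementation:

```ruby
# Hypothetical sketch: mirror every session write into Redis as a side effect
# of the existing cookie store. Cookies remain the source of truth, and any
# Redis failure is reported and swallowed so the experiment can't break requests.
class MirroringCookieStore < ActionDispatch::Session::CookieStore
  private

  # Rails 6.1 signature: write_session(req, sid, session_data, options)
  def write_session(req, sid, session_data, options)
    mirror_to_redis(sid, session_data)
    super
  end

  def mirror_to_redis(sid, session_data)
    return unless DCDO.get('mirror_session_in_redis_enabled', false)

    # sid is a Rack::Session::SessionId; key on its public id.
    redis.set("session:#{sid.public_id}", session_data.to_json)
  rescue StandardError => exception
    Honeybadger.notify(exception) # never let mirroring affect the user
  end

  def redis
    @redis ||= Redis.new(url: CDO.session_store_server)
  end
end
```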
```diff
@@ -113,7 +113,7 @@ cat <<JSON > $FIRST_BOOT
   },
   "cdo-secrets": {
 <% if frontends -%>
-    "session_store_server": "redis://${SessionStoreGroup.PrimaryEndPoint.Address}:${SessionStoreGroup.PrimaryEndPoint.Port}",
+    "session_store_server": "rediss://${SessionStoreGroup.PrimaryEndPoint.Address}:${SessionStoreGroup.PrimaryEndPoint.Port}",
```
Note that because we have enabled encryption for the underlying Redis cluster, we have to specifically use a "redis secure" connection here, hence `rediss` rather than `redis`.
I very briefly tried to find a way to automatically determine whether or not this is necessary based on the CloudFormation output, but couldn't. I decided to just quickly hardcode it instead; please let me know if you think it would be worth me spending a bit more time on this!
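For anyone hitting this later: redis-rb infers TLS from the URL scheme, so the hardcoded `rediss://` value should work unchanged with something like the following (`CDO.session_store_server` is assumed here to be how the `cdo-secrets` value above surfaces in Ruby):

```ruby
require 'redis'

# redis-rb enables TLS automatically when the URL scheme is rediss://
redis = Redis.new(url: CDO.session_store_server) # "rediss://<host>:<port>"
redis.ping # => "PONG", over an encrypted connection
```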
We don't yet have a standard way to take the Physical ID of a Resource provisioned in CloudFormation and pass that configuration value to our Ruby web application servers; specifically, we don't have an easy way to populate […]. The very janky workaround I've come up with is to publish an AWS Stack-specific Secret in the template and then use it.
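Roughly, that workaround looks like this (a sketch; the secret name and the `Aws::SecretsManager` lookup are my illustration of the approach, not the literal code):

```ruby
require 'aws-sdk-secretsmanager'

# Hypothetical sketch of the workaround: the CloudFormation template publishes
# the cluster endpoint as a stack-specific Secret, and the app reads it at boot.
def session_store_server(stack_name)
  secrets = Aws::SecretsManager::Client.new
  secrets.get_secret_value(
    secret_id: "#{stack_name}/cdo-secrets/session_store_server" # illustrative name
  ).secret_string
end
```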
LGTM
Would there be value in reading the session from Redis in order to evaluate that half of usage too? Or is it pretty much 1:1 with writes?
Suresh, can you clarify how the changes to […] get picked up?
Is it just a problem with updating the value once it's been initially set? My expectation is that whenever we spin up a new frontend, it will pick up these changes.
In my local testing, it seems like session reads and writes are in fact exactly 1:1; every single request incurs exactly one write and one read. I'm certainly not opposed to adding some logic to also read just for thoroughness, but I think simply doubling the number of operations generated by this change should accurately capture the number of operations we'll need at full usage.

I'll also plan to resolve https://app.honeybadger.io/projects/3240/faults/106697134 once this PR is merged, since it removes the relevant code.
Looks good!
Follow-up to #57562, which started writing session data to Redis with the goal of giving us a chance to analyze real-world usage in advance of a switch to using Redis as the source of truth. For the most part, that's working, but there are currently a couple of issues with that implementation which both inflate the total amount of data being stored and render our metrics inaccurate. First, we aren't deleting old sessions from Redis, so we aren't capturing the extent to which users logging out reduces total data storage. To resolve that, this PR simply extends the [`delete_session`](https://github.com/rails/rails/blob/v6.1.7.7/actionpack/lib/action_dispatch/middleware/session/cookie_store.rb#L63-L68) method to mirror deletions in Redis the same way we mirror writes. Second, we're writing more data than we need to; I mistakenly thought we would need to manually serialize and deserialize the Ruby Hashes our session data is stored in, and chose to do so with `Marshal`, which is much less space-efficient than simply passing the Hash directly to Redis. In my local testing, the Hash for a freshly-logged-in session is 168 bytes in memory, but its `Marshal` serialization takes 424 bytes. Simplifying the write logic and adding deletion logic should improve the accuracy of our metrics.
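A sketch of the deletion half of the fix (the method signature matches the linked Rails 6.1 `delete_session`; the key scheme, `mirroring_enabled?` gate, and Honeybadger call carry over my assumptions from the write path):

```ruby
# Hypothetical sketch: mirror session deletions into Redis, the same way
# writes are mirrored, so logouts also shrink the mirrored data set.
def delete_session(req, session_id, options)
  begin
    redis.del("session:#{session_id.public_id}") if mirroring_enabled?
  rescue StandardError => exception
    Honeybadger.notify(exception) # a mirroring failure must never break logout
  end
  super # the cookie store still performs the real deletion
end
```

The serialization point is also easy to sanity-check locally: comparing `ObjectSpace.memsize_of(session_hash)` with `Marshal.dump(session_hash).bytesize` shows the overhead, since `Marshal` embeds class and encoding metadata for every entry.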
Also remove the instrumented cookie store; we don't need to manually gather data for estimates now that we'll have an actual Redis cluster to monitor.
Links
Follow-up to #57558
Testing story
Tested the Redis cluster provisioned in production by SSHing into a frontend instance and executing some manual queries from the dashboard console.
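The exact console session isn't reproduced here; the checks were along these lines, with `CDO.session_store_server` again assumed as the configured endpoint:

```ruby
# Illustrative spot checks from the dashboard console on a frontend instance:
redis = Redis.new(url: CDO.session_store_server)
redis.ping    # => "PONG" -- confirms connectivity and the TLS handshake
redis.dbsize  # => number of keys currently stored
redis.set('healthcheck', 'ok') && redis.get('healthcheck') # round-trip a value
redis.del('healthcheck')
```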
Tested the functionality added in this PR by temporarily modifying `cloud_formation_stack.yml.erb` to provision a Redis session storage cluster even in an adhoc environment, then verified on an adhoc that registration, authentication, and session destruction all work as expected, and that I can successfully find session data in the Redis store.

Deployment strategy
We should plan to monitor Honeybadger and Zendesk after this DTPs, and have a revert prepped just in case.
After deployment, we should plan to set the DCDO flag `mirror_session_in_redis_enabled` to `true` so we can begin collecting data.

Follow-up work
Assuming that our new Redis cluster is able to handle the load generated by this mirroring, we can follow this up with an implementation that will begin using the Redis store rather than cookies as the source of truth.