
Write Session Data to Redis #57562

Merged

merged 16 commits into staging on Apr 9, 2024

Conversation

@Hamms (Contributor) commented Mar 26, 2024

Notably, however, we do not yet begin relying on Redis for *reading* session data.

We want to get an accurate evaluation of what real-world usage of this new Redis cluster will look like, and also benefit from some load testing in advance of an actual switchover. To that end, this PR proposes that we begin writing all session data to Redis as a side effect of using the existing cookie-based store. This new functionality will be gated behind a DCDO flag and wrapped in error handling logic to minimize the chances of this experiment affecting the user experience.
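The mirroring approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual implementation: the class name, the flag callable, and the logger wiring are all assumptions; the real change hooks into the existing cookie-based store.

```ruby
require 'json'

# Hypothetical sketch: mirror each session write into Redis as a side
# effect, gated behind a feature flag and wrapped in error handling so a
# Redis failure can never surface to the user.
class MirroringCookieStore
  def initialize(redis:, flag_enabled:, logger: nil)
    @redis = redis
    @flag_enabled = flag_enabled # callable, e.g. a DCDO flag lookup
    @logger = logger
  end

  # Called alongside the normal cookie write; never raises.
  def mirror_write(session_id, session_hash, ttl_seconds)
    return unless @flag_enabled.call
    @redis.setex(session_id, ttl_seconds, JSON.generate(session_hash))
  rescue StandardError => e
    @logger&.warn("session mirror failed: #{e.message}")
  end
end
```

The key design point is that the cookie store remains the source of truth: the Redis write is fire-and-forget, and any exception is logged and swallowed.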

Also remove the instrumented cookie store; we don't need to manually gather data for estimates now that we'll have an actual Redis cluster to monitor.

Links

Follow-up to #57558

Testing story

Tested the Redis cluster provisioned in production by SSHing into a frontend instance and executing some manual queries from the dashboard console:

```
[production] dashboard > client = Redis.new(url: "rediss://[redacted].cache.amazonaws.com:6379")
=> #<Redis client v3.3.3 for redis://[redacted].cache.amazonaws.com:6379/0>
[production] dashboard > client.get('foo')
=> nil
[production] dashboard > client.keys
=> []
```

Tested the functionality added in this PR by temporarily modifying cloud_formation_stack.yml.erb to provision a redis session storage cluster even in an adhoc environment, then verified on an adhoc that registration, authentication, and session destruction all work as expected, and that I can successfully find session data in the redis store.

Deployment strategy

We should plan to monitor Honeybadger and Zendesk after this DTPs (deploys to production), and have a revert prepped just in case.

After deployment, we should plan to set the DCDO flag mirror_session_in_redis_enabled to true so we can begin collecting data.

Follow-up work

Assuming that our new Redis cluster is able to handle the load generated by this mirroring, we can follow this up with an implementation that will begin using the Redis store rather than cookies as the source of truth.

PR Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

The first step in our efforts to transfer user session data from cookies
to a centralized data store is to provision the data store in
CloudFormation, so we can begin writing data to it.

We have quite a few options for how exactly we want to provision this,
but broadly speaking we can choose either Serverless or On-Demand nodes,
and if we choose On-Demand we also must choose both the size and
quantity of nodes we want.

According to Google Analytics, we've had 28M users over the last 90
days, and 84M over the last year, which works out to between 45 and 60
million users on average over an arbitrary 200-day period (200 days
being our current session duration configuration). At ~300 bytes per
session on average, that's 13.5-18 GB of data we'd ideally like to
store.

Storing 18GB of data in Serverless ElastiCache would cost ~$1620/month;
even 13GB would still be ~$1170/month. Alternatively, we could use
three of the smallest memory-optimized cache nodes (cache.r7g.large) and
get 39.21GB of storage for ~$480/month, or even just use two for 26.14GB
of storage at ~$320/month. An even more targeted solution could be to
use a larger number of smaller nodes; 6 cache.t4g.medium nodes would
allow us to store 18.5GB for ~$280/month.
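As a quick sanity check on the storage arithmetic above (figures taken directly from the estimates in this commit message):

```ruby
# Back-of-the-envelope check: average users over a 200-day window
# times ~300 bytes per session.
BYTES_PER_SESSION = 300
GB = 10**9

low_gb  = 45_000_000 * BYTES_PER_SESSION / GB.to_f
high_gb = 60_000_000 * BYTES_PER_SESSION / GB.to_f
puts format('%.1f-%.1f GB', low_gb, high_gb) # 13.5-18.0 GB
```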

Throughput is less of an issue for us than storage. According to
CloudWatch, we average ~800M requests per week, peaking around 15M/hour
and under 4,200/second. At an average of 300 bytes stored per session
and one GET plus one PUT per request, that's ~2.5 MB/s (about 20
megabits per second) of needed network performance (well under the 12.5
Gbit/s promised by cache.r7g.large nodes), or ~$25/month in ECPUs for a
Serverless model.
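The bandwidth figure can be checked the same way; note the result lands in megabits per second, comfortably under the 12.5 Gbit/s network limit of cache.r7g.large nodes:

```ruby
# Bandwidth check: requests/sec * bytes/session * (one GET + one PUT).
reqs_per_sec      = 4_200
bytes_per_session = 300
ops_per_request   = 2

bytes_per_sec = reqs_per_sec * bytes_per_session * ops_per_request
mbit_per_sec  = bytes_per_sec * 8 / 1_000_000.0
puts format('%.2f Mbit/s', mbit_per_sec) # 20.16 Mbit/s
```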

This PR proposes we provision a single cluster consisting of three
cache.r7g.large nodes, with encryption, automatic failover, and
multi-zone availability all enabled. That should give us plenty of room
for growth without too much unnecessary expense. The next step in our
broader efforts to enable this feature is to begin writing data to Redis
so we can get a more accurate idea of actual usage, at which point we
can revisit this decision.
Base automatically changed from provision-redis-based-session-store to staging April 2, 2024 18:58
```diff
@@ -113,7 +113,7 @@ cat <<JSON > $FIRST_BOOT
   },
   "cdo-secrets": {
 <% if frontends -%>
-    "session_store_server": "redis://${SessionStoreGroup.PrimaryEndPoint.Address}:${SessionStoreGroup.PrimaryEndPoint.Port}",
+    "session_store_server": "rediss://${SessionStoreGroup.PrimaryEndPoint.Address}:${SessionStoreGroup.PrimaryEndPoint.Port}",
```
@Hamms (Contributor, Author) commented Apr 3, 2024

Note that because we have enabled encryption for the underlying Redis cluster, we have to specifically use a "redis secure" connection here, hence rediss rather than redis.

I very briefly tried to find a way to automatically determine whether or not this is necessary based on the CloudFormation output, but couldn't. I decided to just quickly hardcode it instead; please let me know if you think it would be worth me spending a bit more time on this!

@Hamms Hamms marked this pull request as ready for review April 3, 2024 00:34
@Hamms Hamms requested a review from a team as a code owner April 3, 2024 00:34
@sureshc (Contributor) commented Apr 4, 2024

We don't yet have a standard way to take the Physical ID of a Resource provisioned in CloudFormation and pass that configuration value to our Ruby web application servers.

Specifically, we don't have an easy way to populate CDO.session_store_server from the SessionStore CloudFormation Resource. Our usage of SessionStoreGroup.PrimaryEndPoint.Address and SessionStoreGroup.PrimaryEndPoint.Port in this feature branch only passes those values to the Chef cookbooks that execute when the daemon instance is provisioned. Those values don't get persisted to that server nor are they ever passed to web application servers or the console.

The very janky workaround I've come up with is to publish an AWS Stack-specific Secret in the template and then use !StackSecret to populate CDO.session_store_server

@cat5inthecradle (Contributor) left a comment:

LGTM

Would there be value in reading the session from Redis in order to evaluate that half of usage too? Or is it pretty much 1:1 with writes?

@Hamms (Contributor, Author) commented Apr 4, 2024

> Specifically, we don't have an easy way to populate CDO.session_store_server from CloudFormation Logical IDs and Attributes SessionStoreGroup.PrimaryEndPoint.Address and SessionStoreGroup.PrimaryEndPoint.Port

Suresh, can you clarify how the changes to aws/cloudformation/bootstrap_chef_stack.sh.erb are insufficient? Note that the old version of this value has already been populated on frontends:

```
[production] dashboard > CDO.session_store_server
=> "redis://[redacted].cache.amazonaws.com:6379"
```

Is it just a problem with updating the value once it's been initially set? My expectation is that whenever we spin up a new frontend, it will pick up these changes.

> Would there be value in reading the session from Redis in order to evaluate that half of usage too? Or is it pretty much 1:1 with writes?

In my local testing, it seems like session reads and writes are in fact exactly 1:1; every single request incurs exactly one write and one read. I'm certainly not opposed to adding some logic to also read just for thoroughness, but I think just doubling the number of operations generated as a result of this change should accurately capture the number of operations we'll need for full usage.

@cat5inthecradle (Contributor) commented:
> We don't yet have a standard way to take the Physical ID of a Resource provisioned in CloudFormation and pass that configuration value to our Ruby web application servers.
>
> Specifically, we don't have an easy way to populate CDO.session_store_server from the SessionStore CloudFormation Resource. Our usage of SessionStoreGroup.PrimaryEndPoint.Address and SessionStoreGroup.PrimaryEndPoint.Port in this feature branch only passes those values to the Chef cookbooks that execute when the daemon instance is provisioned. Those values don't get persisted to that server nor are they ever passed to web application servers or the console.
>
> The very janky workaround I've come up with is to publish an AWS Stack-specific Secret in the template and then use !StackSecret to populate CDO.session_store_server

session_store_server should get set correctly on frontends with this change - the bootstrap script runs whenever we wake up AMI Builder.

@Hamms (Contributor, Author) commented Apr 5, 2024

I'll also plan to resolve https://app.honeybadger.io/projects/3240/faults/106697134 once this PR is merged, since it removes the relevant code.

@unlox775-code-dot-org (Contributor) left a comment:

Looks good!

@Hamms Hamms merged commit 9e3fdab into staging Apr 9, 2024
2 checks passed
@Hamms Hamms deleted the write-to-redis-based-session-store branch April 9, 2024 19:22
Hamms added a commit that referenced this pull request Apr 11, 2024
Follow-up to #57562,
which started writing session data to Redis with the goal of giving us a
chance to analyze real-world usage in advance of a switch to actually
start using Redis as the source of truth.

For the most part, that's working, but there are currently a couple of
issues with that implementation which both affect the total amount of
data being stored, and render that information inaccurate.

First, we aren't deleting old sessions from Redis, so aren't capturing
the extent to which users logging out will reduce total data storage. To
resolve that, this PR simply extends the
[`delete_session`](https://github.com/rails/rails/blob/v6.1.7.7/actionpack/lib/action_dispatch/middleware/session/cookie_store.rb#L63-L68)
method to begin mirroring deletions in Redis the same as we mirror
writes.
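A minimal sketch of the deletion-mirroring idea (class and method names here are illustrative; the actual change overrides Rails' `CookieStore#delete_session`):

```ruby
# Illustrative mirror that forwards both writes and deletions to Redis,
# swallowing Redis errors so session handling never breaks for users.
class SessionMirror
  def initialize(redis)
    @redis = redis
  end

  def write(session_id, payload, ttl_seconds)
    @redis.setex(session_id, ttl_seconds, payload)
  rescue StandardError
    nil
  end

  # Without this, logged-out sessions linger in Redis and inflate the
  # storage metrics we're trying to measure.
  def delete(session_id)
    @redis.del(session_id)
  rescue StandardError
    nil
  end
end
```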

Second, we're actually writing more data than we need to; I mistakenly
thought we were going to need to manually serialize and deserialize the
Ruby Hashes that our session data is stored in, and chose to do so with
`Marshal`, which is much less space-efficient than simply passing the
Hash directly to Redis to store. In my local testing, a Hash for a
freshly-logged-in session is 168 bytes in memory, but the `Marshal`
serialization takes 424 bytes.

Simplifying the write logic and enabling deletion logic should improve
the accuracy of our metrics.
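The serialization overhead is easy to see with a made-up session Hash (the hash contents below are illustrative, not the real session layout):

```ruby
require 'json'

# Marshal embeds Ruby type and encoding metadata per object, so its
# output is noticeably larger than a plain JSON encoding of the same Hash.
session = {
  'session_id' => 'abc123',
  'user_id'    => 42,
  'flash'      => { 'notice' => 'Signed in successfully.' }
}

marshal_bytes = Marshal.dump(session).bytesize
json_bytes    = JSON.generate(session).bytesize
puts "Marshal: #{marshal_bytes} bytes, JSON: #{json_bytes} bytes"
```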
Hamms added a commit that referenced this pull request Apr 12, 2024