
Design backend for Datablock Storage #55554

Closed
4 of 6 tasks
snickell opened this issue Jan 3, 2024 · 6 comments

snickell commented Jan 3, 2024

Datablock Storage (#55084) is switching project storage from a browser=>firebase connection to a browser=>rails=>mysql connection. We have tentatively determined a schema (see #55344 (comment)) and measured its perf characteristics (#55189); now we need to design the Rails backend.

  • Characterize async behavior of the existing Firebase backend
  • Determine acceptable load (in particular, invocations/s) for the Rails backend
  • Decide batch size (if batching is possible within the async semantics)
  • Write out & profile SQL queries for doing all required checks (rate limit, table count, etc) for each data block
  • Figure out how we want to implement de-duped tables, which also solves for "current table" (the live tables like Daily Weather, Spotify, and COVID caseload)
  • Design backend API

snickell commented Jan 3, 2024

Performance and async characteristics of the existing Firebase blocks

  • Our target burst rate (the existing Firebase rate) is 20 inserts/s (technically, 300 inserts per 15s).
  • Exceeding this limit results in failure messages in the code.org debug console. It does not appear to surface the failure programmatically (?).
  • A single call to the createRecord(table, record, callback) block takes 0.75s from invocation to callback.
    • We take the createRecord() callback to mean "the record is committed".
    • However, before it is committed, the new record is available instantly (measured <1ms) to a subsequent readRecords() call. Since this is faster than wire latency allows, we surmise that there is local buffering of data combined with optimistic updating.
    • Question: Is this local optimistic updating done by the official Firebase JS libraries, or by a local store we implement (redux or similar)?
    • Question: If the buffering is done by Firebase, do we need it? Will we break student code if we don't support synchronous inserts?

Performance Characteristics of our MySQL DB

Implementation Ideas

If our target were 0.75s from invocation of createRecord() to the record existing in the DB, we could easily batch calls, reducing load on our Rails backend. A strawman idea would be to flush batches every 0.5s or 1s. This would meet or exceed the existing latency until callback, and would take advantage of the improved performance of batched inserts (no need to re-acquire the lock every time; see below). A sketch follows.
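
A minimal sketch of what one flushed batch could look like, assuming the unfirebase.records schema and the MIN-lock approach described in the next comment; the channel ('shared'), table ('words'), and record JSON values are placeholders:

BEGIN;
  -- One lock acquisition covers the whole batch (see the MIN-lock
  -- discussion in the next comment).
  SELECT MIN(record_id) FROM unfirebase.records
    WHERE channel_id='shared' AND table_name='words' LIMIT 1 FOR UPDATE;
  SELECT @id := IFNULL(MAX(record_id),0) FROM unfirebase.records
    WHERE channel_id='shared' AND table_name='words';

  -- Insert everything queued during the 0.5s or 1s window in one statement.
  INSERT INTO unfirebase.records VALUES
    ('shared', 'words', @id+1, '{"word":"apple"}'),
    ('shared', 'words', @id+2, '{"word":"banana"}'),
    ('shared', 'words', @id+3, '{"word":"cherry"}');
COMMIT;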

snickell commented Jan 3, 2024

Auto-increment of RecordIDs is Tricky

Our existing record IDs are auto-incremented using a Firebase counters "table". MySQL InnoDB tables don't support auto-incrementing one column of a composite key relative to the other key columns (see, e.g., https://stackoverflow.com/questions/18120088/defining-composite-key-with-auto-increment-in-mysql).

  • We evaluated putting record_id as the first column and making it AUTO_INCREMENT, but this counter would be shared by ALL records in ALL Datablock Storage tables, and while it probably preserves the existing semantics, we think it makes for a bad student experience when your records start with huge IDs like 458849898.
  • Alternatively, we could lock on MAX(record_id) for a given channel_id and table_name, but this has race conditions under MySQL's transaction isolation levels.

The problem with MAX is that with three concurrent inserts (one starts, a second starts and then stalls, and a third starts) you can still get a duplicate ID. Since we always increase the ID when inserting, we realized we could lock on MIN instead, as a sort of "table lock" (not a SQL table lock, but a Datablock Storage table lock) that is more stable: because IDs only ever grow, the row holding the minimum ID doesn't change, so every concurrent insert contends on the same row lock.

There are probably some race conditions left with this (?), but under the common scenarios we thought of, the following looks pretty good for manually incrementing the ID on insert:

-- Enable per-query profiling so we can time each step.
SET profiling = 1;

BEGIN;
  -- Lock the row holding the minimum record_id for this Datablock Storage
  -- table: IDs only grow, so this row is stable and serializes concurrent
  -- inserts on the same (channel_id, table_name).
  SELECT MIN(record_id) FROM unfirebase.records
    WHERE channel_id='shared' AND table_name='words' LIMIT 1 FOR UPDATE;

  -- Compute the next ID (1 if the table is empty).
  SELECT @id := IFNULL(MAX(record_id),0)+1 FROM unfirebase.records
    WHERE channel_id='shared' AND table_name='words';

  INSERT INTO unfirebase.records
    VALUES ('shared', 'words', @id, '{}');
COMMIT;

SHOW PROFILES;
Profiles of the above approach look good

We profiled the above lock-then-insert approach on a Datablock Storage table with 5000 rows (shared/words). The combined steps to lock and insert a row take about a millisecond overall (the Duration column is in seconds; query 50's DO SLEEP(20) is a deliberate pause in the test session, not part of the insert cost).

# Query_ID, Duration, Query
'47', '0.00013625', 'BEGIN'
'48', '0.00029125', 'SELECT MIN(record_id) FROM unfirebase.records WHERE channel_id=\'shared\' AND table_name=\'words\' LIMIT 1 FOR UPDATE'
'49', '0.00027725', 'SELECT @id := IFNULL(MAX(record_id),0)+1 FROM unfirebase.records WHERE channel_id=\'shared\' AND table_name=\'words\'\nLIMIT 0, 2000'
'50', '20.00030250', 'DO SLEEP(20)'
'51', '0.00039750', 'INSERT INTO unfirebase.records\n     VALUES\n     (\'shared\', \'words\', @id, \'{}\')'
'52', '0.00017625', 'COMMIT'

Additionally, we used SHOW ENGINE INNODB STATUS to look at the locks being created. Using MIN means we hold only one lock. We were also able to verify that when a lock wasn't cleaned up, it only affected the one Datablock Storage table; rows in other Datablock Storage tables remained accessible.
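
For reference, a hedged sketch of how the lock can be inspected from a second session while the inserting transaction is still open. This assumes MySQL 8.0+, where performance_schema.data_locks is available; on older versions, SHOW ENGINE INNODB STATUS (as used above) is the tool:

-- Run in a second session while the inserting transaction holds its lock:
-- only rows of the one Datablock Storage table should appear.
SELECT object_schema, object_name, lock_type, lock_mode, lock_data
  FROM performance_schema.data_locks
  WHERE object_schema = 'unfirebase';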

snickell commented Jan 3, 2024

It appears that Firebase RTDB (what we are using) does in fact do optimistic updating: https://firebase.google.com/docs/database/admin/save-data#section-writes-offline

There's very little documentation of this, probably because it's fundamental to Firebase's design.

snickell commented Jan 4, 2024

Enforcing limits in the backend:

  • maxPropertySize: 4096. The max value length of a key-value pair; this is enforced in our schema, because the Value column of the KeyValuePairs table is VARCHAR(4096).
  • maxRecordSize: 4096. The max length of a record; this needs to be enforced by the Rails backend, because the MySQL JSON type doesn't allow setting a max length.
  • maxTableCount: 10. This doesn't seem to be enforced in rules.bolt (?); we'll enforce it on the backend by doing a SELECT COUNT(tableName) FROM Tables WHERE channelID=___ (see the sketch after this list).
  • maxTableRows: 20000. The max number of records inside a table: SELECT COUNT(recordId) FROM Records WHERE channelID=___ AND table_name=____.
  • maxKeySize: 768. A Firebase limit on the full key path (like /v3/channels/shared). We don't test for this particular limit; instead we keep lengths bounded by declaring table_name, the key-value pair key, etc. as VARCHAR(1024). We don't care how long JSON key names are inside a record (Firebase would), and instead enforce the overall maxRecordSize above.
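
A minimal sketch of the backend check queries, assuming the snake_case column names (channel_id, table_name, record_id) used in the profiling above rather than the camelCase names from the Firebase config; 'shared' and 'words' are placeholders, and unfirebase.tables is an assumed name for the Tables table mentioned above:

-- maxTableCount: refuse to create an 11th table in a channel.
SELECT COUNT(table_name) FROM unfirebase.tables
  WHERE channel_id = 'shared';

-- maxTableRows: refuse to insert a 20001st record into a table.
SELECT COUNT(record_id) FROM unfirebase.records
  WHERE channel_id = 'shared' AND table_name = 'words';

-- maxRecordSize: checked in Rails before the INSERT, since MySQL's JSON
-- type has no declarable maximum length.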

cnbrenci commented Jan 5, 2024

Today, the data blocks do not use optimistic updating: e.g., if you call a set and then a get immediately after, without waiting for the callback, it is nondeterministic whether the get returns the old or the new value. So we are OK with that staying the same in the new architecture. We will batch, but we won't worry about optimistic updating of local state.

cnbrenci changed the title from "Design backend for projectDB" to "Design backend for Datablock Storage" on Jan 17, 2024
snickell added the unfirebase (https://github.com/orgs/code-dot-org/projects/4) label on Feb 3, 2024

snickell commented Feb 16, 2024

We've basically finished this item, since we've already executed against the schema design. Breaking out remaining small issues and closing:

  1. Consider batching create/update record requests in JS #56615
  2. Schema for library manifest: Implement Library Manifest System #56472
  3. Rate limiting design: Figure out Rate Limiting for Datasets on MySQL #55481
