Design performant schema for formerly-Firebase data #55344
Comments
Row-per-record schema with table dedupe seems to give favorable query performance (of the queries we see in
Storing records as a JSON column gives us validity confidence. It turns out that MySQL parses JSON using JSON validation resulted in dropping only 450 out of 600 million (deduped) records. All of them seemed like attempts to do exotic/fun things with Unicode, and all but two records appeared to be from chat apps. The other two records appeared to be entries in a high score table for a game. Note that when validation fails, we're not dropping the entire table, just the exact records that don't validate.
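To illustrate the "drop only the records that don't validate" behavior, here is a minimal sketch in Python. It is not the import code used for this project: `split_valid_records` is a hypothetical helper, and Python's `json` module is only a stand-in for MySQL's JSON validation, which may accept or reject different edge cases.

```python
import json


def split_valid_records(raw_records):
    """Return (kept, dropped): records that parse as JSON and those that do not.

    Hypothetical helper for illustration; each record is checked on its own,
    so one bad record never causes the rest of its table to be dropped.
    """
    kept, dropped = [], []
    for raw in raw_records:
        try:
            json.loads(raw)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            dropped.append(raw)
        else:
            kept.append(raw)
    return kept, dropped


# Made-up sample data: the second record is truncated and fails to parse.
sample = ['{"name": "Ada", "score": 10}', '{"name": "broken', '"plain string"']
kept, dropped = split_valid_records(sample)
print(f"kept {len(kept)}, dropped {len(dropped)}")
```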
Deduplication leaves us with an additional requirement for the schema: a way to indicate that a table is a "pointer" to a stock table.
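One way to picture the "pointer" requirement, as a sketch only: the `Table` class, the `stock_table` field, and `read_records` are all hypothetical names made up for illustration, not the schema that was adopted. The idea is that a deduped table stores no rows of its own and reads resolve through to the shared stock table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    name: str
    records: list = field(default_factory=list)
    # Hypothetical pointer field: name of the stock table this table mirrors,
    # or None when the table owns its own records.
    stock_table: Optional[str] = None


def read_records(tables: dict, name: str) -> list:
    """Return the records for `name`, following a stock-table pointer if set."""
    table = tables[name]
    if table.stock_table is not None:
        return tables[table.stock_table].records
    return table.records


# Made-up example: two student tables, one of which points at a stock table.
tables = {
    "stock/words": Table("stock/words", records=[{"word": "apple"}]),
    "student_a/words": Table("student_a/words", stock_table="stock/words"),
    "student_b/scores": Table("student_b/scores", records=[{"score": 3}]),
}
print(read_records(tables, "student_a/words"))
```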
@cnbrenci and I discussed schema, here's where we got:
VARCHAR sizes: Firebase validation rules:
Stock/Shared tables are also stored in We propose dropping For data export, we can sign a "what channel ID should you have access to" token, and put that in the data export.
We may still have to add rate limiting to the MySQL schema, see #55481.
We've changed the tableName and key limits from 768 to 700. We had to do this because we had to change the charset from utf8mb3 to utf8mb4, and the max key length in mysql2 is 3072 bytes. At utf8mb3, the total bytes come out to 3x768=2304, which fits within the limit, but with utf8mb4 it's 4x768=3072, which hits the limit. Given that the keys in Firebase have a max size of 768 bytes and a prefix of 69 characters, which we are not storing in the key field in MySQL, we can safely shorten the key with confidence that we'll still be able to migrate all the data into the smaller key size.
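The arithmetic above can be checked with a short Python sketch. The 3072-byte limit and the column lengths come from the comment; `worst_case_key_bytes` is a hypothetical helper, and the calculation covers a single VARCHAR column only, not a full composite index.

```python
# Index-key limit cited in the comment above.
MAX_INDEX_KEY_BYTES = 3072

# Worst-case bytes per character for each MySQL charset.
BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def worst_case_key_bytes(varchar_len: int, charset: str) -> int:
    """Worst-case index size, in bytes, of one VARCHAR(varchar_len) column."""
    return varchar_len * BYTES_PER_CHAR[charset]


for length, charset in [(768, "utf8mb3"), (768, "utf8mb4"), (700, "utf8mb4")]:
    size = worst_case_key_bytes(length, charset)
    headroom = MAX_INDEX_KEY_BYTES - size
    print(f"VARCHAR({length}) {charset}: {size} bytes, headroom {headroom}")
```

At 700 characters the utf8mb4 column needs at most 2800 bytes, which leaves 272 bytes of headroom under the limit.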
Default unicode type for our DB is utf8, which in mysql is (weirdly) 3-byte. This doesn't permit emojis in KVP key names, or in table names. To match firebase we need to support full 4-byte UTF-8. #55344 (comment)
Default unicode type for our DB is utf8, which in mysql is (weirdly) 3-byte. This doesn't permit emojis in KVP key names, or in table names. To match firebase we need to support full 4-byte UTF-8. #55344 (comment) Co-authored-by: Cassi Brenci <cnbrenci@users.noreply.github.com>
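For anyone wondering why 3-byte utf8 rules out emoji: utf8mb3 only covers code points up to U+FFFF, which need at most 3 bytes in UTF-8, while emoji sit above that range and need 4. A small Python sketch of the check, where `fits_in_utf8mb3` and `max_utf8_bytes_per_char` are hypothetical helpers and the table names are made up:

```python
def fits_in_utf8mb3(text: str) -> bool:
    """True if every code point is in the Basic Multilingual Plane (<= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    """Largest UTF-8 encoded size, in bytes, of any single character in text."""
    return max(len(ch.encode("utf-8")) for ch in text)


plain_name = "high_scores"
emoji_name = "scores_\U0001F3C6"  # trophy emoji, U+1F3C6

for name in (plain_name, emoji_name):
    print(name, fits_in_utf8mb3(name), max_utf8_bytes_per_char(name))
```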
Part of the firebase deprecation project: #55084
We have approximately 4 billion records currently in Firebase (by record we mean one row in a student project dataset), spread across millions of student data tables (contained in millions of applab projects).
Now that we have full data imports going (#55189) we can start performance testing possible schemas to store this data in MySQL.