How to clean up invalid BinarySchema #10823
It is not invalid. If a schema was used once, it has to be stored so that the object can be deserialized later. You can try to minimize the number of unique schemas - for example, ensure consistent field order. Currently, …
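If that advice is applied to builder-based writes, one way to get a consistent field order is to sort the field names before writing them. A minimal sketch, assuming schema identity follows the order in which fields are written (the class and method names here are illustrative, not Ignite API):

```java
import java.util.Map;
import java.util.TreeMap;
import org.apache.ignite.binary.BinaryObjectBuilder;

public final class SortedFieldWriter {
    /**
     * Writes fields in a deterministic (sorted) order, so two records with the
     * same field *set* produce the same BinarySchema regardless of the order
     * in which the caller supplied the fields.
     */
    public static void setFieldsSorted(BinaryObjectBuilder builder, Map<String, Object> fields) {
        // TreeMap iterates its entries in key order.
        for (Map.Entry<String, Object> f : new TreeMap<>(fields).entrySet())
            builder.setField(f.getKey(), f.getValue());
    }
}
```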
@ptupitsyn Thanks for your reply and suggestion! By analyzing BinaryObjectBuilderImpl#serializeTo, I found a pattern: writing fields out of order, or updating some fields of an existing record, does not create a new BinarySchema, but whenever we add a new field to a record, a new schema is created. Suppose we first create a new record with field A; the schemaId of the record is 1 at this point. If we later write the record again with an additional field B, a new schema with a different schemaId is created, and the old one is kept. I worked around the problem as follows: when writing a record for the first time, I write null to every field that is not present, so that a single BinarySchema is created for the type (see the sketch below). But this workaround wastes some extra memory, because each null value occupies 2 bytes after serialization, and most records probably only need to write about 200 of the 1,000 fields.
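A minimal sketch of that workaround as described (the class, the method shape, and the "DeviceState" type name are mine, not from the original code):

```java
import java.util.List;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public final class SingleSchemaWriter {
    /** Writes a record so that every record of the type shares one BinarySchema. */
    public static void write(Ignite ignite, IgniteCache<String, BinaryObject> cache,
        String key, List<String> allFieldNames, Map<String, Object> presentFields) {
        BinaryObjectBuilder b = ignite.binary().builder("DeviceState"); // assumed type name

        // Prefill EVERY known field with a typed null, so even the very first
        // write already uses the complete field set and hence the one shared schema.
        for (String name : allFieldNames)
            b.setField(name, null, Object.class);

        // Then overwrite the fields that actually have values for this record.
        for (Map.Entry<String, Object> f : presentFields.entrySet())
            b.setField(f.getKey(), f.getValue());

        cache.put(key, b.build());
    }
}
```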
Thanks for posting the solution. Yes, there is a trade-off: create more schemas, or waste some space on nulls. Can you also describe your use case a little, please? 1,000 dynamically handled fields is somewhat unusual to see.
@ptupitsyn I agree with you; there is a trade-off here. We try to store the latest values of the properties and telemetry of devices in IoT scenarios. Each record is a device, and devices generally have dozens to hundreds of fields. Even for a cache with only three fields, there may be 7 BinarySchemas at first (1, 2, 3, 12, 13, 23, 123), of which only one or two may still be in use after updates. I have two rough ideas.
The second is to periodically check each cache and delete the BinarySchemas that are not referenced by any object. It is a bit complicated and only a proposal, but if we implemented it we might get better sparse storage and flexibility, so we could support schemaless data better; a rough sketch follows.
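To make that second idea concrete, here is mark-and-sweep pseudocode for such a cleanup. Note that schemaIdOf, registeredSchemaIds, and dropSchema are hypothetical: as far as I know, Ignite's public API exposes neither the schema id of a BinaryObject nor a way to remove a registered schema, so this would need new internal support.

```java
import java.util.HashSet;
import java.util.Set;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;

public final class SchemaSweep {
    /** Mark-and-sweep over BinarySchemas; the three helpers below do not exist yet. */
    public static void sweep(Ignite ignite) {
        for (String cacheName : ignite.cacheNames()) {
            IgniteCache<Object, BinaryObject> cache = ignite.cache(cacheName).withKeepBinary();

            // Mark: collect the schema ids still referenced by live objects.
            Set<Integer> live = new HashSet<>();
            for (Cache.Entry<Object, BinaryObject> e : cache)
                live.add(schemaIdOf(e.getValue()));                     // hypothetical helper

            // Sweep: drop every registered schema that no object references.
            for (int schemaId : registeredSchemaIds(ignite, cacheName)) // hypothetical helper
                if (!live.contains(schemaId))
                    dropSchema(ignite, cacheName, schemaId);            // hypothetical helper
        }
    }

    // Placeholders for functionality Ignite would have to provide internally.
    private static int schemaIdOf(BinaryObject obj) { throw new UnsupportedOperationException(); }
    private static Set<Integer> registeredSchemaIds(Ignite ignite, String cache) { throw new UnsupportedOperationException(); }
    private static void dropSchema(Ignite ignite, String cache, int schemaId) { throw new UnsupportedOperationException(); }
}
```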
I created a cache like this: IgniteCache<String, BinaryObject>. To do upserts, I used an EntryProcessor as follows.
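A minimal sketch of such an upsert processor, assuming the type name "DeviceState" and a Map<String, Object> of field values passed as the invoke argument (both are assumptions on my part):

```java
import java.util.Map;
import javax.cache.processor.EntryProcessorException;
import javax.cache.processor.MutableEntry;
import org.apache.ignite.Ignite;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.resources.IgniteInstanceResource;

public class UpsertProcessor implements CacheEntryProcessor<String, BinaryObject, Void> {
    @IgniteInstanceResource
    private transient Ignite ignite; // injected by Ignite on the executing node

    @Override
    public Void process(MutableEntry<String, BinaryObject> entry, Object... args)
        throws EntryProcessorException {
        @SuppressWarnings("unchecked")
        Map<String, Object> fields = (Map<String, Object>)args[0];

        // Start from the stored record, or from a fresh builder on first insert.
        BinaryObjectBuilder builder = entry.exists()
            ? entry.getValue().toBuilder()
            : ignite.binary().builder("DeviceState"); // assumed type name

        // Every distinct *set* of fields written here yields a distinct
        // BinarySchema, which is what causes the growth described below.
        for (Map.Entry<String, Object> f : fields.entrySet())
            builder.setField(f.getKey(), f.getValue());

        entry.setValue(builder.build());
        return null;
    }
}
```

It would be invoked as cache.invoke(deviceId, new UpsertProcessor(), fieldValues).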
In this cache, I wrote 20,000 records, each with 1 to 1,000 fields. I found that the Ignite service did a lot of full GCs and sometimes even crashed outright. Through heap-dump analysis, I found that BinarySchema instances occupy a lot of memory.
Eventually I figured out that the problem was that I was writing to a random subset of a record's fields on each update:
- When I write a record for the first time, a BinarySchema is created.
- The next time I update the record and write one more field, a new BinarySchema is created and written to ./work/db/binary_meta/; the old schema is not cleared.
- In the end, tens of thousands of BinarySchemas are created, although only dozens of them are actually referenced by the serialized data.
The following are some of the places where I found BinarySchemas being stored. Is there a way to clean these BinarySchemas up?