
Look at whether suspend data can be made more efficient #1080

Open
christianp opened this issue Feb 9, 2024 · 4 comments
Labels: Difficult! (Issues that will take a long time to implement, or involve big breaking changes), Needs thinking about

Comments

@christianp
Member

Somebody encountered SCORM's limit of 64,000 characters on the suspend data. The LTI provider doesn't enforce this, but generic SCORM players do.

We should try to make the suspend data as small as possible, so that as many exams as possible will work. Obviously there has to be an upper limit on the size of an exam: as a simple bound, an exam with 64,001 parts could never fit in 64,000 characters, since each part needs at least one character.

At the moment, lots of values are included even if they have the default value. We could try only including keys if they have a non-default value.
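
A minimal sketch of dropping default-valued keys, written in Python for brevity (Numbas itself is JavaScript). The `DEFAULTS` table and its field names are hypothetical, chosen only for illustration. The type check matters because Python treats `False == 0` as true, and a saved `0` should not be mistaken for a default `False`.

```python
import json

# Hypothetical defaults for a part's suspend-data object (illustration only).
DEFAULTS = {
    "answered": False,
    "score": 0,
    "stepsShown": False,
    "studentAnswer": "",
}

def is_default(key, value, defaults) -> bool:
    """True if key has a default and value matches it in both type and value."""
    return (
        key in defaults
        and type(value) is type(defaults[key])
        and value == defaults[key]
    )

def remove_default_keys(obj: dict, defaults: dict) -> dict:
    """Return a copy of obj without the keys whose value equals the default."""
    return {k: v for k, v in obj.items() if not is_default(k, v, defaults)}

def restore_defaults(obj: dict, defaults: dict) -> dict:
    """When resuming, fill any omitted keys back in from the defaults."""
    return {**defaults, **obj}

part = {"answered": True, "score": 0, "stepsShown": False, "studentAnswer": "x^2"}
slim = remove_default_keys(part, DEFAULTS)
print(json.dumps(slim, separators=(",", ":")))  # {"answered":true,"studentAnswer":"x^2"}
```

Restoring presumably only works if the same defaults are used when saving and when resuming, so changing a default later would change how older attempts are read back.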

Going even further, a lot of space is taken up by the names of object keys. If we use something other than JSON, then we could either assume we know the shape of the data and omit the keys entirely, or give a structure definition at the start.
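
A sketch of the "structure definition at the start" option, assuming a list of objects that all share the same keys (for example, one entry per part). It stays in JSON only to keep the example short; a non-JSON format could go further. `pack_rows` and `unpack_rows` are hypothetical names, not part of Numbas.

```python
import json

def pack_rows(rows: list[dict]) -> dict:
    """Write the key names once, then each object as a list of values.
    Assumes every object has the same keys as the first one."""
    keys = list(rows[0]) if rows else []
    return {"keys": keys, "rows": [[row[k] for k in keys] for row in rows]}

def unpack_rows(packed: dict) -> list[dict]:
    """Rebuild the original objects from the shared key list."""
    keys = packed["keys"]
    return [dict(zip(keys, values)) for values in packed["rows"]]

parts = [
    {"exec_path": "", "studentAnswer": "1", "results": []},
    {"exec_path": "", "studentAnswer": "2", "results": []},
]
compact = (",", ":")
before = json.dumps(parts, separators=compact)
after = json.dumps(pack_rows(parts), separators=compact)
print(len(before), len(after))
```

Because the key names are written only once, the saving grows with the number of objects sharing the same shape.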

@christianp christianp added Needs thinking about Difficult! Issues that will take a long time to implement, or involve big breaking changes labels Feb 9, 2024
christianp added a commit that referenced this issue Feb 14, 2024
This commit adds a method `Numbas.storage.remove_default_keys`, which
removes keys from suspend data objects when they have the default value.
There's also some logic about not including list properties when the
list is empty.

I think this cuts the size of suspend data by about half.

see #1080
@christianp
Member Author

I wrote some code to chuck out keys from suspend data objects when they have default values. That seems to have cut the size of the suspend data roughly in half, since most questions don't use most features.

However, running the JSON through gzip and then base64-encoding it further cut down the data to between 10% and 20% of the original. This would save a lot of space, at the cost of not being able to read the suspend data directly.
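
A sketch of the gzip-then-base64 round trip, using Python's standard `gzip` and `base64` modules. In the browser, `CompressionStream("gzip")` would presumably play the role of `gzip.compress` here; the function names below are hypothetical.

```python
import base64
import gzip
import json

def compress_suspend_data(data) -> str:
    """JSON-encode, gzip, then base64 so the result is plain ASCII text."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decompress_suspend_data(text: str):
    """Reverse the steps: base64-decode, gunzip, parse the JSON."""
    return json.loads(gzip.decompress(base64.b64decode(text)).decode("utf-8"))

# Repetitive data, similar in spirit to per-part objects in suspend data.
data = [{"exec_path": "", "studentAnswer": "[ [ true ], [ false ] ]", "results": []}] * 50
plain = json.dumps(data, separators=(",", ":"))
packed = compress_suspend_data(data)
print(len(plain), len(packed))
```

gzip adds a fixed header, so very small inputs can come out longer than they went in; the gain comes from repetition across parts and questions.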

@christianp
Member Author

The problem is that CompressionStream is recent, and only has about 80% browser support at the moment.
I tried using lz-string.js, but with base64 encoding it only halves the size, and I'm not sure UTF-16 encoding is safe.
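
For context on the base64 cost: standard padded base64 emits 4 ASCII characters for every 3 input bytes, rounded up, so it inflates compressed data by about a third before the SCORM character limit is applied. Encodings that pack more bits into each character avoid some of that overhead, which is presumably the appeal of the UTF-16 option. A quick check of the arithmetic:

```python
import base64
import math
import os

def base64_length(n_bytes: int) -> int:
    """Length of standard padded base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)

# Compare the formula with the real encoder on random input.
for n in (0, 1, 2, 3, 100, 1000):
    assert len(base64.b64encode(os.urandom(n))) == base64_length(n)

# e.g. 30,000 compressed bytes become 40,000 characters.
print(base64_length(30000))  # 40000
```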

@billy-woods

Hi Christian. It was me that ran into this issue. I certainly don't have any big fixes, but I do have a number of small questions and/or thoughts that might help shrink the suspend data a meaningful amount, at least for the sorts of questions I code, and probably for others too:

  1. Does the SCORM suspend data need to contain "auxiliary" variables that aren't randomised? e.g. if a = random(1..10), and b = 5*a^2, and c = 12, can you get away without storing c, and possibly not even storing b?
  2. Similar to the above: does the SCORM suspend data need to contain the whole of the advice section for each question? This can take up a huge amount of space, and is presumably just a large string with no inherent randomisation.
  3. I'm still getting integers stored in some strange ways: `"factorsa":"[ imprecise(2), imprecise(2), imprecise(2), imprecise(2), imprecise(2) ]"` (though not very often)
  4. Here's one example (from a single question!) where just trimming the spaces and changing true/false to 1/0 would save over 50% of the space: [{"exec_path":"","studentAnswer":"[ [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ] ]","results":[]},{"exec_path":"","studentAnswer":"[ [ [ true ], [ false ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ] ]","results":[]},{"exec_path":"","studentAnswer":"[ [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ] ]","results":[]},{"exec_path":"","studentAnswer":"[ [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ] ]","results":[]}]
    In fact, the word false appears over 300 times in the suspend data for this question (mostly not as part of student answers!), and if that could be changed to 0, it would save 1200 characters.
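
A rough check of item 4's estimate on a two-row answer string in the same style as above. The helper name is hypothetical, and this naive rewrite drops the distinction between booleans and numbers, so a decoder would have to know the field holds booleans.

```python
import re

def compact_jme_booleans(s: str) -> str:
    """Strip spaces and write true/false as 1/0 (illustration only:
    the boolean type is lost, and spaces inside strings would be removed too)."""
    s = s.replace(" ", "")
    s = re.sub(r"\btrue\b", "1", s)
    return re.sub(r"\bfalse\b", "0", s)

answer = "[ [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ] ]"
compact = compact_jme_booleans(answer)
print(compact)  # [[[1],[0],[0]],[[0],[0],[1]]]
print(len(compact) / len(answer))
```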

@christianp
Member Author

  1. It should already be the case that only variables which are sources of randomisation are saved. If you have an example of that not happening, please show me.
  2. The advice text isn't saved in the suspend data. What do you mean?
  3. Can you show me a question that does this?
  4. JME is strongly typed, so true is not identical to 1. We could certainly look at representing the answers to multiple choice questions in a specialised format, rather than just the JME representation which is currently used. But I think that entire list can be omitted: it's the cache of pre-submit results, which includes the student's answer, but there are no results. Can you give me a link to the question that made this, please?
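
One possible shape for a specialised multiple-choice format, as a sketch: because the decoder knows the field is a matrix of booleans, it can store one `0`/`1` character per cell and rebuild real booleans on load, so the typing is preserved. The format (rows separated by `;`) and the function names are hypothetical, not anything Numbas currently does, and the example uses a plain boolean matrix rather than the exact nested JME value above.

```python
def encode_choice_matrix(matrix: list[list[bool]]) -> str:
    """Encode a boolean matrix as rows of '0'/'1' characters joined by ';'."""
    return ";".join("".join("1" if cell else "0" for cell in row) for row in matrix)

def decode_choice_matrix(text: str) -> list[list[bool]]:
    """Inverse of encode_choice_matrix: rebuild real booleans, not integers."""
    if text == "":
        return []
    return [[ch == "1" for ch in row] for row in text.split(";")]

T, F = True, False
matrix = [[T, F, F], [F, F, T], [F, T, F], [T, F, F], [F, T, F],
          [F, F, T], [T, F, F], [T, F, F], [F, F, T]]
encoded = encode_choice_matrix(matrix)
print(encoded)  # 100;001;010;100;010;001;100;100;001
```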


2 participants