Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect literal Character encoding/decoding in SistaV1 bytecode set #618

Closed
nicolas-cellier-aka-nice opened this issue Mar 9, 2022 · 5 comments

Comments

@nicolas-cellier-aka-nice
Copy link
Contributor

As reported by Marcel on Squeak-dev, this snippet:

Compiler evaluate: (String with: $$ with: (Character value: 16r8000)).

will return a Character with a negative value (-32768).

This is because character literal in the range 16r100-16rFFFF are encoded with an extended B bytecode, and the extended B is interpreted as signed value.

I suggested

  • a workaround : using (extB bitAnd: 16rFF) << 8 to reconstruct an unsigned Character value
  • a fix: using extA instead of extB so as to have it interpreted unsigned

See https://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.diff for the former solution.

The later solution require more changes and a careful recompilation of CompiledMethods at image side.

@nicolas-cellier-aka-nice
Copy link
Contributor Author

nicolas-cellier-aka-nice commented Mar 9, 2022

Note that restricting EncoderForSistaV1>>genPushCharacter: in the range 0-16r7FFF is a safe image-side workaround until the VM is fixed.

See https://source.squeak.org/trunk/Compiler-nice.471.diff

@eliotmiranda
Copy link
Contributor

I think both image and VM have to be fixed. This was a slip by me; apologies. Clearly, the bytecode is wrongly specified. Instead of taking the signed extension B it should take the unsigned extension A. This was a copy-paste error copying the push integer bytecode. So instead of
233 11101001 i i i i i i i i Push Character #iiiiiiii (+ Extend B * 256)
the bytecode should be
233 11101001 i i i i i i i i Push Character #iiiiiiii (+ Extend A * 256).
I will make both of these changes.

@eliotmiranda
Copy link
Contributor

I think it's safe to just make the change. The extended form isn't in use in a VMMaker image (trunk + a lot of other stuff) so it won't affect most people. Marcel will be affected but he can update to a new VM (right Marcel)?

Here's code that checks:
self sn browseAllSelect: [:m| m scanFor: [:a :b :c| a = 16rE1 and: [c = 16rE9]]]

Note that the change is unfortunate. Because the existing VM doesn't clear extA afterwards setting extA can have disastrous effects in the old VM. e.g. if followed by a block creation bytecode then that bytecode will consume the extension not cleared by the push character bytecode. But as I said it is unlikely to bite anyone. Fingers crossed.

Hacking the existing spec to use a sign extension is less harmful in the short term, but more harmful in the long term.

@eliotmiranda
Copy link
Contributor

I shall make the new VM take either extA or extB and clear them both, at least temporarily. We can then change the image code to use extA, and hope that no-one is affected, because Marcel, you can upgrade to a new VM and Nicolas, your fix to avoid the use of the bytecode will save us for now.

@krono
Copy link
Member

krono commented Mar 11, 2022

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants