Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to modify string encoding #6490

Closed
vincent-mirian-google opened this issue Dec 13, 2022 · 6 comments
Closed

Unable to modify string encoding #6490

vincent-mirian-google opened this issue Dec 13, 2022 · 6 comments

Comments

@vincent-mirian-google
Copy link

vincent-mirian-google commented Dec 13, 2022

When setting the global encoding to ascii as seen in the following piece of code:

static void string_values() {
    config cfg;
    cfg.set("unicode", false);
    context c;
    c.set("encoding", "ascii");
    std::cout << "string_values\n";
    expr s = c.string_val("abc\n\n\0\0", 7);
    std::cout << s << "\n";
    std::string s1 = s.get_string();
    std::cout << s1 << "\n";
}

the output is shown in unicode:

string_values
"abc\u{a}\u{a}\u{0}\u{0}"
abc\u{a}\u{a}\u{0}\u{0}

Through a series of tests, it is concluded that the encoding parameter is not set to "ascii". What is the manner to set the encoding to "ascii"?

Note that in earlier versions of z3, the default encoding was ascii.

@vincent-mirian-google
Copy link
Author

vincent-mirian-google commented Dec 13, 2022

Also, note that, in the block that presumably handles ascii, the strm variable is not assigned.

@NikolajBjorner
Copy link
Contributor

the escape characters are just \u{}. I don't have different escape characters for ascii. The character range for ascii is still 256.
There is no global parameter cfg.set("unicode", false);
If you use cfg, then pass it to the constructor for context.

void main() {
    config cfg;
    cfg.set("encoding", "ascii");
    context c(cfg);
    std::cout << "string_values\n";
    expr s = c.string_val("abc\n\n\0\0", 7);
    std::cout << s << "\n";
    std::string s1 = s.get_string();
    std::cout << s1 << "\n";
}

@vincent-mirian-google
Copy link
Author

vincent-mirian-google commented Dec 14, 2022

Thank you for your reply @NikolajBjorner .

When using either:

    config cfg;
    cfg.set("encoding", "ascii");
    context c(cfg);

or

    context c;
    c.set("encoding", "ascii");

and adding the following to zstring::encode():

    if (get_encoding() == string_encoding::ascii) {
        return "ASCII";
    }
    if (get_encoding() == string_encoding::unicode) {
        return "UNICODE";
    }

the return is always "UNICODE".

Also, in previous versions of Z3, a std::string(1, 7) encoded in ascii returned a single byte. In the current version, it returns five bytes: four for the escape sequence and one byte for the character.

@vincent-mirian-google
Copy link
Author

For reference, in previous versions of Z3, ascii escape sequence used byte escapes literals in the format \x%%, where '%' are hexadecimal digit. For example, 7 would have the following escape sequence \x07.

@NikolajBjorner
Copy link
Contributor

  1. There isn't a standard for ASCII at this point. The behavior (how escape strings are printed) can be bent according to desires. The ASCII version was home made and before a standard for unicode. z3 also lets you use 16 byte characters for Java use cases.

  2. I will have to check what is going on with the zstring::encode() issue. Sounds like you are hitting some inconsistencies.

@NikolajBjorner
Copy link
Contributor

The bug with not setting ascii over programmatic parameter updates is fixed.
The bug must have slipped as tt was previously only really tested over the command-line.

The format of escape characters has changed. I removed support for the old escape characters to avoid supporting a mixture of escape conventions. The convention of using \u{XYZ} could appear awkward and different from mainstream conventions, but as far as I can tell can be used independently of the character encoding (unless you choose some character range where the printable ascii and escape sequences themselves cannot be expressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants