Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String escaping #792

Closed
mahesh-hegde opened this issue Aug 26, 2022 · 18 comments · Fixed by #825
Closed

String escaping #792

mahesh-hegde opened this issue Aug 26, 2022 · 18 comments · Fixed by #825
Assignees
Labels
package:jni package:jnigen type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@mahesh-hegde
Copy link
Contributor

mahesh-hegde commented Aug 26, 2022

Currently we are pasting integer, float and String constants in java classes into dart classes directly. There are few edge cases though, where some non-escapable characters get included.

  1. Escape string on java side: I should pull apache commons text for this one purpose, String escape doesn't seem to be in standard library.

  2. Escape in dart: same story here. There's a strings package but that seems to be discontinued.

Ideally there should be something like https://pkg.go.dev/strconv#Quote

Otherwise should code this logic manually.

@mahesh-hegde
Copy link
Contributor Author

We can think this issue from other direction.

Currently, static fields are not binded through FFI, instead they are translated as dart const static fields. That's why this issue arises.

But with strings, we probably want a getter, because it's rare to use java string literal value in dart (by my experience). But when we pass such static string field to Java code, we need to convert it to Java string.

So maybe it's better to leave it as a getter and not convert to direct literal.

cc: @dcharkes .

@dcharkes
Copy link
Collaborator

Good observation. Let's see what is most common in practise. If both are common, we can also generate both. (The Dart consts will be tree-shaken anyway in AOT.)

@dcharkes
Copy link
Collaborator

One benefit of having them in Dart is easier reasoning about your code when writing code in Dart. E.g. doing a switch or equality check on something. When in the debugger you can see the string contents.

@HosseinYousefi
Copy link
Member

We can generate the strings as raw strings like r"content". So we won't need to have any escaping logic.

@mahesh-hegde
Copy link
Contributor Author

I am afraid it may not cover all cases. For example, java string contains escaped character like \n or \r.

@HosseinYousefi
Copy link
Member

I am afraid it may not cover all cases. For example, java string contains escaped character like \n or \r.

r"\n" is basically "\\n", right? What would be the issue here?

@mahesh-hegde
Copy link
Contributor Author

mahesh-hegde commented Mar 7, 2023

Has been a while since I dealt with that code. IIRC

let's say you have static final String NEWLINE = "\n"

java parser gives you string with actual newline character, that is { '\n' }

you have got to escape this, either in java or in dart. I can't find an equivalent of say Go's strconv.Quote, which does this escaping properly.

There may be a way to get literal string on Java parser side, but I think I couldnt find one, or there were edge cases where Java escapes differently than dart which may be problematic again. My memory about this issue is a bit fuzzy. I think we even found a pub package offering this Quote functionality but that package was abandoned.

@mahesh-hegde
Copy link
Contributor Author

Here's an example.

https://github.com/dart-lang/jnigen/blob/e492c8f486171aebc67def75d54f3a02390c94de/jnigen/example/pdfbox_plugin/jnigen.yaml#L58

If you google the javadoc for excluded field, you get the string literal \", which java parser turns into ". It passes through JSON library which has handling for all these cases, we get " in dart, we try to wrap it in quotes. But that's an invalid dart string literal.

If you use single quotes to wrap strings, you will get same issue for a string \'.

So the plausible solutions are:

  1. Implement escaping ourselves

  2. expose static final string fields like any other java field, i.e through runtime field getter.

with trade offs discussed above.

@HosseinYousefi
Copy link
Member

you have got to escape this, either in java or in dart. I can't find an equivalent of say Go's strconv.Quote, which does this escaping properly.

Let me see if I understand this correctly. So Java Parser parses the actual value of the String, for example \" turns into " and \n turns into a single new line character (so nothing printable).

Our task here is to undo this parsing method, which is to turn " back to \" and turn the new line character into \n, correct?

Now if we use something like r''' content ''' this will take care of almost all the culprits, with the exception of ''' itself, everything should work.


To combat the problem with the different quotes we can split the string into multiple sections after each type of quote begins for example (turning ' to s and " to d for easier reading experience):

"ss ddd d dd sss ss s dd dd"

gets split into:

"ss "
"ddd d dd "
"sss ss s "
"dd dd"

We can write the correct raw string for each one of the splits:

r"""ss """
r'''ddd d dd '''
r"""sss ss s """
r'''dd dd'''

@mahesh-hegde
Copy link
Contributor Author

That would work for strings with quotes only, but literally every escaped character will be in its binary form. (consider non printable characters such as control+[a-z] characters, maybe even tab key). You will have to escape them because you can't store them reliably in text file.

Second, will a raw string with complicated construction show up properly on hover / documentation?

@HosseinYousefi
Copy link
Member

That would work for strings with quotes only, but literally every escaped character will be in its binary form. (consider non printable characters such as control+[a-z] characters, maybe even tab key). You will have to escape them because you can't store them reliably in text file.

I see.

I guess what I'm saying here is that what we currently have in place in code is a .replaceAll('\\', '\\\\') and that can be easily improved by using a raw string which can cover more cases, for example org.apache.pdfbox.contentstream.operator.OperatorName#SHOW_TEXT_LINE_AND_SPACE. This change doesn't close this issue

Second, will a raw string with complicated construction show up properly on hover / documentation?

For me it shows the exact definition, your milage may vary in your specific IDE.


In general, I think we usually want to pass the static finals as arguments when calling Java methods, so having them [only] as Dart strings would not be the best idea.

@mahesh-hegde
Copy link
Contributor Author

So the verdict is to have the same static final constants as JString and String.

  1. How should we name two fields?

    • FINAL_FIELD and FINAL_FIELD_DART_VALUE
    • FINAL_FIELD_JSTRING and FINAL_FIELD
    • FINAL_FIELD_JSTRING and FINAL_FIELD_DART_VALUE
  2. Should the JString form be a

    • Normal field getter?
    • static final field, which will be initialized to a global ref on first access.?

Regarding escaping of the literal, I think we should not take any shortcut, rather look at dart spec and write an escaping function.

@mahesh-hegde mahesh-hegde added type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) package:jnigen labels Apr 13, 2023
@mahesh-hegde
Copy link
Contributor Author

From discussion

  • Implement static final string as getter (static final primitives can be values)

  • Leave doc comment containing original value

  • For static final char, generate primitive-valued const field, but leave a comment with stringified value.

@HosseinYousefi
Copy link
Member

Sidenote: we could turn all the non-alphanum characters into \u1234 form. This is probably what I will do for the content in the comment.

@HosseinYousefi HosseinYousefi self-assigned this Sep 25, 2023
@HosseinYousefi
Copy link
Member

I think we can have a special "protected" constructor for JString that has its underlying Dart String representation. In general this is useful, since Java strings are immutable.

JString can have a private String? _dartString that basically caches the result of toDartString. Along with this we can have a _fromRefAndString(super.reference, this._dartString) constructor.

This is useful for multiple reasons:

  1. Multiple calls to .toDartString for the same JString is going to be faster.
  2. static final String s in Java can be generated as static final JString s = JString._fromRefAndString(theRef, 'abc\u{123}'); with a quick toDartString.
  3. 'abc'.toJString() and JString.fromString('abc') can now fill the cache so again .toDartString can be more efficient.

@dcharkes What do you think?

@dcharkes
Copy link
Collaborator

If performance is an issue, sgtm.

@HosseinYousefi
Copy link
Member

HosseinYousefi commented Sep 25, 2023

If performance is an issue, sgtm.

It's not necessarily an issue. It's mainly useful for checking the content of static finals, There were ideas of having two fields, like FIELD and FIELD_DART_VALUE to include both JString and String. But this can combine both. And the content of the String will be visible in the generated code at a glance similar to commenting.

The other performance benefits are merely an added bonus.


We could even consider not filling up the cache for other situations to optimize memory usage (since the string would be stored both in Dart and Java.)

@dcharkes
Copy link
Collaborator

And the content of the String will be visible in the generated code at a glance similar to commenting.

Ah right. Sounds good.

HosseinYousefi added a commit that referenced this issue Nov 27, 2023
* Revert "[jnigen] Add `JLazyReference` and `JFinalString` (dart-lang/jnigen#400)"

This reverts commit 8456a7c.

* Close #792
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:jni package:jnigen type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
Status: Done
3 participants