Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shouldn't whitespace characters be escaped as well #15

Closed
FSMaxB opened this issue Jan 5, 2018 · 19 comments
Closed

Shouldn't whitespace characters be escaped as well #15

FSMaxB opened this issue Jan 5, 2018 · 19 comments

Comments

@FSMaxB
Copy link

FSMaxB commented Jan 5, 2018

I'm not sure if this is something that you want for this library, but since I'm trying to use it to display a string mostly as is in LaTeX, it would be really useful if it would replace whitespace characters as well, because otherwise I would have to do it myself.

See josdejong/mathjs#995

@dangmai
Copy link
Owner

dangmai commented Jan 5, 2018

I'm not opposed to the idea, but since I haven't used LaTeX in a while I'd need some help implementing that functionality, could you give me some pointers about:

  • This seems to be mostly useful in Maths mode, right? Normal text mode seems to be fine with white spaces.
  • Is there a convention on what character(s) the white space should be converted to?

@FSMaxB
Copy link
Author

FSMaxB commented Jan 5, 2018

I'm not too deep into LaTeX myself, but I consider ~ which is a nonbreaking space in LaTeX a good replacement, although a different one might be necessary for tabs.

This would be useful not only in math mode but also in normal text if multiple whitespaces should produce multiple ones in the output.

One thing I'm not sure about is if this is something that should be enabled by default or require a configuration option (like a second parameter for example).

The actual escaping can be done using a regex that is applied after all previous replacements are complete.

I can make a pull request if you want, once the technical details have been agreed upon.

@dangmai
Copy link
Owner

dangmai commented Jan 5, 2018

Thanks for the quick response, here are a few quick thoughts I have before I leave the office:

  • To me, there are 2 different types of escaping:

    • One is "strict" escaping, which basically means the generated LaTeX source is valid. This is the type of escaping that this library is currently doing (except for en- and em- dashes, which I added because of a user request and didn't think through).
    • Another one is "format" escaping, which makes sure that the final LaTeX looks the same as the Javascript input string. If I understand this correctly, this is what you want to add.
  • For the sake of consistency and backwards compatibility, I'd like the library to default to "strict" mode, but have the option to turn on "format" mode. This could be turned on by some param.

  • A possible issue with adding "format" escaping is that there are many things we'll have to take into account: dashes, spaces, tabs, new lines, and more that I can't come up with right now. So I'd love for us to include as much as we can, but also give the user an option to include their own escapes if we miss some (or to remove escapes that they don't need).

  • Another possible issue is that the library currently escapes a string recursively, which means a risk of stack overflow if we break up a very large string at every white space character. I'd like to come up with a safer solution before we put in white space escapes, maybe some form of event emitters.

I'll mull over this a bit more this weekend, but feel free to put your thoughts here if you have any.

@FSMaxB
Copy link
Author

FSMaxB commented Jan 5, 2018

Thanks for the quick response

Thank you for responding to me. This is your library after all.

I agree with your points about different types of escaping and that they should be separate.

Another possible issue is that the library currently escapes a string recursively, which means a risk of stack overflow if we break up a very large string at every white space character. I'd like to come up with a safer solution before we put in white space escapes, maybe some form of event emitters.

For replacing whitespace this isn't an issue at least, because it only requires one replace via a regex after all the previous replacements are finished. But I will have to take a closer look at your implementation again.

@FSMaxB
Copy link
Author

FSMaxB commented Jan 5, 2018

And if recursion is an issue, it just has to be rewritten procedurally in a loop. It might make sense to first find all the matches, split up the string into those and then replace them one by one by a simple lookup.

@dangmai
Copy link
Owner

dangmai commented Jan 7, 2018

Quick progress update: I've modified the algorithm to be iterative instead of recursive and it doesn't overflow the stack anymore for very large strings. I'll tackle the different type of escapes in the next couple of days.

@dangmai
Copy link
Owner

dangmai commented Jan 8, 2018

I've added a param that let the user choose whether to preserve formatting or not, and also one that lets them modify the escape mapping if they so choose. At this point, we should brainstorm about which characters that should be formatted. I'll do some research about it in the next couple of days, do you have suggestions in the mean time?

@FSMaxB
Copy link
Author

FSMaxB commented Jan 8, 2018

Thanks!

For now I can only think of space -> ~ and tab -> maybe \qquad.

@dangmai
Copy link
Owner

dangmai commented Jan 11, 2018

Do you think we need one for \n? Looks like it can be \\ or \newline, but it seems that those won't stack (i.e. multiple new lines won't be generated by LaTeX)

@FSMaxB
Copy link
Author

FSMaxB commented Jan 11, 2018

Strange, for me they do. But not in math mode though. But I think it is just not possible in math mode. (at least not without doing any matrix tricks or similar.

@dangmai
Copy link
Owner

dangmai commented Jan 16, 2018

Sorry it took me a long time to get back to this. I've published a beta version that contains the new changes we've discussed in this issue, you can try it by running:

npm install escape-latex@next

The README has the most up-to-date API for the library. Feel free to try it out and let me know what you think. I'll push the actual release version in about a week if nothing is wrong with the beta version.

@FSMaxB
Copy link
Author

FSMaxB commented Jan 22, 2018

I just started integrating the new version and it generally works great, but there is one problem that has to be taken into account: If a word follows a tab or newline, it breaks the \qquad and \newline.

I think this can be fixed by adding an additional {} to the end, the same as with \textbackslash{} for example.

@dangmai
Copy link
Owner

dangmai commented Jan 23, 2018

Thank you for spotting that! I totally forgot about that case 🤦‍♂️

I've pushed version 1.0.0-beta.2 to fix that issue.

@FSMaxB
Copy link
Author

FSMaxB commented Jan 24, 2018

Your beta2 doesn't actually have the fix in it!

@FSMaxB
Copy link
Author

FSMaxB commented Jan 24, 2018

(either that or there is a bug in npm)

@dangmai
Copy link
Owner

dangmai commented Jan 24, 2018

Nope no bug on npm side, just carelessness on my part! 😊 I've pushed 1.0.0-beta.3 and verified that it has the fix in it

@FSMaxB
Copy link
Author

FSMaxB commented Jan 25, 2018

Now I've tested it and it works great. I haven't tested the functions that allows changing the escape map though.

@dangmai
Copy link
Owner

dangmai commented Jan 25, 2018

Ok great, I'll close this issue and release the final 1.0.0 version soon. Feel free to open another one if you need more escape characters or anything else.

Thanks for the help 🥇

@dangmai dangmai closed this as completed Jan 25, 2018
@FSMaxB
Copy link
Author

FSMaxB commented Jan 25, 2018

Great, thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants