I'm sorry if this is not on topic as you are talking about HTML primarily.
I noticed that I can't successfully translate texts that contain \n through the command line. I "fixed" it by substituting \n with another symbol like * before translation and restoring it once the translation is done. It works sometimes but often fails, because my input text contains different combinations of \n, \r and so on.
Is there, or could there be, a way to ignore all kinds of line breaks?
Edit
Hmm, apologies. It seems like it does indeed translate with no problems and automatically removes all carriage returns on the command line. The issue I had must have had to do with Tauri.
I'll revise my question: would it be possible to implement an option that only ignores \n and \r but does not remove them?
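For anyone hitting the same problem, one way to keep the line breaks is to handle them outside the translator. Below is a minimal sketch of that idea, not something translateLocally provides: it splits the text on \r\n, \r and \n, translates each line separately, and puts the original separators back. The function name `translate_keeping_breaks` and the `translate` callable are hypothetical; `translate` stands in for whatever you use to call the translator.

```python
import re
from typing import Callable

# Matches \r\n first so a Windows line ending stays a single separator.
LINE_BREAK = re.compile(r"(\r\n|\r|\n)")


def translate_keeping_breaks(text: str, translate: Callable[[str], str]) -> str:
    """Translate each line on its own and restore the original separators.

    `translate` is a hypothetical stand-in for the real translation call.
    """
    # With a capturing group, re.split returns text at even indices
    # and the matched separators at odd indices.
    parts = LINE_BREAK.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1 or not part.strip():
            out.append(part)  # separator or blank segment: keep verbatim
        else:
            out.append(translate(part))
    return "".join(out)


if __name__ == "__main__":
    # str.upper is only a placeholder "translator" for demonstration.
    print(repr(translate_keeping_breaks("first line\r\nsecond line\n", str.upper)))
```

The trade-off is that every line becomes its own translation unit, so a sentence that is wrapped across several lines gets translated in pieces.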
Can we add a command line option that makes translateLocally CLI not treat a new line as a paragraph boundary?
I'm aware the sentence splitter has a mode where it just ignores newlines, and will treat them as tokens in the sentence. I'm not sure how these survive the translation. I'm assuming they'd need to be part of the vocab? None of our models are trained on sentences that contain newlines.
Pretty sure the sentence splitter, when configured to unwrap lines, just replaces each newline with a space. There is no formatting preservation there. I guess we could do something alignment based, but the main use case for this is wrapped text with what are effectively soft returns. And soft returns just go back in at the column width, not at a semantic position.
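To illustrate what unwrapping means here, a rough sketch follows. It is not the sentence splitter's actual code; `unwrap_lines` is a hypothetical helper, and treating blank lines as paragraph boundaries is an assumption made for the example.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join soft-wrapped lines with spaces (hypothetical helper)."""
    # Normalise \r\n and bare \r to \n so only one separator needs handling.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: runs of two or more newlines separate paragraphs.
    paragraphs = re.split(r"\n{2,}", text)
    # Within a paragraph, a single newline is a soft return: make it a space.
    return "\n\n".join(" ".join(p.split("\n")) for p in paragraphs)


if __name__ == "__main__":
    print(repr(unwrap_lines("a wrapped\r\nline\n\nnext\rparagraph")))
```

Once the newline has become a space, nothing in the output records where it was, which is why the original line breaks cannot simply be put back after translation.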
In case anyone wonders, I put a band-aid on it for now by setting the container to the longest line's width and using automatic word-breaking. Replacing \n or \r with a symbol didn't work reliably, because the substitute symbol broke some of the translations.
Lifted from @Godnoken #81 (comment)