Handling \r & \n on CLI #120

jelmervdl · 2022-10-03T14:36:18Z

I'm sorry if this is not on topic as you are talking about HTML primarily.

I noticed that I can't successfully translate texts that contain \n through the command line. I "fixed" it by substituting \n with another symbol such as * before translation and reversing the substitution afterwards. It works sometimes but often fails, because my input mixes different line terminators (\n, \r and so on).

Is there or would there be a way to ignore all sorts of carriage returns?

Edit

Hmm, apologies. It seems like it does indeed translate with no problems and automatically removes all carriage returns on the command line. The issue I had must have had to do with Tauri.

I'll revise my question: would it be possible to implement an option that ignores \n and \r without removing them?
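For readers who need this today, one workaround outside the translator is to split the input on its line terminators, translate each line on its own, and put the original terminators back. The following is only a minimal sketch under assumptions: `translate` is a hypothetical callable standing in for whatever invokes the translator (it is not a translateLocally API), and `translate_preserving_breaks` is a name made up for this example.

```python
import re
from typing import Callable

# Match \r\n before \r or \n so a Windows line ending stays one separator.
LINE_BREAK = re.compile(r"(\r\n|\r|\n)")


def translate_preserving_breaks(text: str, translate: Callable[[str], str]) -> str:
    """Translate each line separately and restore the original terminators.

    `translate` is a hypothetical stand-in for the real translation call.
    """
    # The capture group makes re.split keep the separators:
    # even indexes hold text, odd indexes hold the line terminators.
    parts = LINE_BREAK.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)             # terminator, copied through unchanged
        elif part.strip():
            out.append(translate(part))  # non-empty line, sent to the translator
        else:
            out.append(part)             # empty or whitespace-only line, kept as-is
    return "".join(out)


if __name__ == "__main__":
    # str.upper is used as a dummy "translator" for demonstration only.
    print(repr(translate_preserving_breaks("one\r\ntwo\n\nthree\rfour", str.upper)))
```

Because every line is translated in isolation, this suits hard line breaks. A sentence that is soft-wrapped across several lines would be translated in fragments.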

Can we add a command line option that makes translateLocally CLI not treat a new line as a paragraph boundary?

I'm aware the sentence splitter has a mode where it just ignores newlines, and will treat them as tokens in the sentence. I'm not sure how these survive the translation. I'm assuming they'd need to be part of the vocab? None of our models are trained on sentences that contain newlines.

kpu · 2022-10-03T14:51:59Z

Pretty sure the sentence splitter, when configured to unwrap lines, just replaces each newline with a space. There is no formatting preservation there. I guess we could do something alignment-based, but the main use case for this is wrapped text with what are effectively soft returns. And soft returns just go back in at the column width, not at a semantic position.
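The unwrap-then-rewrap behaviour described above can be illustrated roughly as follows. This is not the sentence splitter's actual code, only a sketch of the idea: `unwrap` and `rewrap` are hypothetical helper names, and this simplified version also collapses blank lines, so paragraph boundaries are not preserved.

```python
import textwrap


def unwrap(text: str) -> str:
    """Join wrapped lines with single spaces (newlines become spaces)."""
    # Normalize \r\n and bare \r to \n before splitting.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return " ".join(line.strip() for line in lines if line.strip())


def rewrap(text: str, width: int = 40) -> str:
    """Reinsert soft returns at a column width, not at semantic positions."""
    return textwrap.fill(text, width=width)


if __name__ == "__main__":
    wrapped = "The quick brown\nfox jumps\r\nover the lazy dog."
    flat = unwrap(wrapped)   # this is what would be sent to the translator
    print(flat)
    print(rewrap(flat, width=20))
```

The new line breaks depend only on the chosen width, so they generally will not fall where the original ones did.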

Godnoken · 2022-10-12T16:21:02Z

Thank you for bringing this up.

I put a band-aid on it for now, in case anyone wonders, by setting the container to the width of the longest line and using automatic word-breaking. Replacing \n or \r with a symbol didn't work reliably, because some translations broke when the symbol was present.
