translating xml - spaces deleted at end of text string before <b> bold #35

Closed
vf211 opened this issue May 5, 2022 · 6 comments

@vf211

vf211 commented May 5, 2022

Hello, I am using this API to translate XML documents from Schema. Some of my text elements contain bold sections, and I use lxml.etree to parse the file and translate each element's text and tail. After translation, the space in front of each bold element is removed.
Is there anything I can do to keep the spaces?

@daniel-jones-deepl
Member

Hi @vf211, thanks for creating this issue.

Can you please provide a short example? I'm assuming you're using translate_text(); are you using tag_handling?

@vf211
Author

vf211 commented May 6, 2022

Dear Daniel, thank you so much for your quick reply. Here is an example of text from the XML file where I have a problem:

1. Tap Set Up (1).

When it translates, it replaces this with the translated version but removes the space after "Tap", like this:

1. TapSet Up (1).

Here is the code I use. The issue is when we have bold text within the text element.

import lxml.etree as ET
import deepl

file = 'New_project_en-US_to_it-IT.xml'  # source file that needs translating
file2 = 'doc2.xml'  # file the translated tree is saved to
lang = 'IT'  # language to translate to

tree = ET.parse(file)
root = tree.getroot()

translator = deepl.Translator("53f18f3b-ba9c-...")  # API key from DEEPL API

# Translate the text and the tail of every element separately
for elem in root.iter():
    try:
        if elem.text:
            elem.text = elem.text.replace(elem.text, translator.translate_text(elem.text, target_lang=lang).text)
        if elem.tail:
            elem.tail = elem.tail.replace(elem.tail, translator.translate_text(elem.tail, target_lang=lang).text)
    except AttributeError:
        pass
    except deepl.DocumentTranslationException as error:
        doc_id = error.document_handle.id
        doc_key = error.document_handle.key
        print(f"Error after uploading document ${error}, id: ${doc_id} key: ${doc_key}")
    except deepl.DeepLException as error:
        print(error)

tree.write(file2)

@daniel-jones-deepl
Member

Hi @vf211, thanks for the reply. Please avoid exposing your authentication key in public; I've edited your post above. You should now regenerate your authentication key on your account page.
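
A minimal sketch of one way to keep the key out of shared code, assuming it is stored in an environment variable. The variable name DEEPL_AUTH_KEY and the helper load_auth_key are hypothetical; the library does not require either.

```python
import os


def load_auth_key(env_var: str = "DEEPL_AUTH_KEY") -> str:
    """Read the authentication key from the environment.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

The returned value could then be passed to deepl.Translator(...) in place of the literal key.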

Your code looks okay.
In the line elem.text = elem.text.replace(elem.text, translator.translate_text(elem.text, target_lang=lang).text) could you share an example value of an elem.text passed to translate_text that contains the <b> elements you mentioned?
For example by adding print(elem.text) above that line.

I tried translating "Tap <b>Set Up</b> (1)" with target_lang="IT" but the result I get is "Tocca <b>Set Up</b> (1)" with a space correctly before the <b> tag.
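
One possible explanation for the difference, stated as an assumption rather than something confirmed in this thread: the loop above sends elem.text and elem.tail as separate fragments (for example "Tap " on its own), and the whitespace at the edges of a fragment may not come back in the translation. A small sketch of a workaround that re-attaches the original leading and trailing whitespace follows. The names translate_keeping_whitespace and translate are hypothetical; the callable stands in for something like lambda s: translator.translate_text(s, target_lang=lang).text.

```python
from typing import Callable


def translate_keeping_whitespace(text: str, translate: Callable[[str], str]) -> str:
    """Translate the non-blank core of text and restore its edge whitespace."""
    core = text.strip()
    if not core:
        # Nothing but whitespace (or empty): return it unchanged.
        return text
    # Slice off whatever whitespace surrounded the core in the original.
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    return f"{lead}{translate(core)}{trail}"
```

This only protects the boundaries of each fragment; it does not give the translation engine the full sentence as context.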

@vf211
Author

vf211 commented May 6, 2022

Dear Daniel, thank you again for your reply.
Here is another snippet where the space was lost in translation, this time around Lexiconlink tags, so "il Raggi" comes out as "ilRaggi".

Questo messaggio viene visualizzato quando un pacchetto raggiunge ilRaggi X rivelatore ma ilcodice a barre non viene letto.

Controllare quanto segue:


and this was the English that went into the code:

This message is displayed when a pack reaches the X-ray detector but the barcode is not read.

Check the following:

The alignment of the barcode scanner.


@vf211
Author

vf211 commented May 6, 2022

Hi Daniel,

I added the xml tagging into the code as you suggested in your first email and that seems to have fixed our problems. Thank you so much for all your help!
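
The exact change is not shown in the thread. As a rough sketch of what passing tag_handling="xml" could look like: serialize an element together with its inline children, translate it in one request, and parse the result back. The sketch uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra packages. translate_element_as_xml is a hypothetical helper, and it assumes the service returns well-formed XML with the same outer tag.

```python
import copy
import xml.etree.ElementTree as ET


def translate_element_as_xml(elem, translator, target_lang):
    """Translate elem's text and inline children in a single request.

    `translator` is expected to behave like deepl.Translator, i.e. expose
    translate_text(text, target_lang=..., tag_handling=...) and return an
    object with a .text attribute.
    """
    # Serialize a copy without the tail so the string is exactly one element.
    clone = copy.deepcopy(elem)
    clone.tail = None
    source_xml = ET.tostring(clone, encoding="unicode")

    result = translator.translate_text(
        source_xml, target_lang=target_lang, tag_handling="xml"
    )

    # Assumption: the response is well-formed XML; ET.fromstring raises
    # ET.ParseError if it is not.
    translated = ET.fromstring(result.text)

    # Swap in the translated content; elem's attributes and tail are untouched.
    elem.text = translated.text
    elem[:] = list(translated)
    return elem
```

With the real library this might be called as translate_element_as_xml(paragraph, translator, "IT") for each paragraph-level element, rather than for every node returned by root.iter().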

@daniel-jones-deepl
Member

Hi @vf211, great to hear that you have fixed the problems, thanks for the update!
