Use beta-handling of HTML tags in SUMM.AI API #2263

Open
Tracked by #1928
timobrembeck opened this issue May 8, 2023 · 3 comments
Labels
💡 feature New feature or request ❗ prio: medium Should be scheduled in the forseeable future. ⛔ blocked Blocked by external dependency ❓ question Further information is requested ☺️ effort: low Should be doable in <4h

@timobrembeck
Member

Motivation

SUMM.AI published a beta version of their API that can handle basic HTML input (only links and bold text at the moment) when the flag input_text_type is passed:

https://summai.notion.site/SUMM-API-Documentation-fc41fa5ef1434ad2b311be4876b4685a?p=e8436a4a6d154249a70d32ba45c88e2e&pm=s

Proposed Solution

  • Add input_text_type="html" to the SUMM.AI API call
  • Do a bit of additional testing whether additional changes to the API client are required
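
A minimal sketch of the first bullet, assuming the API client builds its request body as a plain dict. Only the flag name input_text_type and the value "html" come from the SUMM.AI documentation linked above; the helper name with_html_input and the key "input_text" are placeholders for illustration, not the real client code.

```python
def with_html_input(payload: dict) -> dict:
    """Return a copy of the request payload with HTML input enabled.

    Only "input_text_type" is taken from the issue text; everything
    else in the payload is passed through unchanged.
    """
    return {**payload, "input_text_type": "html"}


# "input_text" is a hypothetical key used only for this example.
example = with_html_input({"input_text": "<strong>Willkommen</strong> in Augsburg"})
```

Returning a copy rather than mutating the dict keeps the existing plain-text code path untouched while the beta flag is being tested.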
@timobrembeck timobrembeck added 💡 feature New feature or request ❗ prio: medium Should be scheduled in the forseeable future. ☺️ effort: low Should be doable in <4h labels May 8, 2023
@timobrembeck timobrembeck added this to the 23Q3 milestone May 8, 2023
@seluianova seluianova self-assigned this Aug 6, 2023
@svenseeberg svenseeberg modified the milestones: 23Q3, 23Q4 Sep 23, 2023
@timobrembeck timobrembeck modified the milestones: 23Q4, 24Q1 Jan 10, 2024
@seluianova
Contributor

seluianova commented Feb 7, 2024

The purpose of this change is to preserve the formatting (bold, links) in the translated text, right?

In my testing, SUMM AI retains HTML tags only in some cases, and I couldn't work out the exact logic.

For example, if I translate the following text in HTML mode:

Deutsches Recht (<strong>Gesetzessammlungen</strong>) ist darüber hinaus der Name von verschiedenen Gesetzessammlungen, die das zum Erscheinungszeitpunkt in Deutschland geltende <a href='http://google.com'>Recht</a> oder auch nur bestimmte Teile davon, ggf. auch außer Kraft getretenes Recht, zusammenstellen und auf einfache Weise verfügbar machen.

I get the result without the boldface and link:

<p>Es gibt auch Bücher mit den Gesetzen.<br/>Die Bücher heißen: Deutsches Recht.<br/>In den Büchern stehen die Gesetze von Deutschland.<br/>Manchmal stehen auch nur die alten Gesetze in den Büchern.</p>

In addition, the position of the tags in the text sometimes affects the translation result in unexpected ways.

For example, if I wrap a different word of the text above in 'strong', I get a different translation 🤔

Request:
<strong>Deutsches Recht</strong> (Gesetzessammlungen) ist darüber hinaus der Name von verschiedenen Gesetzessammlungen, die das zum Erscheinungszeitpunkt in Deutschland geltende <a href='http://google.com'>Recht</a> oder auch nur bestimmte Teile davon, ggf. auch außer Kraft getretenes Recht, zusammenstellen und auf einfache Weise verfügbar machen.

Result:
<strong>Deutsches Recht</strong> (Gesetzes-Sammlungen) ist darüber hinaus der Name von verschiedenen Gesetzes-Sammlungen, die das zum Erscheinungs-Zeit-Punkt in Deutschland geltende <a href=\"http://google.com\">Recht</a> oder auch nur bestimmte Teile davon, ggf. auch außer Kraft getretenes Recht, zusammenstellen und auf einfache Weise verfügbar machen.

So, do we still want to introduce this?
@timobrembeck could you pls take a look?

> Do a bit of additional testing whether additional changes to the API client are required

Yes, we would need to refactor how we prepare the request to SUMM AI, because we currently strip all tags. We would also need to change how we process the response data, because paragraph handling will have to be adjusted.
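
As a rough sketch of what the request preparation could look like, the snippet below strips every tag except bold text and links (the two the beta is said to support) instead of stripping all of them. It only uses the standard library; the names AllowlistSanitizer and keep_basic_formatting are made up for this example and are not part of the current API client.

```python
from html import escape
from html.parser import HTMLParser

# Tags the SUMM.AI beta is said to support (bold text and links).
ALLOWED_TAGS = {"strong", "a"}


class AllowlistSanitizer(HTMLParser):
    """Drop every tag except <strong> and <a href=...>, keep all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Keep only the href attribute of links.
            href = dict(attrs).get("href")
            if href:
                self.parts.append(f'<a href="{escape(href, quote=True)}">')
            else:
                self.parts.append("<a>")
        elif tag == "strong":
            self.parts.append("<strong>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so the output stays valid HTML.
        self.parts.append(escape(data, quote=False))


def keep_basic_formatting(html_text: str) -> str:
    """Hypothetical helper: reduce HTML to text plus <strong> and <a>."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Paragraph boundaries are dropped here on purpose; how paragraphs are split before the request and reassembled from the response would still have to be decided separately.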

@seluianova seluianova added the ❓ question Further information is requested label Feb 7, 2024
@seluianova
Contributor

seluianova commented Feb 14, 2024

Another curious example

Input text:
<a href='http://gooogle.com'>Willkommen</a> in Augsburg

Translation result - link is preserved:
<a href=\"http://gooogle.com\">Willkommen</a> in Augsburg

But for this input text:
Willkommen in <a href='http://gooogle.com'>Augsburg</a>

Translation result - link is lost:
<p>Willkommen in Augsburg</p>
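
Since it is not predictable when tags survive, one option would be to check each response and fall back to the plain-text result whenever formatting was lost. A minimal sketch using only the standard library; lost_tags and TagCounter are hypothetical names, and the translated text is assumed to be already JSON-decoded (so without the backslash-escaped quotes shown above).

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags from a given set of tag names."""

    def __init__(self, tags):
        super().__init__()
        self.tags = tags
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.counts[tag] += 1


def lost_tags(source_html, translated_html, tags=("strong", "a")):
    """Return how many of each tracked tag are missing in the translation."""

    def count(text):
        counter = TagCounter(set(tags))
        counter.feed(text)
        counter.close()
        return counter.counts

    # Counter subtraction keeps only tags that occur more often in the source.
    return dict(count(source_html) - count(translated_html))
```

For the two Augsburg examples above, this reports a missing link for the second input and nothing for the first. It only compares tag counts, so it would not notice a link that moved to a different word.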

@JoeyStk JoeyStk added the ⛔ blocked Blocked by external dependency label Feb 27, 2024
@JoeyStk JoeyStk modified the milestones: 24Q1, 24Q2 Feb 27, 2024
@JoeyStk
Contributor

JoeyStk commented Feb 27, 2024

This was moved to the next milestone's backlog: we are still blocked by SummAI, so the change is not feasible at the moment.

@JoeyStk JoeyStk modified the milestones: 24Q2, Backlog Feb 27, 2024
@seluianova seluianova removed their assignment Mar 11, 2024