Add translation pipeline model #110

Merged
merged 12 commits into from Nov 17, 2021

Conversation

haileyschoelkopf
Collaborator

@haileyschoelkopf haileyschoelkopf commented Nov 10, 2021

Add a translation pipeline model class (other language -> translate to English -> summarize in English -> translate the summaries back to the original language).
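
A minimal sketch of the pipeline described above, assuming the final step translates the summaries back into the source language. The names `summarize_via_english`, `fake_translate`, and `fake_summarize` are hypothetical stand-ins for illustration and are not the classes added in this PR:

```python
from typing import Callable, List

# Hypothetical callable types: a translator takes (texts, target_lang),
# a summarizer takes English texts and returns English summaries.
Translator = Callable[[List[str], str], List[str]]
Summarizer = Callable[[List[str]], List[str]]


def summarize_via_english(
    docs: List[str],
    source_lang: str,
    translate: Translator,
    summarize_en: Summarizer,
) -> List[str]:
    """Translate docs to English, summarize them, translate summaries back."""
    english_docs = translate(docs, "en")
    english_summaries = summarize_en(english_docs)
    return translate(english_summaries, source_lang)


# Toy stand-ins so the sketch runs without downloading any model.
def fake_translate(texts: List[str], target_lang: str) -> List[str]:
    # Tags each text with the target language instead of really translating.
    return [f"[{target_lang}] {text}" for text in texts]


def fake_summarize(texts: List[str]) -> List[str]:
    # "Summarizes" by keeping everything before the first sentence break.
    return [text.split(". ")[0] for text in texts]


summaries = summarize_via_english(
    ["Hola mundo. Adios."], "es", fake_translate, fake_summarize
)
print(summaries)
```

Passing the translator and summarizer as callables keeps the three stages independent, so either one could be swapped for a real model without touching the pipeline function.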

Addressing #109

@haileyschoelkopf
Collaborator Author

still TODO: add this new model to the documentation

@niansong1996
Collaborator

Let's merge #96 and #98 first and maybe rebase this branch on the merged main to make the changes on the translation pipeline model actually stand out. Sorry about the delay on those PRs.

@haileyschoelkopf
Collaborator Author

@niansong1996 should be ready for review now!

README.md Outdated
@@ -87,6 +87,7 @@ SummerTime supports different models (e.g., TextRank, BART, Longformer) as well
| LongformerModel | :heavy_check_mark: | | | | |
| MBartModel | :heavy_check_mark: | | | | 50 languages (Arabic, Czech, German, English, Spanish, Estonian, Finnish, French, Gujarati, Hindi, Italian, Japanese, Kazakh, Korean, Lithuanian, Latvian, Burmese, Nepali, Dutch, Romanian, Russian, Sinhala, Turkish, Vietnamese, Chinese, Afrikaans, Azerbaijani, Bengali, Persian, Hebrew, Croatian, Indonesian, Georgian, Khmer, Macedonian, Malayalam, Mongolian, Marathi, Polish, Pashto, Portuguese, Swedish, Tamil, Telugu, Thai, Tagalog, Ukrainian, Urdu, Xhosa, Slovenian) |
Collaborator

For the languages, maybe compile a list and only put a link here to that list?

        )
        # TODO: translate each doc separately if provided multiple docs in corpus?
        if queries:
            queries = self.translator.translate(queries, target_lang="en", beam_size=4)
Collaborator

when you specify the beam_size as 4, is the output one sentence or 4?

Collaborator Author

Still just one sentence -- this is just the beam size for the translation model's generation.
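
To illustrate the point, here is a toy beam search over a hand-written probability table. It is only a sketch: `NEXT_TOKEN_PROBS` and `beam_search` are made up for this example and are not the translation library's code. The beam size controls how many partial hypotheses are kept at each step, while the function still returns a single best sequence:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "model": next-token probabilities given the previous token.
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(beam_size: int, max_len: int = 5) -> List[str]:
    """Return the single highest-scoring sequence found with the given beam."""
    # Each beam entry is (sum of log-probabilities, tokens so far).
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                # Finished hypotheses are carried over unchanged.
                candidates.append((score, tokens))
                continue
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the beam_size best hypotheses for the next step.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = candidates[:beam_size]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    best_tokens = beams[0][1]
    # Strip the start and end markers before returning.
    return [t for t in best_tokens if t not in ("<s>", "</s>")]


greedy = beam_search(beam_size=1)
wide = beam_search(beam_size=4)
print(greedy, wide)
```

In this toy table a beam of 1 commits to "the" early and ends with a lower-probability sentence, while a beam of 4 finds the better "a cat". In both cases the output is one sequence, not four.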

Collaborator

@niansong1996 niansong1996 left a comment

Looks great! Left a few comments and should be ready to merge after you respond to them!

@haileyschoelkopf
Collaborator Author

All set with the changes! (assuming you were referring to the mBART-50 listed languages in the README)

@niansong1996
Collaborator

LGTM, merging now

@niansong1996 niansong1996 merged commit eaac483 into main Nov 17, 2021
@haileyschoelkopf haileyschoelkopf deleted the nick/translationmodel branch November 18, 2021 17:26