Fix CJK and emoji stream output #309

Merged
merged 1 commit into from
Aug 29, 2023

Conversation

MeouSker77
Contributor

Missing CJK characters are caused by decoding incomplete byte sequences. This PR shows one way to fix it; I hope it can serve as a reference.
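
For readers unfamiliar with the failure mode, here is a minimal sketch of it in plain Python. It is not the code from this PR: the 2-byte chunking, the `stream_decode` helper, and the use of `codecs.getincrementaldecoder` are assumptions chosen for illustration. It shows how decoding each chunk on its own drops multi-byte characters, and how holding back incomplete bytes until the rest arrives avoids that.

```python
import codecs

def stream_decode(chunks):
    """Yield text from byte chunks, holding back incomplete UTF-8 sequences.

    Hypothetical helper for illustration only; not the function changed in this PR.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        # The incremental decoder buffers a trailing partial sequence
        # instead of emitting it, so only complete characters come out.
        piece = decoder.decode(chunk)
        if piece:
            yield piece
    # Flush whatever is left; truly broken bytes become U+FFFD here.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

text = "你好😀"  # two 3-byte CJK characters and one 4-byte emoji
data = text.encode("utf-8")

# Assumed chunking: 2 bytes at a time, so every character is split across chunks.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

# Naive approach: decode each chunk independently and ignore errors.
naive = "".join(c.decode("utf-8", errors="ignore") for c in chunks)

# Buffered approach: only emit text once a character's bytes are complete.
fixed = "".join(stream_decode(chunks))

print(repr(naive))
print(repr(fixed))
```

In this sketch the naive path loses every character because no single chunk contains a complete sequence, while the buffered path reproduces the original string.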

@gjmulder added the bug (Something isn't working), high-priority, and quality (Quality of model output) labels on Jun 2, 2023
@Equim-chan

Is there any further progress on this? This not only affects CJK but also wide Unicode characters such as emojis (see #372).

@MeouSker77 marked this pull request as ready for review on August 9, 2023, 14:08
@MeouSker77
Contributor Author

I hadn't marked it as ready for review because I wasn't sure how to handle the case where logprobs is not None.
But logprobs seems unimportant here, so I have left that case unchanged.

@MeouSker77 changed the title from "Draft: fix CJK stream output" to "Fix CJK and emoji stream output" on Aug 9, 2023
@MeouSker77
Contributor Author

Hi @gjmulder, can you approve running the workflow?

@gjmulder
Contributor

@abetlen ping

@LiuYuWei

I am writing to say that the Pull Request (PR) submitted by MeouSker77 would benefit me. This PR is extremely important for outputting Traditional Chinese characters, as it resolves the existing issue of missing characters.

While using the current version, we found that occasionally, there would be missing characters when outputting Traditional Chinese text. This not only affects the user experience but can also lead to the loss or misinterpretation of important information.

In MeouSker77's PR, they made some adjustments and optimizations to the code to ensure that no characters are missed when outputting Traditional Chinese text. We have conducted multiple tests on this PR and confirmed its effectiveness and stability.

Therefore, we strongly recommend that you consider this PR, as it not only solves a significant issue but also enhances the user experience.

Before:
(image)

After:
(image)

@MeouSker77
Contributor Author

@abetlen ping

@abetlen merged commit bae44ec into abetlen:main on Aug 29, 2023
@abetlen
Owner

abetlen commented Aug 29, 2023

@MeouSker77 thank you for the contribution, and my apologies for the delay in reviewing this PR. I should have a release pushed shortly.

@MeouSker77 deleted the fix-CJK branch on August 29, 2023, 11:35
Labels
bug Something isn't working high-priority quality Quality of model output