Add a function to check if target_word contains CJK characters #928

Merged
merged 2 commits into from Jan 9, 2023

ahmad-alkadri
Contributor

This PR is linked to Issue #904, which shows that Whoogle does not bold every occurrence of the target words in the results when they are Chinese characters.

Further investigation shows similar behavior for Japanese (hiragana, katakana, kanji) and Korean (hangul syllables, hangul jamo) characters: not all of the matching words displayed on the result page are bolded.

To handle this, a function was added to check whether target_word in bold_search_terms.replace_any_case contains Chinese, Japanese, or Korean characters. If it does, the regex that doesn't check for whitespace is applied; otherwise the existing regex is used. This way, each search term is bolded with the pattern that suits it.
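
The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the names contains_cjk and bold_term, the exact Unicode ranges, and the assumption that the non-CJK path relies on a \b word-boundary pattern are all hypothetical. The sketch shows why a boundary-based pattern misses CJK terms: Python treats CJK characters as word characters, so there is no boundary between a term and its neighbors in unsegmented text.

```python
import re

# Hypothetical Unicode ranges, chosen for illustration (not copied from the PR).
_CJK_RANGES = (
    "\u4e00-\u9fff"  # CJK Unified Ideographs (Chinese characters, kanji)
    "\u3040-\u309f"  # Japanese hiragana
    "\u30a0-\u30ff"  # Japanese katakana
    "\uac00-\ud7af"  # Korean hangul syllables
    "\u1100-\u11ff"  # Korean hangul jamo
)
_CJK_RE = re.compile(f"[{_CJK_RANGES}]")


def contains_cjk(text: str) -> bool:
    """Return True if text has at least one character in the ranges above."""
    return bool(_CJK_RE.search(text))


def bold_term(text: str, target_word: str) -> str:
    """Wrap case-insensitive matches of target_word in <b> tags.

    Assumption: non-CJK terms are matched only as whole words, while
    CJK terms are matched anywhere, because CJK text is usually written
    without spaces between words.
    """
    escaped = re.escape(target_word)
    if contains_cjk(target_word):
        pattern = f"({escaped})"        # no boundary check
    else:
        pattern = rf"\b({escaped})\b"   # whole-word match only
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)


# A boundary-based pattern finds nothing inside unsegmented CJK text,
# since the neighboring characters also count as word characters.
boundary_match = re.search(r"\b中国\b", "我爱中国人")

cjk_result = bold_term("我爱中国人", "中国")
latin_result = bold_term("Python is fun, pythonic too", "python")
```

With these definitions, the CJK term is bolded inside the surrounding text, while the Latin term is bolded only where it stands as a whole word, so "pythonic" is left untouched.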

Screenshots of the search results after the commits linked to this PR:

screenshot-localhost_5000-2023 01 08-23_12_08

screenshot-localhost_5000-2023 01 08-23_14_26

screenshot-localhost_5000-2023 01 08-23_17_20

CJK characters: Chinese, Japanese (hiragana, katakana, kanji), and Korean (hangul syllables, hangul jamo)
Owner

@benbusby benbusby left a comment

Looks good to me, thanks for taking care of this! I had just one idea for a way to make the code a little easier to modify in the future (if needed), let me know your thoughts.

app/utils/results.py (outdated, resolved)
Co-authored-by: Ben Busby <contact@benbusby.com>
@ahmad-alkadri
Contributor Author

> Looks good to me, thanks for taking care of this! I had just one idea for a way to make the code a little easier to modify in the future (if needed), let me know your thoughts.

Hi @benbusby, thanks for your input! I agree with your suggestion, and I've also tested it on several queries:

screenshot-localhost_5000-2023 01 09-19_58_06

screenshot-localhost_5000-2023 01 09-19_59_20

It seems to work well, so I've committed it to the branch. Would you like to test it as well? I'm available for any follow-up discussion or modifications.

Owner

@benbusby benbusby left a comment

Looks great, thank you again!

@benbusby benbusby merged commit e5a5aad into benbusby:main Jan 9, 2023
Development

Successfully merging this pull request may close these issues.

[BUG] Most of the search terms are not bold in Chinese results
2 participants