
Filter by Unicode script types or IANA character sets detected in name and files.path values #155

Closed
bmfrosty opened this issue Feb 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@bmfrosty

Is your feature request related to a problem? Please describe

I'd like to be able to filter by the Unicode scripts or IANA character sets detected in file and path names.
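One way to get the raw information such a filter would need is to record which Unicode scripts occur in a name. The sketch below assumes only Go's standard `unicode.Scripts` table (a map from script name to `*unicode.RangeTable`); the helper name `scriptsIn` is hypothetical and not part of the project's code. Digits, spaces and punctuation belong to the `Common` script, so a plain Latin file name reports both `Latin` and `Common`.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// scriptsIn returns the sorted names of the Unicode scripts that occur in s.
// Hypothetical helper for illustration only. Runes not covered by any script
// table (e.g. unassigned code points) are ignored.
func scriptsIn(s string) []string {
	seen := make(map[string]bool)
	for _, r := range s {
		for name, table := range unicode.Scripts {
			if unicode.Is(table, r) {
				seen[name] = true
				break // the Script property assigns each code point to one script
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(scriptsIn("Movie.2024.mkv"))      // [Common Latin]
	fmt.Println(scriptsIn("千と千尋の神隠し.mkv")) // [Common Han Hiragana Latin]
}
```

Scanning every script table per rune is simple but slow; if only a few scripts matter, checking them directly (for example `unicode.Is(unicode.Han, r)`) is cheaper.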

Describe the solution you'd like

Unicode characters are divided into multiple scripts, and text can also be encoded in various character sets (I believe some characters can belong to more than one script or character set, but I am not a Unicode expert). I would like to be able to filter for names that only contain characters from something like the Latin script or the US-ASCII character set, to narrow results in one direction, or alternatively search for names that include Hiragana OR Katakana. A more complex query would be Han AND NOT Hiragana AND NOT Katakana.

Additional context

Certain languages will include characters from multiple scripts or character sets. For content in English, filtering by a single script or character set may work well (this will take trial and error), but Chinese content will contain Han without Hiragana or Katakana, whereas Japanese content will usually contain Hiragana or Katakana alongside other characters.
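The rule of thumb above can be expressed as a rough classifier. This is only a sketch of that heuristic (kana present suggests Japanese; Han without kana suggests Chinese), not a language detector; the function `guessCJK` and its `"ja"`/`"zh"` labels are hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// guessCJK applies the heuristic from the issue: any Hiragana or Katakana
// suggests Japanese ("ja"); Han without kana suggests Chinese ("zh").
// Returns "" when neither script is present.
func guessCJK(s string) string {
	var han, kana bool
	for _, r := range s {
		switch {
		case unicode.In(r, unicode.Hiragana, unicode.Katakana):
			kana = true
		case unicode.Is(unicode.Han, r):
			han = true
		}
	}
	switch {
	case kana:
		return "ja"
	case han:
		return "zh"
	default:
		return ""
	}
}

func main() {
	for _, name := range []string{"千と千尋の神隠し.mkv", "卧虎藏龙.mkv", "東京物語.mkv", "Seven.Samurai.1954.mkv"} {
		fmt.Printf("%q -> %q\n", name, guessCJK(name))
	}
}
```

The weakness is visible in the example: `東京物語` is a Japanese title written entirely in kanji, so it contains no kana and is labelled `"zh"`. Short names often lack the kana this heuristic relies on.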

@bmfrosty bmfrosty added the enhancement New feature or request label Feb 21, 2024
@mgdigital
Collaborator

Hi @bmfrosty, there's already an open issue that might do what you want in a slightly different way: #49. If the language filter were populated using the results from https://github.com/pemistahl/lingua-go, would that achieve what you're aiming for?

@bmfrosty
Author

bmfrosty commented Feb 24, 2024 via email
