Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-latin formal app names cause problems #612

Closed
freakboy3742 opened this issue Sep 1, 2021 · 13 comments
Closed

Non-latin formal app names cause problems #612

freakboy3742 opened this issue Sep 1, 2021 · 13 comments
Labels
bug A crash or error in behavior. good first issue Is this your first time contributing? This could be a good place to start!

Comments

@freakboy3742
Copy link
Member

If you use non-latin characters (e.g., 学口算, or anything outside the ASCII set) in the formal app name, briefcase new fails. Similar problems occur if you manually define pyproject.toml, and then run briefcase create.

This is because briefcase tries to create a "Class Name", as well as other artefacts by processing the formal app name; these conversions fail with non-latin names.

To Reproduce
Steps to reproduce the behavior:

  1. Run briefcase new
  2. Specify an app name of 学口算.

or

  1. Run briefcase new
  2. Modify pyproject.toml so that the formal app name is 学口算
  3. Run briefcase create

Expected behavior

The formal name is human-readable, so any unicode string should be valid.

@freakboy3742 freakboy3742 added the bug A crash or error in behavior. label Sep 1, 2021
@freakboy3742 freakboy3742 changed the title Non-latin application names cause problems Non-latin formal app names cause problems Sep 1, 2021
@freakboy3742 freakboy3742 added up-for-grabs good first issue Is this your first time contributing? This could be a good place to start! labels Sep 1, 2021
@NaftoliOst
Copy link

Hi, can I claim this issue

@freakboy3742
Copy link
Member Author

@NaftoliOst Absolutely! If you need any pointers or help getting started, let us know!

@freakboy3742 freakboy3742 removed up-for-grabs good first issue Is this your first time contributing? This could be a good place to start! labels Sep 1, 2021
@freakboy3742 freakboy3742 added good first issue Is this your first time contributing? This could be a good place to start! up-for-grabs labels Sep 26, 2021
@freakboy3742
Copy link
Member Author

Via Discord, there is evidently a problem with non-latin descriptions as well. Any testing for this bug should include anything "human text-like" in the spread of tests.

@guaiguaicai
Copy link

It’s so touching to see you guys improving Briefcase so seriously, thank you!

@xyang-github
Copy link

I see this issue is still open. If so, can I claim this one?

@freakboy3742
Copy link
Member Author

@xyang-github Absolutely! Let us know if you need any pointers or assistance.

@xyang-github
Copy link

@freakboy3742 It appears that the re.sub is removing all non-ascii characters. For example, entering Chinese characters will result in a empty string; entering a mixture of English letters and Chinese characters will result in a string that is just English letters. I've found a way to bypass this, but getting a cookiecutter.exceptions.FailedHookException now. Still looking at this, but if you have any pointers on how to attempt this kind of exception, I would appreciate it (sorry, this is my first attempt to contribute so still new to the process).

@freakboy3742
Copy link
Member Author

@xyang-github Thanks for that investigation - that definitely explains why we're seeing a problem.

As for your question - it's hard to suggest a fix without knowing what it is you're trying. If you can submit what you have as a pull request, that gives us something to discuss - even if it fails tests, it's something concrete to use for a discussion.

However, looking at the general problem, we're going to need to work out how to convert a "free text" app name into a Python-compatible class name. When the text is purely latin, the re.sub approach works fine, but when the input text is all "invalid Python identifier" characters, it's going to be difficult to manufacture a class name.

I suspect we may need to make the "get a class name" portion explicit, in a similar way that we do for the app name. That is, instead of assuming the class name can be automatically derived from the formal name, we should explicitly ask the user "what name do you want to use for the main App class?". We can use the re.sub() approach to provide an initial hint (much as we do for app_name); if the user uses a Chinese name, then the "hint" will be an empty string, but that won't be valid as actual input, so they'll need to provide a valid Python class name.

The only other alternative would be to work out a way to convert the Chinese text into a Python identifier. My knowledge of Mandarin doesn't go much beyond 你好 and counting 😄, so I don't know if there's any sort of common convention amongst Chinese-speaking programmers for coming up with class names - but if there is, and we can reliably automate (or semi-automate) that process, then we could try to follow that. However, I strongly suspect that "just ask the user for a class name" will be a lot more reliable.

@xyang-github xyang-github mentioned this issue Oct 17, 2021
2 tasks
@xyang-github
Copy link

@freakboy3742 I've done more debugging, and realized that any non-ascii characters will lead to some kind of crash (either with the hooks or a unicode decode error when running 'briefcase dev'). Based on your input and what I have poked around with, it seems that the only work around is to explicitly ask for a class name, and also use some form of validation for all text input (e.g. formal name, class name, app name, author, etc.). Would a new request/ticket need to made for this type of change, or can I go ahead and continue with the current ticket?

@freakboy3742
Copy link
Member Author

@xyang-github It's all the same problem, so there's no new ticket required; whether it's a new PR is up to you. If it's easier to start from scratch, then start a new PR; however, if it's just as easy to adapt what you have here, you can keep going on this PR.

@xyang-github xyang-github mentioned this issue Oct 29, 2021
4 tasks
@charlietrumm
Copy link

I am new to contributing to open source.

I have made a simple patch for this bug. If I type in non-latin characters, it will make a class name of only python acceptable characters. If there are no python acceptable characters (all non-latin characters or a python operator name e.g. while) then it will ask again for another app name.

I use the git command line utility all of the time but have never used github website. I have setup a github account. I have a local commit on my computer with my changes but I do not know where to push this data. Please can I have help submitting this as a request.

@freakboy3742
Copy link
Member Author

@charlietrumm As I indicated on the PR; someone is already working on this PR. It would be better to look for another issue that is flagged first-timers-only and up-for-grabs.

However, from the sound of it, it might be worth doing a few Github tutorials first. This section of Github's docs gives a introduction to the Github UI and how it maps to the git command line.

@guaiguaicai
Copy link

guaiguaicai commented Apr 4, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A crash or error in behavior. good first issue Is this your first time contributing? This could be a good place to start!
Projects
None yet
Development

No branches or pull requests

5 participants