Validate tsi.src and phone.cin #214

Open
kcwu opened this issue Mar 30, 2016 · 5 comments

@kcwu
Member

kcwu commented Mar 30, 2016

tsi.src and phone.cin have often been broken in the past.
It is not only the sorting order; sometimes the syntax is bad too (missing frequency, extra spaces, illegal bopomofo, etc.).

We should validate them in CI to keep them in a good state.

Until somebody writes the validation code, checking the sorting order seems like a good start.
cc @PeterDaveHello
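
As a starting point for the sorting check, here is a minimal sketch in Python. The helper name `find_unsorted` is hypothetical, and it assumes the expected order is plain code-point order of the whole line; the real sort key for tsi.src and phone.cin is not stated in this thread, so the comparison would need to be swapped for the actual key.

```python
from typing import Iterable, List, Tuple


def find_unsorted(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line that sorts before its predecessor.

    Assumption: lines are compared as whole strings by code point.
    Replace the comparison with the real sort key once it is known.
    """
    problems = []
    prev = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if prev is not None and line < prev:
            problems.append((lineno, line))
        prev = line
    return problems
```

A CI job could open the data file, pass the file object to `find_unsorted`, and fail the build when the returned list is not empty.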

@czchen
Member

czchen commented Mar 30, 2016

I think we already have some checks implemented in https://github.com/chewing/libchewing/blob/master/src/tools/init_database.c for phone.cin and tsi.src. I'm not sure whether any check is missing for these two.

@kcwu
Member Author

kcwu commented Mar 30, 2016

init_database.c is tolerant of errors and more robust. For example,

  • init_database.c allows delimiters with more than one space, and trailing spaces.
  • init_database.c allows illegal bopomofo sequences such as ˊ.
  • init_database.c allows negative numbers and even non-decimal numbers such as 0xab.
  • init_database.c allows blank lines.
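
To make the difference concrete, here is a rough sketch of what a stricter per-line check could look like. It is not taken from init_database.c. It assumes each tsi.src line has the form `phrase frequency syllable...` separated by exactly one space, and the function name `check_line_strict` is hypothetical.

```python
import re
from typing import List

# Non-negative decimal integer, ASCII digits only (rejects "-1" and "0xab").
FREQ_RE = re.compile(r"[0-9]+")


def check_line_strict(line: str) -> List[str]:
    """Return a list of problems found in one line (without its newline).

    Assumed layout: phrase, frequency, then one or more syllables,
    each separated by a single space.
    """
    if line == "":
        return ["blank line"]
    errors = []
    if line != line.strip(" \t"):
        errors.append("leading or trailing whitespace")
    if "  " in line or "\t" in line:
        errors.append("delimiter is not a single space")
    fields = line.split()
    if len(fields) < 3:
        errors.append("expected phrase, frequency, and at least one syllable")
    elif not FREQ_RE.fullmatch(fields[1]):
        errors.append(
            f"frequency is not a non-negative decimal integer: {fields[1]!r}"
        )
    return errors
```

An empty list means the line passed. Checking the bopomofo syllables themselves is left out of this sketch.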

I'd like to have a stricter validator.

@czchen
Member

czchen commented Mar 30, 2016

@kcwu, do you think we can just use a stricter parser in init_database.c, or do we really need a separate validator?

@kcwu
Member Author

kcwu commented Mar 31, 2016

These two should definitely be rejected by init_database.c:

  • init_database.c allows illegal bopomofo sequences such as ˊ.
  • init_database.c allows negative numbers and even non-decimal numbers such as 0xab.
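
For the bopomofo case, a structural check along the following lines would reject a lone tone mark. This is only a sketch under assumptions: the name `is_plausible_syllable` is hypothetical, a syllable is taken to be an optional initial, optional medial, optional final, and optional trailing tone mark with at least one phonetic symbol present, and it does not verify that the combination is a real Mandarin syllable.

```python
import re

# Assumed syllable shape: [initial][medial][final][tone], tone written last.
SYLLABLE_RE = re.compile(
    "(?=[\u3105-\u3129])"            # must start with a phonetic symbol, not a tone
    "[\u3105-\u3119]?"               # initial: U+3105 (B) .. U+3119 (S)
    "[\u3127-\u3129]?"               # medial: U+3127 (I), U+3128 (U), U+3129 (IU)
    "[\u311a-\u3126]?"               # final: U+311A (A) .. U+3126 (ER)
    "[\u02ca\u02c7\u02cb\u02d9]?"    # tone mark: second, third, fourth, neutral
)


def is_plausible_syllable(text: str) -> bool:
    """Return True if text has the assumed initial/medial/final/tone shape."""
    return SYLLABLE_RE.fullmatch(text) is not None
```

With this shape, a string consisting only of ˊ is rejected because no phonetic symbol precedes the tone mark.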

For blank lines and extra spaces (and sorting order), I'm not sure whether we should enforce them or not.

czchen added the feature label Apr 1, 2016
@Billy4195
Contributor

I don't understand the descriptions above of the situations that init_database.c should reject.
For the first one, does "illegal bopomofo sequence" mean a sequence that contains only ˋ ˊ ˇ?
For the second one, where do the negative numbers and non-decimal numbers appear?

3 participants