Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly read phrase frequencies for sorting #337

Merged
merged 1 commit into from Jul 15, 2023

Conversation

yan12125
Copy link
Contributor

Closes #334

@yan12125
Copy link
Contributor Author

Well, no CI tests are run as travis-ci.org is ceased. With this change, all tests are still green on my machine.

@ShikiSuen
Copy link

@yan12125 What about GitHub continuous integration?

@@ -531,7 +531,7 @@ void insert_leaf(NODE * parent, long phr_pos, uint32_t freq)
NODE *pnew;

for (p = parent->pFirstChild; p && GetUint16(p->data.key) == 0; prev = p, p = p->pNextSibling)
if (GetUint16(p->data.phrase.freq) <= freq)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不是太懂 C,不知道這裡是否可以用 GetUint32。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果要用uint32,詞庫的格式以及libchewing要跟著改,用四個uint8_t表示頻率。目前我是以相容現有的libchewing為目標。

@yan12125
Copy link
Contributor Author

What about GitHub continuous integration?

That's a good idea. If libchewing core developers are fine with GitHub Actions, I can try it in another PR.

@yan12125
Copy link
Contributor Author

Rebased to test the change through GitHub Actions, which was added recently in #349.

@yan12125
Copy link
Contributor Author

@kanru Mind to have another look?

Per #334 (comment), it was expected that init_database.c would be removed in favor of init_database.rs. It turns out that the latter uses a different file format - 32 bit integers for phrase frequency [1] rather than 24-bit integers as in init_database.c. As long as WITH_RUST=OFF is supported, init_database.c should be maintained. Thus, I think it's time to revisit this fix.

[1]

let freq: u32 = match phrase.chars().count() {

@kanru kanru self-assigned this Jul 15, 2023
@kanru kanru added this pull request to the merge queue Jul 15, 2023
Merged via the queue into chewing:master with commit 6921e4a Jul 15, 2023
10 checks passed
@yan12125 yan12125 deleted the fix-db-freq branch July 16, 2023 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

請問 tsi.src 當中的單筆頻率數據的數值上限是多少?
3 participants