parse_dict doesn't work #2

Open
winston0410 opened this issue Mar 17, 2022 · 1 comment

Comments

@winston0410

I tried to use your library like this:

pub fn wordshk() -> () {
    static DATA_FILE: &'static str = include_str!("../data/words.hk/all.csv");
    let dict = parse_dict(DATA_FILE.as_bytes());
    println!("{:?}", dict);
}

And I hit an error like this:

Err(Error(UnequalLengths { pos: Some(Position { byte: 3, line: 1, record: 1 }), expected_len: 1, len: 5 }))

I am using the csv here: https://github.com/wordshk/data2021/blob/main/all.csv.gz

Cargo.lock info to reproduce (I am using the package directly from GitHub):

[[package]]
name = "wordshk_tools"
version = "1.0.0-beta.10"
source = "git+https://github.com/AlienKevin/wordshk-tools.git?branch=main#e6283cb6415b8fa71ad0019330cd5eb528886d5f"
@AlienKevin
Owner

AlienKevin commented Mar 17, 2022

Thanks for trying out this library. Can you try the parse_dict example I just added to GitHub?
Here are three important preparation steps that I didn't document clearly:

  • Preferably download the latest CSV file from https://words.hk/static/all.csv.gz
  • The latest version may contain recent edits with syntax errors. If so, either find the offending entry and fix it on words.hk, or use the older working CSV in this repo, downloaded on Mar 17, 2022.
  • The CSV file may contain an extra "" on the first line or, if you downloaded it from the website, two lines of metadata at the beginning. In either case, you need to manually delete those lines so the parser can read the CSV. The metadata looks something like this:
,,"--- Generated 2022-03-16 20:05:03.248395 (UTC) - ALL RIGHTS RESERVED. DO NOT DISTRIBUTE UNLESS WITH EXPLICIT PERMISSION. You may be eligible to use this work under https://words.hk/base/hoifong/ , otherwise please contact email (info at words.hk) for licensing or collaboration enquiries. Although you are not required to notify us when using this CSV (as long as you obtain a permission under the license above), you may want to let us know (via the email above) so that we can notify you when we make significant changes to the format of this file. "
""
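
If you would rather not edit the file by hand, the cleanup step can be sketched in code. This is only a minimal sketch, not part of wordshk_tools: `strip_leading_metadata` is a hypothetical helper, and it assumes the unwanted lines are exactly a leading `""` line and/or a line starting with `,,"--- Generated`, each on a single physical line as in the sample above. The reported `UnequalLengths` error (expected_len: 1, len: 5 at byte 3) looks consistent with a leading `""` line being read as a one-field first record, though I have not verified that against the parser.

```rust
/// Hypothetical helper (not part of wordshk_tools): returns the CSV text
/// with the leading metadata lines removed.
///
/// Assumption: a metadata line is either a lone `""` or a line that starts
/// with `,,"--- Generated`, and each sits on one physical line.
fn strip_leading_metadata(csv: &str) -> &str {
    let mut rest = csv;
    loop {
        // Split off the first line; `tail` is everything after its newline.
        let (line, tail) = match rest.find('\n') {
            Some(i) => (&rest[..i], &rest[i + 1..]),
            None => (rest, ""),
        };
        // `trim` also drops a trailing '\r' from CRLF line endings.
        let trimmed = line.trim();
        let is_metadata =
            trimmed == "\"\"" || trimmed.starts_with(",,\"--- Generated");
        if is_metadata {
            rest = tail; // Drop this line and inspect the next one.
        } else {
            return rest; // First real record (or empty input) reached.
        }
    }
}
```

With the snippet from the report, the cleaned text would then be passed along as `parse_dict(strip_leading_metadata(DATA_FILE).as_bytes())`. Only lines at the very start of the file are dropped, so a `""` that appears later in the data is left untouched.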
