Python binding #25

Closed
mh-northlander opened this issue Sep 21, 2021 · 4 comments
Labels: python (Python binding-related)


@mh-northlander
Collaborator

Create a SudachiPy-compatible Python binding.

@mh-northlander
Collaborator Author

I'm considering using PyO3, following the approach of Hugging Face tokenizers.

@mh-northlander mh-northlander self-assigned this Sep 22, 2021
@eiennohito
Collaborator

eiennohito commented Sep 24, 2021

What about a Python API FFI like this? The main idea is to hide the heavyweight API behind traits and use them through objects from the Python side.

pub fn dictionary_from_config(cfg: Config) -> SudachiResult<Arc<dyn PyDictionary>> {
    todo!()
}

pub trait PyDictionary {
    // `dyn mut` is not valid Rust; mutability comes from `&mut self` on the trait method.
    fn tokenizer(&self) -> Box<dyn PyTokenize>;
}

pub trait PyTokenize {
    fn tokenize(&mut self, input: &str, mode: Mode, debug: bool) -> SudachiResult<Vec<Morpheme>>;
}

Morpheme should be an owning type for Python. The internal one would not work, so the binding should handle copying from Rust types to Python ones. The Result type may also not be compatible.
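
To make the idea concrete, here is a minimal std-only sketch of the trait-object design with an owning morpheme. Everything in it is an assumption for illustration: `OwnedMorpheme`, the whitespace-splitting toy tokenizer, the `String` error type standing in for `SudachiResult`, and the `&str` argument standing in for `Config` are all hypothetical, and the `debug` flag is dropped for brevity. It is not the actual sudachi.rs API.

```rust
use std::sync::Arc;

/// Stand-in for the split mode (hypothetical, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    A,
    B,
    C,
}

/// Owning morpheme: it stores its own `String` rather than borrowing from
/// tokenizer state, so it can be copied into a Python object freely.
#[derive(Clone, Debug, PartialEq)]
pub struct OwnedMorpheme {
    pub surface: String,
    pub begin: usize, // byte offset where the morpheme starts
    pub end: usize,   // byte offset one past the end
}

pub trait PyTokenize {
    fn tokenize(&mut self, input: &str, mode: Mode) -> Result<Vec<OwnedMorpheme>, String>;
}

pub trait PyDictionary {
    fn tokenizer(&self) -> Box<dyn PyTokenize>;
}

// Toy implementation (assumption): splits on whitespace instead of doing
// real morphological analysis.
struct WhitespaceDictionary;
struct WhitespaceTokenizer;

impl PyDictionary for WhitespaceDictionary {
    fn tokenizer(&self) -> Box<dyn PyTokenize> {
        Box::new(WhitespaceTokenizer)
    }
}

impl PyTokenize for WhitespaceTokenizer {
    fn tokenize(&mut self, input: &str, _mode: Mode) -> Result<Vec<OwnedMorpheme>, String> {
        let mut out = Vec::new();
        let mut start: Option<usize> = None;
        for (i, ch) in input.char_indices() {
            match (ch.is_whitespace(), start) {
                // First non-whitespace character opens a new token.
                (false, None) => start = Some(i),
                // Whitespace closes the token that is currently open.
                (true, Some(s)) => {
                    out.push(OwnedMorpheme { surface: input[s..i].to_string(), begin: s, end: i });
                    start = None;
                }
                _ => {}
            }
        }
        // Flush a token that runs to the end of the input.
        if let Some(s) = start {
            out.push(OwnedMorpheme { surface: input[s..].to_string(), begin: s, end: input.len() });
        }
        Ok(out)
    }
}

/// Hypothetical entry point; the config argument is ignored in this sketch.
pub fn dictionary_from_config(_config_path: &str) -> Result<Arc<dyn PyDictionary>, String> {
    let dict: Arc<dyn PyDictionary> = Arc::new(WhitespaceDictionary);
    Ok(dict)
}
```

A caller would obtain the dictionary with `dictionary_from_config(...)`, call `tokenizer()` on it, and receive a `Vec<OwnedMorpheme>` whose elements borrow nothing from the tokenizer.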

@mh-northlander
Collaborator Author

PyO3 can only convert structs, so I'm considering exposing wrapper classes like the following to hide the Rust API:

pub struct PyDictionary {
    inner: Arc<JapaneseDictionary>
}

impl PyDictionary {
    fn from_config(config: Config) -> Self {
        todo!();
    }

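    fn tokenizer(&self) -> PyTokenizer {
        // Clone the Arc (cheap) and wrap it; assumes StatelessTokenizer::new takes the dictionary handle.
        PyTokenizer { inner: StatelessTokenizer::new(self.inner.clone()) }
    }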
}

pub struct PyTokenizer {
    inner: StatelessTokenizer<Arc<JapaneseDictionary>>
}
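
The wrapper approach can be sketched without PyO3 to show how ownership would flow. In this std-only sketch, `JapaneseDictionary` and `StatelessTokenizer` are hypothetical stand-ins rather than the real sudachi.rs types, `from_config` takes a plain name instead of a `Config`, and `handle_count` and `dictionary_name` are helpers added only for illustration. In a real binding the two wrapper structs would presumably carry PyO3's `#[pyclass]` attribute.

```rust
use std::sync::Arc;

/// Stand-in for the real dictionary type (hypothetical; loading is stubbed).
pub struct JapaneseDictionary {
    name: String,
}

/// Stand-in for a tokenizer that keeps only a dictionary handle.
pub struct StatelessTokenizer<D> {
    dict: D,
}

impl<D> StatelessTokenizer<D> {
    pub fn new(dict: D) -> Self {
        Self { dict }
    }
}

/// Wrapper that would be exposed to Python; it owns a shared handle.
pub struct PyDictionary {
    inner: Arc<JapaneseDictionary>,
}

impl PyDictionary {
    /// Hypothetical constructor: builds a stub dictionary from a name.
    pub fn from_config(name: &str) -> Self {
        Self { inner: Arc::new(JapaneseDictionary { name: name.to_string() }) }
    }

    /// Each tokenizer gets its own clone of the Arc, so it stays valid
    /// even after the Python-side dictionary object is dropped.
    pub fn tokenizer(&self) -> PyTokenizer {
        PyTokenizer { inner: StatelessTokenizer::new(Arc::clone(&self.inner)) }
    }

    /// Illustration-only helper: number of live handles to the dictionary.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

/// Wrapper around the tokenizer, also intended for exposure to Python.
pub struct PyTokenizer {
    inner: StatelessTokenizer<Arc<JapaneseDictionary>>,
}

impl PyTokenizer {
    /// Illustration-only helper: reads through to the shared dictionary.
    pub fn dictionary_name(&self) -> &str {
        &self.inner.dict.name
    }
}
```

Because both wrappers hold an `Arc`, neither has a lifetime parameter, which matters if the structs are to be handed to Python as plain objects.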

@eiennohito
Collaborator

I'd like us to actually write down the list of requirements for the Python binding.
