Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple IO train/predict #1

Closed
pjankiewicz opened this issue Mar 9, 2022 · 2 comments
Closed

Decouple IO train/predict #1

pjankiewicz opened this issue Mar 9, 2022 · 2 comments

Comments

@pjankiewicz
Copy link

pjankiewicz commented Mar 9, 2022

Hi @ankane. I was looking for some Rust based FM / FFM and I found your crate. Good job! It looks quite clean. I would like to test it on my problem. However I would like to skip serialization of my data before I call train/predict. If someone is using Rust and your crate as a library it could be easier to maybe pass the data itself than save it to disk first - which could be a lengthy process. In general it is almost always a good idea to decouple input and output from the actual algorithm.

I propose to change the signatures of the methods to something like this

pub fn train(&mut self, x: Vec<Vec<Node>>, y: Vec<f32>) // of course using references instead of Vec would be probably a good idea
pub fn predict(&self, x: Vec<Vec<Node>>) 
pub fn train_from_file(...)
pub fn predict_from_file(...)

Actually... when I think about it more it perhaps is more complicated than I thought because there should be some Iterator implemented to read the data from the disk. So in fact the signature for x would be more like

train(..., data: I)
where
    I: IntoIterator<Item = (&[Node], &f32)>

And then there should be an Iterator implemented for file based data.

If you are ok with such a change I could contribute this. Let me know your thoughts.

@ankane
Copy link
Owner

ankane commented Mar 12, 2022

Hey @pjankiewicz, thanks for the suggestion! Agree it'd be nice to pass data directly to the library and the interface above makes sense to me, so happy to review a PR. I don't have a strong opinion on the slice vs iterator approach (the slice seems more straightforward and the iterator more flexible/better for large datasets).

@ankane
Copy link
Owner

ankane commented Aug 3, 2022

Cleaning up issues, but still open to a PR.

@ankane ankane closed this as completed Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants