Conversation
Just need to add a feature flag - or at least discuss this.
Otherwise this looks ready to go.
/// Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
/// Irvine, CA: University of California, School of Information and Computer Science.
pub fn load_iris() -> Dataset<Matrix<f64>, Vector<usize>> {
    let data: Vec<f64> = vec![5.1, 3.5, 1.4, 0.2,
Minor: It might be easier to use the `matrix!` macro here. My thinking is that if we need to add a row there's a little less work.
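To illustrate why a row-oriented literal is easier to extend than a flat `vec!`, here is a minimal sketch. The `rows!` macro below is a hypothetical stand-in written for this example, not the `matrix!` macro referred to above, and its syntax (`;` between rows, `,` between values) is an assumption. The second row is the second sample of the Iris data.

```rust
// Hypothetical stand-in for a row-oriented matrix macro (an assumption,
// not the library's macro): rows are separated by `;`, values by `,`.
// It flattens the rows into row-major storage and records the shape.
macro_rules! rows {
    ($($($x:expr),+);+) => {{
        let nested: Vec<Vec<f64>> = vec![$(vec![$($x),+]),+];
        let n_rows = nested.len();
        let n_cols = nested[0].len();
        // Every row must have the same number of columns.
        assert!(nested.iter().all(|r| r.len() == n_cols));
        let flat: Vec<f64> = nested.into_iter().flatten().collect();
        (n_rows, n_cols, flat)
    }};
}

/// Returns (rows, cols, row-major data) for two iris samples.
/// Adding a sample means adding one line, with no shape bookkeeping.
fn first_samples() -> (usize, usize, Vec<f64>) {
    rows![5.1, 3.5, 1.4, 0.2;
          4.9, 3.0, 1.4, 0.2]
}
```

With a flat `vec!`, the row and column counts have to be kept in sync by hand whenever a row is added; here the shape is derived from the literal itself.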

/// Dataset container
#[derive(Clone, Debug)]
pub struct Dataset<D, T> where D: Clone + Debug, T: Clone + Debug {
I think this makes sense for now. We might want to be more strict in future if we want to be generic over `DataSets`. However, this is something that I don't think we will ever want to do.
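For reference, a minimal sketch of a container with these bounds might look as follows. The field names, the `new` constructor, and the reference return types are assumptions made for illustration; only the struct signature and the `data()` / `target()` accessor names come from this pull request.

```rust
use std::fmt::Debug;

/// Dataset container (sketch): `D` holds the inputs, `T` the targets.
#[derive(Clone, Debug)]
pub struct Dataset<D, T>
where
    D: Clone + Debug,
    T: Clone + Debug,
{
    // Field names are assumed for this sketch.
    data: D,
    target: T,
}

impl<D, T> Dataset<D, T>
where
    D: Clone + Debug,
    T: Clone + Debug,
{
    /// Hypothetical constructor, not taken from the PR.
    pub fn new(data: D, target: T) -> Self {
        Dataset { data, target }
    }

    /// Borrows the input data.
    pub fn data(&self) -> &D {
        &self.data
    }

    /// Borrows the targets (intended for supervised learning).
    pub fn target(&self) -> &T {
        &self.target
    }
}

/// Builds a one-sample dataset using plain `Vec`s in place of matrix types.
/// The label `0` is an arbitrary encoding chosen for the example.
fn tiny_dataset() -> Dataset<Vec<f64>, Vec<usize>> {
    Dataset::new(vec![5.1, 3.5, 1.4, 0.2], vec![0])
}
```

Keeping the bounds to `Clone + Debug` leaves the container usable with any input and target types, which matches the "makes sense for now" position above.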
@@ -219,3 +219,6 @@ pub mod analysis {
    pub mod cross_validation;
    pub mod score;
}

/// Module for datasets.
pub mod datasets;
We should feature gate this. My thinking is that if we have a few datasets users will not want to download all of this data by default.
To do this:
- Add a new feature to `Cargo.toml`
- In `lib.rs` add a feature flag
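The two steps listed above can be sketched as follows. The feature name `datasets` follows this thread; the inline module body and the `datasets_enabled` helper are hypothetical and exist only to keep the example self-contained (in the crate itself the module lives in its own file).

```rust
// Step 1, in Cargo.toml (shown here as a comment):
//
//   [features]
//   datasets = []

// Step 2, in lib.rs: compile the module only when the feature is enabled.
/// Module for datasets.
#[cfg(feature = "datasets")]
pub mod datasets {
    /// Hypothetical placeholder item so the module is not empty.
    pub fn name() -> &'static str {
        "iris"
    }
}

/// Hypothetical helper: reports whether this build includes the
/// `datasets` feature.
pub fn datasets_enabled() -> bool {
    cfg!(feature = "datasets")
}
```

Because the feature is not listed under `default`, users have to opt in, for example with `cargo build --features datasets`.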
Added feature gates. Is it ok to be included by default ATM (as it is likely to be used in most tests)?
Thanks for the update. It looks good but I'm a little cautious about having the datasets flag included by default. I wanted it feature flagged specifically so that it had to be opted-in. I can see that we will probably want to use it in some tests but I'd try a few ways around this first.
Finally note that we will need to modify the Travis CI matrix to include the "datasets" flag.
43d6e2d to 90e1944
OK, made
This looks good to me but before merging I'd like to check out the branch and play around with it a little. Thanks!
I checked out the code and I have a few thoughts. I am happy to merge this in without any further changes but we should at least write up a tracking issue for improvements. I think the description of
Also I think it might be a good idea to organize the datasets module a little differently. If we add more datasets the module is going to get large quickly and become difficult to manage. I think we should move the iris data into a new

use rusty_machine::datasets::iris;
let (inputs, targets) = iris::load();

But we could also use the current format and have a

Let me know what you think. If you don't want to make any of these changes now I'll merge and move this information to a separate ticket.
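The per-dataset layout suggested above could be sketched roughly like this. It is an illustration only: the modules are inlined so the example compiles on its own, plain arrays and `Vec`s stand in for the matrix and vector types, only two samples are included, and the label encoding `0` is an assumption.

```rust
// One submodule per dataset, so adding a dataset adds a module instead of
// growing a single file.
pub mod datasets {
    pub mod iris {
        /// Returns `(inputs, targets)` for a two-sample excerpt.
        /// Types and label encoding are assumptions made for this sketch.
        pub fn load() -> (Vec<[f64; 4]>, Vec<usize>) {
            let inputs = vec![[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]];
            let targets = vec![0, 0];
            (inputs, targets)
        }
    }
}

use self::datasets::iris;

/// Loads the excerpt and returns the number of samples.
fn load_example() -> usize {
    let (inputs, targets) = iris::load();
    // Each input row has exactly one target.
    assert_eq!(inputs.len(), targets.len());
    inputs.len()
}
```

Callers then name the dataset through the module path and always call `load()`, so the public surface stays uniform as datasets are added.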
4cdb18a to 06e52c4
Thanks for the comment. I've made the requested change. Please take a look when you have time.
Looks good to me now!
I have a minor nitpick for the features section but so far as I can tell it makes no real difference.
stats = []
datasets = []
test = []
We don't need to include the `test` or `default` features. These already exist as defined here.
Correct, removed.

use super::Dataset;

/// Load iris dataset.
This description is great!
Thank you! Merging now.
Closes #115. Added `Dataset` struct which has `data()` and `target()` impl (intended for supervised learning). Adding more data once API looks OK.