Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractDataVec should inherit from AbstractVector #23

Closed
tshort opened this issue Jul 16, 2012 · 4 comments
Closed

AbstractDataVec should inherit from AbstractVector #23

tshort opened this issue Jul 16, 2012 · 4 comments
Labels

Comments

@tshort
Copy link
Contributor

tshort commented Jul 16, 2012

This will make a lot of functions work, including most functions in statistics.jl (mean, median, etc.).

It will take some work to go through and cut down on warnings and tweak things. Also, the default functions won't pay attention to the nafilter or nareplace indicators, but methods supporting those can be added as we go along.

@HarlanH
Copy link
Contributor

HarlanH commented Sep 24, 2012

Some more thoughts about this. The problem with ADVs inheriting from AbstractVector is that it complicates operations that need to ignore NAs. Consider mean(dv). If dv has NAs, I think we can agree that the operation should fail. That works now. :) But if we have mean(naIgnore(dv)), it's unclear how the mean(AbstractVector) method should work. When the loop gets to dv[i], where that element is an NA, what should ref return?

Instead, I think it's better to have a set of numerical methods that work on iterable types. There's already a sum method that does this. See base/reduce.jl. It would be easy to write mean and any number of other methods that use duck-typing (since we don't have an Iterable type and multiple inheritance, yet) to efficiently calculate numerical summaries of an ADV with NAs.

Recommend we close this and replace it with an issue to implement statistical summary methods that work on iterables.

cc @johnmyleswhite @doobwa

@tshort
Copy link
Contributor Author

tshort commented Sep 24, 2012

One way around the problem is to have naIgnore(dv) return a different type that would not be an AbstractVector. That's the direction I was heading with regular arrays (see issue #54). I like this better than stuffing a flag into DataVecs. One of the reasons I like it is that if you can inherit from AbstractVectors, then many functions work for the case where there are no NAs. Then, adding support for NAs involves writing iterables to support the filter and replace options.

Either way, a lot more work needs to be done to improve support for DataVecs, and iterables are a good way to go. As far as this particular issue, I'll leave that up to you.

@HarlanH
Copy link
Contributor

HarlanH commented Sep 24, 2012

Hm, that would make sense to me if the primary goal was to avoid meta-data
inside DataVecs. But I don't think that's a good principle here. The
distinction between Data and non-Data is that Data has context aside from
its values. In particular, for statistical or measured data, it has
missing-values, as well as things like the nominal/ordinal/interval/ratio
distinction, which I'm planning to add soon to cover some of the issues
around categorical data we've been running into.

I'd prefer that we have the principle that a method either works for Data
or it doesn't, and that it doesn't depend on whether or not the Data has
NAs. You can always naignore() or nareplace() to get a non-Data vector that
can be processed by anything.

Other thoughts?

On Mon, Sep 24, 2012 at 9:37 AM, Tom Short notifications@github.com wrote:

One way around the problem is to have naIgnore(dv) return a different
type that would not be an AbstractVector. That's the direction I was
heading with regular arrays (see issue #54https://github.com/HarlanH/JuliaData/issues/54).
I like this better than stuffing a flag into DataVecs. One of the reasons I
like it is that if you can inherit from AbstractVectors, then many
functions work for the case where there are no NAs. Then, adding support
for NAs involves writing iterables to support the filter and replace
options.

Either way, a lot more work needs to be done to improve support for
DataVecs, and iterables are a good way to go. As far as this particular
issue, I'll leave that up to you.


Reply to this email directly or view it on GitHubhttps://github.com/HarlanH/JuliaData/issues/23#issuecomment-8818927.

@johnmyleswhite
Copy link
Contributor

Closed by 8d9ce5c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants