This repository has been archived by the owner on May 4, 2019. It is now read-only.

Indexing with missing values #207

Closed
diegozea opened this issue Jul 20, 2016 · 5 comments

@diegozea

datos[:x_13] .== "#0_PHY" returns a DataArrays.DataArray{Bool,1} that contains NAs. As a result, datos[datos[:x_13] .== "#0_PHY",:] throws the following error:

LoadError: DataArrays.NAException("cannot index an array with a DataArray containing NA values")
while loading In[8], in expression starting on line 1

 in to_index at /home/dzea/.julia/v0.4/DataArrays/src/indexing.jl:76
 in getindex at /home/dzea/.julia/v0.4/DataArrays/src/indexing.jl:173
 in getindex at /home/dzea/.julia/v0.4/DataFrames/src/dataframe/dataframe.jl:281

Related to: JuliaData/DataFramesMeta.jl#58

@johnmyleswhite
Member

johnmyleswhite commented Jul 22, 2016

I'm unclear on what this issue is about. There was a pretty clear design decision not to support indexing an array with NA values. Are you unhappy with that design decision? Or are you reporting an implementation error?

@diegozea
Author

diegozea commented Jul 22, 2016

I found this error while using DataFramesMeta, and @nalimilan suggested that I post it here. I expected @where to skip the rows where :x_13 is NA, but instead @where throws a DataArrays error.

@garborg
Member

garborg commented Jul 22, 2016

It was a design decision made before my time; I bet it just slipped @nalimilan's mind when he saw your issue come in. FWIW, personally, I'd be in favor of a terse way to map NA/null to false, but implicitly dropping rows when filtering (where clause, etc.) still doesn't seem great to me.
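
For illustration, here is a minimal sketch of what "a terse way to map NA/null to false" can look like. It assumes current Julia, where `missing` plays the role that NA played in DataArrays; it does not use the DataArrays API from the original report, and the sample vector `x` is made up for the example.

```julia
# Assumption: modern Julia with `missing`, not the DataArrays NA of the report.
# `x` stands in for a column such as datos[:x_13]; its contents are invented.
x = ["#0_PHY", missing, "other", "#0_PHY"]

# Element-wise comparison propagates `missing`, so the mask is not purely Bool.
mask = x .== "#0_PHY"

# Explicitly map missing entries to false before using the mask as an index.
keep = coalesce.(mask, false)
selected = x[keep]

# `isequal` never returns missing, so it yields a Bool mask directly.
keep_alt = isequal.(x, "#0_PHY")
```

The mapping stays visible at the call site, so rows with missing values are excluded by an explicit choice rather than dropped silently.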

@diegozea
Author

Yes, maybe dropping values by default is not the best idea... Maybe a keyword argument like dropna=true for @where? For comparison, R inserts an NA value in the output array for each position where the index array has an NA.
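
The two behaviors mentioned here can be sketched as follows. This is only an illustration under stated assumptions: it uses current Julia's `missing` instead of DataArrays' NA, and both `filter_with_mask` and `r_style_index` are hypothetical helpers, not functions from DataFrames, DataFramesMeta, or DataArrays.

```julia
# Hypothetical helper (not a real library function): filter `values` by a mask
# that may contain `missing`, with an opt-in `dropna` keyword.
function filter_with_mask(values::AbstractVector, mask::AbstractVector; dropna::Bool=true)
    if dropna
        # Treat a missing mask entry as false, i.e. drop that position.
        return values[coalesce.(mask, false)]
    end
    # Without dropna, refuse to index with a mask that contains missing values.
    any(ismissing, mask) && throw(ArgumentError("mask contains missing values"))
    return values[Vector{Bool}(mask)]
end

# Hypothetical helper mimicking the R behavior described above: a missing mask
# entry produces a missing element in the output instead of being dropped.
function r_style_index(values::AbstractVector, mask::AbstractVector)
    return [ismissing(m) ? missing : v for (v, m) in zip(values, mask) if m !== false]
end

vals = [10, 20, 30, 40]
mask = [true, missing, false, true]

dropped = filter_with_mask(vals, mask)   # positions 1 and 4 are kept
r_like = r_style_index(vals, mask)       # position 2 becomes missing
```

With `dropna=false`, `filter_with_mask(vals, mask; dropna=false)` throws instead of guessing, which corresponds to the existing design decision not to index with missing values.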

@nalimilan
Member

Sorry, I read that issue too quickly. So this needs to be addressed in DataFramesMeta, by providing a convenient way of skipping missing values. Let's discuss that in the other issue.
