problem in unstack #691

Closed
floswald opened this issue Sep 26, 2014 · 9 comments

Comments

@floswald
Contributor

hope I'm using this correctly. from looking at the function signature that was my best guess:

julia> df = DataFrame(x1 = [1,2,3,1,2,3],x2=[1,1,1,2,2,2],x3=rand(6))
6x3 DataFrame
| Row | x1 | x2 | x3        |
|-----|----|----|-----------|
| 1   | 1  | 1  | 0.0997254 |
| 2   | 2  | 1  | 0.114421  |
| 3   | 3  | 1  | 0.374605  |
| 4   | 1  | 2  | 0.871304  |
| 5   | 2  | 2  | 0.742842  |
| 6   | 3  | 2  | 0.0358517 |

julia> unstack(df,1,2,3)
ERROR: `symbol` has no method matching symbol(::Int64)
 in map at abstractarray.jl:1328
 in unstack at /Users/florianoswald/.julia/v0.3/DataFrames/src/dataframe/reshape.jl:47

julia> methods(unstack)
#2 methods for generic function "unstack":
unstack(df::AbstractDataFrame,rowkey::Int64,colkey::Int64,value::Int64) at /Users/florianoswald/.julia/v0.3/DataFrames/src/dataframe/reshape.jl:39
unstack(df::AbstractDataFrame,rowkey,colkey,value) at /Users/florianoswald/.julia/v0.3/DataFrames/src/dataframe/reshape.jl:62
@johnmyleswhite
Contributor

I've always found stack and unstack inscrutable. I would be so happy to replace them with the semantics of Hadley Wickham's tidyr.

@StefanKarpinski
Member

The benefit of just copying all of Hadley's good designs is hard to overstate.

@tshort
Contributor

tshort commented Sep 26, 2014

Hadley's tidyr is nice and something to shoot for. It does rely on R's delayed evaluation for some things, so we'd need to use expressions or some other way to get some of the fancier features. For example, Hadley's

res <- gather(messy, drug, heartrate, a:b)

might need to look like:

res = gather(messy, :drug, :heartrate, :(a:b))

We've already gotten into trouble trying to use expressions for things like this, so it would take some thinking. Anyway, back to @floswald's problem. unstack wants to make column names out of the values in the :x1 column. Because they are integers, this won't work. It probably worked back when we used strings for column names. Here's an example that works for me:

df = DataFrame(x1 = ["x","y","z","x","y","z"],x2=[1,1,1,2,2,2],x3=rand(6))
unstack(df, 1, 2, 3)
unstack(df, :x1, :x2, :x3)

We could/should try to make better column names to handle this case.
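A minimal sketch of what "better column names" could look like, assuming a current Julia where `Symbol(...)` replaces the `symbol` function from the error above. The `colname` helper is hypothetical and not part of DataFrames; it just prefixes each key value with the name of the column it came from, so integer keys yield usable names:

```julia
# Hypothetical helper (not a DataFrames function): build a column name
# from the source column's name and one of its key values.
# Symbol(args...) concatenates the string forms of its arguments.
colname(prefix, key) = Symbol(prefix, "_", key)

key_values = [1, 2, 3]                               # integer keys, as in the x1 column
new_names = [colname(:x1, k) for k in key_values]    # one Symbol per key
```

The same helper accepts `Bool` or string keys, since it only relies on each value having a string form.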

@StefanKarpinski
Member

I've said it before and I'll say it again: having a Column type and doing @columns drug heartrate a b at the top of a file seems like a really clean way to handle this sort of thing. Then there's nothing special about the column names in gather(messy, drug, heartrate, a:b), they're just normal variables whose value indicates that they should be taken to refer to columns.
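One possible reading of this proposal, as a rough sketch only: neither `Column` nor `@columns` exists in DataFrames, and the code assumes a current Julia (`struct` rather than the `immutable` keyword of the 0.3 era). The macro binds each listed name to a `Column` value, so the names become ordinary variables:

```julia
# Hypothetical type: a value that marks "this refers to a column".
struct Column
    name::Symbol
end

# Hypothetical macro: `@columns a b` expands to
# `a = Column(:a); b = Column(:b)` in the caller's scope.
macro columns(names...)
    assignments = [:($(esc(n)) = Column($(QuoteNode(n)))) for n in names]
    return Expr(:block, assignments...)
end

@columns drug heartrate a b
```

After this, `drug` is a plain variable holding `Column(:drug)`. Supporting a range such as `a:b` would additionally need a method for `:` on two `Column` values, which this sketch leaves out.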

@floswald
Contributor Author

thanks @tshort i wouldn't have guessed that. didn't know tidyr either, but it looks ace (as you suspect it might!). Should definitely have that in julia! :-)

not sure why you guys have strong opinion on the column name thing. :column works alright for me? I guess it interferes with the quote operator or other things? i think in @StefanKarpinski 's version an issue might be if you create columns as you go along, you'd have to say @columns again and again?
just thinking aloud here.

@StefanKarpinski
Member

I guess using symbols is ok, but it just feels off to me. The most concrete issue I can think of is that it makes expressing actual symbols hard in these contexts. Not the end of the world, but not great either.

@johnmyleswhite
Contributor

I actually kind of love the use of symbols, since it's almost a perfect reversal of embedded SQL: http://infolab.stanford.edu/~ullman/fcdb/oracle/or-proc.html

@floswald
Contributor Author

sorry can I just add a question here? I'm still wondering how to best unstack my data.frame. i should convert the column that becomes the new column names into strings, right? how do i do this here?

 df = DataFrame(x1 = [true,true,true,false,false,false],x2=[1,2,3,1,2,3],x3=rand(6))

# doesn't work (see @tshort above)
unstack(df,1,2,3)

# not what i want
julia> string(df[:x1])
"Bool[true,true,true,false,false,false]"

also, as a temporary fix for this, could one do something very similar to DataFrames.gennames(ncols)?

@tshort
Contributor

tshort commented Sep 29, 2014

To convert each element of a vector to a string, you can do:

map(string, df[:x1])
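The difference from the `string(df[:x1])` attempt above can be sketched on a plain vector (no DataFrames needed; the variable names here are only for illustration):

```julia
flags = [true, true, true, false, false, false]

# string(v) renders the whole vector as a single String ...
whole = string(flags)

# ... while map(string, v) converts each element separately,
# giving a vector of Strings with the same length as the input.
per_element = map(string, flags)
```

Only the element-wise result has one entry per row, which is what a column of key values needs.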
