problem in unstack #691

floswald · 2014-09-26T12:36:16Z

Hope I'm using this correctly. From looking at the function signature, this was my best guess:

```julia
julia> df = DataFrame(x1 = [1,2,3,1,2,3],x2=[1,1,1,2,2,2],x3=rand(6))
6x3 DataFrame
| Row | x1 | x2 | x3        |
|-----|----|----|-----------|
| 1   | 1  | 1  | 0.0997254 |
| 2   | 2  | 1  | 0.114421  |
| 3   | 3  | 1  | 0.374605  |
| 4   | 1  | 2  | 0.871304  |
| 5   | 2  | 2  | 0.742842  |
| 6   | 3  | 2  | 0.0358517 |

julia> unstack(df,1,2,3)
ERROR: `symbol` has no method matching symbol(::Int64)
 in map at abstractarray.jl:1328
 in unstack at /Users/florianoswald/.julia/v0.3/DataFrames/src/dataframe/reshape.jl:47

julia> methods(unstack)
#2 methods for generic function "unstack":
unstack(df::AbstractDataFrame,rowkey::Int64,colkey::Int64,value::Int64) at /Users/florianoswald/.julia/v0.3/DataFrames/src/dataframe/reshape.jl:39
unstack(df::AbstractDataFrame,rowkey,colkey,value) at /Users/florianoswald/.julia/v0.3/DataFrames/src/dataframe/reshape.jl:62
```


johnmyleswhite · 2014-09-26T17:18:32Z

I've always found stack and unstack inscrutable. I would be so happy to replace them with the semantics of Hadley Wickham's tidyr.

StefanKarpinski · 2014-09-26T19:55:57Z

The benefit of just copying all of Hadley's good designs is hard to overstate.

tshort · 2014-09-26T20:10:02Z

Hadley's tidyr is nice and something to shoot for. It does rely on R's delayed evaluation for some things, so we'd need expressions or some other mechanism to get some of the fancier features. For example, Hadley's

```r
res <- gather(messy, drug, heartrate, a:b)
```

might need to look like:

```julia
res = gather(messy, :drug, :heartrate, :(a:b))
```
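To make the quoted-range spelling concrete, here is a minimal sketch in present-day Julia of a wide-to-long gather that accepts `:(a:b)` as an expression. It works on a plain `Dict` of column vectors rather than a DataFrame, and `gather_sketch` is a hypothetical name, not a DataFrames or tidyr function.

```julia
# Hypothetical wide-to-long reshape (tidyr-style "gather") over plain vectors.
#   cols  : column names in their original order
#   data  : Dict mapping each column name to its vector
#   key   : name of the new column holding the former column names
#   value : name of the new column holding the cell values
#   sel   : a quoted range such as :(a:b) selecting the columns to gather
function gather_sketch(cols::Vector{Symbol}, data::Dict{Symbol,<:AbstractVector},
                       key::Symbol, value::Symbol, sel::Expr)
    # :(a:b) parses as a call to the `:` function with two Symbol arguments.
    if !(sel.head === :call && length(sel.args) == 3 && sel.args[1] === :(:))
        error("expected a range expression such as :(a:b)")
    end
    i = findfirst(==(sel.args[2]), cols)
    j = findfirst(==(sel.args[3]), cols)
    (i === nothing || j === nothing) && error("range refers to an unknown column")

    gathered = cols[i:j]              # columns melted into key/value pairs
    idcols = setdiff(cols, gathered)  # columns repeated once per gathered column
    n = length(data[first(cols)])

    out = Dict{Symbol,AbstractVector}()
    for c in idcols
        out[c] = repeat(data[c], length(gathered))
    end
    out[key] = repeat(gathered, inner = n)
    out[value] = reduce(vcat, [data[c] for c in gathered])
    return out
end

# Illustrative data: one id column plus two measurement columns a and b.
messy = Dict(:id => [1, 2, 3], :a => [67, 80, 64], :b => [56, 90, 50])
res = gather_sketch([:id, :a, :b], messy, :drug, :heartrate, :(a:b))
```

The checks on `sel.head` and `sel.args` show the extra parsing work that the expression-based spelling requires compared with passing a list of column names.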

We've already gotten into trouble trying to use expressions for things like this, so it would take some thinking.

Anyway, back to @floswald's problem. `unstack` wants to make column names out of the values in the `:x1` column. Because those values are integers and column names are symbols, this fails (hence the `symbol(::Int64)` error). It probably worked back when we used strings for column names. Here's an example that works for me:

```julia
df = DataFrame(x1 = ["x","y","z","x","y","z"],x2=[1,1,1,2,2,2],x3=rand(6))
unstack(df, 1, 2, 3)
unstack(df, :x1, :x2, :x3)
```

We could/should try to make better column names to handle this case.
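For reference, here is a minimal long-to-wide sketch in present-day Julia, written over plain vectors so it does not depend on the DataFrames API (which has changed since 2014). It is not DataFrames' implementation: `pivot_wider_sketch` and its `prefix` keyword are hypothetical. It builds each new column name with `Symbol(prefix, key)`, so integer or Boolean keys still give usable names such as `:x2_1`. This matters because a bare `:1` in Julia is just the integer `1`, not a symbol.

```julia
# Hypothetical long-to-wide reshape over plain vectors.
#   rowkeys : value identifying the output row of each observation
#   colkeys : value whose string form becomes the output column name
#   vals    : the observation itself
#   prefix  : text prepended to each generated column name
function pivot_wider_sketch(rowkeys::AbstractVector, colkeys::AbstractVector,
                            vals::AbstractVector; prefix::AbstractString = "")
    rowlevels = unique(rowkeys)
    collevels = unique(colkeys)
    rowidx = Dict(k => i for (i, k) in enumerate(rowlevels))

    # Symbol(prefix, k) concatenates string forms, so Int, Bool and String keys all work.
    colnames = [Symbol(prefix, k) for k in collevels]
    T = Union{Missing,eltype(vals)}
    cols = Dict(name => Vector{T}(missing, length(rowlevels)) for name in colnames)

    for (r, c, v) in zip(rowkeys, colkeys, vals)
        cols[Symbol(prefix, c)][rowidx[r]] = v  # cells with no observation stay missing
    end
    return rowlevels, colnames, cols
end

# Illustrative values: x1 is used as the row key and x2 as the column key.
x1 = [1, 2, 3, 1, 2, 3]
x2 = [1, 1, 1, 2, 2, 2]
x3 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
rows, colnames, wide = pivot_wider_sketch(x1, x2, x3; prefix = "x2_")
```

With the default empty prefix the names would be `Symbol("1")` and `Symbol("2")`, which are valid but awkward to type; a prefix sidesteps that.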

StefanKarpinski · 2014-09-26T20:31:22Z

I've said it before and I'll say it again: having a `Column` type and doing `@columns drug heartrate a b` at the top of a file seems like a really clean way to handle this sort of thing. Then there's nothing special about the column names in `gather(messy, drug, heartrate, a:b)`; they're just normal variables whose values indicate that they refer to columns.
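A minimal sketch of what this could look like in present-day Julia. `Column`, `ColumnRange`, `@columns` and `select_names` are all hypothetical names for illustration, not DataFrames API. The macro only binds ordinary variables, and `a:b` gets its meaning from a method added to Base's `:` for two `Column` values.

```julia
# Hypothetical marker type: a value of this type stands for "the column named `name`".
struct Column
    name::Symbol
end

# Hypothetical contiguous selection, so that `a:b` can mean "columns a through b".
struct ColumnRange
    first::Column
    last::Column
end

# Add a method to Base's `:` so that `a:b` on two Columns builds a ColumnRange.
Base.:(:)(a::Column, b::Column) = ColumnRange(a, b)

# `@columns drug heartrate` expands to `drug = Column(:drug); heartrate = Column(:heartrate)`.
macro columns(syms::Symbol...)
    assignments = [:($s = Column($(QuoteNode(s)))) for s in syms]
    return esc(Expr(:block, assignments..., nothing))
end

# Resolve a selection to concrete names, given the table's column order.
select_names(c::Column, available::Vector{Symbol}) = [c.name]
function select_names(r::ColumnRange, available::Vector{Symbol})
    i = findfirst(==(r.first.name), available)
    j = findfirst(==(r.last.name), available)
    (i === nothing || j === nothing) && error("range refers to an unknown column")
    return available[i:j]
end

@columns drug heartrate a b
selected = select_names(a:b, [:id, :a, :b])
```

Because `drug` is just a variable holding `Column(:drug)`, a function like `gather` could dispatch on `Column` and `ColumnRange` without special syntax; columns created later would still need their own `@columns` line.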

floswald · 2014-09-27T16:00:51Z

Thanks @tshort, I wouldn't have guessed that. I didn't know tidyr either, but it looks ace (as you suspect it might!). We should definitely have that in Julia! :-)

Not sure why you guys have strong opinions on the column name thing. `:column` works all right for me? I guess it interferes with the quote operator or other things? I think an issue with @StefanKarpinski's version might be that if you create columns as you go along, you'd have to say `@columns` again and again?
Just thinking aloud here.

StefanKarpinski · 2014-09-27T16:19:01Z

I guess using symbols is ok, but it just feels off to me. The most concrete issue I can think of is that it makes expressing actual symbols hard in these contexts. Not the end of the world, but not great either.

johnmyleswhite · 2014-09-27T16:35:41Z

I actually kind of love the use of symbols, since it's almost a perfect reversal of embedded SQL: http://infolab.stanford.edu/~ullman/fcdb/oracle/or-proc.html

floswald · 2014-09-28T18:31:29Z

Sorry, can I just add a question here? I'm still wondering how best to unstack my data frame. I should convert the column that becomes the new column names into strings, right? How do I do this here?

```julia
df = DataFrame(x1 = [true,true,true,false,false,false],x2=[1,2,3,1,2,3],x3=rand(6))

# doesn't work (see @tshort above)
unstack(df,1,2,3)

# not what I want
julia> string(df[:x1])
"Bool[true,true,true,false,false,false]"
```

Also, as a temporary fix, could one do something very similar to `DataFrames.gennames(ncols)`?

tshort · 2014-09-29T11:29:18Z

To convert each element of a vector to a string, you can do:

```julia
map(string, df[:x1])
```
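A small illustration with a plain vector, so it does not rely on the 2014 `df[:x1]` indexing (later DataFrames versions replaced it with forms like `df.x1`): `string(v)` renders the whole vector as one string, while `map(string, v)` converts element by element. The last line, building prefixed `Symbol` names from the values, is an assumption about how such values could become column names, in the spirit of `gennames`.

```julia
flags = [true, true, true, false, false, false]

# string applied to the whole vector: one String holding its printed form.
whole = string(flags)

# map applies string to each element: one String per value.
per_element = map(string, flags)   # ["true", "true", "true", "false", "false", "false"]

# The dot-broadcast spelling gives the same result.
each_bcast = string.(flags)

# Hypothetical column names: Symbol concatenates its arguments' string forms.
colnames = Symbol.("x1_", unique(flags))   # [:x1_true, :x1_false]
```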

floswald closed this as completed Sep 27, 2014
