Multiple Columns as Group ID #3

recleev · 2019-02-07T21:44:08Z

I wanted to do something like this

library(cdata)

control_table <- qchar_frame(
  Part   , Measure , Value        |
  "Sepal", "Length", Sepal.Length |
  "Sepal", "Width" , Sepal.Width  |
  "Petal", "Length", Petal.Length |
  "Petal", "Width" , Petal.Width
)

rowrecs_to_blocks(iris, control_table)

But I get this error

Error in rowrecs_to_blocks.default(iris, control_table) : 
  cdata::rowrecs_to_blocks all control table group ids must be distinct

checkControlTable() assumes that the first column is the one and only id column, so it throws an error when the first column's values are not distinct.
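A quick base-R check makes the situation concrete. This is only a sketch: the data.frame below is a hand-built stand-in for the control table above, and the two checks are illustrative, not cdata's actual checkControlTable() logic.

```r
# Hand-built stand-in for the control table (plain strings, no qchar_frame)
control_table <- data.frame(
  Part    = c("Sepal", "Sepal", "Petal", "Petal"),
  Measure = c("Length", "Width", "Length", "Width"),
  Value   = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  stringsAsFactors = FALSE
)

# The first column alone repeats, so it cannot act as a unique key
first_col_unique <- !any(duplicated(control_table[[1]]))

# The (Part, Measure) pair is distinct on every row
pair_unique <- !any(duplicated(control_table[, c("Part", "Measure")]))

print(c(first_col_unique = first_col_unique, pair_unique = pair_unique))
```

The first check fails and the second passes, which is why a composite key would be enough to identify each row of the control table.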

I think cdata should be able to support a combination of multiple columns as ids. In my example above, the combination of Part and Measure constitutes a group id. Maybe an extra argument to specify the id cols in the control table can make this work?

Something like this?

rowrecs_to_blocks(iris, control_table, keyColumns = c("Part", "Measure"))

keyColumns (like the one in blocks_to_rowrecs()) could also take a vector of column indices to specify the columns to use as group ids. The default should be 1 to keep the current behavior.

Here is a workaround with {data.table}

control_table_2 <- qchar_frame(
  Part.Measure  , Value        |
  "Sepal.Length", Sepal.Length |
  "Sepal.Width" , Sepal.Width  |
  "Petal.Length", Petal.Length |
  "Petal.Width" , Petal.Width
)

iris_long <- rowrecs_to_blocks(iris, control_table_2)

library(data.table)

iris_long <- as.data.table(iris_long)

iris_long[, c("Part", "Measure") := tstrsplit(Part.Measure, split = "\\.")]

# > iris_long
#      Part.Measure Value  Part Measure
#   1: Sepal.Length   5.1 Sepal  Length
#   2:  Sepal.Width   3.5 Sepal   Width
#   3: Petal.Length   1.4 Petal  Length
#   4:  Petal.Width   0.2 Petal   Width
#   5: Sepal.Length   4.9 Sepal  Length
#  ---
# 596:  Petal.Width   2.3 Petal   Width
# 597: Sepal.Length   5.9 Sepal  Length
# 598:  Sepal.Width   3.0 Sepal   Width
# 599: Petal.Length   5.1 Petal  Length
# 600:  Petal.Width   1.8 Petal   Width
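For anyone who would rather not pull in {data.table}, the same split can be sketched in base R. The small data.frame here is a hand-made stand-in for the first few rows of iris_long (an assumption about its shape, not output from cdata).

```r
# Hand-made stand-in for the first rows of iris_long
iris_long <- data.frame(
  Part.Measure = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  Value        = c(5.1, 3.5, 1.4, 0.2),
  stringsAsFactors = FALSE
)

# Split each key on the literal "." and bind the pieces into a two-column matrix
pieces <- do.call(rbind, strsplit(iris_long$Part.Measure, ".", fixed = TRUE))

# Attach the two halves of the composite key as separate columns
iris_long$Part    <- pieces[, 1]
iris_long$Measure <- pieces[, 2]

print(iris_long)
```

Using fixed = TRUE avoids having to escape the dot as a regular expression.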

Splitting columns is easy, but I was just wondering if it can be avoided.


JohnMount · 2019-02-07T22:12:59Z

Good (and interesting) point. Thank you for taking the time to think about it and share it here.

As you noted, the current control table definition treats the first column specially: it is the record portion key. You are correct that more could be done if a set of columns in the control table were allowed to act as the record portion keys. blocks_to_rowrecs() does allow additional key columns from the incoming table. I need to think a bit about whether there is a way to extend rowrecs_to_blocks() as you describe (and what the consequences would be).

This is an asymmetry: column names are single strings (though I think Pandas can be more general than this), whereas, as you point out, composite keys are very natural for rows.

Your current workaround is in fact what I would have suggested. But perhaps we can automate this.

recleev · 2019-02-07T22:46:02Z

Thanks. I use {cdata} whenever I can in my work. controlTable makes data transformation almost automatic and intuitive.

I have not thought of any consequences yet, but I hope there is nothing major and that the benefits will outweigh them.

JohnMount · 2019-02-08T00:19:13Z

I'll probably add the feature soon. The nice thing is that if cdata itself does it, there is no string manipulation, as all the transforms are in the control table. I had thought about things like this, but decided they might be too advanced/baroque for the initial versions. But now at least one user has thought of it, and I think it regularizes the teaching by making things more uniform.

I have started to lay down infrastructure to handle the feature and entered a (failing) test to track progress on it.

JohnMount · 2019-02-08T20:27:41Z

Got it! (This is in the development version and covers local operations; I will probably get to the database versions later.)

library(cdata)
library(rqdatatable)
#> Loading required package: rquery

d <- iris
d$id <- seq_len(nrow(d))

control_table <- qchar_frame(
  Part,  Measure, Value       |
  Sepal, Length, Sepal.Length |
  Sepal, Width, Sepal.Width   |
  Petal, Length, Petal.Length |
  Petal, Width, Petal.Width   )

d %.>%
  rowrecs_to_blocks(
    .,
    control_table,
    controlTableKeys = c("Part", "Measure"),
    columnsToCopy = c("id", "Species")) %.>%
  orderby(., c("id", "Part", "Measure")) %.>%
  head(.)
#>    id Species  Part Measure Value
#> 1:  1  setosa Petal  Length   1.4
#> 2:  1  setosa Petal   Width   0.2
#> 3:  1  setosa Sepal  Length   5.1
#> 4:  1  setosa Sepal   Width   3.5
#> 5:  2  setosa Petal  Length   1.4
#> 6:  2  setosa Petal   Width   0.2
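The same keys should also drive the reverse transform. The following is a sketch only, assuming an installed cdata release in which blocks_to_rowrecs() accepts controlTableKeys as well; the control table is written as a plain data.frame of strings rather than with qchar_frame(), and the final sort by id is just to make the result easy to compare with iris.

```r
library(cdata)

d <- iris
d$id <- seq_len(nrow(d))

# Control table as a plain data.frame of strings
control_table <- data.frame(
  Part    = c("Sepal", "Sepal", "Petal", "Petal"),
  Measure = c("Length", "Width", "Length", "Width"),
  Value   = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  stringsAsFactors = FALSE
)

# Forward: one row per (id, Part, Measure)
long <- rowrecs_to_blocks(
  d,
  control_table,
  controlTableKeys = c("Part", "Measure"),
  columnsToCopy = c("id", "Species"))

# Reverse: rebuild one row per id from the blocks
# (assumes blocks_to_rowrecs() also takes controlTableKeys)
wide <- blocks_to_rowrecs(
  long,
  keyColumns = c("id", "Species"),
  controlTable = control_table,
  controlTableKeys = c("Part", "Measure"))

# Sort by id so rows line up with the original iris order
wide <- as.data.frame(wide)
wide <- wide[order(wide$id), ]
```

If the assumption holds, wide has one row per flower again, with the four measurement columns restored from the Part/Measure blocks.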

recleev · 2019-02-08T23:11:10Z

Wow. What a quick update. Will try it out as soon as I can, although I may wait for the CRAN update. Thanks!

JohnMount · 2019-02-09T04:56:38Z

I found it an interesting possibility. Your example is now written up as a vignette here.

JohnMount closed this as completed Feb 8, 2019

JohnMount reopened this Feb 8, 2019

JohnMount closed this as completed Feb 8, 2019

JohnMount mentioned this issue Feb 8, 2019

Port flexible control table keying to database implementation #4

Closed
