.SD locked error using foverlaps after ordering by variable #2099

Closed
jrderuiter opened this issue Apr 4, 2017 · 7 comments · Fixed by #3807

jrderuiter commented Apr 4, 2017

I am trying to sort out why some code from an existing R package (not my own code) runs fine on data.table version 1.9.4 but gives an error with later versions of data.table.

The package's code looks roughly like the following, which uses foverlaps to determine the overlap between a set of copy number segments and the corresponding probes (seg.cna holds the segments, and markers holds the probes):

library(data.table)

seg.cna <- data.table(
    Sample=c('S001', 'S002', 'S003', 'S004', 'S005', 'S006'), 
    Chromosome=rep('1', 6), Start=rep(3252007, 6), 
    End=c(3252007, 3714033, 3714033, 4083031, 4083031, 4214828), 
    LogRatio=c(-0.166154, -0.660141, 0.404224, -0.375556, 0.0, -0.354481), 
    Segment=c('Seg1', 'Seg2', 'Seg3', 'Seg4', 'Seg5', 'Seg6'))

markers <- data.table(
    Name=c('P0001', 'P0002', 'P0003'),
    Chromosome=rep('1', 3), 
    Position=c(3252007, 3714033, 4083031))

# Without this ordering it works fine.
seg.cna <- seg.cna[,Chromosome:=ordered(toupper(Chromosome), c('1'))]

setkey(seg.cna, Chromosome, Start, End)

seg.markers <- seg.cna[,
    foverlaps(markers[, .(Name, Chromosome, Position, Position2=Position)],
              .SD,
              by.x=c('Chromosome', 'Position', 'Position2'),
              type='within', nomatch=0L)[, .(ProbesNo=.N), by=Segment],
    .SDcols=c('Chromosome', 'Start', 'End', 'Segment'),
    by=Sample][, Sample:=NULL]

Running this code in data.table > 1.9.4 gives the following error:

Error in set(i, j = lc, value = newfactor) : 
  .SD is locked. Updating .SD by reference using := or set are reserved for future use. Use := in j directly. Or use copy(.SD) as a (slow) last resort, until shallow() is exported.

With the following traceback:

7: set(i, j = lc, value = newfactor)
6: bmerge(xx, ii, seq_along(xx), seq_along(xx), haskey(xx), integer(0), 
       mult = mult, ops = rep(1L, length(xx)), integer(0), 1L, verbose = verbose, 
       ...)
5: matches(x, y, intervals[2L], rollends = rep(type == "any", 2L), 
       ...)
4: indices(uy, y, yintervals, nomatch = 0L, roll = roll)
3: foverlaps(markers[, list(Name, Chromosome, Position, Position2 = Position)], 
       .SD, by.x = c("Chromosome", "Position", "Position2"), type = "within", 
       nomatch = 0L)
2: `[.data.table`(seg.cna, , foverlaps(markers[, .(Name, Chromosome, 
       Position, Position2 = Position)], .SD, by.x = c("Chromosome", 
       "Position", "Position2"), type = "within", nomatch = 0L)[, 
       .(ProbesNo = .N), by = Segment], .SDcols = c("Chromosome", 
       "Start", "End", "Segment"), by = Sample)
1: seg.cna[, foverlaps(markers[, .(Name, Chromosome, Position, Position2 = Position)], 
       .SD, by.x = c("Chromosome", "Position", "Position2"), type = "within", 
       nomatch = 0L)[, .(ProbesNo = .N), by = Segment], .SDcols = c("Chromosome", 
       "Start", "End", "Segment"), by = Sample]

This seems similar to the error encountered in #1341; however, the error we observe persists in data.table >= 1.9.8. The trigger appears to be the following line in the code above (without it, the example runs fine):

seg.cna <- seg.cna[,Chromosome:=ordered(toupper(Chromosome), c('1'))]

Any idea if this is related to the previous issue?
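
One reading consistent with the traceback is that the ordering line changes the column type on one side of the join, so the merge has to convert a join column and does so with set. The sketch below only demonstrates the type mismatch itself; that this mismatch is what leads to the conversion is an assumption, and the table names seg and mk are illustrative stand-ins for seg.cna and markers.

```r
library(data.table)

# Illustrative stand-ins for seg.cna and markers (hypothetical names).
seg <- data.table(Chromosome = rep('1', 2))
mk  <- data.table(Chromosome = rep('1', 3))

# The same conversion as in the report: character -> ordered factor.
seg[, Chromosome := ordered(toupper(Chromosome), c('1'))]

# The join columns now have different types on the two sides.
seg_class <- class(seg$Chromosome)
mk_class  <- class(mk$Chromosome)
print(seg_class)
print(mk_class)
```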

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] tools_3.3.2

jrderuiter commented Apr 24, 2017

Solved by another person in our group by using copy(.SD) instead of .SD.
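
A minimal sketch of that workaround, reusing the seg.cna and markers tables from the original report. The only change to the failing call is copy(.SD) in place of .SD. It assumes the copy keeps the key set on seg.cna, which foverlaps needs on its second argument, and it pays for a deep copy of each group.

```r
library(data.table)

seg.cna <- data.table(
    Sample=c('S001', 'S002', 'S003', 'S004', 'S005', 'S006'),
    Chromosome=rep('1', 6), Start=rep(3252007, 6),
    End=c(3252007, 3714033, 3714033, 4083031, 4083031, 4214828),
    LogRatio=c(-0.166154, -0.660141, 0.404224, -0.375556, 0.0, -0.354481),
    Segment=c('Seg1', 'Seg2', 'Seg3', 'Seg4', 'Seg5', 'Seg6'))

markers <- data.table(
    Name=c('P0001', 'P0002', 'P0003'),
    Chromosome=rep('1', 3),
    Position=c(3252007, 3714033, 4083031))

seg.cna[, Chromosome := ordered(toupper(Chromosome), c('1'))]
setkey(seg.cna, Chromosome, Start, End)

# copy(.SD) hands foverlaps an unlocked deep copy of the group's rows,
# so any by-reference update made during the join hits the copy, not .SD.
seg.markers <- seg.cna[,
    foverlaps(markers[, .(Name, Chromosome, Position, Position2=Position)],
              copy(.SD),
              by.x=c('Chromosome', 'Position', 'Position2'),
              type='within', nomatch=0L)[, .(ProbesNo=.N), by=Segment],
    .SDcols=c('Chromosome', 'Start', 'End', 'Segment'),
    by=Sample][, Sample:=NULL]

print(seg.markers)
```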

franknarf1 (Contributor) commented

Glad that works, but maybe it is still a bug? The error cites using := or set on .SD, but you use neither.

jrderuiter (Author) commented

bmerge uses set internally (frame 7 of the traceback above). I'm not sure whether that is the intended behavior.
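
To illustrate the mechanism in isolation, here is a small sketch on a toy table (DT, g, and v are made-up names, not part of the report). It calls set directly on .SD inside a grouped j, which is expected to be refused because .SD is locked, and then repeats the call on copy(.SD), which is unlocked. The exact wording of the error may differ between data.table versions.

```r
library(data.table)

# Toy table; names are illustrative only.
DT <- data.table(g = c("a", "a", "b"), v = 1:3)

# Updating .SD by reference: captured as a message rather than aborting.
msg <- tryCatch(
    DT[, { set(.SD, j = "v", value = 0L); .N }, by = g],
    error = function(e) conditionMessage(e))

# The same update on a deep copy of .SD goes through.
res <- DT[, {
    sd <- copy(.SD)
    set(sd, j = "v", value = 0L)
    .(total = sum(sd$v))
}, by = g]

print(msg)
print(res)
```

The original DT is left untouched in both cases, since the second call only modifies the per-group copy.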

franknarf1 commented Apr 24, 2017

Ok. Well, fwiw, I don't think it should cause trouble; have posted a similar bug (?) over here: #1926

jrderuiter (Author) commented

I'm inclined to agree. In any case, the error message is uninformative and confusing.

jrderuiter reopened this Apr 24, 2017

arunsrinivasan (Member) commented

Ha.. I think the locked attribute needs to be removed (IIRC all operations happen on shallow copied tables, and all reference operations are full column plonks). Will take a look.

Henrik-P commented Sep 1, 2019

The change titled "unlock shallow copy of .SD when needed in joins in j" seems to have fixed this issue as well.

The conversion of 'Chromosome' to factor no longer generates an error.

seg.cna <- seg.cna[,Chromosome:=ordered(toupper(Chromosome), c('1'))]
setkey(seg.cna, Chromosome, Start, End)

seg.markers <- seg.cna[,
    foverlaps(markers[, .(Name, Chromosome, Position, Position2=Position)],
              .SD,
              by.x=c('Chromosome', 'Position', 'Position2'),
              type='within', nomatch=0L)[, .(ProbesNo=.N), by=Segment],
    .SDcols=c('Chromosome', 'Start', 'End', 'Segment'),
    by=Sample][, Sample:=NULL]

seg.markers
#    Segment ProbesNo
# 1:    Seg1        1
# 2:    Seg2        2
# 3:    Seg3        2
# 4:    Seg4        3
# 5:    Seg5        3
# 6:    Seg6        3

data.table 1.12.3 IN DEVELOPMENT built 2019-09-01 03:43:32 UTC

Thanks a lot for your great work with this issue!
