Subset filearray into filearray proxy #5
Yes I think so =)
methods(class="FileArray")
[1] $ $<- [ [<- [[ apply
[7] as.array coerce dim dimnames dimnames<- fwhich
[13] initialize length mapreduce max min range
[19] show subset sum typeof
Btw, would it be possible to lazy-load a netcdf file as
library(stars)
proxy <- stars::read_stars(system.file("nc/reduced.nc", package="stars"), proxy=T) # lazy-load nc file
message("proxy obj needs ", format(utils::object.size(proxy), units="auto"))
#proxy obj needs 12.3 Kb
stars <- stars::st_as_stars(proxy) # convert to accessible data = use memory
message("stars obj needs ", format(utils::object.size(stars), units="auto"))
#stars obj needs 519.3 Kb
methods(class="stars_proxy")
[1] Math Ops [ [<-
[5] [[<- adrop aggregate aperm
[9] as.data.frame c coerce dim
[13] droplevels filter hist initialize
[17] is.na merge plot predict
[21] print show slotsFromS3 split
[25] st_apply st_as_sf st_as_stars st_crop
[29] st_dimensions<- st_downsample st_mosaic st_redimension
[33] st_sample st_set_bbox write_stars
Thanks a lot for your great work!
That sounds like a good idea. There will be some limitations on the types of methods available. Point-wise methods such as
No you are good. Glad you brought up this feature request.
Not natively. I think you can convert the arrays though. I'm not very familiar with the low-level implementation of
The performance comes at a cost. For example, random access is relatively slow. filearray does not use a universal file format that other programs can read. The data array is only expandable along the last margin... If you are OK with these disadvantages, or have alternative ways to work around them,
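To make the "only expandable along the last margin" limitation concrete, here is a minimal base-R sketch. It assumes nothing more than a column-major layout, as used by in-memory R arrays; it does not reproduce filearray's actual on-disk format, and all variable names are hypothetical.

```r
# Column-major layout: the last margin varies slowest, so a new slice along
# it sits at the very end of the underlying data vector.
a <- array(1:24, dim = c(2, 3, 4))
new_slice <- array(101:106, dim = c(2, 3))

# Growing along the last margin is a pure append of the raw data.
grown_last <- array(c(a, new_slice), dim = c(2, 3, 5))

# Growing along the first margin interleaves new values with old ones,
# so most existing elements end up at a different offset.
new_row <- array(201:212, dim = c(1, 3, 4))
grown_first <- array(NA_integer_, dim = c(3, 3, 4))
grown_first[1:2, , ] <- a
grown_first[3, , ] <- new_row

# Offsets of the old values inside each grown data vector.
old_positions_last <- match(as.vector(a), as.vector(grown_last))
old_positions_first <- match(as.vector(a), as.vector(grown_first))
```

In the first case the old values keep their offsets, so data already written would not need to move; in the second case they shift, which for a file-backed array would mean rewriting existing data.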
Hi @chrisdane, I have added this experimental feature to the branch https://github.com/dipterix/filearray/tree/lazyeval
Would you mind checking this branch to see whether there are any methods you would like supported? Also, please let me know if you find any bugs :)
You can install and compile this dev branch via
remotes::install_github("dipterix/filearray@lazyeval")
If you run on Windows,
Here's a sanity test:
> x <- as_filearray(1:24, dimension = c(4,6))
> y <- (2^(x - 1) + log(x)) > 10000 | x <= 2
> print(y)
Reference class object of class "FileArrayProxy"
Mode: readwrite
UUID: 0005-640eaaf8-c6e7-4f55-aa6e-2956a872155c (depth=5)
Dimension: 4x6
Partition count: 6
Partition size: 1
Data type: logical
Internal type: integer
Location: $TEMPDIR/tmpfilearray11ef51b065fe9.farr
> x[y]
[1] 1 2 15 16 17 18 19 20 21 22 23 24
> # Sanity check
> x[][(2^(x[] - 1) + log(x[])) > 10000 | x[] <= 2]
[1] 1 2 15 16 17 18 19 20 21 22 23 24
Added as of 0.1.6
The original issue: dipterix/lazyarray#3
Is it possible that subsetting a lazyarray again yields a lazyarray?
I am a bit puzzled whether I use your package correctly, e.g.
During this call, arr[] fully populates the memory, i.e. the whole lazy-aspect is gone?
Original reply:
Hi @chrisdane, the development of this package has been paused in favor of https://github.com/dipterix/filearray, a very similar package that offers better performance and more functions. This package (lazyarray) is still on CRAN because some of my old projects still depend on it, but the migration will be complete soon. I'm sorry for the inconvenience.
Back to your question. It's not straightforward to subset lazyarray/filearray in that way for now because I'm dealing with arrays of 10GB+. Your proposed operations might need to create a new array on disk, which could very easily fill up the hard disk if not handled carefully.
It's true that once you call [, the data will be loaded into memory, hence the "lazy" aspect goes away.
What I could do, however, is set up some lazy-evaluated proxies. The proxies do not evaluate the arrays immediately. Instead, they only evaluate when you subset the arrays:
Does that resolve your problems?
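The proxy idea described above can be sketched in base R as follows. This is not the lazyarray/filearray implementation: `lazy_proxy`, `loader`, and the `counter` environment are hypothetical names made up for illustration, and the sketch only supports binary element-wise operators between a proxy and a scalar.

```r
# Hypothetical proxy: a loader function plus a queue of pending operations.
lazy_proxy <- function(loader, ops = list()) {
  structure(list(loader = loader, ops = ops), class = "lazy_proxy")
}

# Binary operators are recorded rather than evaluated
# (assumption: the other operand is a plain scalar).
Ops.lazy_proxy <- function(e1, e2) {
  fun <- match.fun(.Generic)
  if (inherits(e1, "lazy_proxy")) {
    step <- function(v) fun(v, e2)
    lazy_proxy(e1$loader, c(e1$ops, list(step)))
  } else {
    step <- function(v) fun(e1, v)
    lazy_proxy(e2$loader, c(e2$ops, list(step)))
  }
}

# Subsetting is the only point where data is read and the queue is replayed.
`[.lazy_proxy` <- function(x, i) {
  values <- x$loader(i)
  for (step in x$ops) values <- step(values)
  values
}

# A numeric vector stands in for data on disk; the counter tracks reads.
backing <- as.numeric(1:24)
counter <- new.env()
counter$n <- 0
loader <- function(i) {
  counter$n <- counter$n + length(i)
  backing[i]
}

p <- lazy_proxy(loader)
q <- (p - 1) * 2          # nothing has been read yet
reads_before <- counter$n
result <- q[c(3, 10)]     # reads and evaluates only elements 3 and 10
```

Here `q` stays a proxy after the arithmetic, and only the two requested elements are loaded when it is finally subset. Operations that need the whole array at once (for example, subsetting by another proxy) would need additional handling beyond this sketch.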