Reading only a subset of columns #78

Open
CarlColglazier opened this issue Dec 15, 2020 · 2 comments

@CarlColglazier

Please correct me if this is already possible. I looked through the source code and the documentation and did not find a clear way to do it: I want to read a FeatherV2 file without mmapping every single column. I already know which columns I need, and I'd like to tell Arrow.Table which subset of columns to read into memory.

This is similar to this issue on Feather.jl.

This seems to be possible in the R arrow package using col_select.

@quinnj
Member

quinnj commented Dec 15, 2020

Hey @CarlColglazier, thanks for opening an issue. We could probably support keyword arguments like select and drop, but note that it wouldn't change how much memory is "mmapped". Arrow tables are stored in a single memory blob and there isn't really a way to only mmap a few columns. You still have to read the header/metadata to figure out the offsets of specific columns into the data.

So, happy to support select/drop, since it can be convenient to only get back the columns you really need, but I just want to point out that I wouldn't expect there to be any real effect on memory/performance.
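The point about a single memory blob can be illustrated with a minimal Python sketch. It is not Arrow.jl code and does not use the Arrow file format: the toy layout (a length prefix, a JSON header of column offsets, then packed int64 data) and the helper names `write_toy_file` and `read_selected` are assumptions made up for illustration. It shows only the general shape: the whole file is mapped, the header has to be read to find each column's offset, and a column selection changes which columns are returned rather than how much is mapped.

```python
import json
import mmap
import os
import struct
import tempfile


def write_toy_file(path, columns):
    """Write columns (name -> list of ints) as one blob: length prefix, JSON header, data."""
    header = {}
    body = b""
    for name, values in columns.items():
        # Record where this column starts inside the data region and how many values it has.
        header[name] = [len(body), len(values)]
        body += struct.pack(f"<{len(values)}q", *values)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header_bytes)))  # 4-byte header length
        f.write(header_bytes)
        f.write(body)


def read_selected(path, select):
    """Map the whole file, parse the header, and decode only the selected columns."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the selection does not change this.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (header_len,) = struct.unpack_from("<I", mm, 0)
            header = json.loads(mm[4:4 + header_len].decode("utf-8"))
            data_start = 4 + header_len
            result = {}
            for name in select:
                # The header must be consulted to locate each column in the blob.
                offset, count = header[name]
                result[name] = list(
                    struct.unpack_from(f"<{count}q", mm, data_start + offset)
                )
            return result


path = os.path.join(tempfile.mkdtemp(), "toy.bin")
write_toy_file(path, {"a": [1, 2, 3], "b": [10, 20, 30], "c": [7, 8, 9]})
subset = read_selected(path, ["a", "c"])
print(subset)  # {'a': [1, 2, 3], 'c': [7, 8, 9]}
```

In this toy version the selection is a convenience layered on top of the same mapping, which matches the expectation that select/drop keywords would mostly affect what is handed back to the caller.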

@JayjeetAtGithub

I went through the Feather C++ source code, and it seems this hasn't been addressed yet in the upstream C++ API. Am I right?
