define mapobs behavior for vector of indexes #147

Merged
merged 4 commits into from
Feb 12, 2023

CarloLucibello (Member) commented Feb 12, 2023

The previous behavior of mapobs when indexed with a vector of indices was either not meaningful or raised an error, e.g.

julia> mdata = mapobs(x -> sum(x.a) + sum(x.b), (a = 1:10, b = 11:20))
mapobs(#112, NamedTuple{(:a, :b), Tuple{UnitRange{Int64}, UnitRange{Int64}}})

julia> mdata[1] # OK with integer index
12

julia> mdata[1:2] # ERROR with vector index
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
Stacktrace:
...
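
The error message itself comes from Julia's Base, which refuses to broadcast over a NamedTuple. The snippet below reproduces that message in isolation; that the old vector-index method broadcast the mapped function over the result of getobs is an assumption, since the stack trace above is elided.

```julia
# Standalone reproduction of the error message, independent of MLUtils.
nt = (a = 1:2, b = 11:12)
f = x -> sum(x.a) + sum(x.b)

whole = f(nt)  # calling f on the NamedTuple as a whole works

# Broadcasting f over the NamedTuple is rejected by Base with an ArgumentError.
err = try
    f.(nt)
catch e
    e
end

println(whole)
println(err isa ArgumentError)
```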

This PR settles on a sensible behavior, but other choices are possible, for instance

getindex(md::MappedDataset, idx::Vector) = [md.f(getobs(md.data, i)) for i in idx]

but that seems strictly less flexible than what this PR does.
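
To make the trade-off concrete, here is a minimal sketch comparing the two choices. ToyMapped, toy_getobs, per_obs and whole_batch are hypothetical names used only for illustration, not MLUtils code, and the sketch assumes the more flexible option is to hand the whole batch to the function and let it decide what to do with it.

```julia
# Hypothetical stand-in for the mapped dataset; not the library's implementation.
struct ToyMapped{F,D}
    f::F
    data::D
end

# Toy getobs for a NamedTuple of vectors: index every field with i.
toy_getobs(data::NamedTuple, i) = map(x -> x[i], data)

# The alternative quoted above: always call f once per observation.
per_obs(md::ToyMapped, idxs::AbstractVector) =
    [md.f(toy_getobs(md.data, i)) for i in idxs]

# Assumed flexible variant: call f once on the whole batch.
whole_batch(md::ToyMapped, idxs::AbstractVector) =
    md.f(toy_getobs(md.data, idxs))

md = ToyMapped(x -> sum(x.a) + sum(x.b), (a = 1:10, b = 11:20))

println(per_obs(md, 1:2))      # one value per observation
println(whole_batch(md, 1:2))  # a single value computed over the batch
```

With the per-observation version the function never sees more than one observation, so batch-level transforms cannot be expressed. Passing the batch through leaves that decision to the function.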

Edit
While experimenting with this to create transformed datasets, I realized I needed more customizability, hence the batched argument.

batched = :never behaves like PyTorch transforms (the function is applied to one observation at a time), while batched = :always is how HuggingFace Datasets applies its transforms (the function receives a batch).
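
The two modes can be sketched as follows. This is an illustrative toy, not the merged implementation: toy_index and toy_getobs are hypothetical helpers, and the handling of an integer index under :always (wrap it in a one-element batch, then unwrap the result) is an assumption about how such a mode would naturally behave.

```julia
# Toy getobs for a plain vector container.
toy_getobs(data::AbstractVector, i) = data[i]

# Integer index: one observation is requested.
function toy_index(f, data, idx::Integer; batched::Symbol)
    if batched === :never
        return f(toy_getobs(data, idx))          # f sees a single observation
    elseif batched === :always
        return f(toy_getobs(data, [idx]))[1]     # f sees a batch of size one
    else
        throw(ArgumentError("unsupported mode in this sketch: $batched"))
    end
end

# Vector index: a batch of observations is requested.
function toy_index(f, data, idxs::AbstractVector; batched::Symbol)
    if batched === :never
        return [f(toy_getobs(data, i)) for i in idxs]  # f called per observation
    elseif batched === :always
        return f(toy_getobs(data, idxs))               # f called once on the batch
    else
        throw(ArgumentError("unsupported mode in this sketch: $batched"))
    end
end

data = [1.0, 2.0, 3.0, 4.0]
double(x) = 2 .* x                        # a per-observation transform
center(xs) = xs .- sum(xs) / length(xs)   # a transform that needs the whole batch

println(toy_index(double, data, 2; batched = :never))
println(toy_index(double, data, [1, 2]; batched = :never))
println(toy_index(center, data, [1, 3]; batched = :always))
println(toy_index(center, data, 2; batched = :always))
```

A per-observation function such as double fits :never, while center only makes sense when it receives several observations at once, which is what :always guarantees.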

codecov-commenter commented Feb 12, 2023

Codecov Report

Merging #147 (17f1075) into main (ff2fcc1) will increase coverage by 0.11%.
The diff coverage is 57.14%.


@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
+ Coverage   88.28%   88.40%   +0.11%     
==========================================
  Files          15       13       -2     
  Lines         589      595       +6     
==========================================
+ Hits          520      526       +6     
  Misses         69       69              
Impacted Files              Coverage Δ
src/obstransform.jl         82.69% <57.14%> (-1.40%) ⬇️
src/Datasets/Datasets.jl
src/MLUtils.jl


@CarloLucibello CarloLucibello merged commit 08226a4 into main Feb 12, 2023