
[SYSTEMDS-3589] Frame single column ragged array #1857

Closed
wants to merge 9 commits into from


OlgaOvcharenko
Contributor

This PR adds a ragged array for a single frame column containing row values. It is just a wrapper around the actual frame column Arrays.
It can be used for the transformencode metadata frames.
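To make the idea concrete, here is a minimal sketch of a ragged column, assuming it only needs to report a logical length m and hand out row values. The class `RaggedColumn` and its methods are hypothetical illustrations, not the SystemDS `RaggedArray` API: the sketch keeps a shorter backing array and reads every row beyond it as null.

```java
/**
 * Hypothetical, simplified ragged column: it reports a logical length m
 * but only stores the first values.length entries. Not the SystemDS class.
 */
class RaggedColumn<T> {
    private final T[] values; // materialized prefix of the column
    private final int m;      // logical number of rows seen by the frame

    RaggedColumn(T[] values, int m) {
        if (values.length > m)
            throw new IllegalArgumentException("more values than rows");
        this.values = values;
        this.m = m;
    }

    /** Logical column length, as seen by the frame. */
    int size() {
        return m;
    }

    /** Number of slots actually allocated. */
    int allocated() {
        return values.length;
    }

    /** Rows beyond the materialized prefix read as null. */
    T get(int row) {
        if (row < 0 || row >= m)
            throw new IndexOutOfBoundsException("row " + row + " of " + m);
        return row < values.length ? values[row] : null;
    }
}

public class RaggedColumnDemo {
    public static void main(String[] args) {
        RaggedColumn<String> col = new RaggedColumn<>(new String[] {"a", "b"}, 5);
        for (int i = 0; i < col.size(); i++)
            System.out.print(col.get(i) + " ");
        System.out.println();
    }
}
```

Running `main` prints the values a, b, null, null, null, while only two slots are allocated.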

Thanks for the review.

@phaniarnab
Contributor

I don't fully understand the use case of a ragged array-based frame. Is the metadata from transformencode the only case where this feature helps? Would splitting a metadata frame into a list of frames (one per column) not serve the purpose?

@OlgaOvcharenko
Contributor Author


The ragged array is intended to save memory, since the metadata frame is mostly nulls. We probably do not want to change the API of transformencode, so I do not think a list of frames would be a suitable solution. This change makes no external API changes and reduces both the memory allocated for the metadata and the on-disk size of the metadata that we save after transformencode.
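A rough way to see the savings: assuming each metadata column holds one entry per distinct value and a dense frame pads every column to the longest one, the dense cell count is the number of columns times the maximum, while a ragged layout allocates only the sum. The distinct-value counts below are made up for illustration, and `MetadataCells` is a hypothetical helper.

```java
import java.util.Arrays;

/** Counts allocated cells for a dense vs. ragged metadata frame (illustrative). */
public class MetadataCells {
    /** Dense frame: every column is padded to the longest column. */
    static long denseCells(int[] entriesPerColumn) {
        int rows = Arrays.stream(entriesPerColumn).max().orElse(0);
        return (long) rows * entriesPerColumn.length;
    }

    /** Ragged frame: each column allocates only its own entries. */
    static long raggedCells(int[] entriesPerColumn) {
        return Arrays.stream(entriesPerColumn).asLongStream().sum();
    }

    public static void main(String[] args) {
        // Hypothetical distinct-value counts for four recoded columns.
        int[] entries = {1000, 3, 12, 2};
        System.out.println("dense:  " + denseCells(entries));
        System.out.println("ragged: " + raggedCells(entries));
    }
}
```

With these counts, the dense layout allocates 4000 cells and the ragged one 1017; the gap grows with the skew between the longest column and the rest.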

Contributor

@Baunsgaard Baunsgaard left a comment


Good start; some details are missing.

}

 @Override
 public Object get() {
-    throw new NotImplementedException("Unimplemented method 'get'");
+    return _a.get();
Contributor


Throw an exception on this one, since the API expects to get the full array of size m.

Contributor Author


Fixed. However, OptionalArray.get returns an Object, and that makes sense to me.
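The reviewer's concern can be sketched as follows, assuming callers of `get()` treat the result as a dense array of length m. In this hypothetical `RaggedStrings` class, `get()` refuses instead of returning the shorter backing array, while row-wise access still works; `UnsupportedOperationException` stands in for the `NotImplementedException` used in the PR.

```java
/**
 * Hypothetical ragged string column: get() would have to return a dense
 * backing array of length m, which this class does not have, so it throws.
 */
class RaggedStrings {
    private final String[] values; // only the materialized prefix
    private final int m;           // logical number of rows

    RaggedStrings(String[] values, int m) {
        this.values = values;
        this.m = m;
    }

    int size() {
        return m;
    }

    /** Returning `values` here would hand callers an array shorter than m. */
    Object get() {
        throw new UnsupportedOperationException(
            "ragged column has no dense backing array of length " + m);
    }

    /** Row-wise access is still well defined: missing rows read as null. */
    String get(int row) {
        return row < values.length ? values[row] : null;
    }
}

public class RaggedGetDemo {
    public static void main(String[] args) {
        RaggedStrings col = new RaggedStrings(new String[] {"x"}, 3);
        try {
            col.get();
        } catch (UnsupportedOperationException e) {
            System.out.println("get() rejected: " + e.getMessage());
        }
        System.out.println("get(2) = " + col.get(2));
    }
}
```

Failing loudly keeps a caller from silently indexing past the end of a short array.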

}

 @Override
 public byte[] getAsByteArray() {
-    throw new NotImplementedException("Unimplemented method 'getAsByteArray'");
+    return _a.getAsByteArray();
Contributor


Throw an exception here. This will not work, since the caller expects the byte array to have exactly m * nBytes per value bytes.

Contributor Author


Then shouldn't OptionalArray also throw an exception, to be consistent?
Currently it does not:

public byte[] getAsByteArray() {
    // technically not correct.
    return _a.getAsByteArray();
}

Contributor


Yes, OptionalArray should throw an exception as well.
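The size contract behind `getAsByteArray()` can be sketched like this, assuming a double column and a caller that decodes exactly m * `Double.BYTES` bytes. `encodePrefix` and `decode` are hypothetical helpers, not SystemDS methods: delegating to the inner array yields only the bytes of the materialized prefix, so the caller's length check fails.

```java
import java.nio.ByteBuffer;

/** Illustrates why a ragged column cannot simply delegate getAsByteArray(). */
public class ByteArrayContract {
    /** Encodes only the materialized prefix, as delegating to the inner array would. */
    static byte[] encodePrefix(double[] prefix) {
        ByteBuffer buf = ByteBuffer.allocate(prefix.length * Double.BYTES);
        for (double v : prefix)
            buf.putDouble(v);
        return buf.array();
    }

    /** A caller that trusts the contract and reads exactly m values. */
    static double[] decode(byte[] bytes, int m) {
        if (bytes.length != m * Double.BYTES)
            throw new IllegalStateException(
                "expected " + m * Double.BYTES + " bytes, got " + bytes.length);
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] out = new double[m];
        for (int i = 0; i < m; i++)
            out[i] = buf.getDouble();
        return out;
    }

    public static void main(String[] args) {
        int m = 4; // logical rows
        byte[] bytes = encodePrefix(new double[] {1.5, 2.5}); // only 2 stored
        try {
            decode(bytes, m);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For m = 4 and two stored values, the caller reports `expected 32 bytes, got 16`.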

}

 @Override
 public long getExactSerializedSize() {
-    throw new NotImplementedException("Unimplemented method 'getExactSerializedSize'");
+    return _a.getExactSerializedSize();
Contributor


Plus the int size: the exact serialized size should also count the bytes of an int.
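One reading of this comment is that the ragged wrapper serializes an extra int (its logical row count) on top of the inner array, so its exact serialized size must add that int. The sketch below assumes that layout; `RaggedDoubles`, its inner header, and `write` are hypothetical and only illustrate that the reported size should match the bytes actually written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical ragged double column: writes m as an int, then the inner array. */
class RaggedDoubles {
    private final double[] values; // materialized prefix
    private final int m;           // logical number of rows

    RaggedDoubles(double[] values, int m) {
        this.values = values;
        this.m = m;
    }

    /** Assumed inner encoding: an int length header plus the values. */
    long innerSerializedSize() {
        return Integer.BYTES + (long) values.length * Double.BYTES;
    }

    /** Inner size plus the extra int that stores m. */
    long getExactSerializedSize() {
        return Integer.BYTES + innerSerializedSize();
    }

    void write(DataOutputStream out) throws IOException {
        out.writeInt(m);             // the extra "int size"
        out.writeInt(values.length); // inner array header
        for (double v : values)
            out.writeDouble(v);
    }
}

public class RaggedSizeDemo {
    public static void main(String[] args) throws IOException {
        RaggedDoubles col = new RaggedDoubles(new double[] {1.0, 2.0, 3.0}, 100);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            col.write(out);
        }
        System.out.println(col.getExactSerializedSize() + " == " + bos.size());
    }
}
```

For three stored values and m = 100, both the computed size and the written byte count are 32; returning only the inner size would under-report by 4 bytes.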

Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Aug 21, 2023
This commit contains code to add a simple ragged array, that allows us
to allocate columns in frames with a lower number of contained materialized
values.

Closes apache#1857
@Baunsgaard
Contributor

Thanks for the commit. I am merging it and will fix the remaining issues in another PR.

@Baunsgaard Baunsgaard closed this Aug 21, 2023
Baunsgaard pushed a commit that referenced this pull request Aug 21, 2023
This commit contains code to add a simple ragged array, that allows us
to allocate columns in frames with a lower number of contained materialized
values.

Closes #1857
Closes #1884
Labels
None yet
Projects
Status: Done
3 participants