Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sprint3 #3

Merged
merged 20 commits into from
Jun 27, 2023
Merged

Sprint3 #3

merged 20 commits into from
Jun 27, 2023

Conversation

niklasmohrin
Copy link
Collaborator

@niklasmohrin niklasmohrin commented May 31, 2023

Note to reviewers: In addition to the task, we also changed the DictionarySegment to use the highest possible index as the ValueID instead of 0.

Copy link

@phkeese phkeese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello group 1, nicely done though we do have some ideas for improvements.
Please do not hesitate to ask if you have any questions :D
-- Group 2 @phkeese @23mafi @ClFeSc


namespace opossum {

AbstractOperator::AbstractOperator(const std::shared_ptr<const AbstractOperator> left,
const std::shared_ptr<const AbstractOperator> right)
: _left_input(left), _right_input(right) {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice, maybe we should have something similar


void AbstractOperator::execute() {
Assert(!_was_executed, "Operators shall not be executed twice.");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a debug assert, as users must never see this error and correct code should not trigger this.

Comment on lines +135 to +150
const auto value_segment = std::dynamic_pointer_cast<ValueSegment<Type>>(segment);
const auto dictionary_segment = std::dynamic_pointer_cast<DictionarySegment<Type>>(segment);
const auto reference_segment = std::dynamic_pointer_cast<ReferenceSegment>(segment);

DebugAssert(value_segment || dictionary_segment || reference_segment,
"Segment has to be ValueSegment, DictionarySegment or ReferenceSegment.");

if (value_segment) {
_scan_value_segment(chunk_id, *value_segment, *pos_list);
} else if (dictionary_segment) {
_scan_dictionary_segment(chunk_id, *dictionary_segment, *pos_list);
} else if (reference_segment) {
_scan_reference_segment(*reference_segment, *pos_list);
referenced_table = reference_segment->referenced_table();
++reference_segment_count;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird structure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? How do you think it can be improved?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this comment seems to have slipped through. We meant to remove it as we could not think of better ways to structure it. The initial comment was made by me due to the placement of the assert between casts and accesses. It could also be moved to another else-branch with a Fail() but we decided to omit the comment.

});
}

Assert(reference_segment_count == 0 || (chunk_count == 1 && reference_segment_count == 1),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing this to a debug assert.

ValueID DictionarySegment<T>::null_value_offset() const {
return _is_nullable ? ValueID{1} : ValueID{0};
// The maximum value representable within the attribute vector's width.
return ValueID{(1u << attribute_vector()->width() * 8) - 1};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the length of the dictionary and refactoring your table scan. This works because this is cast to INVALID_VALUE_ID which is not matched by your table scan predicate in any case. This solution relies a lot on implicit behavior. Also: When using 32 bit, you return INVALID_VALUE_ID in both branches.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using this value as the value id for null is fine, I would go as far saying that I prefer it over all other approaches. As for the behavior in the table scan: We at some point just made the decision to live with this behavior, because the task didn't require any particular handling yet. I think a proper solution would involve checking for null explicitly in the scan, regardless of what particular value id is used for it. I am not sure I understand what you mean by implicit behavior - we did not intend to hide some math around this special value id anywhere (note that, in case you meant that place, the comment in _scan_dictionary_segment is referring to the documented interface of DictionarySegment::{lower,upper}_bound). Did we use this anywhere without noticing?

}
}

void TableScan::_scan_reference_segment(const ReferenceSegment& segment, PosList& pos_list) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, no check for NULL of search value.


void TableScan::_scan_reference_segment(const ReferenceSegment& segment, PosList& pos_list) {
const auto predicate = predicate_for_scantype<AllTypeVariant>(_scan_type);
for (const auto row_id : *segment.pos_list()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a bit strange. Consider moving this reference into its own variable.

return NULL_VALUE;
}
const auto chunk = _referenced_table->get_chunk(row_id.chunk_id);
return (*chunk->get_segment(_referenced_column_id))[row_id.chunk_offset];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use operator[]() (it is meant for debugging only).
Also consider using your accessor methods here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you suggest instead? We are in a function that is supposed to return AllTypeVariant, so we used the accessor that does just that, delegating the conversion from T to AllTypeVariant to whoever has the T.

}

size_t ReferenceSegment::estimate_memory_usage() const {
// Implementation goes here
Fail("Implementation is missing.");
return size() * sizeof(RowID);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PosList is shared between ReferenceSegments, beware of counting its memory twice. You also do not count this into the size.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are counting the entries of the one PosList just once, or am I not seeing this correctly? As for adding the size of this: We just stuck to what the tests for ValueSegment demanded back in sprint 1 which was just to account for the contained tuples. Either way seems reasonable to me


TEST_F(OperatorsTableScanTest, ScanType) {
auto scan = std::make_shared<TableScan>(_table_wrapper, ColumnID{0}, ScanType{-1}, 90000);
EXPECT_THROW(scan->execute(), std::logic_error);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test.

@niklasmohrin
Copy link
Collaborator Author

Merging to get this out of my PR list :^)

@niklasmohrin niklasmohrin merged commit 8cc1902 into main Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants