Sprint3 #3

niklasmohrin · 2023-05-31T19:50:43Z

Note to reviewers: In addition to the task, we also changed the DictionarySegment to use the highest possible index as the ValueID instead of 0.

phkeese

Hello group 1, nicely done, though we do have some ideas for improvements.
Please do not hesitate to ask if you have any questions :D
-- Group 2 @phkeese @23mafi @ClFeSc

phkeese · 2023-06-05T23:26:50Z

src/lib/operators/abstract_operator.cpp


 namespace opossum {

 AbstractOperator::AbstractOperator(const std::shared_ptr<const AbstractOperator> left,
                                   const std::shared_ptr<const AbstractOperator> right)
-    : _left_input(left), _right_input(right) {}


very nice, maybe we should have something similar

phkeese · 2023-06-05T23:27:43Z

src/lib/operators/abstract_operator.cpp


 void AbstractOperator::execute() {
+  Assert(!_was_executed, "Operators shall not be executed twice.");


This should probably be a debug assert, as users must never see this error and correct code should not trigger this.
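
For illustration, a minimal sketch of the difference between an always-on assert and a debug-only assert. `SKETCH_ASSERT`, `SKETCH_DEBUG_ASSERT`, `SKETCH_DEBUG`, and `SketchOperator` are hypothetical stand-ins; the project's actual `Assert` / `DebugAssert` may be defined differently, and throwing `std::logic_error` is an assumption.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical build flag: 1 keeps debug asserts, 0 compiles them out.
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 1
#endif

// Always-on check (stand-in for Assert); assumed to throw std::logic_error.
#define SKETCH_ASSERT(condition, message) \
  do {                                    \
    if (!(condition)) {                   \
      throw std::logic_error(message);    \
    }                                     \
  } while (false)

// Debug-only check (stand-in for DebugAssert).
#if SKETCH_DEBUG
#define SKETCH_DEBUG_ASSERT(condition, message) SKETCH_ASSERT(condition, message)
#else
#define SKETCH_DEBUG_ASSERT(condition, message) \
  do {                                          \
  } while (false)
#endif

// Hypothetical operator that guards against double execution.
class SketchOperator {
 public:
  void execute() {
    SKETCH_DEBUG_ASSERT(!_was_executed, "Operators shall not be executed twice.");
    _was_executed = true;
  }

  bool was_executed() const { return _was_executed; }

 private:
  bool _was_executed = false;
};
```

With the debug variant, a second `execute()` call throws only when `SKETCH_DEBUG` is non-zero; release builds skip the check entirely.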

phkeese · 2023-06-05T23:37:21Z

src/lib/operators/table_scan.cpp

+      const auto value_segment = std::dynamic_pointer_cast<ValueSegment<Type>>(segment);
+      const auto dictionary_segment = std::dynamic_pointer_cast<DictionarySegment<Type>>(segment);
+      const auto reference_segment = std::dynamic_pointer_cast<ReferenceSegment>(segment);
+
+      DebugAssert(value_segment || dictionary_segment || reference_segment,
+                  "Segment has to be ValueSegment, DictionarySegment or ReferenceSegment.");
+
+      if (value_segment) {
+        _scan_value_segment(chunk_id, *value_segment, *pos_list);
+      } else if (dictionary_segment) {
+        _scan_dictionary_segment(chunk_id, *dictionary_segment, *pos_list);
+      } else if (reference_segment) {
+        _scan_reference_segment(*reference_segment, *pos_list);
+        referenced_table = reference_segment->referenced_table();
+        ++reference_segment_count;
+      }


weird structure

niklasmohrin · Jun 6, 2023

How so? How do you think it can be improved?

phkeese · Jun 7, 2023

Sorry, this comment seems to have slipped through. We meant to remove it, as we could not think of a better way to structure the code. I made the initial comment because of the placement of the assert between the casts and the accesses. The check could also be moved into a final else-branch with a Fail(), but we decided to omit the comment.
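
The alternative mentioned above (a final else-branch that fails) could look roughly like this. The segment types here are empty, non-templated placeholders, and `fail` is a hypothetical stand-in for the project's `Fail()`, assumed to throw `std::logic_error`.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, stripped-down segment hierarchy for illustration only.
struct BaseSegment {
  virtual ~BaseSegment() = default;
};
struct ValueSegment : BaseSegment {};
struct DictionarySegment : BaseSegment {};
struct ReferenceSegment : BaseSegment {};
struct UnsupportedSegment : BaseSegment {};

// Stand-in for Fail(); assumed to throw std::logic_error.
[[noreturn]] inline void fail(const std::string& message) {
  throw std::logic_error(message);
}

// Placeholder scan routines that only report which overload was chosen.
inline std::string scan(const ValueSegment&) { return "value"; }
inline std::string scan(const DictionarySegment&) { return "dictionary"; }
inline std::string scan(const ReferenceSegment&) { return "reference"; }

// Each cast is declared in the condition that uses it, and the unsupported
// case is handled in the last branch instead of by an assert up front.
inline std::string dispatch_scan(const std::shared_ptr<BaseSegment>& segment) {
  if (const auto value_segment = std::dynamic_pointer_cast<ValueSegment>(segment)) {
    return scan(*value_segment);
  } else if (const auto dictionary_segment = std::dynamic_pointer_cast<DictionarySegment>(segment)) {
    return scan(*dictionary_segment);
  } else if (const auto reference_segment = std::dynamic_pointer_cast<ReferenceSegment>(segment)) {
    return scan(*reference_segment);
  } else {
    fail("Segment has to be ValueSegment, DictionarySegment or ReferenceSegment.");
  }
}
```

One trade-off: unlike the `DebugAssert` in the diff, this check also runs in release builds, and each cast is only attempted if the previous ones did not match.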

phkeese · 2023-06-05T23:39:23Z

src/lib/operators/table_scan.cpp

+    });
+  }
+
+  Assert(reference_segment_count == 0 || (chunk_count == 1 && reference_segment_count == 1),


Consider changing this to a debug assert.

phkeese · 2023-06-05T23:43:54Z

src/lib/storage/dictionary_segment.cpp

-ValueID DictionarySegment<T>::null_value_offset() const {
-  return _is_nullable ? ValueID{1} : ValueID{0};
+  // The maximum value representable within the attribute vector's width.
+  return ValueID{(1u << attribute_vector()->width() * 8) - 1};


Consider using the length of the dictionary and refactoring your table scan. This works because this is cast to INVALID_VALUE_ID which is not matched by your table scan predicate in any case. This solution relies a lot on implicit behavior. Also: When using 32 bit, you return INVALID_VALUE_ID in both branches.

niklasmohrin · Jun 6, 2023

I think using this value as the value id for null is fine; I would go as far as saying that I prefer it over all other approaches. As for the behavior in the table scan: at some point we just decided to live with this behavior, because the task didn't require any particular handling yet. I think a proper solution would involve checking for null explicitly in the scan, regardless of which particular value id is used for it. I am not sure I understand what you mean by implicit behavior. We did not intend to hide any math around this special value id anywhere (note that, in case you meant that place, the comment in _scan_dictionary_segment refers to the documented interface of DictionarySegment::{lower,upper}_bound). Did we use this anywhere without noticing?
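
A side note on the expression in the diff: `*` binds tighter than `<<`, so it computes `1u << (width * 8)`. If the width is 4 bytes and `unsigned int` is 32 bits wide, that is a shift by the full bit width, which is undefined behavior in C++. A minimal sketch of one way to avoid this is to shift in 64 bits. The `ValueID` alias, the `INVALID_VALUE_ID` definition, and the helper name are assumptions made for illustration, not the project's definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption: ValueID behaves like a 32-bit unsigned integer.
using ValueID = std::uint32_t;

// Assumption: INVALID_VALUE_ID is the largest 32-bit value.
constexpr ValueID INVALID_VALUE_ID = std::numeric_limits<ValueID>::max();

// Largest value representable in an attribute vector entry of the given
// width (expected to be 1, 2, or 4 bytes). The shift is done on a 64-bit
// operand so that a 4-byte width (shift by 32) stays well defined.
constexpr ValueID max_value_id_for_width(std::size_t width_in_bytes) {
  return static_cast<ValueID>((std::uint64_t{1} << (width_in_bytes * 8)) - 1);
}
```

Under these assumptions, a 4-byte width yields the same value as `INVALID_VALUE_ID`, which is the overlap pointed out in the review comment.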

phkeese · 2023-06-06T07:44:58Z

src/lib/operators/table_scan.cpp

+  }
+}
+
+void TableScan::_scan_reference_segment(const ReferenceSegment& segment, PosList& pos_list) {


Same as above, no check for NULL of search value.
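
A rough sketch of what an explicit NULL check on the search value could look like, assuming SQL-style semantics where a comparison against NULL matches no rows. `SketchVariant` (with `std::monostate` standing in for NULL), `is_null`, and `scan_equals` are hypothetical and not the project's `AllTypeVariant` API.

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical variant type; std::monostate plays the role of NULL.
using SketchVariant = std::variant<std::monostate, int, double>;

inline bool is_null(const SketchVariant& value) {
  return std::holds_alternative<std::monostate>(value);
}

// Returns the positions of all values equal to search_value.
// Assumption: a NULL search value and NULL entries never match.
inline std::vector<std::size_t> scan_equals(const std::vector<SketchVariant>& values,
                                            const SketchVariant& search_value) {
  std::vector<std::size_t> matches;
  if (is_null(search_value)) {
    return matches;  // Comparing against NULL selects nothing.
  }
  for (std::size_t position = 0; position < values.size(); ++position) {
    if (!is_null(values[position]) && values[position] == search_value) {
      matches.push_back(position);
    }
  }
  return matches;
}
```

Checking the search value once before the loop keeps the per-row predicate free of NULL handling for the search side.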

phkeese · 2023-06-06T07:48:26Z

src/lib/operators/table_scan.cpp

+
+void TableScan::_scan_reference_segment(const ReferenceSegment& segment, PosList& pos_list) {
+  const auto predicate = predicate_for_scantype<AllTypeVariant>(_scan_type);
+  for (const auto row_id : *segment.pos_list()) {


This reads a bit strange. Consider moving this reference into its own variable.
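
As a sketch of this suggestion, the dereferenced position list can be bound to a named reference before the loop. `RowID`, `PosList`, `SketchReferenceSegment`, and `collect_chunk_offsets` are simplified placeholders, and `pos_list()` returning a `std::shared_ptr` is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified placeholder types for illustration.
struct RowID {
  std::uint32_t chunk_id;
  std::uint32_t chunk_offset;
};
using PosList = std::vector<RowID>;

struct SketchReferenceSegment {
  std::shared_ptr<const PosList> positions;

  // Assumption: the accessor hands out a shared pointer to the list.
  std::shared_ptr<const PosList> pos_list() const { return positions; }
};

inline std::vector<std::uint32_t> collect_chunk_offsets(const SketchReferenceSegment& segment) {
  // Keep the shared_ptr alive and give the dereferenced list a name.
  const auto input_pos_list_ptr = segment.pos_list();
  const auto& input_pos_list = *input_pos_list_ptr;

  std::vector<std::uint32_t> offsets;
  offsets.reserve(input_pos_list.size());
  for (const auto& row_id : input_pos_list) {
    offsets.push_back(row_id.chunk_offset);
  }
  return offsets;
}
```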

phkeese · 2023-06-06T08:08:25Z

src/lib/storage/reference_segment.cpp

+    return NULL_VALUE;
+  }
+  const auto chunk = _referenced_table->get_chunk(row_id.chunk_id);
+  return (*chunk->get_segment(_referenced_column_id))[row_id.chunk_offset];


Please do not use operator[]() (it is meant for debugging only).
Also consider using your accessor methods here.

niklasmohrin · Jun 6, 2023

What would you suggest instead? We are in a function that is supposed to return AllTypeVariant, so we used the accessor that does just that, delegating the conversion from T to AllTypeVariant to whoever has the T.

phkeese · 2023-06-06T08:10:39Z

src/lib/storage/reference_segment.cpp

 }

 size_t ReferenceSegment::estimate_memory_usage() const {
-  // Implementation goes here
-  Fail("Implementation is missing.");
+  return size() * sizeof(RowID);


PosList is shared between ReferenceSegments, beware of counting its memory twice. You also do not count this into the size.

niklasmohrin · Jun 6, 2023

We are counting the entries of the one PosList just once, or am I not seeing this correctly? As for adding the size of this: we just stuck to what the tests for ValueSegment demanded back in sprint 1, which was to account only for the contained tuples. Either way seems reasonable to me.
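
To make the two positions concrete, here is a sketch of both estimates plus a table-level sum that counts a shared position list only once. All types and function names are hypothetical simplifications, and heap overhead of the containers is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Simplified placeholder types for illustration.
struct RowID {
  std::uint32_t chunk_id;
  std::uint32_t chunk_offset;
};
using PosList = std::vector<RowID>;

struct SketchReferenceSegment {
  std::shared_ptr<const PosList> positions;

  std::size_t size() const { return positions->size(); }

  // As in the diff: only the referenced positions are counted.
  std::size_t estimate_entries_only() const { return size() * sizeof(RowID); }

  // Variant that also counts the segment object itself.
  std::size_t estimate_with_object() const { return sizeof(*this) + estimate_entries_only(); }
};

// Sums the estimate over several segments, counting each distinct
// position list once even if multiple segments share it.
inline std::size_t estimate_total(const std::vector<SketchReferenceSegment>& segments) {
  std::unordered_set<const PosList*> seen_pos_lists;
  std::size_t total = 0;
  for (const auto& segment : segments) {
    total += sizeof(segment);
    if (seen_pos_lists.insert(segment.positions.get()).second) {
      total += segment.positions->size() * sizeof(RowID);
    }
  }
  return total;
}
```

The per-segment estimate counts the list once per segment, so double counting only appears when the estimates of several segments that share one list are summed, as `estimate_total` illustrates.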

phkeese · 2023-06-06T08:13:30Z

src/test/operators/table_scan_test.cpp

+
+TEST_F(OperatorsTableScanTest, ScanType) {
+  auto scan = std::make_shared<TableScan>(_table_wrapper, ColumnID{0}, ScanType{-1}, 90000);
+  EXPECT_THROW(scan->execute(), std::logic_error);


niklasmohrin · 2023-06-27T09:12:53Z

Merging to get this out of my PR list :^)

Finn-HPI and others added 20 commits May 24, 2023 18:34

implement get_table.cpp

79757eb

implement reference_segment

7e8714a

remove include from get_table

ca206ab

start implementing table_scan

c684e36

change scan_*_segment interfaces and add remaining

95e3c71

create table of referencesegments

5417a6f

refactor scan_predicate

6ca094e

Format

b7487bf

Fix type mismatch in test DoubleScan

2d5d106

Implement scanning of reference segments

45f6208

Change DictionarySegment to have null value id be the last in the attribute vectors value range

8c37f5c

Implement dictionary scanning

a1cc5ef

work on test coverage

bfb4fc0

operators can only be executed once

92f1592

switch Fail to Assert in TableScan::_on_execute()

b0dd5e9

change Assert to DebugAssert

e37fe42

Some final clean ups, such as:

dba66dd

- Adding some comments
- Giving longer names
- Starting the names of protected members with an underscore
- Removing the unused / unimplemented `TableScan::check_scan_condition`

Remove left over comment

ca7aaaf

Fix compile error with clang

a2672d0

Fix another occurence of that

3daa271

phkeese reviewed Jun 6, 2023

niklasmohrin merged commit 8cc1902 into main Jun 27, 2023
