[15721] Compiled Catalog Query Access #1339

Open
nwang57 wants to merge 18 commits into cmu-db:master from nwang57:pcq-cp2

9 participants
@nwang57
Contributor

nwang57 commented May 4, 2018

We enabled compiled catalog lookup in the first pull request. Building on that, this PR adds compiled insert and delete queries for all catalogs. It also fixes the following issues:

  • #1298: the tuple value expression must be bound manually so that equality checks work correctly.

  • Sequential scan assumes that the output columns start at offset 0; otherwise PerformBinding will not find the corresponding column attributes correctly.

  • ZoneMapCatalog needs special care to avoid a chicken-and-egg problem. Each sequential scan plan checks the zone map to decide whether to scan a tile, but checking the zone map requires access to the ZoneMap catalog, which leads to an infinite loop. So the ZoneMap manager now ensures that if the table being scanned is the ZoneMap catalog itself, it is scanned directly without consulting the zone map.
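
The guard described in the last bullet can be sketched roughly as follows. This is a self-contained illustration, not Peloton's code: `Table`, `Tile`, `ZoneMapManager`, `ShouldScanTile`, `CountScannedTiles`, and the OID constant are hypothetical names, and the catalog lookup is reduced to a counter.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not Peloton's real types.
using oid_t = std::uint32_t;
constexpr oid_t kZoneMapCatalogOid = 42;  // assumed OID of the zone map catalog

struct Tile {
  int min_value;
  int max_value;
};

struct Table {
  oid_t oid;
  std::vector<Tile> tiles;
};

class ZoneMapManager {
 public:
  // Counts how often the zone map catalog had to be consulted.
  int catalog_lookups = 0;

  // Returns true if the tile may contain `key` and must be scanned.
  bool ShouldScanTile(const Table &table, const Tile &tile, int key) {
    // Guard: a scan of the zone map catalog must not consult the zone map,
    // otherwise the lookup would start another scan of the same catalog
    // and never terminate.
    if (table.oid == kZoneMapCatalogOid) return true;

    ++catalog_lookups;  // stands in for a scan of the zone map catalog
    return tile.min_value <= key && key <= tile.max_value;
  }
};

// Counts the tiles a sequential scan for `key` would have to read.
int CountScannedTiles(ZoneMapManager &zm, const Table &table, int key) {
  int scanned = 0;
  for (const Tile &tile : table.tiles) {
    if (zm.ShouldScanTile(table, tile, key)) ++scanned;
  }
  return scanned;
}
```

With this shape, scanning an ordinary table prunes tiles through the zone map, while scanning the zone map catalog reads every tile and never triggers a nested lookup.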

nwang57 and others added some commits Mar 24, 2018

compiled catalog access merged with master
add in predicate

fix

enable debug

enable debug logger

use seq scan for get table

test compile

fix unused

write table object constructor

fix bug

query execute

fix bug

fix bug

fix bug

fix query param

print wrapped tuple

fix bug

cache

cache

fix try

fix shared pointer

clean up

performance test

GetDatabaseObject

column_catalog.pp/h

database_catalog

fix seqscan

fix

db_catalog

db_catalog

fix bug

db_catalog

column_catalog.cpp/h

index catalog

lan catalog

column stat

lang catalog

settings catalog

trigger catalog

zone map catalog

proc_catolog.cpp/h

delete code

clean up db and trigger

cleanup settings catalog

delete code

restore catalog test

format

fix include format

fix bug in scan plan by prashanth

add index test

fixed binding for tuple_value_expression, and changed query_cache_test

added Insert and Delete with Compiled Query in abstract_catalog and table_catalog

compiled seq plan for table catalog by looking up table_oid

fix trigger

changed trigger_test, changed the wrong assumption that triggers are in a certain order

fix settings catalog

query metrics catalog

query metrics catalog

Changed zone_map_catalog, having issue running zone_map_scan_test

using expressionPtr

Edited cloumn_catalog, index_catalog, proc_catalog, settings_catalog
Added Insert and Delete with Compiled Query
Fixed Binding for TupleValueExpression

database catalog insert

database catalog bound

index metrics catalog insert

query history catalog

table metrics insert

database catalog delete

modify catalog inserts

change to complied insert plan

intex metrics catalog delete

table metrics catalog delete

update catalog to use compiled delete plan

fixed code review addressed issue

trigger catalog bound

table catalog bound

language catalog bound

added insert, delete and bouding for zone_map_catalog

clean up

deleted redundant comments in index_catalog

index cache uncomment

fix proc catalog

fix proc catalog

added bind in language_catalog's delete

add comment to zone map manager
Mapping table fix to avoid preallocating memory (#1190)
* Adding new mapping table

* Revert "Fixing non-unique key insert problem"

This reverts commit 4267752.

* Revert LOG_INFO to LOG_TRACE

* Fix segment fault problem by moving munmap() to after ~EpochManager()

* Avoid compiler error

* Enhance log message for mmap()'ed mapping table
@aaron-tian

Make sure the newly added header files are actually necessary. There is also room for optimization on the engineering side.

* @param insert_values tuples to be inserted
* @param txn TransactionContext
* @return Whether insertion is Successful
*/

@aaron-tian

aaron-tian May 5, 2018

Contributor

As mentioned last time, move comments to header files.

* @param predicate Predicate used in the seq scan
* @param txn TransactionContext
* @return Whether deletion is Successful
*/

@aaron-tian

aaron-tian May 5, 2018

Contributor

As above, move comments to header files.

* @param txn TransactionContext
*
* @return Unique pointer of vector of logical tiles
*/

@aaron-tian

aaron-tian May 5, 2018

Contributor

Move comments to header files.

* @param column_offsets columns used for seq scan
* @param predicate Predicate used in the seq scan
* @return true if successfully executes
*/

@aaron-tian

aaron-tian May 5, 2018

Contributor

Move comments to header files.

auto constant_expr_7 = new expression::ConstantValueExpression(
val7);
tuples.push_back(std::vector<ExpressionPtr>());

@aaron-tian

aaron-tian May 5, 2018

Contributor

Use emplace_back instead of push_back: the former constructs the element in place instead of building a temporary and moving it into the vector.
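
A small sketch of the difference, using a hypothetical `Expression` type in place of the PR's expression classes (`BuildTuples` is also an illustrative name; `std::make_unique` assumes C++14 or later):

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for the expression type used in the PR.
struct Expression {
  int value;
  explicit Expression(int v) : value(v) {}
};
using ExpressionPtr = std::unique_ptr<Expression>;

std::vector<std::vector<ExpressionPtr>> BuildTuples() {
  std::vector<std::vector<ExpressionPtr>> tuples;

  // push_back: builds a temporary inner vector, then moves it into `tuples`.
  tuples.push_back(std::vector<ExpressionPtr>());

  // emplace_back: constructs the inner vector directly inside `tuples`.
  tuples.emplace_back();

  // Fill the last tuple with one owned expression.
  tuples.back().emplace_back(std::make_unique<Expression>(7));
  return tuples;
}
```

For an empty inner vector the saving is small, but the in-place form is shorter and avoids the temporary altogether.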

config_value = (*result_tiles)[0]->GetValue(0, 0).ToString();
}
PELOTON_ASSERT(result_tuples.size() <= 1);
if (result_tuples.size() != 0) {

@aaron-tian

aaron-tian May 5, 2018

Contributor

Use the empty() method to check for emptiness.

config_value = (*result_tiles)[0]->GetValue(0, 0).ToString();
}
PELOTON_ASSERT(result_tuples.size() <= 1);
if (result_tuples.size() != 0) {

@aaron-tian

aaron-tian May 5, 2018

Contributor

Use the empty() method to check for emptiness.

// EXPECT_EQ(nullptr, table_object_1);
// txn_manager.CommitTransaction(txn);
//}
//

@aaron-tian

aaron-tian May 5, 2018

Contributor

Do you have problems passing this test? Are you going to uncomment it later?

@nwang57

nwang57 May 5, 2018

Contributor

This is the test we created for #1336. It has now been fixed.

@@ -24,6 +24,7 @@
#include "storage/storage_manager.h"
#include "type/ephemeral_pool.h"
#include "sql/testing_sql_util.h"
#include "common/timer.h"

@aaron-tian

aaron-tian May 5, 2018

Contributor

Why do you need to include this header file?

@nwang57

nwang57 May 5, 2018

Contributor

This is needed for performance testing. We plan to remove it later.

@@ -17,7 +17,7 @@
// 0: oid (pkey)
// 1: tgrelid : table_oid
// 2: tgname : trigger_name
// 3: tgfoid : function_oid
// 3: tgfoid : function_name

@aaron-tian

aaron-tian May 5, 2018

Contributor

Should tgfoid be changed if function_oid was meant to be function_name?

codegen::BufferingConsumer buffer{{}, context};
bool cached;

@aaron-tian

aaron-tian May 6, 2018

Contributor

It would be better to define cached further down, where it is first assigned a value.

codegen::BufferingConsumer buffer{column_offsets, context};
bool cached;

@aaron-tian

aaron-tian May 6, 2018

Contributor

Move the cached definition to where it's first assigned.

// Create consumer
codegen::BufferingConsumer buffer{column_offsets, scan_context};
bool cached;

@aaron-tian

aaron-tian May 6, 2018

Contributor

Move the cached definition to where it's first assigned.

codegen::BufferingConsumer buffer{column_offsets, context};
bool cached;

@aaron-tian

aaron-tian May 6, 2018

Contributor

Move the cached definition to where it's first assigned.

// execute SELECT a FROM table where a == 40;
codegen::BufferingConsumer buffer_1{{0, 1}, context_1};
bool cached;

@aaron-tian

aaron-tian May 6, 2018

Contributor

Move the cached definition to where it's first assigned.

@gandeevan gandeevan assigned gandeevan and unassigned gandeevan May 10, 2018

@gandeevan gandeevan self-requested a review May 10, 2018

@@ -142,9 +145,9 @@ bool ColumnCatalog::InsertColumn(oid_t table_oid,
const std::vector<Constraint> &constraints,
type::AbstractPool *pool,
concurrency::TransactionContext *txn) {
(void) pool;

@gandeevan

gandeevan May 10, 2018

You could probably use the UNUSED_ATTRIBUTE specifier instead.

The same applies to a bunch of other places where "(void) attr" has been used to pacify the compiler.
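
For illustration, the two styles side by side. This sketch assumes the macro expands to the GCC/Clang `unused` attribute; `AbstractPool` and both function names are hypothetical stand-ins, not the PR's signatures.

```cpp
// Assumption: the macro expands to the GCC/Clang `unused` attribute. The
// #ifndef keeps this sketch compatible with a project-wide definition.
#ifndef UNUSED_ATTRIBUTE
#define UNUSED_ATTRIBUTE __attribute__((unused))
#endif

struct AbstractPool {};  // hypothetical stand-in for type::AbstractPool

// Before: the parameter is named and then silenced in the body.
bool InsertColumnWithCast(int table_oid, AbstractPool *pool) {
  (void)pool;
  return table_oid >= 0;
}

// After: the attribute marks the parameter as intentionally unused.
bool InsertColumnWithAttribute(int table_oid,
                               UNUSED_ATTRIBUTE AbstractPool *pool) {
  return table_oid >= 0;
}
```

Both versions compile without an unused-parameter warning; the attribute form keeps the intent visible in the signature rather than in the function body.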

@latelatif

latelatif May 12, 2018

I am not sure why we would want to keep the parameter if we are not using it at all.

values.emplace_back(new expression::ConstantValueExpression(
val11));
values.emplace_back(new expression::ConstantValueExpression(
val12));

@gandeevan

gandeevan May 10, 2018

Again, the multiple malloc invocations can be coalesced by allocating an array of objects.

auto *table_oid_expr =
new expression::TupleValueExpression(type::TypeId::INTEGER, 0,
ColumnId::TABLE_OID);

@gandeevan

gandeevan May 10, 2018

I don't think the allocated memory has been freed. Irrespective, you should probably use smart pointers instead of working with raw pointers. The same applies to a bunch of other places in the code where raw pointers could be replaced with smart pointers.
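
A minimal sketch of the ownership point, with a hypothetical `TupleValueExpression` stand-in that counts live objects so that a leak would be visible (`std::make_unique` assumes C++14 or later):

```cpp
#include <memory>

// Hypothetical stand-in; counts live objects so a leak would be visible.
struct TupleValueExpression {
  static int live;
  int column_id;
  explicit TupleValueExpression(int id) : column_id(id) { ++live; }
  ~TupleValueExpression() { --live; }
};
int TupleValueExpression::live = 0;

// Raw pointer: every exit path must remember to delete.
int ColumnIdWithRawPointer(int id) {
  auto *expr = new TupleValueExpression(id);
  int result = expr->column_id;
  delete expr;  // forgetting this line leaks the expression
  return result;
}

// unique_ptr: the expression is destroyed when `expr` goes out of scope.
int ColumnIdWithUniquePtr(int id) {
  auto expr = std::make_unique<TupleValueExpression>(id);
  return expr->column_id;
}
```

If a plan node is meant to take ownership of the expression, the `unique_ptr` can be handed over with `std::move`, which makes the transfer explicit at the call site.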

most_common_freqs = tuple.GetValue(ColumnId::MOST_COMMON_FREQS);
hist_bounds = tuple.GetValue(ColumnId::HISTOGRAM_BOUNDS);
column_name = tuple.GetValue(ColumnId::COLUMN_NAME);
has_index = tuple.GetValue(ColumnId::HAS_INDEX);

@gandeevan

gandeevan May 10, 2018

Pooja mentioned in the last meeting that tuple.GetValue() is going to be deprecated. However, I vaguely remember her mentioning that it could still be used with catalog tables. You might want to talk to her or Andy about this.

@@ -2864,6 +2866,21 @@ class BwTree : public BwTreeBase {
* the mapping table rather than CAS with nullptr
*/
void InitMappingTable() {
mapping_table = (std::atomic<const BaseNode *> *) \
mmap(NULL, 1024 * 1024 * 1024,

@gandeevan

gandeevan May 11, 2018

  1. The value (1024 x 1024 x 1024) shouldn't be hardcoded (MAPPING_TABLE_SIZE is #defined to 1MB).

  2. InitMappingTable() seems to be invoked from the BwTree constructor. Is 1GB mmapped for every index created? If so, this seems like overkill.

// NOTE: Only unmap memory here because we need to access the mapping
// table in the above routine. If it was unmapped in ~BwTree() then this
// function will invoke illegal memory access
int munmap_ret = munmap(tree_p->mapping_table, 1024 * 1024 * 1024);

@gandeevan

gandeevan May 11, 2018

Again, a hard-coded value shouldn't be used.
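
One way to address both mmap() comments is to route the size through a single named constant that mmap() and munmap() share. The sketch below assumes a POSIX system; `kMappingTableBytes`, `MapMappingTable`, `UnmapMappingTable`, and the deliberately small size are illustrative choices, not values from the PR.

```cpp
#include <sys/mman.h>

#include <atomic>
#include <cstddef>

struct BaseNode {};  // hypothetical stand-in for the BwTree node type

// Assumed name and size; the point is that one constant feeds both calls.
constexpr std::size_t kMappingTableBytes =
    1024 * sizeof(std::atomic<const BaseNode *>);

// Reserves anonymous, zero-filled memory for the mapping table.
// Returns nullptr on failure.
std::atomic<const BaseNode *> *MapMappingTable() {
  void *mem = mmap(nullptr, kMappingTableBytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return nullptr;
  return static_cast<std::atomic<const BaseNode *> *>(mem);
}

// Releases the table; the length must match the one passed to mmap().
bool UnmapMappingTable(std::atomic<const BaseNode *> *table) {
  return munmap(table, kMappingTableBytes) == 0;
}
```

Because the length lives in one place, the map and unmap calls cannot drift apart when the table size changes.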

@@ -107,6 +107,7 @@ class ColumnStatsCatalog : public AbstractCatalog {
private:
ColumnStatsCatalog(concurrency::TransactionContext *txn);
std::vector<oid_t> all_column_ids = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

@gandeevan

gandeevan May 11, 2018

It seems the purpose of all_column_ids is to serve as the source for a copy constructor that builds a vector containing all the column OIDs.

Firstly, I don't think you need to declare an all_column_ids vector; it can easily be constructed explicitly at runtime. Moreover, hardcoding the values is not a good idea: anyone who changes the schema of the catalog table must also remember to update the vector declaration.

A better way to construct this at runtime without breaking abstractions is to use the "NumColumns" method and fill the vector using std::iota.

The same applies to every declaration of all_column_ids and column_ids.
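
A sketch of the std::iota approach. `AllColumnIds` is a hypothetical helper, and the `oid_t` alias is assumed to match the project's; in the catalog classes the count would come from the table schema rather than from a literal.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using oid_t = std::uint32_t;  // assumed to match Peloton's oid_t

// Builds {0, 1, ..., column_count - 1}. In the catalog classes,
// `column_count` would come from the table schema rather than a literal.
std::vector<oid_t> AllColumnIds(std::size_t column_count) {
  std::vector<oid_t> ids(column_count);
  std::iota(ids.begin(), ids.end(), oid_t{0});
  return ids;
}
```

A schema change then only affects the column count, and the ID vector follows automatically.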

type::Value* getDefaultValue() {
return default_value.get();
}
type::Value *getDefaultValue() { return default_value.get(); }

@gandeevan

gandeevan May 11, 2018

This can probably be made inline.

TYPE_OFF = 2
};
enum ZoneMapOffset { MINIMUM_OFF = 0, MAXIMUM_OFF = 1, TYPE_OFF = 2 };

@gandeevan

gandeevan May 11, 2018

I'm not sure why a strongly typed enum has been replaced by an old-style enum. It's better to stick to strongly typed enums.

The same applies to a bunch of places where old-style enums have been used.
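
For reference, a scoped version of the enum from the diff might look like the sketch below. The underlying type and the `ToIndex` helper are assumptions added for illustration.

```cpp
#include <cstddef>

// Scoped enum: enumerators do not leak into the enclosing namespace and do
// not convert to integers implicitly.
enum class ZoneMapOffset : std::size_t {
  MINIMUM_OFF = 0,
  MAXIMUM_OFF = 1,
  TYPE_OFF = 2
};

// Explicit conversion for the places that need a numeric column offset.
constexpr std::size_t ToIndex(ZoneMapOffset offset) {
  return static_cast<std::size_t>(offset);
}
```

The cost is one explicit cast wherever the value is used as an index; in exchange, accidental mixing with other integer constants no longer compiles.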

val6);
auto constant_expr_7 = new expression::ConstantValueExpression(
val7);

@gandeevan

gandeevan May 11, 2018

Since the number of objects to be allocated is known at compile time, the multiple malloc invocations can be replaced by a single one that allocates an array of objects using the parameterized constructor.

e.g.: arr_constant_expr = new expression::ConstantValueExpression[8]{{val0},{val1}...{val7}}

The same applies to a bunch of other places where the number of objects to be allocated is known at compile time.
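
A variation on this idea, sketched with `std::vector::reserve` instead of the array `new` from the comment: one heap allocation holds all the expression objects. `ConstantValueExpression` here is a hypothetical stand-in and `MakeConstants` is an illustrative helper. Whether this fits the PR depends on ownership, since the insert plan in the diff appears to hold each expression through its own pointer.

```cpp
#include <vector>

// Hypothetical stand-in for expression::ConstantValueExpression.
struct ConstantValueExpression {
  int value;
  explicit ConstantValueExpression(int v) : value(v) {}
};

// Stores all expressions contiguously, allocating the storage once.
std::vector<ConstantValueExpression> MakeConstants(
    const std::vector<int> &values) {
  std::vector<ConstantValueExpression> exprs;
  exprs.reserve(values.size());  // single allocation for every element
  for (int v : values) {
    exprs.emplace_back(v);  // constructed in place, no reallocation
  }
  return exprs;
}
```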

@gandeevan

Overall, the code looks clean, other than some minor changes. Good job on including the doxygen comments.

// search for query
codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);;
std::unique_ptr<codegen::Query> compiled_query(nullptr);

@latelatif

latelatif May 12, 2018

The compiled_query object is never used if the query is cached. Why create it every time? It can be moved inside the if block.

query->Execute(std::move(executor_context), buffer,
[&ret](executor::ExecutionResult result) { ret = result; });
return ret.m_result == peloton::ResultType::SUCCESS;

@latelatif

latelatif May 12, 2018

Should we keep the query in our cache even if it fails?
I am not sure whether this is overkill, or even correct, but we could keep only the compiled queries that return success in our cache.

new executor::ExecutorContext(txn, std::move(parameters)));
// search for query
codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);;

@latelatif

latelatif May 12, 2018

We should probably make this a function and reuse it instead of rewriting the code in every other function. Something like GetCompiledQuery(), which searches for the query in the cache and, if it is not found, compiles it and adds it to the cache.

We can even make it a macro or an inline function if it is on the critical path and we want to save a function call
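
A rough sketch of such a helper follows. `Plan`, `Query`, `QueryCache`, and `Compile` are simplified hypothetical stand-ins; the real codegen::Query and codegen::QueryCache interfaces differ, and only the lookup-or-compile shape is the point here.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; the real codegen::Query and QueryCache differ.
struct Plan {
  std::string key;
};
struct Query {
  std::string plan_key;
};

class QueryCache {
 public:
  // Returns the cached query for `plan`, or nullptr on a miss.
  Query *Find(const Plan &plan) {
    auto it = cache_.find(plan.key);
    return it == cache_.end() ? nullptr : it->second.get();
  }
  void Add(const Plan &plan, std::unique_ptr<Query> query) {
    cache_[plan.key] = std::move(query);
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<Query>> cache_;
};

int g_compile_count = 0;  // lets callers observe how often we compile

// Stand-in for the expensive compilation step.
std::unique_ptr<Query> Compile(const Plan &plan) {
  ++g_compile_count;
  return std::make_unique<Query>(Query{plan.key});
}

// Looks the plan up in the cache and compiles it only on a miss.
// `cached` reports whether the lookup was a hit.
Query *GetCompiledQuery(QueryCache &cache, const Plan &plan, bool &cached) {
  Query *query = cache.Find(plan);
  cached = (query != nullptr);
  if (!cached) {
    // The owning pointer exists only on the miss path.
    std::unique_ptr<Query> compiled = Compile(plan);
    query = compiled.get();
    cache.Add(plan, std::move(compiled));
  }
  return query;
}
```

Each catalog method would then make a single call to the helper, and the owning pointer is only created when a compilation actually happens.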

size_t column_count = catalog_table_->GetSchema()->GetColumnCount();
for (size_t col_itr = 0; col_itr < column_count; col_itr++) {
// Skip any column for update

@latelatif

latelatif May 12, 2018

I'm not sure what this comment means here. Could you please help me understand it?

@nwang57

nwang57 May 13, 2018

Contributor

I think it only looks for the columns in the tuple that need to be updated. For a more detailed explanation, you can ask @mengranwo.

: table_oid(tile->GetValue(tupleId, ColumnCatalog::ColumnId::TABLE_OID)
ColumnCatalogObject::ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple)
: table_oid(wrapped_tuple.GetValue(ColumnCatalog::ColumnId::TABLE_OID)

@latelatif

latelatif May 12, 2018

As mentioned by @gandeevan, GetValue is deprecated now. Please talk to @poojanilangekar to make sure you can use this for a wrapped tuple object safely

tuple->SetValue(ColumnId::COLUMN_NAME, val_column_name, pool);
tuple->SetValue(ColumnId::HAS_INDEX, val_has_index, nullptr);
tuples.emplace_back();
// tuples.push_back(std::vector<ExpressionPtr>());

@latelatif
LanguageCatalogObject::LanguageCatalogObject(executor::LogicalTile *tuple)
: lang_oid_(tuple->GetValue(0, 0).GetAs<oid_t>()),
lang_name_(tuple->GetValue(0, 1).GetAs<const char *>()) {}
LanguageCatalogObject::LanguageCatalogObject(codegen::WrappedTuple tuple)

@latelatif

latelatif May 12, 2018

Do we want to make a copy of the WrappedTuple parameter? Wouldn't using a reference suffice?

namespace peloton {
namespace catalog {
#define PROC_CATALOG_NAME "pg_proc"
ProcCatalogObject::ProcCatalogObject(executor::LogicalTile *tile,
ProcCatalogObject::ProcCatalogObject(codegen::WrappedTuple wrapped_tuple,

@latelatif

latelatif May 12, 2018

Same as above. Is a reference not good enough? This applies to all such uses in other constructors as well
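
To make the cost concrete, here is a small sketch with a hypothetical `WrappedTuple` stand-in that counts its copies; `LanguageByValue` and `LanguageByRef` are illustrative names for the two constructor styles.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for codegen::WrappedTuple; counts copies made.
struct WrappedTuple {
  static int copies;
  std::vector<std::string> values;
  explicit WrappedTuple(std::vector<std::string> v) : values(std::move(v)) {}
  WrappedTuple(const WrappedTuple &other) : values(other.values) { ++copies; }
};
int WrappedTuple::copies = 0;

// Pass by value: the whole tuple is copied just to read one field.
struct LanguageByValue {
  std::string name;
  explicit LanguageByValue(WrappedTuple tuple) : name(tuple.values[0]) {}
};

// Pass by const reference: the constructor reads the caller's tuple.
struct LanguageByRef {
  std::string name;
  explicit LanguageByRef(const WrappedTuple &tuple) : name(tuple.values[0]) {}
};
```

Constructing from an existing tuple costs one copy in the by-value form and none in the by-reference form, while the resulting objects are identical.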

GetResultWithCompiledSeqScan(column_ids, predicate, txn);
// carefull! the result could be null!
LOG_INFO("size of the result tiles = %lu", result_tuples.size());

@latelatif

latelatif May 12, 2018

Is this still needed? Should it be LOG_INFO?

@@ -40,7 +40,7 @@ namespace catalog {
class ColumnCatalogObject {
public:
ColumnCatalogObject(executor::LogicalTile *tile, int tupleId = 0);
ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple);

@latelatif

latelatif May 12, 2018

Use a reference instead

@latelatif

Some of the test cases fail when running make check:

The following tests FAILED:
52 - stats_test (Failed)
54 - zone_map_scan_test (Failed)
Errors while running CTest
make[3]: *** [test/CMakeFiles/check] Error 8
make[2]: *** [test/CMakeFiles/check.dir/all] Error 2
make[1]: *** [test/CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
akanjani@dev3:~/review/peloton/build$ git status
On branch pcq-cp2
Your branch is up-to-date with 'origin/pcq-cp2'.

nothing to commit, working directory clean
akanjani@dev3:~/review/peloton/build$

Zeninma added some commits May 12, 2018

There are 3 bugs need to be fixed:
1. valgrind memory leak during fillepredicatearray
2. uninitialized Value being moved/destroyed - this has been solved
3. Concurrency issue, need to put an issue on this
for CatalogObject constructor change the wrapped_tuple from pass by value to pass by reference. Also add UNUSED_ATTRIBUTE instead of (void).
@coveralls

coveralls commented May 18, 2018

Coverage Status

Coverage increased (+0.04%) to 77.595% when pulling 89161d8 on nwang57:pcq-cp2 into 5686479 on cmu-db:master.

@apavlo

Member

apavlo commented Jun 21, 2018

I am reviving this PR. We will need this when we get rid of the interpreted engine. It also has a bug fix for #1362.

@apavlo apavlo requested a review from pervazea Jun 21, 2018
