[15721] Blocking Schema Changes #1341

Open
wants to merge 69 commits into
base: master
from

7 participants
@sxzh93

sxzh93 commented May 5, 2018

Overview:
This PR contains the full implementation of blocking schema changes: add/drop column, rename column, and change column type. Add/drop column and change column type operations can be combined in a single ALTER TABLE statement; rename column must be issued as a separate statement.

Implementation Changes:
1. Functional code: We construct a new schema object for the target table in the parser; this object includes all the changes in the alter statement. The executor runs the alter plan and invokes the catalog to replace the old table with a new table built from the new schema object. The catalog uses a scan to apply the changes to all tuples.
2. JUnit tests: add unit tests for add/drop column and change column type.
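
The scan-and-rebuild step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Table`, `Row`, and `RebuildWithSchema` are hypothetical stand-ins for the catalog and storage classes, values are plain strings, and added columns are filled with a caller-supplied default.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not Peloton's catalog or storage classes.
using Row = std::vector<std::string>;

struct Table {
  std::vector<std::string> columns;
  std::vector<Row> rows;
};

// Build a new table with `new_columns`. Columns that survive (matched by
// name) keep their values; added columns are filled with `default_value`.
Table RebuildWithSchema(const Table &old_table,
                        const std::vector<std::string> &new_columns,
                        const std::string &default_value) {
  // For each new column, find its position in the old schema (or -1).
  std::vector<int> source(new_columns.size(), -1);
  for (std::size_t n = 0; n < new_columns.size(); n++) {
    for (std::size_t o = 0; o < old_table.columns.size(); o++) {
      if (old_table.columns[o] == new_columns[n]) {
        source[n] = static_cast<int>(o);
        break;
      }
    }
  }

  Table new_table{new_columns, {}};
  // Full scan over the old table: every tuple is copied into the new layout.
  for (const Row &old_row : old_table.rows) {
    assert(old_row.size() == old_table.columns.size());
    Row new_row;
    new_row.reserve(new_columns.size());
    for (int src : source) {
      new_row.push_back(src >= 0 ? old_row[static_cast<std::size_t>(src)]
                                 : default_value);
    }
    new_table.rows.push_back(std::move(new_row));
  }
  return new_table;
}
```

Because every tuple is copied before the new table is swapped in, the cost is proportional to the table size, which is why this kind of schema change is blocking.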

Known issues:
1. Bugs involving multiple transactions.
2. This PR does not handle dropping a column that is part of a foreign key.

DeanChensj and others added some commits Apr 2, 2018

Revert "Merge pull request #1 from DeanChensj/catlog"
This reverts commit 6704d0e, reversing
changes made to 39fa518.
This commit added parser support of alter table
    - added alter and rename plan and statements
    - modified optimizer to bind db name and make plans
    - modified parsenodes
    - added node to string support and vice versa
    - added nodetransform of alter table

Dingshilun and others added some commits May 3, 2018

Merge pull request #17 from sxzh93/change_type
added change type, changed logic in alter_executor, varchar now has default length.
Merge pull request #18 from sxzh93/alter_test
basic test for alterTable
Merge pull request #19 from sxzh93/varchar_length
added alter varchar length support, changed the plan to use schema
// txn->RecordDrop(database_oid, old_table->GetOid(), INVALID_OID);
// Final step of physical change should be moved to commit time
database->ReplaceTableWithOid(table_oid, new_table);

@star013

star013 May 11, 2018

[memory leak]
The old table object is not explicitly freed.
database->ReplaceTableWithOid just replaces the table pointer with new_table, but old_table still exists.

@DeanChensj

DeanChensj May 12, 2018

Contributor

Thanks for pointing this out! We left a TODO here, and we are still looking for the correct way to handle this.
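
One possible way to make the ownership of the old table explicit is sketched below. This is only an illustration under stated assumptions: `Database`, `DataTable`, and `TableOid` here are hypothetical simplified types, and this `ReplaceTableWithOid` differs from the one in the PR in that it returns the old table to the caller, who can then free it immediately or defer that to commit time or garbage collection.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical simplified types; not the PR's storage classes.
using TableOid = unsigned int;
struct DataTable {
  int version;
};

class Database {
 public:
  void AddTable(TableOid oid, std::unique_ptr<DataTable> table) {
    tables_[oid] = std::move(table);
  }

  // Swaps in new_table and hands ownership of the old table back to the
  // caller, so the old table cannot be silently leaked.
  std::unique_ptr<DataTable> ReplaceTableWithOid(
      TableOid oid, std::unique_ptr<DataTable> new_table) {
    auto it = tables_.find(oid);
    assert(it != tables_.end());
    std::swap(it->second, new_table);
    return new_table;  // now owns the old table
  }

  const DataTable *GetTable(TableOid oid) const {
    auto it = tables_.find(oid);
    return it == tables_.end() ? nullptr : it->second.get();
  }

 private:
  std::unordered_map<TableOid, std::unique_ptr<DataTable>> tables_;
};
```

If the returned pointer is simply dropped, the old table is destroyed at the end of the statement; a transactional system would more likely hand it to whatever component frees objects once no active transaction can still see them.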

new_table->AddIndex(new_index);
// reinsert record into pg_index
pg_index->InsertIndex(

@star013

star013 May 11, 2018

[race condition]
It is risky to manipulate the index directly under multiple transactions, because it is not protected by any consistency control mechanism such as locks. It is better to do such operations through an executor, just like your step 4 (which uses a SeqScanPlan). The index executor, with its lock mechanism, will take care of consistency.

@DeanChensj

DeanChensj May 12, 2018

Contributor

Good idea!

@Dingshilun

Dingshilun May 12, 2018

Actually, InsertIndex uses an InsertPlan to insert a new tuple into pg_attribute, so we get concurrency control for free.

for (oid_t old_column_id = 0;
old_column_id < old_schema->GetColumnCount(); old_column_id++) {
old_column_ids.push_back(old_column_id);
for (oid_t new_column_id = 0;

@star013

star013 May 11, 2018

[inefficient implementation]
It is not efficient to find a matching name with a nested for loop. It is better to use a hash map for this.

@DeanChensj

DeanChensj May 12, 2018

Contributor

Sure, thanks for your suggestion.
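
A minimal sketch of the hash map approach suggested in this thread, assuming column names are plain `std::string` values; `MatchColumnsByName` is a hypothetical helper, not a function from the PR. It builds the name-to-index map once, so matching costs roughly O(old + new) instead of O(old × new).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// For each new column name, returns the index of the old column with the
// same name, or -1 if the column did not exist in the old schema.
std::vector<int> MatchColumnsByName(const std::vector<std::string> &old_names,
                                    const std::vector<std::string> &new_names) {
  // Build the lookup table once: column name -> position in the old schema.
  std::unordered_map<std::string, int> old_index;
  old_index.reserve(old_names.size());
  for (std::size_t i = 0; i < old_names.size(); i++) {
    old_index.emplace(old_names[i], static_cast<int>(i));
  }

  std::vector<int> result;
  result.reserve(new_names.size());
  for (const std::string &name : new_names) {
    auto it = old_index.find(name);
    result.push_back(it == old_index.end() ? -1 : it->second);
  }
  assert(result.size() == new_names.size());
  return result;
}
```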

auto new_column_name = node.GetNewName();
auto old_column_name = node.GetOldName();
ResultType result = catalog::Catalog::GetInstance()->RenameColumn(

@star013

star013 May 11, 2018

[race condition]
If two transactions try to rename a column in the same table at the same time, it is not safe to operate on the catalog instance without locks.

@DeanChensj

DeanChensj May 12, 2018

Contributor

We planned to add lock logic here and in alterTable. Should we merge your lock manager and executor implementation?

@star013

star013 May 12, 2018

Sure.
Although our current lock manager implementation is correct, we are still extending it to support more functions. Most interfaces should not change, but I suggest you keep an eye on our future pull requests to make sure your code runs on our latest version of the lock manager and index executors.
If you find any bugs in our lock manager or index executor, please feel free to contact us. We are happy to fix them.
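
To illustrate the kind of protection discussed in this thread, here is a minimal sketch that serializes renames with a plain `std::mutex`. It is not the lock manager mentioned above, whose interface is not shown in this conversation; `ColumnCatalog` and its methods are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory column list guarded by a single mutex.
class ColumnCatalog {
 public:
  explicit ColumnCatalog(std::vector<std::string> names)
      : names_(std::move(names)) {}

  // Returns true if the rename happened; false if old_name is missing or
  // new_name is already taken. The check and the update run under one lock,
  // so two concurrent renames cannot both pass the check.
  bool RenameColumn(const std::string &old_name, const std::string &new_name) {
    assert(!new_name.empty());
    std::lock_guard<std::mutex> guard(mutex_);
    auto old_it = std::find(names_.begin(), names_.end(), old_name);
    if (old_it == names_.end()) return false;
    if (std::find(names_.begin(), names_.end(), new_name) != names_.end()) {
      return false;
    }
    *old_it = new_name;
    return true;
  }

  std::vector<std::string> Names() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return names_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> names_;
};
```

A transactional lock manager would typically hold the lock until commit or abort rather than for a single call; this sketch only shows why the existence check and the update need to happen atomically.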

oldName(nullptr),
newName(nullptr) {
dropped_names =
type == AlterTableType::RENAME ? nullptr : (new std::vector<char *>);

@star013

star013 May 11, 2018

[code style]
It seems better to put the condition of the conditional expression in parentheses.

@Dingshilun

Dingshilun May 12, 2018

Of course, thanks for pointing that out.

@star013

General Questions:

  1. The code seems to work. It modifies the Postgres parser, planner, executor, and catalog, which is the typical information flow for a schema change. Some places need locks to protect consistency under multiple transactions.
  2. All of the code is easy to understand thanks to detailed comments and clear names.
  3. There is some redundant code, such as using a for loop to find a matching string, but most of the code is neat.
  4. The code is modular and reuses other modules properly.
  5. There is some commented-out code, such as some test code, but I believe it was kept for future debugging.
  6. Debug log functions are used properly, and most of them are at TRACE level, which avoids excessive debug output.

Documentation Questions:

  1. Comments describe the intent of the code correctly.
  2. All important functions are commented. Some functions are not commented, but their names are descriptive enough.
  3. No unusual behavior observed.
  4. No other third-party libraries.
  5. Garbage collection for the old table is not complete.

Testing Questions:

  1. There are tests, and they are comprehensive for single-transaction operations, but they do not test concurrent situations.
  2. The tests are valid because they use JDBC to send SQL statements, and the parser in Peloton converts them into plans to execute.
  3. No hardcoded answers.
  4. The tests cover all the basic functions, such as rename and type change.

@alphalzh

[Major]: Major issues to note
[Minor]: Minor issues to note
[Trivial]: Suggestions

Improvements:
+ Much better use of comments
+ Much more test cases
+ Overall LGTM

Documentation:
+ Proper use of TODO
+ No incomplete code
+ Good comments

Tests:
+ Tests exist and are comprehensive
- Some tests still fail

Generally better coding style, keep it up!

public void test_RenameCol_Base() throws SQLException {
conn.createStatement().execute(SQL_RENAME_COLUMN);
ResultSet rs = conn.createStatement().executeQuery(SQL_SELECT_STAR);
rs.next();

@alphalzh

alphalzh May 11, 2018

[suggestion] If rs has no next row, this will throw an exception that is not easy to catch. Maybe add some more handling here?

@Dingshilun

Dingshilun May 12, 2018

Sure, we should use try-catch here.

conn.commit();
conn2.commit();
}

@alphalzh

alphalzh May 11, 2018

[minor] There is still commented-out code; this needs to be addressed in the next PR.

new int [] {5, 400});
assertNoMoreRows(rs);
}
}

@alphalzh

alphalzh May 11, 2018

[comment] great tests!

// ALTER TABLE
//===--------------------------------------------------------------------===//
/**

@alphalzh

alphalzh May 11, 2018

[good] Good comments!

if (txn == nullptr)
throw CatalogException("Alter table requires transaction");
try {

@alphalzh

alphalzh May 11, 2018

[minor] Why double try catch block?

@DeanChensj

DeanChensj May 12, 2018

Contributor

Sure, we'll fix this.

@@ -26,8 +26,8 @@
// Fix for PRId64 (See https://stackoverflow.com/a/18719205)
#if defined(__cplusplus) && !defined(__STDC_FORMAT_MACROS)
#define __STDC_FORMAT_MACROS 1 // Not sure where to put this
#endif
#define __STDC_FORMAT_MACROS 1 // Not sure where to put this

@alphalzh

alphalzh May 11, 2018

[trivial] Is this an auto-formatting change?

@DeanChensj

DeanChensj May 12, 2018

Contributor

Yes

class AlterExecutor : public AbstractExecutor {
public:
AlterExecutor(const AlterExecutor &) = delete;

@alphalzh

alphalzh May 11, 2018

[good] Nice use of delete

/**
* @class AlterTableStatement
* @brief Represents "ALTER TABLE add column COLUMN_NAME COLUMN_TYPE"
* TODO: add implementation of AlterTableStatement

@alphalzh

alphalzh May 11, 2018

[trivial] maybe you forgot to remove todo...

@DeanChensj

DeanChensj May 12, 2018

Contributor

Thanks for pointing that out!

this->table_name.c_str(),
this->database_name.c_str());
}

@alphalzh

alphalzh May 11, 2018

[trivial] Maybe move longer functions into the .cpp file.

if ((*parse_tree->added_columns)[i].get()->not_null) {
catalog::Constraint constraint(ConstraintType::NOTNULL,
"con_not_null");
column.AddConstraint(constraint);

@alphalzh

alphalzh May 11, 2018

[trivial] Easy-to-mix-up variable names: column and columns

conn2.commit();
}
// The following tests are currently broken.

@eloiseh

eloiseh May 12, 2018

What will happen when running these test cases?

@DeanChensj

DeanChensj May 12, 2018

Contributor

Currently the system will crash due to infinite retry in the binder.

conn.createStatement().execute(sql);
ResultSet rs = conn.createStatement().executeQuery(SQL_SELECT_STAR);
rs.next();
checkRow(rs,

@eloiseh

eloiseh May 12, 2018

I think this check can't ensure there is only one column left.

for (auto index_oid_pair : old_index_oids) {
oid_t index_oid = index_oid_pair.first;
// delete record in pg_index
pg_index->DeleteIndex(index_oid, txn);

@eloiseh

eloiseh May 12, 2018

Why not delete the index from pg_index only if some of its indexed columns no longer exist?
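
The check suggested here could look roughly like the sketch below, assuming indexed columns and dropped columns are both known by name. `IndexSurvives` is a hypothetical helper, not part of the PR. Note that even a surviving index may still need its column offsets updated if dropping other columns shifts positions in the new schema.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Returns true if none of the index's key columns are being dropped, in
// which case its pg_index record could be kept instead of deleted and
// re-inserted.
bool IndexSurvives(const std::vector<std::string> &indexed_columns,
                   const std::unordered_set<std::string> &dropped_columns) {
  assert(!indexed_columns.empty());
  for (const std::string &column : indexed_columns) {
    if (dropped_columns.count(column) > 0) return false;
  }
  return true;
}
```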

txn);
column_offset++;
}
// TODO: Add gc logic

@eloiseh

eloiseh May 12, 2018

Have you guys finished this?

@DeanChensj

DeanChensj May 12, 2018

Contributor

We're still working on this, sorry about that.

auto old_schema = old_table->GetSchema();
std::vector<oid_t> column_ids;
// Step 1: remove drop columns from old schema

@eloiseh

eloiseh May 12, 2018

Good strategy to remove dropped columns.

bool missing_ok; /* skip error if table missing */
} AlterTableStmt;
typedef enum AlterTableType {

@eloiseh

eloiseh May 12, 2018

Very comprehensive types.

const std::unique_ptr<catalog::Schema> &added_columns,
const std::unique_ptr<catalog::Schema> &changed_type_columns,
AlterType a_type);
explicit AlterPlan(const std::string &database_name,

@eloiseh

eloiseh May 12, 2018

I believe you changed rename columns from a vector to a string, so why is there still a vector here?

@Dingshilun

Dingshilun May 12, 2018

Sorry, will correct it. Thanks for pointing that out.

@DeanChensj DeanChensj force-pushed the sxzh93:review2 branch from e3c75c3 to 939df35 May 13, 2018
