Skip to content

Optimize path gc#777

Merged
imay merged 9 commits into
apache:be_refactorfrom
kangpinghuang:be_refactor_new_optimize_path_gc
Mar 28, 2019
Merged

Optimize path gc#777
imay merged 9 commits into
apache:be_refactorfrom
kangpinghuang:be_refactor_new_optimize_path_gc

Conversation

@kangpinghuang
Copy link
Copy Markdown
Contributor

  1. move path gc to DataDir
  2. use condition variable to coordinate scan and gc thread
  3. optimize pending paths to pending ids

Comment thread be/src/olap/data_dir.cpp Outdated
std::set<std::string> paths;
paths.insert(path);
if (!is_valid) {
LOG(WARNING) << "unknow path:" << path;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG(WARNING) << "unknow path:" << path;
LOG(WARNING) << "unknown path:" << path;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/data_dir.cpp Outdated
_remove_check_paths_no_lock(paths);
continue;
} else {
if (tablet_id >0 && schema_hash >0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (tablet_id >0 && schema_hash >0) {
if (tablet_id > 0 && schema_hash > 0) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/tablet_manager.cpp Outdated

bool TabletManager::get_rowset_id_from_path(const std::string& path, RowsetId* rowset_id) {
std::string pattern = "/data/\\d+/\\d+/\\d+/(\\d+)_.*";
std::regex rgx (pattern.c_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should make this static to avoid constructing this multiple times.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/base_compaction.cpp Outdated
RowsetId rowset_id = 0;
RETURN_NOT_OK(_tablet->next_rowset_id(&rowset_id));
RowsetWriterContextBuilder context_builder;
context_builder.set_rowset_id(rowset_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build should have default value, otherwise we can create class instantly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/data_dir.h Outdated

void remove_pending_ids(const std::string& id);

void perform_path_gc();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add comment about these two functions, and their relations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/data_dir.cpp Outdated
return _pending_path_ids.find(id) != _pending_path_ids.end();
}

void DataDir::_remove_check_paths_no_lock(const std::set<std::string> paths) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments should be reference

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/data_dir.cpp
}

// path consumer
void DataDir::perform_path_gc() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add unit test for this two function

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to fix the unit tests and add the unit test case for these two function later

Comment thread be/src/olap/data_dir.cpp Outdated
std::string path = *path_iter;
TTabletId tablet_id = -1;
TSchemaHash schema_hash = -1;
bool is_valid = StorageEngine::instance()->tablet_manager()->get_tablet_id_and_schema_hash_from_path(path,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using instance(), you'd better pass tablet_manager to this class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread be/src/olap/tablet_manager.cpp Outdated
if (path.find(data_dir_path) != std::string::npos) {
std::string pattern = data_dir_path + "/data/\\d+/(\\d+)/?(\\d+)?";
std::regex rgx (pattern.c_str());
static std::string pattern = data_dir_path + "/data/\\d+/(\\d+)/?(\\d+)?";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't use static here, it would lead a bug


class RowsetWriterContextBuilder {
public:
RowsetWriterContextBuilder() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this builder is useless, RowsetWriterContext is enough

Comment thread be/src/olap/data_dir.cpp Outdated

// path producer
void DataDir::perform_path_scan() {
std::unique_lock<std::mutex> lck(_check_path_mutex);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
unique_lock
}
cv.notify_one()

Comment thread be/src/olap/data_dir.cpp Outdated
_scanned = false;
LOG(INFO) << "start to path gc.";
int counter = 0;
for (auto path_iter = _all_check_paths.begin(); path_iter != _all_check_paths.end(); ++path_iter) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (auto path_iter = _all_check_paths.begin(); path_iter != _all_check_paths.end(); ++path_iter) {
for (auto& path : _all_check_paths) {

Comment thread be/src/olap/data_dir.cpp
// path producer
void DataDir::perform_path_scan() {
{
std::unique_lock<std::mutex> lck(_check_path_mutex);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (_all_check_paths.size() > 0) {
return ;
}

@kangpinghuang kangpinghuang force-pushed the be_refactor_new_optimize_path_gc branch from 4d42b66 to 7f35610 Compare March 28, 2019 08:08
@imay imay merged commit c8dcd83 into apache:be_refactor Mar 28, 2019
@kangpinghuang kangpinghuang deleted the be_refactor_new_optimize_path_gc branch August 20, 2019 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants