Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tree-based-model #31696

Merged
merged 33 commits into from
Apr 15, 2021
Merged

tree-based-model #31696

merged 33 commits into from
Apr 15, 2021

Conversation

123malin
Copy link
Contributor

@123malin 123malin commented Mar 17, 2021

PR types

Others

PR changes

Others

Describe

TreeBased统一支持方案基础功能组件:

  1. TreeIndex,实现树相关的检索功能。
  2. IndexSampler,实现层级采样,TDM/JTM需要。

当前所有基础组件需在WITH_DISTRIBUTED=ON时才可以使用。

@seiriosPlus
Copy link
Collaborator

  1. move index_dataset implement & proto to paddle/fluid/distributed.

#include "paddle/fluid/operators/math/sampler.h"

namespace paddle {
namespace framework {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace upgrade.

namespace paddle {
namespace framework {

using Sampler = paddle::operators::math::Sampler;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary?

public:
virtual ~LayerWiseSampler() {}
explicit LayerWiseSampler(const std::string& name) {
tree_ = IndexWrapper::GetInstance()->GetTreeIndex(name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

singleton is necessary?

See the License for the specific language governing permissions and
limitations under the License. */

#include <stdio.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need stdio.h ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FILE* fp = fopen(filename.c_str(), "rb")

if (item.key() == ".tree_meta") {
meta_.ParseFromString(item.value());
} else {
auto code = boost::lexical_cast<uint64_t>(item.key());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need boost?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to parse from bytes to uint64.

std::vector<uint64_t> get_travel_path(uint64_t child, uint64_t ancestor) {
std::vector<uint64_t> res;
while (child > ancestor) {
res.push_back(data_[child].id());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to at

}
TreePtr tree = std::make_shared<TreeIndex>();
int ret = tree->load(tree_path);
if (ret != 0) return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ret != 0 means loading tree_path fails, nothing need to do.

}
layer_counts_sum_ += layer_sample_num + 1;
layer_counts_.push_back(layer_sample_num);
VLOG(1) << "[INFO] level " << cur_layer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

调整下log级别.

int TreeIndex::Load(const std::string filename) {
int err_no;
auto fp = paddle::framework::fs_open_read(filename, &err_no, "");
CHECK(fp != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

调整为PADDLE_ENFORCE

uint64_t max_id_;
};

using TreePtr = std::shared_ptr<TreeIndex>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么使用 shared_ptr 不是 unique_ptr


void insert_tree_index(const std::string name, const std::string tree_path) {
if (tree_map.find(name) != tree_map.end()) {
VLOG(0) << "Tree " << name << " has already existed.";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

所有LOG级别都要调整.

WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
if (WITH_PSCORE)
add_custom_command(TARGET framework_py_proto POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_BINARY_DIR}/python/paddle/fluid/proto
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确定是改这里?

Copy link
Collaborator

@seiriosPlus seiriosPlus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@chenwhql chenwhql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for const_cast

@123malin 123malin merged commit a8c3a90 into PaddlePaddle:develop Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants