Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the parsing part for the profiling tool #7043

Merged
merged 15 commits into from
Jan 8, 2018

Conversation

kuke
Copy link
Contributor

@kuke kuke commented Dec 26, 2017

See Part I in #6701

Please review this part after the Part I merged.

@kuke
Copy link
Contributor Author

kuke commented Jan 5, 2018

The profiling report looks like:

W0105 14:48:05.080145  1552 profiler.cc:232] Cannot find the push marker of event 'event_without_push', which will be ignored in profiling report.
W0105 14:48:05.080199  1552 profiler.cc:266] Cannot find the pop marker of event 'event_without_pop', which will be ignored in profiling report.

------------------------->     Profiling Report     <-------------------------

Place: CUDA
Time unit: ms
Sorted by total time in descending order in the same thread

Event                Calls       Total       Min.        Max.        Ave.
thread0::op_1        3           0.006565    0.001816    0.002444    0.00218833
thread0::op_3        3           0.005658    0.001816    0.001956    0.001886
thread0::op_4        3           0.005518    0.001816    0.001886    0.00183933
thread0::op_2        3           0.005446    0.001815    0.001816    0.00181533
thread0::evs_op_3    1           0.002165    0.002165    0.002165    0.002165
thread0::evs_op_2    1           0.002026    0.002026    0.002026    0.002026
thread0::evs_op_1    1           0.001956    0.001956    0.001956    0.001956
thread0::evs_op_4    1           0.001816    0.001816    0.001816    0.001816

And this pull request is ready for review. @qingqing01 @reyoung

void PopEvent(const std::string& name, DeviceContext* dev_ctx) {
GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id,
dev_ctx);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to refine RecordEvent::RecordEvent() and RecordEvent::~RecordEvent() to reuse these two functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::cout << "Place: GPU" << std::endl;
#else
std::cout << "Place: CPU" << std::endl;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If compiling with GPU, the users also can run tasks on CPU only. So here should not use the PADDLE_WITH_CUDA macro. You can use g_state to determine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.

double event_time = rit->CudaElapsedMs(events[i][j]);
#else
double event_time = rit->CpuElapsedMs(events[i][j]);
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem with above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pushed_events.erase((++rit).base());
} else {
std::cout << "Warning: can not find the start marker of event "
<< events[i][j].name();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG(WARNING)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

<< std::setw(data_width) << event_item.max_time
<< std::setw(data_width) << event_item.ave_time << std::endl;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the ParseEvents and print can be two functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed all the comments. Thanks!

void PopEvent(const std::string& name, DeviceContext* dev_ctx) {
GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id,
dev_ctx);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::cout << "Place: GPU" << std::endl;
#else
std::cout << "Place: CPU" << std::endl;
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.

double event_time = rit->CudaElapsedMs(events[i][j]);
#else
double event_time = rit->CpuElapsedMs(events[i][j]);
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pushed_events.erase((++rit).base());
} else {
std::cout << "Warning: can not find the start marker of event "
<< events[i][j].name();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

<< std::setw(data_width) << event_item.max_time
<< std::setw(data_width) << event_item.ave_time << std::endl;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -13,12 +13,18 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/platform/profiler.h"
#include <iomanip>
#include <map>
#include "gflags/gflags.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gflags is not used. This header can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return a.max_time > b.max_time;
default:
return a.ave_time > b.ave_time;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this lambda expression can be selected before the for loop in line 187, and the sort_domain can be saved at the same time. Then pass the sort_domain as a string argument to PrintProfilingReport function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified. Thanks!

@@ -13,12 +13,18 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/platform/profiler.h"
#include <iomanip>
#include <map>
#include "gflags/gflags.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return a.max_time > b.max_time;
default:
return a.ave_time > b.ave_time;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@qingqing01 qingqing01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kuke kuke merged commit 5e90f5e into PaddlePaddle:develop Jan 8, 2018
#ifdef PADDLE_WITH_CUDA
PADDLE_ENFORCE(e.has_cuda() && has_cuda());
PADDLE_ENFORCE(e.device() == device());
PADDLE_ENFORCE(cudaEventSynchronize(event_));
PADDLE_ENFORCE(cudaEventSynchronize(e.event()));
float ms;
PADDLE_ENFORCE(cudaEventElapsedTime(&ms, event_, e.event()));
return ms * 1000.0;
return ms;
#else
PADDLE_THROW("CUDA is not enabled");
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not comment on line 110 but only write the comments here.
line 110 should be before line 109.


namespace paddle {
namespace platform {

// The profiler state, the initial value is ProfilerState::kDisabled
static ProfilerState g_state = ProfilerState::kDisabled;
// To record which timer the profiler used, CUDA or CPU.
static std::string g_profiler_place = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_profiler_place and g_state should be thread_local, just like g_thread_id.

name_(std::move(name)),
thread_id_(thread_id),
has_cuda_(false) {
: kind_(kind), name_(name), thread_id_(thread_id), has_cuda_(false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do not you use std::move?

double Event::CpuElapsedUs(const Event& e) const {
return (e.cpu_ns_ - cpu_ns_) / (1000.0);
double Event::CpuElapsedMs(const Event& e) const {
return (e.cpu_ns_ - cpu_ns_) / (1000000.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ii will be better to replace / (1000000.0) with *0.000001.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants