Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor man's profiler on query level #4247

Merged
merged 55 commits into from Jul 23, 2019
Merged
Show file tree
Hide file tree
Changes from 54 commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
632c542
add basic timer with logging for query threads
laplab Dec 25, 2018
4c55802
refactor
laplab Jan 18, 2019
df3db1b
basic pipe
laplab Feb 3, 2019
89582cf
Merge branch 'master' into query-poor-mans-profiler
laplab Feb 3, 2019
5c54bbb
write traces to trace_log
laplab Feb 3, 2019
5f58594
review fix. stop TraceCollector gracefully
laplab Feb 9, 2019
b8d7880
Merge remote-tracking branch 'parent/master' into query-poor-mans-pro…
laplab Feb 9, 2019
7c0339b
fix unresolved conflicts
laplab Feb 9, 2019
628c282
fix style
laplab Feb 9, 2019
2a0af15
add symbolizeTrace function
laplab Feb 10, 2019
ce4e48d
Merge branch 'master' into query-poor-mans-profiler
laplab Mar 1, 2019
d645050
add missing include
laplab Mar 1, 2019
6367e15
add cpu timer. add timer type to log. refactor
laplab Mar 4, 2019
70b5aaa
remove circular dependency
laplab Mar 4, 2019
2429449
remove USE_UNWIND from header
laplab Mar 5, 2019
3d9fb85
revert changes in contrib
laplab Mar 5, 2019
14e3639
add rt library for timer_{create,settime,delete}
laplab Mar 5, 2019
fe2adb2
move rt from dbms to clickhouse_common_io
laplab Mar 6, 2019
12fe175
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Mar 6, 2019
5b50026
- remove memory allocations in signal handler
laplab Mar 10, 2019
1ea971d
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Mar 10, 2019
4a4ff8d
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Mar 23, 2019
023744b
check if cursor is on signal frame
laplab Apr 6, 2019
fbdf2c3
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Apr 6, 2019
870e3ba
remove unused check
laplab Apr 6, 2019
6dbe205
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab May 12, 2019
aaa3156
Merge branch 'update_libunwind' into query-poor-mans-profiler
laplab May 12, 2019
ab69128
introduce Backtrace class
laplab May 14, 2019
483f68b
refactor. introduce SleepFor* functions
laplab May 19, 2019
27732ba
enable float seconds in sleep back
laplab May 19, 2019
1dd8726
move SleepFor* to libcommon
laplab May 19, 2019
b12db6e
fix data race on trace_pipe close
laplab May 19, 2019
0a533b0
fix codestyle
laplab May 19, 2019
9ceff40
last hope
laplab May 21, 2019
382beef
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab May 21, 2019
419a37c
fix late define of _LIBUNWIND_NO_HEAP
laplab May 22, 2019
30c31bc
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab May 24, 2019
06d370d
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab May 25, 2019
3b54fde
replace StackTrace with Backtrace
laplab May 30, 2019
8f36af7
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab May 30, 2019
86857ef
update libunwind
laplab May 30, 2019
6c75e60
fix memory leak
laplab May 30, 2019
8de1340
update libunwind
laplab Jun 26, 2019
7cff36f
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Jun 26, 2019
0f57986
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Jul 4, 2019
dc6e6ea
fix submodules
laplab Jul 4, 2019
b8585a5
refactor and comments
laplab Jul 5, 2019
6b7f587
cut large query_ids
laplab Jul 5, 2019
c7eaa30
update libunwind to RIP experiment
laplab Jul 5, 2019
aaea6cb
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Jul 5, 2019
17e3542
refactor
laplab Jul 5, 2019
ef1d84b
do not run trace collector without trace_log. do not run query profil…
laplab Jul 6, 2019
3828684
style
laplab Jul 6, 2019
e935cb0
Merge remote-tracking branch 'upstream/master' into query-poor-mans-p…
laplab Jul 7, 2019
9d540ab
refactor
laplab Jul 10, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion contrib/libunwind
2 changes: 2 additions & 0 deletions dbms/CMakeLists.txt
Expand Up @@ -223,6 +223,8 @@ target_link_libraries(clickhouse_common_io
Threads::Threads
PRIVATE
${CMAKE_DL_LIBS}
PRIVATE
rt
PUBLIC
roaring
)
Expand Down
3 changes: 3 additions & 0 deletions dbms/programs/server/Server.cpp
Expand Up @@ -508,6 +508,9 @@ int Server::main(const std::vector<std::string> & /*args*/)
}
LOG_DEBUG(log, "Loaded metadata.");

/// Init trace collector only after trace_log system table was created
global_context->initializeTraceCollector();

global_context->setCurrentDatabase(default_database);

if (has_zookeeper && config().has("distributed_ddl"))
Expand Down
10 changes: 10 additions & 0 deletions dbms/programs/server/config.xml
Expand Up @@ -294,6 +294,16 @@
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>

<!-- Trace log. Stores stack traces collected by query profilers.
See query_profiler_real_time_period and query_profiler_cpu_time_period settings. -->
<trace_log>
laplab marked this conversation as resolved.
Show resolved Hide resolved
<database>system</database>
<table>trace_log</table>

<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>

<!-- Query thread log. Has information about all threads participated in query execution.
Used only for queries with setting log_query_threads = 1. -->
<query_thread_log>
Expand Down
8 changes: 8 additions & 0 deletions dbms/src/Common/QueryProfiler.cpp
@@ -0,0 +1,8 @@
#include "QueryProfiler.h"

namespace DB
{

LazyPipe trace_pipe;

}
158 changes: 158 additions & 0 deletions dbms/src/Common/QueryProfiler.h
@@ -0,0 +1,158 @@
#pragma once

#include <common/Pipe.h>
#include <common/StackTrace.h>
#include <common/StringRef.h>
#include <common/logger_useful.h>
#include <Common/CurrentThread.h>
#include <IO/WriteHelpers.h>
#include <IO/WriteBufferFromFileDescriptor.h>

#include <signal.h>
#include <time.h>

namespace Poco
{
class Logger;
}

namespace DB
{

extern LazyPipe trace_pipe;

enum class TimerType : UInt8
{
Real,
Cpu,
};

namespace
{
/// Normally query_id is a UUID (string with a fixed length) but user can provide custom query_id.
/// Thus upper bound on query_id length should be introduced to avoid buffer overflow in signal handler.
constexpr size_t QUERY_ID_MAX_LEN = 1024;

void writeTraceInfo(TimerType timer_type, int /* sig */, siginfo_t * /* info */, void * context)
{
constexpr size_t buf_size = sizeof(char) + // TraceCollector stop flag
laplab marked this conversation as resolved.
Show resolved Hide resolved
8 * sizeof(char) + // maximum VarUInt length for string size
QUERY_ID_MAX_LEN * sizeof(char) + // maximum query_id length
sizeof(StackTrace) + // collected stack trace
sizeof(TimerType); // timer type
char buffer[buf_size];
DB::WriteBufferFromFileDescriptor out(trace_pipe.fds_rw[1], buf_size, buffer);

StringRef query_id = CurrentThread::getQueryId();
query_id.size = std::min(query_id.size, QUERY_ID_MAX_LEN);

const auto signal_context = *reinterpret_cast<ucontext_t *>(context);
const StackTrace stack_trace(signal_context);

DB::writeChar(false, out);
DB::writeStringBinary(query_id, out);
DB::writePODBinary(stack_trace, out);
DB::writePODBinary(timer_type, out);
out.next();
}

const UInt32 TIMER_PRECISION = 1e9;
}


/**
* Query profiler implementation for selected thread.
*
* This class installs timer and signal handler on creation to:
* 1. periodically pause given thread
* 2. collect thread's current stack trace
* 3. write collected stack trace to trace_pipe for TraceCollector
*
* Desctructor tries to unset timer and restore previous signal handler.
* Note that signal handler implementation is defined by template parameter. See QueryProfilerReal and QueryProfilerCpu.
*/
template <typename ProfilerImpl>
laplab marked this conversation as resolved.
Show resolved Hide resolved
class QueryProfilerBase
laplab marked this conversation as resolved.
Show resolved Hide resolved
{
public:
QueryProfilerBase(const Int32 thread_id, const int clock_type, const UInt32 period, const int pause_signal = SIGALRM)
: log(&Logger::get("QueryProfiler"))
, pause_signal(pause_signal)
{
struct sigaction sa{};
sa.sa_sigaction = ProfilerImpl::signalHandler;
sa.sa_flags = SA_SIGINFO | SA_RESTART;

if (sigemptyset(&sa.sa_mask))
throw Poco::Exception("Failed to clean signal mask for query profiler");
laplab marked this conversation as resolved.
Show resolved Hide resolved

if (sigaddset(&sa.sa_mask, pause_signal))
throw Poco::Exception("Failed to add signal to mask for query profiler");

if (sigaction(pause_signal, &sa, previous_handler))
throw Poco::Exception("Failed to setup signal handler for query profiler");

struct sigevent sev;
sev.sigev_notify = SIGEV_THREAD_ID;
sev.sigev_signo = pause_signal;
sev._sigev_un._tid = thread_id;
if (timer_create(clock_type, &sev, &timer_id))
laplab marked this conversation as resolved.
Show resolved Hide resolved
throw Poco::Exception("Failed to create thread timer");

struct timespec interval{.tv_sec = period / TIMER_PRECISION, .tv_nsec = period % TIMER_PRECISION};
struct itimerspec timer_spec = {.it_interval = interval, .it_value = interval};
if (timer_settime(timer_id, 0, &timer_spec, nullptr))
throw Poco::Exception("Failed to set thread timer");
}

~QueryProfilerBase()
{
if (timer_delete(timer_id))
LOG_ERROR(log, "Failed to delete query profiler timer");
laplab marked this conversation as resolved.
Show resolved Hide resolved

if (sigaction(pause_signal, previous_handler, nullptr))
LOG_ERROR(log, "Failed to restore signal handler after query profiler");
}

private:
Poco::Logger * log;

/// Timer id from timer_create(2)
timer_t timer_id = nullptr;

/// Pause signal to interrupt threads to get traces
int pause_signal;

/// Previous signal handler to restore after query profiler exits
struct sigaction * previous_handler = nullptr;
};

/// Query profiler with timer based on real clock
class QueryProfilerReal : public QueryProfilerBase<QueryProfilerReal>
{
public:
QueryProfilerReal(const Int32 thread_id, const UInt32 period)
: QueryProfilerBase(thread_id, CLOCK_REALTIME, period, SIGUSR1)
{}

static void signalHandler(int sig, siginfo_t * info, void * context)
{
writeTraceInfo(TimerType::Real, sig, info, context);
}
};

/// Query profiler with timer based on CPU clock
class QueryProfilerCpu : public QueryProfilerBase<QueryProfilerCpu>
{
public:
QueryProfilerCpu(const Int32 thread_id, const UInt32 period)
: QueryProfilerBase(thread_id, CLOCK_THREAD_CPUTIME_ID, period, SIGUSR2)
{}

static void signalHandler(int sig, siginfo_t * info, void * context)
{
writeTraceInfo(TimerType::Cpu, sig, info, context);
}
};

}
54 changes: 10 additions & 44 deletions dbms/src/Common/ShellCommand.cpp
Expand Up @@ -8,52 +8,10 @@
#include <IO/WriteHelpers.h>
#include <port/unistd.h>
#include <csignal>


namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_PIPE;
extern const int CANNOT_DLSYM;
extern const int CANNOT_FORK;
extern const int CANNOT_WAITPID;
extern const int CHILD_WAS_NOT_EXITED_NORMALLY;
extern const int CANNOT_CREATE_CHILD_PROCESS;
}
}

#include <common/Pipe.h>

namespace
{
struct Pipe
{
int fds_rw[2];

Pipe()
{
#ifndef __APPLE__
if (0 != pipe2(fds_rw, O_CLOEXEC))
DB::throwFromErrno("Cannot create pipe", DB::ErrorCodes::CANNOT_PIPE);
#else
if (0 != pipe(fds_rw))
DB::throwFromErrno("Cannot create pipe", DB::ErrorCodes::CANNOT_PIPE);
if (0 != fcntl(fds_rw[0], F_SETFD, FD_CLOEXEC))
DB::throwFromErrno("Cannot create pipe", DB::ErrorCodes::CANNOT_PIPE);
if (0 != fcntl(fds_rw[1], F_SETFD, FD_CLOEXEC))
DB::throwFromErrno("Cannot create pipe", DB::ErrorCodes::CANNOT_PIPE);
#endif
}

~Pipe()
{
if (fds_rw[0] >= 0)
close(fds_rw[0]);
if (fds_rw[1] >= 0)
close(fds_rw[1]);
}
};

/// By these return codes from the child process, we learn (for sure) about errors when creating it.
enum class ReturnCodes : int
{
Expand All @@ -64,10 +22,18 @@ namespace
};
}


namespace DB
{

namespace ErrorCodes
{
extern const int CANNOT_DLSYM;
extern const int CANNOT_FORK;
extern const int CANNOT_WAITPID;
extern const int CHILD_WAS_NOT_EXITED_NORMALLY;
extern const int CANNOT_CREATE_CHILD_PROCESS;
}

ShellCommand::ShellCommand(pid_t pid, int in_fd, int out_fd, int err_fd, bool terminate_in_destructor_)
: pid(pid)
, terminate_in_destructor(terminate_in_destructor_)
Expand Down
1 change: 1 addition & 0 deletions dbms/src/Common/ThreadStatus.cpp
Expand Up @@ -4,6 +4,7 @@
#include <Common/Exception.h>
#include <Common/ThreadProfileEvents.h>
#include <Common/TaskStatsInfoGetter.h>
#include <Common/QueryProfiler.h>
laplab marked this conversation as resolved.
Show resolved Hide resolved
#include <Common/ThreadStatus.h>

#include <Poco/Logger.h>
Expand Down
10 changes: 10 additions & 0 deletions dbms/src/Common/ThreadStatus.h
Expand Up @@ -26,6 +26,8 @@ namespace DB
class Context;
class QueryStatus;
class ThreadStatus;
class QueryProfilerReal;
class QueryProfilerCpu;
class QueryThreadLog;
struct TasksStatsCounters;
struct RUsageCounters;
Expand Down Expand Up @@ -148,6 +150,10 @@ class ThreadStatus : public boost::noncopyable
protected:
void initPerformanceCounters();

void initQueryProfiler();

void finalizeQueryProfiler();

void logToQueryThreadLog(QueryThreadLog & thread_log);

void assertState(const std::initializer_list<int> & permitted_states, const char * description = nullptr);
Expand All @@ -171,6 +177,10 @@ class ThreadStatus : public boost::noncopyable
time_t query_start_time = 0;
size_t queries_started = 0;

// CPU and Real time query profilers
std::unique_ptr<QueryProfilerReal> query_profiler_real;
std::unique_ptr<QueryProfilerCpu> query_profiler_cpu;

Poco::Logger * log = nullptr;

friend class CurrentThread;
Expand Down
8 changes: 2 additions & 6 deletions dbms/src/Common/Throttler.h
Expand Up @@ -6,6 +6,7 @@
#include <Common/Stopwatch.h>
#include <Common/Exception.h>
#include <Common/ProfileEvents.h>
#include <common/Sleep.h>
#include <IO/WriteHelpers.h>
#include <port/clock.h>

Expand Down Expand Up @@ -76,12 +77,7 @@ class Throttler
if (desired_ns > elapsed_ns)
{
UInt64 sleep_ns = desired_ns - elapsed_ns;
::timespec sleep_ts;
sleep_ts.tv_sec = sleep_ns / 1000000000;
sleep_ts.tv_nsec = sleep_ns % 1000000000;

/// NOTE: Returns early in case of a signal. This is considered normal.
::nanosleep(&sleep_ts, nullptr);
SleepForNanoseconds(sleep_ns);

ProfileEvents::increment(ProfileEvents::ThrottlerSleepMicroseconds, sleep_ns / 1000UL);
}
Expand Down
2 changes: 2 additions & 0 deletions dbms/src/Core/Settings.h
Expand Up @@ -220,6 +220,8 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.") \
M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to executed distributed DDL queries.") \
M(SettingUInt64, odbc_max_field_size, 1024, "Max size of filed can be read from ODBC dictionary. Long strings are truncated.") \
M(SettingUInt64, query_profiler_real_time_period, 500000000, "Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler") \
laplab marked this conversation as resolved.
Show resolved Hide resolved
M(SettingUInt64, query_profiler_cpu_time_period, 500000000, "Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler") \
\
\
/** Limits during query execution are part of the settings. \
Expand Down
10 changes: 2 additions & 8 deletions dbms/src/DataStreams/IBlockInputStream.cpp
Expand Up @@ -3,7 +3,7 @@
#include <Interpreters/ProcessList.h>
#include <Interpreters/Quota.h>
#include <Common/CurrentThread.h>

#include <common/Sleep.h>

namespace ProfileEvents
{
Expand Down Expand Up @@ -255,13 +255,7 @@ static void limitProgressingSpeed(size_t total_progress_size, size_t max_speed_i
if (desired_microseconds > total_elapsed_microseconds)
{
UInt64 sleep_microseconds = desired_microseconds - total_elapsed_microseconds;
::timespec sleep_ts;
sleep_ts.tv_sec = sleep_microseconds / 1000000;
sleep_ts.tv_nsec = sleep_microseconds % 1000000 * 1000;

/// NOTE: Returns early in case of a signal. This is considered normal.
/// NOTE: It's worth noting that this behavior affects kill of queries.
::nanosleep(&sleep_ts, nullptr);
SleepForMicroseconds(sleep_microseconds);

ProfileEvents::increment(ProfileEvents::ThrottlerSleepMicroseconds, sleep_microseconds);
}
Expand Down
12 changes: 12 additions & 0 deletions dbms/src/Functions/FunctionsIntrospection.cpp
@@ -0,0 +1,12 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionsIntrospection.h>

namespace DB
{

void registerFunctionsIntrospection(FunctionFactory & factory)
{
factory.registerFunction<FunctionSymbolizeTrace>();
}

}