New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TS-4399 TS-4400 Management API messes up proxy options #1073
Conversation
TS-4399: Management API breaks diagnostic log rotation TS-4400: TSProxyStateSet persist cache clearing across restart The two issues are related in that they both deal with the management API not correctly handling proxy flags. For TS-4399, it was because the management API was not aware of traffic_manager setting extra proxy options. This was fixed by providing CoreAPI a callback to get extra proxy options from traffic_manager. For TS-4400, it was because the management API was not properly clearing optional flags between proxy reboots. This was fixed by resetting the proxy options before each reboot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it is a good idea to have the management API contain an upward dependency on traffic_manager
.
One of the problems here is that the introduction of the --bind_std*
options changed the way argument passing works without fully propagating the change. I think that a better approach to explore is making traffic_manager
call ProxyStateSet
. If we can do that then we only have 1 copy of this code (DRY) and the correct dependencies.
This patch centralizes where traffic_server starts to inside CoreAPI's ProxyStateSet. This is good because we reduce the number of places we deal with traffic_server options. Everything is simply handled by the proxy_options_callback. Note: Right now, there is about a ~30 delay between ATS starting up and traffic_ctl commands working. Not sure why.
@jpeach, is this what you were thinking? Also, between commit 82037 and 3365b, |
just_started = 0; | ||
sleep_time = 0; | ||
} else { | ||
mgmt_log("in ProxyStateSet else branch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: remove this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moving in the right direction, but still adds more complexity than it removes.
Proxy options can come from
traffic_manager -tsArgs
(explicit)proxy.config.proxy_binary_opts
(explicit)traffic_manager --bind_foo
(implicit)
The goal is to have a single API ProxyStateSet
that does the right thing in exactly 1 place (DRY principle). Using ProxyStateSet
from traffic_manager
is an improvement. Note that all we really need to persist across restarts are (1) and (3). We already have a LocalManager
object that can store the data that we need.
So we can keep LocalManager::proxy_options
as the persistent proxy options, and have ProxyStateSet
generate any additional options needed for this particular launch. The per-launch options from the API flags and from (2) can be passed directly into LocalManager::startProxy()
, which can concatenate them (you just need a Vec<char*>
here). This won't require any new global data.
snprintf(lmgmt->proxy_options + l, n, " --%s %s", TM_OPT_BIND_STDERR, bind_stderr); | ||
char *bind_stderr_opt = new char[strlen("--") + strlen(TM_OPT_BIND_STDERR)]; | ||
strcpy(bind_stderr_opt, "--"); | ||
strcat(bind_stderr_opt, TM_OPT_BIND_STDERR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foo.push_back("--" TM_OPT_BIND_STDERR")
Instead, use proxy_options as persistent proxy options storage. Then have lmgmt->startProxy() take in an argument for one-time flags. This was done to reduce complicated dependencies.
How does that look? Still need to figure out that delay issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good.
@@ -27,6 +27,7 @@ | |||
#include "ts/ink_sock.h" | |||
#include "ts/ink_args.h" | |||
#include "ts/ink_syslog.h" | |||
#include <vector> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this.
@@ -445,6 +446,8 @@ main(int argc, const char **argv) | |||
int binding_version = 0; | |||
BindingInstance *binding = NULL; | |||
|
|||
std::vector<char*> callback_proxy_options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this.
strcpy(new_proxy_opts, lmgmt->proxy_options); | ||
strcat(new_proxy_opts, bind_stdout_opt); | ||
strcat(new_proxy_opts, bind_stdout); | ||
delete lmgmt->proxy_options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace all this string code with TextBuffer
.
TextBuffer args;
if (*bind_stdout) {
const char * space = args.empty() ? "" : " ";
args.format("%s%s", space, "--" TM_OPT_BIND_STDOUT);
}
...
lmgmt->proxy_options = args.release();
ats_free(lmgmt->proxy_options); | ||
lmgmt->proxy_options = tsArgs; | ||
mgmt_log("[main] Traffic Server Args: '%s'\n", lmgmt->proxy_options); | ||
lmgmt->proxy_options = new_proxy_opts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point lmgmt->proxy_options
isn't set, so just copy tsArgs
into the text buffer if it is present.
strcpy(new_proxy_opts, lmgmt->proxy_options); | ||
strcat(new_proxy_opts, bind_stderr_opt); | ||
strcat(new_proxy_opts, bind_stderr); | ||
delete lmgmt->proxy_options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not mix new/delete and malloc/free here. But this problem goes away with the text buffer.
@@ -53,6 +54,9 @@ | |||
// global variable | |||
CallbackTable *local_event_callbacks; | |||
|
|||
// Used to get any proxy options that traffic_manager might want to tack on | |||
std::function<void(std::vector<char*>&)> proxy_options_callback; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this.
@@ -74,7 +74,7 @@ class LocalManager : public BaseManager | |||
void signalAlarm(int alarm_id, const char *desc = NULL, const char *ip = NULL); | |||
|
|||
void processEventQueue(); | |||
bool startProxy(); | |||
bool startProxy(char *onetime_options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const char *
.
*/ | ||
bool | ||
LocalManager::startProxy() | ||
LocalManager::startProxy(char *onetime_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const char *
@@ -902,8 +907,11 @@ LocalManager::startProxy() | |||
Vec<char> real_proxy_options; | |||
|
|||
real_proxy_options.append(proxy_options, strlen(proxy_options)); | |||
real_proxy_options.append(" ", strlen(" ")); | |||
real_proxy_options.append(onetime_options, strlen(onetime_options)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's allow onetime_options
to be NULL, so
if (onetime_options && *onetime_options) {
...
}
just_started = 0; | ||
sleep_time = 0; | ||
} else { | ||
mgmt_log("in ProxyStateSet else branch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please :)
I think the delay issue is either:
|
This change was made because the ProxyStateSet() call was sleeping in the same thread as traffic_manager was running, so the condition in the do/while loop could never be updated by traffic_manager. The solution is to differentiate API calls and local calls to CoreAPI by using the coreapi_sleep flag.
The actual issue was suspected issue #1, where there was a sleeping deadlock scenario. See commit message for details. Other than that most recent change, I think the patch is ready for final review. |
|
||
lmgmt->run_proxy = true; | ||
lmgmt->listenForProxy(); | ||
lmgmt->startProxy(tsArgs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need to do the sleeping stuff here? Since you now call LocalManager:: startProxy()
directly, you have the return value to know that it succeeded. I don't think that the contract for this API needs to include waiting for a message.
return lmgmt->startProxy(tsArgs) ? TS_ERR_OKAY : TS_ERR_FAIL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good observation, I'm kicking myself for not seeing it before.
|
||
if (!strstr(proxy_options, MGMT_OPT)) { // Make sure we're starting the proxy in mgmt mode | ||
// Make sure we're starting the proxy in mgmt mode | ||
if (!strstr(proxy_options, MGMT_OPT) && !strstr(onetime_options, MGMT_OPT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer strstr(...) == 0
. It's just that little bit more readable.
@@ -189,6 +189,7 @@ LocalManager::LocalManager(bool proxy_on) : BaseManager(), run_proxy(proxy_on), | |||
proxy_launch_outstanding = false; | |||
mgmt_shutdown_outstanding = MGMT_PENDING_NONE; | |||
proxy_running = 0; | |||
coreapi_sleep = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the other comment, I think we can get away without this.
We no longer need to sleep inside ProxyStateSet because we're directly calling lmgmt->startProxy instead of setting the run_proxy flag and waiting for the TM thread to kick off the lmgmt->startProxy call.
[approve ci] |
FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1157/ for details. |
Linux build failed! See https://ci.trafficserver.apache.org/job/Github-Linux/1050/ for details. |
@danobi Since this covers 2 JIRAs, please designate one of them for the fix, and mark the other as a duplicate. |
I couldn't figure out where the dup option was so I changed TS-4400 to a subtask of TS-4399. Also, can you kick off the CI again? I think I fixed the build. |
|
||
if (!strstr(proxy_options, MGMT_OPT)) { // Make sure we're starting the proxy in mgmt mode | ||
// Make sure we're starting the proxy in mgmt mode | ||
if (strstr(proxy_options, MGMT_OPT) == 0 && strstr(onetime_options, MGMT_OPT) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you just check real_proxy_options
here? If onetime_options
aren't already in there, they won't be added anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
real_proxy_options
is the mysterious Vec
class. AFAICT, I suppose we could, but it would require more allocation.
TS-4399: Management API breaks diagnostic log rotation
TS-4400: TSProxyStateSet persist cache clearing across restart
The two issues are related in that they both deal with the
management API not correctly handling proxy flags.
For TS-4399, it was because the management API was not aware
of traffic_manager setting extra proxy options. This was fixed
by providing CoreAPI a callback to get extra proxy options from
traffic_manager.
For TS-4400, it was because the management API was not properly
clearing optional flags between proxy reboots. This was fixed
by resetting the proxy options before each reboot.