Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TS-4399 TS-4400 Management API messes up proxy options #1073

Closed
wants to merge 11 commits into from

Conversation

danobi
Copy link
Member

@danobi danobi commented Oct 3, 2016

TS-4399: Management API breaks diagnostic log rotation
TS-4400: TSProxyStateSet persist cache clearing across restart

The two issues are related in that they both deal with the
management API not correctly handling proxy flags.

For TS-4399, it was because the management API was not aware
of traffic_manager setting extra proxy options. This was fixed
by providing CoreAPI a callback to get extra proxy options from
traffic_manager.

For TS-4400, it was because the management API was not properly
clearing optional flags between proxy reboots. This was fixed
by resetting the proxy options before each reboot.

TS-4399: Management API breaks diagnostic log rotation
TS-4400: TSProxyStateSet persist cache clearing across restart

The two issues are related in that they both deal with the
management API not correctly handling proxy flags.

For TS-4399, it was because the management API was not aware
of traffic_manager setting extra proxy options. This was fixed
by providing CoreAPI a callback to get extra proxy options from
traffic_manager.

For TS-4400, it was because the management API was not properly
clearing optional flags between proxy reboots. This was fixed
by resetting the proxy options before each reboot.
@zwoop zwoop added the Manager label Oct 4, 2016
@zwoop zwoop added this to the 7.1.0 milestone Oct 4, 2016
Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it is a good idea to have the management API contain an upward dependency on traffic_manager.

One of the problems here is that the introduction of the --bind_std* options changed the way argument passing works without fully propagating the change. I think that a better approach to explore is making traffic_manager call ProxyStateSet. If we can do that then we only have 1 copy of this code (DRY) and the correct dependencies.

This patch centralizes where traffic_server starts to inside
CoreAPI's ProxyStateSet. This is good because we reduce the
number of places we deal with traffic_server options. Everything
is simply handled by the proxy_options_callback.

Note: Right now, there is about a ~30 delay between ATS starting
up and traffic_ctl commands working. Not sure why.
@danobi
Copy link
Member Author

danobi commented Oct 14, 2016

@jpeach, is this what you were thinking?

Also, between commit 82037 and 3365b, traffic_ctl stopped working immediately after ATS starts up. There is ~30s gap between startup and any traffic_ctl commands working, eg. sudo ./traffic_ctl server stop. I'm still looking into this.

just_started = 0;
sleep_time = 0;
} else {
mgmt_log("in ProxyStateSet else branch");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: remove this line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please :)

Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moving in the right direction, but still adds more complexity than it removes.

Proxy options can come from

  1. traffic_manager -tsArgs (explicit)
  2. proxy.config.proxy_binary_opts (explicit)
  3. traffic_manager --bind_foo (implicit)

The goal is to have a single API ProxyStateSet that does the right thing in exactly 1 place (DRY principle). Using ProxyStateSet from traffic_manager is an improvement. Note that all we really need to persist across restarts are (1) and (3). We already have a LocalManager object that can store the data that we need.

So we can keep LocalManager::proxy_options as the persistent proxy options, and have ProxyStateSet generate any additional options needed for this particular launch. The per-launch options from the API flags and from (2) can be passed directly into LocalManager::startProxy(), which can concatenate them (you just need a Vec<char*> here). This won't require any new global data.

snprintf(lmgmt->proxy_options + l, n, " --%s %s", TM_OPT_BIND_STDERR, bind_stderr);
char *bind_stderr_opt = new char[strlen("--") + strlen(TM_OPT_BIND_STDERR)];
strcpy(bind_stderr_opt, "--");
strcat(bind_stderr_opt, TM_OPT_BIND_STDERR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foo.push_back("--" TM_OPT_BIND_STDERR")

Instead, use proxy_options as persistent proxy options storage.
Then have lmgmt->startProxy() take in an argument for one-time
flags.

This was done to reduce complicated dependencies.
@danobi
Copy link
Member Author

danobi commented Oct 19, 2016

How does that look? Still need to figure out that delay issue.

Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good.

@@ -27,6 +27,7 @@
#include "ts/ink_sock.h"
#include "ts/ink_args.h"
#include "ts/ink_syslog.h"
#include <vector>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

@@ -445,6 +446,8 @@ main(int argc, const char **argv)
int binding_version = 0;
BindingInstance *binding = NULL;

std::vector<char*> callback_proxy_options;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

strcpy(new_proxy_opts, lmgmt->proxy_options);
strcat(new_proxy_opts, bind_stdout_opt);
strcat(new_proxy_opts, bind_stdout);
delete lmgmt->proxy_options;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace all this string code with TextBuffer.

TextBuffer args;

if (*bind_stdout) {
   const char * space = args.empty() ? "" : " ";
   args.format("%s%s", space, "--" TM_OPT_BIND_STDOUT);
}

...

lmgmt->proxy_options = args.release();

ats_free(lmgmt->proxy_options);
lmgmt->proxy_options = tsArgs;
mgmt_log("[main] Traffic Server Args: '%s'\n", lmgmt->proxy_options);
lmgmt->proxy_options = new_proxy_opts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point lmgmt->proxy_options isn't set, so just copy tsArgs into the text buffer if it is present.

strcpy(new_proxy_opts, lmgmt->proxy_options);
strcat(new_proxy_opts, bind_stderr_opt);
strcat(new_proxy_opts, bind_stderr);
delete lmgmt->proxy_options;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not mix new/delete and malloc/free here. But this problem goes away with the text buffer.

@@ -53,6 +54,9 @@
// global variable
CallbackTable *local_event_callbacks;

// Used to get any proxy options that traffic_manager might want to tack on
std::function<void(std::vector<char*>&)> proxy_options_callback;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

@@ -74,7 +74,7 @@ class LocalManager : public BaseManager
void signalAlarm(int alarm_id, const char *desc = NULL, const char *ip = NULL);

void processEventQueue();
bool startProxy();
bool startProxy(char *onetime_options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char *.

*/
bool
LocalManager::startProxy()
LocalManager::startProxy(char *onetime_options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char *

@@ -902,8 +907,11 @@ LocalManager::startProxy()
Vec<char> real_proxy_options;

real_proxy_options.append(proxy_options, strlen(proxy_options));
real_proxy_options.append(" ", strlen(" "));
real_proxy_options.append(onetime_options, strlen(onetime_options));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's allow onetime_options to be NULL, so

if (onetime_options && *onetime_options) {
  ...
}

just_started = 0;
sleep_time = 0;
} else {
mgmt_log("in ProxyStateSet else branch");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please :)

@danobi
Copy link
Member Author

danobi commented Oct 28, 2016

I think the delay issue is either:

  1. The mgmt_sleep_sec(1) loop in ProxyStateSet b/c the main traffic_manager loop needs to run for the lmgmt->proxy_running flag to flip
  2. Some kind of quirk with running lmgmt->listenForProxy()

This change was made because the ProxyStateSet() call was sleeping
in the same thread as traffic_manager was running, so the
condition in the do/while loop could never be updated by traffic_manager.

The solution is to differentiate API calls and local calls to
CoreAPI by using the coreapi_sleep flag.
@danobi
Copy link
Member Author

danobi commented Oct 31, 2016

The actual issue was suspected issue #1, where there was a sleeping deadlock scenario. See commit message for details.

Other than that most recent change, I think the patch is ready for final review.


lmgmt->run_proxy = true;
lmgmt->listenForProxy();
lmgmt->startProxy(tsArgs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually need to do the sleeping stuff here? Since you now call LocalManager:: startProxy() directly, you have the return value to know that it succeeded. I don't think that the contract for this API needs to include waiting for a message.

return lmgmt->startProxy(tsArgs) ? TS_ERR_OKAY : TS_ERR_FAIL;

Copy link
Member Author

@danobi danobi Nov 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good observation, I'm kicking myself for not seeing it before.


if (!strstr(proxy_options, MGMT_OPT)) { // Make sure we're starting the proxy in mgmt mode
// Make sure we're starting the proxy in mgmt mode
if (!strstr(proxy_options, MGMT_OPT) && !strstr(onetime_options, MGMT_OPT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer strstr(...) == 0. It's just that little bit more readable.

@@ -189,6 +189,7 @@ LocalManager::LocalManager(bool proxy_on) : BaseManager(), run_proxy(proxy_on),
proxy_launch_outstanding = false;
mgmt_shutdown_outstanding = MGMT_PENDING_NONE;
proxy_running = 0;
coreapi_sleep = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the other comment, I think we can get away without this.

We no longer need to sleep inside ProxyStateSet because
we're directly calling lmgmt->startProxy instead of setting
the run_proxy flag and waiting for the TM thread to kick off
the lmgmt->startProxy call.
@jpeach
Copy link
Contributor

jpeach commented Nov 3, 2016

[approve ci]

@atsci
Copy link

atsci commented Nov 3, 2016

FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1157/ for details.

@atsci
Copy link

atsci commented Nov 3, 2016

Linux build failed! See https://ci.trafficserver.apache.org/job/Github-Linux/1050/ for details.

@jpeach
Copy link
Contributor

jpeach commented Nov 4, 2016

@danobi Since this covers 2 JIRAs, please designate one of them for the fix, and mark the other as a duplicate.

@danobi
Copy link
Member Author

danobi commented Nov 4, 2016

I couldn't figure out where the dup option was so I changed TS-4400 to a subtask of TS-4399.

Also, can you kick off the CI again? I think I fixed the build.


if (!strstr(proxy_options, MGMT_OPT)) { // Make sure we're starting the proxy in mgmt mode
// Make sure we're starting the proxy in mgmt mode
if (strstr(proxy_options, MGMT_OPT) == 0 && strstr(onetime_options, MGMT_OPT) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just check real_proxy_options here? If onetime_options aren't already in there, they won't be added anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real_proxy_options is the mysterious Vec class. AFAICT, I suppose we could, but it would require more allocation.

@jpeach jpeach closed this in eac31a4 Nov 18, 2016
@zwoop zwoop modified the milestone: 7.1.0 May 4, 2017
@zwoop zwoop unassigned jpeach May 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants