Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #3203 usage server broken in 4.11/4.12 #3207

Closed

Conversation

Projects
None yet
7 participants
@GabrielBrascher
Copy link
Member

GabrielBrascher commented Mar 11, 2019

Fixes issue #3203.

Due to the migration to systemd, the configuration -Dpid=$$ does not return the pid value but the string '$$'; thus, the _pid variable received '$$', causing a silenced exception.

This PR adds the file /etc/cloudstack/usage/cloudstack-usage.service.pid that has the process pid. The file is configured after the ExecStart, by running the systemctl show command configured as:

ExecStartPost=/bin/sh -c "systemctl show -p MainPID cloudstack-usage.service 2>/dev/null | cut -d= -f2 > /etc/cloudstack/usage/cloudstack-usage.service.pid"

Description

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

CloudStack usage service run as expected after updating with the changes from this PR.

the java application for the cloudstack-usage.service was retrieving
'$$'; thus, the _pid variable received '$$', causing an exception.
@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

@blueorangutan package

@blueorangutan

This comment has been minimized.

Copy link

blueorangutan commented Mar 11, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@GabrielBrascher GabrielBrascher changed the title Fix #3203 usage server broken in 4.12.0.0 Fix #3203 usage server broken in 4.11 / 4.12 Mar 11, 2019

@GabrielBrascher GabrielBrascher changed the title Fix #3203 usage server broken in 4.11 / 4.12 Fix #3203 usage server broken in 4.11/4.12 Mar 11, 2019

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

@blueorangutan package

@blueorangutan

This comment has been minimized.

Copy link

blueorangutan commented Mar 11, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@konstantinjfk

This comment has been minimized.

Copy link

konstantinjfk commented Mar 11, 2019

Would you think to put pid file to the /run folder? like /run/cloudstack-usage.service.pid

@wido

This comment has been minimized.

Copy link
Contributor

wido commented Mar 11, 2019

I would say that /run or /var/run would be better as well.

But why does the usage server actually needs it's PID?

@andrijapanicsb

This comment has been minimized.

Copy link

andrijapanicsb commented Mar 11, 2019

I would also ask (sorry if silly question) - why any difference to 4.11.2 at all (did the code around systemd changed) ? - i.e. I observe no pid with 4.11.2, as far as I can see.

But definitively (if PID required) would put it to default location as /var/run/*

Happy to test, just want to avoid doing it 2 times... ?

@ustcweizhou

This comment has been minimized.

Copy link
Contributor

ustcweizhou commented Mar 11, 2019

I agree with @wido @konstantinjfk that it would be better to save pid file as "/var/run/cloudstack-usage.service.pid"

I have applied @GabrielBrascher 's patch in cloudstack 4.11.2, changed pid file to "/var/run/cloudstack-usage.service.pid" and tested it. all seems fine.

so this pull request is good for me, except the pid file location.

thanks @GabrielBrascher for your work.

Change pid file to /var/run/cloudstack-usage.service.pid and throw
exception in case of null or blank pid (NumberUtils.toInt returns zero
if the conversion fails).
@ustcweizhou

This comment has been minimized.

Copy link
Contributor

ustcweizhou commented Mar 11, 2019

code looks good to me.

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

Thanks for the feedback @wido @andrijapanic @konstantinjfk @ustcweizhou!
I have changed the pid file to /var/run/cloudstack-usage.service.pid.

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

@wido, as far as I know, the implementation used the pid to have a unique id regarding the cloudstack-usage.service (in case of multiple cloudstack-usage nodes running). To change this approach some refactoring is needed.

Regarding the codebase (prior to this PR), the exception was happening at line 274 from UsageManagerImpl.java.
_pid = Integer.parseInt(System.getProperty("pid")); and silenced at the UsageServer as it would print the stack only in case of running the service from the command line and not a systemd service (which was not the case for any of us). Explaining why the usage.log did not print details and/or an exception.

try {
    ComponentContext.initComponentsLifeCycle();
} catch (Exception e) {
    e.printStackTrace();
}

If the _pid is not a valid pid for the service (for instance zero) the runInContextInternal execution flow is ignored.

UsageJobVO job = _usageJobDao.isOwner(_hostname, _pid);

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

@andrijapanic the _pid is there at 4.11.2. However, I am not sure if the service was alrady migrate to systemd at 4.11.2 or it was migrated in the 4.11 branch after 4.11.2 was released.
https://github.com/apache/cloudstack/blob/4.11.2.0/usage/src/com/cloud/usage/UsageManagerImpl.java#L353

@andrijapanicsb

This comment has been minimized.

Copy link

andrijapanicsb commented Mar 11, 2019

@GabrielBrascher thx a lot for the fixes ! I'll wait for commit and kick packaging once more.

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

Fixed! Thanks for the help @andrijapanicsb!

@GabrielBrascher GabrielBrascher force-pushed the PCextreme:usage-aggregation-issue branch from 3510470 to 4c948e0 Mar 11, 2019

@wido

wido approved these changes Mar 11, 2019

Copy link
Contributor

wido left a comment

LGTM

@GabrielBrascher GabrielBrascher force-pushed the PCextreme:usage-aggregation-issue branch from 4d7ad0f to d9d3da9 Mar 11, 2019

@ustcweizhou

This comment has been minimized.

Copy link
Contributor

ustcweizhou commented Mar 11, 2019

@GabrielBrascher could you squash the commits to one commit ?

@GabrielBrascher GabrielBrascher force-pushed the PCextreme:usage-aggregation-issue branch from d9d3da9 to a42bc9f Mar 11, 2019

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

@ustcweizhou sure, I will Squash when merging (with the Squash and merge).

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 11, 2019

@blueorangutan package

@blueorangutan

This comment has been minimized.

Copy link

blueorangutan commented Mar 11, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

This comment has been minimized.

Copy link

blueorangutan commented Mar 11, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2624

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 12, 2019

@blueorangutan

This comment has been minimized.

Copy link

blueorangutan commented Mar 12, 2019

@GabrielBrascher a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@@ -15,8 +15,18 @@
# specific language governing permissions and limitations
# under the License.

JAVA_OPTS="-Dpid=$$ -Xms256m -Xmx2048m"

This comment has been minimized.

Copy link
@rhtyd

rhtyd Mar 12, 2019

Member

@GabrielBrascher What's the issue, this should work?

This comment has been minimized.

Copy link
@rhtyd

rhtyd Mar 12, 2019

Member

Nevermind, re-read the description. Alright keep this change, I'll advise you a simpler fix.

This comment has been minimized.

Copy link
@rhtyd

rhtyd Mar 12, 2019

Member

Btw, this without quotes would work

@@ -24,7 +24,9 @@ After=network.target network-online.target
[Service]
Type=simple
EnvironmentFile=/etc/default/cloudstack-usage
ExecStart=/usr/bin/java $JAVA_OPTS -cp $CLASSPATH $JAVA_CLASS

This comment has been minimized.

Copy link
@rhtyd

rhtyd Mar 12, 2019

Member

Put this and this would fix the pid issue:

Environment=JAVA_PID=$$
ExecStart=/usr/bin/java $JAVA_DEBUG -Dpid=$JAVA_PID $JAVA_OPTS -cp $CLASSPATH $JAVA_CLASS

This comment has been minimized.

Copy link
@rhtyd

rhtyd Mar 12, 2019

Member

Comment: this would need to be fixed for 4.11 as well, let me send a quick fix instead.

@rhtyd
Copy link
Member

rhtyd left a comment

Too complicated, I'll submit a simpler change.

@rhtyd

This comment has been minimized.

Copy link
Member

rhtyd commented Mar 12, 2019

Please review/test - #3210

@blueorangutan

This comment has been minimized.

Copy link

blueorangutan commented Mar 12, 2019

Trillian test result (tid-3416)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28344 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3207-t3416-kvm-centos7.zip
Smoke tests completed. 70 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
@andrijapanicsb

This comment has been minimized.

Copy link

andrijapanicsb commented Mar 12, 2019

I did not review this one (test it), but already had env for #3210 and confirm it works fine - usage jobs does kick in as it should.

@GabrielBrascher

This comment has been minimized.

Copy link
Member Author

GabrielBrascher commented Mar 12, 2019

Closing this in favor of #3210

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.