Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update OS stats #12049

Merged
merged 1 commit into from Jul 8, 2015
Merged

Update OS stats #12049

merged 1 commit into from Jul 8, 2015

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jul 6, 2015

This pull request:

  • removes the sigar-specific OS stats:
    • os.uptime_in_millis
    • os.cpu
    • os.cpu.sys
    • os.cpu.user
    • os.cpu.idle
    • os.cpu.usage
    • os.cpu.stolen
    • os.mem.actual_free_in_bytes
    • os.mem.actual_used_in_bytes
    • os.cpu.vendor
    • os.cpu.model
    • os.cpu.mhz
    • os.cpu.total_cores
    • os.cpu.total_sockets
    • os.cpu.cores_per_socket
    • os.cpu.cache_size
  • add default implementation based on JMX beans for other OS stats
  • move some process fields from Nodes Info to Nodes Stats API (os.mem.total, os.swap.total)
  • add documentation for OS stats & OS info

Besides os.name we could also provide os.arch and os.version using org.apache.lucene.util.Constants.

@tlrx
Copy link
Member Author

tlrx commented Jul 7, 2015

Note: similar to what has been done in #12043, we can also use OsProbe as a singleton here and call it in Bootstrap class before security manager is installed.

// not available

}
getFreePhysicalMemorySize = method;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a helper method that gets the method and makes it accessible? and then have getFreePhysicalMemorySize = getMethod("getFreePhysicalMemorySize "). This will simplify the code, and also fix the bug here where if we fail one of the following code fails, we will assign the same method to a different method handler

@kimchy
Copy link
Member

kimchy commented Jul 7, 2015

left a few comments, I would also change to only have "getters" in our Info/Stats classes (we are moving towards that). Also, great idea on having arch and such in os info

@tlrx tlrx force-pushed the update-os-stats branch 2 times, most recently from d2dd014 to b3e3949 Compare July 8, 2015 14:46
@tlrx
Copy link
Member Author

tlrx commented Jul 8, 2015

@kimchy I updated the code with your comments, including os.arch/os.version and comments from #12043 (catch throwable, benchmark class, updated doc). Can you have a look please?

Benchmark results (100 000 iterations for each method):

testing 'getTotalPhysicalMemorySize' method...
total [1290] ms, avg [0.0129] ms

testing 'getFreePhysicalMemorySize' method...
total [1269] ms, avg [0.01269] ms

testing 'getTotalSwapSpaceSize' method...
total [20] ms, avg [2.0E-4] ms

testing 'getFreeSwapSpaceSize' method...
total [20] ms, avg [2.0E-4] ms

testing 'getSystemLoadAverage' method...
total [256] ms, avg [0.00256] ms

@kimchy
Copy link
Member

kimchy commented Jul 8, 2015

LGTM.

I think @rcmuir mentioned it on the other thread, but it would be nice (maybe in another change) to have the tests verify we get non -1 values for platform we know we support it, and -1 for platforms we know is not supported. This will help know where we need to potentially go the extra mile and see how we can support it for certain platform.

@tlrx tlrx merged commit 19e348a into elastic:master Jul 8, 2015
@kevinkluge kevinkluge removed the review label Jul 8, 2015
@tlrx
Copy link
Member Author

tlrx commented Jul 8, 2015

@kimchy thanks!

I think @rcmuir mentioned it on the other thread, but it would be nice (maybe in another change) to have the tests verify we get non -1 values for platform we know we support it, and -1 for platforms we know is not supported.

I agree, this is a very pertinent suggestion. I'll do it in another pull request.

@tlrx tlrx mentioned this pull request Jul 8, 2015
@tlrx tlrx deleted the update-os-stats branch August 5, 2015 09:04
@NickCraver
Copy link

The load_average portion of this change introduces a fair bit of pain in monitoring. While many things moved from the 1.x API, that's not a pain point - that's a "if X is null, use Y" solution. This is the only hard breaking type change. The load_average going from an array to a single value is much more painful in most static languages; since the same type can't be used, one must duplicate quite a bit of code or get crazy with generics to address something this nested in the API. It'd be much easier if this was still an array with a single value.

The change also presents problems for the future, if we can actually get all 3 load values down the road then the API has to be changed back and cause yet another break. Was this considered with changes like this?

It's a bit late to actually fix it now, but I'd like for these breaks to be considered a bit harder with future changes. With the monitoring APIs specifically most consumers need to support multiple major versions at the same time.

jasontedor added a commit that referenced this pull request Jan 12, 2016
Reintroduce five-minute and fifteen-minute load averages on Linux

Relates #12049, relates #14741
jasontedor added a commit that referenced this pull request Jan 28, 2016
@javanna javanna mentioned this pull request Mar 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants