Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipmi plugin: add more analog sensors support #2071

Merged
merged 2 commits into from Oct 20, 2017

Conversation

vmytnykx
Copy link
Contributor

@vmytnykx vmytnykx commented Dec 1, 2016

  • Add more analog sensors like 'PS1 Input Power', 'System Airflow' etc. The sensor type mapping => collectd db type done using sensor base unit type (+ percentage sensor type).
  • Avoid reading value of discrete sensors by ipmi_sensor_id_get_reading() function.

Example of analog sensors:

# ipmitool sensor get 'PS1 Input Power'
Locating sensor record...
Sensor ID              : PS1 Input Power (0x54)
 Entity ID             : 10.1
 Sensor Type (Threshold)  : Other
 Sensor Reading        : 0 (+/- 0) Watts
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : na
 Lower Non-Critical    : na
 Upper Non-Critical    : 868.000
 Upper Critical        : 920.000
 Upper Non-Recoverable : na
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      :
 Assertions Enabled    : unc+ ucr+
 Deassertions Enabled  : unc+ ucr+

# ipmitool sensor get 'System Airflow'
Locating sensor record...
Sensor ID              : System Airflow (0x11)
 Entity ID             : 23.1
 Sensor Type (Threshold)  : Other
 Sensor Reading        : 54 (+/- 0) CFM
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : na
 Lower Non-Critical    : na
 Upper Non-Critical    : na
 Upper Critical        : na
 Upper Non-Recoverable : na
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      :
 Assertions Enabled    :

Example of discrete sensors which are added to internal ipmi list on init but falied to read using ipmi_sensor_id_get_reading() func:

# ipmitool sensor get 'Fan 1 Present'
Locating sensor record...
Sensor ID              : Fan 1 Present (0x40)
 Entity ID             : 29.1
 Sensor Type (Discrete): Fan
 States Asserted       : Availability State
                         [Device Present]

# ipmitool sensor get 'P1 VRD Hot'
Locating sensor record...
Sensor ID              : P1 VRD Hot (0x90)
 Entity ID             : 3.1
 Sensor Type (Discrete): Temperature

Regards,
Volodymyr

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Dec 5, 2016

#2024 and rpv-tomsk@ba30d9b

@maryamtahhan
Copy link
Contributor

@rpv-tomsk is there any feedback on this patchset? I don't quite understand the comment

@rpv-tomsk
Copy link
Contributor

I don't quite understand the comment

I'm about we are doing the same work ) and also I want to point you to my work/patch, just you to be informed about its existence.

My feedback/opinion about your changes:

  /* check if this is a percentage */
  if (ipmi_sensor_get_percentage(sensor))
    return "percent";

In IPMI, percentage is a flag, not a unit itself. I don't like idea to mix different units into one "percent" type. It will be hard to group different metrics of percent type on hubs / at the side of displaying the graphs.

Maybe it will be better to add '_percent' suffix to type in that case?

What do you think about this?

@vmytnykx
Copy link
Contributor Author

Hi @rpv-tomsk,

Thank you for your feedbacks.

I agree that mixing different units into one "percent" type is not a best idea. But, the sensors with "percenage" flag set have "unspecified" unit type. So, in that case I suggest to use sensor type + _percentage suffix. E.g.: memory_percentage, temperature_percentage etc.

Regards,
Volodymyr

@vmytnykx
Copy link
Contributor Author

Hi @rpv-tomsk,

Actually, we cannot use my suggestion above, as ipmi plugin may select sensors (using precent flag) that may be unsupported by collectd type db. See example of such sensor below:

# ipmitool sensor get 'MTT CPU2'
Locating sensor record...
Sensor ID              : MTT CPU2 (0x35)
 Entity ID             : 3.2
 Sensor Type (Threshold)  : Memory
 Sensor Reading        : 0 (+/- 0) percent
 Status                : ok
 Lower Non-Recoverable : 0.000
 Lower Critical        : 0.000
 Lower Non-Critical    : 0.000
 Upper Non-Critical    : 0.000
 Upper Critical        : 0.000
 Upper Non-Recoverable : 0.000
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      :
 Assertions Enabled    :

Sensor has "memory" type and provide only "percent" flag. Unit type of this sensor is "unspecified". For this reason, we cannot build and use memory_percent type as it isn't supported by collectd type db. So, you are correct, we should only use type + _percent suffix in that case.

Also, I think it isn't correct to select sensors by just presense of 'percent' falg. Thus, I'm going to make appropriate changes the PR to fix all described issues.

Thanks and Regards,
Volodymyr

@vmytnykx
Copy link
Contributor Author

Hi @rpv-tomsk,

I've removed logic that selects sensors by presence of of 'percent' falg. Right now, when the sensor provides percentage flag, the type of the sensor sets to type + _percent as we discussed above. Also, the support of memory sensor type has been added in usual way (due to the last changes).
Thanks for your feedback.

Regards,
Volodymyr

@maryamtahhan maryamtahhan mentioned this pull request Sep 27, 2017
19 tasks
@octo octo added this to the 5.8 milestone Oct 6, 2017
@octo octo added the Feature label Oct 6, 2017
Copy link
Member

@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vmytnykx,

thank you very much for your PR! Overall this looks good with some comments inline :)

Best regards,
—octo

src/ipmi.c Outdated
@@ -200,6 +211,19 @@ static void sensor_read_handler(ipmi_sensor_t *sensor, int err,
plugin_dispatch_values(&vl);
} /* void sensor_read_handler */

static const char *sensor_get_db_type(ipmi_sensor_t *sensor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function is not very descriptive. How about sensor_unit_to_type?

Copy link
Contributor

@rpv-tomsk rpv-tomsk Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I remember, use of only one ipmi_sensor_get_base_unit() is not enough for a such task.
The sensor, besides the base unit, also has a 'modifier' and 'rate' values (ipmi_sensor_get_modifier_unit() and ipmi_sensor_get_rate_unit() respectively). May be these values are rare used, but I think we need to check they anyway - otherwise incorrect result can be obtained.

src/ipmi.c Outdated
/* if sensor provides the percentage value, add "_percent" suffix to the
* sensor collectd type */
if (ipmi_sensor_get_percentage(sensor)) {
ssnprintf(percent_type, sizeof(percent_type), "%s_percent", type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the existing percent type instead of introducing more specialized foo_percent types. The original type should go into the type instance. For example, "power-ac0" (type: "power", type instance: "ac0") would become "percent-power-ac0" (type: "percent", type instance: "power-ac0").

It's best to think of collectd's types as a "unit label", if that makes sense.

Copy link
Contributor

@rpv-tomsk rpv-tomsk Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike such a mixing of different types into one 'percent' type.
IMHO, personally I would not use such implementation in my systems. As for me, direct type without 'percent' suffix will be better than 'percent' type.

src/ipmi.c Outdated
/* if sensor provides the percentage value, add "_percent" suffix to the
* sensor collectd type */
if (ipmi_sensor_get_percentage(sensor)) {
ssnprintf(percent_type, sizeof(percent_type), "%s_percent", type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use snprintf(). The ssnprintf() wrapper has been removed from master.

@octo
Copy link
Member

octo commented Oct 8, 2017

P.S.: Don't worry about the merge conflicts, we'll take care of them after the review concluded.

@rpv-tomsk
Copy link
Contributor

P.S.: Don't worry about the merge conflicts, we'll take care of them after the review concluded.

This PR and mine #2024 do the same thing by a different ways.
This implementation adds a 'is discrete' flag into 'sensors' list, mine prohibits adding discrete sensors into that list.
This PR request logic: that may be used at future implementations.
Mine logic: it makes no sensce of a such adding - these sensors are not used later in current implementation.

So, one of implementation completely contradicts the other.

src/ipmi.c Outdated
@@ -275,10 +306,17 @@ static int sensor_list_add(ipmi_sensor_t *sensor) {
type = "fanspeed";
break;

default: {
const char *sensor_type_str;
case IPMI_SENSOR_TYPE_MEMORY:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please show 'ipmitool' output for such a sensor?
For example, by running sensor get SENSOR_NAME ?

Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rpv-tomsk, see me comment above for this example #2071 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Sorry, I have missed that comment.
What metric that sensor represents? What is 'MTT CPU'?
What the value of this sensor, for example, 50% , would mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In nutshell, this sensor represents "Memory Thermal Throttling" and is related to memory thermal management system. Based on the DIMM thermal conditions it may restrict read and write traffic/bandwidth to main memory as a means of controlling power consumption. This metric is measured as a percentage and 0% means no memory throttling occurs. When thermal conditions are going high, the memory management system enables throttling and restricts the read or write traffic (e.g. 50%).

@maryamtahhan
Copy link
Contributor

@rpv-tomsk

So, one of implementation completely contradicts the other.

I think the end goal is the same here... discrete sensors don't get read... so from my perspective it's fine to go with your suggestion

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Oct 9, 2017

@maryamtahhan

Let's discuss what to do with 'percent' flag?

Also I want to notice one line from types.db:

percent                 value:GAUGE:0:100.1

I have some doubt that IPMI has similar limitation.
I think there can be cases when sensor value will be greater than 100.1.

@rpv-tomsk
Copy link
Contributor

We have a sensor types and we have units of measurements.

We have a 'temperature' type with 'degrees C' unit.
We have a 'fanspeed' type with 'RPM' unit.

Some IPMI sensor types are mapped to Collectd types directly.

As for me, 'percent' is a unit, not a type.
Happily, I have no much sensors with percents: The only one is Intel platform which reports consumed power in sensor of IPMI_SENSOR_TYPE_CURRENT type and value is a percentage.
Of course, nobody measures current in percents and that sensor value means power supply load.
If that will be reported with a 'percent' type that will be not so good as it might be.

Let's solve this task starting from a practical requirements?

@maryamtahhan
Copy link
Contributor

maryamtahhan commented Oct 9, 2017

@rpv-tomsk There are thermal and other sensors on Intel platforms that report values in %...

My understanding of types in collectd is that type can be thought of the unit used to measure a value (but I could be wrong) ...
If we look at some other plugins for precedent: hupepages, virt and the cpu plugins report % as a type for a number of different measurements...
I would be for keeping the generic type as % as @octo suggests, but then either:

  1. using the sensor type as an extension (prefix/suffix to the type instance)...
  2. use metadata for sensor type/sensor name leaving one of these as the type instance...

I would prefer option 1.

As an aside: MTT sensor is the Memory Throttling sensor on Intel platforms... which would be interesting to track from a platform performance perspective to see if an attempt to cool DRAM (by reducing the memory traffic allowed on a bus) is having an impact on the workload running on your platform.

@octo
Copy link
Member

octo commented Oct 9, 2017

  1. using the sensor type as an extension (prefix/suffix to the type instance)...

+1, that's exactly what I wanted to suggest.

@maryamtahhan, @rpv-tomsk How exactly does this PR relate to #2024? Should we close this in favor of #2024 or do they only overlap in the bit that deals with discrete sensors?

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Oct 9, 2017

This PR intended to add new mappings of IPMI units to Collectd types.

I tried to solve this task in rpv-tomsk@ba30d9b

@rpv-tomsk
Copy link
Contributor

I would prefer option 1.

What is about a such solution: map sensors with a 'percent' flag to 'percent' type, but allow to configure custom mapping by a sensor name (may be implemented later)?

@vmytnykx
Copy link
Contributor Author

vmytnykx commented Oct 9, 2017

Just want to summarize the remaining work on the changes:

Is my understanding correct?

@rpv-tomsk
Copy link
Contributor

Volodymyr, I think that your summary is correct.

Does anybody will comment mine proposal of configurable mapping? Any opinions?

@vmytnykx
Copy link
Contributor Author

vmytnykx commented Oct 9, 2017

@rpv-tomsk, could you please clarify your proposal. Do you mean to add new bool config option into collectd conf which will enable adding prefix or not? Or you mean new option which will do the whole percent mapping or not depends on the flag?

@octo
Copy link
Member

octo commented Oct 9, 2017

@rpv-tomsk Yes, I'll review #2024

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Oct 9, 2017

I propose to discuss adding of something similar to:

<Plugin ipmi>
  <Sensor "CPU2 Vcore system_board (7.2)">
      Type "voltage"
      TypeInstance "cpu2"
   </Sensor>
    <Instance>
       ....
    </Instance>
</Plugin>

That will allow to override all hardcoded mappings (which is done by sensor type or by unit type) in favour of sensor name (the value which is currently reported as a 'type instance').

@maryamtahhan
Copy link
Contributor

maryamtahhan commented Oct 10, 2017

@rpv-tomsk as regards your last proposal, I think you can do this with filtering chains in collectd... so even if you are not happy with the hardcoding you can change it before the metric gets published or even before it hits the cache... at least for the type instance...

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Oct 10, 2017

Filtering chains... Some while ago there was a same suggestion about use of filtering chains for setting plugin name / plugin instance in 'generic plugins'.. I prefer to configure all things in one place and in a most simplest way (by a form). Chains are closer to a hack for me, not to a solution.

Replace at filtering chains is not optimal, it is much more optimal to make one match at sensor add, than do matching for each reported metric. Look for this like for presence of IgnoreSelected option - that task also can be solved by a filtering chains (filtering)...

Chains are powerful, of course, but....
That is mine IMHO only : -)

@octo
Copy link
Member

octo commented Oct 11, 2017

How do we proceed with this? #2024 has been merged and this PR needs a rebase, ideally pulling in the existing handling of discrete sensors. Then do a final review?

@vmytnykx
Copy link
Contributor Author

@octo, working on review comments changes now. Then I can re-base these changes based on the latest master. Do you prefer to do final review before re-base or after?

@octo
Copy link
Member

octo commented Oct 11, 2017

Given the complexity of the rebase, I think a review after rebase makes more sense in this case.

- Add support of new analog sensors:
  System Airflow, PS1 Input Power, MTT CPU1, MTT CPU2
- Extend list of sensors to support "memory" sensor type.
- Map sensors with a 'percent' flag to 'percent' type
  and add sensor type as a prefix to type_instance in
  this case. The type_instance of non % sensors will
  remain the same (just sensor name).

Signed-off-by: Mytnyk, Volodymyr <volodymyrx.mytnyk@intel.com>
@vmytnykx
Copy link
Contributor Author

Hi @octo, @rpv-tomsk,

I've addressed the review comments and changed the implementation as discussed in #2071 (comment)

The changes are re-based and squashed into one commit.

Thank you all for review :)

Regards,
Volodymyr

@rpv-tomsk
Copy link
Contributor

Hi, Volodymyr!

Thanks for update and your work on this.
What will you say about to make this message (1f88406#diff-88ee790203c729461340e7b84aa31bb5R451) more informative? The Collectd now tries to handle sensors `units of measurements', but that message does not show us these values. That will not give us enough data for new types mapping.

Mine proposal is to add units values to the message output, like was done here: rpv-tomsk@ba30d9b#diff-88ee790203c729461340e7b84aa31bb5R439

@vmytnykx
Copy link
Contributor Author

Hi @rpv-tomsk, the idea sounds good to me. I will fix it and update the PR.

Regarding the build failure, seems a Jenkins issue:

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... configure: error: newly created file is older than distributed files!
Check your system clock
+ make -sk check
make: *** No rule to make target 'check'.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Skipping publisher since build result is FAILURE

Thanks and Regards,
Volodymyr

Change-Id: I5fc6056b2d0c5bb5cb1b28c3d24aaca163eb245b
Signed-off-by: Mytnyk, Volodymyr <volodymyrx.mytnyk@intel.com>
@vmytnykx
Copy link
Contributor Author

vmytnykx commented Oct 17, 2017

Hi @rpv-tomsk, I've addressed the comments and updated the PR.

Thanks for your review!

Regards,
Volodymyr

@vmytnykx vmytnykx closed this Oct 17, 2017
@vmytnykx vmytnykx reopened this Oct 17, 2017
@rpv-tomsk rpv-tomsk merged commit 7e86094 into collectd:master Oct 20, 2017
@rpv-tomsk
Copy link
Contributor

Hi Volodymyr!

Thanks for your work on this!

@maryamtahhan
Copy link
Contributor

Whoop whoop! Great work all, thanks for the perseverance

@maryamtahhan maryamtahhan deleted the feat_ipmi_analog branch October 20, 2017 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants