New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specification: Adding network details in hardware inventory #19
Conversation
@shtripat @r0h4n @shtripat @nnDarshan please review it |
1125d33
to
dc133ed
Compare
@r0h4n @shtripat @nnDarshan please review |
* Modify persister.py to persist the Network object in etcd. | ||
|
||
== Alternatives | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can mention about the few other network-utilities that you referred, also provide info about why you choose this over that.
* Create a new class called Network in persistence with respective attribute. | ||
* Modify manager.py to populate network details from node_inventory list to | ||
Network object. | ||
* Modify persister.py to persist the Network object in etcd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a bit implementation specific, can you have more like an overview of what you are going to change ?
* tendrl/node_agent/persistence/persister.py | ||
* requirements.txt | ||
* tendrl-node-agent.spec | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write a couple of sentences about the impact, instead of mentioning the file names ?
|
||
None | ||
|
||
== Notifications/Monitoring impact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might have to monitor the status of the network interface, and alert when interfaces goes down of comes up. May be in future when alerting framework is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. Whenever a state change is detected, the same needs to go to etcd's /alerts directory and correspondingly a custom handler needs to be written to handle the event.
Ideally, the procedure is anyone who detects an alert writes it to a socket that is opened by node-agent and the node-agent writes this alert to etcd after the corresponding handler has massaged it.
But since this alert is detected in node-agent itself, the node-agent can directly invoke an event handling gateway function(which will anyway be invoked for the messages in socket in usual path) which will take care of the rest of the things. Offcourse a custom handler to handle this event in a custom manner if required also needs to be written.
I would suggest to refer section 'Data Model Impact' of https://github.com/Tendrl/specifications/pull/18/files for the details about alert structure(Its still in review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmolbabu @nnDarshan does this needs to covered as part of this? I think it come under the scope of another spec which talks about the alerting/monitoring of disks. Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alerting is just a part of the story. What about refreshing our inventory when the hardware changes, in general? Alerting is a side effect of these changes. The first impact must always be on our state store itself. Theoretically it's possible to set monitors on the changes in the state store and generate alerts there, but that's a different issue altogether. I'll review the alerting specification and see if it answers my question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the alerting spec. It does not answer my question. How are we tracking changes to the hardware inventory? We may need a short and a long term solution to this problem. @anmolbabu @nthomas-redhat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brainfunked Two ways you can handle this:
- Process the external events send by the agents(Console-2 was depending on the events from storaged)
- During the inventory sync, infer the change and emit alerts based on that
Utilisation data(any;network, disk etc.) collection and analysis(and generating alerts) is the responsibility of the monitoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In agreement with @nthomas-redhat
|
||
== Assignee(s) | ||
|
||
Primary assignee: TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention your git hub name here, as you have already sent a PR for this
None | ||
|
||
== Data model impact | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can mention about the path where this details is stored in etcd ?
|
||
* Adding network details in node hardware inventory | ||
|
||
== Impacted modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have missed api impact section, there you can mention that api for listing network interface to be added by api-developer.
0d45fe6
to
b7068d9
Compare
@nnDarshan @shtripat @r0h4n @Tendrl/qe please review it |
|
||
Hardware inventory details are used to show exact status of the particular | ||
node. Currently network details are missing in node hardware inventory. | ||
we have to add network details like IPv4, IPv6, netmask as a part of node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we dealing with ipv6 now?
|
||
=== Tendrl API impact: | ||
|
||
Api to get network details v2/keys/nodes/node_id/Network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to have an API to get the network of the node?. I don't think so. Rather this should be part of Node and retrieved through GetNodeList api
|
||
Modify persistence and manager modules to find and store active network | ||
interface details. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the definitions? Do we have an object defined for network?
Also any new atoms needs to be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats correct. We would need an object defined for sure.
ef3efa8
to
5647655
Compare
@nthomas-redhat ipv6 is not required but i feel it is better to keep those details for future use |
@nthomas-redhat @nnDarshan please review it |
@GowthamShanmugam , I am not seeing the changes I have requested. Also looks like the latest version posted here is not right? |
|
||
== Alternatives | ||
|
||
* Netifaces package is used to find active network interface in particular |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is talking about what mechanism you have used. Rather @nnDarshan meant you to add various tools which were evaluated and could be used for getting the details I feel.
|
||
== Data model impact | ||
|
||
Add network details in etcd with key nodes/node_id/Network/interface_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also definitions should get an object called Network I feel
|
||
== Notifications/Monitoring impact | ||
|
||
We might have to monitor the status of the network interface(up/down). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmolbabu do we need more detail to be captured here??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think things like significance of this alert I assume it would be 'HIGH' can be captured + a sample alert can be populated here by referring to section 'Data Model Impact' of https://github.com/Tendrl/specifications/pull/18/files for the details about alert structure(Its still in review). Apart from that, from the alerting perspective I don't see any more requirement.
But, I have 1 question here, does this need to be graphed (binary status graph with up, down values) for user to see how the interface has been behaving. If yes, we need to maintain this as well in graphite and in such a case we need to see how node-agent can push a statistic to time-series db. Probably the performance-monitoring application needs to expose apis to push stats as well.
An alternative in my opinion is that if this can be implemented as a collectd plugin emitting non-zero higher value for down status and a zero for up(only to benefit from collectd's thresholding plugin that can emit an alert for values greater than configured threshold in our case say 0 or some value greater than 0) we can benefit from the normal statistics data and alert flows. Anyway since the status to int mapping is very much hard-coded within applications in tendrl space, the corresponding interpretations can be done wherever required.
Just that I am somehow not happy with performance-monitoring exposing a way to push any stat to graphite.
@shtripat what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would certainly prefer the option-2 of having a collectd plugin to push data to graphite (if we need, for sure I prefer this to be there in graphite. The only thing we need to see if real estate available for the same in dashboard)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmolbabu @shtripat , As I commented earlier, this should be handled as part of monitoring, need not come under scope of this specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nthomas-redhat But I feel if we are doing it from collectd then there is no need to do it from node-agent.
I feel,
- Node-agent can do an initial 1 time sync
- Thereafter, the alerting framework's disk alerts specific handler can set suitable states in etcd as well apart from the normal notification and other stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Node agent owns these objects under its namespace, and expects them as defined under etcd.
If any other Tendrl component or external module (collectd) wants to own the object, it has to implement it as per Tendrl Definitions, meaning: it should look the same in etcd as it was defined under the node-agent namepsace.
So, keep the objects under node-agent and let node-agent manage its state in etcd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmolbabu Speaking of monitoring, we should track the network utilisation data per interface. It'll be useful for tracking down bottlenecks later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I already have a plugin to calculate network utilization in one of my PRs. And it currently calculates both interface level and average across interfaces and puts both to time-series db.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@nthomas-redhat i will remove IPV6 details from this, and i will add about modification in tendrl_definition |
@nthomas-redhat atoms are not required for this |
5647655
to
c900acb
Compare
@nthomas-redhat @shtripat @nnDarshan all changes are done.. please review it |
|
||
== Proposed change | ||
|
||
* From node_agent add a new function in pull_hardware_inventory to find active |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention about the tool that you will be using to get network details
3017b81
to
c3f65bd
Compare
@nnDarshan done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good to me. @anmolbabu Could you please comment on the monitoring impact
@shtripat @nthomas-redhat @r0h4n @brainfunked review please |
5db0fa0
to
4913b8f
Compare
@nthomas-redhat @brainfunked @shtripat please review |
@nthomas-redhat @brainfunked @r0h4n please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as as long as ipv6 related things are finalized...
yes @brainfunked told me to collect all the possible details so i added ipv6 also and some other extra details also. |
@nthomas-redhat @brainfunked please review this |
|
||
== Alternatives | ||
|
||
* We can also use ethtool for getting network details. But ethool is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be aware, that ethtool provide very limited information if launched on virtual machine for virtio
type of NIC. I even think, that some information are not available at all for virtio
type of NIC (for example link Speed
).
I know, that this didn't looks like big problem for production environment (where probably all storage nodes will be physical machines), but it might be problem for testing, examination and demo purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dahorak ya i accept your point, it gives very limited details only, in initial stage we excepted ipv4 , ipv6 and subnet mask details only, at that time we just tried this tools.
``` | ||
* Add new object in tendrl defenition file with list. | ||
``` | ||
list: nodes/$Node_context.node.id/Network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the exact structure of the data in etcd? I think that there needs to be a global networks
hierarchy as well, where we maintain a list of networks, under which we point to each node that belongs to this network. This is in addition to the network details maintained under each node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brainfunked
Inside nodes/{node_id}/Network/{interface_name}
value : {"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}
this is the structure for network details, This network details is different for the particular node. we cant give node belongs to this network. This network ipv4, ipv6 address will differ for another nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the IP being under the interface name will scale. You'd want to allow clients to directly filter nodes based on IPs and hostnames. You'd also need to enable an index of hostnames to node ids and hostnames to IP addresses. These need to be maintained based on changes in the IP addresses on individual nodes. Essentially, I'm looking at being able to obtain network details without relying on scanning all of the nodes.
Also, my comment is specifically towards https://redhat.invisionapp.com/share/2K8M4PQYZ#/screens/188984130. Here, we need to list the networks detected, and not the IP addresses per node. This is essentially a calculation based on the netmask for each of the detected IPs. It could be done on the fly by the API, or be stored permanently in etcd, should this information need to be known elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- global structure is:
suggestion :1
/networks/{subnet1}/{node_id1}
Value: [
{"node_id": "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
},
.....]
/networks/{subnet1}/{node_id2}
Value: [
{"node_id" : "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
},
.....]
Suggestion : 2
(read from etcd using network like /networks/{subnet1} and then append json in list that in list. Then dump in etcd)
/networks/{subnet1}
value: [
{"node_id" : "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}, ...
]
/networks/{subnet2}
value: [ {"node_id" : "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}, ...]
In second approach we can find no.of interface like len(network1), list length is the number of interface.
- node wise structure is
/nodes/{node_id}/Network/{interface_name}
value: [
{"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}
]
Is that global structure is ok?
whether i have to keep node wise and global wise structure or global wise structure is enaf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally feel node level network info is required. Additionally you can have system level sub-net details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GowthamShanmugam Suggestion 2 looks good, also keep both /nodes and global /networks data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r0h4n ok done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel Suggestion 2 is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @r0h4n.
930f714
to
7edb65d
Compare
@r0h4n @brainfunked @shtripat @nnDarshan please review this i have updated this spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
``` | ||
* Add new object in tendrl defenition file with list. | ||
``` | ||
list: nodes/$Node_context.node.id/Network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel Suggestion 2 is better.
LGTM |
7edb65d
to
bd25a77
Compare
@r0h4n please merge this |
"netmask": ["subnet", ...], | ||
"subnet": "subnet", | ||
"status":"up/down", | ||
"interface_id": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details such as interface_id, sysfs_id etc. need not be present on a global level. On the global scope, the network information is used to identify which interfaces on which hosts belong to the network and what their IPs are, that's all. All the other details belong to specific network interfaces on specific nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brainfunked ok
``` | ||
/nodes/{node_id}/Network | ||
|
||
{"interface_name": {"ipv4": ["ipv4address", ...], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make it an array of ipaddr/netmask combination. No need to store netmasks separately.
Also, store the aliases and their corresponding addresses in a hash under the aliases key or some such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brainfunked ok
``` | ||
* Add new object in tendrl defenition file with list. | ||
``` | ||
list: nodes/$Node_context.node.id/Network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @r0h4n.
be88545
to
ec9d021
Compare
@brainfunked i have one doubt, if the interface is down then does not have ip and subnet mask, so we can't find out subnet mask, so in node wise i cant store it as subnet wise, better we store it as interface wise. I feel current implementation for node wise network details are ok. |
@brainfunked @r0h4n @shtripat @nthomas-redhat @nnDarshan yours suggestion please |
spcification-issue: Tendrl#41 Signed-off-by: GowthamShanmugam <gshanmug@redhat.com>
ec9d021
to
4be2959
Compare
@brainfunked @shtripat @r0h4n @nthomas-redhat @nnDarshan If i store subnet wise in node level then i can't say interface is up/down |
@nthomas-redhat @r0h4n @shtripat @nnDarshan @brainfunked if we dont want status of interface (up/down) then we can store subnet wise in node level also, please give your comments |
@GowthamShanmugam Please figure out how to uniquely identify networks/subnets/ports(interfaces) check https://github.com/openstack/heat/blob/master/heat/engine/resources/openstack/neutron/port.py https://github.com/openstack/neutron/blob/master/neutron/db/models_v2.py#L74 or check any python module handling networks and see their data model for networks/subnets/ports |
@r0h4n we can merge this |
Signed-off-by: GowthamShanmugam gshanmug@redhat.com