Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specification: Adding network details in hardware inventory #19

Merged
merged 1 commit into from Apr 28, 2017

Conversation

GowthamShanmugam
Copy link
Contributor

Signed-off-by: GowthamShanmugam gshanmug@redhat.com

@GowthamShanmugam
Copy link
Contributor Author

@shtripat @r0h4n @shtripat @nnDarshan please review it

@GowthamShanmugam GowthamShanmugam changed the title Specification for network details in hardware inventory Specification: Adding network details in hardware inventory Dec 7, 2016
@GowthamShanmugam
Copy link
Contributor Author

@r0h4n @shtripat @nnDarshan please review

* Modify persister.py to persist the Network object in etcd.

== Alternatives

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can mention about the few other network-utilities that you referred, also provide info about why you choose this over that.

* Create a new class called Network in persistence with respective attribute.
* Modify manager.py to populate network details from node_inventory list to
Network object.
* Modify persister.py to persist the Network object in etcd.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a bit implementation specific, can you have more like an overview of what you are going to change ?

* tendrl/node_agent/persistence/persister.py
* requirements.txt
* tendrl-node-agent.spec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write a couple of sentences about the impact, instead of mentioning the file names ?


None

== Notifications/Monitoring impact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might have to monitor the status of the network interface, and alert when interfaces goes down of comes up. May be in future when alerting framework is ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Whenever a state change is detected, the same needs to go to etcd's /alerts directory and correspondingly a custom handler needs to be written to handle the event.

Ideally, the procedure is anyone who detects an alert writes it to a socket that is opened by node-agent and the node-agent writes this alert to etcd after the corresponding handler has massaged it.

But since this alert is detected in node-agent itself, the node-agent can directly invoke an event handling gateway function(which will anyway be invoked for the messages in socket in usual path) which will take care of the rest of the things. Offcourse a custom handler to handle this event in a custom manner if required also needs to be written.

I would suggest to refer section 'Data Model Impact' of https://github.com/Tendrl/specifications/pull/18/files for the details about alert structure(Its still in review)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anmolbabu @nnDarshan does this needs to covered as part of this? I think it come under the scope of another spec which talks about the alerting/monitoring of disks. Does this make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alerting is just a part of the story. What about refreshing our inventory when the hardware changes, in general? Alerting is a side effect of these changes. The first impact must always be on our state store itself. Theoretically it's possible to set monitors on the changes in the state store and generate alerts there, but that's a different issue altogether. I'll review the alerting specification and see if it answers my question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the alerting spec. It does not answer my question. How are we tracking changes to the hardware inventory? We may need a short and a long term solution to this problem. @anmolbabu @nthomas-redhat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brainfunked Two ways you can handle this:

  1. Process the external events send by the agents(Console-2 was depending on the events from storaged)
  2. During the inventory sync, infer the change and emit alerts based on that

Utilisation data(any;network, disk etc.) collection and analysis(and generating alerts) is the responsibility of the monitoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In agreement with @nthomas-redhat


== Assignee(s)

Primary assignee: TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention your git hub name here, as you have already sent a PR for this

None

== Data model impact

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can mention about the path where this details is stored in etcd ?


* Adding network details in node hardware inventory

== Impacted modules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have missed api impact section, there you can mention that api for listing network interface to be added by api-developer.

@GowthamShanmugam
Copy link
Contributor Author

GowthamShanmugam commented Dec 7, 2016

@nnDarshan @shtripat @r0h4n @Tendrl/qe please review it


Hardware inventory details are used to show exact status of the particular
node. Currently network details are missing in node hardware inventory.
we have to add network details like IPv4, IPv6, netmask as a part of node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we dealing with ipv6 now?


=== Tendrl API impact:

Api to get network details v2/keys/nodes/node_id/Network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to have an API to get the network of the node?. I don't think so. Rather this should be part of Node and retrieved through GetNodeList api


Modify persistence and manager modules to find and store active network
interface details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the definitions? Do we have an object defined for network?
Also any new atoms needs to be added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats correct. We would need an object defined for sure.

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat ipv6 is not required but i feel it is better to keep those details for future use

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat @nnDarshan please review it

@nthomas-redhat
Copy link
Contributor

nthomas-redhat commented Dec 13, 2016

@GowthamShanmugam , I am not seeing the changes I have requested. Also looks like the latest version posted here is not right?


== Alternatives

* Netifaces package is used to find active network interface in particular
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is talking about what mechanism you have used. Rather @nnDarshan meant you to add various tools which were evaluated and could be used for getting the details I feel.


== Data model impact

Add network details in etcd with key nodes/node_id/Network/interface_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also definitions should get an object called Network I feel


== Notifications/Monitoring impact

We might have to monitor the status of the network interface(up/down).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anmolbabu do we need more detail to be captured here??

Copy link
Contributor

@anmolbabu anmolbabu Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think things like significance of this alert I assume it would be 'HIGH' can be captured + a sample alert can be populated here by referring to section 'Data Model Impact' of https://github.com/Tendrl/specifications/pull/18/files for the details about alert structure(Its still in review). Apart from that, from the alerting perspective I don't see any more requirement.

But, I have 1 question here, does this need to be graphed (binary status graph with up, down values) for user to see how the interface has been behaving. If yes, we need to maintain this as well in graphite and in such a case we need to see how node-agent can push a statistic to time-series db. Probably the performance-monitoring application needs to expose apis to push stats as well.

An alternative in my opinion is that if this can be implemented as a collectd plugin emitting non-zero higher value for down status and a zero for up(only to benefit from collectd's thresholding plugin that can emit an alert for values greater than configured threshold in our case say 0 or some value greater than 0) we can benefit from the normal statistics data and alert flows. Anyway since the status to int mapping is very much hard-coded within applications in tendrl space, the corresponding interpretations can be done wherever required.

Just that I am somehow not happy with performance-monitoring exposing a way to push any stat to graphite.

@shtripat what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would certainly prefer the option-2 of having a collectd plugin to push data to graphite (if we need, for sure I prefer this to be there in graphite. The only thing we need to see if real estate available for the same in dashboard)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anmolbabu @shtripat , As I commented earlier, this should be handled as part of monitoring, need not come under scope of this specification

Copy link
Contributor

@anmolbabu anmolbabu Dec 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nthomas-redhat But I feel if we are doing it from collectd then there is no need to do it from node-agent.

I feel,

  • Node-agent can do an initial 1 time sync
  • Thereafter, the alerting framework's disk alerts specific handler can set suitable states in etcd as well apart from the normal notification and other stuff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node agent owns these objects under its namespace, and expects them as defined under etcd.

If any other Tendrl component or external module (collectd) wants to own the object, it has to implement it as per Tendrl Definitions, meaning: it should look the same in etcd as it was defined under the node-agent namepsace.

So, keep the objects under node-agent and let node-agent manage its state in etcd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anmolbabu Speaking of monitoring, we should track the network utilisation data per interface. It'll be useful for tracking down bottlenecks later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I already have a plugin to calculate network utilization in one of my PRs. And it currently calculates both interface level and average across interfaces and puts both to time-series db.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat i will remove IPV6 details from this, and i will add about modification in tendrl_definition

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat atoms are not required for this

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat @shtripat @nnDarshan all changes are done.. please review it


== Proposed change

* From node_agent add a new function in pull_hardware_inventory to find active
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention about the tool that you will be using to get network details

@GowthamShanmugam
Copy link
Contributor Author

@nnDarshan done

Copy link
Contributor

@nnDarshan nnDarshan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good to me. @anmolbabu Could you please comment on the monitoring impact

@GowthamShanmugam
Copy link
Contributor Author

@shtripat @nthomas-redhat @r0h4n @brainfunked review please

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat @brainfunked @shtripat please review

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat @brainfunked @r0h4n please review

Copy link
Member

@shtripat shtripat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as as long as ipv6 related things are finalized...

@GowthamShanmugam
Copy link
Contributor Author

yes @brainfunked told me to collect all the possible details so i added ipv6 also and some other extra details also.

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat @brainfunked please review this


== Alternatives

* We can also use ethtool for getting network details. But ethool is not
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be aware, that ethtool provide very limited information if launched on virtual machine for virtio type of NIC. I even think, that some information are not available at all for virtio type of NIC (for example link Speed).

I know, that this didn't looks like big problem for production environment (where probably all storage nodes will be physical machines), but it might be problem for testing, examination and demo purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dahorak ya i accept your point, it gives very limited details only, in initial stage we excepted ipv4 , ipv6 and subnet mask details only, at that time we just tried this tools.

```
* Add new object in tendrl defenition file with list.
```
list: nodes/$Node_context.node.id/Network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the exact structure of the data in etcd? I think that there needs to be a global networks hierarchy as well, where we maintain a list of networks, under which we point to each node that belongs to this network. This is in addition to the network details maintained under each node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brainfunked
Inside nodes/{node_id}/Network/{interface_name}
value : {"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}
this is the structure for network details, This network details is different for the particular node. we cant give node belongs to this network. This network ipv4, ipv6 address will differ for another nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the IP being under the interface name will scale. You'd want to allow clients to directly filter nodes based on IPs and hostnames. You'd also need to enable an index of hostnames to node ids and hostnames to IP addresses. These need to be maintained based on changes in the IP addresses on individual nodes. Essentially, I'm looking at being able to obtain network details without relying on scanning all of the nodes.

Also, my comment is specifically towards https://redhat.invisionapp.com/share/2K8M4PQYZ#/screens/188984130. Here, we need to list the networks detected, and not the IP addresses per node. This is essentially a calculation based on the netmask for each of the detected IPs. It could be done on the fly by the API, or be stored permanently in etcd, should this information need to be known elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brainfunked

  1. global structure is:

suggestion :1
/networks/{subnet1}/{node_id1}
Value: [
{"node_id": "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
},
.....]

/networks/{subnet1}/{node_id2}
Value: [
{"node_id" : "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
},
.....]

Suggestion : 2
(read from etcd using network like /networks/{subnet1} and then append json in list that in list. Then dump in etcd)

/networks/{subnet1}
value: [
{"node_id" : "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}, ...
]
/networks/{subnet2}
value: [ {"node_id" : "node_id",
"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}, ...]

In second approach we can find no.of interface like len(network1), list length is the number of interface.

  1. node wise structure is

/nodes/{node_id}/Network/{interface_name}
value: [
{"ipv4": ["ipv4address", ...],
"ipv6": ["ipv6address, ..."],
"netmask": ["subnet", ...],
"status":"up/down",
"interface_id": "",
"sysfs_id": "",
"device_link": "",
"interface_type": "",
"model": "",
"driver_modules": "",
"drive": "",
"hw_address": "",
"link_detected": ""
}
]

Is that global structure is ok?
whether i have to keep node wise and global wise structure or global wise structure is enaf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally feel node level network info is required. Additionally you can have system level sub-net details

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GowthamShanmugam Suggestion 2 looks good, also keep both /nodes and global /networks data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r0h4n ok done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel Suggestion 2 is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @r0h4n.

@GowthamShanmugam
Copy link
Contributor Author

@r0h4n @brainfunked @shtripat @nnDarshan please review this i have updated this spec

Copy link
Contributor

@nnDarshan nnDarshan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

```
* Add new object in tendrl defenition file with list.
```
list: nodes/$Node_context.node.id/Network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel Suggestion 2 is better.

@shtripat
Copy link
Member

LGTM

@GowthamShanmugam
Copy link
Contributor Author

@r0h4n please merge this

"netmask": ["subnet", ...],
"subnet": "subnet",
"status":"up/down",
"interface_id": "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Details such as interface_id, sysfs_id etc. need not be present on a global level. On the global scope, the network information is used to identify which interfaces on which hosts belong to the network and what their IPs are, that's all. All the other details belong to specific network interfaces on specific nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```
/nodes/{node_id}/Network

{"interface_name": {"ipv4": ["ipv4address", ...],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make it an array of ipaddr/netmask combination. No need to store netmasks separately.

Also, store the aliases and their corresponding addresses in a hash under the aliases key or some such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```
* Add new object in tendrl defenition file with list.
```
list: nodes/$Node_context.node.id/Network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @r0h4n.

@GowthamShanmugam
Copy link
Contributor Author

@brainfunked i have one doubt, if the interface is down then does not have ip and subnet mask, so we can't find out subnet mask, so in node wise i cant store it as subnet wise, better we store it as interface wise. I feel current implementation for node wise network details are ok.

@GowthamShanmugam
Copy link
Contributor Author

GowthamShanmugam commented Jan 14, 2017

@brainfunked @r0h4n @shtripat @nthomas-redhat @nnDarshan yours suggestion please
i feel in node wise interface wise structure is ok. In global wise we can keep subnet, because i am storing only which interfaces have subnet in global.

spcification-issue: Tendrl#41

Signed-off-by: GowthamShanmugam <gshanmug@redhat.com>
@GowthamShanmugam
Copy link
Contributor Author

GowthamShanmugam commented Jan 14, 2017

@brainfunked @shtripat @r0h4n @nthomas-redhat @nnDarshan If i store subnet wise in node level then i can't say interface is up/down

@GowthamShanmugam
Copy link
Contributor Author

@nthomas-redhat @r0h4n @shtripat @nnDarshan @brainfunked if we dont want status of interface (up/down) then we can store subnet wise in node level also, please give your comments

@r0h4n
Copy link
Contributor

r0h4n commented Jan 18, 2017

@GowthamShanmugam Please figure out how to uniquely identify networks/subnets/ports(interfaces)

check https://github.com/openstack/heat/blob/master/heat/engine/resources/openstack/neutron/port.py

https://github.com/openstack/neutron/blob/master/neutron/db/models_v2.py#L74

or check any python module handling networks and see their data model for networks/subnets/ports

@GowthamShanmugam
Copy link
Contributor Author

@r0h4n we can merge this

@r0h4n r0h4n merged commit 97e7ad0 into Tendrl:master Apr 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants