
[TASK] add database type graphite #65

Merged (6 commits) on Jun 1, 2017

Conversation

@andir (Contributor) commented May 26, 2017

I've spent the last night hacking on a graphite database backend.

}

node_prefix := MeasurementNode + `.` + stats.NodeID + `.` + strings.Replace(nodeinfo.Hostname, ".", "__", -1)

Member:

Use a small inline function to dry up your code:

addField := func(name string, value interface{}) {
	fields = append(fields, graphigo.Metric{Name: node_prefix + "." + name, Value: value})
}
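For illustration, here is a self-contained sketch of how such a closure dries up the repeated `append` calls. The `Metric` struct is a local stand-in for `graphigo.Metric` so the sketch compiles without the external package, and the values fed to `addField` are made up:

```go
package main

import "fmt"

// Metric is a minimal stand-in for graphigo.Metric, defined locally so the
// sketch compiles without the external package.
type Metric struct {
	Name  string
	Value interface{}
}

// buildFields shows the suggested addField closure at work: the prefix and
// append logic live in one place, and every call site becomes a one-liner.
func buildFields(nodePrefix string) []Metric {
	var fields []Metric
	addField := func(name string, value interface{}) {
		fields = append(fields, Metric{Name: nodePrefix + "." + name, Value: value})
	}
	// Hypothetical values; the PR feeds these from the stats struct.
	addField("proc.running", 4)
	addField("clients.wifi", 23)
	return fields
}

func main() {
	for _, m := range buildFields("yanic.nodes.abcdef") {
		fmt.Println(m.Name, m.Value)
	}
}
```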

Contributor Author:

Added.

Member:

Much better 👍

{Name: node_prefix + ".proc.running", Value: stats.Processes.Running},
{Name: node_prefix + ".clients.wifi", Value: stats.Clients.Wifi},
{Name: node_prefix + ".clients.wifi24", Value: stats.Clients.Wifi24},
{Name: node_prefix + ".clients.wifi5", Value: stats.Clients.Wifi5},
Member:

Better use wifi11a and wifi11g in the name.
The old labels wifi24 and wifi5 are deprecated.

return
}
} else {
defer c.Close()
Member:

Write defer c.Close() before the loop?


func (c *Connection) addWorker() {
defer c.wg.Done()
for c.Connection != nil {
Member:

Here you can simply iterate over the points:
for point := range c.points {

}

func (c *Connection) Close() {
if c.client.Connection != nil {
Member:

Close the channel here: close(c.points)
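The two suggestions belong together: range over the points channel in the worker, and close the channel in Close() so the range loop ends once the channel is drained. A minimal self-contained sketch; the Connection fields and the string-typed channel are illustrative stand-ins, not the PR's actual struct:

```go
package main

import (
	"fmt"
	"sync"
)

// Connection is a simplified stand-in for the PR's graphite connection.
type Connection struct {
	points chan string // the PR carries graphite metrics; strings keep the sketch short
	wg     sync.WaitGroup
	sent   []string
}

func (c *Connection) addWorker() {
	defer c.wg.Done()
	// range ends automatically once the channel is closed and drained.
	for point := range c.points {
		c.sent = append(c.sent, point)
	}
}

// Close shuts the worker down by closing the channel, then waits for it.
func (c *Connection) Close() {
	close(c.points)
	c.wg.Wait()
}

func NewConnection() *Connection {
	c := &Connection{points: make(chan string, 10)}
	c.wg.Add(1)
	go c.addWorker()
	return c
}

func main() {
	c := NewConnection()
	c.points <- "nodes.a.load"
	c.points <- "nodes.b.load"
	c.Close()
	fmt.Println(len(c.sent)) // 2
}
```

Closing the channel replaces the `c.Connection != nil` check as the shutdown signal, which avoids racing on the connection field.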

@andir (Contributor Author) commented May 27, 2017

I still have to run some tests with our local network; waiting on more storage for our graphite node ;-)
I'll report back once the tests are successful.

@corny (Member) commented May 27, 2017

OK, thanks a lot. Your code looks fine to me now.

addField("proc.running", stats.Processes.Running)
addField("clients.wifi", stats.Clients.Wifi)
addField("clients.wifi11a", stats.Clients.Wifi24)
addField("clients.wifi11g", stats.Clients.Wifi5)
Member:

Maybe just 11a and 11g instead of wifi11a and wifi11g? It implies wifi. Any other opinions?

@genofire genofire added the wip label May 27, 2017
@genofire genofire changed the title WIP: Added graphite backend [TASK] add database type graphite - WIP May 27, 2017
"github.com/fgrosse/graphigo"
)

func (c *Connection) InserGlobals(stats *runtime.GlobalStats, time time.Time) {
Member:

I need a tea in the morning 😄: InsertGlobals

@genofire (Member):

Without the typo, it looks good to me as well.
I think we could merge very soon.
(Could I suggest that you write some tests for this package? You could mock addPoint of the graphite package and run the functions of Connection.)

@coveralls commented May 27, 2017

Coverage Status

Coverage remained the same at 56.506% when pulling 79c0c22 on freifunk-darmstadt:add-graphite into dc24c8b on FreifunkBremen:master.

@andir (Contributor Author) commented May 27, 2017

I'll try to write a bunch of tests. It probably will be a copy of the influxdb stuff (for the most part)…

@genofire (Member):

From my side this PR is good to go.
Would you squash and merge it, @corny, once @andir says he is ready?

@andir (Contributor Author) commented May 27, 2017

I've just tried to figure out how I can easily mock out that function... Any ideas? I've only seen some ugly hacks, or passing the function as a parameter to the real implementation so I can pass the mock in the tests.
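One common Go answer is to put the write step behind a small interface, so tests can swap in a recording mock without touching the real graphite client. A hedged sketch; the names (pointWriter, AddPoint, InsertStats) are made up for illustration, not the PR's actual API:

```go
package main

import "fmt"

// pointWriter is the seam: production code would satisfy it with the real
// graphite client, tests use a mock. Names here are illustrative only.
type pointWriter interface {
	AddPoint(name string, value interface{})
}

// mockWriter records every point it receives instead of sending it.
type mockWriter struct {
	points map[string]interface{}
}

func (m *mockWriter) AddPoint(name string, value interface{}) {
	m.points[name] = value
}

// InsertStats stands in for the PR's Insert* functions; it only talks to
// the interface, so it never notices it is being given the mock.
func InsertStats(w pointWriter, running int) {
	w.AddPoint("proc.running", running)
}

func main() {
	m := &mockWriter{points: map[string]interface{}{}}
	InsertStats(m, 4)
	fmt.Println(m.points["proc.running"]) // 4
}
```

Compared to passing the function as a parameter, the interface keeps the production call sites unchanged and the mock lives only in the test file.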

@andir (Contributor Author) commented May 28, 2017

I just realized that yanic currently dies whenever the carbon-cache/relay (graphite backend) is restarted. The influxdb backend seems to have a similar issue, since log.Fatal is used?

Are there any plans to support reconnect?

@genofire (Member):

Plans? No. A reconnect has not been necessary until now. There is an RFC: #61
But it does not depend on influxdb; it is just the way we handle a lost connection.

A reconnect would be a restart of yanic by systemd.

@andir (Contributor Author) commented May 30, 2017

I'll try to get some reconnect code in during the following days.

In the meantime I discovered that datapoints are often missing in graphite (even after raising all the carbon limits): https://stats.test.h4ck.space/dashboard/db/new-dashboard-copy?orgId=1

Any ideas where that comes from? Yanic is set to query nodes every minute. Graphite is configured for 1 datapoint every 60 seconds.
[Screenshot: 2017-05-30-094910_2548x821_scrot]

The version statistics look okay but the others are often missing data.

addField("time.idle", int64(stats.Idletime))
addField("proc.running", stats.Processes.Running)
addField("clients.wifi", stats.Clients.Wifi)
addField("clients.wifi11a", stats.Clients.Wifi24)
Contributor:

11a is 5 GHz and 11g is 2.4 GHz, so they are interchanged. But why not keep wifi5 and wifi24 as used in database/graphite/global.go and database/influxdb/node.go? Additionally, 5 GHz is not only 11a but also 11n and 11ac, and 2.4 GHz also includes 11b and 11n.

Member:

You're right. Then revert it to wifi24 and wifi5.

@genofire genofire changed the title [TASK] add database type graphite - WIP [TASK] add database type graphite May 31, 2017
@genofire genofire added review and removed wip labels May 31, 2017
@genofire (Member) commented May 31, 2017

@andir said in IRC that he is ready with this PR.

Regarding the missing points, we think the problem is how graphite and yanic work together.
Example: yanic is configured to collect every 1m.
If yanic was started at 00:03:32, it tries to collect at x:x:32 via multicast and at x:x:02 via unicast (if a node did not answer on multicast).

But graphite wants values at x:x:00, so a value is dropped or averaged, because yanic's responses to the multicast and unicast requests get the same timeslot in graphite, and the previous or next timeslot stays empty.

A possible solution: configure graphite's resolution to 2x the yanic request interval.

@corny (Member) commented May 31, 2017

@andir Does the response of the node really reach yanic? Please add a Printf for debugging or use tcpdump with the node address.

@andir (Contributor Author) commented Jun 1, 2017

The wifi metrics are reverted.
I'll try to construct a test case for yanic and the "lost" metrics later today.

@coveralls commented Jun 1, 2017

Coverage Status

Coverage decreased (-0.2%) to 56.272% when pulling 52919a2 on freifunk-darmstadt:add-graphite into dc24c8b on FreifunkBremen:master.

@corny (Member) commented Jun 1, 2017

If yanic was started at 00:03:32 he try to collect every time x:x:32 per multicast and x:x:02 per unicast (if a node did not answer on multicast)

Simply adding a delay before the first run would solve this problem. This should be a separate issue.
