
[TASK] add database type graphite #65

Merged
merged 6 commits into from Jun 1, 2017

Conversation

5 participants
@andir
Contributor

commented May 26, 2017

I've spent the last night hacking on a graphite database backend.

}

node_prefix := MeasurementNode + `.` + stats.NodeID + `.` + strings.Replace(nodeinfo.Hostname, ".", "__", -1)


corny May 26, 2017

Member

Use a small inline function to dry up your code:

addField := func(name string, value interface{}) {
	fields = append(fields, graphigo.Metric{Name: node_prefix + "." + name, Value: value})
}


andir May 26, 2017

Author Contributor

Added.


corny May 26, 2017

Member

Much better 👍

{Name: node_prefix + ".proc.running", Value: stats.Processes.Running},
{Name: node_prefix + ".clients.wifi", Value: stats.Clients.Wifi},
{Name: node_prefix + ".clients.wifi24", Value: stats.Clients.Wifi24},
{Name: node_prefix + ".clients.wifi5", Value: stats.Clients.Wifi5},


corny May 26, 2017

Member

Better use wifi11a and wifi11g in the name.
The old labels wifi24 and wifi5 are deprecated.

return
}
} else {
defer c.Close()


corny May 27, 2017

Member

Move defer c.Close() before the loop?


func (c *Connection) addWorker() {
defer c.wg.Done()
for c.Connection != nil {


corny May 27, 2017

Member

Here you can simply iterate over the points:
for point := range c.points {

}

func (c *Connection) Close() {
if c.client.Connection != nil {


corny May 27, 2017

Member

Close the channel here: close(c.points)


corny approved these changes May 27, 2017

@andir

Contributor Author

commented May 27, 2017

I still have to run some tests with our local network - waiting on more storage for our graphite node ;-)
I'll report back once those tests are successful.

@corny

Member

commented May 27, 2017

OK, thanks a lot. Your code looks fine to me now.

addField("proc.running", stats.Processes.Running)
addField("clients.wifi", stats.Clients.Wifi)
addField("clients.wifi11a", stats.Clients.Wifi24)
addField("clients.wifi11g", stats.Clients.Wifi5)


corny May 27, 2017

Member

Maybe just 11a and 11g instead of wifi11a and wifi11g? It implies wifi. Any other opinions?

@genofire genofire added the wip label May 27, 2017

@genofire genofire changed the title WIP: Added graphite backend [TASK] add database type graphite - WIP May 27, 2017

"github.com/fgrosse/graphigo"
)

func (c *Connection) InserGlobals(stats *runtime.GlobalStats, time time.Time) {


genofire May 27, 2017

Member

I need a tea in the morning 😄 : InsertGlobals

@genofire

Member

commented May 27, 2017

Without the typo, it looks good to me as well.
I think we could merge very soon.
(Could I suggest you write some tests for this package? You could mock addPoint of the graphite package and run the functions of Connection.)

@coveralls


commented May 27, 2017

Coverage Status

Coverage remained the same at 56.506% when pulling 79c0c22 on freifunk-darmstadt:add-graphite into dc24c8b on FreifunkBremen:master.

@andir

Contributor Author

commented May 27, 2017

I'll try to write a bunch of tests. It will probably be a copy of the influxdb stuff (for the most part)…

@genofire

Member

commented May 27, 2017

From my side this PR is good to go.
Would you squash and merge it, @corny, once @andir says he is ready?

@andir

Contributor Author

commented May 27, 2017

I've just tried to figure out how I can easily mock out that function... Any ideas? I've only seen some ugly hacks, or passing the function as a parameter to the real implementation so I can pass in the mock in the tests.

@andir

Contributor Author

commented May 28, 2017

I just realized that yanic currently dies whenever the carbon-cache/relay (graphite backend) is restarted. The influxdb backend seems to have a similar issue - since log.Fatal is used?

Are there any plans to support reconnecting?

@genofire

Member

commented May 28, 2017

Plans, no. A reconnect has not been necessary until now. There is an RFC: #61
But it does not depend on influxdb. It is just about the way we handle a lost connection.

Currently a "reconnect" would be a restart of yanic by systemd.

@andir

Contributor Author

commented May 30, 2017

I'll try to get some reconnect code in during the following days.

In the meantime I discovered that datapoints are often missing in graphite (even after raising all the carbon limits): https://stats.test.h4ck.space/dashboard/db/new-dashboard-copy?orgId=1

Any ideas where that comes from? Yanic is set to query nodes every minute. Graphite is configured for 1 datapoint every 60 seconds.
[Screenshot attached: 2017-05-30-094910_2548x821_scrot]

The version statistics look okay but the others are often missing data.

addField("time.idle", int64(stats.Idletime))
addField("proc.running", stats.Processes.Running)
addField("clients.wifi", stats.Clients.Wifi)
addField("clients.wifi11a", stats.Clients.Wifi24)


kb-light May 30, 2017

Contributor

11a is 5 GHz and 11g is 2.4 GHz, so they are interchanged. But why not keep wifi5 and wifi24 as used in database/graphite/global.go and database/influxdb/node.go? Additionally, 5 GHz covers not only 11a but also 11n and 11ac, and 2.4 GHz also covers 11b and 11n.


corny May 31, 2017

Member

You're right. Then revert it to wifi24 and wifi5.

@genofire genofire changed the title [TASK] add database type graphite - WIP [TASK] add database type graphite May 31, 2017

@genofire genofire added review and removed wip labels May 31, 2017

@genofire

Member

commented May 31, 2017

@andir said in IRC that he is ready with this PR.

As for the missing points, we think the problem is how graphite and yanic interact.
Example: yanic is configured to collect every 1m.
If yanic was started at 00:03:32, it tries to collect at every x:x:32 via multicast and x:x:02 via unicast (if a node did not answer the multicast).

But graphite wants values at x:x:00, so a value is dropped or averaged, because yanic's multicast and unicast responses fall into the same timeslot in graphite while the previous or next timeslot stays empty.

A possible solution: configure graphite's resolution to 2x the yanic request interval.

@corny

Member

commented May 31, 2017

@andir Does the response of the node really reach yanic? Please add a Printf for debugging or use tcpdump with the node's address.

@andir

Contributor Author

commented Jun 1, 2017

The wifi metrics are reverted.
I'll try to construct a test case for yanic and the "lost" metrics later today..

@coveralls


commented Jun 1, 2017

Coverage Status

Coverage decreased (-0.2%) to 56.272% when pulling 52919a2 on freifunk-darmstadt:add-graphite into dc24c8b on FreifunkBremen:master.

@corny

Member

commented Jun 1, 2017

"If yanic was started at 00:03:32, it tries to collect at every x:x:32 via multicast and x:x:02 via unicast (if a node did not answer the multicast)"

Simply adding a delay before the first run would solve this problem. This should be a separate issue.

@corny corny merged commit 88975d2 into FreifunkBremen:master Jun 1, 2017

1 of 2 checks passed

coverage/coveralls Coverage decreased (-0.2%) to 56.272%
continuous-integration/travis-ci/pr The Travis CI build passed