Stats #72

Merged
merged 7 commits into from
Feb 22, 2018

@grahamc (Member) commented Feb 12, 2018

  • Adds parameters to events, so they can carry data instead of being just dumb strings
  • Adds a stats consumer that sends them to Prometheus
  • Adds an event for when the target branch fails to evaluate, recording which branch it was, so we can alert if master breaks
  • Adds a metric for how long target branches take to evaluate
  • Adds a counter of finished evaluations, so we can confirm that evaluations are indeed finishing
  • Adds some initial metrics (commented out) for "Capture Nix stats during eval" (#67)
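To illustrate what parameters on events buy over plain strings, here is a minimal std-only sketch; it is an assumed illustration, not the code in this PR. A hypothetical `TargetBranchFailsEvaluation(String)` variant carries the branch name, `EvaluationDuration(u64)` carries seconds, and a hypothetical `timed_eval` helper times an evaluation with `std::time::Instant` and emits the duration, failure and "finished" events. `Counters` and `timed_eval` are illustrative names only.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical event type: variants carry data instead of being bare strings.
#[derive(Debug, Clone)]
enum Event {
    // Name of the target branch that failed to evaluate, e.g. "master".
    TargetBranchFailsEvaluation(String),
    // Seconds spent evaluating a target branch.
    EvaluationDuration(u64),
    // One tick per finished evaluation.
    EvaluationFinished,
}

#[derive(Default, Debug)]
struct Counters {
    // (branch, instance) -> number of failed evaluations
    branch_failures: HashMap<(String, String), u64>,
    // instance -> total evaluation seconds
    eval_seconds: HashMap<String, u64>,
    // instance -> number of finished evaluations
    evals_finished: HashMap<String, u64>,
}

impl Counters {
    fn record(&mut self, instance: &str, event: Event) {
        match event {
            Event::TargetBranchFailsEvaluation(branch) => {
                *self.branch_failures.entry((branch, instance.to_string())).or_insert(0) += 1;
            }
            Event::EvaluationDuration(secs) => {
                *self.eval_seconds.entry(instance.to_string()).or_insert(0) += secs;
            }
            Event::EvaluationFinished => {
                *self.evals_finished.entry(instance.to_string()).or_insert(0) += 1;
            }
        }
    }
}

// Hypothetical helper: runs `eval`, then records duration, failure and finish events.
fn timed_eval<F: FnOnce() -> Result<(), String>>(c: &mut Counters, instance: &str, branch: &str, eval: F) {
    let start = Instant::now();
    let result = eval();
    let elapsed: Duration = start.elapsed();
    c.record(instance, Event::EvaluationDuration(elapsed.as_secs()));
    if result.is_err() {
        c.record(instance, Event::TargetBranchFailsEvaluation(branch.to_string()));
    }
    c.record(instance, Event::EvaluationFinished);
}

fn main() {
    let mut c = Counters::default();
    timed_eval(&mut c, "evaluator-1", "master", || Ok(()));
    timed_eval(&mut c, "evaluator-1", "master", || Err("eval failed".to_string()));
    println!("{:?}", c.branch_failures);
}
```

Because the branch is a field rather than part of a string, an alert can key on the `master` label directly.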

So the build-time code generator is a bit ugly / scary, but the code it generates is simple, like this:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::sync::Mutex;

// Assumes the serde crate with its "derive" feature is available.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(rename_all="kebab-case")]
pub enum Event {
  StatCollectorLegacyEvent(String),
  StatCollectorBogusEvent,
  EvaluationDuration(u64)
}

#[derive(Debug, Clone)]
pub struct MetricCollector {
  stat_collector_legacy_event: Arc<Mutex<HashMap<(String, String), u64>>>,
  stat_collector_bogus_event: Arc<Mutex<HashMap<String, u64>>>,
  evaluation_duration: Arc<Mutex<HashMap<String, u64>>>
}

impl MetricCollector {
  pub fn new() -> MetricCollector {
    MetricCollector {
      stat_collector_legacy_event: Arc::new(Mutex::new(HashMap::new())),
      stat_collector_bogus_event: Arc::new(Mutex::new(HashMap::new())),
      evaluation_duration: Arc::new(Mutex::new(HashMap::new()))
    }
  }

  // Accumulate one event for the given instance.
  pub fn record(&self, instance: String, event: Event) {
    match event {
      Event::StatCollectorLegacyEvent(event) => {
        let mut accum_table = self.stat_collector_legacy_event
          .lock()
          .expect("Failed to unwrap metric mutex for stat_collector_legacy_event");
        let accum = accum_table
          .entry((event, instance))
          .or_insert(0);
        *accum += 1;
      }

      Event::StatCollectorBogusEvent => {
        let mut accum_table = self.stat_collector_bogus_event
          .lock()
          .expect("Failed to unwrap metric mutex for stat_collector_bogus_event");
        let accum = accum_table
          .entry(instance)
          .or_insert(0);
        *accum += 1;
      }

      Event::EvaluationDuration(value) => {
        let mut accum_table = self.evaluation_duration
          .lock()
          .expect("Failed to unwrap metric mutex for evaluation_duration");
        let accum = accum_table
          .entry(instance)
          .or_insert(0);
        *accum += value;
      }
    }
  }

  // Render all metrics in the Prometheus text exposition format.
  pub fn prometheus_output(&self) -> String {
    let mut output = String::new();

    output.push_str("# HELP ofborg_stat_collector_legacy_event Number of received legacy events\n");
    output.push_str("# TYPE ofborg_stat_collector_legacy_event counter\n");

    let table = self.stat_collector_legacy_event.lock()
      .expect("Failed to unwrap metric mutex for stat_collector_legacy_event");
    let values: Vec<String> = table
      .iter()
      .map(|((event, instance), value)| {
        let kvs: Vec<String> = vec![
          format!("event=\"{}\"", event),
          format!("instance=\"{}\"", instance)
        ];
        format!("ofborg_stat_collector_legacy_event{{{}}} {}", kvs.join(","), value)
      })
      .collect();
    output.push_str(&values.join("\n"));
    output.push_str("\n");

    output.push_str("# HELP ofborg_stat_collector_bogus_event Number of received unparseable events\n");
    output.push_str("# TYPE ofborg_stat_collector_bogus_event counter\n");

    let table = self.stat_collector_bogus_event.lock()
      .expect("Failed to unwrap metric mutex for stat_collector_bogus_event");
    let values: Vec<String> = table
      .iter()
      .map(|(instance, value)| {
        let kvs: Vec<String> = vec![
          format!("instance=\"{}\"", instance)
        ];
        format!("ofborg_stat_collector_bogus_event{{{}}} {}", kvs.join(","), value)
      })
      .collect();
    output.push_str(&values.join("\n"));
    output.push_str("\n");

    output.push_str("# HELP ofborg_evaluation_duration Amount of time spent running evaluations\n");
    output.push_str("# TYPE ofborg_evaluation_duration counter\n");

    let table = self.evaluation_duration.lock()
      .expect("Failed to unwrap metric mutex for evaluation_duration");
    let values: Vec<String> = table
      .iter()
      .map(|(instance, value)| {
        let kvs: Vec<String> = vec![
          format!("instance=\"{}\"", instance)
        ];
        format!("ofborg_evaluation_duration{{{}}} {}", kvs.join(","), value)
      })
      .collect();
    output.push_str(&values.join("\n"));
    output.push_str("\n");

    output
  }
}
```
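One gap in the generated output: label values are interpolated verbatim. Legacy events are arbitrary strings, so an event name containing `"`, `\` or a newline would produce a line Prometheus cannot parse. Below is a small sketch of escaping per the Prometheus text exposition format, which requires backslash, double quote and line feed to be escaped inside label values. `escape_label_value` and `sample_line` are hypothetical helpers, not something the generator emits today.

```rust
// Escape a label value per the Prometheus text exposition format:
// backslash, double quote and line feed must be escaped.
fn escape_label_value(v: &str) -> String {
    let mut out = String::with_capacity(v.len());
    for ch in v.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            c => out.push(c),
        }
    }
    out
}

// Render one sample line of the form `name{k="v",...} value`.
fn sample_line(name: &str, labels: &[(&str, &str)], value: u64) -> String {
    let kvs: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}=\"{}\"", k, escape_label_value(v)))
        .collect();
    format!("{}{{{}}} {}", name, kvs.join(","), value)
}

fn main() {
    let line = sample_line(
        "ofborg_stat_collector_legacy_event",
        &[("event", "say \"hi\""), ("instance", "evaluator-1")],
        3,
    );
    // prints: ofborg_stat_collector_legacy_event{event="say \"hi\"",instance="evaluator-1"} 3
    println!("{}", line);
}
```

The generator could call something like this in place of the bare `format!("event=\"{}\"", event)`.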

@grahamc (Member, Author) commented Feb 16, 2018

So this sort of sucks, because it doesn't send 0 values for events that happen rarely. For example, ofborg_target_branch_fails_evaluation has no value at all until there is a problem. That makes it hard or impossible to alert on, because changes() and rate() first return nothing and then 0, so the jump from "no series" to 1 is never seen as a change.
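A toy model of why this is hard to alert on. It assumes a simplified delta between consecutive scrapes that, like `rate()` or `increase()`, needs a sample on both sides; it is not Prometheus's actual algorithm (no extrapolation, counter resets ignored). `delta` and `deltas` are hypothetical helpers.

```rust
// Delta between two scrapes; None when either scrape lacks the series.
// Counter resets are ignored in this toy model (saturating_sub).
fn delta(prev: Option<u64>, curr: Option<u64>) -> Option<u64> {
    match (prev, curr) {
        (Some(p), Some(c)) => Some(c.saturating_sub(p)),
        _ => None,
    }
}

// Deltas over each pair of consecutive scrapes.
fn deltas(scrapes: &[Option<u64>]) -> Vec<Option<u64>> {
    scrapes.windows(2).map(|w| delta(w[0], w[1])).collect()
}

fn main() {
    // The series only appears once the first failure happens.
    let lazy = [None, None, Some(1), Some(1)];
    // The series is exported as 0 from startup.
    let seeded = [Some(0), Some(0), Some(1), Some(1)];
    println!("{:?}", deltas(&lazy)); // [None, None, Some(0)]
    println!("{:?}", deltas(&seeded)); // [Some(0), Some(1), Some(0)]
}
```

Only the seeded series ever shows a delta of 1, which is what an alert would fire on.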

Perhaps it could always send a default of zero for each metric on every send, but then builders would be sending eval metrics and evaluators would be sending build metrics.

A possible solution to that is to subdivide the metrics into groups which are always OK to send together.

This would also increase bandwidth usage, but maybe that is okay.
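A minimal sketch of the grouping idea, under assumptions: each worker type exports only its own hypothetical `Group`, and every metric in that group is seeded at 0 so the series exists from startup. `ofborg_builds_started` is a made-up name for illustration; the other two names come from this PR. HELP/TYPE lines are omitted for brevity.

```rust
use std::collections::BTreeMap;

// Hypothetical grouping: each worker type only exports its own group,
// so builders never emit eval metrics and vice versa.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Group {
    Evaluator,
    Builder,
}

// Metric names per group (ofborg_builds_started is illustrative only).
fn metrics_for(group: Group) -> &'static [&'static str] {
    match group {
        Group::Evaluator => &[
            "ofborg_target_branch_fails_evaluation",
            "ofborg_evaluation_duration",
        ],
        Group::Builder => &["ofborg_builds_started"],
    }
}

struct GroupCounters {
    // BTreeMap keeps the output order stable between scrapes.
    values: BTreeMap<&'static str, u64>,
}

impl GroupCounters {
    // Seed every metric in the group with 0 so it is exported from startup.
    fn new(group: Group) -> Self {
        let values: BTreeMap<&'static str, u64> =
            metrics_for(group).iter().map(|&m| (m, 0)).collect();
        GroupCounters { values }
    }

    // Increment a metric; returns false if it is not part of this group.
    fn incr(&mut self, metric: &str, by: u64) -> bool {
        match self.values.get_mut(metric) {
            Some(v) => {
                *v += by;
                true
            }
            None => false,
        }
    }

    // One sample line per metric in the group, zeros included.
    fn render(&self, instance: &str) -> String {
        self.values
            .iter()
            .map(|(name, v)| format!("{}{{instance=\"{}\"}} {}\n", name, instance, v))
            .collect()
    }
}

fn main() {
    let mut eval = GroupCounters::new(Group::Evaluator);
    eval.incr("ofborg_evaluation_duration", 42);
    print!("{}", eval.render("evaluator-1"));
}
```

The bandwidth cost is one extra line per unused metric in the group, and the group boundary keeps builders from reporting eval metrics.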

@grahamc (Member, Author) commented Feb 22, 2018

This is imperfect but gets us part of the way. I'd be happy to see something rip it out and replace it.

@grahamc grahamc merged commit d38c703 into next Feb 22, 2018
@grahamc grahamc deleted the stats branch February 22, 2018 00:20