Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BP-36: Stats documentation annotation #1786

Merged
merged 4 commits into from
Nov 13, 2018
Merged

Conversation

sijie
Copy link
Member

@sijie sijie commented Nov 5, 2018

Descriptions of the changes in this PR:

Motivation

A common ask from people using bookkeeper is how they can monitor bookies and bookkeeper clients, what kind of metrics that bookkeeper exposes
and what are the important metrics. Currently bookkeeper doesn't provide any metrics page for guiding people on monitoring bookkeeper services.

In order to help people on this, we need to provide a few documentation pages about metrics. However if we just write such pages, those pages
can quickly get out-of-dated when code is changed. The proposal here is to seek a programming way for generating metrics related pages.

Master Issue: #1785


In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
checks for pull requests. A pull request can only be merged when it passes precommit checks. However running all
the precommit checks can take a long time, some trivial changes don't need to run all the precommit checks. You
can check following list to skip the tests that don't need to run for your pull request. Leave them unchecked if
you are not sure, committers will help you:

  • [skip bookkeeper-server bookie tests]: skip testing org.apache.bookkeeper.bookie in bookkeeper-server module.
  • [skip bookkeeper-server client tests]: skip testing org.apache.bookkeeper.client in bookkeeper-server module.
  • [skip bookkeeper-server replication tests]: skip testing org.apache.bookkeeper.replication in bookkeeper-server module.
  • [skip bookkeeper-server tls tests]: skip testing org.apache.bookkeeper.tls in bookkeeper-server module.
  • [skip bookkeeper-server remaining tests]: skip testing all other tests in bookkeeper-server module.
  • [skip integration tests]: skip docker based integration tests. if you make java code changes, you shouldn't skip integration tests.
  • [skip build java8]: skip build on java8. ONLY skip this when ONLY changing files under documentation under site.
  • [skip build java9]: skip build on java9. ONLY skip this when ONLY changing files under documentation under site.


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

If this PR is a BookKeeper Proposal (BP):

  • Make sure the PR title is formatted like:
    <BP-#>: Description of bookkeeper proposal
    e.g. BP-1: 64 bits ledger is support
  • Attach the master issue link in the description of this PR.
  • Attach the google doc link if the BP is written in Google Doc.

Otherwise:

  • Make sure the PR title is formatted like:
    <Issue #>: Description of pull request
    e.g. Issue 123: Description ...
  • Make sure tests pass via mvn clean apache-rat:check install spotbugs:check.
  • Replace <Issue #> in the title with the actual Issue number.

*Motivation*

A common ask from people using bookkeeper is how they can monitor bookies and bookkeeper clients, what kind of metrics that bookkeeper exposes
and what are the important metrics. Currently bookkeeper doesn't provide any metrics page for guiding people on monitoring bookkeeper services.

In order to help people on this, we need to provide a few documentation pages about metrics. However if we just write such pages, those pages
can quickly get out-of-dated when code is changed. The proposal here is to seek a programming way for generating metrics related pages.
@sijie
Copy link
Member Author

sijie commented Nov 6, 2018

run bookkeeper-server replication tests

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful

I left just one question

/**
* Documenting the stats.
*/
@Retention(RetentionPolicy.RUNTIME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why runtime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because current approach requires them to be RUNTIME. Otherwise it doesn't work.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
I have just seen the implementation

Now that you have already implemented the whole tool it is better to keep it.
Another approach would have been to implement a Java Annotation processor or to use ASM

@sijie
Copy link
Member Author

sijie commented Nov 6, 2018

Another approach would have been to implement a Java Annotation processor or to use ASM

was thinking about that as well. however that is too much work for me :(

@eolivelli
Copy link
Contributor

@sijie I totally agree. It's not worth and it is not a big deal. Thanks

@dlg99
Copy link
Contributor

dlg99 commented Nov 7, 2018

@sijie This is a really awesome start and improves user experience a lot at the beginning.
It does not address one major question that I've seen asked (internally) many times by everyone looking at metrics as they start digging deeper: what is the relationship between the metrics, specifically for the latency metrics.
I've seen a couple of internal efforts to automate it / cc @athanatos @nicmichael
One of the examples below.
Maybe addition of more metadata will help us build similar picture or at least provide description that shows such relations, something like:

@StatsDoc(name = JOURNAL_QUEUE_LATENCY, 
    help = "time spent in the journal queue", 
    parent = JOURNAL_ADD_ENTRY, 
    happensAfter = null)

@StatsDoc(name = JOURNAL_PROCESS_TIME_LATENCY, 
    help = "time spent processing data in the journal", 
    parent = JOURNAL_ADD_ENTRY, 
    happensAfter = JOURNAL_QUEUE_LATENCY)

metrics

@sijie
Copy link
Member Author

sijie commented Nov 7, 2018

@dlg99 good suggestion.

what is the difference between parent and happenAfter?

@dlg99
Copy link
Contributor

dlg99 commented Nov 7, 2018

@sijie i.e. addEntry does bookieAddEntry and journalAddEntry so it is a parent for both.
ate the same time we want to show that bookieAddEntry happens before journalAddEntry to i.e. order these metrics correctly in the documentation and describe the sequence fo events there.

parent defines hierarchy and happensAfter simply orders sequence of siblings.

@sijie
Copy link
Member Author

sijie commented Nov 8, 2018

@dlg99 gotcha. I will update the BP

@sijie
Copy link
Member Author

sijie commented Nov 8, 2018

@dlg99 updated the BP with your suggestion. I will add those two in the annotation after #1787 is in.

*
* @return the metric name of an operation that happens after the operation of this metric.
*/
default String happenAfter() { return ""; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, grammar: "happensAfter" (though check with native speakers anyways)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I was planning to type happensAfter. however it becomes happenAfter :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@sijie
Copy link
Member Author

sijie commented Nov 9, 2018

run pr validation

@sijie
Copy link
Member Author

sijie commented Nov 11, 2018

marked this BP as accepted with 3 +1 and 0 -1.

@sijie sijie merged commit 1590ca3 into apache:master Nov 13, 2018
sijie added a commit that referenced this pull request Nov 13, 2018
…s exposed by bookkeeper

Descriptions of the changes in this PR:



### Motivation

A common ask from people using bookkeeper is how they can monitor bookies and bookkeeper clients, what kind of metrics that bookkeeper exposes
and what are the important metrics. Currently bookkeeper doesn't provide any metrics page for guiding people on monitoring bookkeeper services.

In order to help people on this, we need to provide a few documentation pages about metrics. However if we just write such pages, those pages
can quickly get out-of-dated when code is changed.

### Changes

- Introduce an annotation `StatsDoc` for annotating the counters/gauges/opstats in the source code.
- Provide a tool to generate the stats and their documentation into a yaml file.

The yaml file will be used by the website for rendering a metrics reference page.

### Results

```
"server":
  "bookie_BOOKIE_READ_ENTRY_BYTES":
    "description": |-
      bytes stats of ReadEntry on a bookie
    "type": |-
      OPSTATS
  "bookie_WRITE_BYTES":
    "description": |-
      total bytes written to a bookie
    "type": |-
      COUNTER
  "bookie_BOOKIE_ADD_ENTRY":
    "description": |-
      operations stats of AddEntry on a bookie
    "type": |-
      OPSTATS
  "bookie_BOOKIE_RECOVERY_ADD_ENTRY":
    "description": |-
      operation stats of RecoveryAddEntry on a bookie
    "type": |-
      OPSTATS
  "bookie_BOOKIE_ADD_ENTRY_BYTES":
    "description": |-
      bytes stats of AddEntry on a bookie
    "type": |-
      OPSTATS
  "bookie_BOOKIE_FORCE_LEDGER":
    "description": |-
      total force operations occurred on a bookie
    "type": |-
      COUNTER
  "bookie_READ_BYTES":
    "description": |-
      total bytes read from a bookie
    "type": |-
      COUNTER
  "bookie_BOOKIE_READ_ENTRY":
    "description": |-
      operation stats of ReadEntry on a bookie
    "type": |-
      OPSTATS
```

Master Issue: #1786




Reviewers: Ivan Kelly <ivank@apache.org>, Jia Zhai <None>

This closes #1787 from sijie/stats_generator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants