This repository was archived by the owner on Mar 27, 2020. It is now read-only.

Service for gathering and reporting metrics#1

Merged
Scooletz merged 4 commits into develop from program-service
Apr 10, 2017

Conversation

@Scooletz
Contributor

@Scooletz Scooletz commented Apr 5, 2017

Connects to https://github.com/Particular/LaunchWave.DevOpsAdvancedMonitoring/issues/48

This PR implements the Program Service that:

  • uses bootstrapper to be run as a service and as a console
  • consumes messages (for now of type JsonMetricsContext)
  • dispatches them to an asynchronous consumer (ConcurrentQueue)
  • enables registering consumers
  • gathers raw metrics
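
For readers without the diff at hand, the dispatch step listed above might look roughly like the sketch below. This is a minimal illustration, not the PR's code: `QueueDispatcher<T>`, `Register`, `Publish` and `Drain` are hypothetical names, and the real `PublisherConsumer<T>` may be shaped differently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the PR's PublisherConsumer<T>; names are assumptions.
public sealed class QueueDispatcher<T>
{
    readonly ConcurrentQueue<T> queue = new ConcurrentQueue<T>();
    readonly List<Action<T>> consumers = new List<Action<T>>();

    // Consumers are expected to be registered before draining starts.
    public void Register(Action<T> consumer) => consumers.Add(consumer);

    // Called from the message handler; only enqueues, never runs consumers.
    public void Publish(T item) => queue.Enqueue(item);

    // Hands every queued item to every registered consumer.
    // Returns the number of items dispatched in this pass.
    public int Drain()
    {
        var count = 0;
        while (queue.TryDequeue(out var item))
        {
            foreach (var consumer in consumers)
            {
                consumer(item);
            }
            count++;
        }
        return count;
    }
}
```

In a service, `Drain` would presumably be called from a background loop, so that the message handler returns as soon as the item is enqueued.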

@Scooletz
Contributor Author

Scooletz commented Apr 6, 2017

Ping @SzymonPobiega
Could you review it?

/// <summary>
/// The reporting message.
/// </summary>
public class MetricReport : ICommand
Contributor

That could simply be IMessage.

/// <summary>
/// The reporting message.
/// </summary>
public class MetricReport : ICommand
Member

I think we should deserialize directly to JsonMetricsContext instead of JObject. It should be possible if we use the Newtonsoft serializer, as Daniel pointed out.

Contributor

So we assume here that for now we ignore the potential extensions, but potentially will use it in the future?

Member

Extensions?

Contributor

@weralabaj weralabaj Apr 7, 2017

Probably "customizations" would be a better word. I mean that even though officially that could be anything, we still make an assumption this will be the exact type used, and if NSB.Metrics adds/modifies some data, then we'll break. Correct?

Contributor Author

After taking a look at how the raw data was reported with NServiceBusReceivedMetricContext, I think I agree with @weralabaj. The reporting could be done as a TolerantReader that simply extracts the JProperty named Context and stores the whole JObject payload in a dictionary. This would remove our dependency on Metrics.NET, leaving the data as raw as possible.
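
A tolerant reader along these lines could be sketched as follows. The sketch swaps Newtonsoft's `JObject` for the built-in `System.Text.Json.Nodes.JsonObject` so that it needs no package reference; the `TolerantReader` class and `TryStore` method are hypothetical names, and treating `Context` as a string-valued property is an assumption about the payload.

```csharp
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical sketch: reads only the "Context" property and keeps the rest untouched.
public static class TolerantReader
{
    // Stores the whole payload under the value of its "Context" property.
    // Payloads without a string "Context" are skipped instead of throwing;
    // malformed JSON still throws from JsonNode.Parse.
    public static bool TryStore(string json, IDictionary<string, JsonObject> store)
    {
        if (JsonNode.Parse(json) is not JsonObject payload) return false;
        if (!payload.TryGetPropertyValue("Context", out var context)) return false;
        if (context is not JsonValue value || !value.TryGetValue<string>(out var name)) return false;

        // Unknown or added properties travel along inside the stored payload.
        store[name] = payload;
        return true;
    }
}
```

Because only one property is inspected, fields that the metrics plugin adds or renames later would not break the reader, which is the concern raised earlier in this thread.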

Member

I am not sure what you are trying to achieve here. NServiceBus.Metrics depends on Metrics.NET and uses it to generate JSON. We will (most likely) use Metrics.NET here internally to generate derived metrics. What would we gain from pretending we don't use Metrics.NET?

readonly PublisherConsumer<MetricReport> publisherConsumer;

public MetricsHandler(PublisherConsumer<JsonMetricsContext> publisherConsumer)
public MetricsHandler(PublisherConsumer<MetricReport> publisherConsumer)
Member

Maybe out of scope for this PR, but why go through PublisherConsumer instead of injecting NServiceBusReceivedMetricContext into the handler via the container?

Contributor Author

@Scooletz Scooletz Apr 7, 2017

I was planning to register NServiceBusReceivedMetricContext as a consumer, like every other consumer. Would that be OK?

Member

But we don't expect any other consumers, do we?

@weralabaj
Contributor

LGTM

@Scooletz Scooletz changed the base branch from master to develop April 7, 2017 12:22
@Scooletz
Contributor Author

Scooletz commented Apr 7, 2017

Recent commits' changes:

  • code aligned with the new format of the message from the metrics plugin
  • raw data are reported with raw JObject JSON structures
  • Metrics.NET removed, as it is not needed for now.

@Scooletz
Contributor Author

Scooletz commented Apr 7, 2017

Ping @weralabaj @SzymonPobiega

@weralabaj
Contributor

LGTM. As for the type, I think until proven it's impossible/too much effort, we should not rely on Metrics.NET types. ATM it seems it'll be simpler that way.

@SzymonPobiega
Member

👍. Let's get this in.

@Scooletz Scooletz changed the title [WIP] Service for gathering and reporting metrics Service for gathering and reporting metrics Apr 10, 2017
@Scooletz Scooletz merged commit 782aad3 into develop Apr 10, 2017
@Scooletz Scooletz deleted the program-service branch April 14, 2017 08:36
tmasternak added a commit that referenced this pull request Jan 4, 2018
* Service for gathering and reporting metrics (#1)

Service for gathering and reporting metrics

* no assemblyinfo.cs (#5)

* Aligning RawDataProvider and other internal elements to values partially passed as headers. (#8)

* Http endpoint reporting metrics (#7)

* Http endpoint configuration exposed in App.config.

* Throw an exception on bad port.

* Packaging up the transports and exe into a zip for scmu deploy

* Cleanup proj file

* Add standard SC icon

* Roll back Json.Net version as ASQ nuget is locked at lower version

* Fix bug deriving host URI

* use same assemblyresolve as SC so we can avoid polluting app.config

* Fix build

* Remove redundant delete and be more restrictive as to what gets zipped

* Try to resolve gitversiontask build issue by adding mostly empty AssemblyInfo.cs

* Add missing buildsupport files

* Make transport selectable via config

* Move endpoint name to settings

* Clean up unused async/await

* Add hosting commands for run and setup

* Add app.config settings and logic for log path

* Change Nuget to only contain the zip we need to ship with SCMU

* Fix packaging inception. Moved Nuget packager to different cs proj

* Debugging  TC build issue

* Cleanup

* Add Particular Licensing

* Fix logging rules and suppress NSB license log messages

* Update Particular.Licensing.Sources to latest

* Rationalize all settings to common prefix

* Move ConnectionString to dedicated config element

* Cleanup

* Make error queue configurable

* Make Log Path dependent on Endpoint Name

* Ironically fix tests moments after saying everything was all good

* Disable localhost being rewritten as more permissive URLACL

When "localhost" is specified, the default Nancy behaviour is to set the
UrlAcl to "http://+:1234/" rather than "http://localhost:1234". The "+"
is more permissive and allows the URL to be accessed from outside the
current server. With this disabled, the computer name, "+" or "*" must be
explicitly used in the config. This is in line with how SC works.

* Change default log location to current exe's location.  This is only applicable in debug. See Comment

* cleanup

* Introduce queue-length metric (#10)

* http protocol prefix removed from host name in app.config (#18)

* Expose License and Version information via api

* sqlserver transport configuration

* cors headers

* Adding wip version of api that could be consumed by diagrams

* minimal data storage for metrics

* exposing data for diagrams

* Writing test to verify if the buffer works correctly when passing more data than size as well if the data inserted can be retrieved

* Adding test to DiagramDataProvider to verify if it correctly parse given data

* Fixing failing test

* tests clean-up

* pleasing inspections

* handling missing and null metric values in data provider

* test clean-up

* exposing last raw critical and processing times recorded

* removing IConsumeLongValueOccurrences interface

* storing last timing data in 15s buckets

* Removing unused usings and fixing 1 test

* storing CriticalTime

* raw data metrics

* removing IDataConsumer IDataProducer and using NServiceBus handlers instead

* timing store tests

* timing integration tests

* exposing timings in format expected by endpoint overview page

* switching default concurrency limit to 10 (att limit kept at 1)

* switching to NSB default concurrency limit (number of logical cores)

* switching from /driagrams/data to /diagrams

* removing IBuilder dependency from NancyTask and Bootstrapper

* fixing inconsistent namespaces

* using RegisterSingleton instead of RegisterComponent with DependencyLifecycle.SingleInstance

* fixing singleton registration

* better IntervalId calculation

* handling unknown LongValueOccurence message types

* Updating times moved to MeasurementInterval

* better removal of invalid intervals

* introducing base class for nancy modules

* dto for monitored endpoints list

* fixing att - adding proper content-type header value to http client

* removing dead-code

* switching to func in GetOrAdd in TimingStore to prevent unnecessary allocations

* using custom header to specify metric type instead of EnclosedMessageTypes

* switching to newest NServiceBus.Metrics

* Storing timing data per endpoint instance (#30)

* passing instanceId from metrics plugin to metric stores
* timings per instanceId - unfinished
* timings per instanceId
* fixing errors after cherry-picking
* removing redundant .ToArray
* yield return in TimingStore for better readability
* removing un-used parameters in QueueLenghtDataStore
* more readable endpointNames query in TimingAggregator

* Endpoint details API (#31)

* Add initial endpoint details call

* aggregations for physical instances

* timing stores moved from aggregator method params to constructor deps

* monitored endpoint details tests

* pleasing inspections

* Change == to Equals to use overridden Equals implementation

* Add autofac for NSB and Nancy (#32)

* Adding queue length diagram (#29)

* Adding initial storage for queue length, quick and dirty

* Adding EndpointInstanceId

* snapshotting queue-length data

* fixing tests

* message builder in queue-length tests

* test for value snapshotting logic

* improvements to integration tests

* Changing value to double for consistency and making initial values for XAxis

* Implementing changes to new Module structure, finishing off merging

* extracting GetString method in ATT

* extracting common query parts in queue length aggregation

* removing obsolete comment

* iterating over Keys collection changed to iterating over dictionary

* testing queuelength based on monitored endpoint api

* removing queue-length api module

* more precise att for timings

* simplified queue length aggregation logic

* folding UpdateReceiveSequence parameters into VirtualQueueId

* fewer calculations when pushing current queue length values to the endpoint storage

* declarative virtual queue length calculations

* queue length store getting endpointInstanceId instead of endpointName

* More memory friendly time series processing (#34)

* Use pooling for messages

* TimingsStore reimplemented with simpler Measurement and RWL locking.

* New way of creating messages introduced to tests.

* Locking moved inside of the Measurement. Fix in ReportTimeIntervals to properly use TotalMeasurements value

* Proper calculations and assignment of intervals when reporting with TimingStore. Now, the countdown approach is used to traverse as many epochs as needed.

* Tests enhancements

* Iterating over keys and retrieving value one by one replaced by a non-locking enumerator usage.

* EndpointInstanceTimings passed as a param to the reporter.

* Clear method extracted.

* no allocs for no messages

* Minor remarks fixed

* Retries metric (#37)

* adding retries deserializer

* generic message poll

* base class for raw message deserialization

* occurrence message handler

* removing unused parameter from timing store

* interval store independent of incoming message type

* nameology - changing TimingStore to IntervalStore

* retries store

* I see dead code

* retries metrics values exposed via api

* base class for api smoke tests

* missing endpoint name concatenation

* proper aggregation for retries

* fixing att

* Simplified Deserializers

* split aggregator

* removing unused file

* switching from explicit new to container TimingAggregator creation

* fixes to retries aggregation logic

* yet another typo fix

* unknownoccurrencemessagetype as unrecoverable exception

* better att

* base method in rawserializer for entries

* Queue length based on interval store (#40)

* queue length store based on IntervalStore

* removing linear monitored values type

* queue-length snapshotting

* filtering out empty endpoint instances

* proper condition for endpoint filtering

* endpoint tracking moved out of store to handlers

* no duplicates in EndpointsRegistry

* fixes to queue-length interval aggregation logic

* moving endpoint instance tracking to dedicated handler

* yet another fix to queue-length aggregations

* Refactoring IntervalStores (#41)

* refactorings

* refactoring namespace structure

* reverting MetricsReport namespace change

* Namespace adjustments

* QueueLength average calculation now takes into consideration only intervals that had data reported for.

* Simplify get instances

* making ATT behavior more predictable

* better assertion for queue length att

* json response tracking in ATT

* adding delay to timing att handler

* introducing variable history interval store

* history query parameter in api module

* Remove lookup dictionary

* Remove default values

* Remove ToArray

* Switch from concurrent to regular dictionary

* Fix request parameter loading

* Measurements per second introduced to address any throughput (#47)

* Performance tests of core Monitoring components (#42)

* Performance tests with histograms and minor optimizations of hot code paths.

* A message builder extracted for the QueueLength and a few other minor extractions/refactorings

* Performance tests are explicit no more. Arbitrary assertions on the means are added.

* Minor code inspections addressed

* TagExtensions minor refactor. PerformanceTests doubled merging of histograms removed.

* VariableHistoryIntervalStore introduced to performance tests

* Add Request object to nancy pipeline

* Throughput should be calculated like Retries (average: over all periods, data points: summarized by endpoint) and divided by the number of seconds

* Endpoint instance activity recorded and reported in MonitoredEndpointInstance as IsStale property

* Read staleness from config

* Add IsStale to logical endpoint

* Fix tests

* Appease code inspections

* Changing default port to 33633

* fixing failing build

* fixing build second try

* fixing build third try

* tagged longvalueoccurence deserializer

* refactoring deserializer

* missing csproj update

* tag decoder extracted to static field

* IsFull on RawMessage

* Metrics breakdown by message type (#54)

* introducing generic breakdown type to intervalStore

* making variablehistorystore generic

* processing time and critical time broken down by message type

* introducing IProvideBreakdownBy interface

* removing IProvideMetric

* de-serialization fixes

* simplifying tests

* supporting message type tag in retries metrics reports

* deleting test refactoring leftovers

* updating to nsb.metrics alpha-0016

* comparison support for EndpiontMessageType

* simple renamings

* metrics stores with private fields for breakdowns

* retries processing moved to separate handler

* pooled message release moved to behavior

* metrics store lookup moved outside of closure

* message type registry

* generic breakdown registry

* Per message type performance test.

* fixing retries store update logic

* BreakdownRegistry recalculation optimized.

* Percentile properly asked for with '50' rather than '0.5' (#58)

* Percentile properly asked for with '50' rather than '0.5'

* Threshold adjusted

* exposing metrics digests via endpoint details api (#57)

* Renaming collections to be in plural

* Adding graph information to MonitoredEndpointDetails

* better naming

* more generic details query

* Minor fixes after integration tests (#55)

* cope with empty message type registry for logical endpoint

* removing unused metric message types

* correct concurrent access to breakdowns collection

* removing unused using and variable

* detailed metric collection as hash-set

* single interval lookup call on endpoint-details url

* time axis points calculation in separate method

* Clarifying the company name in the license

* Adding Queue length information to be displayed in the large graph (#59)

* Basic Implementation

* Package Learning Transport in zip

* Current interval data not included in interval returned by IntervalStore (#63)

* do not include current interval in returned metrics data

* adding intervalsize as part of public contract (HistoryPeriod)

* better naming for epoch variables

* moving epoch variable declaration closer to usage

* default input queue changed to Particular.Monitoring (#62)

* Retries and Throughput avg counted using unique aggregation intervals (#65)

* Retries and Throughput avg counted using unique aggregation intervals

* test fixes

* Turning off any http caching for proxies and clients (#66)

* http response headers turning off any http caching for proxies and clients

* adding no-store and Pragma: no-cache

* failing fast on critical error (#67)

* Smoke tests (#50)

* ASQ smoke tests

* ASQ tests are green

* Project renamed to make it aligned to ASQ

* Adding tests to RabbitMQ

* ASQ moved to a right directory

* ASB smoke tests introduced. Still, 1 tests to go green.

* Bumping up metrics for ASB makes the failing test green.

* Learning Transport smoke tests added

* SQLServer smoke tested

* Removal of assemblyinfos

* Move to smoke tests folder

* use latest licensing package

* use alpha5 package of operations.licensing

* Update licensing to 1.0.0

* Scheduled retries aggregated to rate per second (#70)

* scheduled retries aggregated to rate per second

* removing unused aggregation function

* [WIP] 1 min history period (#69)

1 min history period

* 1 min period configured with 60 intervals and 2 intervals of delay (#71)

* All history periods with 60 intervals (#72)

* each history period with 60 intervals

* increasing allowed query times in perf tests

* Select default topology for ASB (#74)

* Select default topology

* explicit type override

* fixing overrides lookup

* 150 ms query limit for biggest performance test (#75)

* Disable Legacy Retries satellite (#81)

* exposing all metrics for large graphs view (#73)

* EndpointName configuration option (#82)

* separating endpointName and serviceName configuration values

* endpointName stored in app.config

* More realistic performance tests (#83)

* Queries to the store are now properly delayed, as would happen in a real-world scenario with SP polling for data from time to time.

* Increasing reporters upper bound

* increase query time limit

* Structured message type in API (#78)

* message types broken down

* message type parsing

* type parsing based on AssemblyName

* caching parsed message types

* pleasing inspection checker

* better message type cache lookup

* removing messagetype instance caching

* adding id to MonitoredMessageType for correlation purposes (#84)

* Instance metrics are aggregated by EndpointName and InstanceId (#85)

* instance metrics are aggregated by endpointName and instanceId

* aggregation test

* switching version to beta (#86)

* ReceiveOnly transaction mode (#88)

* Return instance name instead of logical endpoint name (#89)

* return instance name instead of logical endpoint name

* fixing badly constructed endpointInstanceId

3 participants