In [20]:
%reload_ext Kqlmagic

In [21]:
%kql AzureDataExplorer://code;cluster='vso';database='vso'

In [22]:
_Service_ = 'kalypso'
_ScaleUnit_ = 'kalypso-wcus-1'
_AlertTime_ = '2019-09-20 20:45'
_UpperOutlier_ = 5
_LowerOutlier_ = -5
_RunResult_ = 'Succeeded'

In [23]:
dynamic_tsg_transition_scope = ['Succeeded', 'UserError', 'PlatformError']

In [24]:
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))

In [25]:
printmd("# TSG : Monitor State Transition : {0}".format(_RunResult_))
printmd("## Related Monitors")
printmd("This alert fires when the kalypso monitor [Monitor Transition](https://kalypsocus.vstskalypso.visualstudio.com//summary?containerId=5a0432f7-b311-4c7e-810d-3152b0215c8c&selectedPage=summary&monitorId=59787740-e5d6-4740-abdc-df6ef1ec8488) detects an [anomaly](https://kusto.azurewebsites.net/docs/query/series-outliersfunction.html?q=series_outlie) in monitor runs: a sudden increase in the number of monitors failing with PlatformError or UserError, or a sudden decrease in the number of monitors succeeding.")

# TSG : Monitor State Transition : Succeeded

## Related Monitors

This alert fires when the kalypso monitor [Monitor Transition](https://kalypsocus.vstskalypso.visualstudio.com//summary?containerId=5a0432f7-b311-4c7e-810d-3152b0215c8c&selectedPage=summary&monitorId=59787740-e5d6-4740-abdc-df6ef1ec8488) detects an [anomaly](https://kusto.azurewebsites.net/docs/query/series-outliersfunction.html?q=series_outlie) in monitor runs: a sudden increase in the number of monitors failing with PlatformError or UserError, or a sudden decrease in the number of monitors succeeding.

In [26]:
printmd("## Reason for current monitor alert")
printmd("The present monitor run has detected an anomaly in the number of monitor runs with result <span style='color:red;font-weight:bold'>{}</span>".format(_RunResult_))


## Reason for current monitor alert

The present monitor run has detected an anomaly in the number of monitor runs with result <span style='color:red;font-weight:bold'>Succeeded</span>

In [27]:
if _RunResult_ not in dynamic_tsg_transition_scope:
    printmd("Transition type, <b>{}</b>, is not supported in this dynamic TSG.".format(_RunResult_))
    

In [28]:
if _RunResult_ == "Succeeded":
    printmd("When this alert fires, usually the <b>UserError state transition</b> monitor and/or the <b>PlatformError state transition</b> monitor also fires an alert.")
    printmd("* If one of these two fired, please refer to the corresponding state transition monitor and STOP HERE (ignore the below analysis)")
    printmd("* Otherwise, continue with the below investigation")

When this alert fires, usually the <b>UserError state transition</b> monitor and/or the <b>PlatformError state transition</b> monitor also fires an alert.

* If one of these two fired, please refer to the corresponding state transition monitor and STOP HERE (ignore the below analysis)

* Otherwise, continue with the below investigation

In [29]:
printmd("In the figure below, outlierPossibility measures how likely each failMonitorCount value is to be an outlier. The outlierPossibility thresholds are {} (upper) and {} (lower). This alert fired because outlierPossibility exceeded the upper threshold or dropped below the lower threshold.".format(_UpperOutlier_, _LowerOutlier_))

In the figure below, outlierPossibility measures how likely each failMonitorCount value is to be an outlier. The outlierPossibility thresholds are 5 (upper) and -5 (lower). This alert fired because outlierPossibility exceeded the upper threshold or dropped below the lower threshold.
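
The threshold rule described above can be sketched in plain Python. This is only an illustration: the function names are hypothetical, the choice of threshold mirrors the `Threshold` column computed in the query that follows, and the strict comparisons are an assumption, since the alert rule's exact comparison is not shown in this TSG.

```python
# Run results whose anomaly is a spike (too many runs). Every other result
# is treated as a drop (too few runs), mirroring the iff() in the KQL query.
SPIKE_RESULTS = {"UserError", "PlatformError"}


def pick_threshold(run_result, upper_outlier, lower_outlier):
    """Return the threshold that applies to the given run result."""
    return upper_outlier if run_result in SPIKE_RESULTS else lower_outlier


def breaches_threshold(score, run_result, upper_outlier=5, lower_outlier=-5):
    """True when an outlierPossibility score crosses the applicable threshold.

    Assumption: strict comparison (> upper, < lower).
    """
    threshold = pick_threshold(run_result, upper_outlier, lower_outlier)
    if run_result in SPIKE_RESULTS:
        return score > threshold
    return score < threshold
```

For a `Succeeded` alert, only scores below the lower threshold count as a breach, so a large positive score is not treated as an anomaly.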

In [30]:
%%kql
let _alertTime_ = _AlertTime_;
let endTime = todatetime(_alertTime_);
let _upperOutlier_ = _UpperOutlier_;
let _lowerOutlier_ = _LowerOutlier_;
let _service_ = _Service_;
let _scaleUnit_ = _ScaleUnit_;
let _runResult_ = _RunResult_;
JobHistory
| where TIMESTAMP > bin(endTime - 1d, 5m) and TIMESTAMP <= bin(endTime - 30m, 5m) // take proper time range to avoid spikes around the time boundary
| where Service == _service_
| where ScaleUnit == _scaleUnit_
| where Plugin == "Microsoft.VisualStudio.Services.Kalypso.Jobs.ZeroRowMonitor"
| where ResultMessage contains "Monitor run result" 
| project TIMESTAMP, JobName, Plugin, ResultMessage 
| extend RunResult = case(ResultMessage contains "Monitor run result: Succeeded", "Succeeded",
ResultMessage contains "Monitor run result: SignaledWithError", "SignaledWithError",
ResultMessage contains "Monitor run result: Signaled", "Signaled",
ResultMessage contains "Monitor run result: UserError", "UserError",
ResultMessage contains "Monitor run result: PlatformError", "PlatformError",
"Default")
| where RunResult == _runResult_
| summarize count() by bin(TIMESTAMP, 5m)
| summarize OutlierTimeStamp=makelist(TIMESTAMP, 50000), failMonitorCount = makelist(count_, 50000) 
| extend outlierPossibility=series_outliers(failMonitorCount)
| mvexpand OutlierTimeStamp to typeof(datetime), failMonitorCount to typeof(double), outlierPossibility to typeof(double) limit 50000
| extend Threshold = iff(_runResult_ == "UserError" or _runResult_ == "PlatformError", _upperOutlier_, _lowerOutlier_)
| render timechart with (title = "Number of monitors failing over last 24 hours", ycolumns = failMonitorCount, outlierPossibility, Threshold)

[Figure: time chart "Number of monitors failing over last 24 hours" with series failMonitorCount, outlierPossibility, and Threshold]
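
To see which time range the chart query actually covers, the two `bin()` bounds can be reproduced in Python. This is a rough sketch with hypothetical helper names; it assumes bin widths that divide a day evenly (such as 5 minutes) and the `YYYY-MM-DD HH:MM` alert-time format used in this notebook.

```python
from datetime import datetime, timedelta


def floor_to_bin(ts, width):
    """Round a datetime down to a multiple of width, similar to Kusto's bin()."""
    return ts - (ts - datetime.min) % width


def chart_window(alert_time,
                 lookback=timedelta(days=1),
                 holdback=timedelta(minutes=30),
                 width=timedelta(minutes=5)):
    """Return the (exclusive start, inclusive end) bounds used by the query."""
    end_time = datetime.strptime(alert_time, "%Y-%m-%d %H:%M")
    start = floor_to_bin(end_time - lookback, width)
    end = floor_to_bin(end_time - holdback, width)
    return start, end


start, end = chart_window("2019-09-20 20:45")
print(start, end)
```

The 30-minute holdback keeps the most recent, possibly incomplete, bins out of the series, so they cannot skew the outlier scores near the boundary.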

## Top 5 Exceptions thrown by monitors failing with UserError
The ProductTrace table logs the exceptions that occur during monitor execution. Let's examine ProductTrace to find the top exceptions thrown by failing monitors.

In [31]:
%%kql
let _alertTime_ = _AlertTime_;
let endTime = todatetime(_alertTime_);
let _upperOutlier_ = _UpperOutlier_;
let _lowerOutlier_ = _LowerOutlier_;
let _service_ = _Service_;
let _scaleUnit_ = _ScaleUnit_;
let _runResult_ = _RunResult_;
ProductTrace
| where Service == _service_ and ScaleUnit == _scaleUnit_ and ExceptionType != ""
| where TIMESTAMP > endTime - 1h and TIMESTAMP < endTime
| join kind=inner (
    JobHistory
    | where Service == _service_ and ScaleUnit == _scaleUnit_
    | where TIMESTAMP > todatetime(_alertTime_) - 1h
    | where Plugin == "Microsoft.VisualStudio.Services.Kalypso.Jobs.ZeroRowMonitor"
    | where ResultMessage contains "Monitor run result" 
    | project ActivityId, TIMESTAMP, JobName, Plugin, ResultMessage 
    | extend RunResult = case(ResultMessage contains "Monitor run result: Succeeded", "Succeeded",
        ResultMessage contains "Monitor run result: SignaledWithError", "SignaledWithError",
        ResultMessage contains "Monitor run result: Signaled", "Signaled",
        ResultMessage contains "Monitor run result: UserError", "UserError",
        ResultMessage contains "Monitor run result: PlatformError", "PlatformError",
        "Default")
    | where RunResult == _runResult_
) on ActivityId
| project TIMESTAMP, ExceptionType, Message, JobName, ResultMessage
| summarize count() by ExceptionType
| top 5 by count_

ExceptionType,count_
Microsoft.TeamFoundation.Framework.Server.HostShutdownException,3
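
The `case()` expression that the queries use to derive `RunResult` depends on the order of its branches. A small Python sketch (function and constant names are hypothetical) makes that ordering explicit: `SignaledWithError` has to be tested before `Signaled`, because the shorter marker is a prefix of the longer one.

```python
# Ordered (marker, label) pairs, in the same order as the KQL case() branches.
RUN_RESULT_MARKERS = [
    ("Monitor run result: Succeeded", "Succeeded"),
    ("Monitor run result: SignaledWithError", "SignaledWithError"),
    ("Monitor run result: Signaled", "Signaled"),
    ("Monitor run result: UserError", "UserError"),
    ("Monitor run result: PlatformError", "PlatformError"),
]


def classify_run_result(result_message):
    """Return the first matching label, or "Default" when nothing matches."""
    lowered = result_message.lower()  # KQL 'contains' is case-insensitive
    for marker, label in RUN_RESULT_MARKERS:
        if marker.lower() in lowered:
            return label
    return "Default"
```

Swapping the `Signaled` and `SignaledWithError` entries would classify every `SignaledWithError` run as `Signaled`.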


In [32]:
df_exception = _kql_raw_result_.to_dataframe()
top_5_exception_types = df_exception['ExceptionType'].tolist()
if "Kusto.Data.Exceptions.KustoClientTimeoutException" in top_5_exception_types or "Kusto.Data.Exceptions.KustoClientException" in top_5_exception_types:
    printmd("The issue is on the Kusto side. Check the health of the cluster. If the cluster is unhealthy, raise a ticket against the Kusto team.")
elif "Kusto.Data.Exceptions.SyntaxException" in top_5_exception_types or "Kusto.Data.Exceptions.KustoBadRequestException" in top_5_exception_types or "Kusto.Data.Exceptions.EntityNotFoundException" in top_5_exception_types:
    printmd("Bad monitors might have been added. Monitors are failing because the trigger query has syntax errors. As a mitigation step, work with monitor owners to disable the monitor and ask them to fix the query.")
elif "Kusto.Data.Exceptions.KustoRequestDeniedException" in top_5_exception_types:
    printmd("The app ID used to query data from the Kusto endpoint doesn't have enough permissions. Work with the monitor owner to add kalypso's appId / the appId stored in keyvault to the Kusto cluster.")
elif "Kusto.Data.Exceptions.SemanticException" in top_5_exception_types:
    printmd("The top exception thrown is SemanticException. This might be caused by a user error or an access issue. Follow the steps below to mitigate the issue.")
else:
    printmd("Unknown exception. Please investigate further and update the dynamic TSG")

Unknown exception. Please investigate further and update the dynamic TSG
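
As the list of known exception types grows, the if/elif chain above could be expressed as an ordered rule table. The sketch below is one possible shape, not the notebook's actual implementation: the short guidance keys are hypothetical placeholders for the full messages, and the rule order reproduces the priority of the original chain.

```python
# Ordered (exception types, guidance key) rules; the first match wins.
TRIAGE_RULES = [
    ({"Kusto.Data.Exceptions.KustoClientTimeoutException",
      "Kusto.Data.Exceptions.KustoClientException"}, "kusto-health"),
    ({"Kusto.Data.Exceptions.SyntaxException",
      "Kusto.Data.Exceptions.KustoBadRequestException",
      "Kusto.Data.Exceptions.EntityNotFoundException"}, "bad-query"),
    ({"Kusto.Data.Exceptions.KustoRequestDeniedException"}, "permissions"),
    ({"Kusto.Data.Exceptions.SemanticException"}, "semantic"),
]


def triage(exception_types):
    """Return the guidance key of the first rule that matches any seen type."""
    seen = set(exception_types)
    for known_types, guidance in TRIAGE_RULES:
        if seen & known_types:
            return guidance
    return "unknown"
```

With the result above, `triage(["Microsoft.TeamFoundation.Framework.Server.HostShutdownException"])` falls through to `"unknown"`, matching the message printed by the cell.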

In [33]:
printmd("Examine the error message in the stack trace.")

Examine the error message in the stack trace.

In [34]:
%%kql
let _alertTime_ = _AlertTime_;
let endTime = todatetime(_alertTime_);
let _upperOutlier_ = _UpperOutlier_;
let _lowerOutlier_ = _LowerOutlier_;
let _service_ = _Service_;
let _scaleUnit_ = _ScaleUnit_;
let _runResult_ = _RunResult_;
let _exceptionTypes_ = top_5_exception_types;
ProductTrace
| where Service == _service_ and ScaleUnit == _scaleUnit_ and ExceptionType != ""
| where TIMESTAMP > endTime - 1h and TIMESTAMP < endTime
| join kind=inner (
    JobHistory
    | where Service == _service_ and ScaleUnit == _scaleUnit_
    | where TIMESTAMP > endTime - 1h and TIMESTAMP < endTime
    | where Plugin == "Microsoft.VisualStudio.Services.Kalypso.Jobs.ZeroRowMonitor"
    | where ResultMessage contains "Monitor run result" 
    | project ActivityId, TIMESTAMP, JobName, Plugin, ResultMessage 
    | extend RunResult = case(ResultMessage contains "Monitor run result: Succeeded", "Succeeded",
        ResultMessage contains "Monitor run result: SignaledWithError", "SignaledWithError",
        ResultMessage contains "Monitor run result: Signaled", "Signaled",
        ResultMessage contains "Monitor run result: UserError", "UserError",
        ResultMessage contains "Monitor run result: PlatformError", "PlatformError",
        "Default")
    | where RunResult == _runResult_
) on ActivityId
| project TIMESTAMP, ExceptionType, Message, JobName, ResultMessage
| where ExceptionType in (_exceptionTypes_)
| limit 10

TIMESTAMP,ExceptionType,Message,JobName,ResultMessage
2019-09-20 19:56:54.829144+00:00,Microsoft.TeamFoundation.Framework.Server.HostShutdownException,"Microsoft.TeamFoundation.Framework.Server.HostShutdownException: VS403201: The organization is currently offline as it is being moved to another enterprise. Once the move has been completed the organization will come back online. Try accessing the organization at a later time. Activity Id: 7b36f2c8-498a-4dcd-a474-d4ff478bf08c.  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<HandleResponseAsync>d__53.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\VssHttpClientBase.cs:line 935 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__51.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\VssHttpClientBase.cs:line 883 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__47`1.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\VssHttpClientBase.cs:line 755 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.Location.Client.LocationHttpClient.<GetConnectionDataAsync>d__6.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\HttpClients\LocationHttpClient.cs:line 73 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at 
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.Location.Server.RemoteLocationDataProvider.FetchLocationData(IVssRequestContext requestContext) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\RemoteLocationDataProvider.cs:line 297  at Microsoft.VisualStudio.Services.Location.Server.LocationDataProvider.<GetLocationData>b__43_0(IVssRequestContext request, String key) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationDataProvider.cs:line 1395  at Microsoft.VisualStudio.Services.Location.Server.LocationDataCache`1.GetLocationData(IVssRequestContext requestContext, T cacheKeyIdentifier, Func`3 loadData) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationCache.cs:line 152  at Microsoft.VisualStudio.Services.Location.Server.LocationDataProvider.GetLocationData(IVssRequestContext requestContext) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationDataProvider.cs:line 1382  at Microsoft.VisualStudio.Services.Location.Server.RemoteLocationDataProvider..ctor(IVssRequestContext requestContext, ILocationDataCache`1 locationCache, String remoteLocationUrl) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\RemoteLocationDataProvider.cs:line 41  at Microsoft.VisualStudio.Services.Location.Server.LocationService.CreateRemoteDataProvider(IVssRequestContext requestContext, String location) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationService.cs:line 404  at Microsoft.VisualStudio.Services.Location.Server.LocationService.ResolveLocationData(IVssRequestContext requestContext, Guid serviceAreaIdentifier) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationService.cs:line 312  at Microsoft.VisualStudio.Services.Location.Server.LocationService.GetLocationData(IVssRequestContext requestContext, Guid serviceAreaIdentifier, Boolean throwOnMissingArea) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationService.cs:line 221  at 
Microsoft.TeamFoundation.Framework.Server.ClientProvider.CreateClient(IVssRequestContext requestContext, Type requestedType, Guid serviceAreaId, Guid serviceIdentifier) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\BusinessLogic\ClientProvider.cs:line 355  at Microsoft.TeamFoundation.Framework.Server.ClientProvider.GetClientImpl[T](IVssRequestContext requestContext, Guid serviceAreaId, Guid serviceIdentifier) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\BusinessLogic\ClientProvider.cs:line 274  at Microsoft.TeamFoundation.Framework.Server.ClientProvider.GetClient[T](IVssRequestContext requestContext) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\BusinessLogic\ClientProvider.cs:line 222  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityStore.<>c__DisplayClass27_1.<ReadIdentities>b__3() in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityStore.cs:line 1277  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityStore.SafeReadIdentities(IVssRequestContext requestContext, Func`1 happyPath, Func`1 onExceptionResults) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityStore.cs:line 1550  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityStore.ReadIdentities(IVssRequestContext requestContext, IdentityDomain hostDomain, IList`1 identityIds, QueryMembership queryMembership, IEnumerable`1 propertyNameFilters, Boolean includeRestrictedVisibility) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityStore.cs:line 1261  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityService.ReadIdentities(IVssRequestContext requestContext, IList`1 identityIds, QueryMembership queryMembership, IEnumerable`1 propertyNameFilters, Boolean includeRestrictedVisibility) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityService.cs:line 886  at Microsoft.VisualStudio.Services.Identity.CompositeIdentityService.ReadIdentities(IVssRequestContext requestContext, IList`1 identityIds, QueryMembership queryMembership, 
IEnumerable`1 propertyNameFilters, Boolean includeRestrictedVisibility) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\CompositeIdentityService.cs:line 534  at Microsoft.VisualStudio.Services.Kalypso.Server.IdentityHelper.GetIdentities(IVssRequestContext requestContext, IdentityService identityService, IEnumerable`1 identityIds) in D:\v2.0\P1\_work\7\s\Kalypso\Service\Server\IdentityHelper.cs:line 127  at Microsoft.VisualStudio.Services.Kalypso.Server.IdentityHelper.QueryIdentities(IVssRequestContext context, IList`1 identityIds) in D:\v2.0\P1\_work\7\s\Kalypso\Service\Server\IdentityHelper.cs:line 27  at Microsoft.VisualStudio.Services.Kalypso.Server.ServiceHelpers.FillIdentityInformation(IVssRequestContext requestContext, ItemMetadata itemMetadata) in D:\v2.0\P1\_work\7\s\Kalypso\Service\Server\Health\Monitors\ServiceHelpers.cs:line 21","Monitor: name: VssHealthAgent auto-mitigation, category: QueryResultMonitor",Running ZeroRowMonitor: VssHealthAgent auto-mitigation. container: OrgSearch RunId: 00000000-0000-0000-0000-000000000000 Stage 1 of 1 Monitor run result: Succeeded
2019-09-20 19:56:54.828661+00:00,Microsoft.TeamFoundation.Framework.Server.HostShutdownException,"Microsoft.TeamFoundation.Framework.Server.HostShutdownException: VS403201: The organization is currently offline as it is being moved to another enterprise. Once the move has been completed the organization will come back online. Try accessing the organization at a later time. Activity Id: 7b36f2c8-498a-4dcd-a474-d4ff478bf08c.  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<HandleResponseAsync>d__53.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\VssHttpClientBase.cs:line 935 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__51.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\VssHttpClientBase.cs:line 883 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__47`1.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\VssHttpClientBase.cs:line 755 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.Location.Client.LocationHttpClient.<GetConnectionDataAsync>d__6.MoveNext() in D:\v2.0\P1\_work\7\s\Vssf\Client\WebApi\HttpClients\LocationHttpClient.cs:line 73 --- End of stack trace from previous location where exception was thrown ---  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()  at 
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)  at Microsoft.VisualStudio.Services.Location.Server.RemoteLocationDataProvider.FetchLocationData(IVssRequestContext requestContext) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\RemoteLocationDataProvider.cs:line 297  at Microsoft.VisualStudio.Services.Location.Server.LocationDataProvider.<GetLocationData>b__43_0(IVssRequestContext request, String key) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationDataProvider.cs:line 1395  at Microsoft.VisualStudio.Services.Location.Server.LocationDataCache`1.GetLocationData(IVssRequestContext requestContext, T cacheKeyIdentifier, Func`3 loadData) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationCache.cs:line 152  at Microsoft.VisualStudio.Services.Location.Server.LocationDataProvider.GetLocationData(IVssRequestContext requestContext) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationDataProvider.cs:line 1382  at Microsoft.VisualStudio.Services.Location.Server.RemoteLocationDataProvider..ctor(IVssRequestContext requestContext, ILocationDataCache`1 locationCache, String remoteLocationUrl) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\RemoteLocationDataProvider.cs:line 41  at Microsoft.VisualStudio.Services.Location.Server.LocationService.CreateRemoteDataProvider(IVssRequestContext requestContext, String location) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationService.cs:line 404  at Microsoft.VisualStudio.Services.Location.Server.LocationService.ResolveLocationData(IVssRequestContext requestContext, Guid serviceAreaIdentifier) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationService.cs:line 312  at Microsoft.VisualStudio.Services.Location.Server.LocationService.GetLocationData(IVssRequestContext requestContext, Guid serviceAreaIdentifier, Boolean throwOnMissingArea) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Location\LocationService.cs:line 221  at 
Microsoft.TeamFoundation.Framework.Server.ClientProvider.CreateClient(IVssRequestContext requestContext, Type requestedType, Guid serviceAreaId, Guid serviceIdentifier) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\BusinessLogic\ClientProvider.cs:line 355  at Microsoft.TeamFoundation.Framework.Server.ClientProvider.GetClientImpl[T](IVssRequestContext requestContext, Guid serviceAreaId, Guid serviceIdentifier) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\BusinessLogic\ClientProvider.cs:line 274  at Microsoft.TeamFoundation.Framework.Server.ClientProvider.GetClient[T](IVssRequestContext requestContext) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\BusinessLogic\ClientProvider.cs:line 222  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityStore.<>c__DisplayClass27_1.<ReadIdentities>b__3() in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityStore.cs:line 1277  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityStore.SafeReadIdentities(IVssRequestContext requestContext, Func`1 happyPath, Func`1 onExceptionResults) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityStore.cs:line 1550  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityStore.ReadIdentities(IVssRequestContext requestContext, IdentityDomain hostDomain, IList`1 identityIds, QueryMembership queryMembership, IEnumerable`1 propertyNameFilters, Boolean includeRestrictedVisibility) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityStore.cs:line 1261  at Microsoft.VisualStudio.Services.Identity.FrameworkIdentityService.ReadIdentities(IVssRequestContext requestContext, IList`1 identityIds, QueryMembership queryMembership, IEnumerable`1 propertyNameFilters, Boolean includeRestrictedVisibility) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\FrameworkIdentityService.cs:line 886  at Microsoft.VisualStudio.Services.Identity.CompositeIdentityService.ReadIdentities(IVssRequestContext requestContext, IList`1 identityIds, QueryMembership queryMembership, 
IEnumerable`1 propertyNameFilters, Boolean includeRestrictedVisibility) in D:\v2.0\P1\_work\7\s\Vssf\Sdk\Server\Identity\CompositeIdentityService.cs:line 499","Monitor: name: VssHealthAgent auto-mitigation, category: QueryResultMonitor",Running ZeroRowMonitor: VssHealthAgent auto-mitigation. container: OrgSearch RunId: 00000000-0000-0000-0000-000000000000 Stage 1 of 1 Monitor run result: Succeeded
2019-09-20 19:56:54.824423+00:00,Microsoft.TeamFoundation.Framework.Server.HostShutdownException,,"Monitor: name: VssHealthAgent auto-mitigation, category: QueryResultMonitor",Running ZeroRowMonitor: VssHealthAgent auto-mitigation. container: OrgSearch RunId: 00000000-0000-0000-0000-000000000000 Stage 1 of 1 Monitor run result: Succeeded


In [35]:
if "Kusto.Data.Exceptions.SemanticException" in top_5_exception_types:
    printmd('If the message has the format "Principal xxxxxx is not authorized to access database dbxxxxx.", add the principal ID it mentions to the database, or report the issue to the Kalypso V-Team at kalypsovt@microsoft.com.')

### Distribution of exceptions
To understand the severity of the issue, let's look at the number of monitors and containers that are affected.

In [36]:
%%kql
let _alertTime_ = _AlertTime_;
let endTime = todatetime(_alertTime_);
let _upperOutlier_ = _UpperOutlier_;
let _lowerOutlier_ = _LowerOutlier_;
let _service_ = _Service_;
let _scaleUnit_ = _ScaleUnit_;
let _runResult_ = _RunResult_;
JobHistory
| where TIMESTAMP > bin(endTime - 1h, 5m) and TIMESTAMP <= bin(endTime - 30m, 5m) // take proper time range to avoid spikes around the time boundary
| where Service == _service_
| where ScaleUnit == _scaleUnit_
| where Plugin == "Microsoft.VisualStudio.Services.Kalypso.Jobs.ZeroRowMonitor"
| where ResultMessage contains "Monitor run result" 
| project TIMESTAMP, JobName, Plugin, ResultMessage 
| extend RunResult = case(ResultMessage contains "Monitor run result: UserError", "UserError",
ResultMessage contains "Monitor run result: PlatformError", "PlatformError",
"Default")
| where RunResult == _runResult_
| parse kind=regex ResultMessage with "Running ZeroRowMonitor: "monitor_name". container: "container_name
| summarize by monitor_name, container_name
| summarize NumberOfImpactedMonitors = dcount(monitor_name), NumberOfImpactedContainers = dcount(container_name)

NumberOfImpactedMonitors,NumberOfImpactedContainers
0,0
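
The extraction of monitor and container names can be checked locally against a few `ResultMessage` values. The sketch below uses hypothetical names and assumes that a container name is a single token without whitespace, as in the sample rows above; the KQL `parse` in the query does not make that assumption. It also counts exactly, whereas `dcount` in Kusto is an estimate.

```python
import re

# Assumption: the container name ends at the first whitespace character.
MESSAGE_PATTERN = re.compile(
    r"Running ZeroRowMonitor: (?P<monitor>.*?)\. container: (?P<container>\S+)"
)


def impacted_counts(result_messages):
    """Return (distinct monitor names, distinct container names)."""
    monitors, containers = set(), set()
    for message in result_messages:
        match = MESSAGE_PATTERN.search(message)
        if match is None:
            continue  # skip messages that do not follow the expected layout
        monitors.add(match.group("monitor"))
        containers.add(match.group("container"))
    return len(monitors), len(containers)
```

An empty input yields `(0, 0)`, the same shape as the query result shown above.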
