# Overview
The source for this notebook can be found at https://github.com/NASA-NAVO/servicemon/blob/main/servicemon/analysis/notebooks/ExplorePerformanceData.ipynb.

NAVO has started regularly querying some TAP and Cone Search services to collect data on their response times.  So far these are mostly NAVO services, but a CDS 2MASS cone search is included for comparison.  (Some Chandra Source Catalog queries are also run, but because of that catalog's sparse sky coverage these need to be adjusted.)

The queries are done using the `servicemon` application (https://servicemon.readthedocs.io/en/latest/), and are executed from several different locations.  The AWS instrumentation is handled with the software at https://github.com/NASA-NAVO/AWS_servicemon.  The results are written to a TAP-accessible database currently running at IPAC.  

### Collaborating
Now that anyone can examine the monitoring data and run additional tests, anyone can contribute:

 - Analyzing response data.
  - Explaining anomalous or poor measurements (is the issue with the service, network, servicemon, etc.?)
  - Addressing major performance issues.
  
 - Developing other plots, analysis or alerts.
  - What plots (like those below?) are worth posting somewhere on the web?
 
 - Maintaining the operational monitoring.
 
 - Monitoring parameters
  - Should services be added/removed from the monitoring list?
  - Is the current cadence OK?
  - Do we agree that we should add in non-random queries so we have more concrete comparison data?
  - TAP queries are currently all async.  We've found this adds significant overhead in some cases.  Should routine or one-off monitoring include sync TAP queries?
  
  
 - Open Development participation in `servicemon` and `AWS_servicemon`.
  - Filing, discussing and prioritizing issues.
  - Bug fix and feature development.
  
### Known Issues/Action Items
Short term:
 - We should probably add some non-random queries to the suite.

Longer term:
 - Consider controlling inputs on a per service basis due to differences in density and coverage.
 - Support non-positional queries (TAP, DataLink, non-VO services)
 
# What Tests Are Run
All of the parameters of the queries are configurable, but below is what is currently running.  TAP queries now are all async.
### Services
| base_name | service_type |
| --- | --- |
| CDS_2MASS | cone |   
| Chandra_CSC | cone |  
| Chandra_CSC | tap |   
| HEASARC_swiftmastr | cone |  
| HEASARC_swiftmastr | tap |  
| HEASARC_xmmssc | cone |   
| HEASARC_xmmssc | tap | 
| IPAC_2MASS | cone |  
| IPAC_2MASS | tap |  
| IPAC_WISE | cone |   
| IPAC_WISE | tap | 
| NED_NED | cone |  
| NED_NED | tap | 
| STScI_2MASS | cone |  
| STScI_PanSTARRS | tap |  
| STScI_PanSTARRS | xcone |  
| STScI_ObsTAP | tap |  
| STScI_WISE | cone |

### When and what cones?
A set of 10 random cone queries, with radii ranging from 0 to 0.25 degrees, is run for each service every 6 hours.  The exact hours are staggered by location.

We should change this to include (or only use) fixed cones, so that we can compare the exact same queries over time.  (`servicemon` can be run with fixed or random targets.)
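
To make the random-versus-fixed distinction concrete, here is a minimal sketch of how a batch of cones like the one described above could be drawn. The names `random_cone` and `cone_batch` are hypothetical, and the uniform-over-the-sky sampling is an assumption made for this example; `servicemon` may pick its targets differently. Passing a seed shows one simple way to get the same cones on every run.

```python
import math
import random

MAX_RADIUS_DEG = 0.25  # upper bound on the cone radius quoted above


def random_cone(rng):
    """Return (ra, dec, sr) in degrees for one random cone (illustrative only)."""
    ra = rng.uniform(0.0, 360.0)
    # Sampling sin(dec) uniformly gives equal probability per unit of sky area.
    dec = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    sr = rng.uniform(0.0, MAX_RADIUS_DEG)
    return ra, dec, sr


def cone_batch(n=10, seed=None):
    """Return n cones; a fixed seed reproduces the same cones every time."""
    rng = random.Random(seed)
    return [random_cone(rng) for _ in range(n)]


if __name__ == "__main__":
    for ra, dec, sr in cone_batch(3, seed=42):
        print(f"ra={ra:.6f} dec={dec:.6f} sr={sr:.6f}")
```

With `seed=None` each call yields a new set of cones, which corresponds to the current random behavior; any fixed seed yields a repeatable set that can be compared over time.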
### From Where
The queries are run from the following AWS regions:

'ap-northeast-1', 'ap-southeast-2', 'eu-west-3', 'sa-east-1', 'us-east-1','us-west-2'

Due to testing, the database may also contain scattered results from other locations.
                
# Result Data Available via TAP
The TAP service at http://navo01.ipac.caltech.edu/TAP has a table called `navostats2` with one row per query run by servicemon. This table contains data starting on about April 6, 2021.  

<div class="alert alert-block alert-info">
    <p>For legacy data from February and March 2021, there is also an older table called navostats which contains results from Feb 2, 2021 to Mar 27, 2021, with slightly different column names as detailed in <a href="https://github.com/NASA-NAVO/servicemon/issues/47">https://github.com/NASA-NAVO/servicemon/issues/47</a>. </p>
    <p>For more information, please see <a href="https://nasa-navo.github.io/ExplorePerformanceData_original_columns.html">https://nasa-navo.github.io/ExplorePerformanceData_original_columns.html</a></p>
</div>

__Note:__  The VOSI endpoints have not yet been implemented for this service, so PyVO and Topcat will complain during metadata gathering.  Both can still be used to query the service, and all the `TAP_SCHEMA` tables are implemented, so those can be used to query metadata.
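
As a rough illustration of querying metadata through `TAP_SCHEMA`, the sketch below builds an ADQL query against the standard `TAP_SCHEMA.columns` table and runs it with PyVO. The helper names `columns_query` and `fetch_columns` are hypothetical, and it is an assumption that the table is registered in `TAP_SCHEMA` under the bare name `navostats2` (it might carry a schema prefix). PyVO may still emit warnings because of the missing VOSI endpoints.

```python
TAP_URL = "http://navo01.ipac.caltech.edu/TAP"  # service URL given above


def columns_query(table_name):
    """Build an ADQL query listing the columns of one table via TAP_SCHEMA."""
    safe = table_name.replace("'", "''")  # escape single quotes for ADQL
    return (
        "SELECT column_name, datatype, description "
        "FROM TAP_SCHEMA.columns "
        f"WHERE table_name = '{safe}'"
    )


def fetch_columns(table_name, url=TAP_URL):
    """Run the metadata query with PyVO (needs network access and pyvo)."""
    import pyvo  # imported lazily so the query builder works without pyvo

    service = pyvo.dal.TAPService(url)
    return service.search(columns_query(table_name)).to_table()


if __name__ == "__main__":
    print(columns_query("navostats2"))
```

Calling `fetch_columns("navostats2")` from an environment with network access should return an astropy table of column metadata, provided the assumed table name matches what the service stores.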

The following columns are available:

## Query Description
#### Query Input
| __column_name__ | __datatype__ | __format__ | __description__ |
| --- | --- | --- | --- |
| __`ra`__ | _double_ | _20.6f_ | Right Ascension of the query cone region. |
| __`dec`__ | _double_ | _20.6f_ | Declination of the query cone region. |
| __`sr`__  | _double_ | _20.6f_ | Radius of the query cone region (deg). |
| __`adql`__ | _char_ | _300s_ | For TAP queries this is the full ADQL query that was done.  Empty for non-TAP queries. |

#### Other Query Metadata
| __column_name__ | __datatype__ | __format__ | __description__ |
| --- | --- | --- | --- |
| __`access_url`__ | _char_ | _300s_ | The base URL of the service. |
| __`base_name`__ | _char_ | _20s_ | A short name of the service given by the `servicemon` configuration files.  Not yet consistent for all services. |
| __`service_type`__ | _char_ | _20s_ | While other values are possible, the main service types we're tracking now are _tap_, _cone_, and _xcone_ which is like cone, but not VO-compliant. |
| __`location`__ | _char_ | _80s_ | Self-declared location of the monitoring service (e.g., AWS region). |
| __`start_time`__ | _char_ | _30s_ | The date and time that the query was started (format='%Y-%m-%d %H:%M:%S.%f'). |
| __`end_time`__ | _char_ | _30s_ | The date and time that the query was completed (format='%Y-%m-%d %H:%M:%S.%f'). |
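
Because `start_time` and `end_time` are stored as strings, they need to be parsed before any arithmetic. The following sketch uses the format string quoted in the table; the helper names and the sample timestamps are made up for illustration, and stripping whitespace is a precaution based on the assumption that fixed-width `char` values may come back padded.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format quoted in the table above


def parse_time(value):
    """Convert a start_time/end_time string into a datetime object."""
    return datetime.strptime(value.strip(), TIME_FORMAT)


def elapsed_seconds(start_time, end_time):
    """Wall-clock seconds between two timestamp strings."""
    return (parse_time(end_time) - parse_time(start_time)).total_seconds()


if __name__ == "__main__":
    # Made-up example values, not real monitoring results.
    print(elapsed_seconds("2021-04-22 06:00:01.250000",
                          "2021-04-22 06:00:03.750000"))
```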

## Query Results
**Note that these values may be empty for certain types of query failures.**
#### Timing
| __column_name__ | __datatype__ | __format__ | __description__ |
| --- | --- | --- | --- |
| __`do_query_dur`__ | _double_ | _20.6f_ | Time to an HTTP response indicating that the query is complete, but prior to the results being streamed back to the client. |
| __`stream_to_file_dur`__ | _double_ | _20.6f_ | Time to download the results after the HTTP response indicating that the query was complete. |
| __`query_total_dur`__ | _double_ | _20.6f_ | Total time from query start to query end including download time. |
| __`extra_dur0_name`__ | _char_ | _20s_ | "tap_submit" for async tap results, null otherwise. |
| __`extra_dur0_value`__ | _double_ | _20.6f_ | Duration of submitting the TAP submit request for async tap results, null otherwise. |
| __`extra_dur1_name`__ | _char_ | _20s_ | "tap_run" for async tap results, null otherwise. |
| __`extra_dur1_value`__ | _double_ | _20.6f_ | Duration of submitting the TAP run request for async tap results, null otherwise. |
| __`extra_dur2_name`__ | _char_ | _20s_ | "tap_wait" for async tap results, null otherwise. |
| __`extra_dur2_value`__ | _double_ | _20.6f_ | Duration of submitting and waiting for the TAP wait query for async tap results, null otherwise. |
| __`extra_dur3_name`__ | _char_ | _20s_ | "tap_raise_if_error" for async tap results, null otherwise. |
| __`extra_dur3_value`__ | _double_ | _20.6f_ | Duration of calling the pyvo [AsyncTAPJob.raise_if_error()](https://pyvo.readthedocs.io/en/latest/api/pyvo.dal.AsyncTAPJob.html#pyvo.dal.AsyncTAPJob.raise_if_error) function for async tap results, null otherwise. |
| __`extra_dur4_name`__ | _char_ | _20s_ | "tap_fetch_response" for async tap results, null otherwise. |
| __`extra_dur4_value`__ | _double_ | _20.6f_ | Duration of calling the pyvo [AsyncTAPJob.fetch_result()](https://pyvo.readthedocs.io/en/latest/api/pyvo.dal.AsyncTAPJob.html#pyvo.dal.AsyncTAPJob.fetch_result) function for async tap results (does not include the time to actually retrieve the data and save it in a file), null otherwise. |
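
The `extra_durN_name`/`extra_durN_value` pairs are easier to work with once they are folded into a single mapping from phase name to duration. Here is a minimal sketch, assuming each result row is available as a dict-like object keyed by column name; the helper `extra_durations` and the sample values are hypothetical.

```python
import math


def _is_null(value):
    """Treat None and NaN (as pandas reports missing numbers) as null."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def extra_durations(row):
    """Map phase name -> duration for the extra_durN columns that are set."""
    phases = {}
    for i in range(5):  # extra_dur0 .. extra_dur4, as listed above
        name = row.get(f"extra_dur{i}_name")
        value = row.get(f"extra_dur{i}_value")
        if isinstance(name, str) and name.strip() and not _is_null(value):
            phases[name.strip()] = float(value)
    return phases


if __name__ == "__main__":
    # Made-up async TAP row; a cone row would have all ten columns null.
    sample = {
        "extra_dur0_name": "tap_submit", "extra_dur0_value": 0.41,
        "extra_dur1_name": "tap_run", "extra_dur1_value": 0.32,
        "extra_dur2_name": "tap_wait", "extra_dur2_value": 2.75,
    }
    print(extra_durations(sample))
```

For cone results, where these columns are null, the function returns an empty dict, so the same code can be applied to every row.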

#### Result metadata
| __column_name__ | __datatype__ | __format__ | __description__ |
| --- | --- | --- | --- |
| __`num_columns`__ | _integer_ | _9d_ | Number of FIELDs in the result VOTable. |
| __`num_rows`__ | _integer_ | _9d_ | Number of rows in the result VOTable. |
| __`size`__ | _integer_ | _10d_ | Size of the result VOTable (bytes). |


# Querying and Plotting the Data
## Imports
This code requires an environment that includes servicemon, bokeh and pandas.

In [None]:
from bokeh.plotting import output_file, output_notebook, show, reset_output

from servicemon.analysis.stat_queries import StatQueries
from servicemon.analysis.basic_plotting import create_service_plots, create_source, create_plot_location_shapes

## Sample Plotting Functions
The class and functions described in this [API document](https://servicemon.readthedocs.io/en/latest/analysis-api.html) support making queries, converting our query results to pandas, then plotting some sample plots using bokeh, both in a notebook and on a web page.  That API is used by the code below.

In [None]:
sq = StatQueries()

services = sq.get_name_service_pairs()

create_service_plots(sq, services, start_time='2021-04-22', end_time='2021-04-25')

# More Plot Ideas

 - Differentiate based on where the query originated.  E.g., plot different shape or color based on the location value.
 - Plot durations versus time of day, overlaying multiple days, to look for trends based on time of day.  Since the queries are started at different hours depending on the location, plotting the locations with different colors/shapes as above could also be helpful.
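
The time-of-day idea could start from a grouping like the one sketched below, which computes the median `query_total_dur` per location and starting hour across multiple days. It assumes the rows are dict-like objects keyed by the column names described earlier; the function name and sample rows are made up, and the hour is taken directly from `start_time` without any time zone conversion.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format of the start_time column


def median_by_location_and_hour(rows):
    """Return {(location, hour): median query_total_dur} over all days."""
    groups = defaultdict(list)
    for row in rows:
        if row["query_total_dur"] is None:
            continue  # failed queries may have no timing values
        hour = datetime.strptime(row["start_time"], TIME_FORMAT).hour
        groups[(row["location"], hour)].append(row["query_total_dur"])
    return {key: median(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"location": "us-east-1", "start_time": "2021-04-22 06:00:01.000000", "query_total_dur": 1.0},
        {"location": "us-east-1", "start_time": "2021-04-23 06:10:00.000000", "query_total_dur": 3.0},
        {"location": "eu-west-3", "start_time": "2021-04-22 08:00:00.000000", "query_total_dur": 2.0},
    ]
    for key, value in median_by_location_and_hour(rows).items():
        print(key, value)
```

The resulting keys map naturally onto a plot with hour on the x-axis and one color or shape per location.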

#### Sample plot differentiating location by shape

#### Try out the location plots

In [None]:
reset_output()
output_notebook()

sq = StatQueries()

query = """
select * from navostats2
where location in (
   'ap-northeast-1',
   'ap-southeast-2',
   'eu-west-3',
   'sa-east-1',
   'us-east-1',
   'us-west-2'
)
"""
data = sq.do_query(query)
source = create_source(data)

plot = create_plot_location_shapes(source)
show(plot)
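
The query above selects every row for the listed locations. To narrow it to a single service, a small helper like the following could assemble the query string, which can then be passed to `sq.do_query` in the same way. `build_stats_query` is a hypothetical helper, not part of `servicemon`; it relies only on the `location`, `base_name` and `service_type` columns described earlier.

```python
def _quote(value):
    """Quote a value as an ADQL string literal, escaping single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_stats_query(locations, base_name=None, service_type=None,
                      table="navostats2"):
    """Build a select statement filtered by location and, optionally, service."""
    if not locations:
        raise ValueError("at least one location is required")
    location_list = ", ".join(_quote(loc) for loc in locations)
    conditions = [f"location in ({location_list})"]
    if base_name is not None:
        conditions.append("base_name = " + _quote(base_name))
    if service_type is not None:
        conditions.append("service_type = " + _quote(service_type))
    return f"select * from {table} where " + " and ".join(conditions)


if __name__ == "__main__":
    print(build_stats_query(["us-east-1", "us-west-2"],
                            base_name="IPAC_2MASS", service_type="cone"))
```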
