

Introduction to ThirdEye


ThirdEye is an integrated tool for realtime monitoring of time series and interactive root-cause analysis. It enables anyone inside an organization to collaborate on effective identification and analysis of deviations in business and system metrics. ThirdEye supports the entire workflow, from anomaly detection through root-cause analysis to issue resolution and post-mortem reporting.

What is it for? (key features)

Online monitoring and analysis of business and system metrics from multiple data sources. ThirdEye comes batteries included for both detection and analysis use cases. It aims to minimize the Mean-Time-To-Detection (MTTD) and Mean-Time-To-Recovery (MTTR) of production issues. ThirdEye improves its detection and analysis performance over time from incremental user feedback.

Detection

  • Detection toolkit based on business rules and exponential smoothing
  • Realtime monitoring of high-dimensional time series
  • Native support for seasonality and permanent change points in time series
  • Email alerts with 1-click feedback for automated tuning of detection algorithms

Root-Cause Analysis

  • Collaborative root-cause analysis dashboards
  • Interactive slice-and-dice of data, correlation analysis, and event identification
  • Reporting and archiving tools for anomalies and analyses
  • Knowledge graph construction over time from user feedback

Integration

  • Connectors for continuous time series data from Pinot and CSV
  • Connectors for discrete event data sources, such as holidays from Google Calendar
  • Plugin support for detection and analysis components

What isn't it? (limitations)

ThirdEye maintains a dedicated metadata store to capture data sources, anomalies, and relationships between entities, but it does not store raw time series data. It relies on systems such as Pinot, RocksDB, and Kafka to obtain both realtime and historic time series data.

ThirdEye does not replace your issue tracker - it integrates with it. ThirdEye supports collaboration but focuses on the data-integration aspect of anomaly detection and root-cause analysis. After all, your organization probably already has a well-oiled issue resolution process that we don't want to disrupt.

ThirdEye is not a generic dashboard builder toolkit. ThirdEye attempts to bring overview data from different sources into one single place on-demand. In-depth data about events, such as A/B experiments and deployments, should be kept in their respective systems. ThirdEye can link to these directly.

Quick start

ThirdEye supports an interactive demo mode for the analysis dashboard. These steps will guide you to get started.

1: Prerequisites

You'll need Java 8+, Maven 3+, and NPM 3.10+
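
To check a tool against the minimum versions above before building, a small helper can compare dotted version strings. This is a minimal sketch in POSIX sh: version_ge is a hypothetical helper name, it compares only the leading numeric fields, and it is applied to npm alone because `npm -v` prints a bare version number.

```shell
# Returns 0 when dotted version $1 is >= dotted version $2.
# Only as many fields as $2 contains are compared; missing fields count as 0.
version_ge() {
  _have=$1
  _want=$2
  while [ -n "$_want" ]; do
    _h=${_have%%.*}
    _w=${_want%%.*}
    # Drop the field just read, or empty the string if it was the last one.
    if [ "$_have" = "$_h" ]; then _have=""; else _have=${_have#*.}; fi
    if [ "$_want" = "$_w" ]; then _want=""; else _want=${_want#*.}; fi
    _h=${_h:-0}
    if [ "$_h" -gt "$_w" ]; then return 0; fi
    if [ "$_h" -lt "$_w" ]; then return 1; fi
  done
  return 0
}

# Report whether the installed npm satisfies the 3.10+ requirement.
if command -v npm >/dev/null 2>&1; then
  npm_version=$(npm -v)
  if version_ge "$npm_version" 3.10; then
    echo "npm $npm_version is new enough"
  else
    echo "npm $npm_version is older than 3.10"
  fi
else
  echo "npm not found on PATH"
fi
```

Java and Maven print their versions in a longer format (`java -version`, `mvn -v`), so checking those by eye is usually simpler.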

2: Build ThirdEye

git clone https://github.com/linkedin/pinot.git
cd pinot/thirdeye
chmod +x install.sh run-frontend.sh run-backend.sh reset.sh
./install.sh

Note: The build of thirdeye-frontend may take several minutes

3: Run ThirdEye frontend

./run-frontend.sh

4: Start an analysis

Point your favorite browser to

http://localhost:1426/app/#/rootcause?metricId=1

Note: ThirdEye in demo mode will accept any credentials
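
If the browser cannot connect right away, the frontend may still be starting. The following is a minimal sketch of a wait loop, assuming curl is installed. wait_for_url is a hypothetical helper name, and the default retry count and delay are arbitrary choices.

```shell
# Poll a URL until the server accepts the connection.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2)
# Returns 0 as soon as curl gets any HTTP response, 1 if all attempts fail.
wait_for_url() {
  _url=$1
  _attempts=${2:-30}
  _delay=${3:-2}
  _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    if curl --silent --output /dev/null --max-time 5 "$_url"; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  return 1
}

# Example (run in a second terminal while ./run-frontend.sh is starting):
# wait_for_url http://localhost:1426/ && echo "ThirdEye frontend is reachable"
```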

5: Have fun

Available metrics in demo mode are:

  • business::purchases
  • business::revenue
  • tracking::adImpressions
  • tracking::pageViews

Note: These metrics are regenerated randomly every time you launch ThirdEye in demo mode

6: Shutdown

You can stop the ThirdEye dashboard server anytime by pressing Ctrl + C in the terminal

Start ThirdEye with Pinot

0: Prerequisites

Run through the Quick Start guide and shut down the frontend server process.

1: Update the data sources configuration

Insert the connector configuration for Pinot in thirdeye-pinot/config/data-sources/data-sources-config.yml. Your config should look like this:

dataSourceConfigs:
  - className: com.linkedin.thirdeye.datasource.pinot.PinotThirdEyeDataSource
    properties:
        zookeeperUrl: 'myZkCluster.myDomain:12913/pinot-cluster'
        clusterName: 'myDemoCluster'
        controllerConnectionScheme: 'https'
        controllerHost: 'myPinotController.myDomain'
        controllerPort: 10611
        cacheLoaderClassName: com.linkedin.thirdeye.datasource.pinot.PinotControllerResponseCacheLoader
    metadataSourceConfigs:
      - className: com.linkedin.thirdeye.auto.onboard.AutoOnboardPinotMetadataSource

  - className: com.linkedin.thirdeye.datasource.mock.MockThirdEyeDataSource
    ...

Note: You'll have to change the host names and port numbers according to your setup
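
Before starting ThirdEye, it can help to confirm that the controller settings point at a reachable Pinot controller. The sketch below only assembles the base URL from the sample values above. controller_url is a hypothetical helper name, and the commented-out request assumes your controller exposes its table list at /tables.

```shell
# Values copied from the sample config; replace them with your own.
CONTROLLER_SCHEME=https
CONTROLLER_HOST=myPinotController.myDomain
CONTROLLER_PORT=10611

# Build the controller base URL from scheme ($1), host ($2), and port ($3).
controller_url() {
  printf '%s://%s:%s\n' "$1" "$2" "$3"
}

controller_url "$CONTROLLER_SCHEME" "$CONTROLLER_HOST" "$CONTROLLER_PORT"

# Assumption: the controller answers GET /tables with the list of tables.
# Uncomment to run against a reachable controller:
# curl --fail --silent "$(controller_url "$CONTROLLER_SCHEME" "$CONTROLLER_HOST" "$CONTROLLER_PORT")/tables"
```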

2: Enable Pinot auto-onboarding

Update the thirdeye-pinot/config/detector.yml file to enable auto-onboarding of Pinot data sets.

autoload: true

3: Run the backend worker to load all supported Pinot data sets

./run-backend.sh

Note: This process may take some time. The worker process will print log messages for each data set schema being processed. Schemas must contain a timeFieldSpec in order for ThirdEye to onboard them automatically.

4: Stop the backend worker

Stop the worker by pressing Ctrl + C in the terminal.

5: Run ThirdEye frontend

./run-frontend.sh

6: Start an analysis

Point your favorite browser to

http://localhost:1426/app/#/rootcause

and type any data set or metric name (fragment) in the search box. Auto-complete will now list the names of matching metrics. Select any metric to start an investigation.

Welcome to ThirdEye

ThirdEye for production settings

ThirdEye relies on a central metadata store to coordinate its workers and frontend processes. The first step towards moving ThirdEye into production should therefore be the setup of a dedicated (MySQL) database instance. You can use the thirdeye-pinot/src/resources/schema/create-schema.sql script to create your tables. Then, update the thirdeye-pinot/config/persistence.yml file with the database path and credentials. Once you have a dedicated database instance, you can run backend and frontend servers in parallel.
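
Loading the schema script with the standard mysql command-line client could look like the sketch below. The host, user, and database name are hypothetical placeholders, load_schema is a hypothetical helper name, and the sketch assumes the target database already exists.

```shell
# Hypothetical connection settings; override them via the environment.
TE_DB_HOST=${TE_DB_HOST:-localhost}
TE_DB_USER=${TE_DB_USER:-thirdeye}
TE_DB_NAME=${TE_DB_NAME:-thirdeye}

# Load a schema file ($1) into the configured database.
# Fails early if the file is missing; mysql prompts for the password (-p).
load_schema() {
  _schema=$1
  if [ ! -f "$_schema" ]; then
    echo "schema file not found: $_schema" >&2
    return 1
  fi
  mysql -h "$TE_DB_HOST" -u "$TE_DB_USER" -p "$TE_DB_NAME" < "$_schema"
}

# Example, run from the thirdeye directory:
# load_schema thirdeye-pinot/src/resources/schema/create-schema.sql
```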

The next step could be the configuration of the holiday auto-loader, which connects to the Google Calendar API. Once you obtain an API token, place it in thirdeye-pinot/config/holiday-loader-key.json and set holidayEventsLoader: true in thirdeye-pinot/config/detector.yml. Once the backend worker is restarted, it will periodically update the local cache of holiday events for ThirdEye's detection and root-cause analysis components.

More information

More information coming. In the meantime, use your favorite web search engine to search for 'Pinot ThirdEye' articles and blog posts.