Releases: chaos-genius/chaos_genius

chaos-genius v0.5.1-alpha

22 Mar 13:27
v0.5.1
bfe7b38

🎉 Release Notes for Chaos Genius 0.5.1

We are doing a minor release to tackle a few issues you have raised.

Bug Fixes 🐛

  • Fix the routing in the onboarding flow
  • Correct the message in the onboarding flow 📝
  • Fix chart styling 📈 where y-axis labels were being cut off

Improvement ✨

  • Added the sidebar link for Joining the Slack Community 👥
  • Added the Add KPI FAQs in the onboarding screen 📄

To upgrade your CG instance, follow the commands here.

chaos-genius v0.5.0-alpha

15 Mar 20:07
v0.5.0
d64d025

Release Notes for Chaos Genius 0.5.0

Hi everyone! We’re excited to announce the release of Chaos Genius 0.5.0, which brings some of the features you have requested most since the start, along with more bug squashing.

Some of the key highlights for CG 0.5.0:

  1. Hourly Anomaly Detection and Alerting
  2. Event Alerts: understand and get alerted on changes in data.
  3. Druid support (Experimental).
  4. We have also squashed a bunch of bugs.

To upgrade your CG instance, follow the commands here.

A big thanks to @playsimple, @fampay-tech, @KShivendu, @coindcx-gh, @GRANTOSMO, @athul-osmo, @rsohlot

🎉 New Features

Hourly Anomaly Detection and Alerting

  • Since many users are running Chaos Genius on live data, they need faster anomaly detection and alerting, so we've added support to run anomaly detection and alerting at an hourly level.
  • Now users have the ability to run Anomaly Detection on KPIs every hour and observe real-time changes to their data.
  • Users can get hourly updates on Slack or email by setting up hourly alerts for their KPIs.

Event Alerts

  • You can now set up alerts to monitor various events in your data.
  • Alerts can be set up for the following events:
    • If you have a new entry added to your data.
    • If there is any change (addition or deletion) to your data.
    • If you have missing data.
  • You can set the alert frequency to either daily or hourly.

Druid support (Experimental)

  • Druid is an open-source data store which can run super fast queries on data and provides data ingestion and fast data aggregation.
  • We currently support only the sum and count aggregations with Druid, but we’ll be adding mean and unique soon, so keep an eye out for that.
  • The supported authorization modes in Druid are anonymous (no username or password) and basic auth.

Bug fixes

  • Back-filling of missing data not occurring at the edges for anomaly #730
  • Sometimes alerts are not triggered even though anomalies exist in sub-dimensions #743
  • Expected range in alerts can be confusing when negative numbers are involved #725
  • Fixed search bug across multiple pages #826
  • Fixed bug which didn't accept test query payload #823

chaos-genius v0.4.1-alpha

23 Feb 14:30
04cfe2e

🐛 Release Notes for Chaos Genius 0.4.1

We are shipping a small release to tackle the issues you raised. The issues tackled include:

  • Add hyperlink for troubleshooting URL in add KPI with embedded video
  • Fixed %Inf to - in KPI Home
  • Remove global loader from Add KPI form and change the text in the dropdown - no options to loading
  • Incorrect yAxis label formatting when value below 1000 in waterfall charts
  • Fixed 1 Dependabot security vulnerability - other cannot be fixed
  • Slack alerts where config is not setup

chaos-genius v0.4.0-alpha

17 Feb 03:47
v0.4.0
662f14a

Release Notes for Chaos Genius 0.4.0 (Public Release)

  1. What's New
  2. New Features
  3. Bug Fixes

✨ What's New?

Hi everyone, we hope 2022 is off to a great start. With Chaos Genius 0.4.0, we have a BIG announcement to make. After spending weeks working with you all to make the product easy to use and stable, we're finally opening up our repos for public access!! Don't forget to star us 🌟

A BIG thank you to each one of you for your excitement about the product and diligent feedback. And we are grateful to @KShivendu, @danielefrigo, @coindcx-gh, @playsimple & @fampay-tech for the feedback for this release 🎶 🙌

Some of the key highlights for CG 0.4.0:

  1. Simpler and faster install: 70% less storage, 50% faster installation
  2. DeepDrills: New time cuts addition (WoW, MoM, QoQ)
  3. Daily Alerts Report
  4. Alerts Dashboard
  5. Timezone support
  6. We also fixed a bunch of bugs.

To upgrade your CG instance, follow the commands here.

🎉 New Features

Faster & Simpler Install

  • We are adding a default installation setup that is lighter, faster and works with fewer resources.
  • Decreased storage requirements by over 5.1 GB (70% of earlier storage)
  • Faster install time - which is on average halved from previous versions
  • (Optional) - all 3rd party SaaS connectors have been made optional and can be accessed now by docker-compose.thirdparty.yml

Deep Drills

  • We introduced new time cuts for DeepDrills like WoW, WTD, MoM, MTD, QoQ, QTD to reflect business needs - these can also be configured from the env variables.
  • We also improved UI for Deepdrills to make the data more actionable and intuitive.
  • Thanks for the feedback (Dushyant, Anmol, Shree)

Daily Alerts Report

  • As many of our users are monitoring 1000s of KPIs (incl. sub-dimensions) - this can lead to multiple emails being triggered for each individual alert
  • For ease of use and to make these alerts more actionable, you can now select to get these alerts in a Daily Alerts Digest
  • The Alerts are now also presented in natural language (Thanks @KShivendu for all the amazing feedback on Alerts)

Alerts Dashboard

  • As many of our users are monitoring 1000s of KPIs (incl. sub-dimensions) - this can lead to a large number of alerts and can cause alert fatigue
  • To manage a large number of alerts better, we've enabled an Alerts Dashboard where you can access all your alerts for the last 7 days. You can filter by Alert Configurations, Recipient Email, KPI and Dates
  • For more details you can jump into the Alerts Dashboard available at api/digest or via the Alerts screen by clicking the header "Alerts"

Timezone Support

  • You can now set a reporting time zone for metrics; results are displayed in that time zone.

  • Currently our analytics & alerts scheduler runs on server time - which can often create confusion if the server timezone is different from the reporting timezone. To tackle this issue, we now also display the timezone for all scheduler settings.

  • We are in the process of adding new features to enhance timezone including native handling of timezone aware data.
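
To illustrate why server time and reporting time can disagree, here is a minimal Python sketch. The fixed UTC and UTC+05:30 offsets and the helper name `to_reporting_time` are assumptions made for the example; they are not Chaos Genius settings or APIs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zones for illustration: a server on UTC, a team reporting in UTC+05:30.
SERVER_TZ = timezone.utc
REPORTING_TZ = timezone(timedelta(hours=5, minutes=30))


def to_reporting_time(server_time: datetime, reporting_tz: timezone = REPORTING_TZ) -> datetime:
    """Convert a timezone-aware server timestamp to the reporting timezone."""
    if server_time.tzinfo is None:
        raise ValueError("server_time must be timezone-aware")
    return server_time.astimezone(reporting_tz)


# A job scheduled for 20:00 server time falls on the next calendar day for the reporting team.
scheduled = datetime(2022, 2, 17, 20, 0, tzinfo=SERVER_TZ)
local = to_reporting_time(scheduled)
print(local.isoformat())
```

Showing the timezone next to every scheduler setting, as this release does, removes the need to do this conversion in your head.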

🐛 Bug fixes

  • DQ Anomaly Metrics should not be displayed when we do count aggregation on a categorical column #575

  • Fix the sorting logic in the KPI and Dashboard #590

  • Add a modal popup after successful addition of datasources #616

  • Add a Loader in Add KPI Screen while selecting dropdowns #612

  • Human Readable numbers for Chart axis labels and values, Chart tooltip format uniformity #582

  • Fix severity computation robustness for anomalies for ML & Stats models #535

  • Duplicate entry for the last date in db while running Anomaly Detection daily #451

  • Anomaly Charts highlight entire time series as anomaly #417

  • Subtitles for Time input field #696

  • TimeZone Changes for Deepdrills graph and Anomaly #695

  • Flickering in hierarchical charts (grid) #688

  • Inconsistent time series for panel & anomalies for hourly data #678

  • Alert digests & subdimensional anomaly do not have consistent anomalies #677

  • Alerts toasts getting Retriggered on navigating back to the alerts page #699

  • Third-party datasources do not sync when chaosgenius-db port is not exposed #720

  • Enable all time cuts by default in DeepDrills #721

  • KPI is not added to non-default dashboards that are selected in Create KPI Page. #729

  • NaN output for Mean aggregations in RCA when one of the groups is empty #731

chaos-genius v0.3.0-alpha

07 Jan 15:19
ad6759f

Release Notes for Chaos Genius 0.3.0

  • What's New
  • New Features
  • Bug Fixes

✨ What's New?

A very happy new year to everyone in the Chaos Genius Community! We are bringing another big upgrade to the Chaos Genius experience based on the love and feedback we've received from you!! ❤️

Thank you - @mvaerle, @omriAl, @KrishnaSistla, @gxu-kangaroo, @adwate, @coindcx-gh, @davidhayter-karhoo, @csoni111!

With Chaos Genius 0.3.0, you can now scale Chaos Genius across multiple teams in the organization with our Dashboards feature. You can get even more powerful insights with Anomaly Detection supported at a sub-population level. In the task manager, you can now also view the errors occurring in your analytics - so no more going through the logs :) We also fixed a bunch of bugs.

To upgrade your CG instance, follow the commands here.

We look forward to continuing to build with all the support from our Community! Thank you, and we wish you and your families a safe and stellar 2022! 🎈

🎉 New Features

Anomaly detection at a sub-group level

So far we've only supported anomaly detection at an Overall KPI level. Based on community requests (thanks @gxu-kangaroo , @coindcx-gh), we now run anomaly detection and alerting for anomalies in KPIs at the top 250 sub-population groups (this is configurable). See below a quick snapshot of the anomaly alert:

  • Subdimensional email and slack alerts #516
  • feat(anomaly): add subdim level anomaly #512

Teams Dashboards (EE Feature)

We launched our first EE feature by allowing users to create separate dashboards for different groups of KPIs. This can be used to segregate analytics across different teams, customers, or any other groups.

Using this, customers are already using Chaos Genius to monitor 1000s of KPIs live every day, something that was not possible with traditional BI tools!

This is an EE feature. So feel free to reach out to founders@chaosgenius.io for pricing & other details.

  • feat: dashboard functionality along with some fixes #520

Detailed Error Reporting in Task Status - no more looking at Logs to debug.

We have tried to reduce the need to go through the logs by allowing users to find the status of their analytics run directly from the Chaos Genius portal. You can now also see the Error that's occurring and debug yourself or share the screenshot with the Chaos Genius team.

  • Store task & subtask status and create a view for it for streamlined troubleshooting #459

Analytics now can run on data for (t-1)

As a default, we run analytics at a t-2 offset from the current day to allow for data pipelines to complete & the data to be backfilled for the day.

We now allow this to be set by changing DAYS_OFFSET_FOR_ANALTYICS to 1. As a default, it is set to 2.

  • feat: updating t-2 to t-(k-1) for analytics #523
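
As a rough illustration of what the offset means, the sketch below computes the latest date that analytics would cover. The function name `latest_analytics_date` and the way the variable is read are assumptions for the example; only the variable name and its default of 2 come from the note above.

```python
import os
from datetime import date, timedelta
from typing import Mapping


def latest_analytics_date(today: date, env: Mapping[str, str] = os.environ) -> date:
    """Return the most recent date analytics would cover for the configured offset."""
    # Variable name copied verbatim from the release note; 2 is the documented default.
    offset_days = int(env.get("DAYS_OFFSET_FOR_ANALTYICS", "2"))
    return today - timedelta(days=offset_days)


# Default (t-2) versus the new t-1 setting, evaluated for the same day.
print(latest_analytics_date(date(2022, 1, 7), {}))
print(latest_analytics_date(date(2022, 1, 7), {"DAYS_OFFSET_FOR_ANALTYICS": "1"}))
```

With the offset set to 1, analytics include yesterday's data, so this only makes sense if your data pipelines finish before the analytics run.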

Define KPIs without any dimensions

Earlier it was not possible to define a KPI w/o selecting the dimensions. We have now made that optional. In a KPI w/o defined dimensions, you will not get DeepDrills analysis. In Anomaly you will get the overall KPI anomaly but Drill Downs will not be available.

  • feat(anomaly): added anomaly without subdim support #435

Support for custom schema as a table

Thanks, @mvaerle, for the feature suggestion. Earlier, a query was the only option for setting up a KPI on a custom schema. We now allow a custom (non-public) schema to be added as a table KPI for ease of KPI setup. We support Postgres, Snowflake & Redshift for this.

  • Selecting a custom schema when adding KPIs #359

Better defaults for analytics params

Based on user feedback from @gxu-kangaroo, @davidhayter-karhoo, @fampay-tech, we have modified the defaults for these analytics params.

As a default, we would now consider any dimension with up to 1000 cardinality. Also, we now consider the top 250 sub-groups for anomaly detection at a sub-dimensional level. Learn more about these params here.

MAX_SUBDIM_CARDINALITY=1000
MAX_FILTER_SUBGROUPS_ANOMALY=250

  • chore: update config params #567
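
One way to picture what these two limits control is the Python sketch below. The helper names `eligible_dimensions` and `top_subgroups` are hypothetical, and ranking sub-groups by row count is an assumption; the release note does not say how the top sub-groups are chosen.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Defaults quoted in the release note.
MAX_SUBDIM_CARDINALITY = 1000
MAX_FILTER_SUBGROUPS_ANOMALY = 250


def eligible_dimensions(
    columns: Dict[str, Sequence[str]],
    max_cardinality: int = MAX_SUBDIM_CARDINALITY,
) -> List[str]:
    """Keep dimensions whose number of distinct values is within the limit."""
    return [name for name, values in columns.items() if len(set(values)) <= max_cardinality]


def top_subgroups(
    values: Sequence[str],
    limit: int = MAX_FILTER_SUBGROUPS_ANOMALY,
) -> List[Tuple[str, int]]:
    """Return the `limit` most frequent values as (value, row_count) pairs.

    Ranking by row count is an assumption made for this sketch.
    """
    return Counter(values).most_common(limit)


columns = {"country": ["US", "IN", "US"], "user_id": ["a", "b", "c"]}
print(eligible_dimensions(columns, max_cardinality=2))
print(top_subgroups(columns["country"], limit=1))
```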

🐛 Bug Fixes

  • Query for KPI validation & analytics is not of consistent format for time ranges #457
  • [BUG] Large panel values in DeepDrills overflow in to the graph #479
  • Anomaly screen not getting redirected to setting when anomaly is not setup #537
  • Handle fallback screen for 0 KPI #539
  • Drilldown collapse/expand not working for zero dimension #543
  • Form Validation is not working correctly in create dashboard screen #542
  • Subdimensional Anomaly tab shows incorrect fallback before valid data is fetched #544
  • [BUG] "All" dashboard should not be allowed to be deleted #553
  • Edit KPI screen should have the dashboard they are pinned to selected in dropdown menu #540
  • Flask API endpoints shouldn't have the trailing slash #576

chaos-genius v0.2.0-alpha

09 Dec 15:58
2f6758d

Release Notes for Chaos Genius 0.2.0

  • What's New
  • New Model(s)
  • New Features
  • Bug Fixes

✨ What's New?

Thank you all for the feedback on Chaos Genius 0.1.3. We're releasing Chaos Genius 0.2.0 today.

Our main focus for the upgrade was on covering edge cases for making DeepDrills and Anomaly Detection work on as varied datasets as possible, adding Task Status monitoring to enable users to detect if any analytics is failing and other bug fixes.

Key highlights being:

  • Detailed status tracking for analytics for faster detection & debugging (cc: @bouke-nederstigt , @gxu-kangaroo, @davidhayter-karhoo, @mvaerle)
  • Configurations for edge cases like older data sets, smaller data sets, enabling KPI definition w/o dimensions etc. (cc: @davidhayter-karhoo)
  • DeepDrills handling for missing data, NULL/NaN values (cc: @davidhayter-karhoo)
  • New Anomaly Detection model - EWMA
  • Error & Analytics - Config for enabling Sentry & PostHog for Error handling & Analytics (cc: @coindcx-gh)
  • Improved Alerting logic
  • Bug Fixes
    • Data Sources not showing on installation (cc: @omriAl, @nsankar)
    • Other Bug Fixes

We're happy to inform you that we've reached a Community Size of 50 with teams from 10 different time zones in such a short period of time! We look forward to working closely with all of you to support your use cases before we open up to the Public.

🧮 New Model(s)

We added a new model for Anomaly Detection - EWMA. Exponentially Weighted Moving Average (EWMA) is a statistic that averages the data in a way that gives less and less weight to data points the further back in time they are. EWMA is better suited to cases where the data is largely static but can undergo sudden state changes.

  • feat(anomaly): add EWMA Model (#428)
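
For readers unfamiliar with EWMA, the textbook recursion is s_t = alpha * x_t + (1 - alpha) * s_(t-1). The sketch below implements just that recursion in plain Python; it is not the Chaos Genius model, and the function name `ewma`, the default `alpha` and the choice to seed with the first observation are assumptions for the example.

```python
from typing import List, Sequence


def ewma(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average with smoothing factor alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(x))
        else:
            # New value gets weight alpha; the running average keeps 1 - alpha.
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# A flat series that jumps to a new level: the average moves toward it step by step.
print(ewma([0, 0, 8, 8], alpha=0.5))
```

A larger `alpha` makes the average follow a sudden state change faster, at the cost of reacting more to noise.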

🎉 New Features

Task and status observability on your Analytics

A variety of issues can sometimes cause analytics to fail - e.g. database access/authorization errors, network errors, incomplete data. While we are covering as many edge cases as possible, adding a Task Status is our first step towards faster incident detection. We will be adding more features to it, including exact errors & diagnoses when the analytics fail. The task status on local installation should be available at http://127.0.0.1:8080/api/status/

  • Store task & subtask status and create a view for it for streamlined troubleshooting (#459)
  • Observable tasks deepdrills (#446)

Error handling and user analytics to give better support (sentry, posthog)

To identify errors sooner, you can now configure your Sentry account by updating the parameter SENTRY_DSN in docker-compose.yml. We can also provide you with our Sentry token so we can closely monitor any issues you may be facing.

We've also added Posthog - an open-source analytics tool, to capture user activity to help us better inform the product roadmap as we open up our repos for public access. We enabled an option for anonymizing the data before sharing. It is also possible to disable Posthog.

  • Init the sentry integration (#357)
  • Posthog user identification & redirection (#462)

More dataset configurations/missing data support

In the previous versions, there were analytics failures in cases where there was no data for the past 5 days. We call this 'Slack length'. We've made this value configurable (MAX_DEEPDRILLS_SLACK_DAYS and MAX_ANOMALY_SLACK_DAYS) in the docker-compose.yml and updated the default to 14 days. This parameter helps us to perform anomaly detection on the latest data for the most accurate results.

In our previous versions, we also required users to select dimensions as a mandatory field. We've now made this optional. You need to specify dimensions only if you need sub-dimensional insights.

  • Make slack configurable for DeepDrills and Anomaly (#434)
  • Remove the mandatory option for the dimension (#445)
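
The slack-length idea described above can be sketched as a simple recency check. The function name `within_slack` and the inclusive comparison are assumptions for the example; only the parameter name and the 14-day default come from the note.

```python
from datetime import date

# Default quoted in the release note (MAX_ANOMALY_SLACK_DAYS in docker-compose.yml).
MAX_ANOMALY_SLACK_DAYS = 14


def within_slack(last_data_date: date, today: date, slack_days: int = MAX_ANOMALY_SLACK_DAYS) -> bool:
    """Return True if the newest data point is recent enough for analytics to run.

    Treating the boundary day as still valid is an assumption of this sketch.
    """
    return (today - last_data_date).days <= slack_days


# Data that is 8 days old passes with the new default but not with a 5-day slack.
print(within_slack(date(2021, 12, 1), date(2021, 12, 9)))
print(within_slack(date(2021, 12, 1), date(2021, 12, 9), slack_days=5))
```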

Robust DeepDrills for missing data & errors

Our first implementation of DeepDrills required complete datasets with the last 60 days of data to run successfully. We've enhanced DeepDrills to be more granular in order to work with incomplete data sets & handle missing data.

  • Handle DeepDrills analytics failures gracefully with partial analytics in case of subtask errors (#458)
  • Account for NaN & NULL values in DeepDrill analysis (#437)

Improved alerting logic

We've enhanced our alert logic to trigger an alert instantly once an anomaly is detected. We've also made a few improvements in the alert format. We'll continue to build out the alerting functionality in our future releases.

  • Update the anomaly alert implementation (#467)
  • Change the email format for more clarity (#477)

Improved analytics indexing

We have optimized our indexes to provide faster drill-downs for large KPIs & dimensions.

  • Add the analytics data index (#461)

🐛 Bug Fixes

  • Handle KPI queries with trailing semicolon for KPI validation & analytics (#429)
  • Validate the duplicate column in the result dataset of query defined KPI (#441)
  • Snowflake connector mentions setting up with a hostname, where the hostname is actually not required (#438) (cc: @joshuataylor)
  • Metric columns having NaN's in first 10 or higher rows fails KPI Validation (#444)
  • Validation for the dimension column in the add KPI screen (#450)
  • DeepDrills fails for KPI with no dimensions defined (#468)
  • Handle empty data in comparison dataframe for mean aggregation in DeepDrills (#494)

chaos-genius v0.1.3-alpha

18 Nov 12:48
7b9b2c3

Release Notes for Chaos Genius 0.1.3

  • What's New
  • New Connector(s)
  • New Features
  • Bug Fixes

✨ What's New?

We're excited to announce the release of the new and improved Chaos Genius 0.1.3. We want to sincerely thank our early users for their feedback (shout-out to @gxu-kangaroo, @davidhayter-karhoo, @mvaerle, @nitsujri, @miike, @coindcx-gh) and all our contributors for their relentless effort towards improving Chaos Genius.

Chaos Genius 0.1.3 is focused on improving the onboarding process and improving compatibility for large datasets and varied sub-population types.

Key highlights being:

  • Amazon Redshift Integration
  • Global Configuration to support handling large datasets (aggregated views up to 10M rows) and varied sub-populations (1-250 subgroups)
  • Optimized data fetching for large datasets
  • Improving Anomaly Detection via handling missing data points in time series, higher number of drill-downs, higher cardinality support (1000+) and enhancements in Anomaly Detector Configuration
  • DeepDrills bug fixes
  • Improved logging
  • Other bug fixes

🔌 New Connector(s)

With the 0.1.3 release, Chaos Genius now supports Amazon Redshift as a data source. With this Chaos Genius now works with the 3 major data warehouses - Snowflake, BigQuery and Amazon Redshift.

Please find the documentation for Redshift here.

We will soon release public data sets on Redshift for our community to test out!

  • Add the redshift connector (#348)

🎉 New Features

Global Configuration to support large datasets & varied sub-group characteristics

Using a global configuration setting, Chaos Genius can now enable support for aggregated views up to 10M rows and varied sub-group characteristics (1-250+ subgroups). This will enable config control over the statistical filtering calculations that are carried out while running both DeepDrills and Anomaly Detection at a sub-group level.

Chaos Genius team will be happy to help you set up the configuration.

  • Fine Grained control on Anomaly Detection for different series_type (#324)
  • Add support for subgroup calculation global config in anomaly detection core (#341)
  • Make population calculation & statistical filtering parameters globally configurable (#340)

Anomaly Detection Enhancements

Missing Data in Time Series

Handling missing data points in time-series analysis is a hairy problem. Chaos Genius 0.1.3 now handles missing data points as zero while plotting the time-series graphs and identifying anomalies. We will continue to invest more deeply in this going forward by adding alerts for missing data, which might go undetected by certain algorithms.

  • Handle completely missing data in time series as zero (#367)
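
A minimal sketch of the zero-filling behaviour for a daily series, using only the standard library. The function name `fill_missing_days` and the dictionary input format are assumptions for the example, not the project's implementation.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def fill_missing_days(series: Dict[date, float]) -> List[Tuple[date, float]]:
    """Return a gap-free daily series, inserting 0.0 for dates with no data."""
    if not series:
        return []
    start, end = min(series), max(series)
    filled: List[Tuple[date, float]] = []
    current = start
    while current <= end:
        # Dates absent from the input are treated as zero.
        filled.append((current, series.get(current, 0.0)))
        current += timedelta(days=1)
    return filled


observed = {date(2021, 11, 1): 5.0, date(2021, 11, 3): 7.0}
for day, value in fill_missing_days(observed):
    print(day, value)
```

Note that a zero-filled gap can itself look like a sharp drop to an anomaly model, which is one reason dedicated missing-data alerts are useful.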

Higher Cardinality support for Dimensions definition

We've further optimized subgroup time series creation to handle higher cardinality dimensions. We now support dimensions with 1000+ cardinality. Earlier large cardinality dimensions were excluded from the analysis. We'll continue to optimize it further over upcoming releases.

  • Refactor anomaly detection subgroup detection to handle higher cardinality (#350)

Higher Number of Drill-downs in Anomaly Detection

While investigating Anomalies via Drill Downs, Chaos Genius now gives 10 most relevant sub-groups sorted by relevance (mix of anomaly severity & sub-group population) - this number is also configurable. We also upgraded the algorithms used to create these sub-groups.

Going forward, we will enable this by configuration and also enable multi-dimensional drill downs as you detect the top drivers causing anomalies in your time-series.

  • Enable support for a higher number of drill downs (#319)
  • Create new algorithm for subgroup list generation (#351)

Support for Multivariate Subdimensional groups

Chaos Genius can now detect anomalies on multivariate subdimensional groups that are mutually exclusive. All possible combinations of the selected dimensions are generated and statistically filtered based on population characteristics; anomaly detection & drill downs are now available for these as an option.

We'll continue investing in subdimensional anomaly detection including clustering & grouping for subdimensions that behave alike.

  • Configurable sub-dimension settings for anomaly detection (#349)
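
To make the idea of multivariate sub-groups concrete, the sketch below enumerates the sub-groups observed in a small dataset and drops those with too few rows. Everything here is illustrative: the function name `candidate_subgroups`, the `min_rows` threshold and the use of a plain row count as the population filter are assumptions, not the statistical filtering Chaos Genius applies.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

# A sub-group is a tuple of (dimension, value) pairs, e.g. (("country", "US"), ("device", "web")).
Subgroup = Tuple[Tuple[str, str], ...]


def candidate_subgroups(
    rows: Sequence[Dict[str, str]],
    dimensions: Sequence[str],
    min_rows: int = 1,
) -> Dict[Subgroup, int]:
    """Count rows per observed sub-group across every combination of dimensions."""
    counts: Counter = Counter()
    for size in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                counts[tuple((d, row[d]) for d in dims)] += 1
    # Hypothetical population filter: keep sub-groups with at least min_rows rows.
    return {group: n for group, n in counts.items() if n >= min_rows}


rows = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "web"},
    {"country": "IN", "device": "mobile"},
]
print(candidate_subgroups(rows, ["country", "device"], min_rows=2))
```

Within any one combination of dimensions, each row belongs to exactly one sub-group, which is what makes those groups mutually exclusive.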

Improved UX for Anomaly Settings for hourly time-series

Chaos Genius now offers improved UX for setting anomaly detection configuration for hourly time-series.

You can now specify the historical data by number of days instead of units of frequency of the time-series - e.g. 7 days instead of 168 hours if you need to train hourly time-series data for last week :)

  • Set anomaly period's value in days, irrespective of frequency (#336)
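
The conversion this saves you from doing by hand looks roughly like the sketch below. The mapping `PERIODS_PER_DAY` and the function name `training_points` are hypothetical names for the example.

```python
# Number of data points per day for each frequency (illustrative mapping).
PERIODS_PER_DAY = {"daily": 1, "hourly": 24}


def training_points(days: int, frequency: str) -> int:
    """Translate a history length in days into a count of time-series points."""
    if days <= 0:
        raise ValueError("days must be positive")
    return days * PERIODS_PER_DAY[frequency]


# One week of history is 7 points for daily data and 168 points for hourly data.
print(training_points(7, "daily"), training_points(7, "hourly"))
```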

Optimized Data Fetching for Large Datasets

In the current release, we've added optimization for fetching large datasets by adding chunk size specifications. Data is fetched in chunks (currently param is set to 50,000) and then merged into a single dataframe.

  • Benchmark & enable chunk size for pandas data fetching (#332)
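
The fetch-in-chunks-then-merge pattern can be sketched as follows. The release does this with pandas; to stay dependency-free, this example swaps in the standard-library `sqlite3` cursor and an in-memory table, so the function names, the table and the query are all assumptions for illustration.

```python
import sqlite3
from typing import Iterator, List, Tuple

CHUNK_SIZE = 50_000  # value quoted in the release note


def fetch_in_chunks(
    conn: sqlite3.Connection, query: str, chunk_size: int = CHUNK_SIZE
) -> Iterator[List[Tuple]]:
    """Yield query results in lists of at most chunk_size rows."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def fetch_all(conn: sqlite3.Connection, query: str, chunk_size: int = CHUNK_SIZE) -> List[Tuple]:
    """Merge all chunks into a single list, mirroring the merge into one dataframe."""
    merged: List[Tuple] = []
    for chunk in fetch_in_chunks(conn, query, chunk_size):
        merged.extend(chunk)
    return merged


# Small in-memory table so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (day INTEGER, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)", [(i, i * 1.5) for i in range(10)])
chunks = list(fetch_in_chunks(conn, "SELECT day, value FROM metrics ORDER BY day", chunk_size=4))
print([len(c) for c in chunks])
```

Fetching in bounded chunks keeps the size of each round trip predictable, even though the merged result still has to fit in memory.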

Enhanced Logging

We're working extensively to improve the logging for Chaos Genius. In the current release, we've centralized the logging, added an option for Fluentd logs for persistence and now also include data params in the logs in order to identify edge cases where the analytics might be failing to run.

  • Centralized logging and spawning of loggers throughout the flask app (#313)
  • Fluentd for persistence
  • Data params passed in logs for easier replication of edge case issues

In subsequent releases, we plan to enable the status of all the tasks.

Other enhancements

  • Added nginx based front-end deployment
    • Update docker-compose for 0.1.3 release (#419)
  • Global configuration for multidimensional drill-downs
    • Make multidimensional drill down to be configurable for DeepDrills (#369)
  • Improved Error Message copy in UI
    • fix: Update the error messages, disable event kpi alert fix, anomaly setting fixes (#321)
    • Error message & integer type changed (#284)

🐛 Bug Fixes

  • DeepDrill UI fixes
    • Count & size columns in the DeepDrills table are swapped (#310)
  • Anomaly interface fixes
    • Anomaly drill down graphs only display integer values (#306)
    • Changes in the Edit Anomaly Settings (#311)
    • Make analytics charts more descriptive and consistent. (#372)
  • Snowflake metadata ingestion issue raised by Grant Xu
    • Using Snowflake timestamp when casted can create issues while adding KPI (#320)
  • Handle when KPI only has 1 subgroup
    • UnboundLocalError when a KPI has only one subgroup (#342)
  • Handle edge cases data with multiple frequencies
  • Fix the edit functionality for data sources
    • Data Source isn't being updated properly (#308)
  • Other UI fixes
    • Modified timestamp isn't coming in the alert (#309)
  • Fixes to improve handling of missing or incomplete data
    • RCA Infinite Loop when KPI is query based (#344)
    • Data Padding causes issue with anomaly detection values (#353)
    • RCA saving fails if there is a NaN value present (#347)
  • Fixes to handle anomaly & DeepDrill edge cases
    • Incorrect confidence intervals for anomaly detection after the first training session (#388)
    • Inconsistent analytics occurs between hourly panel metrics, DeepDrill & anomaly data (#390)
    • Wrong Start Dates for Anomaly (#399)
    • Anomaly training not till expected end date (#400)
    • Missing data point in DeepDrills when we have missing data (#413)
    • Inconsistent Last Updated between Anomaly and Deepdrills (#411)
  • Validation sequence logic & platform update for KPI addition
    • Adding KPI with incorrect columns does not produce the correct errors (#391)
    • SQL error while adding snowflake KPI (#397)
    • Truncated error output for out of bounds error in KPI validation (#398)
    • KPI validation for datetime column does not work (#405)

chaos-genius v0.1.2-alpha

11 Oct 16:20
71385e1

The first public release for the alpha version 🔮

This release includes:

  • Data source connections for database, data warehouses & third party sources
  • KPI creation with multiple dimensions for analysis
  • DeepDrills across the data with multidimensional waterfalls
  • Anomaly detection along with multidimensional drill-downs & data quality checks
  • Alerting based on the severity threshold on email and Slack