WIP: Support hot update of MAL #7039

Closed
dmsolr wants to merge 1 commit into apache:master from dmsolr:mal-hot-update

@dmsolr
Member

dmsolr commented May 30, 2021

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
  • Update the CHANGES log.

#6897

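Review comments on this method in the diff:

```java
// TODO: javadoc
public void remove(ModuleManager manager, String metricName, Class<? extends Metrics> metricsClass) {
    // ... (method body not shown in this excerpt)
}
```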
Member

Tip: we need to check whether the worker can be garbage-collected successfully, to avoid a potential OOM risk.
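One way to act on this tip is a weak-reference check in a test: hold a `WeakReference` to the removed worker, drop the registry's strong reference, request a GC, and see whether the reference gets cleared. The sketch below assumes a simple name-to-worker map; `WorkerRegistry` and `collectedWithin` are hypothetical names, not the PR's code.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of per-metric workers; not SkyWalking's actual API. */
class WorkerRegistry {
    private final Map<String, Object> workers = new ConcurrentHashMap<>();

    void register(String metricName, Object worker) {
        workers.put(metricName, worker);
    }

    boolean contains(String metricName) {
        return workers.containsKey(metricName);
    }

    /** Removes the worker and returns a weak reference for a later GC check. */
    WeakReference<Object> remove(String metricName) {
        Object worker = workers.remove(metricName);
        if (worker == null) {
            throw new IllegalArgumentException("no worker for " + metricName);
        }
        return new WeakReference<>(worker);
    }

    /**
     * Requests GC up to {@code attempts} times and reports whether the referent
     * was collected. System.gc() is only a hint, so false means "not collected yet".
     */
    static boolean collectedWithin(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }
}
```

Because `System.gc()` is only a request, a `false` result is best read as "still reachable after a few GC attempts". That usually points to a leftover strong reference, for example a listener or cache that still holds the worker.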

Member

Question: once this gets removed, we don't touch the storage, right?

Member Author

Yes, that's right.

@wu-sheng
Member

To be clear: in this PR, we focus on MAL updates triggered by changes to the physical rule files, right? We need to update the docs to say so.
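If the trigger is a change to the physical rule files, one simple way to detect it is to poll the rules directory and compare content digests between two snapshots. This is a minimal sketch that assumes the rules are `.yaml` files in one flat directory. `RuleFileSnapshot` is a hypothetical helper, not what the PR implements, and `java.nio.file.WatchService` would be an event-driven alternative to polling.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical poller that fingerprints MAL rule files by content. */
final class RuleFileSnapshot {
    private RuleFileSnapshot() {
    }

    /** Maps each *.yaml file name in rulesDir to the SHA-256 of its content. */
    static Map<String, String> take(Path rulesDir) throws IOException {
        Map<String, String> digests = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(rulesDir, "*.yaml")) {
            for (Path file : files) {
                digests.put(file.getFileName().toString(), sha256(Files.readAllBytes(file)));
            }
        }
        return digests;
    }

    /** Names of files that were added, removed, or changed between two snapshots. */
    static Set<String> changedFiles(Map<String, String> before, Map<String, String> after) {
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        names.removeIf(name -> Objects.equals(before.get(name), after.get(name)));
        return names;
    }

    private static String sha256(byte[] content) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```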

@wu-sheng
Member

@dmsolr Any update on this?

@dmsolr
Member Author

dmsolr commented Jun 28, 2021

SkyWalking metrics data comes from two places: MeterSystem and OALEngine. MeterSystem is fed by receivers and fetchers.

We have many receivers, including Zabbix, OpenTelemetry, JVM, and Meter.

Currently, we have the Prometheus Fetcher, which scrapes metric samples from Prometheus exporters and converts them into MeterSystem's `Sample` using MAL rules.
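To illustrate the fetcher step, the sketch below turns simple Prometheus text-format lines into sample objects that MAL rules could then operate on. `Sample` and `SimpleExpositionParser` are hypothetical stand-ins, and MeterSystem's real `Sample` class differs. The parser only covers the simple case: no escaped quotes or commas inside label values, and no timestamps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sample type; MeterSystem's real Sample class is different. */
final class Sample {
    final String name;
    final Map<String, String> labels;
    final double value;

    Sample(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }
}

/** Parses simple Prometheus text-format lines such as: jvm_threads{area="heap"} 12 */
final class SimpleExpositionParser {
    // metric name, optional {label set}, then the value (no timestamp)
    private static final Pattern LINE =
        Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{([^}]*)\\})?\\s+(\\S+)$");
    // one key="value" pair; escaped quotes inside values are not supported
    private static final Pattern LABEL =
        Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"([^\"]*)\"");

    private SimpleExpositionParser() {
    }

    static List<Sample> parse(String text) {
        List<Sample> samples = new ArrayList<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and # HELP / # TYPE comments
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // outside this simplified grammar
            }
            Map<String, String> labels = new LinkedHashMap<>();
            if (m.group(2) != null) {
                Matcher lm = LABEL.matcher(m.group(2));
                while (lm.find()) {
                    labels.put(lm.group(1), lm.group(2));
                }
            }
            try {
                samples.add(new Sample(m.group(1), labels, parseValue(m.group(3))));
            } catch (NumberFormatException e) {
                // unparsable value: skip the line rather than fail the whole scrape
            }
        }
        return samples;
    }

    private static double parseValue(String v) {
        switch (v) {
            case "+Inf":
                return Double.POSITIVE_INFINITY;
            case "-Inf":
                return Double.NEGATIVE_INFINITY;
            default:
                return Double.parseDouble(v); // also accepts "NaN"
        }
    }
}
```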


There are three types of operations: ADD, UPDATE, and DELETE.

  1. ADD: a new metric does not conflict with anything.
  2. DELETE: there are two options: (1) do nothing, or (2) always drop the table.
  3. UPDATE: for now, we provide three options (the third is the default):
    1. Always keep the old data (table).
    2. Always discard the old data (table).
    3. Interrupt when the table is updated.
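One way to derive the three operations is to diff the old and new rule sets by metric name. This is a minimal sketch that assumes each rule set is a map from metric name to its MAL expression; `MalRuleDiff` is a hypothetical helper. Following this thread's definition of UPDATE, the same rule name with a different expression counts as an update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical classification of MAL rule changes into ADD / UPDATE / DELETE. */
final class MalRuleDiff {
    enum Operation { ADD, UPDATE, DELETE }

    static final class Change {
        final String metricName;
        final Operation operation;

        Change(String metricName, Operation operation) {
            this.metricName = metricName;
            this.operation = operation;
        }

        @Override
        public String toString() {
            return operation + " " + metricName;
        }
    }

    private MalRuleDiff() {
    }

    /** Rule maps are keyed by metric name; values are the MAL expressions. */
    static List<Change> diff(Map<String, String> oldRules, Map<String, String> newRules) {
        TreeSet<String> names = new TreeSet<>(oldRules.keySet());
        names.addAll(newRules.keySet());
        List<Change> changes = new ArrayList<>();
        for (String name : names) {
            String before = oldRules.get(name);
            String after = newRules.get(name);
            if (before == null) {
                changes.add(new Change(name, Operation.ADD)); // a new metric never conflicts
            } else if (after == null) {
                changes.add(new Change(name, Operation.DELETE));
            } else if (!before.equals(after)) {
                // same rule name, different expression: an UPDATE
                changes.add(new Change(name, Operation.UPDATE));
            }
        }
        return changes;
    }
}
```

For example, changing only the expression of an existing rule yields an `UPDATE` for it, and a rule that disappears from the files yields a `DELETE`. Unchanged rules produce no entry at all.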

The description above does not fit ElasticSearch. In RDBMS (MySQL/PostgreSQL), TiDB, and InfluxDB, a metric represents a model and maps to one table (a measurement in InfluxDB), and the table name is the model name. So, for UPDATE:

  1. Rename the table after a metric rule is updated, to keep the old data in the DB.
  2. Drop the table before the new metric starts writing, to discard the old data.
  3. Do nothing if the table structure is unchanged; otherwise, throw an exception to interrupt the OAP.
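For the table-per-model storages, the three UPDATE options could map to storage actions roughly as shown in this sketch. `TableUpdatePlanner`, the `UpdatePolicy` names, and the `<table>_bak_<millis>` backup-name scheme are all assumptions for illustration. MySQL and PostgreSQL both accept the generated `ALTER TABLE ... RENAME TO ...` and `DROP TABLE IF EXISTS ...` statements, but InfluxDB measurements would need a different mechanism.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical mapping of the UPDATE options to SQL for table-per-model storages. */
final class TableUpdatePlanner {
    enum UpdatePolicy { KEEP_OLD_DATA, DISCARD_OLD_DATA, INTERRUPT }

    private TableUpdatePlanner() {
    }

    /**
     * Table names come from rule configuration and are not quoted or escaped here.
     *
     * @param table            the table named after the metric's model
     * @param structureChanged whether the updated rule changes the table structure
     * @param nowMillis        timestamp for a unique backup name (assumed scheme)
     */
    static List<String> plan(String table, boolean structureChanged, UpdatePolicy policy, long nowMillis) {
        switch (policy) {
            case KEEP_OLD_DATA:
                // option 1: rename the old table so its data stays in the DB
                return Collections.singletonList(
                    "ALTER TABLE " + table + " RENAME TO " + table + "_bak_" + nowMillis);
            case DISCARD_OLD_DATA:
                // option 2: drop the old table before the new metric starts writing
                return Collections.singletonList("DROP TABLE IF EXISTS " + table);
            case INTERRUPT:
            default:
                // option 3: nothing to do if the structure is unchanged, otherwise stop
                if (!structureChanged) {
                    return Collections.emptyList();
                }
                throw new IllegalStateException("table structure of " + table + " changed by a MAL update");
        }
    }
}
```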

ElasticSearch (after #6499) differs a lot from the other storage plugins: SkyWalking puts all metrics into one index and identifies metrics data by the function name and model name.

  1. ADD: a new metric does not conflict here either.
  2. DELETE: either (1) wait for the TTL to clean up, or (2) execute a delete query.
  3. UPDATE:
    1. Wait for the TTL to sweep the old data.
    2. Execute a delete statement.

In the ElasticSearch case, a MAL update causes no errors.
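For the delete options in the ElasticSearch case, a single `_delete_by_query` request scoped to one metric inside the shared index would be enough. The sketch below only builds the request. It assumes the metric (model) name is stored in a keyword field named `metric_table`, and the caller passes in the index name. Both should be checked against the actual index mapping; `EsMetricDelete` is a hypothetical helper.

```java
/** Hypothetical builder for an Elasticsearch delete-by-query request. */
final class EsMetricDelete {
    // assumed keyword field holding the metric (model) name in the shared index
    static final String METRIC_FIELD = "metric_table";

    private EsMetricDelete() {
    }

    /** Request line for the Delete By Query API on the given index. */
    static String requestLine(String index) {
        return "POST /" + index + "/_delete_by_query";
    }

    /** JSON body matching only the documents of the given metric. */
    static String body(String metricName) {
        return "{\"query\":{\"term\":{\"" + METRIC_FIELD + "\":\"" + escapeJson(metricName) + "\"}}}";
    }

    // minimal escaping for characters that would break a JSON string literal
    private static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```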

@wu-sheng
Member

> Interrupted when table updated

Does this mean the OAP exits?

@dmsolr
Member Author

dmsolr commented Jun 28, 2021

Yes. If the OAP already has a rule with the same name and its expression differs, we call it an update.

@wu-sheng
Member

> Yes. If the OAP already has a rule with the same name and its expression differs, we call it an update.

This could happen from time to time in debug mode. If the OAP keeps exiting, I think this feature is going to be useless.

@wu-sheng
Member

If you have any concerns, it is better to disable it. But don't let the OAP exit.

@dmsolr
Member Author

dmsolr commented Jun 28, 2021

> If you have any concerns, it is better to disable it. But don't let the OAP exit.

We are talking about the non-ElasticSearch case, right? (ES should not be interrupted by a MAL update.)

I get your point and agree: do not interrupt the OAP server. So, rename the table instead. WDYT?
Renaming the table means users have to drop the old tables themselves (because they are not managed by TTL).

@wu-sheng
Member

> I get your point and agree: do not interrupt the OAP server. So, rename the table instead. WDYT?
> Renaming the table means users have to drop the old tables themselves (because they are not managed by TTL).

I think dropping the table directly is fine. We could disable hot update by default.

@dmsolr
Member Author

dmsolr commented Aug 11, 2021

This PR still needs a lot of time to complete, so I am closing it for now until it is ready.
