[feature] support auto collect metrics by prometheus task (#1342)
tomsun28 committed Nov 21, 2023
1 parent 682026f commit f8598d1
Showing 296 changed files with 2,115 additions and 767 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -275,7 +275,7 @@ git pull upstream master
- **[warehouse](https://github.com/dromara/hertzbeat/tree/master/warehouse)** 提供监控数据仓储服务
> 采集指标结果数据管理,数据落盘,查询,计算统计。
- **[alerter](https://github.com/dromara/hertzbeat/tree/master/alerter)** 提供告警服务
- > 告警计算触发,监控状态联动,告警配置,告警通知。
+ > 告警计算触发,任务状态联动,告警配置,告警通知。
- **[web-app](https://github.com/dromara/hertzbeat/tree/master/web-app)** 提供可视化控制台页面
> 监控告警系统可视化控制台前端
2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@

### Features

- * Combines **monitoring, alarm, and notification** features into one platform, and supports monitoring for web service, database, os, middleware, cloud-native, network and more.
+ * Combines **monitoring, alarm, and notification** features into one platform, and supports monitoring for web service, program, database, cache, os, webserver, middleware, bigdata, cloud-native, network, custom and more.
* Easy to use and agentless, offering full web-based operations for monitoring and alerting with just a few clicks, all at zero learning cost.
* Makes protocols such as `Http, Jmx, Ssh, Snmp, Jdbc` configurable, allowing you to collect any metrics by simply configuring the template `YML` file online. Imagine being able to quickly adapt to a new monitoring type like K8s or Docker simply by configuring online with HertzBeat.
* High performance, supports horizontal expansion of multi-collector clusters, multi-isolated network monitoring and cloud-edge collaboration.
2 changes: 1 addition & 1 deletion README_CN.md
@@ -29,7 +29,7 @@

### 特点

- - 集 **监控+告警+通知** 为一体,支持对应用服务,数据库,操作系统,中间件,云原生,网络等监控阈值告警通知一步到位
+ - 集 **监控+告警+通知** 为一体,支持对应用服务,应用程序,数据库,缓存,操作系统,大数据,中间件,Web服务器,云原生,网络,自定义等监控阈值告警通知一步到位
- 易用友好,无需 `Agent`,全 `WEB` 页面操作,鼠标点一点就能监控告警,零上手学习成本。
- 将 `Http,Jmx,Ssh,Snmp,Jdbc` 等协议规范可配置化,只需在浏览器配置监控模版 `YML` 就能使用这些协议去自定义采集想要的指标。您相信只需配置下就能立刻适配一款 `K8s` 或 `Docker` 等新的监控类型吗?
- 高性能,支持多采集器集群横向扩展,支持多隔离网络监控,云边协同。
@@ -69,7 +69,7 @@ public class CalculateAlarm {
* The alarm in the process is triggered
* 触发中告警信息
* key - monitorId+alertDefineId 为普通阈值告警 | The alarm is a common threshold alarm
- * key - monitorId 为监控状态可用性可达性告警 | Indicates the monitoring status availability reachability alarm
+ * key - monitorId 为任务状态可用性可达性告警 | Indicates the monitoring status availability reachability alarm
*/
private final Map<String, Alert> triggeredAlertMap;
/**
@@ -137,7 +137,7 @@ private void calculate(CollectRep.MetricsData metricsData) {
String app = metricsData.getApp();
String metrics = metricsData.getMetrics();
// If the indicator group whose scheduling priority is 0 has the status of collecting response data UN_REACHABLE/UN_CONNECTABLE, the highest severity alarm is generated to monitor the status change
- // 先判断调度优先级为0的指标组采集响应数据状态 UN_REACHABLE/UN_CONNECTABLE 则需发最高级别告警进行监控状态变更
+ // 先判断调度优先级为0的指标组采集响应数据状态 UN_REACHABLE/UN_CONNECTABLE 则需发最高级别告警进行任务状态变更
if (metricsData.getPriority() == 0) {
handlerAvailableMetrics(monitorId, app, metricsData);
}
@@ -378,7 +378,7 @@ private void handlerAvailableMetrics(long monitorId, String app, CollectRep.Metr
} else {
// Check whether an availability or unreachable alarm is generated before the association monitoring
// and send a clear alarm to clear the monitoring status
- // 判断关联监控之前是否有可用性或者不可达告警,发送恢复告警进行监控状态恢复
+ // 判断关联监控之前是否有可用性或者不可达告警,发送恢复告警进行任务状态恢复
String notResolvedAlertKey = monitorId + CommonConstants.AVAILABILITY;
Alert notResolvedAlert = notRecoveredAlertMap.remove(notResolvedAlertKey);
if (notResolvedAlert != null) {
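
The recovery path touched by this hunk (look up a pending availability alert by a `monitorId`-based key, remove it, and emit a recovery notice only if one was pending) can be sketched in isolation. This is a minimal illustration, not HertzBeat's implementation: the class `AvailabilityTracker`, its methods, the use of `String` in place of `Alert`, and the value of the `AVAILABILITY` suffix are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the not-recovered alert bookkeeping in CalculateAlarm.
class AvailabilityTracker {
    // Assumed value; the real CommonConstants.AVAILABILITY is not shown in the diff.
    static final String AVAILABILITY = "availability";

    // Pending (not yet recovered) availability alerts, keyed by monitorId + suffix.
    private final Map<String, String> notRecoveredAlertMap = new ConcurrentHashMap<>();

    static String key(long monitorId) {
        return monitorId + AVAILABILITY;
    }

    // Record an availability alert unless one is already pending for this monitor.
    void onUnavailable(long monitorId, String message) {
        notRecoveredAlertMap.putIfAbsent(key(monitorId), message);
    }

    // Remove the pending alert, if any; a present result means a recovery notice is due.
    Optional<String> onAvailable(long monitorId) {
        return Optional.ofNullable(notRecoveredAlertMap.remove(key(monitorId)));
    }

    public static void main(String[] args) {
        AvailabilityTracker tracker = new AvailabilityTracker();
        tracker.onUnavailable(42L, "UN_REACHABLE");
        tracker.onAvailable(42L).ifPresent(m -> System.out.println("recovered from: " + m));
    }
}
```

Removing the entry and testing the returned value in one step means a second successful collection for the same monitor finds nothing pending and sends no duplicate recovery notice.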
@@ -42,17 +42,17 @@ public interface AlertDefineBindDao extends JpaRepository<AlertDefineMonitorBind

/**
* Deleting alarms based on monitoring IDs defines monitoring associations
- * 根据监控ID删除告警定义监控关联
+ * 根据监控任务ID删除告警定义监控关联
*
- * @param monitorId Monitor Id 监控ID
+ * @param monitorId Monitor Id 监控任务ID
*/
void deleteAlertDefineMonitorBindsByMonitorIdEquals(Long monitorId);

/**
* Delete alarm definition monitoring association based on monitoring ID list
- * 根据监控ID列表删除告警定义监控关联
+ * 根据监控任务ID列表删除告警定义监控关联
*
- * @param monitorIds Monitoring ID List 监控ID列表
+ * @param monitorIds Monitoring ID List 监控任务ID列表
*/
void deleteAlertDefineMonitorBindsByMonitorIdIn(Set<Long> monitorIds);

@@ -60,8 +60,8 @@ public interface AlertDefineDao extends JpaRepository<AlertDefine, Long>, JpaSpe

/**
* Query the alarm definition list associated with the monitoring ID
- * 根据监控ID查询与之关联的告警定义列表
- * @param monitorId 监控ID
+ * 根据监控任务ID查询与之关联的告警定义列表
+ * @param monitorId 监控任务ID
* @param app 监控类型
* @param metrics 指标组
* @return Alarm Definition List | 告警定义列表
@@ -34,16 +34,16 @@
public interface AlertMonitorDao extends JpaRepository<Monitor, Long>, JpaSpecificationExecutor<Monitor> {

/**
- * Query the monitoring status of a specified monitoring state | 查询指定监控状态的监控
- * @param status 监控状态
+ * Query the monitoring status of a specified monitoring state | 查询指定任务状态的监控
+ * @param status 任务状态
* @return Monitor the list | 监控列表
*/
List<Monitor> findMonitorsByStatusIn(List<Byte> status);


/**
- * Query the monitoring status of a specified monitoring state | 查询指定监控状态的监控
- * @param status 监控状态
+ * Query the monitoring status of a specified monitoring state | 查询指定任务状态的监控
+ * @param status 任务状态
* @return Monitor the list | 监控列表
*/
List<Monitor> findMonitorsByStatus(Byte status);
@@ -69,7 +69,7 @@ public interface AlertDefineService {
/**
* Obtain alarm definition information
* 获取告警定义信息
- * @param alertId Monitor the ID | 监控ID
+ * @param alertId Monitor the ID | 监控任务ID
* @return AlertDefine
* @throws RuntimeException An exception was thrown during the query | 查询过程中异常抛出
*/
@@ -100,8 +100,8 @@ public interface AlertDefineService {

/**
* Query the alarm definitions that match the specified indicator group associated with the monitoring ID
- * 查询与此监控ID关联的指定指标组匹配的告警定义
- * @param monitorId Monitor the ID | 监控ID
+ * 查询与此监控任务ID关联的指定指标组匹配的告警定义
+ * @param monitorId Monitor the ID | 监控任务ID
* @param app Monitoring type | 监控类型
* @param metrics Index group | 指标组
* @return field - define[]
@@ -110,8 +110,8 @@ public interface AlertDefineService {

/**
* Query the alarm definitions that match the specified indicator group associated with the monitoring ID
- * 查询与此监控ID关联的可用性告警定义
- * @param monitorId Monitor the ID | 监控ID
+ * 查询与此监控任务ID关联的可用性告警定义
+ * @param monitorId Monitor the ID | 监控任务ID
* @param app Monitoring type | 监控类型
* @param metrics Index group | 指标组
* @return field - define[]
6 changes: 3 additions & 3 deletions alerter/src/main/resources/alerter_zh_CN.properties
@@ -13,12 +13,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

- alerter.availability.recover = 可用性告警恢复通知, 监控状态已恢复正常
+ alerter.availability.recover = 可用性告警恢复通知, 任务状态已恢复正常
alerter.alarm.recover = 告警恢复通知
alerter.notify.title = HertzBeat告警通知
alerter.notify.target = 告警目标对象
- alerter.notify.monitorId = 所属监控ID
- alerter.notify.monitorName = 所属监控名称
+ alerter.notify.monitorId = 所属监控任务ID
+ alerter.notify.monitorName = 所属任务名称
alerter.notify.priority = 告警级别
alerter.notify.triggerTime = 告警触发时间
alerter.notify.times = 告警触发次数
@@ -22,8 +22,7 @@
import org.dromara.hertzbeat.common.entity.message.CollectRep;

/**
- * Specific indicator group collection implementation abstract class
- * 具体的指标组采集实现抽象类
+ * Specific metrics group collection implementation abstract class
*
* @author tomsun28
*
@@ -32,12 +31,11 @@ public abstract class AbstractCollect {

/**
* Real acquisition implementation interface
- * 真正的采集实现接口
*
* @param builder response builder
- * @param appId App monitoring ID 应用监控ID
- * @param app Application Type 应用类型
- * @param metrics Metric group configuration 指标组配置
+ * @param appId App monitoring ID
+ * @param app Application Type
+ * @param metrics Metric group configuration
* return response builder
*/
public abstract void collect(CollectRep.MetricsData.Builder builder, long appId, String app, Metrics metrics);