[BUG] agent mem pprof shows no symbol table #6068

Open
2 of 3 tasks
qyzhaoxun opened this issue Apr 10, 2024 · 12 comments

@qyzhaoxun

Search before asking

  • I have searched the issues and found no similar report.

DeepFlow Component

Agent

What you expected to happen

I expect the heap pprof file to show exactly where memory is being used.

How to reproduce

  1. Start from branch v6.4 and cherry-pick 2036076.
    The resulting changes are at https://github.com/qyzhaoxun/deepflow/tree/v6.4
  2. Build inside a container:
docker run --privileged --rm -it -v $(pwd):/deepflow hub.deepflow.yunshan.net/public/rust-build bash -c "cd /deepflow/agent && cargo build"
  3. Build the container image:
FROM registry.cn-hongkong.aliyuncs.com/deepflow-ce/deepflow-agent:v6.4
RUN rm /usr/bin/deepflow-agent
ADD ./deepflow-agent.tgz /usr/bin/
  4. Collect the heap pprof file and generate the SVG:
    [Image: profile]

DeepFlow version

No response

DeepFlow agent list

v6.4
The agent is started in standalone mode.

Kubernetes CNI

Not applicable

Operation-System/Kernel version

"Ubuntu 22.04 LTS"
Linux VM-11-12-ubuntu 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@qyzhaoxun qyzhaoxun added the bug Something isn't working label Apr 10, 2024
@qyzhaoxun
Author

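I would also like to make a feature request: could heap pprof be turned into a configuration option and merged into main and v6.4? It would be disabled by default, but could be enabled through configuration when needed.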

@yuanchaoa
Contributor

@qyzhaoxun Are there any error messages? Please also post the command you used to generate the SVG.

@qyzhaoxun
Author

qyzhaoxun commented Apr 10, 2024

jeprof --svg ./deepflow-agent ./agent.profile >profile.svg

There are no errors.

@yuanchaoa
Contributor

1
[Two screenshots attached: screenshot-20240411-105000 and an untitled image]

@yuanchaoa
Contributor

[Screenshot attached]

@yuanchaoa
Contributor

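debug=true is enabled in Cargo.toml, so every agent build carries a symbol table; please check your binary. The heap profiles above were all generated from your branch, and they work. Running inside a container may also be the problem, so try running directly on the host.
[Screenshot attached]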

@qyzhaoxun
Author

I am running in a container environment. Is there anything extra I need to configure? @yuanchaoa

@qyzhaoxun
Author

Also, are there any requirements for the build command? Should it be cargo build --release, that is, do I need to add --release?

@yuanchaoa
Contributor

Yes, use cargo build --release.

@qyzhaoxun
Author

qyzhaoxun commented Apr 11, 2024

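How should I run pprof when the agent is in a container? I collected the heap file from the container and then ran jeprof on the node. @yuanchaoa
I built with --release, and the symbol table still cannot be found.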

@yuanchaoa
Contributor

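The agent's memory pprof is implemented with this library: https://crates.io/crates/jemalloc_pprof
Its documentation has a passage that should help you:

[Screenshot of the passage attached]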

@sharang
Member

sharang commented Apr 19, 2024

@qyzhaoxun The following agent settings can be used to reduce memory usage: https://github.com/deepflowio/deepflow/blob/main/server/agent_config/example.yaml

Which NICs cBPF captures from

## Regular Expression for TAP (Traffic Access Point)
## Length: [0, 65535]
## Default:
##   Localhost:   lo
##   Common NIC:  eth.*|en[osipx].*
##   QEMU VM NIC: tap.*
##   Flannel:     veth.*
##   Calico:      cali.*
##   Cilium:      lxc.*
##   Kube-OVN:    [0-9a-f]+_h$
## Note: Regular expression of NIC name for collecting traffic
#tap_interface_regex: ^(tap.*|cali.*|veth.*|eth.*|en[osipx].*|lxc.*|lo|[0-9a-f]+_h)$

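By default the lo interface is captured as well. If you do not need it, removing it from the regex reduces memory consumption.

Which traffic cBPF ignores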
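
To see what dropping `lo` from the pattern changes, here is a minimal sketch that runs the default `tap_interface_regex` against a few NIC names. The NIC names are hypothetical, and Python's `re` module stands in for the agent's own regex engine, so treat it as an approximation rather than the agent's actual matching code.

```python
import re

# Default pattern copied from the tap_interface_regex comment above.
DEFAULT_TAP_REGEX = r"^(tap.*|cali.*|veth.*|eth.*|en[osipx].*|lxc.*|lo|[0-9a-f]+_h)$"
# Same pattern with the `lo` alternative removed.
NO_LO_TAP_REGEX = r"^(tap.*|cali.*|veth.*|eth.*|en[osipx].*|lxc.*|[0-9a-f]+_h)$"

# Hypothetical interface names, for illustration only.
SAMPLE_NICS = ["lo", "eth0", "ens33", "veth1a2b", "cali0123", "docker0"]


def captured(pattern, nic_names):
    """Return the NIC names that the given pattern would select."""
    compiled = re.compile(pattern)
    return [name for name in nic_names if compiled.match(name)]


if __name__ == "__main__":
    print("default:   ", captured(DEFAULT_TAP_REGEX, SAMPLE_NICS))
    print("without lo:", captured(NO_LO_TAP_REGEX, SAMPLE_NICS))
```

With these sample names, the only difference between the two results is `lo`; `docker0` is selected by neither pattern.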

## Traffic Capture Filter
## Length: [1, 512]
## Note: If not configured, all traffic will be collected. Please
##   refer to BPF syntax: https://biot.com/capstats/bpf.html
#capture_bpf:

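If you know for certain that some traffic is of no interest, you can configure a BPF expression to filter it out.

cBPF packet capture truncation and application protocol parsing truncation ⭐️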

## Maximum Packet Capture Length
## Unit: bytes. Default: 65535. Range: [128, 65535]
## Note: DPDK environment does not support this configuration.
#capture_packet_size: 65535

## Protocol Identification Maximum Packet Length
## Default: 1024. Bpf Range: [256, 65535], Ebpf Range: [256, 8192]
## Note: The maximum data length used for application protocol identification,
##   note that the effective value is less than or equal to the value of
##   capture_packet_size.
#l7_log_packet_size: 1024

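Our application protocol parsing currently handles at most 8192 bytes, so these two settings can be set to the same value somewhere between 1024 and 8192. Lowering capture_packet_size helps reduce memory.

Disable tunnel decapsulation attempts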
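
The relationship between the two settings can be sketched as follows. This is an illustration built only from the config comments and the 8192-byte parsing limit mentioned above; the function name is hypothetical and the agent's real clamping logic may differ.

```python
# Range of capture_packet_size, taken from the config comment above.
CAPTURE_MIN, CAPTURE_MAX = 128, 65535
# Assumption: the 8192-byte parsing limit stated in this comment.
L7_PARSE_MAX = 8192


def effective_l7_size(capture_packet_size, l7_log_packet_size):
    """Bytes of a packet that protocol parsing can actually see.

    The effective value cannot exceed capture_packet_size, and (by
    assumption) parsing never looks past L7_PARSE_MAX bytes.
    """
    if not CAPTURE_MIN <= capture_packet_size <= CAPTURE_MAX:
        raise ValueError("capture_packet_size out of range [128, 65535]")
    return min(l7_log_packet_size, capture_packet_size, L7_PARSE_MAX)


if __name__ == "__main__":
    # Defaults: capture 65535 bytes, parse the first 1024.
    print(effective_l7_size(65535, 1024))
    # Unified value, as suggested above: both set to 4096.
    print(effective_l7_size(4096, 4096))
```

Under these assumptions, capturing more than the parsed length only costs memory, which is why setting both to one value in the 1024 to 8192 range is suggested.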

## Decapsulation Tunnel Protocols
## Default: [1, 2], means VXLAN and IPIP. Options: 1 (VXLAN), 2 (IPIP), 3 (GRE), 4 (Geneve)
#decap_type:
#- 1
#- 2

This helps reduce CPU consumption.

Disable parsing of X-Forwarded-For, X-Request-ID, TraceID, and SpanID

## HTTP Real Client Key
## Default: X-Forwarded-For.
## Note: It is used to extract the real client IP field in the HTTP header,
##   such as X-Forwarded-For, etc. Leave it empty to disable this feature.
#http_log_proxy_client: X-Forwarded-For

## HTTP X-Request-ID Key
## Default: X-Request-ID
## Note: It is used to extract the fields in the HTTP header that are used
##   to uniquely identify the same request before and after the gateway,
##   such as X-Request-ID, etc. This feature can be turned off by setting
##   it to empty.
#http_log_x_request_id: X-Request-ID

## TraceID Keys
## Default: traceparent, sw8.
## Note: Used to extract the TraceID field in HTTP and RPC headers, supports filling
##   in multiple values separated by commas. This feature can be turned off by
##   setting it to empty.
#http_log_trace_id: traceparent, sw8

## SpanID Keys
## Default: traceparent, sw8.
## Note: Used to extract the SpanID field in HTTP and RPC headers, supports filling
##   in multiple values separated by commas. This feature can be turned off by
##   setting it to empty.
#http_log_span_id: traceparent, sw8

If you do not care about these fields in l7_flow_log, you can turn them off.

Reduce the cBPF buffer size ⭐️

  ###############
  ## AF_PACKET ##
  ###############
  ## AF_PACKET Blocks Switch
  ## Note: When tap_mode != 2, you need to explicitly turn on this switch to
  ##   configure 'afpacket-blocks'.
  #afpacket-blocks-enabled: false

  ## AF_PACKET Blocks
  ## Default: 128, Range: [8, +oo)
  ## Note: deepflow-agent will automatically calculate the number of blocks
  ##   used by AF_PACKET according to max_memory, which can also be specified
  ##   using this configuration item. The size of each block is fixed at 1MB.
  #afpacket-blocks: 128

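By default a suitable afpacket-blocks value is computed from max-memory (it is visible in the agent log). If you want to reduce memory further, you can set it explicitly. One block = 1MB.

Reduce the eBPF buffer size ⭐️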
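
Since each block is fixed at 1MB, the buffer size follows directly from the block count. The helper below is a hypothetical sketch of that arithmetic only; it does not model how the agent derives the block count from max-memory.

```python
BLOCK_SIZE_MB = 1  # "The size of each block is fixed at 1MB."
MIN_BLOCKS = 8     # Lower bound from the config comment: Range [8, +oo)


def afpacket_buffer_mb(blocks):
    """Memory (in MB) taken by the AF_PACKET ring for a given block count."""
    if blocks < MIN_BLOCKS:
        raise ValueError("afpacket-blocks must be at least 8")
    return blocks * BLOCK_SIZE_MB


if __name__ == "__main__":
    default_mb = afpacket_buffer_mb(128)  # documented default
    reduced_mb = afpacket_buffer_mb(32)   # example of an explicit lower value
    print(f"default: {default_mb} MB, reduced: {reduced_mb} MB, "
          f"saved: {default_mb - reduced_mb} MB")
```

Note that, per the config comment, afpacket-blocks-enabled has to be switched on for an explicit value to apply when tap_mode != 2.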

    ## eBPF dispatch ring size
    ## Default: 65536. Range: [8192, 131072]
    ## Note: The size of the ring cache queue, The value is 2^n ( n range [13, 17] ).
    ##   If the value is between 2^n and 2^(n+1), it will be automatically adjusted by the ebpf configurator to the minimum value (2^n).
    #ring-size: 65536

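Each unit here (a pointer) can be taken to reference storage of at most l7_log_packet_size bytes (1KB by default). With the defaults, this means up to 64K * 1KB = 64MB of memory consumption.

Other settings that can reduce data volume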
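
The worst-case estimate above can be reproduced with a short sketch. The rounding rule follows the config comment (values between 2^n and 2^(n+1) are adjusted down to 2^n); the function names are hypothetical, and the one-slot-per-packet sizing is the approximation described in this comment, not a measurement.

```python
# Range of ring-size from the config comment: [8192, 131072], i.e. 2^13..2^17.
RING_MIN, RING_MAX = 8192, 131072


def adjusted_ring_size(ring_size):
    """Round ring_size down to the nearest power of two, as the comment describes."""
    if not RING_MIN <= ring_size <= RING_MAX:
        raise ValueError("ring-size out of range [8192, 131072]")
    return 1 << (ring_size.bit_length() - 1)


def worst_case_ring_bytes(ring_size, l7_log_packet_size=1024):
    """Upper bound on memory if every slot points at a full-size payload."""
    return adjusted_ring_size(ring_size) * l7_log_packet_size


if __name__ == "__main__":
    for size in (65536, 100000, 8192):
        mib = worst_case_ring_bytes(size) / (1024 * 1024)
        print(f"ring-size={size}: adjusted={adjusted_ring_size(size)}, "
              f"worst case about {mib:.0f} MiB")
```

With the default ring-size of 65536 and l7_log_packet_size of 1024 this gives 64 MiB, matching the figure above; dropping ring-size to 8192 brings the bound down to 8 MiB.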

https://deepflow.io/docs/zh/best-practice/reduce-storage-overhead/
