[Bug Report] New threads are spawned without limit, consuming a huge amount of memory #393

Open
3 tasks done
Basstorm opened this issue Dec 3, 2023 · 11 comments

Comments

@Basstorm

Basstorm commented Dec 3, 2023

Checks

  • I have searched the existing issues
  • I have read the documentation
  • Is it your first time submitting an issue

Current Behavior

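Once enabled, it slowly keeps spawning new threads without limit and consumes a huge amount of memory. This is the process status one day after startup: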

root@R66S:~# cat /proc/10167/status
Name:   dae-wing
Umask:  0022
State:  S (sleeping)
Tgid:   10167
Ngid:   0
Pid:    10167
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 2048
Groups:
NStgid: 10167
NSpid:  10167
NSpgid: 1
NSsid:  1
VmPeak:  1649004 kB
VmSize:  1649004 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    169484 kB
VmRSS:    136764 kB
RssAnon:          118888 kB
RssFile:           17876 kB
RssShmem:              0 kB
VmData:   436344 kB
VmStk:       132 kB
VmExe:     25144 kB
VmLib:       720 kB
VmPTE:       904 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
THP_enabled:    1
Threads:        1774
SigQ:   0/3853
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: fffffffc7fc1feff
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Seccomp_filters:        0
Speculation_Store_Bypass:       not vulnerable
SpeculationIndirectBranch:      unknown

As you can see, Threads has already reached 1774, and the process takes up a very large number of PIDs.
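To track how the thread count and resident memory grow over time, the two relevant fields can be pulled out of the status text. The following is a minimal sketch, not part of dae or daed; the function names `parse_proc_status` and `thread_and_rss` are hypothetical, and the sample is an excerpt of the output quoted above.

```python
def parse_proc_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<whitespace>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def thread_and_rss(text: str) -> tuple[int, int]:
    """Return (thread count, resident set size in kB)."""
    fields = parse_proc_status(text)
    threads = int(fields["Threads"])
    rss_kb = int(fields["VmRSS"].split()[0])  # "136764 kB" -> 136764
    return threads, rss_kb


# Excerpt of the status output quoted in this report.
SAMPLE = """Name:   dae-wing
VmRSS:    136764 kB
Threads:        1774
"""

print(thread_and_rss(SAMPLE))
```

On a live system, the same function could be fed the contents of `/proc/<pid>/status` at regular intervals to see whether both numbers keep rising.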

Related issue: sbwml/luci-app-daed-next#1

Expected Behavior

No response

Steps to Reproduce

No response

Environment

  • Daed version:
root@R66S:~# dae-wing --version
daed-next version 2023-10-25-952b1c9
  • OS (e.g cat /etc/os-release):
ImmortalWrt 23.05-rc4
  • Kernel (e.g. uname -a):
root@R66S:~# uname -a
Linux R66S 5.15.132 #0 SMP PREEMPT Sun Oct 1 02:21:58 2023 aarch64 GNU/Linux

  • Others:

Configuration file:

global {
    ##### Software options.

    # tproxy port to listen on. It is NOT a HTTP/SOCKS port, and is just used by eBPF program.
    # In normal case, you do not need to use it.
    tproxy_port: 12345

    # Set it true to protect tproxy port from unsolicited traffic. Set it false to allow users to use self-managed
    # iptables tproxy rules.
    tproxy_port_protect: true

    # If not zero, traffic sent from dae will be set SO_MARK. It is useful to avoid traffic loop with iptables tproxy
    # rules.
    so_mark_from_dae: 0

    # Log level: error, warn, info, debug, trace.
    log_level: warning

    # Disable waiting for network before pulling subscriptions.
    disable_waiting_network: true


    ##### Interface and kernel options.

    # The LAN interface to bind. Use it if you want to proxy LAN.
    # Multiple interfaces split by ",".
    lan_interface: eth0

    # The WAN interface to bind. Use it if you want to proxy localhost.
    # Multiple interfaces split by ",". Use "auto" to auto detect.
    wan_interface: eth1

    # Automatically configure Linux kernel parameters like ip_forward and send_redirects. Check out
    # https://github.com/daeuniverse/dae/blob/main/docs/en/user-guide/kernel-parameters.md to see what will dae do.
    auto_config_kernel_parameter: true


    ##### Node connectivity check.

    # Host of URL should have both IPv4 and IPv6 if you have double stack in local.
    # First is URL, others are IP addresses if given.
    # Considering traffic consumption, it is recommended to choose a site with anycast IP and less response.
    #tcp_check_url: 'http://cp.cloudflare.com'
    tcp_check_url: 'http://cp.cloudflare.com,1.1.1.1'

    # The HTTP request method to `tcp_check_url`. Use 'HEAD' by default because some server implementations bypass
    # accounting for this kind of traffic.
    tcp_check_http_method: HEAD

    # This DNS will be used to check UDP connectivity of nodes. And if dns_upstream below contains tcp, it also be used to check
    # TCP DNS connectivity of nodes.
    # First is URL, others are IP addresses if given.
    # This DNS should have both IPv4 and IPv6 if you have double stack in local.
    #udp_check_dns: 'dns.google.com:53'
    udp_check_dns: 'dns.google.com:53,8.8.8.8,1.1.1.1'

    check_interval: 30s

    # Group will switch node only when new_latency <= old_latency - tolerance.
    check_tolerance: 50ms


    ##### Connecting options.

    # Optional values of dial_mode are:
    # 1. "ip". Dial proxy using the IP from DNS directly. This allows your ipv4, ipv6 to choose the optimal path
    #       respectively, and makes the IP version requested by the application meet expectations. For example, if you
    #       use curl -4 ip.sb, you will request IPv4 via proxy and get a IPv4 echo. And curl -6 ip.sb will request IPv6.
    #       This may solve some weird full-cone problems if your node supports that. Sniffing will be disabled
    #       in this mode.
    # 2. "domain". Dial proxy using the domain from sniffing. This will relieve DNS pollution problem to a great extent
    #       if have impure DNS environment. Generally, this mode brings faster proxy response time because proxy will
    #       re-resolve the domain in remote, thus get better IP result to connect. This policy does not impact routing.
    #       That is to say, domain rewrite will be after traffic split of routing and dae will not re-route it.
    # 3. "domain+". Based on domain mode but do not check the reality of sniffed domain. It is useful for users whose
    #       DNS requests do not go through dae but want faster proxy response time. Notice that, if DNS requests do not
    #       go through dae, dae cannot split traffic by domain.
    # 4. "domain++". Based on domain+ mode but force to re-route traffic using sniffed domain to partially recover
    #       domain based traffic split ability. It doesn't work for direct traffic and consumes more CPU resources.
    dial_mode: domain

    # Allow insecure TLS certificates. It is not recommended to turn it on unless you have to.
    allow_insecure: false

    # Timeout to waiting for first data sending for sniffing. It is always 0 if dial_mode is ip. Set it higher is useful
    # in high latency LAN network.
    sniffing_timeout: 100ms

    # TLS implementation. tls is to use Go's crypto/tls. utls is to use uTLS, which can imitate browser's Client Hello.
    tls_implementation: tls

    # The Client Hello ID for uTLS to imitate. This takes effect only if tls_implementation is utls.
    # See more: https://github.com/daeuniverse/dae/blob/331fa23c16/component/outbound/transport/tls/utls.go#L17
    utls_imitate: chrome_auto
}

# See https://github.com/daeuniverse/dae/blob/main/docs/en/configuration/dns.md for full examples.
dns {
    upstream {
         # This is the upstream AdGuard Home
        localdns: 'udp://127.0.0.1:1745'
    }
    routing {
        request {
            fallback: localdns
        }
        response {
            fallback: accept
        }
    }
}

# Node group (outbound).
group {
    proxy {
        # Filter nodes from the global node pool defined by the subscription and node section above.
        #filter: subtag(regex: '^my_', another_sub) && !name(keyword: 'ExpireAt:')

        # Filter nodes from the global node pool defined by tag.
        #filter: name(node1, node2)

        # Filter nodes and give a fixed latency offset to achieve latency-based failover.
        # In this example, there is bigger possibility to choose US node even if original latency of US node is higher.
        filter: name(keyword: 'HK')
        #filter: name(US_node) [add_latency: -500ms]

        # Select the node with min average of the last 10 latencies from the group for every connection.
        policy: min_moving_avg
    }
}

# See https://github.com/daeuniverse/dae/blob/main/docs/en/configuration/routing.md for full examples.
routing {
    ### Preset rules.
    l4proto(udp) && dport(443) -> block
    pname(mosdns, dnsmasq) && l4proto(udp) && dport(53) -> must_direct

    dip(224.0.0.0/3, 'ff00::/8') -> direct
    dip(geoip:private) -> direct

    dip(223.5.5.5, 223.6.6.6) -> direct
    dip(8.8.8.8, 8.8.4.4) -> proxy
    domain(full: dns.alidns.com) -> direct
    domain(full: dns.googledns.com) -> proxy
    domain(full: dns.opendns.com) -> proxy
    domain(full: cloudflare-dns.com) -> proxy
    
    
    ########################## Must Direct Start #########################

    # Google GCM
    domain(suffix: mtalk.google.com) -> direct

    ########################## Must Direct End ############################

    ### GeoSite proxy

    # Google Play
    domain(keyword: googleapis) -> proxy

    domain(geosite: linkedin) -> proxy
    domain(geosite: speedtest) -> proxy
    domain(geosite: yahoo) -> proxy
    domain(geosite: github) -> proxy
    domain(geosite: twitter) -> proxy
    domain(geosite: telegram) -> proxy
    domain(geosite: google) -> proxy
    domain(geosite: category-container) -> proxy
    domain(geosite: category-dev) -> proxy
    domain(geosite: google-scholar) -> proxy
    domain(geosite: category-scholar-!cn) -> proxy
    domain(geosite: category-cryptocurrency) -> proxy
    domain(geosite: geolocation-!cn) -> proxy

    ### GeoSite Direct

    domain(geosite: alibaba) -> direct
    domain(geosite: bilibili) -> direct
    domain(geosite: bilibili2) -> direct
    domain(geosite: tencent) -> direct
    domain(geosite: zhihu) -> direct
    domain(geosite: cloudflare-cn) -> direct
    domain(geosite: category-scholar-cn) -> direct
    domain(geosite: category-media-cn) -> direct
    domain(geosite: category-social-media-cn) -> direct
    domain(geosite: category-dev-cn) -> direct
    domain(geosite: category-bank-cn) -> direct
    domain(geosite: apple) -> direct
    domain(geosite: microsoft) -> direct
    domain(geosite: geolocation-cn) -> direct
    domain(geosite: cn) -> direct

    # GeoIP
    dip(geoip: cn) -> direct

    fallback: proxy
}

Anything else?

No response

@dae-prow
Contributor

dae-prow bot commented Dec 3, 2023

Thanks for opening this issue!

@Basstorm
Author

Basstorm commented Dec 3, 2023

❣️ This issue is marked as wontfix as you have not yet starred this repo. Please kindly consider giving a star to this repo. Your support means a lot to us. Thanks for your understanding. After you become a stargazer, please also reply to this message with the keyword understood. Afterward, I will reopen this issue for you. Once again, your support is much appreciated. Cheers.

understood

@mzz2017
Contributor

mzz2017 commented Dec 4, 2023

What kind of node?

@Basstorm
Author

Basstorm commented Dec 4, 2023

What kind of node?

Shadowsocks (ss) nodes from a paid proxy provider

@mzz2017
Contributor

mzz2017 commented Dec 4, 2023

@Basstorm What do the logs say? Is there UDP traffic running, for example BT downloads?

@Basstorm
Author

Basstorm commented Dec 4, 2023

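@Basstorm What do the logs say? Is there UDP traffic running, for example BT downloads?

Nothing related to BT downloads at all, though there are a few long-lived WebSocket connections (the Binance web app). The logs show no UDP traffic either.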

@ArnoChenFx

I have run into this too.

@phenixcxz

Same problem here: memory usage blows up after it has been running for a while.

@Scirese

Scirese commented Jun 6, 2024

Still not fixed.
Environment: OpenWRT 23.05-SNAPSHOT arm64 in lxc, Linux 6.10.0-rc2, daed v0.4.1
The symptom has changed, though: now the existing threads consume memory without limit.

@i-Eureka

i-Eureka commented Sep 3, 2024

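Still not fixed. Environment: OpenWRT 23.05-SNAPSHOT arm64 in lxc, Linux 6.10.0-rc2, daed v0.4.1. The symptom has changed, though: now the existing threads consume memory without limit.

I would like to know how you got around the host's restrictions on kernel privileges for LXC containers in order to run dae. Doesn't LXC share its kernel with the host? Won't dae's kernel operations directly affect the host?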

@jschwinger233
Member

The latest dae now supports enabling pprof via reload:

global {
    # Set non-zero value to enable pprof.
    pprof_port: 0
}

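First check whether goroutines are leaking: open http://localhost:$pprof_port/debug/pprof/goroutine?debug=2 in a browser.
Then check the heap objects: curl -s http://localhost:<port>/debug/pprof/heap > heap_profile.out && go tool pprof heap_profile.out, then run top to see the largest heap consumers (you can also inspect this live without dumping to a file).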
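A goroutine dump from a leaking process can be very long, so it may help to group goroutines by state and topmost stack frame instead of reading the page by eye. The following is a rough sketch under stated assumptions: it relies on the standard text layout of the `goroutine?debug=2` output (blocks separated by blank lines, each starting with a `goroutine N [state]:` header), the helper names `summarize_goroutines` and `fetch_dump` are hypothetical, and the sample dump is made up for illustration rather than taken from dae.

```python
import re
from collections import Counter
from urllib.request import urlopen

# Header looks like "goroutine 18 [IO wait]:" or "goroutine 1 [chan receive, 5 minutes]:".
HEADER = re.compile(r"^goroutine \d+ .*?\[([^\],]+)")


def summarize_goroutines(dump: str) -> Counter:
    """Count goroutines by (state, topmost function) in a debug=2 dump."""
    counts = Counter()
    for block in dump.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = HEADER.match(lines[0])
        if not match:
            continue
        state = match.group(1)
        # The first frame line is "pkg.Func(args...)"; drop the argument list.
        top_frame = lines[1].rsplit("(", 1)[0]
        counts[(state, top_frame)] += 1
    return counts


def fetch_dump(port: int) -> str:
    """Download the dump; `port` is whatever pprof_port is set to in the config."""
    url = f"http://localhost:{port}/debug/pprof/goroutine?debug=2"
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Made-up sample in the debug=2 layout, for illustration only.
SAMPLE_DUMP = (
    "goroutine 1 [chan receive, 5 minutes]:\n"
    "main.run(0xc000010000)\n"
    "\t/src/main.go:42 +0x1a5\n"
    "\n"
    "goroutine 18 [IO wait]:\n"
    "net.(*conn).Read(0xc00009e000, {0xc0000a2000, 0x1000, 0x1000})\n"
    "\t/usr/lib/go/src/net/net.go:179 +0x45\n"
    "\n"
    "goroutine 19 [IO wait]:\n"
    "net.(*conn).Read(0xc00009e008, {0xc0000a4000, 0x1000, 0x1000})\n"
    "\t/usr/lib/go/src/net/net.go:179 +0x45\n"
)

for (state, frame), n in summarize_goroutines(SAMPLE_DUMP).most_common():
    print(n, state, frame)
```

Against a running instance, `summarize_goroutines(fetch_dump(port))` could be run twice some hours apart; a bucket whose count keeps growing is a likely candidate for the leak.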

8 participants
@jschwinger233 @ArnoChenFx @mzz2017 @phenixcxz @Basstorm @Scirese @i-Eureka and others