feat: Enable rustflags = "-C force-frame-pointers=yes" default for opendal release process #3756

Closed
Zheaoli opened this issue Dec 14, 2023 · 24 comments

Comments

@Zheaoli
Member

Zheaoli commented Dec 14, 2023

Currently, if we want to use perf or any other performance analysis tool to analyze the performance of an opendal process, we run into trouble because of the symbol issue.

For example, I tried to use perf to analyze the following code:

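import uuid

import opendal

op = opendal.Operator("fs", root="/tmp")

# Call uuid4() to get a random name; str(uuid.uuid4) would stringify the function itself.
uuid_str = str(uuid.uuid4())

for _ in range(10000000):
    # The payload is a Python list of 64 * 1024 * 1024 ints, not a bytes object.
    op.write(f"/{uuid_str}", [0] * 64 * 1024 * 1024)
    op.read(f"/{uuid_str}")
    op.delete(f"/{uuid_str}")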

Here is the perf output:

+   63.46%     0.00%  python   [unknown]                                 [.] 0x8b48fb894828ec83
+   43.07%    13.11%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <&pyo3::types::iterator::PyIterator a
+   14.66%     6.43%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::conversions::std::num::<impl py
+   13.72%    13.72%  python   libpython3.12.so.1.0                      [.] listiter_next
+   12.49%     0.00%  python   [unknown]                                 [.] 0000000000000000
+   11.24%     0.00%  python   [kernel.kallsyms]                         [k] asm_exc_page_fault
+   11.23%     0.00%  python   [kernel.kallsyms]                         [k] exc_page_fault
+   11.22%     4.01%  python   [kernel.kallsyms]                         [k] do_user_addr_fault
+   11.06%    11.06%  python   ld-linux-x86-64.so.2                      [.] _dl_update_slotinfo
+   10.81%     0.00%  python   libc.so.6                                 [.] __libc_start_call_main
+   10.81%     0.00%  python   libpython3.12.so.1.0                      [.] Py_BytesMain
+   10.81%     0.00%  python   libpython3.12.so.1.0                      [.] Py_RunMain
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] pymain_run_python.constprop.0
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] _PyRun_AnyFileObject
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] _PyRun_SimpleFileObject
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] run_mod
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] run_eval_code_obj
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] PyEval_EvalCode
+   10.79%     0.00%  python   [JIT] tid 210049                          [.] py::<module>:/root/opendal-demo/demo.
+   10.79%     0.00%  python   libpython3.12.so.1.0                      [.] _PyEval_EvalFrameDefault
+    8.14%     8.14%  python   ld-linux-x86-64.so.2                      [.] __tls_get_addr
+    6.88%     1.97%  python   libc.so.6                                 [.] __memmove_avx_unaligned_erms
+    6.81%     0.11%  python   [kernel.kallsyms]                         [k] handle_mm_fault
+    6.64%     6.64%  python   ld-linux-x86-64.so.2                      [.] update_get_addr
+    6.53%     0.23%  python   [kernel.kallsyms]                         [k] __handle_mm_fault
+    6.27%     0.03%  python   [kernel.kallsyms]                         [k] handle_pte_fault
+    5.17%     0.00%  python   [unknown]                                 [.] 0x0000000000000098

We can see that we cannot trace the Rust code.

After I enabled the rustflag, here is the new perf result:

+   92.23%     0.00%  python   libc.so.6                                 [.] __libc_start_call_main
+   92.23%     0.00%  python   libpython3.12.so.1.0                      [.] Py_BytesMain
+   92.23%     0.00%  python   libpython3.12.so.1.0                      [.] Py_RunMain
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] pymain_run_python.constprop.0
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] _PyRun_AnyFileObject
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] _PyRun_SimpleFileObject
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] run_mod
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] run_eval_code_obj
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] PyEval_EvalCode
+   92.22%     0.00%  python   [JIT] tid 211989                          [.] py::<module>:/root/opendal-demo/demo
+   92.22%     0.00%  python   libpython3.12.so.1.0                      [.] _PyEval_EvalFrameDefault
+   88.98%     0.00%  python   libpython3.12.so.1.0                      [.] PyObject_Vectorcall
+   88.98%     0.00%  python   libpython3.12.so.1.0                      [.] method_vectorcall_VARARGS_KEYWORDS
+   88.98%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal_python::operator::_::<impl p
+   88.98%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::impl_::trampoline::cfunction_w
+   88.98%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::impl_::trampoline::trampoline
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] std::panic::catch_unwind
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] std::panicking::try
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] __rust_try
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] std::panicking::try::do_call
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::impl_::trampoline::trampoline:
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::impl_::trampoline::cfunction_w
+   83.73%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal_python::operator::_::<impl o
+   83.22%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::impl_::extract_argument::extra
+   83.22%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <T as pyo3::impl_::extract_argument:
+   83.22%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::types::any::PyAny::extract
+   83.22%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::types::sequence::<impl pyo3::c
+   81.47%     7.67%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::types::sequence::extract_seque
+   50.67%     2.93%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <&pyo3::types::iterator::PyIterator
+   41.45%     0.43%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::marker::Python::from_owned_ptr
+   40.37%     2.06%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <T as pyo3::conversion::FromPyPointe
+   36.33%     1.75%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::gil::register_owned
+   24.69%     3.91%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] std::thread::local::LocalKey<T>::try
+   16.31%     0.34%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::types::any::PyAny::extract
+   13.54%     3.58%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::conversions::std::num::<impl p
+   10.77%     1.33%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::gil::register_owned::{{closure
+    8.00%     4.11%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] pyo3::gil::OWNED_OBJECTS::__getit
+    7.77%     3.46%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] std::thread::local::LocalKey<T>::try
+    5.24%     0.00%  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] core::ptr::drop_in_place<pyo3::gil::
+    4.94%     4.94%  python   ld-linux-x86-64.so.2                      [.] _dl_update_slotinfo
....
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal_python::operator::_::<impl opendal_python::operator::Operator>::__pymethod_delete__
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal_python::operator::Operator::delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::types::operator::blocking_operator::BlockingOperator::delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::types::operator::operator_functions::FunctionDelete::call
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::types::operator::operator_functions::OperatorFunction<T,R>::call
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] core::ops::function::FnOnce::call_once
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::types::operator::blocking_operator::BlockingOperator::delete_with::{{closure}}
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <alloc::sync::Arc<T> as opendal::raw::accessor::Accessor>::blocking_delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::raw::layer::<impl opendal::raw::accessor::Accessor for L>::blocking_delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::raw::layer::LayeredAccessor::blocking_delete
     0.05%     0.00%             6  python   [kernel.kallsyms]                         [k] ext4_block_write_begin
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::raw::layer::<impl opendal::raw::accessor::Accessor for L>::blocking_delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <opendal::layers::complete::CompleteAccessor<A> as opendal::raw::layer::LayeredAccessor>::blocking_delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] opendal::raw::layer::<impl opendal::raw::accessor::Accessor for L>::blocking_delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <opendal::layers::error_context::ErrorContextAccessor<A> as opendal::raw::layer::LayeredAccessor>::blocking_delete
     0.05%     0.00%             0  python   _opendal.cpython-312-x86_64-linux-gnu.so  [.] <opendal::services::fs::backend::FsBackend as opendal::raw::accessor::Accessor>::blocking_delete

And

image
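For anyone who wants to reproduce the second profile locally, the flag can also be passed through the RUSTFLAGS environment variable instead of a config file. Below is a minimal Python sketch. `env_with_frame_pointers` is a hypothetical helper, not part of opendal, and the commented-out build command assumes the binding is built with maturin.

```python
import os

# The flag proposed in the issue title.
FRAME_POINTER_FLAG = "-C force-frame-pointers=yes"


def env_with_frame_pointers(base_env=None):
    """Return a copy of the environment with the frame-pointer flag in RUSTFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("RUSTFLAGS", "").strip()
    if FRAME_POINTER_FLAG not in existing:
        # Append rather than overwrite so flags that are already set survive.
        env["RUSTFLAGS"] = f"{existing} {FRAME_POINTER_FLAG}".strip()
    return env


print(env_with_frame_pointers()["RUSTFLAGS"])

# Assumed build step (not taken from the issue):
# import subprocess
# subprocess.run(["maturin", "build", "--release"],
#                env=env_with_frame_pointers(), check=True)
```

Appending to an existing RUSTFLAGS value keeps other flags, such as a debuginfo setting, intact.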

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

Users interested in benchmarking can compile the OpenDAL Python bindings themselves. Is there any benefit to enabling this by default?

@Zheaoli
Member Author

Zheaoli commented Dec 14, 2023

Users interested in benchmarking can compile the OpenDAL Python bindings themselves. Is there any benefit to enabling this by default?

With it enabled by default, I can use perf directly. This would be great when we hit an issue in a running process: we can profile it without recompiling the whole package.

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

With it enabled by default, I can use perf directly. This would be great when we hit an issue in a running process: we can profile it without recompiling the whole package.

perf is not a normal use case. I believe the workflow should be:

  • perf during development.
  • release the package with the best result.

@messense
Member

I agree with @Zheaoli that enabling frame pointers is very useful; I'm open to enabling it by default.

FYI, Ubuntu is also going to enable frame pointers by default soon: https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-by-default

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

Will this change affect the performance or binary size? And why does rustc disable it by default?

@Zheaoli
Member Author

Zheaoli commented Dec 14, 2023

Will this change affect the performance or binary size? And why does rustc disable it by default?

  1. the binary size will not change noticeably
  2. performance will drop by 1% to 2% (as reported by the Ubuntu community)

From the Ubuntu report:

Lower Overhead: Unwinding with frame pointers is significantly cheaper than using DWARF or DWARF-derived information.
Debugging Accessibility: Even those new to profiling can access high-quality data, democratising the process of performance optimisation. It will allow bcc-tools, bpftrace, perf and other such tooling to work out of the box.

I think these benefits will help our users diagnose process problems more easily.

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

performance will drop by 1% to 2%

Seems fine. Is there any other reason for rustc not enabling it by default?

@suyanhanx
Member

suyanhanx commented Dec 14, 2023

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

I'm not very familiar with the differences between DWARF and frame-pointers. Does this mean I don't need to add debug=1 when building OpenDAL to generate symbols?

@Zheaoli
Member Author

Zheaoli commented Dec 14, 2023

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

If we add force-frame-pointers=yes, can we remove #3759?

@Zheaoli
Member Author

Zheaoli commented Dec 14, 2023

If we add force-frame-pointers=yes, can we remove #3759?

No! Without symbols, we can't do anything!

@xxchan
Contributor

xxchan commented Dec 14, 2023

I'm also doubtful about adding too many random rustflags. However, searching GitHub, I found tikv also enables this by default, so it seems a reasonable choice. https://github.com/tikv/tikv/blob/f9727af132109754e63fbb4910b73563d0b1da45/Makefile#L37-L51

ref: tikv/tikv#12480


Edit: But it seems that they used this to solve another niche issue

This PR introduces a new version of pprof-rs, which provides stack backtracking based on frame pointers,

@xxchan
Contributor

xxchan commented Dec 14, 2023

I'm not very familiar with the differences between DWARF and frame-pointers.

FYI

  1. manjusaka.blog/posts/2023/08/22/a-little-bit-idea-about-unwind
  2. maskray.me/blog/2020-11-08-stack-unwinding#%E4%B8%AD%E6%96%87%E7%89%88

These went too deep. 🤪

I think basically:

  • frame pointer is part of debuginfo.
  • DWARF is the format of debuginfo.
  • debuginfo contains mainly mapping between source code and machine code.
    • debug=full: can use debugger to inspect variables and stuff
    • debug=1: can get line number, function name. so often needed for backtrace (e.g. panic) and profiling

perf is not a normal use case

Agree. But I think profiling performance issues in production is also needed sometimes? Although in my experience, debug=1 is enough for generating flamegraphs (I haven't used perf though). I don't know why frame pointers are needed for opendal (or perf?).

@Xuanwo
Member

Xuanwo commented Dec 14, 2023

I don't have any major objections to this. @Zheaoli, would you be willing to submit a PR for this change?

@oowl
Member

oowl commented Dec 14, 2023

Just curious: does perf need frame pointers to unwind the stack and obtain debugging information? Isn't DWARF enough?

@xxchan
Contributor

xxchan commented Dec 14, 2023

Just curious: does perf need frame pointers to unwind the stack and obtain debugging information? Isn't DWARF enough?

Also was curious. But seems the ubuntu blog explained it. https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-by-default

@Zheaoli
Member Author

Zheaoli commented Dec 14, 2023

A frame pointer is not symbol information; it is more like a calling convention. It's not only for performance issues. DWARF is too complicated to use for unwinding (aka stack backtracing): people need to take care of CFI, CFA, and many other DWARF details.

perf already supports DWARF call graphs, but many other tools do not.

For example, I want to trace the delete action on /tmp/abcabdasdas and find out which program deleted it and get the program call stack.

I can simply use bpftrace to get a result like the following:

        7f992b234ecb unlink+11 (/usr/lib/libc.so.6)
        7f9929635b58 _$LT$opendal..services..fs..backend..FsBackend$u20$as$u20$opendal..raw..accessor..Accessor$GT$::blocking_delete::h29be0d409c0ac8fc+488 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99291ff3d0 _$LT$opendal..layers..error_context..ErrorContextAccessor$LT$A$GT$$u20$as$u20$opendal..raw..layer..LayeredAccessor$GT$::blocking_delete::heeb7b043f40b40d2+80 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929210be5 opendal::raw::layer::_$LT$impl$u20$opendal..raw..accessor..Accessor$u20$for$u20$L$GT$::blocking_delete::hfe8ff62d64908e8c+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992977c1b7 _$LT$opendal..layers..complete..CompleteAccessor$LT$A$GT$$u20$as$u20$opendal..raw..layer..LayeredAccessor$GT$::blocking_delete::hf8df1f73a18ab5ee+279 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992978c8c5 opendal::raw::layer::_$LT$impl$u20$opendal..raw..accessor..Accessor$u20$for$u20$L$GT$::blocking_delete::hbfd7055d767e037f+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929944768 opendal::raw::layer::LayeredAccessor::blocking_delete::h9a0cabb31d9c83d5+136 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992994cef5 opendal::raw::layer::_$LT$impl$u20$opendal..raw..accessor..Accessor$u20$for$u20$L$GT$::blocking_delete::hd0452586f494ff0d+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992919fa91 _$LT$alloc..sync..Arc$LT$T$GT$$u20$as$u20$opendal..raw..accessor..Accessor$GT$::blocking_delete::he4406d9b46c5d061+145 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992980ce55 opendal::types::operator::blocking_operator::BlockingOperator::delete_with::_$u7b$$u7b$closure$u7d$$u7d$::h131b101950b83511+197 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99294ce721 core::ops::function::FnOnce::call_once::h8aac70541118a702+81 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992981f5e7 opendal::types::operator::operator_functions::OperatorFunction$LT$T$C$R$GT$::call::hdaa1b6b2b17fd7ba+87 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99293ae76b opendal::types::operator::operator_functions::FunctionDelete::call::hb7b9fc6b7c5fd5d8+43 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929693d1a opendal::types::operator::blocking_operator::BlockingOperator::delete::h29d3ad415820c9c2+58 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929113e63 opendal_python::operator::Operator::delete::hd67488a4058bb5bb+35 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929117228 opendal_python::operator::_::_$LT$impl$u20$opendal_python..operator..Operator$GT$::__pymethod_delete__::h67da0638ee468daf+600 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929158456 pyo3::impl_::trampoline::fastcall_with_keywords::_$u7b$$u7b$closure$u7d$$u7d$::he1f3cfdfb5a96d74+54 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915801c pyo3::impl_::trampoline::trampoline::_$u7b$$u7b$closure$u7d$$u7d$::h265ebee4038c7f2d+44 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915a4bb std::panicking::try::do_call::h0d63c3720ecb8a94+43 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915d45d __rust_try+29 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915a3d6 std::panicking::try::h8d397070821fac39+86 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915a075 std::panic::catch_unwind::hba46f6d4392f497e+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929157c66 pyo3::impl_::trampoline::trampoline::h8e658ba9b211a1e9+278 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99290a526e pyo3::impl_::trampoline::fastcall_with_keywords::h1a57159770804871+78 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99291148b0 opendal_python::operator::_::_$LT$impl$u20$pyo3..impl_..pyclass..PyMethods$LT$opendal_python..operator..Operator$GT$$u20$for$u20$pyo3..impl_..pyclass..PyClassImplCollector$LT$opendal_python..operator..Operator$GT$$GT$::py_methods::ITEMS::trampoline::h05ff33749dc99c1b+16 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992b581d7e method_vectorcall_FASTCALL_KEYWORDS+142 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b574488 PyObject_Vectorcall+88 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b50fad2 _PyEval_EvalFrameDefault+19106 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b690b9f PyEval_EvalCode+543 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ec4d8 run_eval_code_obj+88 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ec5fc run_mod+140 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ef3e7 _PyRun_SimpleFileObject+375 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ef9e5 _PyRun_AnyFileObject+69 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b716e80 Py_RunMain+2336 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b7174e2 Py_BytesMain+82 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b158cd0 0x7f992b158cd0 ([unknown])
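The symbol names in this trace still carry escape sequences from Rust's legacy mangling scheme (`$LT$`, `$GT$`, `$u20$`, `..`, and a trailing `::h<hash>`). The Python sketch below makes them readable. It is a hypothetical helper that covers only a partial escape table; it is not a replacement for a real demangler.

```python
import re

# Partial table of escape sequences used by Rust's legacy symbol mangling.
_ESCAPES = {
    "$LT$": "<",
    "$GT$": ">",
    "$LP$": "(",
    "$RP$": ")",
    "$C$": ",",
    "$SP$": "@",
    "$BP$": "*",
    "$RF$": "&",
}
_UNICODE_ESCAPE = re.compile(r"\$u([0-9a-f]{1,6})\$")
_HASH_SUFFIX = re.compile(r"::h[0-9a-f]{16}$")


def prettify(symbol: str) -> str:
    """Turn a partially demangled legacy Rust symbol into readable form."""
    symbol = _HASH_SUFFIX.sub("", symbol)  # drop the trailing disambiguation hash
    segments = []
    for seg in symbol.split("::"):
        # A leading underscore only keeps an escaped segment a valid identifier.
        if seg.startswith("_$"):
            seg = seg[1:]
        for escaped, plain in _ESCAPES.items():
            seg = seg.replace(escaped, plain)
        # $u20$ is a space, $u7b$ is "{", and so on.
        seg = _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), seg)
        segments.append(seg.replace("..", "::"))
    return "::".join(segments)


print(prettify(
    "_$LT$opendal..services..fs..backend..FsBackend$u20$as$u20$"
    "opendal..raw..accessor..Accessor$GT$::blocking_delete::h29be0d409c0ac8fc"
))
```

Applied to the second frame of the trace, this should give the same readable name that perf prints for `FsBackend`'s `blocking_delete` in the earlier report.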

I can also write a tracing program based on frame pointers. There are actually many community tools that rely on frame pointers rather than DWARF.
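To show why frame-pointer unwinding is simple enough for such tools, here is a toy Python model of the walk. It assumes the usual x86-64 layout, where the saved caller frame pointer sits at [rbp] and the return address at [rbp + 8]. The `memory` dict and all addresses are made up for illustration.

```python
WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(memory, rbp, max_depth=128):
    """Collect return addresses by following the chain of saved frame pointers.

    `memory` is a toy stand-in for process memory: a dict mapping an address
    to the 8-byte value stored there.
    """
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_depth:
        if rbp not in memory or rbp + WORD not in memory:
            break  # unreadable frame: stop instead of guessing
        return_addresses.append(memory[rbp + WORD])
        next_rbp = memory[rbp]
        if next_rbp != 0 and next_rbp <= rbp:
            break  # the stack grows down, so a caller frame must sit higher
        rbp = next_rbp
    return return_addresses


# Three nested frames; the outermost one has a saved frame pointer of 0.
memory = {
    0x7000: 0x7040, 0x7008: 0x401234,
    0x7040: 0x7080, 0x7048: 0x401500,
    0x7080: 0x0000, 0x7088: 0x401800,
}
print([hex(addr) for addr in walk_frame_pointers(memory, 0x7000)])
```

Each step is two memory reads and a bounds check, with no per-function unwind tables to consult. A binary compiled without frame pointers breaks the chain, which is what the first profile in this issue shows.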

@oowl
Member

oowl commented Dec 14, 2023

Just curious: does perf need frame pointers to unwind the stack and obtain debugging information? Isn't DWARF enough?

Also was curious. But seems the ubuntu blog explained it. https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-by-default

Lower Overhead: Unwinding with frame pointers is significantly cheaper than using DWARF or DWARF-derived information.

I think this means unwinding with DWARF is more expensive, but it does not mean it cannot be done; it just says it's harder than with frame pointers. I don't know.

@Zheaoli
Member Author

Zheaoli commented Dec 14, 2023

Just curious: does perf need frame pointers to unwind the stack and obtain debugging information? Isn't DWARF enough?

Also was curious. But seems the ubuntu blog explained it. https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-by-default

Lower Overhead: Unwinding with frame pointers is significantly cheaper than using DWARF or DWARF-derived information.

I think this means unwinding with DWARF is more expensive, but it does not mean it cannot be done; it just says it's harder than with frame pointers. I don't know.

Yep, DWARF can do the same thing we do with frame pointers. But it's too complicated to use, so many tools don't support it yet.

@oowl
Member

oowl commented Dec 14, 2023

A frame pointer is not symbol information; it is more like a calling convention. It's not only for performance issues. DWARF is too complicated to use for unwinding (aka stack backtracing): people need to take care of CFI, CFA, and many other DWARF details.

Perf has DWARF call-graph support already, but not enough for other tools.

For example, I want to trace the delete action on /tmp/abcabdasdas and find out which program deleted it and get the program call stack.

I can simply use bpftrace to reach my target to get the result like following below

        7f992b234ecb unlink+11 (/usr/lib/libc.so.6)
        7f9929635b58 _$LT$opendal..services..fs..backend..FsBackend$u20$as$u20$opendal..raw..accessor..Accessor$GT$::blocking_delete::h29be0d409c0ac8fc+488 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99291ff3d0 _$LT$opendal..layers..error_context..ErrorContextAccessor$LT$A$GT$$u20$as$u20$opendal..raw..layer..LayeredAccessor$GT$::blocking_delete::heeb7b043f40b40d2+80 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929210be5 opendal::raw::layer::_$LT$impl$u20$opendal..raw..accessor..Accessor$u20$for$u20$L$GT$::blocking_delete::hfe8ff62d64908e8c+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992977c1b7 _$LT$opendal..layers..complete..CompleteAccessor$LT$A$GT$$u20$as$u20$opendal..raw..layer..LayeredAccessor$GT$::blocking_delete::hf8df1f73a18ab5ee+279 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992978c8c5 opendal::raw::layer::_$LT$impl$u20$opendal..raw..accessor..Accessor$u20$for$u20$L$GT$::blocking_delete::hbfd7055d767e037f+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929944768 opendal::raw::layer::LayeredAccessor::blocking_delete::h9a0cabb31d9c83d5+136 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992994cef5 opendal::raw::layer::_$LT$impl$u20$opendal..raw..accessor..Accessor$u20$for$u20$L$GT$::blocking_delete::hd0452586f494ff0d+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992919fa91 _$LT$alloc..sync..Arc$LT$T$GT$$u20$as$u20$opendal..raw..accessor..Accessor$GT$::blocking_delete::he4406d9b46c5d061+145 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992980ce55 opendal::types::operator::blocking_operator::BlockingOperator::delete_with::_$u7b$$u7b$closure$u7d$$u7d$::h131b101950b83511+197 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99294ce721 core::ops::function::FnOnce::call_once::h8aac70541118a702+81 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992981f5e7 opendal::types::operator::operator_functions::OperatorFunction$LT$T$C$R$GT$::call::hdaa1b6b2b17fd7ba+87 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99293ae76b opendal::types::operator::operator_functions::FunctionDelete::call::hb7b9fc6b7c5fd5d8+43 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929693d1a opendal::types::operator::blocking_operator::BlockingOperator::delete::h29d3ad415820c9c2+58 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929113e63 opendal_python::operator::Operator::delete::hd67488a4058bb5bb+35 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929117228 opendal_python::operator::_::_$LT$impl$u20$opendal_python..operator..Operator$GT$::__pymethod_delete__::h67da0638ee468daf+600 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929158456 pyo3::impl_::trampoline::fastcall_with_keywords::_$u7b$$u7b$closure$u7d$$u7d$::he1f3cfdfb5a96d74+54 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915801c pyo3::impl_::trampoline::trampoline::_$u7b$$u7b$closure$u7d$$u7d$::h265ebee4038c7f2d+44 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915a4bb std::panicking::try::do_call::h0d63c3720ecb8a94+43 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915d45d __rust_try+29 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915a3d6 std::panicking::try::h8d397070821fac39+86 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992915a075 std::panic::catch_unwind::hba46f6d4392f497e+21 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f9929157c66 pyo3::impl_::trampoline::trampoline::h8e658ba9b211a1e9+278 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99290a526e pyo3::impl_::trampoline::fastcall_with_keywords::h1a57159770804871+78 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f99291148b0 opendal_python::operator::_::_$LT$impl$u20$pyo3..impl_..pyclass..PyMethods$LT$opendal_python..operator..Operator$GT$$u20$for$u20$pyo3..impl_..pyclass..PyClassImplCollector$LT$opendal_python..operator..Operator$GT$$GT$::py_methods::ITEMS::trampoline::h05ff33749dc99c1b+16 (/home/manjusaka/.pyenv/versions/3.12.0/envs/jupyter/lib/python3.12/site-packages/opendal/_opendal.cpython-312-x86_64-linux-gnu.so)
        7f992b581d7e method_vectorcall_FASTCALL_KEYWORDS+142 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b574488 PyObject_Vectorcall+88 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b50fad2 _PyEval_EvalFrameDefault+19106 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b690b9f PyEval_EvalCode+543 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ec4d8 run_eval_code_obj+88 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ec5fc run_mod+140 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ef3e7 _PyRun_SimpleFileObject+375 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b6ef9e5 _PyRun_AnyFileObject+69 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b716e80 Py_RunMain+2336 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b7174e2 Py_BytesMain+82 (/home/manjusaka/.pyenv/versions/3.12.0/lib/libpython3.12.so.1.0)
        7f992b158cd0 0x7f992b158cd0 ([unknown])

I can also write a tracing program based on the frame pointer. In fact, many community tools rely on frame pointers rather than DWARF.
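To illustrate why frame-pointer-based tracing is simple to write, here is a minimal sketch that walks a simulated x86-64 stack. It assumes the usual layout when frame pointers are kept: the slot at rbp holds the caller's saved rbp and the slot at rbp + 8 holds the return address. The memory dict, the addresses, and the function name walk_frame_pointers are all hypothetical, made up for the example.

```python
WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(memory, rbp, max_frames=128):
    """Collect return addresses by following the chain of saved rbp values.

    `memory` is a simulated address -> value mapping (an assumption made
    for this sketch; a real tracer would read the target's stack memory).
    """
    frames = []
    while rbp != 0 and len(frames) < max_frames:
        saved_rbp = memory.get(rbp)
        return_addr = memory.get(rbp + WORD)
        if saved_rbp is None or return_addr is None:
            break  # unreadable memory: stop unwinding
        frames.append(return_addr)
        # The stack grows down, so a caller's frame must sit at a higher
        # address. Anything else means a corrupted or looping chain.
        if saved_rbp != 0 and saved_rbp <= rbp:
            break
        rbp = saved_rbp
    return frames


# Hypothetical three-frame stack: each frame stores (saved rbp, return address).
memory = {
    0x7000: 0x7020, 0x7008: 0x401136,
    0x7020: 0x7040, 0x7028: 0x401190,
    0x7040: 0x0,    0x7048: 0x4011F0,
}
frames = walk_frame_pointers(memory, 0x7000)
print([hex(addr) for addr in frames])
```

The whole unwinder is two memory reads per frame plus a sanity check, which is why it is cheap enough to run at sampling time.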

Thanks, that answers my question. 🚀

@YangKeao

YangKeao commented Dec 15, 2023

I think this means cheaper for DWARF, but this does not mean can not, it just says it's harder than frame pointer. I don't know.

But it's too complicated to use. So many tools are not supporting it yet.

It's not only too complicated. DWARF information can be seen as instructions for a stack virtual machine, which can contain endless loops or other harmful logic, so it is unsafe to evaluate these instructions inside the kernel or in any other critical context.
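The risk can be sketched with a toy stack machine. This is not a DWARF evaluator: the three opcodes below are invented for illustration, with "skip" playing the role of a relative branch such as DWARF's DW_OP_skip. The point is that any evaluator for branching bytecode needs a step budget (or similar guard), otherwise a malformed program can spin forever.

```python
def evaluate(program, max_steps=1000):
    """Run a toy stack-machine program and return the top of the stack.

    Opcodes (hypothetical, for illustration only):
      ("push", n)  push a constant
      ("add",)     pop two values, push their sum
      ("skip", k)  relative jump by k instructions
    """
    stack, pc, steps = [], 0, 0
    while 0 <= pc < len(program):
        steps += 1
        if steps > max_steps:
            # Without this guard, a backward skip would never terminate.
            raise RuntimeError("step budget exceeded")
        op, *args = program[pc]
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "skip":
            pc += args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1] if stack else None


well_formed = [("push", 8), ("push", 16), ("add",)]
result = evaluate(well_formed)

malicious = [("skip", -1)]  # jumps back onto itself forever
try:
    evaluate(malicious)
    looped_forever = False
except RuntimeError:
    looped_forever = True
```

A frame-pointer walk has no such program to interpret, so it needs no interpreter and no budget beyond a maximum frame count.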

perf also doesn't work perfectly with DWARF. Because perf cannot safely evaluate DWARF information in the kernel at sampling time, perf record has to copy a chunk of the stack for each sample and store it on disk (8KB by default and 64KB at most, which is sometimes not big enough, so some frames are lost). As a result, the perf data gets really big and the profiling overhead is significant, which can harm profiling accuracy.
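A rough back-of-the-envelope estimate shows how quickly the copied stacks add up. The sketch below only multiplies the numbers out; the sampling frequency, duration, and thread count are hypothetical inputs, and the result is an upper bound on the stack payload alone (it ignores the other fields in each sample, and assumes every sample copies the full dump size).

```python
def stack_payload_bytes(freq_hz, seconds, threads, stack_dump_bytes):
    """Upper bound on bytes of copied stack in a DWARF-mode recording."""
    samples = freq_hz * seconds * threads
    return samples * stack_dump_bytes


MIB = 1024 * 1024

# Hypothetical run: one busy thread sampled at 1000 Hz for 60 seconds.
default_dump = stack_payload_bytes(1000, 60, 1, 8 * 1024)   # 8KB per sample
maximum_dump = stack_payload_bytes(1000, 60, 1, 64 * 1024)  # 64KB per sample

print(f"8KB dumps:  {default_dump / MIB:.2f} MiB")
print(f"64KB dumps: {maximum_dump / MIB:.2f} MiB")
```

With frame pointers, each sample stores only a list of return addresses, so the per-sample cost is a few hundred bytes at most for typical stack depths.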

Also, the libraries you depend on may have faulty DWARF information, as it's hard to generate correct DWARF information for every instruction. In my experience, the vDSO on Ubuntu 18.04 and some system libraries on macOS can have faulty information, which breaks profilers (because the profiler may try to read an unexpected address while evaluating the DWARF instructions).

LLVM may also generate wrong DWARF information, e.g. rust-lang/rust#83139. I think these problems come from the complexity of DWARF, and they are hard to overcome.

BTW, if you are about to enable frame pointers, remember to also rebuild the Rust standard library (with cargo's unstable -Z build-std flag, which requires a nightly toolchain). The prebuilt Rust standard libraries are not compiled with frame pointers. For reference, see tikv/tikv#12480.
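As a sketch of what such a build invocation could look like, the helper below assembles the cargo command line and environment without running anything. It assumes a rustup-managed nightly toolchain (for the +nightly shorthand) and passes an explicit --target, which -Z build-std requires. The function name frame_pointer_build is hypothetical, and this is not necessarily how the opendal release process does it.

```python
import os


def frame_pointer_build(target, base_env=None):
    """Return (argv, env) for a release build that keeps frame pointers.

    Hypothetical helper for illustration; nothing is executed here.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Append to any existing RUSTFLAGS instead of overwriting them.
    flags = env.get("RUSTFLAGS", "").split()
    flags += ["-C", "force-frame-pointers=yes"]
    env["RUSTFLAGS"] = " ".join(flags)
    argv = [
        "cargo", "+nightly", "build", "--release",
        "-Z", "build-std",   # rebuild std so it also keeps frame pointers
        "--target", target,  # build-std needs an explicit target triple
    ]
    return argv, env


argv, env = frame_pointer_build("x86_64-unknown-linux-gnu", base_env={})
print(" ".join(argv))
print("RUSTFLAGS=" + env["RUSTFLAGS"])
```

The pair can be handed to subprocess.run(argv, env=env, check=True) from a build script if desired.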

Edit: But it seems that they used this to solve another niche issue

This PR introduces a new version of pprof-rs, which provides stack backtracking based on frame pointers,

TiKV has a continuous profiling feature, which grabs a profile periodically so that developers can understand the system's state over time. It uses https://github.com/tikv/pprof-rs to implement this feature, which supports both DWARF and frame pointers. However, it turned out that unwinding through frame pointers is much more stable than unwinding through DWARF, so we finally migrated to frame pointers.

If you are interested in learning more about profiling / sampling, I gave an internal talk at PingCAP about the technical choices in profilers; here are the slides and the recorded video (in Chinese). It also covers the choice between DWARF and frame pointers.

(Sorry, I misconfigured the permissions. These docs are now shared with anyone who has the URL.)

@Xuanwo
Member

Xuanwo commented Dec 15, 2023

If you are interested in learning more about profiling / sampling, I gave an internal talk at PingCAP about the technical choices in profilers; here are the slides and the recorded video (in Chinese). It also covers the choice between DWARF and frame pointers.

Fantastic! Thank you so much for sharing!

@Xuanwo
Member

Xuanwo commented Dec 19, 2023

Implemented in #3772.

Thank you all!

@Xuanwo Xuanwo closed this as completed Dec 19, 2023