
@WuSiYu (Collaborator) commented Nov 20, 2025

Use --enable_profiling=MODE to enable profiling; currently supported modes are torch_profiler and nvtx (for use with NVIDIA Nsight Systems).
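A minimal usage sketch follows. Only the --enable_profiling flag and the /profiler_start and /profiler_stop endpoints come from this PR; the launch command, model path, host, and port are assumptions for illustration and should be adjusted to your deployment.

# Launch the server with profiling support enabled (launch command and model path are
# placeholders):
#   python -m lightllm.server.api_server --model_dir /path/to/model --enable_profiling=torch_profiler

import requests

BASE = "http://127.0.0.1:8000"  # assumed address/port of the LightLLM HTTP server

requests.get(f"{BASE}/profiler_start")  # begin a profiling session across all processes
# ... send some inference requests here to capture a representative trace ...
requests.get(f"{BASE}/profiler_stop")   # stop profiling and flush the collected data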

@gemini-code-assist (bot) commented

Summary of Changes

Hello @WuSiYu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances LightLLM's observability by adding comprehensive profiling capabilities. It allows developers to easily enable and control performance profiling across the entire distributed system, supporting both PyTorch's native profiler and NVIDIA NVTX. This feature is crucial for identifying performance bottlenecks and optimizing the execution flow of the LLM inference server.

Highlights

  • New Profiling Module: Introduced a new lightllm/utils/profiler.py module containing ProfilerCmd and ProcessProfiler classes to centralize profiling logic for torch_profiler and nvtx (see the sketch after this list).
  • Command-Line Argument for Profiling: Added a --enable_profiling command-line argument to api_cli.py allowing users to specify torch_profiler or nvtx modes for system-wide profiling.
  • HTTP API for Dynamic Profiling Control: Implemented new HTTP GET endpoints, /profiler_start and /profiler_stop, in api_http.py to enable dynamic control over profiling sessions during runtime.
  • Distributed Profiling Command Propagation: Integrated profiling command propagation across various LightLLM components, including the HTTP server manager, router manager, visual server manager, and visual model RPCs, ensuring profiling commands are distributed throughout the system.
  • Profiler Integration in Core Components: Initialized ProcessProfiler instances within the router, visual server, and model backend components, enabling them to respond to profiling commands and manage their respective profiling activities.
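
As a rough orientation for the torch_profiler mode, here is a minimal sketch of what a start/stop wrapper around torch.profiler typically looks like. It is illustrative only and is not the PR's actual ProcessProfiler implementation; the class name and trace-file naming are assumptions.

import torch
from torch.profiler import profile, ProfilerActivity

class TorchProfilerWrapper:
    """Illustrative start/stop wrapper around torch.profiler (not the PR's code)."""

    def __init__(self, name: str):
        self.name = name
        self._prof = None

    def start(self) -> None:
        # Record both CPU and CUDA activity so kernel launches line up with host code.
        self._prof = profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA])
        self._prof.start()

    def stop(self) -> None:
        torch.cuda.synchronize()  # make sure queued kernels are included in the trace
        self._prof.stop()
        # Export a Chrome-trace file that can be opened in chrome://tracing or Perfetto.
        self._prof.export_chrome_trace(f"{self.name}.json")
        self._prof = None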

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces profiler support, which is a great feature for debugging and performance analysis. The changes span across the API layer, server managers, and model backends to enable torch_profiler and nvtx profiling modes. The implementation is comprehensive, adding new CLI arguments, API endpoints, and internal command propagation.

My review focuses on ensuring correctness and maintainability. I've identified a potential race condition in the model backend, a few typos in the CLI arguments and help text that could affect functionality and clarity, and some minor inconsistencies in API responses and type hints. I've also pointed out a busy-wait loop that could be optimized. Addressing these points will make the new profiler feature more robust and easier to maintain.

    mode=self.args.enable_profiling,
    name=f"lightllm-model_backend-node{self.node_rank}_dev{get_current_device_id()}",
)
self.profiling_active = False
Severity: high

The self.profiling_active attribute is accessed by multiple threads (infer_loop_thread and infer_loop_thread1) without a lock, which can lead to a race condition. One thread might read an outdated value while another is writing to it. You should introduce a threading.Lock to protect access to this attribute.

Initialize it in init_model:

self.profiling_active = False
self.profiling_lock = threading.Lock()

Then use it when accessing self.profiling_active in _try_read_new_reqs and _read_reqs_buffer_and_init_reqs:

In _try_read_new_reqs:

with self.profiling_lock:
    if self.profiler.is_active != self.profiling_active:
        if self.profiling_active:
            self.profiler.start()
        else:
            self.profiler.stop()

In _read_reqs_buffer_and_init_reqs:

with self.profiling_lock:
    if obj.cmd == "start":
        self.profiling_active = True
    elif obj.cmd == "stop":
        self.profiling_active = False

if self.mode == "torch_profiler":
    self._torch_profiler_start()
elif self.mode == "nvtx":
    self._nxtx_start()
Severity: high

There's a typo in the method name _nxtx_start. It should be _nvtx_start to match the NVTX profiling mode.

Suggested change:
- self._nxtx_start()
+ self._nvtx_start()

        self._torch_profiler.start()
        torch.cuda.synchronize()

    def _nxtx_start(self) -> None:
Severity: high

There's a typo in the method name _nxtx_start. It should be _nvtx_start to match the NVTX profiling mode. This typo will cause a crash when using the nvtx profiler mode. This also needs to be corrected where it's called.

Suggested change:
- def _nxtx_start(self) -> None:
+ def _nvtx_start(self) -> None:

async def profiler_stop() -> Response:
    if g_objs.args.enable_profiling:
        await g_objs.httpserver_manager.profiler_cmd("stop")
        return {"status": "ok"}
Severity: medium

For consistency, it's better to explicitly use JSONResponse for success responses, just as you do for error responses.

Suggested change:
- return {"status": "ok"}
+ return JSONResponse({"status": "ok"})

        await g_objs.httpserver_manager.profiler_cmd("stop")
        return {"status": "ok"}
    else:
        return JSONResponse({"message": "Profiling support not enabled"}, status_code=500)
Severity: medium

Returning a 500 Internal Server Error for a disabled feature can be misleading. A 400 Bad Request or 404 Not Found would be more appropriate, since an unavailable feature is not a server failure.

Suggested change:
- return JSONResponse({"message": "Profiling support not enabled"}, status_code=500)
+ return JSONResponse({"message": "Profiling support not enabled"}, status_code=400)


@dataclass
class ProfilerCmd:
    cmd: str  # "start" or "stop" for now
Severity: medium

The type hint for cmd is str, but the comment indicates it only accepts 'start' or 'stop'. Using typing.Literal would make this more explicit and type-safe.

Suggested change:
- cmd: str  # "start" or "stop" for now
+ cmd: Literal["start", "stop"]  # "start" or "stop" for now
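
Applying that suggestion, the dataclass would read roughly as follows (a sketch; the only additional change needed is the typing.Literal import):

from dataclasses import dataclass
from typing import Literal

@dataclass
class ProfilerCmd:
    cmd: Literal["start", "stop"]  # restricts the command to the two values currently supported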

    )
elif self.mode == "nvtx":
    self._nvtx_toplevel_mark = "LIGHTLLM_PROFILE"
    self._nvtx_toplevel_id = None
Severity: medium

The instance variable self._nvtx_toplevel_id is initialized but never used. It can be safely removed to improve code clarity.

        self._nvtx_toplevel_mark,
    )
elif self.mode is not None:
    assert False, "invalid profiler mode"
Severity: medium

assert False statements are removed when Python is run with the -O (optimize) flag, which can happen in production environments. It's better to raise an exception like ValueError or NotImplementedError for invalid profiler modes to ensure the error is always caught.

Suggested change:
- assert False, "invalid profiler mode"
+ raise ValueError("invalid profiler mode")

        await g_objs.httpserver_manager.profiler_cmd("start")
        return {"status": "ok"}
    else:
        return JSONResponse({"message": "Profiling support not enabled"}, status_code=500)
Severity: medium

Returning a 500 Internal Server Error for a disabled feature can be misleading. A 400 Bad Request or 404 Not Found would be more appropriate, since an unavailable feature is not a server failure.

Suggested change:
- return JSONResponse({"message": "Profiling support not enabled"}, status_code=500)
+ return JSONResponse({"message": "Profiling support not enabled"}, status_code=400)

async def profiler_start() -> Response:
    if g_objs.args.enable_profiling:
        await g_objs.httpserver_manager.profiler_cmd("start")
        return {"status": "ok"}
Severity: medium

For consistency, it's better to explicitly use JSONResponse for success responses, just as you do for error responses.

Suggested change:
- return {"status": "ok"}
+ return JSONResponse({"status": "ok"})
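
Taken together with the status-code suggestion above, the profiler_start handler would look roughly like this (a sketch only; profiler_stop would mirror it, and the FastAPI imports may already exist in api_http.py):

from fastapi.responses import JSONResponse, Response

async def profiler_start() -> Response:
    # g_objs comes from the surrounding api_http.py module, as in the snippet above.
    if g_objs.args.enable_profiling:
        await g_objs.httpserver_manager.profiler_cmd("start")
        return JSONResponse({"status": "ok"})
    else:
        return JSONResponse({"message": "Profiling support not enabled"}, status_code=400)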
